a. For each field, extract the substring that you want to preserve for analysis. For
example, you can discard the time part of the DATETIME or DATE field to cluster
documents by date; similarly, you can remove words that convey no meaningful
information or patterns from the MESSAGE or TEXT field. Likewise, impute missing
values in fields such as LATITUDE using an assumed function (e.g., average, median).
b. Store the output in a CSV (or JSON) file and load the data into Weka.
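
Steps a and b can be sketched in Python as follows. This is a minimal illustration, not a prescribed method: the sample rows, the stop-word list, the helper names (date_only, strip_stop_words, preprocess, write_csv), and the choice of the median for imputation are all assumptions made for the example. Only the field names DATETIME, MESSAGE, and LATITUDE come from the steps above.

```python
import csv
import io
import statistics

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "at", "of", "and", "to", "in"}


def date_only(value):
    """Keep only the date part of 'YYYY-MM-DD HH:MM:SS' (or a 'T'-separated form)."""
    return value.replace("T", " ").split(" ")[0]


def strip_stop_words(text):
    """Drop words that appear in STOP_WORDS (case-insensitive comparison)."""
    return " ".join(w for w in text.split() if w.lower() not in STOP_WORDS)


def preprocess(rows):
    """Apply step a: trim DATETIME, clean MESSAGE, impute missing LATITUDE."""
    known = [float(r["LATITUDE"]) for r in rows if r["LATITUDE"].strip()]
    # Assumption: the median of the known values stands in for missing ones.
    fill = statistics.median(known) if known else None
    cleaned = []
    for r in rows:
        lat = r["LATITUDE"].strip()
        cleaned.append({
            "DATE": date_only(r["DATETIME"]),
            "MESSAGE": strip_stop_words(r["MESSAGE"]),
            "LATITUDE": float(lat) if lat else fill,
        })
    return cleaned


def write_csv(rows, handle):
    """Apply step b: write the cleaned rows as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=["DATE", "MESSAGE", "LATITUDE"])
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample records, for illustration only.
sample = [
    {"DATETIME": "2015-03-01 08:15:00", "MESSAGE": "Fire at the station", "LATITUDE": "40.0"},
    {"DATETIME": "2015-03-01 21:40:10", "MESSAGE": "The road is closed", "LATITUDE": ""},
    {"DATETIME": "2015-03-02 09:00:00", "MESSAGE": "Power restored", "LATITUDE": "42.0"},
]

cleaned = preprocess(sample)
buffer = io.StringIO()  # stands in for a file opened with open(path, "w", newline="")
write_csv(cleaned, buffer)
print(buffer.getvalue())
```

To produce a file for Weka, the in-memory buffer would be replaced with a real file handle; the header row gives Weka's CSV import the attribute names. The median is used here because it is less sensitive to outliers than the average, but either fits the step as described.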