Assignment (Weka Practice, 10%): Association Data Mining

Your task for this project is to identify and perform an association rule mining task. This involves:

1. Selecting an appropriate data set
2. Preparing and preprocessing the data
3. Finding rules, including appropriate parameter settings
4. Determining which of the resulting rules are interesting
5. Figuring out how the interesting rules could be useful

While you are on your own to select an appropriate data set, I will point you to one easy source: the UCI Machine Learning Repository. It contains many data sets, not all of which are appropriate for association rules, so you need to do some thinking. You are also welcome to identify data from other sources, especially data you find personally interesting.

Project Report

The project report should contain the following:

1. Data set description: What is in the data (provide a data dictionary) and what preprocessing was done to make it amenable to association rule mining. Where choices were made (e.g., parameter settings for discretization, or decisions to ignore an attribute), describe the reasoning behind them.
2. Rule mining process: Parameter settings, choice of algorithm (if you choose to implement something other than WEKA's built-in Apriori, you can earn extra credit, but I don't expect it), and the time required.
3. Resulting rules: A summary (number of rules, general description) and a selection of the rules you would show to a client.

In addition, turn in (likely as a separate plain-text file) a complete listing of the rules found, along with instructions (preferably machine-readable/executable) for recreating your results. WEKA provides several ways to do this, from command-line scripts to the Explorer; your call. If you iterate over different attribute sets, parameter settings, etc., turn in the rule list and scripts for your final iteration only. In the project report, describe the iterations and explain why you needed to change your initial choices.

Scoring

Scoring will be based on:

1. Your reasoning behind the choice of data set
2. Preparation and preprocessing
3. Rule generation
4. Choice of interesting rules
5. Evaluation and use of the rules
6. Overall quality of the report, including readability and clarity

