Introduction to modelling and optimisation
£20-250 GBP
Paid on delivery
I have a data set and need the following tasks completed:
The data set is available from Moodle (the data originate from the UCI Machine Learning Repository).
a) Summarise the data
What is the dimensionality of the data? What are the min, median, max, mean, standard deviation and percentage of missing data for each feature?
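The per-feature summary asked for above can be sketched in plain Python. This is a minimal illustration, not the tool the brief expects: it assumes each feature is held as a list in which `None` marks a missing value, and the two toy columns are invented (only the names `mcg` and `nuc` come from the brief). Dimensionality is reported as instances x features.

```python
import statistics

def summarise(columns):
    """Per-feature min, median, max, mean, sample std and % missing.

    `columns` maps a feature name to its list of values; None marks a
    missing entry (an assumed encoding for this sketch).
    """
    summary = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        summary[name] = {
            "min": min(present),
            "median": statistics.median(present),
            "max": max(present),
            "mean": statistics.mean(present),
            "std": statistics.stdev(present),  # sample (n - 1) standard deviation
            "pct_missing": 100.0 * (len(values) - len(present)) / len(values),
        }
    return summary

# Toy values for illustration only; these are not taken from the Moodle data set.
toy = {"mcg": [0.58, 0.43, None, 0.64], "nuc": [0.22, 0.22, 0.30, None]}
n_instances = len(toy["mcg"])
n_features = len(toy)
stats = summarise(toy)
print(f"{n_instances} instances x {n_features} features")
print(stats["mcg"])
```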
b) Impute missing values
Use replacement by mean and replacement by median to fill in missing values. Display the min, median, max, mean and standard deviation of the imputed data. Justify which imputation method is more suitable.
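A minimal sketch of the two imputation strategies, assuming the same `None`-for-missing encoding as a plain Python list; the toy column is invented. It shows one argument that often comes up when justifying the choice: a single extreme value pulls the mean fill far from the bulk of the data, while the median fill barely moves.

```python
import statistics

def impute(values, strategy="median"):
    """Return a copy of `values` with None replaced by the observed mean or median."""
    present = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(present)
    elif strategy == "median":
        fill = statistics.median(present)
    else:
        raise ValueError("strategy must be 'mean' or 'median'")
    return [fill if v is None else v for v in values]

# Toy column with one extreme value: it drags the mean far more than the median.
column = [1.0, 2.0, 3.0, None, 100.0]
print(impute(column, "mean"))    # [1.0, 2.0, 3.0, 26.5, 100.0]
print(impute(column, "median"))  # [1.0, 2.0, 3.0, 2.5, 100.0]
```

The summary statistics can then be recomputed on each filled column to compare the two versions.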
c) Visualise/transform the data
Use the data as transformed from part 1 (mean centered, median imputed).
a) Cluster the data
Apply your choice of clustering algorithm (out of k-means, FarthestFirst, HierarchicalClusterer, EM) to create 10 clusters and explain the results. Justify your choice of clustering technique. Compare the cluster results to the Class1 attribute and calculate the accuracy. Include screenshots of the clustering options and the clustering results.
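The algorithm names in the brief appear to refer to Weka's clusterers, so the screenshots would come from that tool; the sketch below only illustrates what k-means does and one way to turn clusters into an accuracy against a class attribute. Assumptions: centroids are seeded with the first k points (for determinism), and each cluster is labelled with its majority class. Weka's own classes-to-clusters evaluation may map clusters to classes differently, so its figure can differ. The four points and their labels are toy data.

```python
import math
from collections import Counter

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means. Seeds centroids with the first k points for a
    deterministic sketch; real tools use random or k-means++ seeding."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels

def cluster_accuracy(cluster_ids, classes):
    """Label each cluster with its majority class and count agreements."""
    per_cluster = {}
    for cid, cls in zip(cluster_ids, classes):
        per_cluster.setdefault(cid, Counter())[cls] += 1
    correct = sum(c.most_common(1)[0][1] for c in per_cluster.values())
    return correct / len(classes)

points = [(0, 0), (0, 1), (5, 5), (5, 6)]
classes = ["CYT", "CYT", "NUC", "NUC"]
labels = kmeans(points, k=2)
print(labels, cluster_accuracy(labels, classes))  # [0, 0, 1, 1] 1.0
```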
b) Apply PCA to reduce features
Implement principal component analysis to reduce the number of features. Justify a suitable choice for the number of principal components to use. Implement the same clustering technique as used in a) after PCA and calculate the clustering accuracy. Include screenshots of the PCA options, the PCA results and the clustering results.
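One common way to justify the number of components is to keep the smallest number whose eigenvalues cover a chosen share of the total variance. The sketch assumes the eigenvalues have already been read off the PCA output; the 95% threshold is an assumed, conventional default rather than a requirement of the brief, and the eigenvalues below are made up.

```python
def components_for_variance(eigenvalues, threshold=0.95):
    """Smallest number of leading principal components whose cumulative
    share of the total variance reaches `threshold`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)

# Made-up eigenvalues for 6 components, not results from the data set.
eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.3, 0.2]
print(components_for_variance(eigenvalues))        # 5
print(components_for_variance(eigenvalues, 0.90))  # 4
```

Lowering the threshold trades retained variance for fewer features, which is exactly the trade-off the justification should discuss.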
c) Conclusions
Comment on the difference in performance (accuracy) between the clustering in parts a) and b) and explain why this occurred.
Use the data as transformed from part 1 (mean centered, median imputed). Train the classifiers on 2/3 of the data from step 1 and test them on the remaining 1/3. In this part you will predict the Class2 feature of the data (binary classification: CYT or non-CYT) using the first 8 features (mcg-nuc).
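A 2/3 : 1/3 hold-out split can be sketched as a seeded shuffle followed by a cut. This is a simple random split under assumed names (`split_two_thirds`, `seed=42` are hypothetical); it is not stratified, so with an unbalanced CYT/non-CYT target a stratified split may be preferable.

```python
import random

def split_two_thirds(rows, seed=42):
    """Shuffle a copy of rows and return (train, test) with 2/3 used for training."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]

rows = list(range(30))  # stand-ins for 30 data rows
train, test = split_two_thirds(rows)
print(len(train), len(test))  # 20 10
```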
a) Classification
Try the following 5 classifiers: Naive Bayes, k-NN (k=5 and k=10), logistic regression and the C4.5 decision tree. What are the algorithms' accuracies on the test data? Explain the results.
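To make the k-NN entries concrete, here is a minimal k-NN classifier with Euclidean distance and majority voting, plus the test-set accuracy calculation. The training rows are invented two-feature toys rather than the eight real features. With an even k such as 10, two-class votes can tie; in this sketch a tie goes to the label seen first, which is the closest neighbour's label, and other tools may break ties differently.

```python
import math
from collections import Counter

def knn_predict(train, query, k=5):
    """Majority vote among the k training rows nearest to `query` (Euclidean).

    `train` is a list of (feature_tuple, label) pairs.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

train = [((0, 0), "CYT"), ((0, 1), "CYT"), ((1, 0), "CYT"), ((1, 1), "CYT"),
         ((0.5, 0.5), "CYT"), ((5, 5), "non-CYT"), ((5, 6), "non-CYT"),
         ((6, 5), "non-CYT"), ((6, 6), "non-CYT"), ((5.5, 5.5), "non-CYT")]
test = [((0.2, 0.3), "CYT"), ((5.2, 5.1), "non-CYT")]
preds = [knn_predict(train, x, k=5) for x, _ in test]
print(preds, accuracy(preds, [y for _, y in test]))
```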
b) Ensembles
Create a stacking ensemble: use the output of each of the previous classifiers as a feature for a new classifier of your choice (this may require changing your train/test split). Illustrate what is being done and give an example of how it works. How does its performance compare with that of each single classifier?
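The stacking idea can be sketched as follows. Each row is passed through the base classifiers, their outputs form a new feature vector, and a meta classifier learns from those vectors. The meta learner must be trained on rows the base classifiers did not see during training (a separate hold-out set, or cross-validated predictions), which is why the train/test split may need to change. Assumptions: the three base models are hypothetical threshold rules standing in for the real trained classifiers, and the meta learner is a deliberately simple lookup table (majority label per pattern of base outputs); logistic regression is a common alternative choice.

```python
from collections import Counter, defaultdict

# Hypothetical, already-trained base classifiers (1 = CYT, 0 = non-CYT).
# In the assignment these would be Naive Bayes, k-NN, logistic regression, C4.5.
base_models = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(x[0] + x[1] > 1.0),
]

def meta_features(x):
    """The base classifiers' outputs become the new feature vector."""
    return tuple(model(x) for model in base_models)

def fit_stacker(holdout):
    """Train a lookup-table meta learner on base outputs for held-out rows."""
    table = defaultdict(Counter)
    for x, y in holdout:
        table[meta_features(x)][y] += 1
    lookup = {pattern: counts.most_common(1)[0][0] for pattern, counts in table.items()}
    fallback = Counter(y for _, y in holdout).most_common(1)[0][0]  # unseen patterns
    return lambda x: lookup.get(meta_features(x), fallback)

# Held-out rows the base models were NOT trained on (toy values).
holdout = [((0.9, 0.9), 1), ((0.9, 0.05), 0), ((0.05, 0.9), 0), ((0.1, 0.1), 0)]
stacked = fit_stacker(holdout)
print(meta_features((0.8, 0.15)), stacked((0.8, 0.15)))  # (1, 0, 0) 0
```

Here the first base model alone would call `(0.8, 0.15)` CYT, but the meta learner has learned that this pattern of base outputs meant non-CYT on the hold-out rows.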
c) Conclusions
What are the potential issues/limitations with stacking?
Project ID: #7237038
About the project
7 freelancers are bidding on average £136 for this job