The following aspects will be worked on in Phase II. Phase I (simple visualization of the dataset using Python) is already complete.
Data description and data handling - Each column will be clearly described: its datatype, missing/NA values (if any), the shape of its distribution (if continuous), the proportion of each category (if categorical), duplicates (if nominal), outliers (if continuous), and so on. There will be a tabular summary as well as a 360-degree view of each column. Categorical data must be grouped and aggregated (groupby) so that it can be displayed in subsets.
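As a rough illustration of the per-column summary and the groupby step described above, here is a minimal pandas sketch. The function name `summarize_columns`, the toy columns `group` and `value`, and the 1.5 x IQR outlier rule are assumptions chosen for the example; the posting does not specify the dataset or an outlier definition.

```python
import numpy as np
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Build one summary row per column (hypothetical helper, not from the posting)."""
    rows = []
    for name in df.columns:
        col = df[name]
        row = {
            "column": name,
            "dtype": str(col.dtype),
            "n_missing": int(col.isna().sum()),
            "n_unique": int(col.nunique(dropna=True)),
            "n_duplicated": int(col.duplicated().sum()),
            "skew": np.nan,
            "n_outliers": np.nan,
            "top_category_share": np.nan,
        }
        if pd.api.types.is_numeric_dtype(col):
            # Continuous column: skewness plus outliers by the 1.5 * IQR rule (an assumption).
            q1, q3 = col.quantile([0.25, 0.75])
            iqr = q3 - q1
            outside = (col < q1 - 1.5 * iqr) | (col > q3 + 1.5 * iqr)
            row["skew"] = float(col.skew())
            row["n_outliers"] = int(outside.sum())
        else:
            # Categorical column: share of the most frequent category.
            shares = col.value_counts(normalize=True, dropna=True)
            row["top_category_share"] = float(shares.iloc[0]) if len(shares) else np.nan
        rows.append(row)
    return pd.DataFrame(rows).set_index("column")


# Toy data standing in for the real dataset, which the posting does not describe.
df = pd.DataFrame(
    {
        "group": ["a", "a", "b", "b", "b", None],
        "value": [1.0, 2.0, 2.5, 3.0, 100.0, np.nan],
    }
)

summary = summarize_columns(df)

# Groupby and aggregation: display the continuous column in categorical subsets.
by_group = df.groupby("group")["value"].agg(["count", "mean", "median"])

print(summary)
print(by_group)
```

The same function can be pointed at the real DataFrame once it is loaded; the outlier rule and the set of summary fields would need to be agreed with the client.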
Data visualisation - In addition to the histogram and boxplot, we will now display bivariate and multivariate charts, that is, charts relating two or more continuous or categorical variables. A total of 5 additional visualizations shall be created.
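The kind of bivariate and multivariate charts meant here could look like the following matplotlib sketch: a scatter plot coloured by category, a boxplot per category, and a correlation heatmap. The helper `bivariate_panel` and the column names `x`, `y`, `segment` are hypothetical, and the data is randomly generated; which five charts are actually required is not stated in the posting.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def bivariate_panel(df: pd.DataFrame, x: str, y: str, cat: str):
    """Draw three example charts side by side (hypothetical helper)."""
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    # 1. Continuous vs continuous, coloured by a categorical variable.
    for level, part in df.groupby(cat):
        axes[0].scatter(part[x], part[y], label=str(level), alpha=0.7)
    axes[0].set_xlabel(x)
    axes[0].set_ylabel(y)
    axes[0].legend(title=cat)

    # 2. Continuous vs categorical: one box per category level.
    levels = sorted(df[cat].dropna().unique())
    samples = [df.loc[df[cat] == lv, y].dropna().to_numpy() for lv in levels]
    axes[1].boxplot(samples)
    axes[1].set_xticks(range(1, len(levels) + 1))
    axes[1].set_xticklabels(levels)
    axes[1].set_ylabel(y)

    # 3. Multivariate: correlation matrix of all numeric columns.
    corr = df.select_dtypes("number").corr()
    image = axes[2].imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
    axes[2].set_xticks(range(len(corr)))
    axes[2].set_xticklabels(corr.columns, rotation=45)
    axes[2].set_yticks(range(len(corr)))
    axes[2].set_yticklabels(corr.columns)
    fig.colorbar(image, ax=axes[2])

    fig.tight_layout()
    return fig


# Randomly generated stand-in data.
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    {
        "x": rng.normal(size=90),
        "segment": np.repeat(["low", "mid", "high"], 30),
    }
)
demo["y"] = 2 * demo["x"] + rng.normal(size=90)

fig = bivariate_panel(demo, x="x", y="y", cat="segment")
fig.savefig("bivariate_panel.png")
```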
Data modelling - One-sample and two-sample tests of means and n-way ANOVA shall be conducted for one or more categories of data within the dataset. Tukey's HSD test shall be used as the post-hoc test following ANOVA. Linear regression shall be fitted for continuous target variables and logistic regression for categorical ones. Evaluation measures (R-squared, AUC, Gini, KS, misclassification rate) shall be reported for the results. k-fold cross-validation shall also be conducted to assess generalization.
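A condensed sketch of this modelling workflow, using SciPy and scikit-learn on randomly generated data, might look like the code below. Several things are assumptions made for the example: the synthetic groups and features, the choice of Welch's two-sample t-test, 5 folds, a 0.5 classification threshold, and Gini computed as 2 x AUC - 1. Only one-way ANOVA is shown; an n-way ANOVA would need a model-based tool such as statsmodels and is left out here.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import (
    KFold,
    StratifiedKFold,
    cross_val_predict,
    cross_val_score,
)

rng = np.random.default_rng(0)

# --- Tests of means on three synthetic groups (stand-ins for real categories) ---
a = rng.normal(10.0, 1.0, size=40)
b = rng.normal(10.2, 1.0, size=40)
c = rng.normal(13.0, 1.0, size=40)

one_sample = stats.ttest_1samp(a, popmean=10.0)       # mean of a vs a fixed value
two_sample = stats.ttest_ind(a, b, equal_var=False)   # Welch's t-test, a vs b
anova = stats.f_oneway(a, b, c)                       # one-way ANOVA across groups
tukey = stats.tukey_hsd(a, b, c)                      # post-hoc pairwise comparisons

# --- Synthetic features with a continuous and a binary target ---
n = 300
X = rng.normal(size=(n, 2))
y_cont = 2.0 * X[:, 0] - X[:, 1] + rng.normal(size=n)
y_bin = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Linear regression: R-squared averaged over 5 folds.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
r2_mean = cross_val_score(LinearRegression(), X, y_cont, cv=kfold, scoring="r2").mean()

# Logistic regression: out-of-fold probabilities, so every metric is cross-validated.
skfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
proba = cross_val_predict(
    LogisticRegression(), X, y_bin, cv=skfold, method="predict_proba"
)[:, 1]

auc = roc_auc_score(y_bin, proba)
gini = 2.0 * auc - 1.0
# KS: largest gap between the score distributions of the two classes.
ks = stats.ks_2samp(proba[y_bin == 1], proba[y_bin == 0]).statistic
misclassification = float(np.mean((proba >= 0.5).astype(int) != y_bin))

print(f"one-sample p={one_sample.pvalue:.3f}, two-sample p={two_sample.pvalue:.3f}")
print(f"ANOVA p={anova.pvalue:.3g}")
print(tukey)
print(f"R2={r2_mean:.3f} AUC={auc:.3f} Gini={gini:.3f} KS={ks:.3f} "
      f"misclassification={misclassification:.3f}")
```

Computing the classification metrics from out-of-fold predictions keeps them honest about generalization, which is the stated purpose of the k-fold step; fitting and scoring on the same rows would overstate all of them.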
13 freelancers are bidding on average ₹27698 for this job
Hi, I am experienced with Python, machine learning, and statistical modeling, and would like to learn more about the task. Can I get more details about the dataset in chat? Thanks, g.
I am an excellent web and software developer with over 7 years of experience. I want to use these skills to deliver good work for you. Please provide more information on your project so I can get it done. Thanks.