This will be a fun (and fairly simple) project if, like me, you enjoy ripping data sets apart :)
I have a very simple but very large spreadsheet for which I need to calculate some pretty basic data. I just don't have the time to do it. I reckon it's a few hours work. I am attaching a very limited version of the spreadsheet. The full sheet will be identical in layout but with about 500,000 rows in each worksheet.
As long as you are expert at use of Pivots, data cleaning and basic linear and statistical analysis you should find this easy. There are four worksheets with several hundred thousands of rows (but very simple data)
The data is all about the responses to 4 test types. Each worksheet contains the test id (most tests comprise 30 questions), question id, question type and whether the response was right or wrong.
The number of times a question is answered varies significantly so I also need to know the confidence level and intervals that we could apply to the result for each question, and also to the whole analysis. It would also be useful to know the sample size we should need for each question to obtain 95%/5% and 99%/1% confidence levels/interval.
I want to pull out the following stats:
The difficulty for each question type (based on % of correct answers) ranked and then indexed by decile and then a histogram produced of the results for each of the test types.
I then want to apply the findings to the individual test results to see what the distribution of difficulty of questions is within each test, and then rank the tests according to their "difficulty" and provide an appropriately detailed histogram
I'll know if you are the right person for the job by the quality of questions that you ask from your review of the speadsheet. I have deliberately withheld a couple of items of information that will be needed to complete this task :)