The sinking of the Titanic is one of the most infamous shipwrecks in history. On April 15,
1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after
colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone
onboard, resulting in the death of 1502 out of 2224 passengers and crew. While there was
some element of luck involved in surviving, it seems some groups of people were more likely
to survive than others. In this assignment, you will build a predictive model that answers the
question: which kinds of people were more likely to survive the Titanic sinking? The data are
labeled according to whether or not a person survived (1 = survived, 0 = did not survive).
Download the data from D2L; the following steps will guide you through building a data
mining model:
A. For any dataset, we need to clean the data before doing any analysis. Topic 1 – Data
Preparation covered eight data-preparation steps. To simplify this step, however, we will
perform only data transformation and missing-data cleaning:
i. Four variables in the data set have missing values. Variable ‘Cabin’ is missing
80% of its values, so we will not use it in our models. Replace missing values
of variable ‘Embarked’ with the most common value, and missing values of
variables ‘Age’ and ‘Price’ with their average values (0.5 point).
ii. Since variables ‘Sex’ and ‘Embarked’ are categorical, we will need to
transform them. Transform variable ‘Sex’ into a dummy variable (values 0 and
1), and variable ‘Embarked’ into a numeric variable (values 1, 2, and 3) (0.5
point).
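The assignment expects these cleaning steps to be done in Analytic Solver, but the same transformations can be sketched in Python with pandas. The tiny DataFrame below is a hypothetical stand-in for the D2L file, and the column names (‘Cabin’, ‘Embarked’, ‘Age’, ‘Price’, ‘Sex’) are assumptions about how that file is laid out:

```python
import pandas as pd

# Hypothetical stand-in for the Titanic file from D2L; column names are assumed.
df = pd.DataFrame({
    "Survived": [1, 0, 1, 0],
    "Sex":      ["female", "male", "female", "male"],
    "Age":      [22.0, None, 30.0, 40.0],
    "Price":    [7.25, 71.83, None, 8.05],
    "Embarked": ["S", "C", None, "S"],
    "Cabin":    [None, "C85", None, None],
})

# Step A.i: drop 'Cabin' (80% missing in the real data), then impute --
# mode for 'Embarked', mean for 'Age' and 'Price'.
df = df.drop(columns=["Cabin"])
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])
for col in ("Age", "Price"):
    df[col] = df[col].fillna(df[col].mean())

# Step A.ii: encode 'Sex' as a 0/1 dummy and 'Embarked' as 1/2/3.
df["Sex"] = (df["Sex"] == "male").astype(int)
df["Embarked"] = df["Embarked"].map({"S": 1, "C": 2, "Q": 3})
```

After this, every column is numeric and no cell is missing, which is what the partitioning and modeling steps below require.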
B. Next, we will perform cross-validation by partitioning our data. Use Analytic
Solver’s standard data partition command to partition the data into a training set
(with 50% of the observations), a validation set (with 30% of the observations),
and a test set (with 20% of the observations), using the default seed of 12345.
(1 point)
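Analytic Solver performs this partition from its ribbon, but the 50/30/20 split can be sketched with scikit-learn by splitting twice: first peel off the 50% training set, then divide the remainder 60/40 (which is 30%/20% of the whole). Note that the seed here is only illustrative; scikit-learn's random draws will not reproduce Solver's partition row-for-row.

```python
import numpy as np
from sklearn.model_selection import train_test_split

SEED = 12345
X = np.arange(100).reshape(-1, 1)  # stand-in feature matrix (100 rows)
y = np.arange(100) % 2             # stand-in 0/1 labels

# First split: 50% training, 50% remainder.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.5, random_state=SEED)

# Second split: 60% of the remainder -> validation (30% overall),
# 40% of the remainder -> test (20% overall).
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, train_size=0.6, random_state=SEED)

print(len(X_train), len(X_val), len(X_test))  # 50 30 20
```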
C. Perform discriminant analysis, logistic regression, k-nearest neighbors (with
normalized inputs), a single classification tree (with normalized inputs and at least 4
observations per terminal node), and a manual neural network (with normalized inputs
and a single hidden layer with 3 nodes) to create a classifier for this data. How
accurate is each procedure on the training, validation, and test data sets? (1 point)
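The five classifiers in step C all have scikit-learn counterparts, so here is a hedged sketch of fitting them and comparing accuracies. The synthetic data from `make_classification` stands in for the cleaned Titanic matrix, and the hyperparameters mirror the assignment's settings (normalization via `StandardScaler`, at least 4 observations per leaf, one hidden layer of 3 nodes); Analytic Solver's exact results will differ.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cleaned Titanic matrix.
X, y = make_classification(n_samples=500, n_features=6, random_state=12345)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.5, random_state=12345)

models = {
    "discriminant analysis": LinearDiscriminantAnalysis(),
    "logistic regression":   LogisticRegression(max_iter=1000),
    "k-nearest neighbors":   make_pipeline(StandardScaler(),
                                           KNeighborsClassifier()),
    "classification tree":   make_pipeline(StandardScaler(),
                                           DecisionTreeClassifier(
                                               min_samples_leaf=4,
                                               random_state=12345)),
    "neural network":        make_pipeline(StandardScaler(),
                                           MLPClassifier(
                                               hidden_layer_sizes=(3,),
                                               max_iter=2000,
                                               random_state=12345)),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: train={model.score(X_train, y_train):.3f} "
          f"test={model.score(X_test, y_test):.3f}")
```

Comparing training accuracy against validation/test accuracy for each model is exactly what step C asks for: a large gap between the two signals overfitting (most likely for the tree and the k-NN model), while similar scores suggest the model will generalize.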