This project has two parts:
First, given a .csv file (see the uploaded file), the rows for each unique VideoFileId are to be flattened into a single vector, giving one output row per VideoFileId. Here is a pseudocode sample in R:
ld <- lapply(split(d[-1], d[["VideoFileId"]]), unlist)
ldNames <- Reduce(unique, lapply(ld, names))
do.call(rbind, lapply(ld, function(x) x[ldNames]))
d2 <- do.call(rbind, lapply(ld, function(x) x[ldNames]))
ldNames_sorted <- c(matrix(ldNames, ncol = (ncol(d) - 1), byrow = TRUE))
[login to view URL] = data.frame(VideoFileId = row.names(d2), d2)
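Since the second part is to be written in Python, the same reshaping step can be sketched there as well. This is only a minimal sketch using the standard library, not a definitive implementation. The function name flatten_by_id and the sample columns x and y are hypothetical, since the real column names come from the uploaded file. It assumes that groups shorter than the longest one are padded with None, and it keeps ids in first-seen order rather than sorting them as R's split does.

```python
import csv
import io
from collections import defaultdict


def flatten_by_id(rows, id_col="VideoFileId"):
    """Return (header, table) with one flattened row per unique id."""
    groups = defaultdict(list)  # id -> its rows, in file order
    for row in rows:
        groups[row[id_col]].append(row)

    value_cols = [c for c in rows[0] if c != id_col]
    longest = max(len(g) for g in groups.values())

    # Column names in row-major order: x1, y1, x2, y2, ...
    header = [id_col] + [
        f"{c}{i}" for i in range(1, longest + 1) for c in value_cols
    ]

    table = []
    for key, group in groups.items():
        flat = [key]
        for i in range(longest):
            for c in value_cols:
                # Pad short groups with None (the R code would give NA).
                flat.append(group[i][c] if i < len(group) else None)
        table.append(flat)
    return header, table


# Hypothetical sample data; the real file has its own columns.
sample = "VideoFileId,x,y\nv1,1,2\nv1,3,4\nv2,5,6\n"
rows = list(csv.DictReader(io.StringIO(sample)))
header, table = flatten_by_id(rows)
```

Values are kept as strings here, exactly as read from the file; converting them to numbers would be a separate step before modelling.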
The second part of the project is to take this output file and write a Python-based supervised learning model. The model should use 80% of the data for training and 20% for testing. The model will be considered "complete" when it achieves an area under the ROC (Receiver Operating Characteristic) curve greater than 0.9 on the test data.
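To make the acceptance criterion concrete, here is a minimal sketch of the 80/20 split and the AUC check using only the standard library. The names split_80_20 and roc_auc are hypothetical helpers, and the labels and scores in the example are made up for illustration. The brief does not say which column is the target or which model to use, so both are left open here. In practice scikit-learn's train_test_split and roc_auc_score would likely be used instead.

```python
import random


def split_80_20(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of items and return (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def roc_auc(labels, scores):
    """AUC as the chance that a random positive example is scored
    higher than a random negative one (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Illustrative values only, not project data.
train, test = split_80_20(range(10))
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
is_complete = auc > 0.9  # the acceptance threshold from the brief
```

The pairwise AUC above is quadratic in the number of examples, which is fine for a sanity check but slow on large test sets.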