This assignment covers statistical analysis of a data collection and data reduction methods, in particular dimensionality reduction through feature extraction. You are given two datasets, each a table of 1000 vectors with 100 attributes (i.e., dimensions). Each dataset is supplied as two text table files of 500 samples each; taken together, the two files form a 1000 x 100 matrix in which each row is one vector. You are further told that, within each dataset, the component values of every sample (i.e., vector) follow the same distribution.
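As a starting point, the two 500-row files of a dataset can be stacked into the single 1000 x 100 matrix described above. The sketch below assumes NumPy and whitespace-delimited numeric text files. The file names and the helper `load_dataset` are hypothetical, and the demo writes synthetic stand-in files because the real ones are not shown here.

```python
import os
import tempfile

import numpy as np


def load_dataset(part_paths):
    """Load each text table and stack the parts row-wise into one matrix."""
    # Assumption: plain whitespace-delimited numbers, one vector per line.
    parts = [np.loadtxt(path) for path in part_paths]
    return np.vstack(parts)


# Demo on synthetic stand-in files (names are hypothetical).
rng = np.random.default_rng(0)
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(2):
        path = os.path.join(tmp, f"dataset1_part{i + 1}.txt")
        np.savetxt(path, rng.normal(size=(500, 100)))
        paths.append(path)
    X = load_dataset(paths)

print(X.shape)  # (1000, 100)
```

If the real files use commas or another separator, `np.loadtxt` accepts a `delimiter` argument.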
1. Determine the distribution of the vector component values for each of the two datasets. For each dataset, randomly pick 10 samples and report the distribution parameters for each of the 10 samples.
2. Compute the norms of all the samples in both datasets. Then determine the distribution of the norms for each dataset and report its distribution parameters.
3. Implement the PCA (principal component analysis) and DCT (discrete cosine transform) methods and apply each of them to both datasets for feature extraction. Report the reduced dimensionality of each dataset after feature extraction with PCA and with DCT.
4. For each dataset, compare the feature extraction results of the two methods and report your conclusions.
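For items 1 and 2, one possible first step is to summarize each sampled vector by its moments and to compute one norm per row. The sketch below assumes NumPy, the Euclidean norm (the task does not name a specific norm), and a synthetic Gaussian matrix standing in for a real dataset. The helper `sample_moments` is hypothetical.

```python
import numpy as np


def sample_moments(v):
    """Return mean, sample std, skewness and excess kurtosis of one vector."""
    mu = v.mean()
    sigma = v.std(ddof=1)          # sample standard deviation
    z = (v - mu) / v.std()         # standardize with the population std
    skew = np.mean(z ** 3)
    excess_kurt = np.mean(z ** 4) - 3.0
    return mu, sigma, skew, excess_kurt


rng = np.random.default_rng(0)
# Synthetic stand-in for one dataset (assumption: not the real data).
X = rng.normal(loc=2.0, scale=3.0, size=(1000, 100))

# Item 1: pick 10 distinct rows at random and summarize each one.
idx = rng.choice(X.shape[0], size=10, replace=False)
for i in idx:
    mu, sigma, skew, kurt = sample_moments(X[i])
    print(f"row {i}: mean={mu:.3f} std={sigma:.3f} skew={skew:.3f} kurt={kurt:.3f}")

# Item 2: Euclidean norm of every row, then its summary statistics.
norms = np.linalg.norm(X, axis=1)
print(f"norms: mean={norms.mean():.3f} std={norms.std(ddof=1):.3f}")
```

Skewness and excess kurtosis near zero are consistent with a Gaussian; a uniform distribution has excess kurtosis of -1.2, and an exponential distribution has skewness 2 and excess kurtosis 6. A histogram or a goodness-of-fit test would be a natural follow-up check before reporting parameters.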
You can use whatever programming language you are comfortable with.
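For items 3 and 4, a minimal sketch of hand-written PCA and DCT is shown below, using NumPy only. Several choices here are assumptions rather than part of the task: the data is centered per column for both methods, the reduced dimensionality is the smallest number of components that retains 95% of the total variance, and DCT coefficients are ranked by variance. The two synthetic matrices are stand-ins for the real datasets, and all function names are hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix C; C @ x is the DCT of a length-n vector x."""
    k = np.arange(n)[:, None]      # frequency index (rows)
    j = np.arange(n)[None, :]      # position index (columns)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)        # rescale the constant row to unit length
    return C


def pca_variances(X):
    """Eigenvalues of the sample covariance: variance along each principal axis."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    return np.clip(np.linalg.eigvalsh(cov), 0.0, None)


def dct_variances(X):
    """Variance of each DCT coefficient after a row-wise DCT of centered data."""
    Xc = X - X.mean(axis=0)
    coeffs = Xc @ dct_matrix(X.shape[1]).T
    return coeffs.var(axis=0, ddof=1)


def components_needed(variances, threshold):
    """Smallest count of largest-variance components whose share reaches threshold."""
    v = np.sort(variances)[::-1]
    share = np.cumsum(v) / v.sum()
    return min(int(np.searchsorted(share, threshold)) + 1, len(v))


THRESHOLD = 0.95  # assumed retention criterion
rng = np.random.default_rng(0)
datasets = {
    "smooth": np.cumsum(rng.normal(size=(1000, 100)), axis=1),  # correlated columns
    "noise": rng.normal(size=(1000, 100)),                      # independent columns
}

results = {}
for name, X in datasets.items():
    k_pca = components_needed(pca_variances(X), THRESHOLD)
    k_dct = components_needed(dct_variances(X), THRESHOLD)
    results[name] = (k_pca, k_dct)
    print(f"{name}: PCA keeps {k_pca}, DCT keeps {k_dct} of {X.shape[1]} dimensions")
```

Under this shared variance criterion, PCA never needs more components than DCT, because the leading eigenvalues of the covariance capture at least as much variance as the same number of coefficients in any other orthonormal basis. DCT uses a fixed basis that does not depend on the data, so it tends to do well only when neighboring components are correlated.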