R and MySQL -- Need within 15 hours

$10-30 USD

In Progress

Posted

about 7 years ago

Paid on delivery

1. Use the twitter message table inside the twitter database:
• Use SQL to get the one user id that posts the most tweets. You may use only one query to get the id.
• Extract the msg date and the tweets from the database into R.
• Remove the stop words and non-English words from all tweets.
• Calculate the length (number of words) of each tweet. The output should be a data frame with 2 columns (date and tweet length).
• Download spx data using an R package (such as quantmod). The date range is the same as the date range of the tweet data frame above. Then calculate the daily return of spx.
• Merge the stock data frame and the tweet data frame by date, and run a linear regression with return as the dependent variable and tweet length as the independent variable.
• Show the summary result of the linear regression.

2.
• Write an R function to extract data from the accounting table of the SQL database.
• The input variables are two vectors, year and sector.
• The output variable is a data frame with 5 columns: sector, ticker, sales, size, ratio.
• The function should build the input year and sector vectors into a SQL query, pass it to the database, and get the data. The data comes from the accounting table, with year and sector equal to your inputs. If an input has more than 1 value, the selected rows should have their sector and year included in the input vectors.
• After you fetch the data, calculate size as the logarithm of totalassets, and ratio as totalliabilities/totalassets. Sales comes from the data directly.

3.
• Use the previous function to fetch the data.
• Randomly pick 3 sectors and 1 year from the database. You need to use a SQL query to get all unique sectors and years, then use an R function like sample() to select the sample. You should remove NULL values in sector and year.
• Use the above sectors and year as input to the function to get the output. If the number of observations in any one sector is less than 100, re-generate the data.
• Split the data into training (50%) and testing (50%) datasets.
• Pick a classification method, train the model on the training dataset, and predict the sector of the testing dataset. You can pick any classification method that we learned or a new method. Sales, size and ratio are the 3 input variables. Sector is the output label.
• Write an R function to repeat the previous step (starting from splitting the data into training and testing sets). The input variable is N (the number of repetitions). The output is the prediction accuracy rate on the testing dataset. In the function, you should run the classification N times with re-sampled datasets, getting a different accuracy rate each time, and return the average accuracy rate as output.
• Run the function with input N = 5 and show the result.
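The steps of task 1 could be sketched roughly as below. This is only an illustration under stated assumptions: the table name `message` and the column names `user_id`, `msg_date` and `tweet` are guesses (the posting does not give them), the stop-word list is a tiny placeholder, "non-English" filtering is reduced to dropping tokens that contain no a-z letters, and the price and tweet data frames are toy values standing in for the database extract and for a quantmod download (for example `quantmod::getSymbols("^GSPC", from = ..., to = ...)`).

```r
# One query that returns the id of the most active poster.
# Table and column names are assumptions, not taken from the posting.
top_user_sql <- "
  SELECT user_id
  FROM message
  GROUP BY user_id
  ORDER BY COUNT(*) DESC
  LIMIT 1"

# Placeholder stop-word list; a real run would use a fuller one.
stop_words <- c("the", "a", "an", "is", "of", "and", "to", "in")

# Number of words per tweet after lower-casing, splitting on anything
# that is not a-z, and dropping stop words.
tweet_length <- function(tweets, stop_words) {
  vapply(tweets, function(txt) {
    words <- strsplit(tolower(txt), "[^a-z]+")[[1]]
    words <- words[nzchar(words) & !(words %in% stop_words)]
    length(words)
  }, integer(1), USE.NAMES = FALSE)
}

# Simple daily return: (P_t - P_{t-1}) / P_{t-1}; the first day is NA.
daily_return <- function(close) c(NA, diff(close) / head(close, -1))

# Toy data standing in for the database extract and the index prices.
tweets_df <- data.frame(
  date   = as.Date("2020-01-01") + 0:4,
  length = tweet_length(
    c("The market is up 2% today", "Buy and hold", "Rates in focus",
      "A quiet session for stocks", "Earnings season starts"),
    stop_words
  )
)
spx_df <- data.frame(
  date  = as.Date("2020-01-01") + 0:4,
  close = c(100, 101, 99, 102, 103)
)
spx_df$ret <- daily_return(spx_df$close)

# Merge by date and regress return on tweet length.
merged <- merge(tweets_df, spx_df, by = "date")
fit <- lm(ret ~ length, data = merged)
print(summary(fit))
```

If several tweets share a date, they would need to be aggregated (for instance by mean length per day) before the merge; the posting does not say which aggregation is intended.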
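For task 2, one possible shape of the function is sketched here. The column names `sector`, `ticker`, `sales`, `totalassets`, `totalliabilities` and `year` come from the task text; the helper names `build_accounting_query`, `add_size_and_ratio` and `get_accounting`, and the use of a DBI connection object `con`, are assumptions made for illustration. Quoting is done by hand to keep the sketch short; parameterized queries would be the safer choice against untrusted input.

```r
# Build a SELECT with IN (...) lists so that vectors of any length work.
build_accounting_query <- function(year, sector) {
  stopifnot(length(year) > 0, length(sector) > 0)
  years <- paste(as.integer(year), collapse = ", ")
  # Double single quotes inside sector names before wrapping them in quotes.
  sectors <- paste0("'", gsub("'", "''", sector), "'", collapse = ", ")
  sprintf(
    paste(
      "SELECT sector, ticker, sales, totalassets, totalliabilities",
      "FROM accounting",
      "WHERE year IN (%s) AND sector IN (%s)"
    ),
    years, sectors
  )
}

# Derive size and ratio, then keep the five requested columns.
add_size_and_ratio <- function(raw) {
  raw$size  <- log(raw$totalassets)
  raw$ratio <- raw$totalliabilities / raw$totalassets
  raw[, c("sector", "ticker", "sales", "size", "ratio")]
}

# Hypothetical wrapper: `con` is assumed to be an open DBI connection.
# Not called here because it needs a live database.
get_accounting <- function(con, year, sector) {
  raw <- DBI::dbGetQuery(con, build_accounting_query(year, sector))
  add_size_and_ratio(raw)
}

# The query text can be inspected without a database.
query <- build_accounting_query(c(2010, 2011), c("Energy", "Tech"))
print(query)
```

Splitting query construction from the database call keeps the string logic checkable on its own, which is useful when no MySQL server is at hand.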
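The repeated split-and-classify step of task 3 might look like the following sketch. The posting leaves the classifier open, so a nearest-centroid rule written in base R is swapped in here purely to avoid extra packages; it is not the method the client asked for specifically. The function takes the data frame as a second argument in addition to N, which differs slightly from the posting, and the `demo` data is synthetic, generated only so the sketch runs without the database.

```r
features <- c("sales", "size", "ratio")

# Mean of each feature per sector (one centroid row per sector).
fit_centroids <- function(train) {
  aggregate(train[features], by = list(sector = train$sector), FUN = mean)
}

# Assign each test row to the sector with the closest centroid
# (squared Euclidean distance on the unscaled features).
predict_centroids <- function(centroids, test) {
  x   <- as.matrix(test[features])
  cen <- as.matrix(centroids[features])
  d <- matrix(
    unlist(lapply(seq_len(nrow(cen)),
                  function(k) rowSums(sweep(x, 2, cen[k, ])^2))),
    nrow = nrow(x)
  )
  as.character(centroids$sector)[max.col(-d, ties.method = "first")]
}

# One random 50/50 split, returning accuracy on the testing half.
split_and_score <- function(dat) {
  idx   <- sample(nrow(dat), size = floor(nrow(dat) / 2))
  train <- dat[idx, ]
  test  <- dat[-idx, ]
  pred  <- predict_centroids(fit_centroids(train), test)
  mean(pred == as.character(test$sector))
}

# Repeat the split N times and average the accuracy rates.
mean_accuracy <- function(dat, N) {
  mean(replicate(N, split_and_score(dat)))
}

# Synthetic stand-in for the fetched accounting data (3 sectors).
set.seed(1)
n <- 150
demo <- data.frame(
  sector = factor(rep(c("A", "B", "C"), each = n)),
  sales  = rnorm(3 * n, mean = rep(c(10, 20, 30), each = n)),
  size   = rnorm(3 * n, mean = rep(c(5, 6, 7), each = n)),
  ratio  = runif(3 * n)
)
acc <- mean_accuracy(demo, N = 5)
print(acc)
```

Because the distance is computed on raw values, a feature on a much larger scale (sales, in real accounting data) would dominate; standardizing the features on the training half first is worth considering.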
