R and MySQL -- Need within 15 hours

$10-30 USD

In Progress
Posted about 7 years ago

Paid on delivery
1. Use the twitter message table in the twitter database:
• Use SQL to get the id of the one user who posts the most tweets. You may use only one query to get the id.
• Extract the msg date and the tweets from the database into R.
• Remove the stop words and non-English words from all tweets.
• Calculate the length (number of words) of each tweet. The output is a data frame with 2 columns (date and tweet length).
• Download spx data using an R package (such as quantmod). The date range is the same as the date range of the tweet data frame above. Then calculate the daily return of spx.
• Merge the stock data frame and the tweet data frame by date, and run a linear regression with return as the dependent variable and tweet length as the independent variable.
• Show the summary result of the linear regression.

2.
• Write an R function to extract data from the accounting table of the SQL database.
• The input variables are two vectors, year and sector.
• The output variable is a data frame with 5 columns: sector, ticker, sales, size, ratio.
• The function should put the input year and sector vectors into a SQL query, pass it to the database, and get the data. The data comes from the accounting table, with year and sector equal to your inputs. If an input has more than one value, the selected rows should have their sector and year included in the input vectors.
• After you fetch the data, calculate size as the logarithm of totalassets, and ratio as totalliabilities/totalassets. Sales comes from the data directly.

3.
• Use the previous function to fetch the data.
• Randomly pick 3 sectors and 1 year from the database. You need to use a SQL query to get all unique sectors and years, then use an R function like sample() to select the sample. You should remove NULL values in sector and year.
• Use the above sectors and year as input to the function to get the output. If the number of observations in any one sector is less than 100, re-generate the data.
• Split the data into training (50%) and testing (50%) datasets.
• Pick a classification method, train the model on the training dataset, and predict the sector of the testing dataset. You can pick any classification method that we learned or a new method. Sales, size and ratio are the 3 input variables. Sector is the output label.
• Write an R function to repeat the previous step (starting from splitting the data into training and testing sets). The input variable is N (the number of repetitions). The output is the prediction accuracy rate on the testing dataset. In the function, you should run the classification N times with re-sampled datasets and get a different accuracy rate each time. Then return the average accuracy rate as the output.
• Run the function with input N = 5 and show the result.
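As a rough illustration of the function described in task 2, the sketch below separates building the query from computing the derived columns, so both parts can be checked without a database connection. The table name accounting and the columns totalassets and totalliabilities come from the description; the exact column names sector, ticker, sales and year, the helper names build_accounting_query and add_size_and_ratio, and the sector values used in the usage lines are assumptions made for illustration only.

```r
# Build one SELECT statement that filters the accounting table by the
# given year and sector vectors (column names are assumed, see above).
build_accounting_query <- function(year, sector) {
  stopifnot(length(year) > 0, length(sector) > 0)
  years <- paste(as.integer(year), collapse = ", ")
  # Double any single quote so each sector is a valid SQL string literal.
  escaped <- gsub("'", "''", sector, fixed = TRUE)
  sectors <- paste0("'", escaped, "'", collapse = ", ")
  paste0(
    "SELECT sector, ticker, sales, totalassets, totalliabilities ",
    "FROM accounting ",
    "WHERE year IN (", years, ") AND sector IN (", sectors, ")"
  )
}

# Turn the fetched rows into the 5-column output data frame:
# size is log(totalassets), ratio is totalliabilities / totalassets.
add_size_and_ratio <- function(raw) {
  data.frame(
    sector = raw$sector,
    ticker = raw$ticker,
    sales  = raw$sales,
    size   = log(raw$totalassets),
    ratio  = raw$totalliabilities / raw$totalassets,
    stringsAsFactors = FALSE
  )
}

# Usage with hypothetical inputs; the query string would then be sent
# to the database through whichever connection package is in use.
query <- build_accounting_query(c(2010, 2011), c("Energy", "Utilities"))
print(query)
```

Using IN (...) for both filters covers the single-value and multi-value cases with the same query. With a live connection, a driver-level quoting or parameter-binding facility would likely be a safer choice than the manual escaping shown here.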
Project ID: 13953395

About the project

2 proposals
Remote project
Active 7 yrs ago

Awarded to:
$83 USD in 1 day
4.5 (4 reviews)
3.0
3.0
2 freelancers are bidding on average $54 USD for this job
A proposal has not yet been provided
$25 USD in 1 day
0.0 (0 reviews)
0.0
0.0

About the client

New York, United States
5.0
3
Payment method verified
Member since Oct 6, 2014

Client Verification

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)