Completed

Google Scholar Web Scrape

I need to scrape search results data from Google Scholar without getting blocked. There is a similar Python-based tutorial here [login to view URL], so this should be a quick and easy project.

This project has two parts.

The first part is collecting historical data from 2017-2021 for about 45 different searches. Here is an example search: [login to view URL]

I will provide a txt file with a list of the exact search parameters on each line which will look similar to:

"weight loss" OR "obesity"

"Lymphoma" OR "Lymph" OR "Lymphatic"

"Eye" OR "Cataract" OR "Retina" OR "Glaucoma" OR "myopia" OR "hyperopia"

Whatever solution you create should allow me to add to or modify the specific searches either in a text file or directly in the code.
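A minimal sketch of how the search list could be read from a text file, one search per line. The function name `load_queries` and the file name in the usage note are hypothetical; the brief only says the searches come from a txt file.

```python
from pathlib import Path


def load_queries(path):
    """Return one search string per non-empty line of the file.

    Blank lines (such as the spacing between the example searches) are
    skipped, and surrounding whitespace is trimmed. The quotes and OR
    operators inside each line are kept exactly as written.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

Calling `load_queries("searches.txt")` would then give a list of search strings that can be edited in the file without touching the code.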

IMPORTANT: The searches need to be filtered to search only the abstracts; the default is to search the entire document. The searches for the first part of this project are historical (2017-2021) and will therefore return over 10,000 results each. I need to get around the limit of 1,000 results per search. I could filter by year, but the results will probably still exceed 1,000 for any given year, so you need to find a solution that captures all historical data. See the attached document for further explanation.
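As a starting point only, the year filter mentioned above could be expressed by generating one set of paged result URLs per year. This sketch assumes the `as_ylo`, `as_yhi`, and `start` URL parameters and a page size of 10, none of which are stated in the brief. It does not implement the abstract-only filter, and it does not by itself solve the case where a single year exceeds 1,000 results; a further split would still be needed there.

```python
from urllib.parse import urlencode

# Assumed base URL for result pages.
BASE_URL = "https://scholar.google.com/scholar"


def yearly_urls(query, first_year, last_year, page_size=10, max_results=1000):
    """Yield result-page URLs for each year in [first_year, last_year].

    Each year is paged from offset 0 up to (but not including) max_results,
    which is the per-search cap described in the brief.
    """
    for year in range(first_year, last_year + 1):
        for start in range(0, max_results, page_size):
            params = {
                "q": query,
                "as_ylo": year,  # assumed parameter: earliest year
                "as_yhi": year,  # assumed parameter: latest year
                "start": start,  # assumed parameter: result offset
            }
            yield f"{BASE_URL}?{urlencode(params)}"
```

Splitting by year keeps each individual search smaller, and any duplicate results across splits can be dropped when the data is stored.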

For the historical searches I need to capture the total results as listed near the top of the page, as well as the "Vancouver Citation" data from a link that exists under each result (within the citation JavaScript link).

The data will need to be saved to a MySQL table, which will be appended to on a weekly or monthly basis in the second part of the project.

Again, the first part is to collect the historical data and save it in a MySQL database table, and the second part is to append new results to that table.
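One possible shape for the storage step is sketched below. The table name, column names, and the idea of a hash column for skipping duplicates are all assumptions made for illustration; the real table is predefined by the employer. The `save_results` helper expects a DB-API cursor that uses `%s` placeholders, as common Python MySQL drivers do.

```python
import hashlib

# Hypothetical schema; the actual predefined table may differ.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS scholar_results (
    id INT AUTO_INCREMENT PRIMARY KEY,
    search_query VARCHAR(255) NOT NULL,
    total_results INT,
    vancouver_citation TEXT NOT NULL,
    citation_hash CHAR(64) NOT NULL,
    scraped_on DATE NOT NULL,
    UNIQUE KEY uniq_query_citation (search_query, citation_hash)
)
"""

# INSERT IGNORE skips rows that would violate the unique key, so weekly
# runs can append without duplicating citations stored earlier.
INSERT_SQL = (
    "INSERT IGNORE INTO scholar_results "
    "(search_query, total_results, vancouver_citation, citation_hash, scraped_on) "
    "VALUES (%s, %s, %s, %s, %s)"
)


def citation_hash(citation):
    """Return a SHA-256 hex digest of the citation, ignoring case and spacing."""
    normalized = " ".join(citation.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def save_results(cursor, query, total_results, citations, scraped_on):
    """Queue one row per citation on the given cursor; return the row count."""
    rows = [
        (query, total_results, text, citation_hash(text), scraped_on)
        for text in citations
    ]
    cursor.executemany(INSERT_SQL, rows)
    return len(rows)
```

The caller is still responsible for opening the connection, running `CREATE_TABLE_SQL` once, and committing after `save_results`.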

When the Google Scholar results page is sorted by date, each result carries "XX days ago - " text (see [login to view URL],47&as_vis=1&q=%22weight+loss%22+OR+%22obesity%22&scisbd=1). For this part, the script simply needs to grab only the results that meet the date criteria. For example, if we plan to run the script once a week, it would need to scrape only the results that include one of the following texts: "1 day ago", "2 days ago", "3 days ago", "4 days ago", "5 days ago", "6 days ago", or "7 days ago".

The attached PDF provides further explanation of the needed extraction.
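The date check for the weekly run could look roughly like this. It is a sketch that assumes the age text appears at the very start of each result snippet, in the "N days ago" form quoted above; the function names are hypothetical.

```python
import re

# Matches "1 day ago", "2 days ago", ... at the start of a snippet.
DAYS_AGO = re.compile(r"^\s*(\d+)\s+days?\s+ago\b", re.IGNORECASE)


def days_ago(snippet):
    """Return the age in days from the snippet's prefix, or None if absent."""
    match = DAYS_AGO.match(snippet)
    return int(match.group(1)) if match else None


def is_recent(snippet, max_days=7):
    """True if the snippet is between 1 and max_days days old (inclusive)."""
    age = days_ago(snippet)
    return age is not None and 1 <= age <= max_days
```

With `max_days=7` this keeps exactly the seven texts listed for a weekly run; a monthly run would pass a larger value.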

The final code should capture the data and store it in a predefined MySQL database table. The code must be delivered as either a fully functioning .py file or .php file. This file will be run weekly as a cron job.

You will need to work from your own server environment. When the project is complete you can send me a video showing the functioning script and sample output. Be sure to test the script by running many searches, because the script needs to avoid being blocked by Google. You can incorporate a proxy if needed and I will purchase accordingly (the proxy cost should not be more than $5-$10/month).
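One common way to reduce the chance of being blocked is to wait a randomized, growing interval between retries. The sketch below only computes such a delay schedule; the base delay, cap, and function name are assumptions, and it is not a guarantee against blocking, with or without a proxy.

```python
import random


def backoff_delays(attempts, base=5.0, cap=300.0, rng=random):
    """Return a list of wait times (seconds) for successive retries.

    The ceiling doubles with every attempt up to `cap`, and each delay is
    drawn uniformly from the upper half of that ceiling so that requests
    do not fall into a fixed rhythm.
    """
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)
        delays.append(rng.uniform(ceiling / 2, ceiling))
    return delays
```

A scraper would sleep for each delay in turn (for example with `time.sleep`) after a failed or throttled request before trying again.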

I will create 2 milestones. The first milestone will be 50% of the total and will be released upon confirmation of the functioning script (video and sample output).

The final milestone will be released after the script has been delivered and is functioning on my server. When delivering the script you can leave the server section with dummy data. I will modify the script with the server details for the storage of the collected data.

Skills: Python, Web Scraping, Data Mining, PHP


About the Employer:
( 39 reviews ) aldie, United States

Project ID: #31873729

Awarded to:

mananraja

Hey, I read what you need. I have scraped Google Scholar for a previous freelancer project (I can give you a reference to that project as well). I would like to discuss the details further in chat.

$100 USD in 2 days
(222 Reviews)
6.7

9 freelancers are bidding on average $115 for this job

hoisticdeveloper

Hi. I am a skilled coder with 8+ years experience in Web Scraping, Python, PHP and Data Mining. An expert developer, able to learn and adapt quickly to new technologies. I've checked the job details. You can check my p More

$155 USD in 7 days
(45 Reviews)
6.7
(148 Reviews)
6.7
vladilavsuhovoy1

Hey! I am skilled Python software engineer. I am familiar with Python and I have a lot of work experiences in Python, PHP, Data Mining and Web Scraping. I can start right away. I want to discuss for this project in d More

$150 USD in 5 days
(30 Reviews)
5.8
datascientist90

I am a Data Scientist with Machine Learning Expertise. Please take a look at my profile and reviews for references.

$40 USD in 7 days
(9 Reviews)
5.2
(9 Reviews)
4.3
Valuesolutions

Hello, I hope this finds you well. I have just seen your project requiring; PHP Python Web Scraping Data Mining I believe that my 10-year experience in this field is what you need right away. Avoid the headache of lo More

$400 USD in 7 days
(17 Reviews)
5.7
nguyentuan24

Hi Sir! I am a professional Python software engineer. I have a lot of work experience in Python and website scraping, and I have done many projects scraping websites and video. I can start right now. Please contact me to discuss m More

$40 USD in 7 days
(12 Reviews)
3.5
ahmershah1

Hello there. First of all, this is not a copy-paste proposal. I have read your project's requirements carefully. I am ready to start your project immediately. I just finished two projects on another platform successfully. Let More

$50 USD in 7 days
(0 Reviews)
0.0