Source code in Python for 2 web scrapers:
1. Logs in to a system, goes through a list of entries, opens each one, and extracts the page data into an RDS database.
2. Goes through a series of similar sites and retrieves the documents, converting PDF to text. The sites store documents relating to decisions in court cases. Each "publication" is provided in PDF (one file per page).
Scraper #2 description:
# The project must be built on the AWS Cloud.
# The function must be written in Python and run as a Lambda, exposed as a REST endpoint via API Gateway.
# It receives the PDF URL as a parameter.
# It saves each page as a TXT file in an S3 bucket.
Ex: DOM-SP-AAAA-MM-DD-<page number>.txt
# The project must be delivered with an AWS CloudFormation template so I can easily deploy it in my account.
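To make the requirements above concrete, here is a minimal sketch of what the Lambda function could look like. It is only an illustration under several assumptions that are not part of the posting: the query parameter names `url` and `date`, the `OUTPUT_BUCKET` environment variable, and the use of the third-party `pypdf` library for text extraction are all hypothetical choices. The third-party imports are done lazily so the helper functions can be tried without AWS access.

```python
import io
import json
import os
import urllib.request
from datetime import date, datetime, timezone


def parse_pdf_url(event):
    """Return the PDF URL from an API Gateway proxy event, or None if absent."""
    params = event.get("queryStringParameters") or {}
    return params.get("url")  # "url" is an assumed parameter name


def page_key(pub_date, page_number):
    """Build the S3 key for one page: DOM-SP-AAAA-MM-DD-<page number>.txt."""
    return f"DOM-SP-{pub_date:%Y-%m-%d}-{page_number}.txt"


def extract_pages(pdf_bytes):
    """Return the extracted text of each PDF page, in order."""
    from pypdf import PdfReader  # assumed third-party dependency

    reader = PdfReader(io.BytesIO(pdf_bytes))
    return [page.extract_text() or "" for page in reader.pages]


def handler(event, context):
    """Lambda entry point: download the PDF and store one TXT file per page."""
    pdf_url = parse_pdf_url(event)
    if not pdf_url:
        return {"statusCode": 400,
                "body": json.dumps({"error": "missing 'url' parameter"})}

    params = event.get("queryStringParameters") or {}
    # Assumption: the publication date arrives as ?date=YYYY-MM-DD,
    # falling back to today's date (UTC) when it is not given.
    if "date" in params:
        pub_date = date.fromisoformat(params["date"])
    else:
        pub_date = datetime.now(timezone.utc).date()

    import boto3  # available in the Lambda Python runtime

    s3 = boto3.client("s3")
    bucket = os.environ["OUTPUT_BUCKET"]  # assumed environment variable

    with urllib.request.urlopen(pdf_url, timeout=30) as response:
        pdf_bytes = response.read()

    keys = []
    for number, text in enumerate(extract_pages(pdf_bytes), start=1):
        key = page_key(pub_date, number)
        s3.put_object(Bucket=bucket, Key=key, Body=text.encode("utf-8"))
        keys.append(key)

    return {"statusCode": 200,
            "body": json.dumps({"pages": len(keys), "keys": keys})}
```

With this shape, a request without the `url` parameter gets a 400 response, and a successful call returns the list of S3 keys that were written. Error handling for failed downloads or malformed dates is left out of the sketch.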
In the example:
The scraper must go through the PDF pages at the URL passed in the parameter.
in the example
it has 84 pages (Páginas) that should be converted from PDF to text
in another link:
it has 70 pages (Páginas) that should be converted from PDF to text
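Since each example states an exact page count (84 and 70), one simple way to check a run is to compare the TXT files found in the bucket against the keys the naming rule predicts. The sketch below assumes the key pattern from the description; the function names and the date used in the usage note are hypothetical.

```python
from datetime import date


def expected_keys(pub_date, total_pages):
    """List the keys a publication with total_pages pages should produce."""
    return [f"DOM-SP-{pub_date:%Y-%m-%d}-{n}.txt"
            for n in range(1, total_pages + 1)]


def missing_keys(pub_date, total_pages, existing_keys):
    """Return the expected keys that are not among existing_keys."""
    existing = set(existing_keys)
    return [key for key in expected_keys(pub_date, total_pages)
            if key not in existing]
```

For an 84-page publication, `expected_keys(date(2020, 1, 15), 84)` yields 84 keys, and `missing_keys` returns an empty list once every page has been converted.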
17 freelancers are bidding on average $157 for this job
Hi there. I'm a Python expert with extensive experience. As you can see from my profile, I have completed my past work perfectly. Let's discuss the details in this chat box. Thank you. From Ase Naritoshi.
Hi there, I have done similar projects on two different websites using Scrapy. I can complete this as well. Please contact me by PM for details. Sincerely, Isaac
Hello! I am a Python developer. I looked at your project and it seems interesting. I have all the skills required for this project. Ping me to discuss in detail.
Greetings sir, I am highly interested in your project. I have completed many projects like this. Your 100% satisfaction is assured if you allow me to serve you. First, chat with me so we can talk it over briefly.