Web scraping project for 1 main real estate site:
1. You build 1 application per site that scrapes all properties on that site.
"All" means all, not 5K or 10K or 50K.
The criterion is everything that is shown on the web site,
or at least 90% of the properties shown on the site via search across all cities and zips listed on the site.
2. The kind of scraping you use and the app itself must be able to overcome anti-scraping blocking tech.
You should use web proxies. You must use a proxy service, which we will pay for, that provides a list of proxies for web scraping.
Your app should include an area where you paste or load the list of proxies, and the app should switch randomly between those proxies while scraping.
We will show you a sample script. But you need to be an expert in web scraping using web proxies and anti-scraping tech!
If you are not, please do not bid.
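To make the proxy requirement in item 2 concrete, here is a minimal Python sketch of a pasted proxy list with random rotation. It is only an illustration: the names `parse_proxy_list`, `ProxyRotator` and `opener_for` are made up for this example, and it assumes the proxy service hands out plain `host:port` or full proxy URLs, one per line.

```python
import random
import urllib.request


def parse_proxy_list(text):
    """Turn pasted text (one proxy per line) into a clean list of proxy URLs."""
    proxies = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "://" not in line:
            line = "http://" + line  # assumption: bare host:port means an HTTP proxy
        proxies.append(line)
    return proxies


class ProxyRotator:
    """Pick a random proxy for each request and drop the ones that get blocked."""

    def __init__(self, proxies, rng=None):
        self.available = list(proxies)
        self.rng = rng or random.Random()

    def pick(self):
        if not self.available:
            raise RuntimeError("no working proxies left")
        return self.rng.choice(self.available)

    def mark_bad(self, proxy):
        # Remove a proxy after the target site has blocked or timed it out.
        if proxy in self.available:
            self.available.remove(proxy)


def opener_for(proxy):
    """Build a urllib opener that sends HTTP and HTTPS traffic through `proxy`."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)
```

In use, the app would call `pick()` before every page request, fetch the page with `opener_for(proxy).open(url)`, and call `mark_bad(proxy)` whenever the site answers with a block page or the request fails.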
3. Your apps must be able to run automatically on a set schedule on our dedicated scraping server,
and the results will be loaded automatically into a temp table in our real estate site CMS,
with an option to download the scraped xls results file manually.
So 2 options: 1. manual download after extraction 2. auto upload to temp table in DB
Only new listings should be added to the database after each scheduled run.
You need to create the interface/program code that will
"feed" our CMS with the correct info (showing the most recent properties first).
We will give you access to our site CMS and web pages as needed.
We can run the apps daily, weekly or whenever, and each run will always update
our site CMS and scripts (no duplicates).
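As an illustration of "only new listings, no duplicates" in item 3, here is a minimal Python sketch. It uses SQLite purely as a stand-in for our DB, and the table name `temp_listings` and its columns are assumptions made for the example, not our real CMS schema. The idea is a unique key per site and listing, so re-running the scraper cannot insert the same property twice.

```python
import sqlite3


def ensure_temp_table(conn):
    """Create the (hypothetical) temp table with a unique key per site + listing."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS temp_listings (
            source_site TEXT NOT NULL,
            listing_id  TEXT NOT NULL,
            address     TEXT,
            price       INTEGER,
            first_seen  TEXT DEFAULT CURRENT_TIMESTAMP,
            UNIQUE (source_site, listing_id)
        )
        """
    )


def add_new_listings(conn, listings):
    """Insert only listings not already stored; return how many were new."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO temp_listings "
        "(source_site, listing_id, address, price) "
        "VALUES (:source_site, :listing_id, :address, :price)",
        listings,
    )
    conn.commit()
    return conn.total_changes - before


def newest_first(conn, limit=50):
    """Return the most recently added listings first."""
    return conn.execute(
        "SELECT source_site, listing_id, address, price FROM temp_listings "
        "ORDER BY first_seen DESC, rowid DESC LIMIT ?",
        (limit,),
    ).fetchall()
```

Running the same scrape twice through `add_new_listings` adds rows only the first time; the second run reports 0 new listings unless the site has published something new.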
4. You will have unrestricted access to the server; you can set up any script on it.
It already has Python installed,
but you can use or install any other scripting language you need.
5. Your apps must not get stopped or shut down, so you need to put effort
into avoiding IP detection (web proxies)
so we won't be shut down,
or you write the app so it runs smart and does not get shut down.
We want someone to provide ongoing support for these 4 apps.
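One possible reading of "runs smart" in item 5 is that the app slows down and retries politely instead of hammering the site. The Python sketch below shows that idea only; the function names, the 4 attempts and the 2 to 60 second waits are assumptions chosen for the example, and `fetch` stands for whatever function the app uses to download a page.

```python
import random
import time


def backoff_delays(retries, base=2.0, cap=60.0, rng=None):
    """Return one wait time per retry: exponential growth, capped, with jitter."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(retries):
        ceiling = min(cap, base * (2 ** attempt))
        # Random jitter so the requests do not arrive at a fixed rhythm.
        delays.append(rng.uniform(ceiling / 2, ceiling))
    return delays


def fetch_with_retries(fetch, url, attempts=4, sleep=time.sleep):
    """Call fetch(url); after each failure wait longer, then try again."""
    delays = backoff_delays(attempts - 1)
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception as error:  # a real app would catch narrower errors
            last_error = error
            if attempt < attempts - 1:
                sleep(delays[attempt])
    raise RuntimeError(f"gave up on {url}") from last_error
```

Combined with proxy rotation, a failed attempt would normally also switch to a different proxy before the retry.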
6. We can pay when the apps are working: milestone release first,
and then after about 1 week, once the app has run for each site at least 3 times
at 2-day intervals.
7. We will send you the sites.
8. Your scraped data:
8.1 It has to match all the info on the web site, extracted/scraped and imported
to fit our CMS. If the scraped site has more fields than we do, you import the ones we have
and fill all the fields we have in our DB. Yes, we should also have the option to extract
all your data from your app into a spreadsheet instead of directly into our CMS, which resides on another server/site/DB.
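A rough Python sketch of the field matching described in 8.1: scraped fields that our CMS does not have are dropped, and CMS fields the site does not provide are left empty. The field names in `FIELD_MAP` are invented for the example, and the export is written as CSV (which opens in Excel) because that needs no extra library; a true xls file would need a spreadsheet library of your choice.

```python
import csv
import io

# Hypothetical mapping: scraped field name -> CMS column name.
FIELD_MAP = {
    "listing_id": "id",
    "street_address": "address",
    "zip": "zip_code",
    "asking_price": "price",
    "property_type": "type",
}


def to_cms_row(scraped, field_map=FIELD_MAP):
    """Keep only fields the CMS has; fill CMS fields missing from the scrape with ''."""
    return {cms_name: scraped.get(src_name, "") for src_name, cms_name in field_map.items()}


def rows_to_csv(rows, field_map=FIELD_MAP):
    """Render CMS-shaped rows as CSV text with the CMS column names as header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(field_map.values()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The same CMS-shaped rows can feed both options: the manual spreadsheet download and the automatic upload.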
8.2 You need to program any API that pushes this data automatically into our CMS,
with some flag in the app that shows when the data was last run.
Again: no duplicates, and smart apps for the 4 sites that
will update our DB automatically. This includes any API
or web service that needs to run on our site/CMS DB to accept and update the data
sent from your app.
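For 8.2, a minimal Python sketch of the "last run" flag and of a push to the CMS. Everything specific here is an assumption for illustration: the state file layout, the JSON payload shape and the `X-Api-Key` header are invented, since the actual API or web service on our side is still to be written.

```python
import json
import urllib.request
from datetime import datetime, timezone
from pathlib import Path


def record_last_run(state_file, site, new_count, now=None):
    """Store when a site was last scraped and how many new listings it found."""
    path = Path(state_file)
    state = json.loads(path.read_text()) if path.exists() else {}
    now = now or datetime.now(timezone.utc)
    state[site] = {"last_run": now.isoformat(), "new_listings": new_count}
    path.write_text(json.dumps(state, indent=2))
    return state


def build_push_request(endpoint, listings, api_key):
    """Prepare (but do not send) a JSON POST carrying new listings to the CMS."""
    body = json.dumps({"listings": listings}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
        method="POST",
    )
```

Sending the prepared request is then a call to `urllib.request.urlopen(request)`, and the interface can read the state file to show when each app last ran.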
8.3 We should be able to log in to our scraping server and see the interface for your 4 apps,
with a button to start or schedule an update to our CMS,
a progress bar or some other means of seeing how much of the scrape/update is left,
and the option for each app to auto-update our DB/CMS or download to an xls spreadsheet.
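The progress tracking asked for in 8.3 could be as simple as the Python sketch below. It assumes the app knows up front how many search pages (or zips) it has to visit; the class name `ScrapeProgress` and the text bar format are made up for the example, and a web interface would show the same percentage graphically.

```python
class ScrapeProgress:
    """Track how much of a scrape run is finished."""

    def __init__(self, total_pages):
        self.total = total_pages
        self.done = 0

    def advance(self, pages=1):
        # Never report more than 100%, even if the site grows during the run.
        self.done = min(self.total, self.done + pages)

    @property
    def percent(self):
        return 100.0 if self.total == 0 else 100.0 * self.done / self.total

    def bar(self, width=20):
        """Render a plain-text progress bar of the given width."""
        filled = int(width * self.percent / 100)
        return "[" + "#" * filled + "-" * (width - filled) + f"] {self.percent:.0f}%"
```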
8.4 Search for Single Family Residence (SFR) + Condo/Townhouse + 2-4 Multi Units across all 4 sites.
Remember: your app and web scrape should run on an automatic schedule.
We need someone reliable and trustworthy
who knows this stuff.
The site is XXXXX,
and we want data from all the zips they have in the US.
We also want all types of properties:
Single Family Residence, Condo/Townhomes....
Commercial properties such as income properties (2-4 units and 5+ units), office buildings, vacant land, industrial, businesses for sale, etc....
REOs (bank sales), short sales, conventional sales, etc...
** We have a sample extract file to show.