I am looking for help digitizing data from several PDFs into Microsoft Excel.
Required skills: ability to open PDFs and to open and save Excel spreadsheets; ability to read and understand English text; speed and accuracy entering numerical data
Total time required: Estimated 70-94 hours (depending on your speed)
Specifically, I am trying to encode data on U.S. agriculture from the 1930 U.S. Census into an Excel spreadsheet. There are 47 statistics that I would like to record, each of which is observed for 1,057 counties (covering 12 different U.S. states). This data can be found in PDF tables that I have carefully organized and will provide to whoever is hired. There are a total of 9 tables per state; each of these tables has an identical format across states. There is therefore a grand total of 9 tables x 12 states = 108 PDF files. To see a sample table, click here: [url removed, login to view]
I have constructed a blank Excel spreadsheet with a separate worksheet for every state, which you can view here: [url removed, login to view] In each state worksheet, there is a row for every statistic that I want to record and a column for every county in the state. In each cell, I’d like to have someone enter the value of that row's statistic for the county named in the column heading. Note that the name of the PDF with the data for a given row is provided in the left-most column of every worksheet. (This information is meant to make it easier to find the right data to enter into the spreadsheet.)
To clarify the requirements of this job, here is an example of what is involved. Suppose we are entering data for North Dakota (ND) from the PDF file I have linked above ([url removed, login to view]). To see how it works, follow these steps:
1. Open the linked spreadsheet and go to the “ND” worksheet.
2. Look at the statistics in Column B, rows 4 to 18, of the spreadsheet.
3. Find the identifier v2t1 in the left-most column of the spreadsheet.
4. Open the linked PDF table named [url removed, login to view]
5. Look at the statistics on the first page of the PDF and compare them to the spreadsheet.
The job is then to enter data on every requested statistic in rows 4 to 18 for every county in North Dakota, copying from the PDF into the designated cells in the spreadsheet. Note that the commas in long numbers don’t have to be copied; for example, “1,000” should be entered as “1000”.
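For anyone who prefers to check entries programmatically, the comma rule above can be sketched in a few lines of Python. This is only an illustration: the function name clean_number is hypothetical, and it assumes every entry is a whole number that uses commas as thousands separators.

```python
def clean_number(text: str) -> int:
    """Convert a number as printed in the PDF (e.g. "1,000") to a plain integer."""
    # Assumption: entries are whole numbers and commas are only thousands separators.
    stripped = text.strip().replace(",", "")
    # int() raises ValueError on anything that is not a whole number,
    # which flags cells that need a second look.
    return int(stripped)


if __name__ == "__main__":
    print(clean_number("1,000"))
```

Entries that are blank or contain other symbols would raise an error under this sketch rather than being silently converted.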
Once I have a few candidates, I would like to do a quick screening / practice exercise, asking you to look up the value for 4 or 5 different statistics for a particular county. The screening should not take more than a few minutes; if it does, then this job may not be the one for you.
I have myself hand-entered 10 or 12 statistics for all 1,057 counties to get a feel for what this job will be like. I found it to be most efficient to focus on a single statistic at a time and enter data for all 1,057 counties before moving on to the next statistic, but you might have a better idea. It helps to either print out the PDF tables or have them shown on a second monitor. My experience was that it took about 1 hour to enter data for each statistic in every county, though you should conservatively budget for each one to take 2 hours.
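To put the size of the job in concrete terms, here is a rough Python sketch of the arithmetic using the figures from this posting. The constant and function names are made up for illustration, and the hours-per-statistic rates are just the 1-hour and 2-hour figures mentioned above.

```python
STATISTICS = 47        # statistics to record per county
COUNTIES = 1057        # counties across all 12 states
TABLES_PER_STATE = 9   # PDF tables per state
STATES = 12

# Total number of spreadsheet cells to fill in and PDF files to read from.
total_cells = STATISTICS * COUNTIES
total_pdfs = TABLES_PER_STATE * STATES


def total_hours(hours_per_statistic: float) -> float:
    """Hours needed if each statistic (all counties) takes the given time."""
    return STATISTICS * hours_per_statistic


if __name__ == "__main__":
    print(total_cells, total_pdfs)
    print(total_hours(1), total_hours(2))
```

At the conservative rate of 2 hours per statistic, this gives 94 hours, which is the upper end of the time estimate at the top of this posting.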
I will be paying 10% up-front, an additional 10% for every 6 statistics compiled for all 1,057 counties, and the final 10% when the job is completed. I will be spot-checking the work for typographical errors and will allow for a small degree of error, which is probably inevitable for this type of job, but I am expecting highly accurate typing. I would like to see this job finished by October 31 (at the latest). I am generally available to discuss the job and take questions that arise from 9:00 a.m. to 7:00 p.m. PST Monday through Friday. It may sometimes take me 2 or 3 hours to respond by email, but never more than a day, including on weekends.
This digitization is for academic research on the diffusion of farm tractors in the 1920s U.S.
Thanks to everyone who has bid thus far. I wanted to add a couple of points for clarification to my original job posting. I am new to Freelancer, so please forgive me for not offering this information earlier:
1. My budget is $250-$450. I did not realize it was possible to specify a custom range at the time of posting and ended up choosing the "$250-$750" category from a drop-down list. Sorry to all those who submitted bids above that range!
2. I have tried to OCR these PDFs without success, including with very high-powered OCR software like Tesseract. The quality and resolution of the images are simply not high enough to support that, which is why I came to Freelancer.
3. I would like to use Milestone payments, but I am open to configurations of the payments other than the one I described in the job posting. I know several bidders have suggested other configurations, and that's okay.
63 freelancers are bidding on average $394 for this job
Hi, I am an expert and have exclusive tools to convert the attached sample PDF file into an Excel file like the sample. Please check your PM for more details. Thanks.
Hi. Expert typist here, interested in your project. I assure you of 100% accurate, good-quality work. Ready to start. Have a look at [url removed, login to view]
Hi dgross03, excellent and professionally trained data entry specialist here. I provide error-free data so you don't have to proofread it again. I am very fast, efficient and committed.