Closed

Convert PDF to text - repost

This project received 6 bids from talented freelancers with an average bid price of $207 USD.

Employer working
Project Budget: $30 - $250 USD
Total Bids: 6
Project Description

We need to process existing PDF files: they must be OCRed, converted to text and then to clean HTML. Please find enclosed a sample of what you will be dealing with. If you are interested in this job, please submit a processed version of the attached sample (2 pages).
Please also state which OCR software you use and which version. You MUST use OCR software; no typing is allowed.

The result MUST NOT contain headers, footers or page numbers from the magazine, just clean text with only very basic formatting suitable for publishing on a web site. Preserve bold, italics, underline, paragraphs, headings, basic tables, uppercase and lowercase, superscript (upper index) and subscript (lower index). Discard all other formatting (text flow, columns, weird fonts, page numbers etc.). The desired result is clean, readable text with clean formatting.
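For illustration only, here is a minimal Python sketch of how the "basic formatting only" rule could be enforced on the HTML that comes out of an OCR tool. It is not part of the project requirements. The tag whitelist, the class name `BasicFormattingFilter` and the function `clean_html` are assumptions made up for this example; the whitelist is simply one reading of the formatting listed above (bold, italics, underline, paragraphs, headings, basic tables, upper and lower index). It does not remove magazine headers, footers or page numbers, and it does not proofread the text; those steps still need a human check.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist: one interpretation of the formatting the description keeps.
ALLOWED_TAGS = {
    "p", "b", "i", "u", "strong", "em", "sup", "sub",
    "h1", "h2", "h3", "h4", "h5", "h6",
    "table", "tr", "td", "th",
}
# Elements whose text content is never article text.
SKIP_CONTENT = {"script", "style"}


class BasicFormattingFilter(HTMLParser):
    """Re-emit whitelisted tags without attributes; keep all visible text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self._skip_depth += 1
        elif tag in ALLOWED_TAGS:
            # Attributes (fonts, classes, inline styles) are dropped on purpose.
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            if self._skip_depth:
                self._skip_depth -= 1
        elif tag in ALLOWED_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            # Re-escape &, < and > so the output stays valid HTML.
            self.parts.append(escape(data, quote=False))


def clean_html(raw: str) -> str:
    """Return `raw` with every non-whitelisted tag and all attributes removed."""
    parser = BasicFormattingFilter()
    parser.feed(raw)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    sample = '<div class="col2"><p style="font-family:X">H<sub>2</sub>O is <b>wet</b></p></div>'
    print(clean_html(sample))
```

A filter like this only strips layout markup such as column wrappers, font spans and inline styles. Whether a given tag belongs on the whitelist is a judgment call that should be checked against the sample pages.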

The result must be submitted in clean HTML (such as that produced by [url removed, login to view] - you can use that if you wish or anything else you prefer) and separated into single concise articles.

The most important consideration: the result should be as typo-free as humanly possible. You are not expected to check for grammatical correctness, only that the recognized text matches the source file. You MAY correct an obvious typo in the original text, but you don't have to.

EXTREMELY IMPORTANT:
1. Bid as if for 1000 pages of text.
2. Send your processed file ([url removed, login to view]) as ONE clean HTML file, free of any errors and processed according to the requirements of the project.
3. There is no need to write us about what you can do and how much experience you have. We are NOT INTERESTED in any of that. Just deliver the processed sample, if you are interested in this job.

Please note that the whole point of this exercise is evaluating YOUR PERFORMANCE, so do not send the garbage right out of your OCR software - that is NOT HELPFUL and your bid and any messages will most likely be ignored. We need to see what kind of RESULT you can deliver (and your ability to stick to instructions provided), so try hard to impress us, if you are interested in this job - but not by WORDS, do it by your WORK.

We will select several providers (if any suitable can be found) for this job as we have many files to process. This can turn into a long-term job for you, if you can deliver.

Thank you for looking!
