Image Processing and Data Extraction through OCR

In Progress Posted 7 years ago Paid on delivery

The project consists of writing code to extract information from a scanned guide (which we will provide to you as a PDF file) and store it in a spreadsheet. The guide follows a regular, structured format in which restaurants are listed in alphabetical order with the following information:

Line 1:

• Business Name: In larger font and bold letters

• Cuisine Type: Following restaurant name, separated by two spaces, in a smaller font and italics

• Three different ratings and Cost: Aligned to the right of the line, each enclosed in a shadowed box. The cost is the last box and the amount is preceded by a dollar sign ($).
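Once the boxes no longer confuse the OCR step, Line 1 could be split into its fields with a regular expression. The following is only a minimal sketch. It assumes the OCR output keeps the two spaces between name and cuisine, that the three ratings come out as whole numbers, and that the cost is a whole dollar amount. The function name `parse_header` and the sample line are hypothetical.

```python
import re

# Assumed OCR output for Line 1: "<name>  <cuisine> <r1> <r2> <r3> $<cost>"
HEADER_RE = re.compile(
    r"^(?P<name>.+?) {2,}(?P<cuisine>.+?)\s+"
    r"(?P<r1>\d+)\s+(?P<r2>\d+)\s+(?P<r3>\d+)\s+\$(?P<cost>\d+)\s*$"
)


def parse_header(line):
    """Return the Line 1 fields as a dict, or None if the line does not match."""
    match = HEADER_RE.match(line)
    if match is None:
        return None
    return {
        "name": match.group("name").strip(),
        "type": match.group("cuisine").strip(),
        "ratings": [int(match.group(key)) for key in ("r1", "r2", "r3")],
        "cost": int(match.group("cost")),
    }


# Hypothetical sample line, not taken from the guide.
header = parse_header("Acme Bistro  New American 24 22 23 $45")
```

Returning None for lines that do not match makes it easy to log pages that need a manual check instead of silently writing bad rows.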

Line 2 (may be several lines): Fields in the second line are separated by |

• Neighborhood: In bold

• Address: Shows the address and a short explanation of the location in parentheses

• Phone Number: Formatted as xxx-xxx-xxxx

• Website: If available

If a business has several locations, other locations will be listed following the same format.
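The Line 2 layout described above could be parsed by splitting on the | separator. This is a rough sketch under a few assumptions: wrapped lines have already been joined into one string, the fields always appear in the order listed, and the website is the only optional field. The function name `parse_location` and the sample data are hypothetical.

```python
import re

PHONE_RE = re.compile(r"^\d{3}-\d{3}-\d{4}$")  # xxx-xxx-xxxx


def parse_location(text):
    """Split one location entry into neighborhood, address, phone, website."""
    # Collapse line breaks and repeated spaces left over from wrapped lines.
    text = " ".join(text.split())
    parts = [part.strip() for part in text.split("|")]
    if len(parts) < 3:
        raise ValueError("expected at least 3 fields, got %d" % len(parts))
    neighborhood, address, phone = parts[0], parts[1], parts[2]
    if not PHONE_RE.match(phone):
        raise ValueError("unexpected phone format: %r" % phone)
    # The website is optional ("If available").
    website = parts[3] if len(parts) > 3 and parts[3] else None
    return {
        "neighborhood": neighborhood,
        "address": address,
        "phone": phone,
        "website": website,
    }


# Hypothetical sample entry that wraps over two lines.
location = parse_location(
    "Downtown | 123 Main St. (at 1st Ave.) |\n212-555-0100 | www.example.com"
)
```

Validating the phone number is a cheap way to catch OCR errors, since a misread digit or separator usually breaks the xxx-xxx-xxxx pattern.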

Line 3 (or more depending on length of previous field)

• Short description

A sample page is included in the attachment.

Out-of-the-box OCR software has not produced satisfactory results for Line 1 because of the boxes that enclose the ratings and cost.

From the information listed, we need to create a spreadsheet with the following fields:

Business Name, Type, Neighborhood, Address, Phone Number, each of the three Ratings, and Cost.

If the business has several locations, each location should be listed in a separate record/row.
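To illustrate the one-row-per-location requirement, here is a small sketch that uses Python's standard `csv` module. The column names "Rating 1" to "Rating 3" are placeholders, since the actual rating names come from the guide, and the dictionary layout of `business` is an assumption made for this example.

```python
import csv
import io

# "Rating 1".."Rating 3" are placeholder column names (assumption).
FIELDS = [
    "Business Name", "Type", "Neighborhood", "Address", "Phone Number",
    "Rating 1", "Rating 2", "Rating 3", "Cost",
]


def rows_for_business(business):
    """Yield one CSV row per location; shared fields are repeated on each row."""
    r1, r2, r3 = business["ratings"]
    for loc in business["locations"]:
        yield {
            "Business Name": business["name"],
            "Type": business["type"],
            "Neighborhood": loc["neighborhood"],
            "Address": loc["address"],
            "Phone Number": loc["phone"],
            "Rating 1": r1,
            "Rating 2": r2,
            "Rating 3": r3,
            "Cost": business["cost"],
        }


def write_csv(businesses, out):
    """Write all businesses to the open text file object `out`."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for business in businesses:
        for row in rows_for_business(business):
            writer.writerow(row)


# Hypothetical business with two locations, which produces two rows.
sample = {
    "name": "Acme Bistro", "type": "French", "ratings": [24, 22, 23], "cost": 45,
    "locations": [
        {"neighborhood": "Downtown", "address": "123 Main St. (at 1st Ave.)",
         "phone": "212-555-0100"},
        {"neighborhood": "Uptown", "address": "9 Elm St. (near the park)",
         "phone": "212-555-0199"},
    ],
}
buffer = io.StringIO()
write_csv([sample], buffer)
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends, so that line endings are not doubled on some platforms.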

The developer's task is to write code that takes as input the PDF file we will provide and produces as output a CSV file with the information listed above. The freelancer can solve the difficulty created by the boxes enclosing the ratings and cost as he/she sees fit (a feasible approach could be to first remove the boxes through image processing so that standard OCR packages can be used). We will provide several sample pages for testing purposes. The solution needs to be reliable and is expected to work on a file containing several hundred pages.
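As one possible illustration of the box-removal idea, the sketch below erases long straight runs of ink from a binarized image, on the assumption that box borders form much longer horizontal and vertical runs than the strokes of the digits inside them. It is written in plain Python on a list-of-lists image (1 = ink, 0 = background) to stay self-contained. A real solution would more likely use an image library, for example morphological opening with long, thin kernels. The function names and the `min_len` threshold are hypothetical and would need tuning on the sample pages.

```python
def _long_run_mask(rows, min_len):
    """Mark pixels that belong to a horizontal ink run of at least min_len."""
    mask = [[False] * len(row) for row in rows]
    for y, row in enumerate(rows):
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                if x - start >= min_len:
                    for i in range(start, x):
                        mask[y][i] = True
            else:
                x += 1
    return mask


def remove_box_lines(img, min_len):
    """Return a copy of a rectangular binary image without long straight runs."""
    h_mask = _long_run_mask(img, min_len)
    # Transpose so that vertical runs can be found with the same helper.
    transposed = [list(col) for col in zip(*img)]
    v_mask_t = _long_run_mask(transposed, min_len)
    return [
        [0 if (h_mask[y][x] or v_mask_t[x][y]) else img[y][x]
         for x in range(len(img[y]))]
        for y in range(len(img))
    ]


# Tiny synthetic example: a 9x7 box with a 2-pixel "glyph" inside.
box = [[1] * 9] + [[1] + [0] * 7 + [1] for _ in range(5)] + [[1] * 9]
box[3][3] = box[3][4] = 1
cleaned = remove_box_lines(box, min_len=5)
```

A run-length threshold that is too low would also erase long strokes in large bold letters, so it may be safer to apply this only to the right-hand region of Line 1 where the boxes appear.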

Data Mining Data Processing Imaging OCR

Project ID: #12981065

About the project

22 proposals Remote project Active 7 years ago

22 freelancers are bidding on average $391 for this job

semi786

Hi Sir/Mam, It is being my pleasure to introduce you to me. I have taken a look at your project description and I'm confident that we can work together. I am expert in data entry, Scrapping any data from pdf/OCR. Re…

$250 USD in 0 days
(323 Reviews)
8.0
mirniyazuddin92

Dear Sir/Ma'am, I am a Web research, Data Entry & Webs Scrapping expert. I checked and understood your requirements. I can handle this job very well to your appreciation. I can find and extract the informati…

$250 USD in 4 days
(234 Reviews)
7.4
AImobile

Dear sir. I have read your job posting and very excited. As you can see from my portfolio, I am an OCR and Image Processing expert. If you hire me, I'll satisfy you. I hope to work with you. Best regards.

$526 USD in 10 days
(19 Reviews)
7.1
Motiurlaw

Hi, I have seen your SAMPLE pdf file and it is quite clear. I will extract these all data manually one after another. OCR does not work 100% accurately. I ensure you 100% accuracy. I want to show you sample so you can…

$405 USD in 6 days
(227 Reviews)
6.9
diamond247

I have gone through the project details, and understand what you are trying to achieve. We have completed a number of similar a projects to yours. And we are experienced critical data research. ## A little about us…

$250 USD in 10 days
(106 Reviews)
6.6
ggopi

Hi, I'm Gopi, I have more experience in this field. I'm ready to start your project immediately. Please send me further details. Regards Gopi

$555 USD in 10 days
(55 Reviews)
5.7
mehedi276

"Rate and duration will be fixed on discussion" Dear Hiring Manager, I have gone through your job posting and become very much interested to work with you. I am an expert in these fields. I have already completed sev…

$515 USD in 10 days
(51 Reviews)
5.4
jayjossef132

I can transfer all the data in excel spreadsheet. I am highly organized and able to deliver a high quality results. Let me handle this project in a timely manner. Looking forward for your response to get an interview b…

$500 USD in 7 days
(48 Reviews)
5.2
shahiddar

Hello, My name is shahid from Kashmir Over the last 7 years, I have worked for several clients. Joined Freelancer with over 7 years rich experience in the field.I have successfully completed more than 1000 projects…

$250 USD in 10 days
(11 Reviews)
5.2
Saidul28

A proposal has not yet been provided

$277 USD in 5 days
(11 Reviews)
3.7
rumesh402

Hello, my name is Rumesh. I noticed your advertisement about the project and think I would make an excellent candidate for it. I read through the job details extremely careful and I'm absolutely confident that I ca…

$250 USD in 0 days
(2 Reviews)
1.8
jkodiyil

I have been using Tesseract to digitize scanned court documents, to OCR PDF with images, and to aid investigations. I can show you a demo app before you commit. I would need a larger set of the sample PDF to fine tune…

$333 USD in 10 days
(1 Review)
1.4
WETHINJP

Thank you for looking at this proposal. I've downloaded and reviewed your sample file and it looks like a Zagat style guide. There are several ways we can address this type of data, and my team and I are ready to assi…

$750 USD in 10 days
(0 Reviews)
0.0
nathanwilce

I have created OCR programs before. I will test the sample sometime later and message you the results. Please do message me so I am able to do this.

$555 USD in 10 days
(0 Reviews)
0.0
johanabradi86

Hello! My name is Johana Bracho. I'd love to help on your project. I am trained in the use of Excel, Word, PowerPoint, Microsoft Office, Open Office, statistical packages, and data analysis. A week ago, I finished a jo…

$277 USD in 10 days
(0 Reviews)
0.0