Text Extraction from HTML using Python

$30-80 USD

Completed
Posted over 14 years ago

Paid on delivery
The objective is to extract a section of text from a document that may or may not contain HTML tags. The program should be written in Python and should do the following:
- Download a file from a given URL.
- Strip the file of all HTML tags and retain only the text. I prefer non-regex removal of HTML tags using sgmllib or another library.
- Use regex to extract a section of the document based on specified rules.
- Save the extracted text to a location on the local hard drive.
- Delete the parent file that was downloaded from the URL.

As an example, I have a document containing a list of URLs. Suppose we want to extract the text of the section titled "Item 7. Management Discussion and Analysis" according to such a rule. Feel free to experiment with this test case and with the regex rules for extraction. Note that the document format changes between the first and the last URLs: the earliest documents are not HTML.
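The requested pipeline could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a reference implementation. The posting mentions sgmllib, but that module exists only in Python 2 (it was removed in Python 3.0). The sketch therefore swaps in the standard library's `html.parser.HTMLParser` for non-regex tag removal. Plain-text documents contain no tags and pass through unchanged. Several details are hypothetical choices rather than anything the posting specifies:
- the extraction rule (from the "Item 7" heading up to the next "Item 8" heading),
- the longest-match heuristic for skipping a short table-of-contents entry,
- UTF-8 decoding of the download,
- the function names `strip_html`, `extract_section`, and `run`.

```python
import re
import urllib.request
from html.parser import HTMLParser
from pathlib import Path
from typing import Optional

# Tags that start a new line in the extracted text, so headings stay separate.
BLOCK_TAGS = {"p", "br", "div", "tr", "li", "h1", "h2", "h3", "h4", "h5", "h6"}

# Hypothetical extraction rule: from the Item 7 heading up to the Item 8 heading.
ITEM7_START = r"\bitem\s*7\.?\s*management(?:['’]s)?\s+discussion\s+and\s+analysis"
ITEM7_END = r"\bitem\s*8\b"


class TextExtractor(HTMLParser):
    """Collects text nodes and drops every tag (no regex involved)."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # turns &amp;, &#39; etc. into characters
        self.parts = []  # text fragments in document order
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def strip_html(raw: str) -> str:
    """Return the text of raw with all tags removed; plain text passes through."""
    parser = TextExtractor()
    parser.feed(raw)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


def extract_section(text: str, start: str = ITEM7_START, end: str = ITEM7_END) -> Optional[str]:
    """Return the longest span from start up to (not including) end, or None.

    Keeping the longest match is a heuristic for skipping a table-of-contents
    entry that repeats the same headings close together.
    """
    pattern = re.compile(f"{start}.*?(?={end})", re.IGNORECASE | re.DOTALL)
    matches = [m.group(0) for m in pattern.finditer(text)]
    return max(matches, key=len).strip() if matches else None


def run(url: str, parent_path: Path, out_path: Path) -> Optional[str]:
    """Download, strip tags, extract, save, then delete the downloaded parent file."""
    with urllib.request.urlopen(url) as resp:
        parent_path.write_bytes(resp.read())
    try:
        # Assumption: UTF-8; undecodable bytes are replaced rather than raising.
        raw = parent_path.read_text(encoding="utf-8", errors="replace")
        section = extract_section(strip_html(raw))
        if section is not None:
            out_path.parent.mkdir(parents=True, exist_ok=True)
            out_path.write_text(section, encoding="utf-8")
        return section
    finally:
        parent_path.unlink(missing_ok=True)  # delete the parent file even on errors
```

For each URL in the list, a call such as `run(url, Path("parent.tmp"), Path("extracted/item7_1.txt"))` would do the following:
- download the file,
- save the section if the rule matches,
- delete `parent.tmp` afterwards.

The two regex constants are the part most likely to need tuning, because heading wording and punctuation vary between documents.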
Project ID: 3070246

About the project

5 proposals
Remote project
Active 14 yrs ago

Awarded to:
- $25.50 USD in 2 days, 5.0 (6 reviews), 3.7

5 freelancers are bidding on average $46 USD for this job
- $68 USD in 2 days, 5.0 (41 reviews), 6.7
- $25.50 USD in 2 days, 4.8 (15 reviews), 3.7
- $42.50 USD in 2 days, 4.5 (10 reviews), 2.9
- $68 USD in 2 days, 0.0 (0 reviews), 0.0

About the client

Levittown, United States
5.0
10
Member since Nov 3, 2006