Text Extraction from HTML using Python

$30-80 USD

Completed

Posted over 14 years ago

Paid on delivery

The objective is to extract a section of text from a document that may or may not contain HTML tags. The program should be written in Python.

- The program should first download a file from a given URL.
- It should then strip the file of all HTML tags and retain only the text. I prefer a non-regex approach to removing HTML tags, using sgmllib or another library.
- It should then use a regex to extract a section of the document based on specified rules.
- It should then save the extracted text to a location on the local hard drive.
- It should delete the parent file that was downloaded from the URL.

As an example, I have a document with a list of URLs. Suppose we want to extract the text in the section titled "Item 7. Management Discussion and Analysis" based on the rule. Please feel free to experiment with this test case and with the regex rules for extraction. Please note that the format of the documents changes between the first and the last, i.e. the first ones are not HTML.
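The steps in the description above could be sketched roughly as follows. This is only an illustration under several assumptions, not the delivered solution: sgmllib no longer exists in Python 3, so the standard library's html.parser is swapped in for tag removal; the names TextOnlyParser, strip_tags, extract_section, and process_url are hypothetical; and since the posting does not state the extraction rule, the sketch assumes the section runs from the "Item 7" heading up to the next "Item 8." heading.

```python
import os
import re
import tempfile
import urllib.request
from html.parser import HTMLParser


class TextOnlyParser(HTMLParser):
    """Collects text nodes and ignores tags (assumed stand-in for sgmllib)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def strip_tags(markup):
    """Return the text content of markup; plain text passes through."""
    parser = TextOnlyParser()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


# Hypothetical rule: start at the "Item 7" heading, stop before "Item 8."
ITEM7_START = re.compile(r"item\s*7\.?\s*management", re.IGNORECASE)
ITEM8_START = re.compile(r"item\s*8\.", re.IGNORECASE)


def extract_section(text, start_re=ITEM7_START, end_re=ITEM8_START):
    """Return the text between the first start match and the next end match."""
    start = start_re.search(text)
    if start is None:
        return None
    end = end_re.search(text, start.end())
    stop = end.start() if end else len(text)
    return text[start.start():stop].strip()


def process_url(url, out_path):
    """Download url, strip tags, save the section, delete the parent file."""
    fd, parent_path = tempfile.mkstemp()
    os.close(fd)
    try:
        urllib.request.urlretrieve(url, parent_path)
        with open(parent_path, encoding="utf-8", errors="replace") as fh:
            text = strip_tags(fh.read())
    finally:
        os.remove(parent_path)  # the downloaded parent file is not kept
    section = extract_section(text)
    if section is not None:
        with open(out_path, "w", encoding="utf-8") as fh:
            fh.write(section)
    return section
```

A real filing may mention "Item 7" in a table of contents before the actual section, so the first-match rule used here would likely need refinement against the test documents.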

Project Management

Software Architecture

Software Testing

Project ID: 3070246

About the project

5 proposals

Remote project

Active 14 yrs ago

Awarded to:

Flag of UNITED STATES

See private message.

$25.50 USD in 2 days

5.0

(6 reviews)

3.7

3.7

5 freelancers are bidding on average $46 USD for this job

See private message.

$68 USD in 2 days

5.0

(41 reviews)

6.7

6.7

@octaviantheodor

Flag of BELGIUM

See private message.

$25.50 USD in 2 days

4.8

(15 reviews)

3.7

3.7

@realresultsmedia

Flag of ROMANIA

See private message.

$42.50 USD in 2 days

4.5

(10 reviews)

2.9

2.9

See private message.

$68 USD in 2 days

0.0

(0 reviews)

0.0

0.0

About the client

Flag of UNITED STATES

Levittown, United States

5.0

10

Member since Nov 3, 2006

Client Verification
