Document parsing and text mining in Python

$15-25 USD / hour

Cancelled
Posted about 9 years ago

Programmer for Tree Parsing/Text Mining

Job Summary

Seeking an experienced programmer for engagement in long-term freelance work. Strong tree parsing skills are essential. A background in NLP and experience with NLTK is preferred but not required. Pay is commensurate with experience and is hourly-based. As part of our hiring process, we ask that interested candidates successfully complete the tasks below to demonstrate basic competency.

Project Background

The SEC stores various text files they receive from companies on their Edgar website. The files typically contain detailed discussions of companies’ performance as well as financial data summarizing their performance. Attached is a random sample of 15 full .txt files from 5 different years with a file type of “10-K” from Edgar. You will find files which embed HTML, SGML, or XBRL code, in addition to tables, special characters, images, and other embedded files, such as PDF, etc.

Tasks

1. Extract the following sections from the 10-K using a tree parser: Management Discussion and Analysis (MD&A), Risk Factors, and Notes to the Financial Statements.
2. Flatten each section extracted to raw text. That is, remove all code, tables, images, or embedded files.
3. Write the raw text of each section to a separate .txt file. The filename for the raw text file should be that of its parent with a suffix for each section appended (e.g., “*[login to view URL]”, “*[login to view URL]”, and “*[login to view URL]”).
4. Discuss any outstanding issues, questions, or concerns regarding the steps above. For example, discuss weaknesses in your approach to identifying section and sentence boundaries.

Apply

For full consideration, please upload your resume, output, and responses by April 15, 2015. We are an equal opportunity employer. Work permits or visas are not required.
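For illustration only, the tasks above could be approached roughly as in the sketch below. It is not the client's method and rests on several assumptions: it uses the standard library's event-based `html.parser.HTMLParser` rather than a true tree parser (to stay dependency-free), the heading patterns in `SECTIONS` are guesses at how 10-K headings commonly appear, and the filename suffixes (`_risk`, `_mda`, `_notes`) are hypothetical because the posting's example filenames are hidden.

```python
import re
from html.parser import HTMLParser
from pathlib import Path

# Tags whose entire contents are dropped when flattening (an assumption).
SKIP_TAGS = {"table", "script", "style"}
# Block-level tags that should introduce a line break in the output.
BLOCK_TAGS = {"p", "div", "br", "tr", "li", "h1", "h2", "h3", "h4"}

# Hypothetical suffixes and heading patterns; real filings vary widely.
SECTIONS = {
    "_risk": re.compile(r"item\s+1a\.?\s*risk\s+factors", re.I),
    "_mda": re.compile(r"item\s+7\.?\s*management", re.I),
    "_notes": re.compile(
        r"notes\s+to\s+(the\s+)?(consolidated\s+)?financial\s+statements", re.I
    ),
}
ANY_ITEM = re.compile(r"item\s+\d+[ab]?\b", re.I)


class TextFlattener(HTMLParser):
    """Collect text content, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS and self.skip_depth == 0:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS and self.skip_depth == 0:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def flatten(markup):
    """Strip markup and tables, returning one trimmed, non-empty line per block."""
    parser = TextFlattener()
    parser.feed(markup)
    parser.close()
    text = "".join(parser.parts)
    lines = (re.sub(r"[ \t\xa0]+", " ", ln).strip() for ln in text.splitlines())
    return "\n".join(ln for ln in lines if ln)


def split_sections(text):
    """Map suffix -> section text, cutting at the next heading-like line."""
    lines = text.splitlines()
    marks = []  # (line index, suffix or None) for every heading-like line
    for i, line in enumerate(lines):
        suffix = next((s for s, pat in SECTIONS.items() if pat.match(line)), None)
        if suffix or ANY_ITEM.match(line):
            marks.append((i, suffix))
    marks.append((len(lines), None))
    best = {}
    for (start, suffix), (end, _) in zip(marks, marks[1:]):
        if suffix is None:
            continue
        body = "\n".join(lines[start:end])
        # Table-of-contents entries match too, so keep the longest candidate.
        if len(body) > len(best.get(suffix, "")):
            best[suffix] = body
    return best


def write_sections(path):
    """Write each detected section next to its parent file; return the paths."""
    path = Path(path)
    text = flatten(path.read_text(encoding="utf-8", errors="replace"))
    written = []
    for suffix, body in split_sections(text).items():
        out = path.with_name(path.stem + suffix + ".txt")
        out.write_text(body, encoding="utf-8")
        written.append(out)
    return written
```

Calling `write_sections("filing.txt")` (a hypothetical filename) would produce files such as `filing_risk.txt` for whichever sections were detected. The weaknesses the posting asks candidates to discuss show up directly here: headings that are split across tags, cross-references that begin a line with "Item 7", and embedded binary attachments are not handled, and the longest-candidate rule for skipping the table of contents is only a heuristic.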
Project ID: 7393520

About the project

8 proposals
Remote project
Active 9 yrs ago

8 freelancers are bidding on average $22 USD/hour for this job
A proposal has not yet been provided
$17 USD in 3 days
5.0 (43 reviews)
4.8
4.8
Hi! I am a professional C/C++/C#/Java/Python developer. I can do this project to the highest quality! Best regards, Szymszetinsl
$21 USD in 30 days
5.0 (2 reviews)
3.3
3.3
Hi, I can parse text from many types of files, including .txt, csv, pdf, doc, docx, png, jpeg, psd, rst, etc. I am ready to do the task. I could not see the link to the text files. Could you send me the text files (link)?
$17 USD in 20 days
5.0 (4 reviews)
3.2
3.2
Hi, I am a graduate research student doing research on network programming languages. My work on NPLs involves representing network topologies as graphs, such as tree data structures, and running different algorithms on those data structures. I also have a deep understanding of NLP, as I have worked on lexicons, parsers and regular grammars. In addition, I have 4 years of experience in software development. I can deliver the result with the quality you expect. I haven't found any attachment. Please provide the files. I shall upload the resume, output and responses soon after receiving the files. Thanks, Shahbaz
$22 USD in 20 days
0.0 (0 reviews)
0.0
0.0
Hello, I'm a freelance Python developer and I am very interested in being your developer for the job "Document parsing and text mining in Python". I have worked on projects that required parsing files, and I have worked with pdf, doc, csv, docx and odf formats. I have also worked on two projects that involved data mining, using libraries such as Numpy, Scipy, NLTK, Scrapy, Gensim, Requests and Matplotlib. Worth mentioning is that I performed some Natural Language Processing on the data and also semantic matching. Please refer to my portfolio for previous projects I have handled. I'm looking forward to hearing from you, Regards, Aurlus I. Wedava
$16 USD in 40 days
0.0 (0 reviews)
0.0
0.0
I can set this up in Python. - networkx for graph object to trace extractions. - pdf to text no problem. Based in Toronto. Though, I'm afraid I can't commit to a skills demo without a milestone or compensation.
$27 USD in 5 days
0.0 (0 reviews)
2.3
2.3
Great experience in NLP, text mining, contextual extraction, and sentiment analysis. I use a combination of advanced tools that I wrote myself and commercial software. I could also double my working hours per week if needed. I also have several ready-to-use classification taxonomies in different subject domains from past projects. All my code currently runs as client-server Python software: text is sent to the server, which returns a clean version, facts, and categories. I could also do data mining on text, statistical, or social graph information. P.S. If needed, I could bring in a small team (they would be lucky to take part in an interesting project) to solve advanced statistical tasks using textual/numeric information, for example, parsing pharma data and checking whether symptoms of seasonal illness correlate with prices, or something else, like retail customer segmentation, telecom/banking message analysis, credit scoring models... P.P.S. By the way, the attached data file is not available for testing right now; maybe it was deleted by the system. Could you please attach the file again so I can test my skills?
$23 USD in 20 days
0.0 (0 reviews)
0.0
0.0

About the client

United States
0.0
0
Member since Mar 29, 2015
