In Progress

Text Filtering Tool Based on Article Needed

Here is a nice article I found:

[url removed, login to view]

The idea is to filter out of a text corpus the sentences that are irrelevant to a given subject domain. For example, sports commentary does not belong in medical texts.
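One common way to make this idea concrete is to score each sentence by the difference between its cross-entropy under an in-domain language model and under a general one. This is only a sketch of that general technique and is not necessarily what the linked article does. The add-one-smoothed unigram models below are a stand-in for a real LM toolkit, and all function names are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Return (token counts, total token count) for whitespace-tokenised sentences."""
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    return counts, sum(counts.values())


def cross_entropy(tokens, model, vocab_size):
    """Per-token cross-entropy in bits under an add-one-smoothed unigram model."""
    counts, total = model
    log_prob = sum(
        math.log2((counts[t] + 1) / (total + vocab_size)) for t in tokens
    )
    return -log_prob / len(tokens)


def domain_score(sentence, in_domain, general, vocab_size):
    """Cross-entropy difference H_in(s) - H_general(s); lower means more in-domain."""
    tokens = sentence.lower().split()
    if not tokens:
        # An empty line carries no evidence, so treat it as maximally out of domain.
        return float("inf")
    return cross_entropy(tokens, in_domain, vocab_size) - cross_entropy(
        tokens, general, vocab_size
    )


# Toy data, invented purely for illustration.
medical = ["the patient received a dose of insulin", "the doctor examined the patient"]
mixed = [
    "the team won the match",
    "the striker scored a goal",
    "the patient watched the match",
]
vocab_size = len({t for s in medical + mixed for t in s.lower().split()})
in_domain_lm = train_unigram(medical)
general_lm = train_unigram(mixed)
```

Sentences whose score falls above some threshold (an assumption to be tuned on held-out data) would then be treated as "poor" lines and dropped.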

I need an implementation of such a tool. For LM training I would like to use the [url removed, login to view] tool.

In their solution they filter only monolingual data. I would also like the program to filter bilingual data. For example, if I want to filter the [url removed, login to view] file based on [url removed, login to view], I remove poor lines from somedata.en. But if I want to filter the [url removed, login to view] and [url removed, login to view] files based on [url removed, login to view], I remove poor lines from [url removed, login to view] and the corresponding lines from somedata.fr. We can assume that one line represents one sentence in such files.
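The two modes described above could be sketched as follows. This is a minimal illustration, assuming the decision about a "poor" line is made on the first file only and is supplied as a predicate `keep` (a hypothetical name); the function names and output paths are likewise assumptions, not part of any existing tool.

```python
from itertools import zip_longest


def filter_monolingual(lines, keep):
    """Keep only the lines for which keep(line) is true."""
    return [line for line in lines if keep(line)]


def filter_parallel(source_lines, target_lines, keep):
    """Filter two line-aligned lists in lockstep.

    A pair survives only if keep(source_line) is true, so line i of the
    filtered source still corresponds to line i of the filtered target.
    """
    kept_source, kept_target = [], []
    for src, tgt in zip_longest(source_lines, target_lines):
        if src is None or tgt is None:
            raise ValueError("parallel files must have the same number of lines")
        if keep(src):
            kept_source.append(src)
            kept_target.append(tgt)
    return kept_source, kept_target


def filter_parallel_files(src_path, tgt_path, out_src_path, out_tgt_path, keep):
    """File wrapper: one sentence per line, UTF-8 assumed."""
    with open(src_path, encoding="utf-8") as f:
        source_lines = f.read().splitlines()
    with open(tgt_path, encoding="utf-8") as f:
        target_lines = f.read().splitlines()
    kept_source, kept_target = filter_parallel(source_lines, target_lines, keep)
    with open(out_src_path, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in kept_source)
    with open(out_tgt_path, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in kept_target)
```

For instance, `filter_parallel_files("somedata.en", "somedata.fr", "filtered.en", "filtered.fr", keep)` would write the surviving pairs to two new files (the output names are placeholders). Raising on a line-count mismatch is a deliberate choice here: silently truncating would misalign every later sentence pair.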

Skills: Data Processing, Java, Linux, Perl, Python

About the Employer:
(139 reviews) Czestochowa, Poland

Project ID: #7018861

Awarded to:

SauraLancer

Hi, I'm an expert with experience in Natural Language Processing and Machine Learning, preferably using Python. I've also participated in a National Level Machine Learning Programming Contest in India where I stood seco...

$250 USD in 10 days
(3 Reviews)
3.5

2 freelancers are bidding on average $225 for this job

linuswebplus

A proposal has not yet been provided

$200 USD in 3 days
(0 Reviews)
0.0