Build a Statistical Machine Translation System

This project received 2 bids from talented freelancers with an average bid price of RM1899 MYR.

Project Budget
RM825 - RM2475 MYR
Project Description

Language model adaptation techniques fall into two main categories. The first comprises techniques based on data selection, in which a task-oriented corpus is extracted and used to train models for specific translation tasks. The second focuses on developing a weighting criterion that assigns the test data to a specific model corpus.

This research aims to introduce a language model adaptation approach that combines the strategies of both categories.

First, the approach applies data selection for task-specific translation by dividing the corpus into smaller, topic-related corpora through a clustering process. Using the Europarl corpus from WMT07, which includes bilingual data for English-Spanish, English-German and English-French, the experiments investigate how different approaches to clustering the bilingual data affect language model adaptation in terms of translation quality. The clustering approaches are direct clustering, clustering based on the development set, and clustering based on the test set. Once the clustering process has defined the sub-corpora, a separate language model can be built from each of them. Using a specific weighting criterion, a mixture of these language models can then be defined so that any given data is assigned to the right language model during translation. For this purpose, three different weighting criteria are considered: one based on the entire test set, one based on the sentence level, and a hybrid approach based on both the sentence level and the entire test set.
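The weighting step can be illustrated with a minimal sketch. It assumes, purely for illustration, add-one smoothed unigram models (one per cluster) and mixture weights estimated by expectation-maximization (EM), a standard way to fit linear interpolation weights. The toy clusters, the function names, and the 50/50 hybrid rule are all hypothetical and are not taken from the project description; a real system would use n-gram models trained on the clustered Europarl data.

```python
import math
from collections import Counter

def train_unigram(sentences, vocab):
    """Add-one smoothed unigram model over a fixed vocabulary (toy stand-in for an n-gram LM)."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts[w] for w in vocab) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def prob(model, word):
    """Look up a word, falling back to the <unk> entry for unseen words."""
    return model.get(word, model["<unk>"])

def em_weights(models, sentences, iters=50):
    """Estimate linear-interpolation weights by EM on the given adaptation text."""
    k = len(models)
    weights = [1.0 / k] * k
    tokens = [w for s in sentences for w in s.split()]
    for _ in range(iters):
        expected = [0.0] * k
        for w in tokens:
            joint = [weights[i] * prob(models[i], w) for i in range(k)]
            z = sum(joint)
            for i in range(k):
                expected[i] += joint[i] / z  # posterior that model i produced w
        weights = [e / len(tokens) for e in expected]
    return weights

def mixture_logprob(models, weights, sentence):
    """Log-probability of a sentence under the weighted mixture of models."""
    return sum(
        math.log(sum(lam * prob(m, w) for lam, m in zip(weights, models)))
        for w in sentence.split()
    )

# Hypothetical sub-corpora, standing in for the output of the clustering step.
clusters = {
    "economy": ["the bank raised interest rates", "the market fell on budget fears"],
    "fisheries": ["the fishing fleet landed more cod", "fish quotas protect cod stocks"],
}
test_set = ["the budget raised interest rates", "cod quotas protect the fleet"]

vocab = {w for sents in clusters.values() for s in sents for w in s.split()} | {"<unk>"}
models = [train_unigram(sents, vocab) for sents in clusters.values()]

# Criterion 1: one weight vector for the entire test set.
set_weights = em_weights(models, test_set)
# Criterion 2: a separate weight vector for each sentence.
sent_weights = [em_weights(models, [s]) for s in test_set]
# Criterion 3 (assumed form): average the set-level and sentence-level weights.
hybrid_weights = [[0.5 * a + 0.5 * b for a, b in zip(set_weights, sw)] for sw in sent_weights]

for s, w in zip(test_set, hybrid_weights):
    print(s, [round(x, 3) for x in w], round(mixture_logprob(models, w, s), 3))
```

In this toy setup the sentence-level weights lean toward the cluster whose vocabulary matches each sentence, while the set-level weights stay closer to balanced; the hybrid rule shown is only one of many ways the two could be combined.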
