Full text search on Amazon EC2

Closed Posted Jun 17, 2009 Paid on delivery

I need a running Amazon EC2 image and EBS (Elastic Block Storage) connected to it, with the following functionality:

**Scope**

The project deliverable is a web service (SOAP/REST) with the following capabilities:

1. Store a list of ~3M articles in Hebrew (UTF-8). Each article is up to 2,000 words long; most are under 500 words. Each article will be added by a single API call. The coder does not need to deal with the article text itself, only with the API needed to add articles (see below).

2. Each article will have a unique ID (integer).

3. Ability to perform a full text search for keywords and give back a list of IDs where these keywords appear.

4. A simple API (REST/XML) to search, add, edit, delete and view articles.
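
To make the four capabilities above concrete, here is a minimal in-memory sketch of the operations the API would expose. It is only an illustration under stated assumptions: the class name `ArticleStore` and its method names are hypothetical, tokenization is naive whitespace splitting, and the real deliverable would delegate indexing to a dedicated search engine and expose these operations over REST/XML.

```python
from collections import defaultdict


class ArticleStore:
    """Hypothetical in-memory stand-in for the requested service."""

    def __init__(self):
        self.articles = {}             # article ID -> article text
        self.index = defaultdict(set)  # token -> set of article IDs

    def _tokens(self, text):
        # Assumption: naive whitespace tokenization, for illustration only.
        return set(text.split())

    def add(self, article_id, text):
        if article_id in self.articles:
            raise KeyError(f"article {article_id} already exists")
        self.articles[article_id] = text
        for token in self._tokens(text):
            self.index[token].add(article_id)

    def delete(self, article_id):
        text = self.articles.pop(article_id)
        for token in self._tokens(text):
            self.index[token].discard(article_id)
            if not self.index[token]:
                del self.index[token]

    def edit(self, article_id, text):
        # Re-index by removing the old version and adding the new one.
        self.delete(article_id)
        self.add(article_id, text)

    def view(self, article_id):
        return self.articles[article_id]

    def search(self, *keywords):
        # Return the sorted IDs of articles that contain every keyword.
        if not keywords:
            return []
        postings = [self.index.get(k, set()) for k in keywords]
        return sorted(set.intersection(*postings))
```

A search returns only article IDs, as item 3 requires; the caller can then fetch the text of any hit with `view`.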

**Technology**

All software should be open source. Preferred scripting language is PHP, but this is not a restriction.

Full text search must use a **dedicated full text search** engine: Apache-Lucene, Sphinx, MySQL Full Text, etc., as this server may have many concurrent users querying for text.

## Deliverables

Coder must have experience with full text searching and indexing. This is not the same as standard database querying (WHERE searched_column LIKE '%keyword%'), which should be avoided by all means!
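
The difference can be sketched in a few lines. This is a simplified illustration, not a description of how any particular engine works: `like_scan` emulates a LIKE-style substring scan that visits every article, while `index_lookup` consults a prebuilt token index. All function names and the sample data are assumptions made for the example.

```python
def like_scan(articles, keyword):
    # Emulates LIKE '%keyword%': reads every article and matches substrings,
    # so "cat" also hits "concatenate".
    return sorted(aid for aid, text in articles.items() if keyword in text)


def build_index(articles):
    # Built once, up front: token -> set of article IDs.
    index = {}
    for aid, text in articles.items():
        for token in set(text.split()):
            index.setdefault(token, set()).add(aid)
    return index


def index_lookup(index, keyword):
    # A single dictionary lookup; matches whole tokens only.
    return sorted(index.get(keyword, set()))
```

With ~3M articles and many concurrent users, the scan repeats its full pass on every query, whereas the index pays the cost once when articles are added.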

Here is an example of a Hebrew phrase (written right to left), followed by keywords. A search for any of these keywords must always match the phrase (return true).

Phrase:

????? ?????? ?? ?????? ?????? ?????? ???, ????? ????? ??????? ???? ?"? ?????? ?? 6 ?????? ????. ???? ???? ????"?, ??? ????

Keywords:

????

?????

??????

?"?

???"?

"??? ????"

Please note that the last three keywords contain double quotes. The first two of these are acronyms: in Hebrew acronyms, the double quote is located before the left-most letter. The last one is an exact phrase search (i.e., return all articles where "??? ????" appears as a unit, but not articles where only ??? or ???? appears).
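
The quoting rules above can be sketched as follows. This is a rough illustration only: the sample sentence and keywords are my own stand-ins (the posting's original Hebrew text is not reproduced here), `tokenize` and `matches` are hypothetical names, and the rule "a double quote between two word characters belongs to the token" is an assumption about how acronyms could be kept whole.

```python
import re

# A token is a run of word characters, optionally followed by one double
# quote and more word characters (the Hebrew acronym pattern).
TOKEN_RE = re.compile(r'\w+(?:"\w+)?')

# Stand-in sample text, not the phrase from the posting.
SAMPLE = 'דובר צה"ל מסר הודעה, בוקר טוב'


def tokenize(text):
    return TOKEN_RE.findall(text)


def matches(query, text):
    """Return True if `text` satisfies `query`."""
    tokens = tokenize(text)
    # A query wrapped in double quotes is an exact phrase search.
    is_phrase = len(query) > 2 and query.startswith('"') and query.endswith('"')
    if is_phrase:
        wanted = tokenize(query[1:-1])
        n = len(wanted)
        # The phrase matches only if its tokens appear consecutively, in order.
        return n > 0 and any(
            tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1)
        )
    wanted = tokenize(query)
    return bool(wanted) and all(t in tokens for t in wanted)
```

In this sketch an acronym such as צה"ל stays a single token, so searching for only its first letters does not match, and a quoted query matches only when its words are adjacent and in order. A production engine would need the same two behaviours from its analyzer and query parser.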

Amazon Web Services, Database Administration, Engineering, MySQL, PHP, Project Management, Software Architecture, Software Testing, SQL

Project ID: #3943628

About the project

Remote project Active Jun 30, 2009