Full text search on Amazon EC2
$100-500 USD
Paid on delivery
I need a running Amazon EC2 instance with an EBS (Elastic Block Store) volume attached to it, providing the following functionality:
**Scope**
The project deliverable is a web service (SOAP/REST) with the following capabilities:
1. Store ~3M Hebrew articles (UTF-8). Each article is at most 2,000 words long, and most are under 500 words. Each article is added through a single API call; the coder does not need to deal with the article content itself, only with the API for adding articles (see below).
2. Each article will have a unique ID (integer).
3. Full text search for keywords, returning the list of IDs of the articles in which those keywords appear.
4. A simple API (REST/XML) to search, add, edit, delete, and view articles.
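To make the four capabilities above concrete, here is a minimal in-memory sketch in Python. The class and method names are hypothetical, the regex tokenizer is deliberately naive, and multi-keyword searches are assumed to use AND semantics; an actual delivery would keep the index in Lucene, Sphinx, or MySQL on the EBS volume and expose these operations over REST/XML. What the sketch does show is that edit and delete must update the index as well as the stored text, otherwise searches return stale IDs.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Naive tokenizer for illustration only: runs of Unicode word characters.
    # A real deployment would rely on the search engine's own analyzer.
    return re.findall(r"\w+", text.lower())


class ArticleStore:
    """Hypothetical in-memory model of the add/edit/delete/view/search API."""

    def __init__(self) -> None:
        self.articles: dict[int, str] = {}
        # Inverted index: token -> set of IDs of articles containing it.
        self.index: dict[str, set[int]] = defaultdict(set)

    def add(self, article_id: int, text: str) -> None:
        if article_id in self.articles:
            raise KeyError(f"article {article_id} already exists")
        self.articles[article_id] = text
        for token in set(tokenize(text)):
            self.index[token].add(article_id)

    def delete(self, article_id: int) -> None:
        # Raises KeyError for an unknown ID; remove the ID from every posting.
        text = self.articles.pop(article_id)
        for token in set(tokenize(text)):
            ids = self.index[token]
            ids.discard(article_id)
            if not ids:
                del self.index[token]

    def edit(self, article_id: int, text: str) -> None:
        # Re-index from scratch so tokens from the old text disappear.
        self.delete(article_id)
        self.add(article_id, text)

    def view(self, article_id: int) -> str:
        return self.articles[article_id]

    def search(self, *keywords: str) -> list[int]:
        # AND semantics (an assumption): IDs of articles containing every keyword.
        sets = [self.index.get(k.lower(), set()) for k in keywords]
        if not sets:
            return []
        return sorted(set.intersection(*sets))


store = ArticleStore()
store.add(1, "full text search on EC2")
store.add(2, "text editor")
print(store.search("text"))            # [1, 2]
print(store.search("text", "search"))  # [1]
```

Keeping the index update inside `add`, `edit`, and `delete` mirrors what the dedicated engines do when documents are updated or removed.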
**Technology**
All software should be open source. Preferred scripting language is PHP, but this is not a restriction.
Full text search must use a **dedicated full text search** engine (Apache Lucene, Sphinx, MySQL full-text search, etc.), because the server may have many concurrent users running text queries.
## Deliverables
The coder must have experience with full text search and indexing. This is not the same as standard database querying (`WHERE searched_column LIKE '%keyword%'`), which must be avoided at all costs!
Here is an example Hebrew phrase (written right to left), followed by keywords. A search for any of these keywords must always match this phrase.
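The toy comparison below (in Python, with invented sample documents) shows the two problems with that approach. A `LIKE` pattern with a leading `%` cannot use an ordinary B-tree index, so every row is scanned, and it matches substrings rather than words. A dedicated engine instead builds an inverted index that maps each token to the documents containing it, so a lookup touches only the matching entries.

```python
import re
from collections import defaultdict

# Invented sample documents, for illustration only.
docs = {
    1: "the cat sat on the mat",
    2: "concatenate two strings",
    3: "a dog and a cat",
}


def like_scan(keyword: str) -> list[int]:
    # Equivalent of WHERE body LIKE '%keyword%': visits every row and
    # matches the keyword anywhere, even inside a longer word.
    return sorted(i for i, body in docs.items() if keyword in body)


# Built once, ahead of queries: token -> IDs of documents containing it.
index: dict[str, set[int]] = defaultdict(set)
for doc_id, body in docs.items():
    for token in re.findall(r"\w+", body.lower()):
        index[token].add(doc_id)


def index_lookup(keyword: str) -> list[int]:
    # A single dictionary lookup instead of a scan, matching whole tokens only.
    return sorted(index.get(keyword.lower(), set()))


print(like_scan("cat"))     # [1, 2, 3]: "concatenate" is a false positive
print(index_lookup("cat"))  # [1, 3]
```

With ~3M articles and many concurrent users, the scan cost grows with the table size on every query, while the index lookup does not.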
Phrase:
????? ?????? ?? ?????? ?????? ?????? ???, ????? ????? ??????? ???? ?"? ?????? ?? 6 ?????? ????. ???? ???? ????"?, ??? ????
Keywords:
????
?????
??????
?"?
???"?
"??? ????"
Please note that the last three keywords contain double quotes. The first two are acronyms: in Hebrew, a double quote placed before the last letter of a word (the left-most letter, since the text runs right to left) marks an acronym. The last one is an exact phrase search (i.e., return all articles in which "??? ????" appears as a unit, but not articles in which only ??? or ???? appears).
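All three quoted cases come down to tokenization. In Hebrew, a double quote before the final letter of a word marks an acronym (the gershayim sign, U+05F4, which users often type as ASCII `"`); a tokenizer that treats `"` as a separator would split such a word into meaningless fragments. The Python sketch below assumes that a double quote between two word characters is part of the token, and that a query wrapped in quotes is an exact phrase, matched through a positional index. `PhraseIndex`, `tokenize`, and the sample sentences are hypothetical.

```python
import re
from collections import defaultdict

GERSHAYIM = "\u05f4"  # Hebrew gershayim; users often type ASCII " instead

# A token is a run of word characters that may contain internal double
# quotes, so an acronym such as צה"ל stays a single token, while quotes at the
# start or end of a word (ordinary quotation marks) are discarded.
TOKEN_RE = re.compile(r'\w+(?:"\w+)*')


def tokenize(text: str) -> list[str]:
    # Normalize gershayim to ASCII " so both spellings index identically.
    return TOKEN_RE.findall(text.replace(GERSHAYIM, '"'))


class PhraseIndex:
    """Hypothetical positional index supporting keywords and exact phrases."""

    def __init__(self) -> None:
        # token -> article ID -> positions of the token within that article
        self.postings: dict[str, dict[int, list[int]]] = defaultdict(dict)

    def add(self, article_id: int, text: str) -> None:
        for pos, token in enumerate(tokenize(text)):
            self.postings[token].setdefault(article_id, []).append(pos)

    def _docs(self, token: str) -> set[int]:
        return set(self.postings.get(token, {}))

    def _phrase(self, tokens: list[str]) -> set[int]:
        if not tokens:
            return set()
        # Candidate articles contain every token of the phrase ...
        candidates = set.intersection(*(self._docs(t) for t in tokens))
        hits: set[int] = set()
        for doc in candidates:
            # ... and must have them at consecutive positions, in order.
            for start in self.postings[tokens[0]][doc]:
                if all(start + i in self.postings[t][doc]
                       for i, t in enumerate(tokens)):
                    hits.add(doc)
                    break
        return hits

    def search(self, query: str) -> list[int]:
        q = query.strip().replace(GERSHAYIM, '"')
        if len(q) > 1 and q.startswith('"') and q.endswith('"'):
            # Quoted query: exact phrase search.
            return sorted(self._phrase(tokenize(q[1:-1])))
        # Unquoted query: every keyword must appear, in any position.
        tokens = tokenize(q)
        if not tokens:
            return []
        return sorted(set.intersection(*(self._docs(t) for t in tokens)))


idx = PhraseIndex()
idx.add(1, 'חיילי צה"ל סיירו בגבול')
idx.add(2, 'שגרירות ארה"ב בירושלים')
idx.add(3, 'זה בית ספר חדש')
idx.add(4, 'בית חדש עם ספר ישן')
print(idx.search('צה"ל'))       # [1]
print(idx.search('"בית ספר"'))  # [3]: exact phrase only
print(idx.search('בית ספר'))    # [3, 4]: both words, any position
```

With Lucene, Sphinx, or MySQL, this behavior would have to come from the engine's tokenizer or character-set configuration rather than hand-written code, so it is worth checking how the chosen engine treats `"` inside a word before indexing 3M articles.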
Project ID: #3943628