Cancelled

Database architecture assistance/implementation for vertical search

We need a database solution to handle storage, querying, and a form of faceting for documents with the following structure:

[STORAGE]

Document = {
  title: 'Document title',
  content: '500-word paragraph here',
  document_intervals: [
    {start_date: "07/31/2013", end_date: "08/15/2013"}
  ],
  tags: [
    {
      tag_id: 123,
      tag_intervals: [
        {start_date: "01/31/2013", end_date: "07/15/2013"},
        {start_date: "06/25/2013", end_date: "11/30/2013"}
      ],
      match_country: ["Romania", "France", "Spain"]
    },
    {
      tag_id: 146,
      tag_intervals: [] // an empty list means TAG 146 matches this document regardless of date
    }
  ]
}

[SEARCHING]

The user should be able to search by TAG, by TIME INTERVAL, and with full-text search on CONTENT. Examples:

- "Show documents tagged with TAG 146"

- "Show documents that are tagged with TAG 123, but only if interval ["03/10/2013" - "03/10/2013"] overlaps at least one of the tag's TAG_INTERVALS"

- "Show documents where interval ["02/01/2013" - "03/01/2014"] overlaps at least one of the DOCUMENT_INTERVALS"

- "Show documents matching "some words" inside CONTENT"
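
The four example queries above can be sketched in plain Python to pin down the intended semantics. This is only an illustration and makes several assumptions: dates are MM/DD/YYYY strings, intervals are closed (both endpoints included), an empty tag_intervals list matches any date, and a naive substring test stands in for real full-text search. The names parse_date, overlaps, tag_matches and search are hypothetical, not part of any existing system.

```python
from datetime import date, datetime

def parse_date(text):
    """Parse the MM/DD/YYYY strings used in the document structure (assumed format)."""
    return datetime.strptime(text, "%m/%d/%Y").date()

def overlaps(query, interval):
    """True if two closed date intervals share at least one day."""
    q_start, q_end = query
    start = parse_date(interval["start_date"])
    end = parse_date(interval["end_date"])
    return q_start <= end and start <= q_end

def tag_matches(tag, query=None):
    """A tag with no intervals is assumed to match regardless of date."""
    intervals = tag.get("tag_intervals", [])
    if query is None or not intervals:
        return True
    return any(overlaps(query, iv) for iv in intervals)

def search(documents, tag_id=None, tag_query=None, doc_query=None, words=None):
    """Linear scan that combines the optional filters with AND."""
    results = []
    for doc in documents:
        if tag_id is not None:
            tags = [t for t in doc["tags"] if t["tag_id"] == tag_id]
            if not any(tag_matches(t, tag_query) for t in tags):
                continue
        if doc_query is not None:
            if not any(overlaps(doc_query, iv) for iv in doc["document_intervals"]):
                continue
        if words is not None:
            # Substring check only; a real engine would tokenize and rank.
            content = doc["content"].lower()
            if not all(w in content for w in words.lower().split()):
                continue
        results.append(doc)
    return results

DOCS = [{
    "title": "Document title",
    "content": "500-word paragraph here",
    "document_intervals": [{"start_date": "07/31/2013", "end_date": "08/15/2013"}],
    "tags": [
        {"tag_id": 123,
         "tag_intervals": [
             {"start_date": "01/31/2013", "end_date": "07/15/2013"},
             {"start_date": "06/25/2013", "end_date": "11/30/2013"}],
         "match_country": ["Romania", "France", "Spain"]},
        {"tag_id": 146, "tag_intervals": []},
    ],
}]

# Example: TAG 123, restricted to the single day 03/10/2013.
hits = search(DOCS, tag_id=123, tag_query=(date(2013, 3, 10), date(2013, 3, 10)))
```

A production solution would replace the linear scan with indexes in whichever engine is chosen; the sketch only fixes what "overlaps" and "matches regardless of date" are taken to mean.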

[NARROW SEARCH]

Once the documents are retrieved, we also need to show the matching tags and dates, so the user can narrow their search (faceted search, but we don't necessarily need the item count for each tag).

Considerations for faceting:

- If the user's query contains a date interval, then, for the "narrow-your-search" menu we would only pull out tags that match the desired interval.

- Based on the DOCUMENT_INTERVALS of the matched documents we need to also show available days/months for further narrowing (which effectively means enabling/disabling clickable dates on a calendar).

- Even before any searching is done, a selection of available tags should be displayed so the user can start their search.
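
The faceting considerations above could be prototyped roughly as follows. This is a sketch under assumptions, not a recommended implementation: dates are MM/DD/YYYY strings, intervals are closed, an empty tag_intervals list matches any date, and the helper names facet_tags and available_days are hypothetical.

```python
from datetime import date, datetime, timedelta

def parse_date(text):
    """Parse MM/DD/YYYY strings (assumed format)."""
    return datetime.strptime(text, "%m/%d/%Y").date()

def facet_tags(matched_docs, query=None):
    """Tag ids to offer for narrowing.

    If the user's query has a (start, end) interval, keep only tags whose
    intervals overlap it; tags with no intervals are always kept.
    """
    tag_ids = set()
    for doc in matched_docs:
        for tag in doc["tags"]:
            intervals = tag.get("tag_intervals", [])
            if query is None or not intervals or any(
                parse_date(iv["start_date"]) <= query[1]
                and query[0] <= parse_date(iv["end_date"])
                for iv in intervals
            ):
                tag_ids.add(tag["tag_id"])
    return tag_ids

def available_days(matched_docs):
    """Every calendar day covered by some DOCUMENT_INTERVAL of a matched document."""
    days = set()
    for doc in matched_docs:
        for iv in doc["document_intervals"]:
            day = parse_date(iv["start_date"])
            end = parse_date(iv["end_date"])
            while day <= end:
                days.add(day)
                day += timedelta(days=1)
    return days

MATCHED = [{
    "document_intervals": [{"start_date": "07/31/2013", "end_date": "08/15/2013"}],
    "tags": [
        {"tag_id": 123, "tag_intervals": [
            {"start_date": "01/31/2013", "end_date": "07/15/2013"},
            {"start_date": "06/25/2013", "end_date": "11/30/2013"}]},
        {"tag_id": 146, "tag_intervals": []},
    ],
}]

days = available_days(MATCHED)
# Months that should be clickable on the calendar.
months = sorted({(d.year, d.month) for d in days})
```

Calling facet_tags on the whole collection with no query is one possible reading of the "before any searching" case, although with millions of tags a precomputed selection would likely be needed instead.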

[TAG AUTOCOMPLETE]

There are about 2 million tags in the database. We also need a solution for the bilingual autocomplete search bar.

- The autocomplete only takes tags into consideration (all the suggestions are tags). If possible, only tags that actually match documents should appear.

- Each tag may have aliases. "Alexandre Dumas", "Al. Dumas", "Dumas", and "Alexander Dumas" would be added as aliases so that no matter how the user types the name, the autocomplete will show "Alexandre Dumas".
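
One way to picture the alias behaviour is a sorted list of normalized aliases searched by prefix with binary search. This is a minimal in-memory sketch, assuming case-insensitive prefix matching is what is wanted; the class TagAutocomplete and its methods are hypothetical names, and a real deployment with 2 million tags would more likely use a dedicated search engine or a trie.

```python
import bisect

class TagAutocomplete:
    """Prefix lookup over tag names and aliases, returning canonical names."""

    def __init__(self):
        self._keys = []   # sorted list of (normalized alias, tag_id)
        self._names = {}  # tag_id -> canonical display name

    def add_tag(self, tag_id, name, aliases=()):
        """Index the canonical name and every alias under the same tag id."""
        self._names[tag_id] = name
        for alias in (name, *aliases):
            bisect.insort(self._keys, (alias.casefold(), tag_id))

    def suggest(self, prefix, limit=10):
        """Return up to `limit` canonical names whose name or alias starts with prefix."""
        prefix = prefix.casefold()
        # (prefix,) sorts before every (alias, tag_id) whose alias starts with prefix.
        i = bisect.bisect_left(self._keys, (prefix,))
        seen, out = set(), []
        while i < len(self._keys) and len(out) < limit:
            alias, tag_id = self._keys[i]
            if not alias.startswith(prefix):
                break
            if tag_id not in seen:  # several aliases may point at one tag
                seen.add(tag_id)
                out.append(self._names[tag_id])
            i += 1
        return out

ac = TagAutocomplete()
ac.add_tag(1, "Alexandre Dumas", ["Al. Dumas", "Dumas", "Alexander Dumas"])
ac.add_tag(2, "Dublin")

suggestions = ac.suggest("du")
```

Aliases in the second language can be added through the same aliases argument, and indexing only tags that are attached to at least one document would cover the "only tags that actually match documents" wish.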

[YOUR JOB]:

- recommend (and possibly set up) appropriate database engine(s)

- explain the data structures and algorithm(s) we have to implement so that search works as expected (we're fast learners and good students)

OR

- explain and implement this yourself (and also help integrate your work into our existing system)

- make sure the proposed solution is scalable and fast

[NOT YOUR JOB]:

- user interface

- admin control panels for adding/updating documents and assigning tags

- visual design of any kind

- matching documents with tags

[ The above are NOT your job, just to be clear. They will be done by other awesome people, like yourself, but with a different skill set. We need you focused on the hard-core database stuff. ]

As you can see, this can either be

- a consulting job (if you're a very experienced search/database guru that doesn't want to actually write code, just share your awesomeness)

- a database design/coding job (if you love PHP, you want to be directly involved in the coding and you're a fast, efficient coder)

Skills: Big Data, Database Administration, MySQL, NoSQL Couch & Mongo, SQL


About the Employer:
( 2 reviews ) Bucharest, Romania

Project ID: #4191895