quality web filter algo - index and classify

Project Budget (USD)
$3000 - $5000

Project Description:
We need an algorithm that, in real time, scans any web page URL that is not already in our
"Bad" list DB.

Your algo needs to scan page content on the fly against a set of search criteria (the heart of your algo).

For example: you need to detect an adult website.

So you need to build an algo that can tell that a site is adult and classify the page as adult.

Idea: scan the page title, URL, textual content, and images (you will need some way to read images and detect adult ones).
Look at how Google and Bing SafeSearch classify bad images; basically, aim for their "Strict" level.
Given any URL of a web page or website, you should be able to classify the site in real time as adult or not adult.
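To make "scan page content against a set of search criteria" concrete, here is a minimal Java sketch that scores the URL, title, and body text against a weighted term list. The term list, the weights, the 3x/2x/1x field multipliers, and the threshold are all hypothetical placeholders, not a proposed final design; a real filter would need a much larger curated list, proper tokenization, and the image/video checks described below.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical keyword scorer: URL and title hits weigh more than body-text hits. */
final class PageScorer {
    // Placeholder terms and weights; a production list would be curated and far larger.
    static final Map<String, Double> TERMS = Map.of(
            "xxx", 3.0,
            "porn", 3.0,
            "nude", 2.0,
            "adult", 1.0);

    // Hypothetical cutoff; it would have to be tuned on labeled pages.
    static final double THRESHOLD = 5.0;

    /** Counts non-overlapping occurrences of term in text (both already lower-cased). */
    static int count(String text, String term) {
        int n = 0;
        for (int i = text.indexOf(term); i >= 0; i = text.indexOf(term, i + term.length())) {
            n++;
        }
        return n;
    }

    /** Weighted sum over fields: a URL hit counts 3x, a title hit 2x, a body hit 1x. */
    static double score(String url, String title, String text) {
        String u = url.toLowerCase(Locale.ROOT);
        String t = title.toLowerCase(Locale.ROOT);
        String b = text.toLowerCase(Locale.ROOT);
        double s = 0;
        for (Map.Entry<String, Double> e : TERMS.entrySet()) {
            String term = e.getKey();
            s += e.getValue() * (3 * count(u, term) + 2 * count(t, term) + count(b, term));
        }
        return s;
    }

    static boolean isAdult(String url, String title, String text) {
        return score(url, title, text) >= THRESHOLD;
    }
}
```

Because this uses plain substring matching, "adult" also matches "adulthood"; a keyword score like this is only one input signal, not something that gets anywhere near the accuracy target on its own.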

We want something close to 99.999% accuracy.

Your algo should work in 2 modes. First, as an offline crawler that indexes all the adult websites out there and stores them
in our company DB under the category "adult". It runs daily and adds any new adult websites it finds.

Second, in real time when a user searches for a URL: before we bring up the URL (if it is not in the blocked list), we send your algo
the URL and you perform a quick analysis to decide whether it is an adult site or not.
If it is adult, you send a BLOCKED message to our calling script and also push the URL to the DB your crawler builds offline on a daily basis.

This way we want to quickly build a massive adult database for fast cached access without calling your algo in real time,
so we can return results faster.
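The real-time mode and the caching goal could fit together roughly like this: check the blocked list first, call the (slower) classifier only on a cache miss, and add newly flagged URLs so the next request for them never reaches the algo. This is a sketch under assumptions: the in-memory set stands in for the real blocked-list DB, and the class name and the `ALLOWED` message are hypothetical (only `BLOCKED` comes from the requirements above).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical real-time gate: cached blocked list first, classifier only on a miss. */
final class UrlGate {
    static final String BLOCKED = "BLOCKED";
    static final String ALLOWED = "ALLOWED";   // hypothetical "not adult" reply

    // Stand-in for the blocked-list DB; thread-safe because lookups arrive concurrently.
    private final Set<String> blocked = ConcurrentHashMap.newKeySet();
    private final Predicate<String> classifier;  // the adult-detection algo

    UrlGate(Set<String> initiallyBlocked, Predicate<String> classifier) {
        blocked.addAll(initiallyBlocked);
        this.classifier = classifier;
    }

    /** Returns BLOCKED or ALLOWED to the calling script. */
    String check(String url) {
        if (blocked.contains(url)) return BLOCKED;   // fast path: no analysis needed
        if (classifier.test(url)) {
            blocked.add(url);                        // push to the DB so later lookups hit the cache
            return BLOCKED;
        }
        return ALLOWED;
    }

    boolean isCached(String url) {
        return blocked.contains(url);
    }
}
```

Only adult verdicts are cached here, matching the requirement to push blocked URLs into the DB; caching "clean" verdicts too (with an expiry) would further cut real-time calls, but that is a design choice beyond what the post specifies.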

Later we want to do this for 10 more categories, like Gambling, P2P, Social Networking, etc.

But this project is for 99.99% adult detection, including images and videos on the page (like SafeSearch results),
both in real time and as an offline process.
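SafeSearch-level image detection realistically needs a trained image model. As a very cheap first-pass signal only, the sketch below computes the fraction of an image's pixels that pass a well-known fixed RGB skin-colour rule from the face-detection literature. Using this ratio as an input feature, and any threshold applied to it, are assumptions; skin ratio alone produces many false positives (beach photos, close-up faces, sports).

```java
import java.awt.image.BufferedImage;

/** Crude skin-tone heuristic: a cheap pre-filter, nowhere near SafeSearch quality. */
final class SkinRatio {
    /** Fixed RGB skin-colour rule (uniform daylight variant). */
    static boolean isSkin(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        int max = Math.max(r, Math.max(g, b));
        int min = Math.min(r, Math.min(g, b));
        return r > 95 && g > 40 && b > 20
                && max - min > 15
                && Math.abs(r - g) > 15
                && r > g && r > b;
    }

    /** Fraction of pixels that pass the skin rule, in [0, 1]. */
    static double ratio(BufferedImage img) {
        long skin = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                if (isSkin(img.getRGB(x, y))) skin++;
            }
        }
        return (double) skin / ((long) img.getWidth() * img.getHeight());
    }
}
```

For video, the same per-image check could be run on sampled frames, but that is likewise only a pre-filter before a proper model.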

We want someone who truly gets it and can build something great.

Give us your thoughts, method, time to create, and bid.


Related ideas/resources:

P.S. Suggestion: possibly use the Google API for indexing, if it helps.

Bayesian filtering - optional to implement


Wikipedia entry on Bayesian classifiers
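Since Bayesian filtering is listed as an option, here is a minimal sketch of one common variant, a multinomial naive Bayes text classifier with add-one (Laplace) smoothing, scored in log space to avoid underflow. The tokenizer and the label names are hypothetical choices. Labels are arbitrary strings, so the same class could in principle be trained for the later categories (Gambling, P2P, etc.).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Minimal multinomial naive Bayes text classifier with add-one smoothing. */
final class NaiveBayes {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>(); // label -> word -> count
    private final Map<String, Integer> totalWords = new HashMap<>();              // label -> token total
    private final Map<String, Integer> docCounts = new HashMap<>();               // label -> #documents
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Hypothetical tokenizer: lower-case, split on anything that is not a letter or digit. */
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
    }

    void train(String label, String text) {
        totalDocs++;
        docCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : tokenize(text)) {
            if (w.isEmpty()) continue;               // split can yield a leading empty token
            counts.merge(w, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(w);
        }
    }

    /** log P(label) + sum over tokens of log P(word | label), Laplace-smoothed. */
    double logScore(String label, String text) {
        double score = Math.log((double) docCounts.get(label) / totalDocs);
        Map<String, Integer> counts = wordCounts.get(label);
        double denom = totalWords.getOrDefault(label, 0) + vocabulary.size();
        for (String w : tokenize(text)) {
            if (w.isEmpty()) continue;
            score += Math.log((counts.getOrDefault(w, 0) + 1) / denom);
        }
        return score;
    }

    /** Returns the label with the highest posterior score. */
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            double s = logScore(label, text);
            if (s > bestScore) {
                bestScore = s;
                best = label;
            }
        }
        return best;
    }
}
```

Usage: call `train("adult", pageText)` and `train("safe", pageText)` on labeled pages, then `classify(newPageText)`. The quality of the result depends almost entirely on the size and balance of the labeled training set.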

We prefer it to be written in Java or Perl.

Skills required:
Algorithm, Java, PHP, Software Architecture, Web Security
