quality web filter algo - index and classify

Status: CLOSED
Bids: 24
Avg Bid (USD): $4325
Project Budget (USD): $3000 - $5000

Project Description:
Hi
We need an algorithm that scans, in real time, any web page URL that is not in our
"Bad" list DB.
Your algo:

Needs to scan page content on the fly based on a set of search criteria (the heart of your algo).

For example: you need to detect an adult web site.


So you need to build an algo that can tell that this is an adult site and classify the page as adult.


Idea: scan the page title, URL, textual content, and images (with some way to define and recognize non-adult images).
You can see how Google and Bing SafeSearch classify bad images. So basically you look at the "Strict" level,
and for any URL of a web page or web site you classify the site in real time as an adult site or not an adult site.
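
A minimal Java sketch of the text side of this idea, assuming a simple weighted keyword count over the URL, title, and body text. The class names (`PageSignals`, `KeywordScorer`), the term list, the weights, and the threshold are all placeholders invented for illustration, not part of the posting. Image and video analysis is not covered here and would need a separate component.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical container for the page fields named in the idea above.
class PageSignals {
    final String url;
    final String title;
    final String text;

    PageSignals(String url, String title, String text) {
        this.url = url;
        this.title = title;
        this.text = text;
    }
}

class KeywordScorer {
    // Placeholder terms and weights (assumption); a real list would be
    // curated or learned from labeled pages.
    private static final Map<String, Double> TERM_WEIGHTS = Map.of(
            "xxx", 3.0,
            "porn", 3.0,
            "adult", 1.0);

    // Assumption: a hit in the URL or title is stronger evidence than a hit in the body.
    private static final double URL_FACTOR = 2.0;
    private static final double TITLE_FACTOR = 2.0;
    private static final double TEXT_FACTOR = 1.0;

    // Assumed cut-off; would need tuning against real data.
    static final double BLOCK_THRESHOLD = 6.0;

    // Counts non-overlapping occurrences of needle in haystack.
    static int countOccurrences(String haystack, String needle) {
        int count = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) != -1) {
            count++;
            from += needle.length();
        }
        return count;
    }

    // Sums weight * occurrences for every term; a null field scores 0.
    static double fieldScore(String field) {
        String lower = field == null ? "" : field.toLowerCase(Locale.ROOT);
        double score = 0.0;
        for (Map.Entry<String, Double> e : TERM_WEIGHTS.entrySet()) {
            score += e.getValue() * countOccurrences(lower, e.getKey());
        }
        return score;
    }

    static double score(PageSignals page) {
        return URL_FACTOR * fieldScore(page.url)
                + TITLE_FACTOR * fieldScore(page.title)
                + TEXT_FACTOR * fieldScore(page.text);
    }

    static boolean isAdult(PageSignals page) {
        return score(page) >= BLOCK_THRESHOLD;
    }
}
```

Plain substring matching like this will produce false positives (for example, "adult education"), which is why the weights differ per term and a threshold is used instead of a single-hit rule.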


We want something close to [url removed, login to view]


Your algo should work in 2 modes: first, as an offline crawler that indexes all adult web sites out there and builds,
for our company, a DB of them under the category "adult". It runs daily and adds any more adult web sites it finds.
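
The offline mode could be sketched roughly as a breadth-first crawl, shown below in Java. This is only an outline under stated assumptions: `OfflineCrawler`, `outgoingLinks`, `isAdult`, and `maxPages` are hypothetical names, fetching and link extraction are hidden behind a function the caller supplies, and the returned set stands in for the rows a daily job would write to the "adult" category in the DB.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

class OfflineCrawler {
    // Hypothetical: downloads a page and returns the URLs it links to.
    private final Function<String, List<String>> outgoingLinks;
    // Hypothetical: the page classifier (true means "adult").
    private final Predicate<String> isAdult;
    // Assumed safety cap on pages visited per run.
    private final int maxPages;

    OfflineCrawler(Function<String, List<String>> outgoingLinks,
                   Predicate<String> isAdult,
                   int maxPages) {
        this.outgoingLinks = outgoingLinks;
        this.isAdult = isAdult;
        this.maxPages = maxPages;
    }

    // Crawls breadth-first from the seed URLs and returns, in discovery order,
    // the URLs the classifier flagged as adult.
    Set<String> crawl(List<String> seeds) {
        Set<String> visited = new HashSet<>();
        Set<String> adultUrls = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>(seeds);

        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            if (!visited.add(url)) {
                continue; // already processed in this run
            }
            if (isAdult.test(url)) {
                adultUrls.add(url);
            }
            for (String next : outgoingLinks.apply(url)) {
                if (!visited.contains(next)) {
                    frontier.add(next);
                }
            }
        }
        return adultUrls;
    }
}
```

A production crawler would also need politeness delays, robots.txt handling, and persistence between daily runs, none of which are shown.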


It also works in real time when a user searches for a URL: before we fetch the URL (if it is not in the blocked list), we send your algo
the URL and you perform some quick analysis to decide whether it is an adult site or not.
If adult, you send a BLOCKED message to our calling script and also push the URL to the DB your crawler builds offline on a daily basis.
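
The real-time flow described above might look something like the following Java sketch. It is an illustration only: `RealTimeFilter` and the `ALLOWED` reply are invented names (the posting only names the BLOCKED message), the in-memory set stands in for the shared blocked-list DB, and the blocked-list lookup is folded into the same method even though the posting has the calling script do that check first.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

class RealTimeFilter {
    static final String BLOCKED = "BLOCKED";
    // Assumption: the posting does not say what is returned for non-adult URLs.
    static final String ALLOWED = "ALLOWED";

    // In-memory stand-in for the blocked-list DB that the offline crawler also fills.
    private final Set<String> blockedUrls = ConcurrentHashMap.newKeySet();
    // Hypothetical quick classifier (true means "adult").
    private final Predicate<String> isAdult;

    RealTimeFilter(Predicate<String> isAdult) {
        this.isAdult = isAdult;
    }

    String check(String url) {
        // Fast path: URL already known as blocked, so no classifier call is needed.
        if (blockedUrls.contains(url)) {
            return BLOCKED;
        }
        // Slow path: run the quick analysis, and remember positives for next time.
        if (isAdult.test(url)) {
            blockedUrls.add(url);
            return BLOCKED;
        }
        return ALLOWED;
    }

    boolean isCached(String url) {
        return blockedUrls.contains(url);
    }
}
```

With this shape, the second request for the same adult URL is answered from the stored list without running the classifier again.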

That way we want to quickly build a massive adult database for fast cached access, without calling your algo in real time,
so we can return results faster.

Later we want to do this for 10 more categories like Gambling, P2P, Social Networking, etc.


But this project is for [url removed, login to view]% adult detection, including images and videos (like SafeSearch results),
on the page, in real time and as an offline process.


We want someone who truly gets it and can build something great.


Give us some thoughts, your method, time to create, and a bid.




Thanks!

related ideas/resources:


P.S. Possibly use the Google API for indexing if it helps (a suggestion).

Bayesian filtering - optional to implement

[url removed, login to view]

Wikipedia entry on Bayesian classifiers
[url removed, login to view]
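
For the optional Bayesian filtering idea, a small multinomial naive Bayes text classifier with add-one (Laplace) smoothing is one common starting point. The Java sketch below is an assumption about how it could be done, not a required design: the class name, the tokenizer, and the labels used when training are all illustrative.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class NaiveBayesTextClassifier {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>(); // label -> word -> count
    private final Map<String, Integer> totalWords = new HashMap<>();              // label -> token total
    private final Map<String, Integer> docCounts = new HashMap<>();               // label -> document count
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Very simple tokenizer (assumption): lowercase, split on non-alphanumerics.
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
    }

    // Adds one labeled example, e.g. train("adult", pageText).
    void train(String label, String text) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            if (token.isEmpty()) continue;
            counts.merge(token, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // log P(label) + sum over tokens of log P(token | label), with add-one smoothing.
    double logScore(String label, String text) {
        Map<String, Integer> counts = wordCounts.getOrDefault(label, Map.of());
        int total = totalWords.getOrDefault(label, 0);
        double score = Math.log((double) docCounts.getOrDefault(label, 0) / totalDocs);
        for (String token : tokenize(text)) {
            if (token.isEmpty()) continue;
            int count = counts.getOrDefault(token, 0);
            score += Math.log((count + 1.0) / (total + vocabulary.size()));
        }
        return score;
    }

    // Returns the label with the highest log score, or null if nothing was trained.
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            double s = logScore(label, text);
            if (s > bestScore) {
                bestScore = s;
                best = label;
            }
        }
        return best;
    }
}
```

Working in log space avoids underflow on long pages, and the smoothing keeps a single unseen word from zeroing out a category. The same class could later hold the other categories (Gambling, P2P, and so on) as extra labels.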

We prefer that it be written in Java or Perl.

Skills required:
Algorithm, Java, PHP, Software Architecture, Web Security
About the employer:
Verified


$5150 in 60 days
$5000 in 60 days
$4000 in 45 days
$4120 in 50 days
$4515 in 60 days
$4413 in 50 days
$5000 in 30 days
$4725 in 90 days
$3300 in 40 days
$4429 in 42 days