quality web filter algo - index and classify

This project received 24 bids from talented freelancers with an average bid price of $4325 USD.

Project Budget
$3000 - $5000 USD
Project Description


We need an algorithm that scans, in real time, any web page URL that is not already in our "Bad" list DB.

Your algorithm needs to scan page content on the fly based on a set of search criteria (the heart of your algorithm).

For example, you need to detect an adult web site, so you need to build an algorithm that can tell that a site is an adult site and classify the page as adult.

Idea: scan the page title, URL, textual content, and images (with some way to define and recognize non-adult images).
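
One rough way to picture the title/URL/text part of this idea is a weighted keyword score. The sketch below is only an illustration: the class name, the term list, the weights, and the threshold are all assumptions made up for the example, and it does not cover images at all.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: weighted keyword scoring over URL, title, and body text. */
final class KeywordPageScorer {
    // Placeholder terms for illustration only; a real list would need careful curation.
    private static final Set<String> TERMS = Set.of("xxx", "porn", "adult");

    // Assumed weights: a hit in the URL or title counts more than a hit in the body.
    private static final double URL_WEIGHT = 3.0;
    private static final double TITLE_WEIGHT = 2.0;
    private static final double BODY_WEIGHT = 1.0;

    // Assumed decision threshold; it would have to be tuned on labeled pages.
    private static final double THRESHOLD = 4.0;

    /** Counts the tokens in the text that exactly match a listed term. */
    static int countHits(String text) {
        if (text == null || text.isEmpty()) {
            return 0;
        }
        int hits = 0;
        // Split on anything that is not a lowercase letter or digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (TERMS.contains(token)) {
                hits++;
            }
        }
        return hits;
    }

    /** Weighted sum of term hits across the three page fields. */
    static double score(String url, String title, String body) {
        return URL_WEIGHT * countHits(url)
                + TITLE_WEIGHT * countHits(title)
                + BODY_WEIGHT * countHits(body);
    }

    /** Classifies the page as adult when the score reaches the threshold. */
    static boolean isAdult(String url, String title, String body) {
        return score(url, title, body) >= THRESHOLD;
    }
}
```

Plain keyword matching like this is fast but will misfire on innocent uses of a term (for example "adult education"), so it is probably best seen as one signal among several rather than the whole classifier.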

You can see how SafeSearch in Google and Bing classifies bad images. So basically you look at the "Strict" level, and for any URL of a web page or web site you can classify the site in real time as an adult site or not an adult site.

We want something close to [url removed, login to view]

Your algorithm can work in two modes:

As an offline crawler that indexes all adult web sites out there and builds a DB of them for our company under the category "adult". It runs daily and adds the new adult web sites it finds.

In real time, when a user searches for a URL. Before we fetch the URL (if it is not in the blocked list), we send the URL to your algorithm and you perform a quick analysis to decide whether it is an adult site or not. If it is adult, you send a BLOCKED message to our calling script and also push the URL to the DB that your crawler builds offline on a daily basis.

This way we want to quickly build a massive adult database for fast cached access, without calling your algorithm in real time, so we can return results faster.
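
The real-time flow described above could be sketched roughly as follows. Everything here is an assumption for illustration: `UrlGate`, `Verdict`, and the `Predicate<String>` classifier hook are hypothetical names, and an in-memory set stands in for the shared "adult" DB. The sketch also folds the blocked-list lookup and the classifier call into one method, whereas in the posting the calling script does the lookup first.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical sketch of the real-time check backed by a shared blocked-URL store. */
final class UrlGate {
    enum Verdict { BLOCKED, ALLOWED }

    // In-memory stand-in for the "adult" category DB that the offline crawler also fills.
    private final Set<String> blockedUrls = ConcurrentHashMap.newKeySet();

    // Assumed classifier hook: returns true when the page behind the URL looks adult.
    private final Predicate<String> classifier;

    UrlGate(Predicate<String> classifier) {
        this.classifier = classifier;
    }

    /** Offline mode: the daily crawler records a URL it has classified as adult. */
    void addFromCrawler(String url) {
        blockedUrls.add(url.trim());
    }

    /** Real-time mode: consult the store first, and classify only on a miss. */
    Verdict check(String url) {
        String key = url.trim();
        if (blockedUrls.contains(key)) {
            return Verdict.BLOCKED; // already known, classifier is not called
        }
        if (classifier.test(key)) {
            blockedUrls.add(key); // push the new find into the shared store
            return Verdict.BLOCKED;
        }
        return Verdict.ALLOWED;
    }

    /** True when the URL is already in the blocked store. */
    boolean isCached(String url) {
        return blockedUrls.contains(url.trim());
    }
}
```

A caller would build the gate once, for example `new UrlGate(u -> myClassifier.looksAdult(u))` with some classifier of its own, and call `check(url)` per request. The second request for the same adult URL is then answered from the store without running the classifier again.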

Later we want to do this for 10 more categories, like gambling, P2P, social networking, etc.

But this project is for [url removed, login to view]% adult detection, including images and videos (like SafeSearch results), on the page in real time and as an offline process.

We want someone who truly gets it and can build something great.

Give us some thoughts, your method, the time needed to create it, and your bid.


Related ideas/resources:

P.S. Possibly use the Google API for indexing if it helps (just a suggestion).

Bayesian filtering - optional to implement

[url removed, login to view]

Wikipedia entry on Bayesian classifiers

[url removed, login to view]
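
For the optional Bayesian filtering mentioned above, a minimal two-class naive Bayes text classifier might look something like this. It is a sketch under assumptions, not a proposed design: the class and method names are hypothetical, the model is a plain bag of words with add-one smoothing (plus one extra slot for unseen words), and it only looks at text, not images or video.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: two-class naive Bayes over page text (adult vs. clean). */
final class NaiveBayesTextClassifier {
    private final Map<String, Integer> adultCounts = new HashMap<>();
    private final Map<String, Integer> cleanCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int adultTokens, cleanTokens, adultDocs, cleanDocs;

    /** Adds one labeled document to the word counts. */
    void train(String text, boolean adult) {
        Map<String, Integer> counts = adult ? adultCounts : cleanCounts;
        for (String token : tokenize(text)) {
            counts.merge(token, 1, Integer::sum);
            vocabulary.add(token);
            if (adult) adultTokens++; else cleanTokens++;
        }
        if (adult) adultDocs++; else cleanDocs++;
    }

    /** Picks the class with the higher log posterior (up to a shared constant). */
    boolean isAdult(String text) {
        return logScore(text, true) > logScore(text, false);
    }

    private double logScore(String text, boolean adult) {
        Map<String, Integer> counts = adult ? adultCounts : cleanCounts;
        int classTokens = adult ? adultTokens : cleanTokens;
        int classDocs = adult ? adultDocs : cleanDocs;
        // Smoothed class prior, so an untrained class never produces log(0).
        double score = Math.log((classDocs + 1.0) / (adultDocs + cleanDocs + 2.0));
        // Add-one smoothing; the extra +1 reserves probability mass for unseen words.
        double denominator = classTokens + vocabulary.size() + 1.0;
        for (String token : tokenize(text)) {
            score += Math.log((counts.getOrDefault(token, 0) + 1.0) / denominator);
        }
        return score;
    }

    /** Lowercases and splits on non-alphanumeric characters, dropping empty tokens. */
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

The classifier is only as good as its labeled training pages, so the daily crawler's confirmed finds could in principle feed `train`, though that is a suggestion rather than something the posting specifies.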

We prefer that it be written in Java or Perl.
