I am doing a measurement study.
I need a web crawler designed to collect data from YouTube.
I will then do some simple analysis of the relationships in the data from a social network perspective
(e.g. people in the US like watching NBA games while those in the UK like football; girls like cosmetics and boys like sports, or something along those lines).
So I need a web crawler that can collect information such as:
-No. of Views
-No. of Ratings
-No. of Comments
I think some of the above metadata can be extracted from the YouTube API, while the rest may need to be scraped from the video's webpage.
The crawler also needs to retrieve information on YouTube users from the YouTube API, such as their personal information (nationality, gender, age, etc., for relationship analysis), the number of videos each user has uploaded, and each user's friends.
The web crawler should run as a breadth-first search.
The crawl should run regularly, say every 5 days.
The crawler should update the statistics of previously found videos so that I can study the growth trend of video popularity (retrieving only the number of views, ratings and comments, as well as the rating, for relatively new videos).
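To illustrate the kind of growth tracking I have in mind, here is a minimal sketch in Python. It assumes each crawl stores one timestamped snapshot per video; the names `Snapshot`, `record_snapshot` and `view_growth_per_day` are made up for this example and are not part of any YouTube API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Snapshot:
    """One observation of a video's counters (hypothetical record layout)."""
    taken_at: datetime
    views: int
    ratings: int
    comments: int


def record_snapshot(history, video_id, snapshot):
    """Append a snapshot to the per-video history, creating the list if needed."""
    history.setdefault(video_id, []).append(snapshot)


def view_growth_per_day(snapshots):
    """Average views gained per day between the first and last snapshot.

    Returns None when fewer than two snapshots exist or no time has passed.
    """
    if len(snapshots) < 2:
        return None
    first, last = snapshots[0], snapshots[-1]
    days = (last.taken_at - first.taken_at).total_seconds() / 86400
    if days <= 0:
        return None
    return (last.views - first.views) / days
```

Each scheduled crawl would call `record_snapshot` once per previously found video, and the analysis would read the growth rate from the stored history afterwards.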
A [url removed, login to view] should be generated after each crawl, indicating the start and finish time, the depth of the crawl, and the corresponding [url removed, login to view] videos and time used for each depth of crawling.
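As a rough sketch of the breadth-first crawl and the per-depth report, something like the following Python would do. It is only an outline under assumptions: `get_neighbors` is a hypothetical callback that returns the IDs linked from a video (I have not decided which links to follow), and `crawl_bfs` and `format_log` are illustrative names. No real YouTube API calls are shown.

```python
import time
from datetime import datetime


def crawl_bfs(seeds, get_neighbors, max_depth):
    """Visit videos level by level, starting from the seeds at depth 0.

    get_neighbors(video_id) is a hypothetical callback returning linked IDs.
    Videos at max_depth are visited but not expanded further.
    """
    started = datetime.now()
    frontier = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    seen = set(frontier)
    visited = []
    levels = []
    depth = 0
    while frontier and depth <= max_depth:
        t0 = time.perf_counter()
        next_frontier = []
        for video_id in frontier:
            visited.append(video_id)
            if depth == max_depth:
                continue
            for neighbor in get_neighbors(video_id):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        levels.append({
            "depth": depth,
            "videos": len(frontier),
            "seconds": time.perf_counter() - t0,
        })
        frontier = next_frontier
        depth += 1
    return {"start": started, "finish": datetime.now(),
            "levels": levels, "visited": visited}


def format_log(report):
    """Render the crawl report as plain text, one line per depth."""
    lines = [
        f"start: {report['start'].isoformat()}",
        f"finish: {report['finish'].isoformat()}",
        f"depth reached: {len(report['levels']) - 1}",
    ]
    for level in report["levels"]:
        lines.append(
            f"depth {level['depth']}: {level['videos']} videos, "
            f"{level['seconds']:.3f} s"
        )
    return "\n".join(lines)
```

The text returned by `format_log` could be written to a file at the end of each crawl. Keeping the frontier as a whole level, rather than a single queue, is what makes the per-depth counts and timings easy to record.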
As this is a personal study and not for commercial use, I do not expect to pay much for the crawler.