All of the data that needs to be scraped can be reached from http://www.pba.com/Seasons/.
I want the data from the 2002-2003 season through the 2011-2012 season. This excludes the World Bowling Tour and the seniors, but includes all of the seasons for both men and women.
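As a small illustration of the season range, the ten seasons could be listed like this. This is only a sketch: the function name is made up for the example, and the "YYYY-YYYY" label format is an assumption about how the site names its seasons.

```python
def season_labels(first_start_year=2002, last_start_year=2011):
    """Return labels such as '2002-2003' for every season in the range.

    The label format is an assumption; adjust it to whatever the site uses.
    """
    return [f"{year}-{year + 1}" for year in range(first_start_year, last_start_year + 1)]


seasons = season_labels()
for label in seasons:
    print(label)
```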
After clicking on any of the seasons, you will see a list of tournaments from that year. Click on any of the tournaments. On the left you will see the heading "Results." Underneath that heading there is a link named "Tournament Scores." Note that some tournaments have no link named "Tournament Scores"; those tournaments have no data that is useful to me. (For example, in the 2002-2003 season only 18 of the 21 tournaments have the link.)
Click on the "Tournament Scores" link. A bar near the top of the screen then shows round, bracket, Current Round, Rnd Avg Leaders, etc. Click on round or hold your cursor over it. Underneath round there are usually qualifying rounds, round robin rounds, rounds of #, stepladder rounds, quarterfinals, semifinals, and a final. The rounds of #, stepladder rounds, quarterfinals, semifinals, and final are head-to-head play and will be useful, whereas the qualifying rounds and round robin rounds will not be.
Click on one of the rounds of #, stepladder rounds, quarterfinals, semifinals, or the final. A screen will appear with a picture of two bowlers along with their scores over a certain number of games. This is not the information I want. At this point click on "Frame By Frame." This page contains the information I want. Below I explain the specific data that I want scraped from each of these pages.
Note that when you are on a round of #, quarterfinals, or semifinals round and you click on "Frame By Frame," you only get the first heads-up set in that round. To get the other heads-up sets you have to click on the matched players at the top of the screen. For a round of 32 there will be 16 pairs of matched players, so there would be 16 screens that need to be scraped. The end of the URL will say RoundGroup=1 for the first heads-up set and RoundGroup=16 for the last heads-up set.
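The round filtering described above could be sketched roughly as follows. The exact wording of the round names in the site's menu is an assumption, and both function names are hypothetical; the patterns would need adjusting once the real menu text is known.

```python
import re

# Assumed menu wording; the real labels on the site may differ.
EXCLUDED = re.compile(r"qualifying|round robin", re.IGNORECASE)
HEAD_TO_HEAD = re.compile(
    r"round of \d+|stepladder|quarterfinal|semifinal|final", re.IGNORECASE
)


def is_head_to_head(round_name):
    """True for rounds worth scraping, False for qualifying and round robin."""
    if EXCLUDED.search(round_name):
        return False
    return bool(HEAD_TO_HEAD.search(round_name))


def sets_in_round_of(n_players):
    """A round of N has N/2 heads-up sets, i.e. RoundGroup runs 1..N/2."""
    return n_players // 2


for name in ["Qualifying Round 1", "Round Robin Round 2", "Round of 32", "Semifinals"]:
    print(name, is_head_to_head(name))
print(sets_in_round_of(32))
```

Checking the excluded names first keeps a label like "Round Robin Round 2" from being picked up by a looser pattern.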
I haven't scraped very much data before, but I think the below URL will be the URL that gets looped through to reach each page that you need. The RoundID=# is different for each round, and the RoundGroup=# is different for each of the heads-up sets played within each round.
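A loop over RoundGroup for one round might look something like the sketch below. Only the RoundID and RoundGroup parameter names come from the description; the base address shown is a deliberately fake placeholder, the round ID 123 is made up, and the function name is hypothetical. The real Frame By Frame address would have to be copied from the browser.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def frame_by_frame_urls(base_url, round_id, n_sets):
    """Build one URL per heads-up set by setting RoundID and RoundGroup.

    Any other query parameters already present in base_url are kept.
    """
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))
    urls = []
    for group in range(1, n_sets + 1):
        query.update({"RoundID": str(round_id), "RoundGroup": str(group)})
        urls.append(urlunsplit(parts._replace(query=urlencode(query))))
    return urls


# Placeholder only: not the real address, and 123 is not a real RoundID.
PLACEHOLDER_URL = "http://example.invalid/frame-by-frame"
urls = frame_by_frame_urls(PLACEHOLDER_URL, round_id=123, n_sets=16)
print(urls[0])
print(urls[-1])
```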
Lastly, here is the information that I need from each page.
For each row of data I need: season; whether it is a men's or women's tournament; tournament name; round number (RoundID); game number (this runs from 1 to 7); first bowler's name (i.e., the bowler on top in the pictures); second bowler's name (i.e., the bowler on the bottom in the pictures); the first bowler's score for each frame (this will be 10 different variables); the second bowler's score for each frame; what the first bowler bowled on each roll of the bowling ball; and what the second bowler bowled on each roll of the bowling ball. The rolls will be 21 different variables per bowler: one for the first part of frame 1, one for the second part of frame 1, one for the first part of frame 2, and so on, until the last three, which are for the three parts of frame 10. An example is 9/ .X .X .X 8/ 9/ 62 .X 9/ XX9, where a number is how many pins were knocked down, / is a spare, and .X means only one ball was rolled in that frame and it was a strike.
If there are any questions, let me know.
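To show how one line of rolls could be split into the 21 variables, here is a minimal sketch. It assumes the frames are separated by whitespace as in the example, treats "." as an empty slot, and pads a short tenth frame with empty slots; the function name is hypothetical and the real page markup may need different handling.

```python
def split_rolls(line):
    """Split a line such as '9/ .X ... XX9' into 21 roll slots.

    Frames 1-9 give two slots each and frame 10 gives three.
    A '.' (or a missing character) becomes None, meaning no ball was rolled.
    """
    frames = line.split()
    if len(frames) != 10:
        raise ValueError(f"expected 10 frames, got {len(frames)}")
    rolls = []
    for number, frame in enumerate(frames, start=1):
        width = 3 if number == 10 else 2
        if len(frame) > width:
            raise ValueError(f"frame {number} has too many rolls: {frame!r}")
        padded = frame.ljust(width, ".")
        rolls.extend(None if ch == "." else ch for ch in padded)
    return rolls


example = "9/ .X .X .X 8/ 9/ 62 .X 9/ XX9"
rolls = split_rolls(example)
print(len(rolls))
print(rolls)
```

Keeping each slot as the raw character rather than converting it to a pin count preserves exactly what the page shows, which leaves any later scoring logic up to whoever analyzes the data.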