The main file is called [url removed, login to view]
I am trying to get it to create a new table named titles with two fields: urlid and title. I then want to output the frequencies for each URL (this is the part that you worked on before). However, instead of the URL being in the first column of the output, I want the title to be the first column. Also, I want the output to be a tab-delimited file (before, you created a CSV file).
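To make the request concrete, here is a minimal sketch of the titles table and the title-first, tab-delimited output. It is only an illustration under assumptions: the helper names (create_titles_table, add_title, write_title_first) and the shape of the frequency data (a dict mapping urlid to a list of counts) are hypothetical and are not taken from searchengineWithTitleTable.

```python
import csv
import io
import sqlite3


def create_titles_table(con):
    # Table name and field names follow the request: titles(urlid, title).
    con.execute(
        "CREATE TABLE IF NOT EXISTS titles (urlid INTEGER PRIMARY KEY, title TEXT)"
    )


def add_title(con, urlid, title):
    # Hypothetical helper: store (or overwrite) the title for one crawled URL.
    con.execute(
        "INSERT OR REPLACE INTO titles (urlid, title) VALUES (?, ?)",
        (urlid, title),
    )


def write_title_first(out, con, frequencies):
    # frequencies is an assumed structure: {urlid: [count, count, ...]}.
    # Each output row starts with the title instead of the URL.
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for urlid, counts in sorted(frequencies.items()):
        row = con.execute(
            "SELECT title FROM titles WHERE urlid = ?", (urlid,)
        ).fetchone()
        title = row[0] if row else ""
        writer.writerow([title] + list(counts))


con = sqlite3.connect(":memory:")  # the real program uses a file such as test.db
create_titles_table(con)
add_title(con, 1, "Back pain blog")
buf = io.StringIO()
write_title_first(buf, con, {1: [3, 0, 2]})
print(buf.getvalue(), end="")
```

In the real crawler, add_title would presumably be called at the point where a page is indexed and its urlid is known.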
I added lines 528 and 424 to try to get the table written but it is not working properly.
I am running the program by running the lines:
from searchengineWithTitleTable import *
pagelist=['[url removed, login to view];num=5&hl=en&ctz=-540&c2coff=1&as_epq=&as_eq=&as_drrb=q&as_qdr=a&as_mind=1&as_minm=1&as_miny=2000&as_maxd=17&as_maxm=3&as_maxy=2008&lr=lang_en&safe=active&ie=UTF-8&q=backache+OR+backpain&ui=blg&sa=N&start=800' ]
webcrawler = crawler("test.db")
[url removed, login to view]()
[url removed, login to view](pagelist)
Can you get this done for $25? I am sending the file called searchengineWithTitleTable with this mail.
PS. I think there is a dependency on something called nn. You can remove references to that because that part has nothing to do with what I am doing. It is a neural network I think and is used for analysing the results.
Two small additions?
One is to make a "switch" in the output file as to whether to include
the field that contains the title and whether to include the field
with the URL in the output file and that both of these fields have
their field names at the top of their respective columns. So the
column with the urls will be called webaddress and the column with the
title will be called Titles and there is a switch in the output file
that will allow me to write one, both, or none of these to the output
file. Secondly, that there is a switch in the output file that allows me to
choose between outputting a csv file or a tab delimited file.
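
A minimal sketch of the two switches described above, under assumptions: the function name write_frequencies, its parameter names, and the shape of records (title, url, counts) are hypothetical. Only the column names Titles and webaddress come from the request.

```python
import csv
import io


def write_frequencies(out, records, words,
                      include_title=True, include_url=True,
                      tab_delimited=True):
    # Switch 2: choose between tab-delimited and CSV output.
    delimiter = "\t" if tab_delimited else ","
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")

    # Switch 1: optionally include the Titles and webaddress columns,
    # each with its field name at the top of the column.
    header = []
    if include_title:
        header.append("Titles")
    if include_url:
        header.append("webaddress")
    writer.writerow(header + list(words))

    for title, url, counts in records:
        row = []
        if include_title:
            row.append(title)
        if include_url:
            row.append(url)
        writer.writerow(row + list(counts))


# Example data (made up for illustration).
records = [("Back pain blog", "http://example.com/a", [3, 1])]
words = ["backache", "backpain"]

both_tab = io.StringIO()
write_frequencies(both_tab, records, words)

title_only_csv = io.StringIO()
write_frequencies(title_only_csv, records, words,
                  include_url=False, tab_delimited=False)
print(both_tab.getvalue(), end="")
print(title_only_csv.getvalue(), end="")
```

Setting both include_title and include_url to False would write only the frequency columns, which covers the "none of these" case.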