I need a simple script that parses a file and counts occurrences of keywords.
The file is very large (millions of lines), so the script should use readline on a file object, or some other approach, to avoid loading the whole file into memory at once.
The file consists of sections, each with a single title line followed by one or more lines.
Section title lines (and only they) start with "@@@$".
sample file contents:
@@@$section_title_one
apple strudel
chocolate
@@@$section title 2
ice cream pie
@@@$another title
some
more
lines
The script should contain a hardcoded set of keywords; for development, use kws = ["apple", "banana", "candy"]
The script should count how many lines in each section contain one or more keywords, and print the titles of all sections that have at least 5 occurrences, ordered descending by number of occurrences, in the following format:
[number of occurrences] - [section title]
e.g.
25 - section_title_ten
23 - title of some other section
....
The budget for this is $10, not more.
With your bid, at the top of the private message, please indicate how many hours after I accept your bid you can deliver this.
Thanks
HERE IS YOUR SOLUTION:
Using Python 2.7, I would solve it like this:
I can adjust my solution to another Python version if you like.
#!/usr/bin/python2.7
from collections import OrderedDict

# ADJUST FILE NAME HERE
source_data_filename = "[login to view URL]"
# ADJUST KEYWORDS HERE
kws = ["apple", "banana", "candy"]

res = {}
with open(source_data_filename) as f:
    # iterating over the file object reads one line at a time
    for line in f:
        # a title line starts a new section
        if line.startswith("@@@$"):
            current = line[4:]
            current = current.rstrip('\r\n')
            continue
        for kw in kws:
            if kw in line:
                res.setdefault(current, 0)
                res[current] += 1
                break  # count each line once, even if it has several keywords

# sort the sections by count, highest first
result = OrderedDict(sorted(res.items(), key=lambda k: k[1], reverse=True))
for item in result.items():
    if item[1] > 4:
        print "%s - %s" % (item[1], item[0])
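For anyone on Python 3, the same approach could look roughly like the sketch below. This is not the delivered code: the names count_keyword_lines and format_report and the min_count parameter are made up for illustration. It assumes a line counts once no matter how many keywords it contains, and that lines before the first title are ignored.

```python
from collections import Counter

SECTION_PREFIX = "@@@$"


def count_keyword_lines(lines, keywords, prefix=SECTION_PREFIX):
    """Count, per section, the lines containing at least one keyword.

    `lines` can be any iterable of strings, e.g. an open file object,
    so the input is consumed one line at a time.
    """
    counts = Counter()
    current = None  # no section title seen yet
    for line in lines:
        if line.startswith(prefix):
            current = line[len(prefix):].rstrip("\r\n")
        elif current is not None and any(kw in line for kw in keywords):
            counts[current] += 1  # each matching line counts once
    return counts


def format_report(counts, min_count=5):
    """Return 'N - title' strings for sections with at least min_count hits."""
    return ["%d - %s" % (n, title)
            for title, n in counts.most_common()  # highest count first
            if n >= min_count]


if __name__ == "__main__":
    sample = [
        "@@@$section_title_one\n",
        "apple strudel\n",
        "chocolate\n",
        "@@@$section title 2\n",
        "ice cream pie\n",
    ]
    counts = count_keyword_lines(sample, ["apple", "banana", "candy"])
    # Threshold lowered to 1 so the tiny sample prints something.
    for row in format_report(counts, min_count=1):
        print(row)
```

For a real file, passing the open file object (with open(path) as f: count_keyword_lines(f, kws)) keeps memory use flat, since only the per-section counts are stored.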
$10 USD in 0 day
25 freelancers are bidding on average $16 USD for this job
Hi,
I am well equipped for this kind of task and can handle it well. In fact, I have done similar work before. Let me know the best time for you so we can discuss your requirements further and move on to the next step.
Thanks,
Joseph C Ocero
1-Day
Hi there!
I'm a professional software developer and data analyst at a well-known software house in Islamabad, PacSquare Pvt Ltd. If I get this job, I assure you that you'll get my best service in return. Please consider my proposal and discuss any additional information for this project. Thanks for reading!
Best,
Maher
P.S. The budget is fixed; please discuss if you agree. Thanks
Hello,
No. of hrs: 10 hrs.
I have 10+ years of experience in software programming with various technologies such as C++, .NET, Python and Perl. With expertise in various phases of development, I believe I can provide the required script with ease.
Looking forward to hearing back from you.
Thanks.
Sandip
Hello,
I am very interested in doing this project for you. I am studying software engineering and I have a couple of years of experience in this field, so it won't take much time to do. This is very important to me because it is my first project and I would like to start with a positive review. As you can see, my bid is pretty low; that is because I really need this first job on Freelancer, and I would be very grateful if you could help me with that. I will do my best to satisfy your every need.
I can do it in 2-3 hours in total
Looking forward to working with you.
Update:
Hello, I had time to finish my solution for your project. It would be great if you could look it over and let me know what you think. I can send you part of my main code, and if you like it I will send you the whole code immediately.
Also, I have a suggestion: the code could ask the user to input keywords and then count their occurrences.
As I said, I am new on this site and I need an opportunity to prove myself.
Thank you very much for understanding.
Hello Sir/Ma'am,
Your script is ready. I used line-by-line file reading, so no memory is wasted. Line number and title are stored in a dictionary. Right now it prints to the console, but if you want the output in a text file, let me know.
Thank you.
Can give flexibility to choose the headers.
Can give flexibility to choose the number of occurrences.
Can give the ability to use more than one file as input.
Can give the ability to supply keywords as part of the script or in a separate file.
Can complete in 5 hours from the time the project is awarded (not counting any delay in seeing the decision).
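The options listed above could be exposed as command-line arguments. The sketch below uses Python's standard argparse module. The option names (--prefix, --min-count, --keywords, --keywords-file) and the helper names build_parser and load_keywords are assumptions for illustration, not something this bid specifies.

```python
import argparse


def build_parser():
    """Build a parser for the hypothetical options described in the bid."""
    parser = argparse.ArgumentParser(
        description="Count keyword lines per section.")
    parser.add_argument("files", nargs="+",
                        help="one or more input files")
    parser.add_argument("--prefix", default="@@@$",
                        help="marker that starts a section title line")
    parser.add_argument("--min-count", type=int, default=5,
                        help="minimum matching lines for a section to be printed")
    parser.add_argument("--keywords", nargs="+",
                        default=["apple", "banana", "candy"],
                        help="keywords given directly on the command line")
    parser.add_argument("--keywords-file",
                        help="file with one keyword per line (overrides --keywords)")
    return parser


def load_keywords(args):
    """Return the keyword list, preferring --keywords-file when it is given."""
    if args.keywords_file:
        with open(args.keywords_file) as f:
            return [line.strip() for line in f if line.strip()]
    return list(args.keywords)


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["a.txt", "b.txt", "--min-count", "3"])
print(args.files, args.prefix, args.min_count, load_keywords(args))
```

Because both the file list and --keywords accept several values, the file names should come before --keywords on the command line.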
Hi,
I have 4 years of experience in Python. I can complete this script in 2-3 hrs. I have the algorithm ready if you are interested.
for loop: read the file line by line
    search for the string '@@@$'
    for loop: read lines and append them to a list until the next '@@@$'
    count the number of occurrences of the keywords
    if: greater than 5
        print output
exit (this continues for all sections)
Please let me know if we can work it out.
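The section-by-section idea in this bid could be sketched in Python 3 as below. The function names iter_sections and sections_over_threshold are hypothetical. Two deliberate differences from the outline: the sketch uses the "at least 5" threshold from the job description rather than "greater than 5", and it collects the results and sorts them before returning, since printing each section as soon as it is counted would not give the required descending order.

```python
def iter_sections(lines, prefix="@@@$"):
    """Yield (title, body_lines) for each section, one section at a time."""
    title, body = None, []
    for line in lines:
        if line.startswith(prefix):
            if title is not None:
                yield title, body
            title, body = line[len(prefix):].rstrip("\r\n"), []
        elif title is not None:
            body.append(line)
    if title is not None:
        yield title, body  # the last section has no title line after it


def sections_over_threshold(lines, keywords, min_count=5):
    """Return (count, title) pairs with count >= min_count, highest first."""
    hits = []
    for title, body in iter_sections(lines):
        # a line counts once, however many keywords it contains
        n = sum(1 for line in body if any(kw in line for kw in keywords))
        if n >= min_count:
            hits.append((n, title))
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits
```

Only one section's lines are held in memory at a time, so this stays cheap as long as no single section is huge. A plain running counter per section avoids even that buffer.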