Closed

File parser to extract information and create report

This project was awarded to ShawarmaMan for $200 USD.

Project Budget
$50 - $400 USD
Total Bids
37
Project Description

Below is a description of the program I want implemented.

############### The algorithm ##########################
Initialization phase: Load the attributes from the “[url removed, login to view]” into memory.
Repeat every “parsing_interval” seconds:
1) Copy the next “oldest” file (say “[url removed, login to view]”) from the “inbox_dir” directory to the “work_dir” directory. The oldest file can be determined from the file name; the file naming convention is described below.
2) Unzip the [url removed, login to view] file (located in the “work_dir” directory). This produces a text file, say “[url removed, login to view]”.
3) Parse the “[url removed, login to view]” as described in the section “The Parsing Procedure” below, and add the parsing results to the report file.
4) Delete both “[url removed, login to view]” and “[url removed, login to view]” from the “work_dir” directory.
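
One cycle of the steps above could be sketched in Python roughly as follows. This is only an illustration under several assumptions: the names process_next, run, parse_file and processed are made up for this example, the extracted text file is given a ".txt" suffix, SEQNO is assumed to be zero-padded so that sorting by name gives the oldest file first, and an in-memory set remembers which files were already handled (the inbox files are never deleted, so something has to). That set is lost on restart.

```python
import gzip
import shutil
import time
from pathlib import Path


def process_next(inbox_dir, work_dir, parse_file, processed):
    """Run one cycle; return the name of the file handled, or None if none is pending."""
    # Assumption: SEQNO is zero-padded, so sorting by name puts the oldest file first.
    pending = sorted(
        (p for p in Path(inbox_dir).glob("*.gzip") if p.name not in processed),
        key=lambda p: p.name,
    )
    if not pending:
        return None
    source = pending[0]
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    archive = work / source.name
    text_file = archive.with_suffix(".txt")  # hypothetical name for the extracted file
    try:
        shutil.copy2(source, archive)
        # Decompress in chunks so a 400 MB archive is never held in memory at once.
        with gzip.open(archive, "rb") as src, open(text_file, "wb") as dst:
            shutil.copyfileobj(src, dst)
        parse_file(text_file)
    finally:
        # Clean the work directory even if parsing failed (missing_ok needs Python 3.8+).
        archive.unlink(missing_ok=True)
        text_file.unlink(missing_ok=True)
    # Only reached on success, so a file whose parsing failed is retried next cycle.
    processed.add(source.name)
    return source.name


def run(inbox_dir, work_dir, parse_file, parsing_interval):
    """Repeat the cycle forever, sleeping parsing_interval seconds between runs."""
    processed = set()
    while True:
        process_next(inbox_dir, work_dir, parse_file, processed)
        time.sleep(parsing_interval)
```

Here parse_file is any callable that takes the path of the extracted text file; pathlib, gzip and shutil behave the same on Windows and Unix/Linux, which is why they are used in this sketch.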

############## The Parsing Procedure #####################
While not EOF do
    Read the next line from the file “[url removed, login to view]”
    If the value of the “Primary attribute” is equal to ANY of the possible values listed in the “[url removed, login to view]” then do the following
        If the value of each secondary attribute is equal to ANY of the possible values listed in the “[url removed, login to view]” then
            Get ALL the values associated with the secondary attributes (loaded from the [url removed, login to view])
            Append the primary attribute name, primary attribute value, and all secondary attribute names and values to the “report_file” (if the report_file is not present, create a new one)
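
The matching rule above might look like this in Python. This is a sketch under assumptions rather than a fixed design: select_fields and parse_lines are hypothetical names, attribute names are compared case-insensitively while values are compared exactly, a line that lacks one of the secondary attributes is treated as a non-match, and the report line format (space-separated name:value pairs) is invented here because the description does not define one.

```python
def select_fields(attrs, primary, secondary):
    """Return the (name, value) pairs to report for one line, or None if it does not match.

    attrs maps lower-case attribute names to values; primary is (name, allowed_values);
    secondary maps each secondary name to its allowed values. Names must be lower case.
    """
    primary_name, primary_values = primary
    if attrs.get(primary_name) not in primary_values:
        return None
    for name, allowed in secondary.items():
        # A missing attribute yields None, which is never an allowed value.
        if attrs.get(name) not in allowed:
            return None
    return [(primary_name, attrs[primary_name])] + [(n, attrs[n]) for n in secondary]


def parse_lines(lines, report_path, primary, secondary):
    """Append one report line per matching input line; return the number of matches."""
    matched = 0
    # Mode "a" creates the report file if it is not present yet.
    with open(report_path, "a", encoding="utf-8") as report:
        for line in lines:  # an open file object works here and is read lazily
            attrs = {}
            for token in line.split()[1:]:  # the first token is the timestamp
                name, sep, value = token.partition(":")
                if sep:
                    attrs[name.lower()] = value
            fields = select_fields(attrs, primary, secondary)
            if fields is not None:
                report.write(" ".join(f"{n}:{v}" for n, v in fields) + "\n")
                matched += 1
    return matched
```

Passing an open file as lines keeps memory use flat regardless of file size, since only one line is held at a time.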

############## The Config File ##########################
# time interval for parsing the next file in seconds
parsing_interval: 60

# Report lifespan in days
Report_life: 30

# Directory paths
Inbox_dir: C:\Users\jsmith\Documents\inbox_dir
Work_dir: C:\Users\jsmith\Documents\work_dir
Report_dir: C:\Users\jsmith\Documents\report_dir

# Primary Attribute
Attribute2: value2_1, value2_2, value2_3

# Secondary Attribute Lists
Attribute1: Value1_1, Value1_2
Attribute3: Value3_1, Value3_2, Value3_3
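
A loader for a config file in this shape could be sketched as below. The function name load_config and the returned structure are assumptions made for illustration. The sketch also assumes that the "# Primary Attribute" and "# Secondary Attribute Lists" comment lines act as section markers, since nothing else in the file distinguishes the two kinds of attribute, and it lower-cases key names because the file mixes "parsing_interval" with "Report_life".

```python
SETTING_KEYS = {"parsing_interval", "report_life", "inbox_dir", "work_dir", "report_dir"}
NUMERIC_KEYS = {"parsing_interval", "report_life"}


def load_config(text):
    """Return (settings, primary, secondary) parsed from the config file text."""
    settings, primary, secondary = {}, None, {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Assumption: these two comment headings switch the attribute section.
            heading = line.lstrip("#").strip().lower()
            if heading.startswith("primary attribute"):
                section = "primary"
            elif heading.startswith("secondary attribute"):
                section = "secondary"
            continue
        # Split on the first colon only, so "C:\..." paths keep their drive letter.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip().lower(), value.strip()
        if key in SETTING_KEYS:
            settings[key] = int(value) if key in NUMERIC_KEYS else value
            continue
        values = {v.strip() for v in value.split(",") if v.strip()}
        if section == "primary":
            primary = (key, values)
        elif section == "secondary":
            secondary[key] = values
    return settings, primary, secondary
```

The result is a dictionary of settings, a (name, allowed values) pair for the primary attribute, and a dictionary of allowed values per secondary attribute.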

############### Important notes ##########################
1) The “gzip” files can be very large (as much as 400 MB), so the code must scale to files of that size.
2) The code should run on both Windows and Unix/Linux environments.
3) The files in the “inbox_dir” will not be deleted or moved.
4) The files in the “inbox_dir” have the following naming convention: YYYYMMDD-SEQNO_*.gzip.
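
The naming convention in note 4 is enough to order the files by age. A possible sketch follows; age_key and oldest_first are hypothetical names, and the code assumes SEQNO consists of digits only (so it is compared as a number, not as text) and that names not matching the pattern should be ignored.

```python
import re

# YYYYMMDD, a dash, a numeric SEQNO, an underscore, anything, then ".gzip".
NAME_PATTERN = re.compile(r"^(\d{8})-(\d+)_.*\.gzip$")


def age_key(file_name):
    """Return (date, seqno) for a conforming file name, or None otherwise."""
    match = NAME_PATTERN.match(file_name)
    if match is None:
        return None
    date, seqno = match.groups()
    # YYYYMMDD sorts correctly as text; SEQNO is compared as an integer.
    return date, int(seqno)


def oldest_first(file_names):
    """Return the conforming file names sorted from oldest to newest."""
    keyed = [(age_key(name), name) for name in file_names]
    return [name for key, name in sorted(k for k in keyed if k[0] is not None)]
```

Comparing SEQNO as an integer means that sequence number 9 sorts before 10 even when the numbers are not zero-padded.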

############ Sample contents of the “[url removed, login to view]” file ########################
timestamp attribute1:value1 attribute2:value2 attribute3:value3 attribute4:value4 attribute5:value5 attribute6:value6 ..... attributen:valuen
timestamp attribute1:value1 attribute2:value2 attribute4:value4 attribute5:value5 attribute6:value6 ..... attributen:valuen
timestamp attribute1:value1 attribute2:value2 attribute3:value3 attribute4:value4 attribute5:value5 attribute6:value6 ..... attributen:valuen
timestamp attribute1:value1 attribute2:value2 attribute4:value4 attribute5:value5 attribute6:value6 ..... attributen:valuen
timestamp attribute1:value1 attribute2:value2 attribute3:value3 attribute4:value4 attribute5:value5 attribute6:value6 ..... attributen:valuen
timestamp attribute1:value1 attribute2:value2 attribute3:value3 attribute5:value5 attribute6:value6 ..... attributen:valuen
timestamp attribute1:value1 attribute2:value2 attribute3:value3 attribute5:value5 attribute6:value6 ..... attributen:valuen
timestamp attribute1:value1 attribute2:value2 attribute3:value3 attribute4:value4 attribute5:value5 attribute6:value6 ..... attributen:valuen
##########################################################
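
Going by the sample above, each line is a timestamp followed by name:value pairs, and not every attribute appears on every line. A line could be tokenized along these lines; parse_line is a made-up name, and the sketch assumes the timestamp is a single token without spaces, values contain no whitespace, and attribute names are case-insensitive.

```python
def parse_line(line):
    """Split one line into (timestamp, {name: value}); return None for a blank line."""
    parts = line.split()
    if not parts:
        return None
    timestamp, pairs = parts[0], parts[1:]
    attrs = {}
    for pair in pairs:
        # Split on the first colon only, so a value may itself contain colons.
        name, sep, value = pair.partition(":")
        if sep:  # tokens without a colon, such as a "....." filler, are skipped
            attrs[name.lower()] = value
    return timestamp, attrs
```

Attributes that are absent from a line are simply missing from the returned dictionary, so the caller can decide how to treat them.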
1) Please let me know if you have any questions. I’ll send you a sample gzip file via private message so that you can test the application on real data.
2) If you respond to this request, please tell me the language you'll use, the approximate timeframe you'll need, and the approximate cost, and I'll send you more details.
