1. There are a series of json file each around 15GB contains (IP+port)
2. There are series of json files each around a GB contains (traceroutes)
3. There are 3 group of csv files, each group contains 30 csv file
1. Load all the Json (1) in a MySQL table and delete the duplicates. (tbl_ports)
2. Load all the Json (2) in a MySQL table as they are traceroute IP's, Ignore the first two hop and last hop and store the rest in a SQL table. (tbl_tracceroutes)
3. Load each group of CSV file in a seprate table in MySQL and and add new column to this table and write the name of the csv file. (Do this step for the two rest of csv files group)
group1(tbl_xxx) - I will tell you the exact name when you start working
group2(tbl_yyy) - I will tell you the exact name when you start working
group3(tbl_zzz) - I will tell you the exact name when you start working
1. in tbl_zzz, tbl_yyy, tbl_xxx there is a column contains subnetmask(IP+Cidr), take each of that (IP+Cidr) and look on tbl_ports on column IP, if any IP there matches to any (IP+Cidr) of tbl_zzz, you do this process for all of (IP+Cidr) for table tbl_zzz.
Create a new table (report_zzz), take some column from tbl_zzz (I will tell you which columns) and add a new column (port). for the ones that you find match write in report_zzz that row of that plus the port number you find for that (IP+Cidr).
** DO this process for tbl_xxx and tbl_zzz exactly the same way.
2. from the table report_zzz take the (IP+Cidr) data and look if any IP in tbl_traceroute match to that subnet, add new column to report_zzz named as (seen), if you find match write yes and if you don't find match write no.
** DO this process for report_xxx, report_yyy and report_zzz.
As it is relatively big data, it should be fast enough
26 freelancers are bidding on average $195 for this job
Hi, I have a lot of experience optimizing databases to makes searches quicker as well as filling databases faster. I have lots of tips which can make the whole process a lot faster