I need code written in any of the languages listed in this project's skill sets. The code must take the byte sequences that occur in one file and find every place they also occur in a selection of other files.
The code will:
- Analyze the bytes of file A (let's call this the source).
- Analyze the bytes of files B, C, D, E and F (let's call these the targets). There may be only one target file, or there could be any number of them, depending on the contents of a filesystem directory.
- Starting with 4096-byte chunks, check whether any string of bytes from file A also occurs in file B (the starting chunk size should be configurable).
- Halve the chunk size (multiply it by 0.5) and repeat.
- Repeat until the chunk size is 128 bytes (the end chunk size should be configurable).
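The halving steps above could be sketched roughly as follows. This is only a minimal sketch, assuming integer halving; the class name `ChunkSchedule` and the method `sizes` are hypothetical names chosen for illustration, and 4096 and 128 are the defaults taken from the description.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkSchedule {
    // Chunk sizes from startSize down to endSize, halving on each pass.
    // Both limits are parameters so they stay configurable.
    static List<Integer> sizes(int startSize, int endSize) {
        if (endSize < 1 || startSize < endSize) {
            throw new IllegalArgumentException("need startSize >= endSize >= 1");
        }
        List<Integer> sizes = new ArrayList<>();
        for (int size = startSize; size >= endSize; size /= 2) {
            sizes.add(size);
        }
        return sizes;
    }

    public static void main(String[] args) {
        // prints [4096, 2048, 1024, 512, 256, 128]
        System.out.println(sizes(4096, 128));
    }
}
```

If the start size is not the end size times a power of two, this sketch stops at the last size that is still at least the end size, so the end size itself may never be tried.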
For example, let's say (in a very, very simple ASCII representation of bytes) file A is
and file B is
then we would expect the code to return a match, as 'sdfjfds' exists in both files. When a match is returned, the code will give the start byte, end byte and byte length of the match, and indicate the files (source and target) in which the match was found. The code must find all occurrences, so it must not exit after finding the first match. It must find all matches and permutations possible from the maximum chunk size down to the minimum chunk size.
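One way the reporting requirement might look in Java is sketched below. It is an illustration under assumptions, not a finished design: the names `ChunkMatcher`, `Match` and `findAll` are hypothetical, offsets are assumed to be 0-based with an inclusive end byte, both files are assumed to fit in memory as byte arrays, and the sample bytes are invented for the demonstration. The inner loop has no `break`, so every occurrence is collected rather than only the first.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ChunkMatcher {
    /** One reported match. Offsets are 0-based; the end byte is inclusive (assumed convention). */
    static final class Match {
        final String sourceFile;
        final String targetFile;
        final int sourceStart;
        final int targetStart;
        final int length;

        Match(String sourceFile, String targetFile, int sourceStart, int targetStart, int length) {
            this.sourceFile = sourceFile;
            this.targetFile = targetFile;
            this.sourceStart = sourceStart;
            this.targetStart = targetStart;
            this.length = length;
        }

        int targetEnd() {
            return targetStart + length - 1;
        }

        @Override
        public String toString() {
            return sourceFile + " byte " + sourceStart + " matches " + targetFile
                    + " bytes " + targetStart + "-" + targetEnd() + " (" + length + " bytes)";
        }
    }

    // True when len bytes starting at a[aOff] equal len bytes starting at b[bOff].
    static boolean sameBytes(byte[] a, int aOff, byte[] b, int bOff, int len) {
        for (int i = 0; i < len; i++) {
            if (a[aOff + i] != b[bOff + i]) {
                return false;
            }
        }
        return true;
    }

    // Cuts the source into aligned chunks and looks for each one at every target offset.
    static List<Match> findAll(String sourceFile, byte[] source,
                               String targetFile, byte[] target, int chunkSize) {
        List<Match> matches = new ArrayList<>();
        for (int s = 0; s + chunkSize <= source.length; s += chunkSize) {
            for (int t = 0; t + chunkSize <= target.length; t++) {
                if (sameBytes(source, s, target, t, chunkSize)) {
                    matches.add(new Match(sourceFile, targetFile, s, t, chunkSize));
                    // deliberately no break: later occurrences are reported too
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Invented sample data, with a 4-byte chunk size to keep the output small.
        byte[] source = "ABCDEFGH".getBytes(StandardCharsets.US_ASCII);
        byte[] target = "xxEFGHyyABCDzzEFGH".getBytes(StandardCharsets.US_ASCII);
        for (Match m : findAll("A", source, "B", target, 4)) {
            System.out.println(m);
        }
    }
}
```

With the invented sample above, three matches are reported: the chunk "ABCD" once and the chunk "EFGH" twice, because it appears at two places in the target.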
I'd prefer this code in Perl or Java, but PHP or C are just as acceptable if you have good feedback.
Additional Project Description:
12/24/2012 at 10:54 CST
I should clarify: the 'chunks' need not start from byte 1. I'm just using chunks as an example; my main concern is finding out whether any section of binary data from the source file exists somewhere within the target.
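Since the chunks need not be aligned, one possible approach is to compare every window of a given size in the source against every window of the same size in the target. The sketch below does this with a rolling hash (the Rabin-Karp idea), which is a technique swapped in here for illustration and not something the description asks for. The names `WindowMatcher` and `commonWindows` are hypothetical, both files are assumed to fit in memory, and the sample strings are invented around the 'sdfjfds' match mentioned earlier; they are not the original example files.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class WindowMatcher {
    private static final long BASE = 257L;

    // Returns {sourceOffset, targetOffset} pairs (0-based) where `window` bytes are identical.
    static List<int[]> commonWindows(byte[] source, byte[] target, int window) {
        List<int[]> hits = new ArrayList<>();
        if (window < 1 || source.length < window || target.length < window) {
            return hits;
        }
        // BASE^(window-1), used to drop the outgoing byte; long arithmetic wraps consistently.
        long high = 1;
        for (int i = 1; i < window; i++) {
            high *= BASE;
        }
        // Index every window of the target by its hash.
        Map<Long, List<Integer>> index = new HashMap<>();
        long h = 0;
        for (int i = 0; i < target.length; i++) {
            if (i >= window) {
                h -= (target[i - window] & 0xFF) * high;
            }
            h = h * BASE + (target[i] & 0xFF);
            if (i >= window - 1) {
                index.computeIfAbsent(h, k -> new ArrayList<>()).add(i - window + 1);
            }
        }
        // Slide the same hash over the source and confirm candidates byte by byte.
        h = 0;
        for (int i = 0; i < source.length; i++) {
            if (i >= window) {
                h -= (source[i - window] & 0xFF) * high;
            }
            h = h * BASE + (source[i] & 0xFF);
            if (i < window - 1) {
                continue;
            }
            List<Integer> candidates = index.get(h);
            if (candidates == null) {
                continue;
            }
            int s = i - window + 1;
            for (int t : candidates) {
                if (sameBytes(source, s, target, t, window)) {
                    hits.add(new int[] {s, t});
                }
            }
        }
        return hits;
    }

    // Guards against hash collisions by comparing the actual bytes.
    private static boolean sameBytes(byte[] a, int aOff, byte[] b, int bOff, int len) {
        for (int i = 0; i < len; i++) {
            if (a[aOff + i] != b[bOff + i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Invented sample data containing the 7-byte run "sdfjfds" at different offsets.
        byte[] source = "qqsdfjfdszz".getBytes(StandardCharsets.US_ASCII);
        byte[] target = "123sdfjfds45".getBytes(StandardCharsets.US_ASCII);
        for (int[] hit : commonWindows(source, target, 7)) {
            // prints: source offset 2 matches target offset 3
            System.out.println("source offset " + hit[0] + " matches target offset " + hit[1]);
        }
    }
}
```

Calling `commonWindows` once per chunk size, from the largest to the smallest, would cover unaligned matches at every size, at the cost of holding an index of the target's windows in memory for each pass.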