Proof of concept of distributed data-optimization software.
The goal of this project is to evaluate the time and complexity required to set up the base architecture for a distributed system of processes working on the same dataset.
The dataset is flexible: it may be a single table of unknown size, or an extract of a database containing 4-10 tables of unknown sizes.
Three process types are needed in the definitive version, but an intermediate version may consist of only a controller and multiple workers; the proxy is needed only when the load is very heavy.
Write a working controller and worker with network communication as follows:
The controller starts, loads the data from an XML source (URL), and waits for workers to connect to it.
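A minimal sketch of the controller's startup state, assuming Python and a simple XML schema with `<row>` elements; the element names, the `ControllerState` class, and its methods are illustrative assumptions, not part of the spec:

```python
import xml.etree.ElementTree as ET

def load_dataset(xml_text):
    """Parse the XML payload into a list of row dicts.
    The <dataset>/<row> element names are assumptions; adapt to the real schema."""
    root = ET.fromstring(xml_text)
    return [dict(row.attrib) for row in root.iter("row")]

class ControllerState:
    """Holds the current dataset and assigns sequential worker ids (1, 2, ...)
    as workers connect, as required by the spec."""
    def __init__(self, dataset):
        self.dataset = dataset
        self.workers = {}   # worker id -> connection handle (any object)
        self.next_id = 1

    def register_worker(self, conn):
        worker_id = self.next_id
        self.next_id += 1
        self.workers[worker_id] = conn
        return worker_id
```

The actual listening socket (e.g. via `socketserver` or `asyncio`) is omitted here; this only captures the dataset loading and id-assignment rules the spec describes.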
A worker starts, connects to the controller's URL or IP address, and establishes a permanent connection. It receives the complete dataset plus additional metadata such as its id on the network, the total number of workers, etc. The first worker receives id 1, the second id 2, and so on. It then starts working on the data.
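One way to sketch the handshake message the worker receives over its permanent connection, assuming Python, JSON encoding, and length-prefixed framing over a stream socket; the field names (`type`, `id`, `total`, `data`) are assumptions, since the spec only requires the id, the worker count, and the dataset:

```python
import json
import struct

def frame(message: dict) -> bytes:
    """Length-prefix a JSON message so it can travel over a stream socket."""
    payload = json.dumps(message).encode("utf-8")
    return struct.pack(">I", len(payload)) + payload

def unframe(data: bytes) -> dict:
    """Inverse of frame(); assumes the full frame is already buffered."""
    (length,) = struct.unpack(">I", data[:4])
    return json.loads(data[4:4 + length].decode("utf-8"))

def welcome_message(dataset, worker_id, total_workers):
    """Handshake sent by the controller right after a worker connects."""
    return {"type": "welcome", "id": worker_id,
            "total": total_workers, "data": dataset}
```

Length-prefixed framing is one common choice for keeping message boundaries intact on a permanent TCP connection; any other framing (newline-delimited JSON, a binary protocol) would satisfy the spec equally well.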
When a worker finds an optimization, it sends it back to the controller (the original subset of data to change plus the new optimized subset). The controller checks that the data received from the worker is better than its own; if so, it commits the change and sends the full dataset back to all workers. If the data is not validated, no commit is done and only the worker that sent the update receives the current dataset.
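The commit/reject decision above can be sketched as a pure function, assuming Python and a per-row `cost` field as the objective; the real project must define its own measure of "better", so `cost` here is purely an illustrative assumption:

```python
def apply_if_better(dataset, original_subset, optimized_subset,
                    cost=lambda rows: sum(r["cost"] for r in rows)):
    """Commit the worker's change only if the optimized subset improves on
    the controller's current data. The per-row 'cost' field is an assumption:
    substitute the project's real objective function.
    Returns (committed, new_dataset)."""
    if cost(optimized_subset) >= cost(original_subset):
        # Reject: no commit; only the sending worker gets resynced.
        return False, dataset
    # Commit: replace the original rows with the optimized ones.
    new_dataset = [r for r in dataset if r not in original_subset]
    return True, new_dataset + optimized_subset
```

Keeping validation as a pure function makes it easy to unit-test the controller's accept/reject rule independently of the networking code.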
When a worker has finished its work, it asks for the updated dataset. If the dataset is unchanged, it waits for the controller to confirm that the work is done (i.e. when all data is fully optimized).
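The worker's end-of-work behavior can be summarized as a small decision step, assuming Python and a dataset version counter maintained by the controller; the version counter and the three action names are assumptions introduced for illustration:

```python
def worker_done_step(local_version, controller_version, all_optimized):
    """Decide the worker's next action once its own work is exhausted.
    Returns "stop"  -- controller declared the dataset fully optimized,
            "fetch" -- the dataset changed, pull it and keep working,
            "wait"  -- nothing new yet, wait for the controller's confirmation."""
    if all_optimized:
        return "stop"
    if controller_version != local_version:
        return "fetch"
    return "wait"
```

A monotonically increasing version number is one simple way for the worker to detect "the dataset is the same" without re-downloading it; comparing a content hash would work as well.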