We are working on a reporting tool. At the moment we can query only one table from a single source at a time.
We plan to use Apache Spark to fuse data from multiple tables that live in multiple sources.
For example, one table might be in PostgreSQL and another in MySQL. We should be able to join them efficiently without having to move large result sets over the network. Any ideas, such as MapReduce or something similar, are welcome.
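
To make the idea concrete, here is a minimal sketch of the kind of setup we have in mind, assuming PySpark and its built-in JDBC data source. The host names, database names, table and column names (`orders`, `customers`, `customer_id`), and credentials are hypothetical placeholders, and the helper functions `jdbc_options`, `fuse`, and `run_example` are our own illustration, not part of Spark. Each source query is wrapped as a subquery in the `dbtable` option so that filtering and column selection happen inside the database, which is one possible way to reduce the rows sent to Spark; we are not sure it is the best one.

```python
def jdbc_options(url, query, user, password, driver):
    """Build options for Spark's JDBC source.

    The query is wrapped as an aliased subquery so the database runs it
    and only the resulting rows travel over the network.
    """
    return {
        "url": url,
        "dbtable": f"({query}) AS src",
        "user": user,
        "password": password,
        "driver": driver,
    }


def fuse(spark, left_opts, right_opts, key):
    """Load both sides over JDBC and join them inside Spark."""
    left = spark.read.format("jdbc").options(**left_opts).load()
    right = spark.read.format("jdbc").options(**right_opts).load()
    return left.join(right, on=key, how="inner")


def run_example():
    """Hypothetical end-to-end run; needs pyspark, both JDBC drivers on the
    classpath, and reachable databases. All names below are placeholders."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("report-fusion").getOrCreate()

    pg_opts = jdbc_options(
        "jdbc:postgresql://pg-host:5432/sales",
        "SELECT customer_id, amount FROM orders WHERE amount > 100",
        "report_user", "secret", "org.postgresql.Driver",
    )
    my_opts = jdbc_options(
        "jdbc:mysql://mysql-host:3306/crm",
        "SELECT customer_id, name FROM customers",
        "report_user", "secret", "com.mysql.cj.jdbc.Driver",
    )

    fuse(spark, pg_opts, my_opts, "customer_id").show()
    spark.stop()
```

Calling `run_example()` against real databases would execute the two source queries remotely and perform the join in Spark. Note that the join itself still requires the filtered rows from both sides to reach the Spark executors, so this only limits the data moved rather than avoiding the transfer.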