This project is a proof of concept of a modern data processing engine that aims to ingest, explore, prepare, manage, and serve data for immediate BI and machine learning needs. Currently, it is restricted to personal use only.
The purpose of the modern data processing engine is to deliver modern data warehousing capabilities at a reasonable price.
• At the moment, the system is powered by an Apache Spark engine and other components.
• Functional Requirements: The system connects to the following data sources:
MS SQL Server
• The system provides a SQL query engine that allows running federated queries across the above-mentioned data sources, except the web API.
• The system has integrated data processing functionality that allows users to process data from the above-mentioned data sources with Apache Spark.
• The system supports workflow orchestration as directed acyclic graphs (DAGs) using Apache Airflow.
• The system provides a search facility that allows full-text search of all data held in the engine's memory that the user is permitted to access. The system must support the following search modes:
Find all specified words.
Find any specified word.
Find an exact phrase.
• The software is built on a framework and architecture that let it run on distributed systems.
• The system supports secure impersonation within its execution model for added security and offers data encryption and masking services.
• The system has connectors for sharing output with third-party BI tools such as Power BI or Looker.
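The federated-query requirement above can be illustrated without standing up Spark: SQLite's ATTACH lets a single SQL statement join tables that live in two separate database files. This is only a toy stand-in for the engine's federation layer (the file names, table schemas, and data here are invented for illustration), but the core idea of one query spanning multiple sources is the same:

```python
import os
import sqlite3
import tempfile

def federated_total(db_a: str, db_b: str) -> float:
    """Run one SQL query that joins tables from two separate databases."""
    con = sqlite3.connect(db_a)
    con.execute("ATTACH DATABASE ? AS other", (db_b,))
    # A single "federated" query spanning both attached sources.
    (total,) = con.execute(
        "SELECT SUM(o.total) FROM orders AS o "
        "JOIN other.customers AS c ON c.id = o.customer_id "
        "WHERE c.region = 'EU'"
    ).fetchone()
    con.close()
    return total

# Build two toy source databases (hypothetical schemas).
tmp = tempfile.mkdtemp()
db_orders = os.path.join(tmp, "orders.db")
db_customers = os.path.join(tmp, "customers.db")

con = sqlite3.connect(db_orders)
con.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 10, 99.0), (2, 11, 25.0), (3, 10, 1.0)])
con.commit()
con.close()

con = sqlite3.connect(db_customers)
con.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
con.executemany("INSERT INTO customers VALUES (?, ?)", [(10, "EU"), (11, "US")])
con.commit()
con.close()

print(federated_total(db_orders, db_customers))  # 100.0
```

In the real system, Spark would play this role, reading each source through its own connector while presenting one SQL surface to the user.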
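Airflow's core job in the DAG requirement above is to run tasks in an order consistent with their declared dependencies. A minimal sketch of that scheduling idea in plain Python, using the standard library's graphlib (the task names are invented; a real pipeline would define Airflow operators instead):

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (its upstream tasks).
dag = {
    "extract": set(),
    "transform": {"extract"},
    "audit": {"extract"},
    "load": {"transform", "audit"},
}

# static_order() yields a valid execution order; graphlib raises
# CycleError if the graph is not actually acyclic.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

The ordering guarantees are what matter: "extract" always comes first, "load" always comes last, and "transform" and "audit" may run in either order (or, in Airflow, in parallel).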
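The three required search modes listed above can be sketched directly. This is a simplified in-memory matcher, not the engine's actual search service, but it makes the all-words / any-word / exact-phrase distinction concrete:

```python
import re

def matches(doc: str, query: str, mode: str) -> bool:
    """mode: 'all' = every query word present, 'any' = at least one
    query word present, 'phrase' = the exact phrase occurs."""
    words = set(re.findall(r"\w+", doc.lower()))
    terms = query.lower().split()
    if mode == "all":
        return all(t in words for t in terms)
    if mode == "any":
        return any(t in words for t in terms)
    if mode == "phrase":
        return query.lower() in doc.lower()
    raise ValueError(f"unknown mode: {mode}")

doc = "Spark processes data from MS SQL Server"
print(matches(doc, "spark server", "all"))   # True
print(matches(doc, "hive server", "any"))    # True
print(matches(doc, "sql server", "phrase"))  # True
print(matches(doc, "server sql", "phrase"))  # False
```

A production search facility would add tokenization, indexing, and per-user access filtering on top of these same three predicates.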
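The masking service mentioned above could look roughly like this. The column choices and masking policy are invented for illustration, and a real deployment would pair masking with proper encryption and key management rather than this sketch:

```python
import hashlib

def mask_email(value: str) -> str:
    """Redact the local part of an email but keep the domain for analytics."""
    local, _, domain = value.partition("@")
    return local[0] + "***@" + domain

def pseudonymize(value: str, salt: str = "demo-salt") -> str:
    """Deterministic pseudonym: the same input always maps to the same token,
    so joins across datasets still work after masking."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:12]

print(mask_email("alice@example.com"))  # a***@example.com
```

Deterministic pseudonyms preserve join keys; irreversible hashes with a secret salt keep the original values out of downstream BI tools.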
• Experience architecting and developing software for scalable, distributed systems.
• Understanding the current architecture of the application and assisting in scaling it cost-effectively.
• Develop orchestration in the application.
• Enable users to schedule jobs based on time and events.
• Containerization/packaging of the application so that it does not rely on any local systems.
• Cost analysis of the application's current configuration and architecture.
• Comparative analysis of the application under various configurations with the same architecture.
• Writing efficient and modular code.
• Experience with cloud technologies and distributed systems.
• Ability to facilitate demonstrations and proofs of concept.
• Deep understanding of Spark applications.
• Able to expertly convey ideas and concepts to others.
• Understanding of the public cloud market and pain points driving enterprise cloud adoption.
• In-depth understanding and the ability to demonstrate expertise in designing, deploying, and maintaining custom enterprise web applications.
• Prepare a high-level PowerPoint presentation and a detailed Word document describing the application on completion of the project.
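The time- and event-based scheduling responsibility above could be prototyped roughly as follows. This is a toy in-process sketch (job and event names are invented); a production system would use Airflow schedules and sensors instead:

```python
import sched
import time

ran = []

def job(name):
    ran.append(name)

# Time-based scheduling: run a job a fixed delay from now.
s = sched.scheduler(time.time, time.sleep)
s.enter(0.1, 1, job, argument=("nightly_load",))
s.run()  # blocks until the scheduled job has fired

# Event-based scheduling: subscribers fire when an event is published.
subscribers = {}

def on(event, fn):
    subscribers.setdefault(event, []).append(fn)

def publish(event):
    for fn in subscribers.get(event, []):
        fn(event)

on("file_arrived", job)
publish("file_arrived")

print(ran)  # ['nightly_load', 'file_arrived']
```

The two triggers compose naturally: a timer can publish an event, and an event handler can enqueue a timed job, which is essentially how schedule- and sensor-driven DAG runs coexist in Airflow.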
Nice to Have
• Strong knowledge of Python Machine Learning standard libraries.
• Strong understanding of all commonly used Machine Learning models and the main algorithms that compose the models.
• Good understanding of the built-in data types (lists, dictionaries, tuples, sets).