PySpark jobs
As a senior developer in our team, you will be instrumental in both web application development and da...provide users with a personalized and efficient way of viewing key metrics and performance indicators. - Role-based Access Control to ensure that sensitive information is accessible only to authorized users. - Data Export Capabilities to allow users to easily export data in various formats for further analysis or reporting. Ideal Skills and Experience: - Extensive experience in Django, Python, and PySpark - Proven track record in web application development - Expertise in creating data processing solutions - Experience in developing custom business tools The project needs to be completed in a medium-term time frame. The developer will be highly involved in the design and planni...
I'm looking for a professional with substantial experience in setting up Azure Databricks. The main goal of this project is to establish a simple, yet efficient Databricks setup on Azure. A single node is sufficient for this task. Once the setup is complete, I need you to run a sample program using PySpark. This will validate the setup and ensure everything is functioning correctly. The primary purpose of this Databricks setup will be focused on data processing and ETL. Ideal skills for this job include: - Extensive knowledge in Azure and Databricks - Proficiency in PySpark - Experience in d...
I'm looking for an experienced Databricks and PySpark developer to build a simple function that can retrieve data from a csv file in SharePoint and load it into a Databricks DataFrame. The function should take parameters such as SharePoint path, file name, and format, and return a DataFrame with the loaded data. Key Requirements: - The connection between Azure Databricks and the SharePoint site must be configured correctly. - Configuration of security, secrets, network settings, and/or service principles will be necessary. - The function and its configurations must work seamlessly in my corporate environment. - All configurations should utilize Service Principals for security or Oauth. Network Settings: - The function should be compatible with my current use of a Virtual ...
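A minimal, hedged sketch of the kind of helper this posting describes: authenticate with a service principal, pull a CSV from SharePoint via the Microsoft Graph API, and hand the result to Spark. The tenant/site IDs, file path, and the use of Graph rather than the SharePoint REST API are assumptions for illustration; a corporate environment may require different endpoints, secret scopes, and network configuration.

```python
# Hedged sketch: SharePoint CSV -> Databricks DataFrame via Microsoft Graph.
# All IDs, paths, and credentials below are placeholders.
import io
import requests
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def load_sharepoint_csv(site_id, file_path, tenant_id, client_id, client_secret):
    # Client-credentials token for Microsoft Graph (service principal / OAuth)
    token = requests.post(
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token",
        data={
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": "https://graph.microsoft.com/.default",
        },
    ).json()["access_token"]

    # Download the raw file content from the site's default document library
    url = f"https://graph.microsoft.com/v1.0/sites/{site_id}/drive/root:/{file_path}:/content"
    resp = requests.get(url, headers={"Authorization": f"Bearer {token}"})
    resp.raise_for_status()

    # Parse with pandas, then convert to a Spark DataFrame
    return spark.createDataFrame(pd.read_csv(io.StringIO(resp.text)))
```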
I'm currently engaged in a data engineering project and I need assistance with data transformation and ETL tasks. A significant portion of this project involves building and designing Directed Acyclic Graphs (DAGs) in Apache Airflow. Ideal Skills: - Proficiency in Python and Pyspark - Extensive experience with AWS services, particularly Glue, Athena, and S3 - Expertise in workflow automation using Airflow - Strong understanding of data transformation and ETL processes The selected freelancer will play a crucial role in ensuring the smooth operation of my project. Your expertise will help facilitate efficient data processing and workflow automation.
I'm seeking a seasoned Data Engineer with over 7 years' experience, who can manage and govern our data using Unity Catalog. The engineer will need to seamlessly integrate their work with our fully built-out data architecture. Ideal Candidate Should Have: - Strong expertise in Azure Data Factory (ADF), Azure Databricks, and PySpark. - Proficient in SQL, Azure DevOps (ADO), GIT, and has a basic understanding of PowerBI. - Over 2 years' practical experience with Unity Catalog.
Hi, this is a job support role. Mostly you will be working for 2 hours on a daily basis with the developer on a Zoom call. Please confirm the following: - Early morning EST, 7 am to 9 am IST - Daily 2 hours on a Zoom call - Budget approx 400/hr. You will do an initial connect to get an understanding of the work. Billing will start from the second session, once you feel you are comfortable with the work. Please confirm. Required skills: PySpark, SQL, Python, AWS, Foundry Functions, ...
Hi, this is a job support role. Mostly you will be working for 2 hours on a daily basis with the developer on a Zoom call. Please confirm the following: - Early morning EST, 7 am to 9 am IST - Daily 2 hours on a Zoom call - Budget approx 400/hr. You will do an initial connect to get an understanding of the work. Billing will start from the second session, once you feel you are comfortable with the work. Please confirm. Required skills: PySpark, Databricks, snowfla...
Hi, this is a job support role. Mostly you will be working for 2 hours on a daily basis with the developer on a Zoom call. Please confirm the following: - Early morning EST, 7 am to 9 am IST - Daily 2 hours on a Zoom call - Budget approx 400/hr. You will do an initial connect to get an understanding of the work. Billing will start from the second session, once you feel you are comfortable with the work. Please confirm. Required skills: PySpark, SQL, p...
I'm seeking a PySpark specialist to assist with data processing and ETL tasks. The primary focus will be on optimizing existing scripts to enhance performance and efficiency. Ideal Skills and Experience: - Proficient in PySpark with extensive experience in data processing and ETL - Strong background in script optimization - Familiarity with data handling from SQL Databases, Cloud Storage, and CSV/Excel files - Excellent problem-solving skills and attention to detail
...Architect / Data Engineer Trainer, you will be responsible for delivering engaging and informative training sessions on a variety of data engineering topics. The ideal candidate has over 15 years of experience and is eager to share their expertise to help others grow in this rapidly evolving field. Key Responsibilities: Develop and deliver training sessions on the following topics: Apache Spark PySpark FastAPI Spark NLP Databricks or Snowflake Integrations with cloud platforms (AWS, GCP) Data virtualization (Starburst) Data modeling (Apache Iceberg, Parquet, JSON) Data Lakehouse architecture (Spark and Flink) Apache Airflow Oracle GoldenGate, Informatica Flask Framework, Docker, Kubernetes Pandas Control-M for scheduling MLOps in Data Engineering and Machine Learning Models Da...
Hi, I am hiring you for the task we discussed on freelancer.com calls.
...know you can do it on my machine too. Before installation: run the following command in your terminal: pyspark --version. It should show an error, confirming that PySpark is not already installed and configured (hence, you cannot just show me a video of a Google Colab with PySpark working). After installation: run the same command again: pyspark --version. It will show Spark version 3.5.3, confirming that PySpark is installed. Also, run the following two-line snippet: from pyspark.sql import SparkSession; spark = SparkSession.builder.appName("PySpark Example").getOrCreate(). It should run without any errors, confirming that PySpark is configured correctly. If you show me a video like this on your mach...
I need a freelancer to develop a binary classification model that predicts card brands (Visa or MasterCard) based on data from the global BIN database. The model should utilize BIN6 (the first six digits of the card number) alongside other relevant features such as country, terminal, and affiliate for accurate classification. Key Requirements: - Proficiency in Spark, PySpark, and Docker - Extensive experience in machine learning model development - Ability to evaluate model performance based on accuracy - Strong proficiency in Python for data analysis and model development. The dataset provided is clean and ready for use without any need for preprocessing. The ideal candidate will be able to leverage the specified frameworks to create an efficient and effective prediction model. ...
...task is to be solved using Spark. Ethical Practices: Please submit original code only. All solutions must be submitted through the portal. We will perform a plagiarism check on the code, and you will be penalized if your code is found to be plagiarized. Software/Language to be used: Python 3.10.x, Apache Spark v3.5.3, Apache Kafka v3.7.1. Additionally, the following Python libraries are required: pyspark == 3.5.3, kafka-python == 2.0.2. No other libraries are allowed. Include the following shebang at the top of your Python scripts: #!/usr/bin/env python3. Make your files executable using the following command: chmod +x *.py. Convert line breaks from DOS format to Unix format (this is necessary if you are coding on Windows; without it, your code will not run on our portal): dos2uni...
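The posting does not spell out the pipeline itself, but given the Spark 3.5.3 + Kafka stack it names, a typical starting point is a Structured Streaming read from a Kafka topic. The sketch below is illustrative only: the topic name and bootstrap server are placeholders, and the spark-sql-kafka connector jar would need to be supplied at submit time.

```python
#!/usr/bin/env python3
# Hedged sketch of a Structured Streaming job reading a Kafka topic.
# Submit with the Kafka connector, e.g.:
#   spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.3 job.py
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-reader").getOrCreate()

stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")   # placeholder broker
          .option("subscribe", "input-topic")                    # placeholder topic
          .load())

# Kafka values arrive as bytes; cast to string before processing
query = (stream.selectExpr("CAST(value AS STRING) AS line")
         .writeStream
         .format("console")
         .outputMode("append")
         .start())

query.awaitTermination()
```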
*Job Title: Big Data Developer* *Job Description:* We are seeking a skilled Big Data Developer to join our team and work on generating daily job recommendations for our platform. The ideal candidate will have experience with big data technologies and cloud services, particularly within the Oracle Cloud ecosystem. *Responsibilities:* - Develop and optimize PySpark applications for processing large datasets. - Implement machine learning models using PyTorch for recommendation systems. - Manage data storage and processing using Oracle Cloud Infrastructure (OCI) services. - Load processed data into Elasticsearch and MySQL for efficient retrieval. - Automate workflows using scheduling tools like Apache Airflow. - Monitor and improve job performance and resource utilization. *Technolo...
Implement a robust data warehousing solution on Snowflake to enable data-driven decision-making. Integrate data from SAP and other systems into a centralized Snowflake environment which acts as a self-service analytics layer, empowering business users to access and analyze data independently. • Developed Data Ingestion Pipelines from SAP to Snowflake tables using Talend. • Creat...requirements. • Deliver tailored data feeds in various formats to support different downstream applications and services. • Designed and Implemented a Glue CI/CD pipeline using GitLab. • Migrate existing Talend jobs to AWS Glue for potential cost savings and improved performance. • Developed a Snowflake Streamlit app for data sharing with business users. Environment: AWS Glue, Snowflake,...
I'm seeking an expert in data analysis using PySpark on AWS. The primary goal is to analyze a large amount of structured data. Key Responsibilities: - Analyze the provided structured data and generate outputs in the given format. - Build classification machine learning models based on the insights from the data. - Utilize PySpark on AWS for data processing and analysis. Ideal Skills: - Proficiency in PySpark and AWS. - Strong experience in analyzing large datasets. - Expertise in building classification machine learning models. - Ability to generate outputs in a specified format.
I am seeking a seasoned data scientist and PySpark expert to develop a logistic regression model from scratch for text data classification using public datasets. Key Requirements: - Build a logistic regression model from scratch (do not use libraries for regression) to classify text data into categories. - Use of Python and PySpark is a must. - Experience with handling and analyzing text data is essential. The model's primary goal will be to classify the data into categories. The successful freelancer will be provided with detailed specifications and project requirements upon awarding. Please only apply if you have substantial experience in creating logistic regression models and are comfortable working with text data and public datasets.
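A minimal, hedged sketch of what "from scratch" could look like on PySpark: batch gradient descent over an RDD of (feature vector, label) pairs. It assumes the text has already been turned into fixed-length numeric feature vectors (e.g. hashed term counts); all names are illustrative and not from the project specification.

```python
# Hedged sketch: logistic regression via batch gradient descent on an RDD of
# (numpy feature vector, 0/1 label) pairs; no regression library is used.
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lr-from-scratch").getOrCreate()
sc = spark.sparkContext

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(data_rdd, n_features, lr=0.1, iterations=50):
    w = np.zeros(n_features)
    n = data_rdd.count()
    for _ in range(iterations):
        bw = sc.broadcast(w)
        # gradient of the log-loss, summed across all partitions
        grad = (data_rdd
                .map(lambda xy: (sigmoid(bw.value.dot(xy[0])) - xy[1]) * xy[0])
                .reduce(lambda a, b: a + b))
        w -= lr * grad / n
    return w

def predict(w, x):
    # threshold the predicted probability at 0.5
    return 1 if sigmoid(w.dot(x)) >= 0.5 else 0
```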
I am seeking expert-level training in the following technologies, from basics to advanced concepts: Power BI Azure Cloud Services Microsoft Fabric Azure Synapse Analytics SQL Python PySpark The goal is to gain comprehensive knowledge and hands-on experience with these tools, focusing on their practical application in data engineering. If you or your organization provide in-depth training programs covering these tech stacks, please reach out with course details, duration, and pricing. Looking forward to hearing from experienced professionals!
...handling, grouping, sorting, and imputation of data, as well as implementation of advanced data bucketing strategies. The project also requires robust error-handling mechanisms, including the ability to track progress and resume operations after a crash or interruption without duplicating previously processed data. Requirements: Expertise in Python, especially libraries like Pandas, Dask, or PySpark for parallel processing. Experience with time-series data processing and geospatial data. Proficiency in working with large datasets (several gigabytes to terabytes). Knowledge of efficient I/O operations with CSV/Parquet formats. Experience with error recovery and progress tracking in data pipelines. Ability to write clean, optimized, and scalable code. Please provide examples of ...
I'm in need of a professional with extensive PySpark unit testing experience. I have PySpark code that loads data from Oracle ODS to ECS (S3 bucket). The goal is to write unit test cases that will achieve at least 80% coverage in SonarQube quality. You will focus primarily on testing for: - Data validation and integrity - Error handling and exceptions The ideal candidate should: - Be proficient in using PyTest, as this is our preferred testing framework - Have a comprehensive understanding of PySpark - Be able to deliver immediately Please note, the main focus of this project is not on the data transformations that the PySpark code performs (which includes data cleaning and filtering, data aggregation and summarization, as well as data joining and mergin...
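For context, a hedged sketch of the kind of PyTest setup commonly used for PySpark unit tests: a session-scoped local SparkSession fixture plus a small data-validation test. The function and column names are illustrative, not taken from the actual codebase.

```python
# Hedged sketch: PyTest fixture and a data-validation style test for PySpark.
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # lightweight local session so tests run without a cluster
    return (SparkSession.builder
            .master("local[1]")
            .appName("unit-tests")
            .getOrCreate())

def test_no_null_keys(spark):
    df = spark.createDataFrame([(1, "a"), (2, None)], ["id", "val"])
    # data integrity check: the key column must never be null
    assert df.filter(df.id.isNull()).count() == 0

def test_row_count_preserved(spark):
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
    # example of asserting a transformation does not drop rows
    assert df.select("id", "val").count() == 2
```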
I'm seeking a professional with extensive experience in PySpark and ETL processes. The project involves migrating my current ETL job, which sources data from a PySpark database and targets a Data Lake. Key tasks include: - Designing and implementing the necessary PySpark code - Ensuring data is effectively transformed through cleaning, validation, aggregation, summarization, merging and joining. Ideal candidates will have a deep understanding of both PySpark and data transformations. Your expertise will be crucial to successfully migrate this ETL job.
I'm seeking a skilled PySpark expert to assist with data analysis and transformations on structured data. The task involves: - Utilizing PySpark to manipulate and analyze big data. - Writing efficient PySpark code to handle the task. Ideal candidates should have extensive experience with PySpark and a strong background in data analysis and transformations. Proficiency in working with structured data from sources like CSV files, SQL tables, and Excel files is crucial.
I am looking for a professional to design a series of training videos for beginners on Python, SQL, Pyspark, ADF, Azure Data Bricks, and Snowflake. The primary goal of these videos is to teach the fundamental principles and techniques associated with each of these technologies. As such, the curriculum for each technology will need to be developed from scratch, ensuring that it covers all the necessary topics in a clear and engaging manner. Key responsibilities include: - Developing a detailed curriculum for each technology - Creating high-quality video content - Providing thorough explanations in PDF format - Incorporating our logo into each video Ideal candidates should have: - A strong background in IT, with a focus on the technologies listed - A proven track record in creat...
I need an AWS Glue job written in PySpark. The primary purpose of this job is transforming data stored in my S3 bucket. Ideal Skills: - Proficient in PySpark and AWS Glue - Experience with data transformation and handling S3 bucket data. Your bid should showcase your relevant experience and approach to this project.
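A short, hedged skeleton of a Glue PySpark job that reads from and writes back to S3. The bucket paths and the transformation itself are placeholders; the real job would substitute the actual source format and business logic.

```python
# Hedged sketch: minimal AWS Glue PySpark job with placeholder S3 paths.
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw data from the source bucket (placeholder path and format)
df = spark.read.parquet("s3://source-bucket/input/")

# Placeholder transformation step
out = df.dropDuplicates()

# Write the transformed data back to S3 (placeholder path)
out.write.mode("overwrite").parquet("s3://target-bucket/output/")
job.commit()
```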
I have a PySpark code that requires optimization primarily for performance. Key requirements: - Enhancing code performance to handle large datasets efficiently. - The code currently interacts with data stored in Azure Data Lake Storage (ADLS). - Skills in PySpark, performance tuning, and experience with ADLS are essential. - Understanding of memory management in large dataset contexts is crucial. Your expertise will help improve the code's efficiency and ensure it can handle larger datasets without performance issues.
...candidate will have hands-on experience in architecting cloud data solutions and a strong background in data management and integration. If you are passionate about working with cutting-edge technologies and have a proven track record in the Azure ecosystem, we would love to hear from you! Key Responsibilities: - Cloud data solutions using Azure Synapse, Databricks, Azure Data Factory, DBT, Python, PySpark, and SQL - Set up ETL pipelines in Azure Data Factory - Set up data models in Azure Databricks / Synapse - Design and manage cloud data lakes, data warehousing solutions, and data models. - Develop and maintain data integration processes. - Collaborate with cross-functional teams to ensure alignment and successful project delivery. Qualifications: - Good understanding of data warehousing...
I'm looking for a professional who's proficient in AWS Glue, S3, Redshift, Azure Data Bricks, PySpark, and SQL. The project entails working on data transformation and integration, data analysis and processing, database optimization, infrastructure setup and management, continuous data processing, and query optimization. The expected data volume is classified as medium, ranging from 1GB to 10GB. Ideal Skills and Experience: - Strong experience in AWS Glue, S3, and Redshift - Proficiency in Azure Data Bricks, PySpark, and SQL - Proven track record with data transformation and integration - Expertise in database optimization and query optimization - Experience with managing and setting up infrastructure for data processing - Ability to handle continuous data processi...
I am looking for a data engineer to help me build data engineering pipelines in Microsoft Fabric using the Medallion Architecture. The primary goal of these pipelines is to perform ELT (Extract, Load, Transform). Key Responsibilities: - Design and implement data engineering pipelines via Microsoft Fabric. - Utilize the Medallion Architecture to optimize data flow and processing. - Create separate workspaces for each layer and lakehouse. - Use PySpark to write jobs. Ideal Skills and Experience: - Extensive experience with Microsoft Fabric. - Strong understanding and experience with ELT processes. - Familiarity with Medallion Architecture. - Able to work with both structured data and JSON. - Understand how to connect and work across workspaces and lakehouses...
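As a rough illustration of the bronze-silver-gold flow the Medallion Architecture implies, here is a compact, hedged PySpark sketch. The lakehouse paths, table names, and business logic are all placeholders; a real Fabric setup would split these steps across the separate workspaces mentioned above.

```python
# Hedged sketch: bronze -> silver -> gold in PySpark; paths/tables are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: land the raw data as-is
raw = spark.read.json("Files/landing/orders/")            # placeholder landing path
raw.write.format("delta").mode("append").saveAsTable("bronze_orders")

# Silver: clean and conform
silver = (spark.read.table("bronze_orders")
          .dropDuplicates(["order_id"])
          .withColumn("order_date", F.to_date("order_date")))
silver.write.format("delta").mode("overwrite").saveAsTable("silver_orders")

# Gold: business-level aggregate for reporting
(silver.groupBy("customer_id")
       .agg(F.sum("amount").alias("total_amount"))
       .write.format("delta").mode("overwrite").saveAsTable("gold_customer_totals"))
```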
I'm looking for a seasoned Databricks professional to assist with a data engineering project focused on the migration of structured data from cloud storage. Key Responsibilities: - Lead the migration of structured data from our cloud storage to the target environment - Utilize Pyspark for efficient data handling - Implement DevOps practices for smooth and automated processes Ideal Skills: - Extensive experience with Databricks - Proficiency in Pyspark - Strong understanding of DevOps methodologies - Prior experience in data migration projects - Ability to work with structured data Please, only apply if you meet these criteria, and can provide examples of similar projects you have successfully completed.
...and implement a data processing pipeline on Azure - Ensure the pipeline is capable of handling structured data, particularly from SQL databases - Optimize the pipeline for reliability, scalability, and performance Ideal Skills and Experience: - Extensive experience with Azure cloud services, particularly in a data engineering context - Proficiency in data processing tools such as Scala-Spark, Pyspark - Strong understanding of Unix/Linux systems and SQL - Prior experience working with Data Warehousing, Data Lake, and Hive systems - Proven track record in developing complex data processing pipelines - Excellent problem-solving skills and ability to find innovative solutions to data processing challenges This role is suited to a freelancer who is not only a technical expert in clo...
I'm in search of an Azure Data Factory expert who is well-versed in Delta tables, Parquet, and Dedicated SQL pool. As per the requirement, I have all the data and specifications ready. The successful freelancer will need to be familiar with advanced transformations as the ETL complexity level is high. It's a plus if you have prior and proven experience in handling such projects. Key Skills Required: - Expertise in Azure Data Factory - PySpark - Deep knowledge of Delta tables, Parquet and Dedicated SQL pool - Familiarity with adv...
I'm looking for a talented Pyspark Developer who has experience in working with large datasets and is well-versed in PySpark above version 3.0. The primary task involves creating user-defined function code in PySpark for applying cosine similarity on two text columns. Key Requirements: - Handling large datasets (more than 1GB) efficiently - Proficient in PySpark (above version 3.0) - Experienced in implementing cosine similarity - Background in health care data is a plus Your primary responsibilities will include: - Writing efficient and scalable code - Applying cosine similarity on two text columns - Ensuring the code can handle large datasets This project is a great opportunity for a Pyspark Developer to showcase their skills in handling big da...
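One possible shape for the UDF this posting asks for, sketched here with simple whitespace tokenisation and term-frequency vectors; the tokenisation strategy and column names are assumptions, and for very large datasets a vectorised (pandas UDF) variant would likely perform better.

```python
# Hedged sketch: cosine similarity between two text columns via a PySpark UDF.
import math
from collections import Counter
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

def cosine_sim(a, b):
    # build term-frequency vectors from whitespace tokens (illustrative only)
    ta, tb = Counter((a or "").lower().split()), Counter((b or "").lower().split())
    common = set(ta) & set(tb)
    num = sum(ta[t] * tb[t] for t in common)
    den = (math.sqrt(sum(v * v for v in ta.values()))
           * math.sqrt(sum(v * v for v in tb.values())))
    return float(num / den) if den else 0.0

cosine_udf = udf(cosine_sim, DoubleType())

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("heart disease risk", "risk of heart disease")],
    ["text_a", "text_b"],
)
df.withColumn("similarity", cosine_udf("text_a", "text_b")).show()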
I require a highly skilled AWS data engineer who can provide on-demand consultation for my data processing needs. The project involves helping me manage large volumes of data in AWS using Python, SQL, Pyspark, Glue, and Lambda. This is a long-term hourly consulting job, where I will reach out to you when I need guidance on any of the following areas: - Data Ingestion: The initial process of collecting and importing large volumes of data into AWS. - Data Transformation: The process of converting and reformatting data to make it suitable for analysis and reporting. - Data Warehousing: The ongoing management and storage of transformed data for analysis purposes. Your role will be to assist me in making critical decisions about data architecture and processing, using the tools and lan...
I'm on a quest for an expert in big data, specifically in the areas of data storage, processing, and query optimization. The ideal candidate would be required to: - Be experienced in PySpark. - Manage the storage and processing of my large datasets efficiently. Foremost in this requirement is a dynamic understanding of big data principles as they relate to data storage and processing. - Apply your expertise in PostgreSQL by optimizing queries for improved performance and efficiency in accessing stored data. - Using Apache Hive, you'll be tasked with data summarization, querying, and in-depth analysis. This entails transforming raw data into an understandable format and performing relevant calculations and interpretations that enable insightful decisions. Skil...
I'm looking for an expert PySpark developer to help manage and process big data sets on AWS. The successful candidate will have strong knowledge of key AWS services such as S3, Lambda, and EMR. The job involves ingesting data from a source CSV file into target Delta tables. Tasks include: - Building and managing large-scale data processes in PySpark - Understanding and using AWS services like S3, Lambda and EMR - Implementing algorithms for data computation Ideally, you'll have: - Expertise in PySpark development - In-depth knowledge of AWS services, specifically S3, Lambda and EMR - Proven experience in handling and processing big data - A problem-solving approach with excellent attention to detail. Your experience should allow you to hit the ground running on this data p...
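A short sketch of the CSV-to-Delta ingestion described above, assuming the cluster already has Delta Lake available (as on Databricks or EMR with the Delta package); the S3 paths are placeholders.

```python
# Hedged sketch: ingest a source CSV into a target Delta table; paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("s3://source-bucket/raw/"))           # placeholder source path

(df.write
   .format("delta")
   .mode("append")
   .save("s3://target-bucket/delta/table/"))      # placeholder target path
```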
Full Stack PYTHON Developer within a growing tech start-up organisation focused on transforming the professional services industry. Our stack includes Python, React, JavaScript, TypeScript, GraphQL, Pandas, NumPy, PySpark and many other exciting technologies so plenty of scope to grow your skills. We're looking for someone experienced with Python, React, Material UI, Redux, Service Workers, Fast API, Django, Flask, Git and Azure. We'd also like this person to have a proven track record working as a fullstack developer or similar role with strong problem solving skills, attention to detail and initiative to get things done. Fully remote team working across the globe but with a fantastic team culture. This is not a project role but an open ended requirement so you really...
...using Azure Data Factory (ADF). Optimize data transformation processes using PySpark. Production experience delivering CI/CD pipelines across Azure and vendor products. Contribute to the design and development of enterprise standards. Key knowledge of architectural patterns across code and infrastructure development. Requirements: Technical Skills and Experience: Bachelor’s or master’s degree in computer science, Engineering, Data Science, or equivalent experience, with a preference for experience and a proven track record in advanced, innovative environments. 7-8 years of professional experience in data engineering. Strong expertise in Microsoft Azure data services, particularly Azure Data Factory (ADF) and PySpark. Experience with data pipeline design, deve...
Hi, you will be working for 2 hours on a daily basis with the developer on a Zoom call. Please confirm the following: - Early morning EST, 7 am to 9 am IST - Daily 2 hours on a Zoom call - Budget approx 500/hr. Required skills (Data Engineer / Databricks Developer): Python, Spark, PySpark, SQL, Azure cloud, Data Factory, Scala, Terraform, Kubernetes.
Senior Python (Full Stack) Engineer Timezone: 1:30 PM to 10 PM IST What we expect: Strong knowledge of Python Experience with one of the backend frameworks (Flask/Django/FastAPI/Aiohttp) Experience with one of the modern ...frameworks (React, Angular, Vue.js) Experience with AWS Cloud database related experience (NoSQL, relational DBs) Good understanding of application architecture principles Good written and verbal skills in English (upper-intermediate or higher) Nice to have: Knowledge of and experience in working with Kubernetes Experience with Data Engineering / ETL Pipelines (Apache Airflow, Pandas, PySpark, Hadoop, etc.) Experience with CI/CD systems Experience with Linux/Unix Experience in working with cloud automation and IaC provisioning tools (Terraform, CloudFormation, et...
I'm looking for someone with solid experience in Google Cloud Platform (GCP) and Databricks, specifically for data processing and analytics. Your primary responsibility will be to translate SQL code to Spark SQL and adapt it as necessary, so this experience is crucial. Key Responsibilities: - Translating SQL code to Spark SQL, and adapting as necessary - Working with Google Cloud Platform (GCP) and Databricks - Data processing and analytics Ideal Skills and Experience: - Strong experience in Google Cloud Platform (GCP) and Databricks - Proficient in SQL and Spark SQL - Previous experience working on data processing and analytics projects - A solid understanding of cloud storage and databases - Ability to efficiently adapt SQL code to Spark SQL Please apply if you have the required skills and exp...
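To illustrate the kind of dialect adaptation this usually involves, here is a small hedged example; the source dialect is not specified in the posting, so a T-SQL-flavoured query and a placeholder table are assumed.

```python
# Hedged sketch: adapting a dialect-specific query to Spark SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.createDataFrame([(1, 100.0), (2, 250.0)], ["id", "amount"]) \
     .createOrReplaceTempView("sales")

# Original (T-SQL style):  SELECT TOP 10 *, GETDATE() AS load_ts FROM sales
# Spark SQL equivalent:    TOP -> LIMIT, GETDATE() -> current_timestamp()
spark.sql("""
    SELECT *, current_timestamp() AS load_ts
    FROM sales
    LIMIT 10
""").show()
```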
...have a high-complexity T-SQL stored procedure used for data analysis that I need translated into PySpark code. The procedure involves advanced SQL operations, temporary tables, and dynamic SQL. It currently handles over 10GB of data. - Skills Required: - Strong understanding and experience in PySpark and T-SQL languages - Proficiency in transforming high complexity SQL scripts to PySpark - Experience with large volume data processing - Job Scope: - Understand the functionality of the existing T-SQL stored procedure - Rewrite the procedure to return the same results using PySpark - Test the new script with the provided data set The successful freelancer will assure that the new PySpark script can handle a large volume of data and maintai...
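One common translation pattern in this kind of migration, sketched with placeholder tables and columns (not taken from the actual procedure): a T-SQL temp table such as #staging maps naturally to a temporary view or cached DataFrame in PySpark, and the rest of the procedure body becomes Spark SQL against that view.

```python
# Hedged sketch: T-SQL temp-table pattern rewritten with a PySpark temp view.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

orders = spark.createDataFrame(
    [(1, "2024-02-01", 120.0), (2, "2023-12-15", 80.0)],
    ["customer_id", "order_date", "amount"],
)

# T-SQL:  SELECT ... INTO #staging FROM orders WHERE order_date >= '2024-01-01'
orders.filter("order_date >= '2024-01-01'").createOrReplaceTempView("staging_orders")

# The procedure body that read from #staging becomes Spark SQL on the temp view
result = spark.sql("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM staging_orders
    GROUP BY customer_id
""")
result.show()
```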
Conversion modeling / predictive analytics. The whole department is transitioning to Databricks. I need help with creating conversion models using PySpark, comparing the results to last year's, and assessing what could have been a better approach.
I'm looking for a data engineer with solid Pyspark knowledge to assist in developing a robust data storage and retrieval system, primarily focusing on a Data Warehouse. Key Responsibilities: - Implementing efficient data storage solutions for long-term retention and retrieval - Ensuring data quality and validation procedures are in place - Advising on real-time data processing capabilities Ideal Candidate: - Proficient in Pyspark with hands-on experience in data storage and retrieval projects - Familiar with Data Warehousing concepts and best practices - Able to recommend and implement appropriate real-time processing solutions - Strong attention to detail and commitment to data quality. Specifically, I have a Jira ticket that consists of creating an application tha...
I'm seeking a knowledgeable Databricks Data Engineer who can expertly navigate the Python and PySpark programming languages for my project. Your primary task will be to optimize a Delta Live Tables pipeline that handles real-time data processing, optimization, and change data capture (CDC). Extensive working knowledge of the Azure cloud platform is a must for this role. Your understanding and ability to apply crucial elements in these areas will greatly contribute to the success of this project. Applicants with proven experience in this field are preferred. In your proposal, state whether you have DLT experience; otherwise you won't be considered.
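For orientation, a hedged sketch of a Delta Live Tables definition applying CDC with dlt.apply_changes; the table names, key, and sequencing column are placeholders, and the real pipeline's sources and SCD handling may differ.

```python
# Hedged sketch: DLT pipeline applying change data capture to a silver table.
# Note: `import dlt` and the `spark` session are provided by the DLT runtime;
# this code only runs inside a Delta Live Tables pipeline.
import dlt
from pyspark.sql.functions import col

@dlt.view
def customer_updates():
    # raw CDC feed; in practice this is usually a streaming source
    return spark.readStream.table("raw.customer_cdc")   # placeholder source table

dlt.create_streaming_table("customers_silver")

dlt.apply_changes(
    target="customers_silver",
    source="customer_updates",
    keys=["customer_id"],            # placeholder business key
    sequence_by=col("event_ts"),     # placeholder ordering column
    stored_as_scd_type=1,
)
```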
As the professional handling this project, you'll engage with big data exceeding 10GB. Proficiency in Python, Java, and PySpark is vital for success, as we demand expertise in: - Data ingestion and extraction: The role involves managing complex datasets and running ETL operations. - Data transformation and cleaning: You'll also need to audit the data for quality and cleanse it for accuracy, ensuring integrity throughout the system. - Handling streaming pipelines and Delta Live Tables: Mastery of these could be game-changing in our pipelines, facilitating real-time analysis of data.