    3 jobs found, pricing in USD
    Python and Big Data with AWS (3 days left, VERIFIED)

    - Programming language: Python
    - Hands-on experience with Spark
    - Hands-on experience with the Hadoop ecosystem: Hive, Sqoop, SQL queries, Unix
    - Cloud experience on Cloudera or AWS
    - Oozie workflows
    - Experience creating CI/CD pipelines
    - Unit/JUnit testing, integration or end-to-end testing
    - Kafka
    - Tools to be familiar with: Bitbucket, Tectia (edge node), SQL Developer, Oozie, Git, Jenkins

    $7 / hr (Avg Bid)
    6 bids

    I'm looking for someone with expertise in PySpark data stratification. I have pseudocode available, and I want to remove duplicates from the dataset after stratification. Here is a sample set of data; I have created a bin field based on agg_readings. The data is huge, with close to 320 million records stored in Hive in Parquet format. Of the 320 million, I'm looking to get 5 million based on stratification. Below is the sample snippet; I have used sampleBy to fetch the stratified sample based on two columns (mnth_src_fld and bin). All I need from the stratified data is unique gen_rnd_id values across the entire dataset after stratification, but unfortunately I'm not getting unique gen_rnd_id values. For instance, here in ...

    $22 (Avg Bid)
    4 bids
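
A minimal sketch of one way to approach the stratification request above, assuming the duplicate gen_rnd_id values already exist in the source rows (the posting is truncated, so the actual cause is not known). It uses plain Python rather than PySpark so it runs without a cluster. The function name stratified_unique_sample, the dict-based rows, and the fractions mapping are hypothetical. The idea is to deduplicate on gen_rnd_id first and only then sample each (mnth_src_fld, bin) stratum, so the sample cannot contain repeated IDs.

```python
import random


def stratified_unique_sample(rows, fractions, seed=42):
    """Deduplicate on gen_rnd_id, then sample each (mnth_src_fld, bin) stratum.

    rows:      iterable of dicts with keys gen_rnd_id, mnth_src_fld, bin
               (hypothetical in-memory stand-in for the Hive table).
    fractions: dict mapping (mnth_src_fld, bin) -> sampling fraction in [0, 1].
               Strata missing from the dict are sampled at 0.0.
    """
    rng = random.Random(seed)

    # Step 1: keep only the first row seen for each gen_rnd_id.
    seen_ids = set()
    unique_rows = []
    for row in rows:
        if row["gen_rnd_id"] not in seen_ids:
            seen_ids.add(row["gen_rnd_id"])
            unique_rows.append(row)

    # Step 2: independent keep/drop decision per row, using its stratum's fraction.
    sample = []
    for row in unique_rows:
        stratum = (row["mnth_src_fld"], row["bin"])
        if rng.random() < fractions.get(stratum, 0.0):
            sample.append(row)
    return sample
```

In PySpark the analogous steps would likely be DataFrame.dropDuplicates on gen_rnd_id followed by DataFrame.sampleBy on a single key column that combines mnth_src_fld and bin. Because each row is kept or dropped independently, the sample size is approximate, so the fractions would need tuning to land near the 5 million target.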

    Looking for a Data Engineer with experience in Databricks, data lakes, Spark, and Redshift.
    Job Title: Data Engineer with Databricks
    Location: Asheville, NC (Onsite)
    Experience: 10 years
    Duration: Long-term contract
    US work authorization is a must.
    Job description: This job requires 8+ years within data engineering, experience with database and data lake development and/or management, an understanding of cloud technologies such as IaaS and SaaS, and an understanding of security concepts around data lakes.
    Responsibilities:
    · Data Onboarding: define onboarding procedures and work with business stakeholders to onboard new data sources.
    · This position is responsible for technical leadership of a team focused on the creation of a data lake repository and related ETL pr...

    $50 - $50 / hr
    Local
    0 bids
