Hadoop, Lucene, and Nutch jobs
... Telephone calls as an alternative mechanism for teachers with health or accessibility issues. Processing medical records: Integrate OCR algorithms and computational lexicography to convert medical documents into operational data. Indexing engine (Apache Lucene/PyLucene): Use it to tokenize and index medical text, enabling fast and transparent searches in records. Example of Python source code illustrating the basic architecture of the channel: from flask import Flask, request, jsonify import jwt, cv2, pytesseract, lucene from import StandardAnalyzer from import Document, Field from import IndexWriter, IndexWriterConfig from import RAMDirectory app = Flask(__name__)
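The tokenize-and-index step described above can be illustrated with a minimal inverted index in plain Python. This is a stand-in sketch, not PyLucene's API: the regex tokenizer, the hypothetical stop-word list and the `analyze`/`build_index`/`search` helpers are assumptions chosen only to show how terms map to the records that contain them.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; Lucene analyzers ship their own configurable sets.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "with"}


def analyze(text):
    """Split text into lowercase alphanumeric tokens and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_index(docs):
    """Map each term to the sorted list of record ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return ids of records containing every query term (boolean AND)."""
    terms = analyze(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample records.
records = {
    1: "Patient presents with type 2 diabetes and hypertension.",
    2: "Follow-up: hypertension controlled with medication.",
    3: "No history of diabetes.",
}
index = build_index(records)
print(search(index, "hypertension"))           # [1, 2]
print(search(index, "diabetes hypertension"))  # [1]
```

A real Lucene index layers per-field analyzers, term frequencies, positions and relevance scoring on top of this basic term-to-record mapping.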
Automated Terraform infrastructure deployments, achieving a 97% security score in CSPM, ensuring cloud i...production AWS infrastructure across 15+ services, including EMR, WAF, GuardDuty, and Global Accelerator, supporting a highly available and secure multi-environment setup. Managed hybrid deployments spanning on-premises and AWS Serverless environments, enabling the team to maintain minimum-downtime releases across infrastructure types. Installed and configured a distributed data ecosystem — including Hadoop (HDFS/YARN), Spark, Hive, Kafka, Trino, Redis, PostgreSQL, and OpenSearch — across standalone and multi-node cluster environments. Diagnosed and resolved deployment failures stemming from missing environment variables, database connectivity issues, and service st...
...of computer science. Key elements I need covered: • A concise yet persuasive abstract that positions the topic within current Big Data trends. • A thorough literature review highlighting state-of-the-art approaches to large-scale image analysis, their limitations, and the opportunity my research will address. • A detailed methodology describing data acquisition, storage solutions (e.g., Hadoop, Spark, distributed file systems), model architecture choices, and evaluation metrics suitable for high-volume image streams. • A realistic timeline and milestone plan for a 3–4 year doctoral program, including publication targets. • Referenced citations formatted to IEEE style. Please ensure the final document is academically sound, plagiarism-fre...
Freelance OpenSearch/Elasticsearch/Solr Expert Project Overview I am seeking an experienced freelance developer to build custom plugins, contribute OpenSearch commits, perform bug fixes, and optimize search functionality for our Lucene-based search engine projects. This is a short-term to ongoing remote contract (10-20 hours/week initially). Key Requirements 5+ years hands-on experience with OpenSearch, Elasticsearch, or Solr (OpenSearch preferred). Strong proficiency in Java (core language, JVM tuning, build tools like Maven/Gradle). Deep expertise in Apache Lucene (indexing, querying, analyzers, postings, segments). Proven track record building plugins/extensions (e.g., custom analyzers, scorers, or transport layers). Experience with OpenSearch contributions (GitHub ...
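Custom scorers in Lucene-based engines usually start from BM25, Lucene's default similarity. The sketch below computes a textbook BM25 score in Python, with k1 = 1.2 and b = 0.75 and a non-negative IDF variant, purely to show how term frequency, document length and document frequency interact. It is not Lucene's Java `BM25Similarity` class, which also encodes length norms differently; the function names and toy corpus are hypothetical.

```python
import math


def bm25_idf(doc_freq, num_docs):
    """IDF variant that stays positive: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_score(query_terms, doc_tokens, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against a query using Okapi BM25."""
    num_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / num_docs
    doc_len = len(doc_tokens)
    score = 0.0
    for term in query_terms:
        tf = doc_tokens.count(term)
        if tf == 0:
            continue
        df = sum(1 for d in corpus if term in d)
        # Length normalization: longer-than-average documents are penalized.
        norm = tf + k1 * (1 - b + b * doc_len / avg_len)
        score += bm25_idf(df, num_docs) * tf * (k1 + 1) / norm
    return score


# Hypothetical pre-tokenized corpus.
corpus = [
    ["lucene", "segment", "merge"],
    ["opensearch", "plugin", "lucene", "analyzer"],
    ["jvm", "tuning", "gc"],
]
for doc in corpus:
    print(round(bm25_score(["lucene", "analyzer"], doc, corpus), 3))
```

The rarer term ("analyzer") contributes more than the common one ("lucene"), which is the behaviour a custom scorer typically tweaks.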
I’m ready to commission a complete instructional module aimed at advanced data-science professionals who already speak fluent Python...project rubric. • Package everything so it can be imported into a standard LMS (SCORM or xAPI preferred). Acceptance criteria The module is considered complete when a pilot group of advanced learners can: 1. Implement and compare multiple ML algorithms on a supplied dataset. 2. Produce interactive visualizations that surface non-trivial insights. 3. Handle a big-data workload (Spark, Hadoop, or similar) and articulate performance trade-offs. If you’ve built graduate-level or enterprise-grade technical courses before, let’s talk about timelines and how you’d approach the storytelling, assessments, and technic...
We need an experienced Informatica BDM developer to join our team for full-time contract work supporting data engineering and ETL development projects. Requirements: • 7+ years of experience with Informatica Data Engineering, DIS and MAS • Strong expertise in Databricks and Hadoop ecosystems • Proficiency with relational SQL and NoSQL databases (Azure Synapse, SQL Server, Oracle) • Experience with major cloud platforms (Azure, AWS, or Google Cloud) • Knowledge of Agile methodologies such as Scrum and tools such as TFS and JIRA • Advanced SQL skills including T-SQL and PL/SQL • Experience building and optimizing big data pipeline architectures • Hands-on experience developing both batch and real-time workloads • Knowledge of Data Lake and dimensio...
...ends when the transformed results are written back to a target schema (or files, if that proves more efficient). Key points you should know • Source: relational database containing nested JSON / key-value blobs. • Goal: parse, normalize, and flatten these blobs into well-defined columns while preserving relationships and lineage. • Scale: millions of rows, so solutions that leverage Spark, Hadoop, BigQuery, Snowflake, or well-tuned SQL/Python pipelines are welcome—as long as they remain maintainable. Deliverables 1. Transformation code (Python, PySpark, SQL, or Scala) with clear comments. 2. A runnable job definition or workflow file (Airflow DAG, Spark submit script, dbt model, etc.) that shows how to execute the pipeline end-to-end. 3. Simple ...
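One way to flatten the nested JSON / key-value blobs while keeping lineage visible is to turn every leaf value into a column whose name records its original path. The sketch below assumes the blobs are valid JSON and uses a hypothetical dotted naming scheme (`customer.tags.0`); at millions of rows the same logic would typically run inside a distributed job rather than a single Python process.

```python
import json


def flatten(obj, parent_key="", sep="."):
    """Recursively flatten nested dicts/lists into a single-level dict.

    Dict keys are joined with `sep`; list elements use their index as a key
    segment, so the original path (lineage) of every value stays visible.
    """
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
            items.update(flatten(value, new_key, sep))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            new_key = f"{parent_key}{sep}{i}" if parent_key else str(i)
            items.update(flatten(value, new_key, sep))
    else:
        items[parent_key] = obj
    return items


# Hypothetical blob as it might be stored in a text column.
blob = '{"id": 7, "customer": {"name": "Ana", "tags": ["vip", "eu"]}}'
row = flatten(json.loads(blob))
print(row)
# {'id': 7, 'customer.name': 'Ana', 'customer.tags.0': 'vip', 'customer.tags.1': 'eu'}
```

Indexing list elements keeps one row per blob; exploding arrays into child rows with a foreign key back to the parent is the usual alternative when relationships matter more than a wide schema.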
...Everything must run on Apache Hadoop (current 3.x stack on Cloudera CDP). If you feel a touch of Hive, Pig, or even straight MapReduce will speed things up, I am open to it, but Hadoop remains the core platform. SQL engines or Spark can be mentioned if they genuinely simplify a step, yet the final solution must stay centred on Hadoop. Deliverables: • Working Hadoop jobs that clean, aggregate, and store results back to HDFS • Clear, commented code in Git • A concise hand-off guide (read-me or screenshare) so my in-house team can rerun the workflow unaided Accuracy, performance tuning, and straightforward documentation are more important to me than flashy dashboards. When you reply, please reference comparable structured-data analysis you h...
I’m upgrading our analytics stack and need an expert who can own the Hadoop side and turn raw, high-volume feeds into analysis-ready datasets. The core objective is to design and build end-to-end data pipelines on a Hadoop cluster—this is where I believe Hadoop will be most valuable for the project. Here’s what I need from you: • An architecture that takes terabyte-scale log files, lands them in HDFS, applies basic cleansing, and outputs partitioned Parquet tables queryable from Hive or Spark • All scripts, configs, and scheduling (Oozie, Airflow, or your preferred orchestrator) committed to Git with clear documentation • A deployment guide plus a brief hand-over session so I can reproduce the setup on another cluster Acceptance c...
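Landing logs as partitioned Parquet usually means writing each record under `key=value` directories, the layout Hive and Spark use for partition discovery. This sketch only computes the target directory per record; the base path and the `source`, `dt` and `hour` partition columns are hypothetical, and the actual Parquet writing is left to Spark or Hive.

```python
from collections import defaultdict
from datetime import datetime


def partition_path(base, timestamp, source):
    """Build a Hive-style partition directory (key=value segments) for one log record."""
    ts = datetime.fromisoformat(timestamp)
    return f"{base}/source={source}/dt={ts:%Y-%m-%d}/hour={ts:%H}"


def group_by_partition(records, base):
    """Bucket records by target directory so each bucket can be written together."""
    groups = defaultdict(list)
    for rec in records:
        groups[partition_path(base, rec["ts"], rec["source"])].append(rec)
    return dict(groups)


# Hypothetical parsed log records.
logs = [
    {"ts": "2024-03-05T14:22:10", "source": "web", "msg": "GET /"},
    {"ts": "2024-03-05T14:59:59", "source": "web", "msg": "GET /login"},
    {"ts": "2024-03-05T15:00:01", "source": "api", "msg": "POST /orders"},
]
for path, recs in group_by_partition(logs, "/data/logs").items():
    print(path, len(recs))
```

Partitioning by date and hour lets Hive or Spark prune whole directories when a query filters on `dt` or `hour`.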
I want to turn our raw data into clear, actionable customer insights. Your task is to design and run a full analytics workflow—everything from choosing the right architecture and cleaning the data to building the models and presenting the findings in an easy-...behaviour patterns or segmentation opportunities; I’ll use those cases to judge fit and approach. At project hand-off I expect: • A concise technical outline of the pipeline you implemented • Interactive visualisations or dashboards that explain the key insights • A short, plain-language brief summarising recommended next steps for the business Familiarity with distributed processing (Hadoop, Spark), cloud storage, and common analytics tools (Python, SQL, Tableau or similar) will help yo...
I have an Azure Databricks SQL warehouse that needs to read from an external Hive metastore hosted on HDFS via its Thrift service. The moment the warehouse tries to reach that store, the job fails with an “authentication failure” coming back from HDFS. I am using a token-based authentication flow rather than Kerberos or a simple username/password combination, and I suspect the problem lies either in how the token is passed or in mismatched Hadoop core-si...
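Because mismatched Hadoop configuration is one of the suspected causes, a quick way to narrow it down is to diff the properties the client side sees against the cluster's copy. The sketch below parses two `*-site.xml` documents with Python's standard library and reports differing keys; the sample values are hypothetical, and it does not inspect how the token itself is passed.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpts; in practice read the real files from each side.
CLUSTER_CORE_SITE = """<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://nn1:8020</value></property>
  <property><name>hadoop.rpc.protection</name><value>privacy</value></property>
</configuration>"""

CLIENT_CORE_SITE = """<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://nn1:8020</value></property>
  <property><name>hadoop.rpc.protection</name><value>authentication</value></property>
</configuration>"""


def load_properties(xml_text):
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def diff_properties(a, b):
    """Return {name: (value_in_a, value_in_b)} for every key whose values differ."""
    keys = set(a) | set(b)
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}


cluster = load_properties(CLUSTER_CORE_SITE)
client = load_properties(CLIENT_CORE_SITE)
for name, (on_cluster, on_client) in diff_properties(cluster, client).items():
    print(f"{name}: cluster={on_cluster!r} client={on_client!r}")
```

A key present on only one side shows up with `None` for the other, which helps spot settings the client never received.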
...matches ranked by relevance by default, with an optional toggle to sort by date. • Narrow results by file type so they can quickly focus on just PDFs, DOCX files, or TXT notes. A lightweight web interface or a small REST API is fine—whichever you feel will get the fastest, most reliable response times. I am comfortable provisioning a Linux server, so feel free to lean on Elasticsearch, Apache Lucene/Solr, or another open-source stack you trust; just outline why you picked it and any helper libraries (for example, Tika for document parsing) in your proposal. Deliverables 1. Source code and setup script/container so I can deploy with a single command. 2. Clear README covering prerequisites, indexing instructions, and how to enable the sort/filter controls. 3. A...
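The relevance-by-default ordering, the optional date toggle and the file-type filter can be sketched as a small post-processing step over the hits the engine returns. In Elasticsearch or Solr these would normally be pushed into the query itself (a filter plus a sort clause) for speed; the `Hit` fields and the `arrange` helper here are hypothetical and only illustrate the intended behaviour.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Hit:
    path: str
    score: float      # relevance score returned by the search engine
    modified: date
    file_type: str    # e.g. "pdf", "docx", "txt"


def arrange(hits, sort_by="relevance", file_types=None):
    """Filter hits by file type, then order by relevance (default) or newest date."""
    if file_types:
        wanted = {t.lower() for t in file_types}
        hits = [h for h in hits if h.file_type.lower() in wanted]
    if sort_by == "date":
        return sorted(hits, key=lambda h: h.modified, reverse=True)
    return sorted(hits, key=lambda h: h.score, reverse=True)


# Hypothetical hits.
hits = [
    Hit("notes/a.txt", 2.1, date(2024, 1, 5), "txt"),
    Hit("docs/b.pdf", 3.4, date(2023, 11, 2), "pdf"),
    Hit("docs/c.docx", 1.7, date(2024, 2, 20), "docx"),
]
print([h.path for h in arrange(hits)])
print([h.path for h in arrange(hits, sort_by="date", file_types=["PDF", "docx"])])
```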
...predictive modeling.
6. Deep Learning: Neural Networks, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN). Why: For advanced pattern recognition, image, text, and sequence modeling.
7. Specialization: Natural Language Processing (NLP), Computer Vision, Social Network Analysis. Why: Focus on a niche in data science for higher career impact.
8. Big Data & Distributed Computing: Hadoop, Spark. Why: Handling very large datasets efficiently in real-world scenarios.
The work will be structured around weekly video sessions, targeted exercises and project critiques. On the math side, I need you to revisit core probability, statistics, linear algebra and any calculus concepts that routinely appear in data-science screening tests. Coding practice must revolve aro...
I have a Hadoop cluster holding several large data sets, and I need a seasoned PySpark developer who also writes rock-solid SQL. The immediate aim is to connect to the cluster (YARN/HDFS with Hive metastore), develop or refine PySpark jobs, optimise the accompanying SQL, and make sure everything runs smoothly end-to-end. You’ll receive access to a staging namespace plus a sample of the data. Once the logic checks out we’ll promote the code to the full environment. Deliverables • A clean, well-commented PySpark notebook or .py job that executes successfully on the cluster • The corresponding SQL script or view definitions ready for Hive or spark-sql • A concise README detailing execution steps, parameters, and expected outputs Acceptance criteria...
- **Core Architecture:** Spring Cloud + Kafka + Hadoop + Python Automation. This project requires a certain level of technical expertise.
- **Operating System:** CentOS 7.x (7.9 recommended)
- **Core Architecture:** Spring Cloud + Kafka + Hadoop + Python Automation
- **Core Package Name:** ``
My current résumé sells me as a data engineer, yet my next move is a Data Analyst role. I need the Work Experience and Skills sections re-worked so recruiters immediately see me as a strong analytical hire. Here’s what you’ll be working with • Hands-on background in Hadoop administration, PySpark development, Databricks workflows and day-to-day data analysis. • A solid foundation in SQL and reporting tools, though these strengths are not highlighted well in the document. What I’m after • Rewrite both sections to spotlight analytical impact, business-friendly storytelling and in-demand keywords (think SQL, dashboards, data visualization, statistical insight, KPI tracking, etc.). • Re-order bullet points around results, not...
...tells the engineer: • Performance > fancy tech • No rebuild madness • Practical results expected ⸻ Current Tech Stack (must be explicit) Frontend • React (web) Backend / Infra • Firebase Firestore • Firebase Auth • Firebase Cloud Functions • Firebase Storage (if any) Expected Add-on (optional, NOT mandatory) • PostgreSQL (Supabase / Cloud SQL), only if justified. Snowflake, Kafka, Hadoop NOT required at this stage. ⸻ Scale Assumptions (anchor the design) The engineer must assume: • 2–2.5 lakh (200,000–250,000) total users • 40–60k daily active users • High read/write operations during school hours • Multiple schools, each with: • Students • Teachers • Batches • Attendance ...
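The scale assumptions translate into a rough peak-load figure with simple arithmetic. Only the 60k upper bound for daily active users comes from the brief; the reads per user, the six-hour school window and the 3x burst factor below are hypothetical placeholders to be replaced with measured values.

```python
def peak_ops_per_second(daily_active_users, ops_per_user, window_hours, peak_factor):
    """Average operations/second over the busy window, scaled by a burst multiplier."""
    average = daily_active_users * ops_per_user / (window_hours * 3600)
    return average * peak_factor


# Hypothetical: 60k DAU, 200 Firestore reads per user per day, all inside a
# 6-hour school day, with a 3x burst (e.g. when attendance is taken).
print(round(peak_ops_per_second(60_000, 200, 6, 3)))  # 1667
```

A number in this range is useful mainly for checking the design against the datastore's documented per-database and per-document write limits.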
...reproducible write-up that shows exactly how you installed Hadoop on a Linux virtual machine and then proved it works with a small demo program (a classic word-count example is fine). The main deliverable is a Word document (DOCX) that walks me through each command, configuration change, and verification step from a clean OS all the way to running the MapReduce job successfully. Screenshots or terminal snippets inside the doc are welcome so I can follow along without guessing. Please base the walkthrough on the current, stable Hadoop release you are comfortable with; just call out the version in the guide so I can download the same binaries. Assume a vanilla Linux environment with standard package managers available and no prior Hadoop components installed. To rou...
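Besides the Java word-count example that ships with Hadoop, a Python version can run through Hadoop Streaming, where the mapper and reducer read and write tab-separated lines on stdin/stdout. The sketch below keeps both steps as functions and simulates the shuffle with `sorted()` so the logic can be checked locally before submitting; the function names are hypothetical.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit 'word<TAB>1' for each word, as a streaming mapper writes to stdout."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum counts per word. Hadoop sorts mapper output by key before this step."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Local simulation of map -> shuffle/sort -> reduce.
text = ["the quick brown fox", "the lazy dog", "The end"]
for out in reduce_lines(sorted(map_lines(text))):
    print(out)
```

On the cluster the two functions would live in separate `mapper.py` and `reducer.py` scripts passed to the streaming jar with `-files`, `-mapper`, `-reducer`, `-input` and `-output`; the jar's exact path varies by Hadoop release and distribution.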
...field rely on a mix of technical software and programming languages: Programming: SQL for database querying, and Python or R for complex statistical modeling and manipulation. Visualization: Tableau and Power BI for creating dynamic visual reports. Spreadsheets: Microsoft Excel remains a fundamental tool for quick calculations and basic data cleaning. Big Data: Frameworks like Apache Spark and Hadoop are used to process massive, complex datasets. 4. Career Outlook in 2025 The World Economic Forum's Future of Jobs Report 2025 highlights data analysts and scientists as one of the fastest-growing job categories. Entry-level roles, such as Junior Data Analyst, often start with salaries around $74,000 in the US, with significant growth potential as professionals specialize in a...
...how customers across Latin America actually buy. The primary aim is to analyze customer behavior, zeroing in on purchase patterns. For phase one the only source on hand is a wide set of structured and unstructured customer surveys gathered in Spanish and Portuguese markets. The job is to clean, standardize, and load these surveys into an environment that supports high-volume processing (Spark, Hadoop, or an equivalent cloud stack you favour). Once the data is stable, I want exploratory analysis, clustering, and predictive modelling that highlight: • Which demographic segments purchase which categories most frequently • Seasonal or regional spikes in demand • Correlations between stated preferences and actual spend reported in the surveys Tableau, Power ...
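Standardizing Spanish and Portuguese answers usually starts with normalizing case, accents and whitespace so that variants such as "Electrónica" and "electronica" compare equal. The sketch below uses Python's `unicodedata`; the `CATEGORY_MAP` lookup and its labels are hypothetical examples of mapping both languages onto one category code.

```python
import unicodedata

# Hypothetical mapping from normalized Spanish/Portuguese labels to one category code.
CATEGORY_MAP = {
    "electronica": "electronics",
    "eletronicos": "electronics",
    "ropa": "apparel",
    "roupas": "apparel",
}


def normalize_answer(text):
    """Lowercase, strip accents, and collapse whitespace in a survey answer."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def standardize_category(raw):
    """Map a raw answer to a shared category code, falling back to the normalized text."""
    key = normalize_answer(raw)
    return CATEGORY_MAP.get(key, key)


print(normalize_answer("  Eletrônicos  e CAFÉ "))  # eletronicos e cafe
print(standardize_category("Electrónica"))          # electronics
```

Stripping accents is lossy (Spanish "año" and "ano" collapse to the same string), so it is safer for category matching than for free-text analysis.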
Scope & Objectives You will be responsible for: • Designi...FluentMigrator • Distributed caching with Redis • Authentication & authorization with ASP.NET Core Identity and Azure Active Directory • HTTP security headers and security configuration • Application & tenant settings • Lucene-based indexing and search • Implementing initial core modules: • Setup Module • Tenant Management Module (host/default tenant only) • Identity Module (ASP.NET Core Identity) • External Authentication Module (Azure Active Directory) • Redis Distributed Cache Module • Resources Module (CSS & JS management) • Security Module (HTTP security headers) • Settings Module • Lucene Indexing Module • Prepa...
Roles and Responsibilities - Lead the design and development of AI and ML applications across cloud or on-prem environments - Build, maintain, and optimi...AI/ML engineering, including 3+ years in MLOps - Strong hands-on experience with AWS SageMaker, Databricks, and CI/CD tools - Proficiency in Python; experience with R or SAS is an advantage - Experience deploying, monitoring, and optimizing ML models in production environments - Knowledge of Docker, Kubernetes, REST APIs, and JSON processing - Familiarity with big-data ecosystems such as Spark or Hadoop - Solid understanding of ETL processes, data modeling, and data engineering practices - Experience with cloud platforms (AWS, Azure, or GCP) and ML-focused architectures - Strong version control, dependency management, and automat...
...sql, Hadoop, Spark and Kafka • Rigorous case-study-style problem solving, mirroring the whiteboard or live-coding format used by most FAANG-scale companies. Each session should pair concise theory refreshers with hands-on exercises such as cluster configuration walk-throughs, optimisation scenarios, streaming pipeline design and end-to-end data-flow troubleshooting. I also want timed mock interviews where you fire real questions, then give immediate, detailed feedback on clarity, depth and trade-off analysis. I can handle pre-work between meetings, so feel free to assign take-home challenges. If you believe touching briefly on Redshift/Snowflake architecture or tightening my Python-SQL idioms will strengthen my narrative, I’m open to short detours, but the mai...
...Pipeline Project Overview This term project involves building an end-to-end data pipeline using big-data tools and streaming technologies. The system will ingest data, process it in real-time with machine learning algorithms, and store analysis results for visualization. Key components proposed include Apache Kafka for data streaming, Apache Spark (Streaming) for real-time processing, Apache Hive (or Hadoop HDFS) for data warehousing, and MongoDB for storing processed results. All code will likely be written in a high-level language (such as Python via PySpark) to integrate these components. Below, we break down the project requirements and plan into specific sections. Big Data Tools and Frameworks: The pipeline will leverage the following technologies: • Apache Kafka: Kafka...
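Real-time processing in Spark Streaming is commonly expressed as windowed aggregations over the Kafka event stream. As a plain-Python illustration of the idea (not Spark code), the sketch below counts events per key in fixed, non-overlapping (tumbling) windows; the event tuples and the 60-second window are hypothetical.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) using fixed, non-overlapping windows."""
    counts = Counter()
    for ts, key in events:
        window_start = ts - ts % window_seconds  # align timestamp to its window
        counts[(window_start, key)] += 1
    return dict(sorted(counts.items()))


# Hypothetical (epoch_seconds, event_type) pairs as they might arrive from Kafka.
events = [(0, "click"), (12, "view"), (59, "click"), (61, "click"), (125, "view")]
print(tumbling_window_counts(events, 60))
# {(0, 'click'): 2, (0, 'view'): 1, (60, 'click'): 1, (120, 'view'): 1}
```

In PySpark Structured Streaming the equivalent is roughly `df.groupBy(window(col("timestamp"), "1 minute"), col("key")).count()`, usually paired with `withWatermark` to bound how late events may arrive.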
...and resolve performance or integration issues in real time. Guide candidates through daily deliverables and project requirements. Ensure quality, accuracy, and timely completion of all assigned tasks. Technical Expertise Required: Strong proficiency in: Python, SQL, and Data Modeling. ETL Tools: Airflow, Informatica, AWS Glue, Azure Data Factory, or equivalent. Big Data Technologies: Spark, Hadoop, Hive. Cloud Platforms: AWS, Azure, or GCP (preferably with Redshift, Snowflake, or Databricks experience). Additional Skills: CI/CD pipelines, Git, data governance, performance optimization. Qualifications: 10+ years of experience in Data Engineering, ETL Development, or Data Pipeline Architecture. Strong background in mentoring or technical work support preferred. Excellent communica...
...Offer guidance on data architecture, best practices, and real-time project execution. Technical Skills Required (any of the following): Programming: Python, SQL, PySpark ETL Tools: Apache Airflow, Talend, Informatica, or similar Cloud Platforms: AWS (Glue, Redshift, S3), Azure (Data Factory, Synapse), or GCP (BigQuery) Databases: PostgreSQL, Snowflake, MySQL, MongoDB Big Data Technologies: Hadoop, Spark, Databricks (preferred) Version Control / CI-CD: Git, Jenkins Ideal Candidate: Has 10+ years of experience in data engineering and related technologies. Strong in troubleshooting, architecture design, and real-time project handling. Flexible for US shift hours if required. Excellent communication and client interaction skills. Note: This is not a training program. On...
...program that spans Cloud Computing (AWS, Azure, GCP), Blockchain, and Big Data Analytics. My goal is to cover everything on the syllabus: IaaS/PaaS/SaaS, serverless architectures, hybrid cloud patterns, distributed ledgers, smart contracts, DeFi use cases, Hadoop, Spark, data lakes, ETL pipelines, real-time analytics, microservices, observability, fault tolerance, and more. Because I selected “All courses” in every category, I’m looking for versatile tech tutors who can teach the full stack rather than just a single niche. For Hadoop / Spark specifically, the need is hands-on guidance: environment setup, performance tuning, and practical data processing labs. What I’d like from you is a blend of live online sessions and well-structured l...
Seeking a skilled data scientist or engineer to create a database of millions of influencers across platf...storing large datasets (100M+ records) with tools like Python, SQL/NoSQL, or cloud services (AWS/GCP). Generating analytics (e.g., engagement scores, audience insights) using data science techniques. Building a scalable prototype database with sample search functionality. Requirements: Experience in web scraping (Scrapy, Selenium) and social media APIs. Proficiency in Python, big data tools (Spark, Hadoop), and databases (MongoDB, PostgreSQL). Portfolio of similar projects. Preferred Skills: Data Science, Web Scraping, API Integration, Machine Learning, Database Management, Cloud Computing. NOTE: If you're a bot, your bid will be ignored and trashed. Unique bids and mess...
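The listing does not define its engagement score; one common convention is average interactions per post divided by follower count. The sketch below adopts that definition as an assumption, with hypothetical parameter names, so the metric can be computed consistently across platforms.

```python
def engagement_rate(likes, comments, followers, posts):
    """Percent engagement: average (likes + comments) per post relative to followers.

    This is one common convention, assumed here; platforms and vendors define it differently.
    """
    if followers <= 0 or posts <= 0:
        return 0.0
    # Multiply before dividing to keep integer inputs exact where possible.
    return (likes + comments) * 100 / (posts * followers)


# Hypothetical profile: 10 posts, 4,500 likes and 500 comments in total, 10,000 followers.
print(engagement_rate(likes=4500, comments=500, followers=10_000, posts=10))  # 5.0
```

Guarding against zero followers or posts matters at 100M+ records, where empty or newly created profiles are common.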
...designing a concise research plan, identifying suitable datasets, and outlining the analytical workflow. Think of it as a focused blueprint that lets me move from idea to executable study without guesswork. • What I value. A freelancer who is comfortable with statistical thinking, qualitative and quantitative research methods, and at least one big-data toolset such as Python + Pandas, R, Spark, or Hadoop. The ability to translate complex concepts into a clear, step-by-step approach is essential. • Deliverables. By the end of the engagement I expect: – A written methodology document (2-4 pages) covering data acquisition, preprocessing, and chosen analytical techniques. – A brief outline of validation or testing steps to ensure robustness. – ...
I have a sizeable dataset and want to see it come to life through clear, interactive visuals. I’m flexible on the platform—Apache Hadoop, Apache Spark, Amazon Redshift, or another big-data stack you trust—so long as it can handle large volumes smoothly. Here’s what I need at this stage: • Connect to the source data I’ll provide privately. • Build a lightweight pipeline that produces one or two sample dashboards or charts illustrating key insights. • Write a brief step-by-step note so I can rerun or extend the workflow later. Think of this as an initial concept rather than a full production system; speed and clarity matter more than exhaustive coverage right now. Let me know which tool you’ll use, how quickly you can turn aroun...
...JPA/Hibernate, Spring Boot and a working understanding of Hadoop (MapReduce). You will be involved in building and optimizing data-driven applications, integrating distributed data processing with modern Java frameworks. Key Responsibilities Design and develop backend components using Java & Spring Boot Implement data-processing workflows using Hadoop (MapReduce) Create and manage JPA entity classes, relationships, and database transactions Apply dependency injection, manage application scopes, and optimize performance Collaborate with frontend and data-engineering teams to deliver scalable solutions Required Skills Strong proficiency in Core Java and Object-Oriented Design Hands-on with JPA / Hibernate / Spring Data JPA Good knowledge of Hadoop / MapR...
...might involve spinning up a Spark cluster (Hadoop is also in play for certain batch jobs), writing production-ready Python and SQL, then pushing the results into a downstream model. Beyond moving data, I expect you to carry the torch into exploratory analysis and model development: classic time-series forecasting, supervised machine-learning workflows, deeper neural nets, all the way through to experiments with large language models when the use-case warrants it. R is on our stack for ad-hoc statistical work, so familiarity there is a plus. Deliverables I will review for acceptance: • Robust, version-controlled ETL scripts/notebooks with clear logging and error handling • Automated scheduling (Airflow or similar) and resource-optimized Spark/Hadoop jobs &b...
Hypothetical AI, quantum AI, AGI (artificial general intelligence), generative AI, multi-agent researcher AI and classical AI-powered virtual novel drug discovery and development. Software support: data, data formats, databases, Da...Quantum & AGI Tools Qiskit (IBM) → Quantum AI / drug simulation PennyLane (Xanadu) → Quantum machine learning OpenAI Gym / PettingZoo → Multi-agent reinforcement learning AGI Frameworks (NARS, OpenCog Hyperon, Leela AI) → AGI experimentation Data, Database & Integration SQL (MySQL, PostgreSQL, SQLite) → Structured data storage NoSQL (MongoDB, Cassandra) → Unstructured + big data Apache Spark / Hadoop → Data mining & large-scale processing Pandas / NumPy → Data manipulation & transformation ETL Tools...
Hello,
We are looking for an experienced trainer to deliver a short-term training project on "Hadoop Administration".
Responsibilities:
- Conduct focused training sessions on Hadoop Administration
- Create or adapt training material as required
- Provide hands-on lab guidance (if applicable)
Requirements:
- Proven experience in Hadoop Administration
- Prior corporate training experience preferred
- Ability to deliver training effectively within a short-term timeline
To apply, please share:
- Updated CV / Profile
- Course contents (TOC)
- Daily / Hourly commercial rates
- Lab availability & charges (if applicable)
- Your availability schedule
Looking forward to collaborating with the right expert.
Best regards,
Anjali
Koenig Solutions
...beginner-focused Data Science program and need it steered by someone with real-world experience in Machine Learning, Data Analysis and Big Data. The aim is to take absolute newcomers from their first line of Python to completing mini-projects that mirror what happens on the job, mixing clear theory blocks with hands-on coding in Jupyter, pandas, scikit-learn and, when we reach scale topics, Spark or Hadoop. Most of our trainees work in system administration. The format is live online sessions (about 4–6 hours a week), backed up by recordings, practical notebooks and weekly assignments that you grade and discuss in feedback clinics. I’ll supply the virtual classroom and handle enrolment; you handle the teaching and mentoring touchpoints that keep motivati...
...social media, IoT devices, sensors, clickstreams, etc. Tools: Apache Flume, Kafka, Sqoop (for importing from databases). --- 2. Data Storage Big Data needs distributed, fault-tolerant storage (not just normal databases). Options: HDFS (Hadoop Distributed File System) – stores data across many machines. NoSQL Databases – MongoDB, Cassandra, HBase. Cloud Storage – AWS S3, Google Cloud Storage, Azure Data Lake. --- 3. Data Processing Once stored, data must be processed (batch or real-time). Batch Processing (large chunks at once): Hadoop MapReduce Apache Spark (faster, in-memory processing) Stream Processing (real-time, continuous): Apache Kafka + Spark Streaming Apache Flink / Storm --- 4. Data Analysis Use algorithms & ML ...
...interpret data and recommend strategies. Requirements: - Proven experience managing data analytics projects for eCommerce or digital marketing agencies. - Expertise in SQL, Excel, R, SAS, and data visualization tools (Google Data Studio, Tableau, Power BI). - Data management: Experience with database systems (SQL, NoSQL), data warehouses (e.g., Teradata, Snowflake), and big data tools (e.g., Hadoop, Spark). - Cloud platforms: Familiarity with cloud services from providers like AWS, Google Cloud, and Microsoft Azure. - Advanced analytics: Knowledge of statistical modeling, machine learning, and predictive analytics techniques. - A bachelor's degree in a quantitative field such as computer science, statistics, mathematics, or a related field is typically the minimum requir...
We are looking for vendors who can provide s...our team. The ideal candidate will have experience in designing, building, and optimizing scalable data pipelines and architectures. You will work closely with data scientists, analysts, and application developers to ensure efficient data flow, reliability, and availability across systems. Key Skills Required: Strong proficiency in SQL, Python/Scala Hands-on experience with Big Data frameworks (Hadoop, Spark, Kafka, etc.) Experience in ETL pipeline design and optimization Knowledge of data warehousing (Snowflake, Redshift, BigQuery, etc.) Familiarity with cloud platforms (AWS/GCP/Azure) Strong problem-solving and debugging skills If you have suitable candidates available, please share their profiles along with engagement models and co...
I'm looking for an experienced Hadoop Big Data Developer skilled in Scala and PySpark. You will work with: - Raw zone containing nested JSON and XML data. - Managed zone where data needs to be stored in JSON or Parquet format, creating Hive tables per client requirements. Ideal Skills and Experience: - Proficiency in Hadoop ecosystem - Strong expertise in Scala and PySpark - Experience with Hive and data transformation - Ability to work with nested JSON and XML
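For the XML half of the raw zone, a first step toward the managed zone is flattening each record element into a dict of path-named columns, much like nested JSON. This standard-library sketch assumes one hypothetical `<client>` element per record and that sibling tag names are unique; repeated children would need to be collected into lists instead. Writing the rows out as JSON or Parquet and defining the Hive tables would follow in Spark.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw-zone document.
SAMPLE = """<clients>
  <client id="42">
    <name>Acme</name>
    <address>
      <city>Lyon</city>
      <zip>69001</zip>
    </address>
  </client>
</clients>"""


def _walk(elem, prefix, row):
    """Copy leaf-element text into `row`, keyed by the dotted path from the record root."""
    for child in elem:
        path = f"{prefix}.{child.tag}" if prefix else child.tag
        if len(child):  # element has children: descend
            _walk(child, path, row)
        else:
            row[path] = (child.text or "").strip()


def xml_to_rows(xml_text, record_tag):
    """Turn each <record_tag> element into a flat dict; its attributes become columns too."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter(record_tag):
        row = dict(record.attrib)
        _walk(record, "", row)
        rows.append(row)
    return rows


print(xml_to_rows(SAMPLE, "client"))
# [{'id': '42', 'name': 'Acme', 'address.city': 'Lyon', 'address.zip': '69001'}]
```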
...delete) Document types or metadata tags Workflow status or name Inactivity thresholds (e.g., users inactive > 30 days) Implement secure, role-based access to all reporting interfaces. Deliver clear CSV export capability for all report views. Provide documentation and test data for validation. Technical Requirements Must-Have Skills: Alfresco Community Edition 5.2 (deep experience) REST API, CMIS, Lucene/SOLR queries Java or JavaScript backend development (for custom repo modules) CSV export implementation Alfresco content model and metadata handling Alfresco Audit Subsystem configuration and parsing Nice-to-Have: Experience with: Alflytics JasperReports or Pentaho integration Alfresco Share Dashlet/UI development Familiarity with extracting and storing audit data to external...
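The inactivity report and CSV export can be illustrated independently of Alfresco's APIs. The sketch assumes last-activity timestamps per user have already been extracted (for example from audit entries) into a dict; the helper names and sample data are hypothetical, and the 30-day threshold mirrors the example in the brief.

```python
import csv
import io
from datetime import datetime, timedelta


def inactive_users(last_activity, now, threshold_days=30):
    """Return (user, last_seen) pairs, sorted by user, idle longer than the threshold."""
    cutoff = now - timedelta(days=threshold_days)
    return sorted((user, seen) for user, seen in last_activity.items() if seen < cutoff)


def to_csv(rows, header):
    """Serialize rows to CSV text; the csv module handles quoting of commas and quotes."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical last-activity data, e.g. extracted beforehand from audit entries.
last_activity = {
    "alice": datetime(2024, 5, 30),
    "bob": datetime(2024, 4, 1),
    "carol": datetime(2024, 3, 15),
}
report = [(user, seen.date().isoformat())
          for user, seen in inactive_users(last_activity, now=datetime(2024, 6, 1))]
csv_text = to_csv(report, header=["user", "last_activity"])
print(csv_text)
```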
...for Android. Web Apps: HTML5, CSS, JavaScript, React, Angular, or Vue.js. Media Encoding & Compression: Audio Codecs: MP3, AAC, Ogg Vorbis, or newer formats like Opus for compressing audio files without significant quality loss. Recommendation & Personalization: Machine Learning & AI: Algorithms for personalized playlists, recommendations, and user behavior analysis. Data Analytics: Tools like Spark, Hadoop, or dedicated analytics platforms. Digital Rights Management (DRM): Technologies to protect copyrighted content, such as Widevine, PlayReady, or FairPlay. Payment & Subscription Management: Integration with payment gateways and subscription billing systems. Security: Authentication and authorization protocols like OAuth 2.0, SSL/TLS encryption, and secure token s...
Senior Data Engineer. In this role, you’ll play a key part in designing, building, and optimizing large-scale data processing systems that pow...3 years of hands-on experience with Databricks. • Strong SQL skills and experience with relational and NoSQL databases (e.g., MySQL, PostgreSQL). • Proven experience with SAS migration projects. • Proficiency in programming languages such as Python, Scala, or Java. • Experience with ETL frameworks and orchestration tools. • Familiarity with big data technologies (e.g., Apache Spark, Kafka, Hadoop) and major cloud platforms (AWS, GCP, or Azure). • Solid understanding of data warehousing, data modeling, and schema design. • Excellent problem-solving skills and attention to detail. • Strong...
I need an expert to perform spatial queries on a MySQL database, specifically for polygon data. Key Requirements: - Proficiency in MySQL, Apache Spark - Experience with spatial data and queries - Knowledge of polygon data handling - Data manipulation skills Familiarity with the following technologies is a plus: - Apache Spark 3.4.1 - Spark SQL 3.4.1 - Scala 2.13 - Java 17.0.8 - Hadoop 3.3.6 - Python 3.10 - SBT 1.9.4
I need an expert to perform spatial queries on a database u... Essential tasks include: - Writing and optimizing spatial queries - Working with polygon data Ideal skills and experience: - Proficiency in MySQL - Experience with spatial data and queries - Knowledge of polygon data handling ● Apache Spark 3.4.1 ● Spark SQL 3.4.1 ● Scala 2.13 ● Java 17.0.8 ● Hadoop 3.3.6 ● Python 3.10 ● SBT 1.9.4 Please provide relevant work experience in your bids.
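Polygon queries ultimately rest on a point-in-polygon test. Inside MySQL this is what spatial functions such as `ST_Contains` provide, ideally backed by a SPATIAL index; the plain-Python ray-casting sketch below only shows the underlying idea, for example to spot-check query results in a Spark or Python job. Points lying exactly on an edge are not handled consistently by this simple version.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: toggle 'inside' for each polygon edge crossed by a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        if (y1 > y) != (y2 > y):       # edge straddles the horizontal line through y
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon(2, 2, square))  # True
print(point_in_polygon(5, 2, square))  # False
```

The straddle check guarantees `y2 != y1` before dividing, so horizontal edges never cause a division by zero.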
...model, optimize large-scale data pipelines, or query massive datasets — I can deliver solutions tailored to your needs. What I Can Do for You: AI & Machine Learning Deep Learning (CNN, LSTM, Transformer) Image & Text Processing (NLP, OCR, Captioning) Model training, tuning, and evaluation TensorFlow, PyTorch, scikit-learn Big Data Engineering ETL pipelines with Apache Spark Hive, HBase, Hadoop ecosystem Data warehouse optimization Stream & batch data processing Databases SQL (PostgreSQL, MySQL) NoSQL (MongoDB, Neo4j, Cassandra) Graph databases & complex queries Development Tools Python scripting & automation REST APIs, FastAPI, Flask Git, Docker, Linux, Jupyter...