Find Me on the Moon: NASA Lunar Navigation Challenge Winners Announced
Freelancer announces the winners of the Find Me on the Moon: NASA Lunar Navigation Challenge.
I am looking for a freelancer to assist with the implementation of my graduation project. I already have a clear research idea and an initial proposed methodology, but please note that the methodology is flexible and open to refinement since this is still a proposal an...Writing the graduation thesis and paper You are NOT expected to: • Design a completely new research idea from scratch • Train the model yourself • Write the thesis or academic paper This is an academic project, so clarity, correctness, and reproducibility are very important. Experience in the following is a strong plus: • Deep Learning / PyTorch • Research-oriented implementations • Multimodal models (audio & visual) If you are interested, please share relevant experience...
I am working on a graduation-level academic research project in the area of AI and Computer Vision, specifically related to multimodal media analysis. I am looking for an experienced AI/ML research writer to help write a full academic paper, while I focus on the implementation, experiments, and code development. The research idea, experimental design, and results will be provided privately after selecting the freelancer. The role primarily involves translating technical concepts and experimental findings into clear, publication-quality academic writing. Responsibilities: * Writing all paper sections (Introduction, Related Work, Methodology, Experiments, Results, Discussion, Conclusion) * Structuring the paper according to academic standards * Ensuring originality, clarity, and prope...
Project Overview: We are looking for an experienced AI Automation Specialist to develop advanced multimodal AI agents. The ideal candidate has deep expertise in Google Cloud (Vertex AI/Agent Builder) and/or n8n workflow automation. You will be responsible for building agents capable of processing various data types (text, audio, images). Key Responsibilities: Design and deploy AI agents using Google Cloud Vertex AI (Agent Builder) or n8n. Implement multimodal capabilities (e.g., analyzing medical images, processing voice commands, and handling complex text queries). Integrate agents with external APIs and databases. Ensure workflows are robust, scalable, and secure. Requirements: Proven experience building AI Agents and workflows. Strong knowledge of...
I am building a clinically robust, retrieval-augmented framework that produces structured radiology reports from chest-x-ray images and associated text. Accuracy and clinical relevance drive every design choice, so I want the system to learn equally from both the IU X-ray and MIMIC-CXR datasets. The pipeline I envision looks like this: • Visual encoding with ViT-B16 to obtain global image embeddings. • Retrieval of the top-k similar studies from the training corpus to steer generation toward clinically plausible language and findings. • Text generation with Clinical T5, producing both the “Findings” and “Impression” sections. • Relation-aware validation using RadGraph, with a specific focus on analyzing relationships between clinical enti...
...a single AI agent that becomes the first point of contact for my dealership on every channel customers already use—voice calls, website chat/SMS, and email. The goal is for this agent to greet prospects, answer their questions, book test-drive or service appointments, and handle day-to-day customer service without human intervention unless the inquiry is escalated. Core capabilities I need • Multimodal communication: the same agent must work over Voice, Text/SMS, and Email, preserving context when a customer switches among them. • Full customer-service coverage: technical support, sales inquiries, and general questions about our inventory, financing, or policies. • Appointment setting: real-time scheduling into our existing calendar so customers can lock...
3A Logistics OS – End-to-End ERP, Control Tower & AI Operating System 1. Company Overview 3A International is a multimodal freight forwarding and logistics group in Egypt, operating: Air & sea freight (import/export, FCL/LCL, consolidation) Customs clearance & brokerage Inland multimodal transport (rail, river, road) Terminals, depots, CFS and value-added logistics We are ISO 9001 / 14001 / 45001 certified We want a custom, AI-native ERP / “Logistics Operating System” that becomes the central brain of the company. 2. Project Goal Build a web-based ERP platform that: Centralises all shipments and operations (air, sea, rail, river, road, customs, terminals). Manages customers, partners, carriers, contractors, rates and contracts in one...
...GPU. 2. Captions pass a basic grammar checker with ≥ 95 % accuracy and follow supplied style rules. 3. At least 80 % of generated media assets meet resolution and duration specs for major platforms (Instagram, TikTok, X). 4. Codebase installs from scratch with one command and all tests pass. If this aligns with your skill set, let’s discuss timelines and milestones so we can bring this multimodal content engine to life....
...Expertise in AI and machine learning - Experience with live video processing - Proficiency in mobile app development - Background in computer vision technologies Real-Time Multimodal Vision & Wearable Platform Project Overview: We are building a cutting-edge, real-time "Action-Analysis" platform. The app uses a device’s camera to monitor high-speed activity, provides instant AI-driven verbal/visual verdicts, and allows for retrospective "highlight" clipping. We are moving toward a multi-camera ecosystem involving external hardware and wearable integration. Key Technical Requirements for Initial and Future Developments: Multimodal AI: Implementation of Gemini 2.0 Flash / Live API for real-time video/audio reasoning. Audio/Voice Logic: ...
...the user can upload either a résumé or a job description in PDF or Word format. Your backend should parse the document, identify key skills and context, and instantly generate a tailored set of interview questions. The next step is an AI-powered mock interview, ideally with real-time voice (and, if practical, video) so the system can follow up naturally. After the session finishes, I want a multimodal analysis engine—text, audio and video—to rate performance, uncover sentiment cues, and surface constructive feedback on a dashboard that’s clear and actionable. Deliverables • Fully tested social-login module for Facebook, Google and LinkedIn • Upload component that accepts PDF and Word files and feeds the question generator &...
This project covers preprocessing of a breast cancer mammography dataset strictly following the methodology as discussed. Tasks include lesion cropping using ground-truth masks, image resizing to 224×224, normalization, and augmentation (rotation, flipping). Clinical features will be encoded as one-hot vectors with proper handling of missing data to ensure full compatibility with downstream multimodal fusion models.
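As an illustration of the clinical-feature step above, here is a minimal sketch of one-hot encoding that gives missing values their own indicator column. This is only one possible way to handle missing data, not necessarily the methodology the listing refers to; the feature names (`density`, `laterality`) and the `<missing>` label are hypothetical.

```python
from typing import Dict, List, Optional

MISSING = "<missing>"  # assumed label for absent or unseen values

Record = Dict[str, Optional[str]]


def build_vocab(records: List[Record], features: List[str]) -> Dict[str, List[str]]:
    """Collect the sorted categories per feature and append a missing slot."""
    vocab: Dict[str, List[str]] = {}
    for feature in features:
        seen = sorted({r[feature] for r in records if r.get(feature) is not None})
        vocab[feature] = seen + [MISSING]
    return vocab


def one_hot(record: Record, vocab: Dict[str, List[str]]) -> List[int]:
    """Concatenate one one-hot block per feature, in vocab order."""
    vector: List[int] = []
    for feature, categories in vocab.items():
        value = record.get(feature)
        # Absent values and categories unseen at fit time share the missing slot.
        if value is None or value not in categories:
            value = MISSING
        vector.extend(1 if category == value else 0 for category in categories)
    return vector


# Hypothetical clinical records; None marks a missing entry.
records = [
    {"density": "B", "laterality": "left"},
    {"density": "C", "laterality": None},
    {"density": None, "laterality": "right"},
]
vocab = build_vocab(records, ["density", "laterality"])
vectors = [one_hot(r, vocab) for r in records]
```

Every vector has the same length regardless of which fields are missing, which is what a downstream fusion model would need from its tabular branch.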
Project Overview: I am looking for a freelancer to draft a base research paper that consolidates concepts from a specific project (Causal Multimodal Diagnostic Agent) and several reference IEEE papers. The goal is to create a unified paper that synthesizes the observations, methodologies, and results from the provided materials into a single cohesive document. What Will Be Provided: Main Project Details: Documentation/summary of the "Causal Multimodal Diagnostic Agent" project. Reference Papers: A list of IEEE-standard papers related to the topic. Scope of Work: You are required to: Review: Read the provided project details and the additional reference papers. Synthesize: Combine the observations, methods, and findings from all provided sources. Draft: Write a stru...
Build a high-performance binary classifier using multimodal data: • images • tabular features. The model must incorporate Explainable AI (XAI) in training and use an advanced fusion technique.
I have a half-finished manuscript on MedXpert AI, our multimodal clinical decision assistant, that needs to be transformed into a fully developed research paper. The core emphasis must remain on the system’s technical implementation details, written in a formal academic style with clear sections, solid citations and polished language suitable for submission to a peer-reviewed venue. In parallel, I also need a compact, five-page survey paper that distils and showcases the most innovative features of MedXpert AI. This survey is meant to sit alongside the main article as a quick, literature-backed overview that highlights why our approach is novel compared with existing clinical decision assistants. Deliverables • Finalised technical paper on MedXpert AI’s implemen...
...(SPP profile) between Head Unit and Pocket Unit. Wi-Fi disabled on head unit. • Image Preprocessing: Grayscale conversion and JPEG compression to minimize data size. • Network Logic: 4G/LTE preference. If the signal drops or a timeout (>10 s) occurs, trigger the error vibration immediately. • Target Latency: <5 seconds end-to-end (from capture to audio start). D. Software Architecture • Function: Multimodal Image Analysis. o Instead of local OCR, the system must send the compressed image directly to a Vision-capable Cloud AI (e.g., GPT-4o, Gemini Pro Vision). o This allows the logic of "where to start/stop reading" to be controlled via the prompt based on visual layout and finger position. • AI: Cloud AI supported (API-based). No API keys hardc...
I want a self-contained AI tutor that runs entirely on a Raspberry Pi Zero W. Once installed it should let students ask anything, from world facts to coding techniques, web-design tips, image-gener...and image formats on demand. • Local inference only: TensorFlow Lite, ONNX-runtime, Stable Diffusion-Lite or similar lightweight frameworks are fine, as long as startup scripts and dependencies are provided. Acceptance for hand-over – Ready-to-run model files and optimized weights. – Python (or Bash) launcher that handles user input by voice or text and returns multimodal output. – Example session demonstrating a coding question, an image-based question, and an auto-generated mixed quiz. – Clear setup guide tested on a fresh Raspb...
...upload, retrieval, and Q&A • Integrate functionality into our Angular front end and Laravel backend • Enable the bot to display screenshots, images, or short instructional clips when helpful • Guide us in generating screenshots or visual steps on the fly after learning our application workflow Preferred Skills • Strong experience with RAG pipelines, vector databases, and LLM tuning • Familiarity with multimodal AI (text + images) • Ability to create or guide demonstration clips or step-by-step visuals To Apply Please provide: • Examples of similar AI or RAG projects • A brief outline of how you would approach improving our bot • Your hourly rate or project-based pricing...
...years multi-agent systems Type: Contract ROLE SUMMARY We are seeking a highly experienced Senior AI Engineer to lead the development of production-grade multi-agent AI systems, backend services, LLM orchestration, and full-stack AI-driven product experiences. The ideal candidate possesses deep technical expertise across Python backends, multi-agent workflows, LLM integrations, RAG pipelines, multimodal processing, and frontend engineering. KEY RESPONSIBILITIES ● Design and implement scalable multi-agent architectures: supervisor patterns, orchestrators, shared memory/state, workflow dependencies, checkpointing, retries, and debuggability. ● Build agent-driven coding workflows with hooks, background tasks, and toolchains integrating AI coding tools. ● Develop high-performance Pyth...
I’m building a system that can sense emotional signals in a live conversation (from audio, video and speech) and return a synchronized emotional stream for a weekly podcast. I need one engineer who can build a real-time multimodal pipeline from scratch. The role is hands-on: prototype fast, ship weekly improvements, and make it work end-to-end. This is inference only, not model training. The System (High-Level) The pipeline will: • Capture 2 video feeds from cameras: extract facial/body emotional signals, timestamp frames. • Capture audio input from a dual mic receiver: run emotion model, track tone/tension/stress cues, timestamp stream. • Run Whisper (or similar) in real time: speech-to-text, confidence scores, timestamped text segments. S...
need this done in ONE day 60-Second Fast-Paced Product Demo (Travel Tech Platform) PROJECT OVERVIEW BookSmart24 creates multimodal routes (Train replaces Flight / Train→Plane combos). We need a 60-second horizontal demo showing these functions in a fast, clean, TikTok-paced style – but with a professional, investor-grade look. WHAT YOU WILL DO 1. Record our UI (we give route instructions): – search input – loading animation (train + plane) – SmartChoice results – train→plane combined itinerary – unified checkout 2. Edit the video: – TikTok-style pacing (fast, crisp, smooth) – clean modern transitions – light zooms on UI elements – minimal text overlays – AI voice-over (script provided) – soft te...
...must be reported with standard clinical metrics—AUC, sensitivity, specificity—on a held-out test set. • I need concise documentation so that hospital staff can reproduce the results, plus a short technical report explaining the architecture choices and how the attention maps can be visualised for clinical insight. If you have prior experience with medical imaging, EEG feature engineering, or multimodal transformers, I’d like to see examples. Otherwise, let me know how you plan to tackle regulatory-grade data handling and the small-sample challenges inherent to psychiatric datasets. Deliverables that will mark the job complete: 1. Full, commented source code and environment file. 2. Trained model weights and a reproducible inference notebook. 3. Docu...
Project Title: Causal Multimodal Diagnostic Agent (Medical AI) – Code + Frontend + Research Paper Budget: 8,000 (Fixed Price) Deadline: December 20, 2025 (Strict) Project Overview: I am looking for an experienced AI/ML developer and researcher to build a Causal Multimodal Diagnostic Agent (CMDA). This system must integrate medical imaging (Chest X-rays) and clinical text reports to diagnose diseases, using causal graph learning to eliminate spurious correlations. The project requires delivering a fully working codebase, a basic interactive frontend for testing, and a complete, high-quality research paper suitable for publication. Key Technical Requirements (Based on Project Design) Multimodal Inputs: Image Encoder: ResNet50 or Vision Transformer (ViT) for...
We need a developer to build a fully offline AI companion pipeline that integrates directly with Unity. The system must include a Python function that uses Qwen2.5-VL to generate a clean, one-sentence caption from a base64-encoded screenshot using the correct Hugging Face multimodal workflow. A local RAG component (FAISS or Chroma) should preload our text documents, embed them locally, and retrieve the most relevant chunks using both the scene caption and the player’s question. A final response generator must then combine the caption, the retrieved RAG context, and the player’s query to produce a concise, grounded, one-sentence answer from the AI companion. On the Unity side, we need InputActions for the Q key, screenshot capture and base64 encoding, a UnityWebRequest PO...
I need a skilled expert to implement and publish a CNN and GNN-based fusion model on a multimodal dataset of images and text. The primary goal is to improve classification accuracy. Requirements: - Expertise in CNNs and GNNs - Experience with multimodal datasets - Strong background in image and text data processing - Proven track record in model implementation and publication Please include relevant past work and experience in your application.
We are an AI consulting services company with several potential clients in our sales pipeline. We aim to be the single 'source of truth' for clients when it comes to creating bespoke AI automation strategies and products. For our first client, we are developing a multimodal conversational AI app with real-time chat, secure payments, session-based billing, wallet logic, transcript storage, and strong privacy controls. We already have the frontend foundation; we now need a skilled engineer to harden the backend, integrate payments, and make the application secure, scalable, and deployable. This role requires someone comfortable with Node/Express or FastAPI, secure payment integrations, LLM proxying, and production-grade backend architecture. Here are the core responsibili...
I am looking for an experienced Deep Learning / Medical AI Expert to develop a complete multimodal pipeline for early-stage (prodromal) Parkinson’s disease classification. The project consists of four phases: MRI data curation + 3D CNN imaging-only model; explainability using LRP; clinical feature-based model with attention
We are looking for a technology commercialization and IP licensing specialist to help bring a patent‑pending, multi‑module AI system to market through licensing, joint ventures, or strategic partnerships. The invention introduces a predictive AI engine that anticipates user intent, refines prompts before execution, and delivers real‑time, multimodal results—defining a new class of adaptive AI. The system is patent pending (U.S. Non‑Provisional) and supported by a trademark filing for “IQ Prompt.” Key Modules: Predictive Interaction Engine, Recursive Refinement Core, IQPROMPT Widgets, Predictive Keyboard, and Developer Integration API. Markets: SaaS, AI assistants, workflow automation, and adaptive interface systems. Your Role: Identify markets and potent...
Qualitative research to code 10 short TikTok videos using ATLAS.ti. The task involves watching each video, transcribing spoken and written text, and coding multimodal features (linguistic, visual, auditory, gestural, and spatial modes). The goal is to identify and organize recurrent multimodal patterns across videos and to deliver a completed project file with a summary of the codes and patterns.
I need a very straightforward Wix site built to satisfy an assignment requirement. The focus is on speed and clarity rather than advanced features, so a clean, well-structured template is fine. Scope • Set up a basic multi-page layout (home plus up to three inner pages). • Apply a cohesive color scheme and typ...environments that promote language development and communication should be outlined. A full list of resources used to develop the content of the website must be provided. The website should include: • A homepage introducing the purpose of the site • Clear headings and logical navigation suitable for the audience (families) • Accessible and inclusive language suitable for the audience (families) • Multimodal content • A dedicated “R...
I need a complete n8n workflow that lets end users drag and drop or otherwise upload their own files (PDF, voice recordings, DOCX, Excel, JPEG, and video) and have every item automatically stored, indexed, and ready for Retrieval-Augmented Generation (RAG) queries. The flow should: • Ingest each file type on upload. Appropriate forms should be provided. • Extract the raw content and perform text, image, and video analysis as appropriate. • Capture key metadata, at minimum the original creation date and a concise description of the content, so later prompts can filter or rank results. • Persist the embeddings and metadata in a store that works smoothly with RAG (I’m open to your preferred vector database). • Expose simple n8n n...
We are seeking experienced Math educators and subject matter experts to develop a set of original, high-level question & answer pairs, at least half of which must be multimodal - containing essential visual components. This project supports the evaluation and training of advanced AI systems through the creation of reasoning-intensive, graduate-level problems that are resistant to surface-level AI solutions. Project Overview Our goal is to build a challenging and novel dataset that pushes the boundaries of current AI capabilities in question answering. The questions must test deep conceptual understanding, multi-step reasoning, and problem-solving mathematics. At least 50% of the questions must incorporate visuals (e.g., graphs, diagrams, models) that are essential to solving t...
...supplier credibility, quality standards, pricing, and compliance with international regulations (e.g., customs, tariffs, and ethical sourcing). • Transportation and Logistics Optimization: Analyze and design the most cost-effective shipping routes and methods from China to Jamaica, including sea freight (e.g., via ports like Shanghai or Ningbo to Kingston), air freight for time-sensitive items, or multimodal options. Evaluate factors like transit times, fuel costs, and potential delays to ensure the lowest overall cost. • Cost-Cutting and Profit Maximization Strategies: Develop innovative ways to reduce expenses, such as negotiating bulk shipping rates, leveraging free trade agreements (e.g., under the WTO or bilateral deals), consolidating shipments, using warehousing ...
...genres and evidence. Critical Thinking: Papers should summarize, analyze, and synthesize ideas fairly, show awareness of context and multiple perspectives, and incorporate credible sources. Writing Processes: Documents should reflect multiple drafts, invention strategies, revisions, and collaborative development. Conventions: Proper organization, formatting, grammar, syntax, punctuation, and multimodal awareness are essential. Confidence and Ownership: Writing should show a clear voice, strong style, and the author’s ownership of arguments and perspectives. Deliverables: 1. Edited versions of each individual paper (tracked-changes copy + clean copy). 2. A master portfolio document (print-ready PDF). 3. All working files for future updates. Requirements: Exper...
Deep Learning & Modeling: This includes training the Conditional Progressive Transformer, performing CSLR pre-training, implementing baseline models (DANN/JAN, TCN/RNN), and generating the synthetic continuous corpus. Required skills are PyTorch or TensorFlow, Transformers, ST-GCN, and 3D CNNs. Computer Vision (CV): This covers multimodal data preprocessing for RGB, Depth, and Skeleton streams, extraction of Non-Manual Features (NMFs), and simulation of occlusions. Required skills are OpenCV, MediaPipe (or similar), and Python programming. Statistics & Analysis: Tasks include computing FID and pose-level MSE for the generated corpus, conducting significance testing, calculating bootstrapped confidence intervals, and performing ablation studies. Required skills are Python o...
...TikTok, Facebook and WhatsApp Business channels/chats, or saves them for manual approval. The workflow should be fully automated but include optional checkpoints after text, image and video generation. Background: There is already an automation (code on GitHub) that partially implements this workflow. It needs to be modernized: rather than relying on external cloud models, it should use local multimodal models like Wan 2.2, Hunyuan or other available models. The uploaded documents (“Social Media Devotional Workflows”) define example processes with prompts, LoRAs, sampling settings and social media tips. Tasks / Deliverables Data import: Read devotional data from Excel/CSV (titles, Bible verses, devotional text, English/German verses, positive/negative prompts for ea...
I have an in-house dataset that blends text, video and image content and I am ready to push a GPT-class transformer beyond pure language. What I need is a specialist who already speaks the language of DPO and GRPO and can translate those techniques into a practical training pipeline. The objective is straightforward: take an existing open-weight model, fine-tune it on my multimodal set, and return a checkpoint that outperforms vanilla GPT on the tasks my team actually cares about. You will be free to choose between TensorFlow and PyTorch for the heavy lifting (both are wired into our environment), so long as the final codebase is reproducible on standard CUDA hardware. The data are pre-sharded; you will focus on building the loaders, aligning the modalities, and steering the o...
...moving into data analysis and visualisation with Jupyter, pandas, NumPy, Matplotlib and Seaborn. By the end of this track every attendee should be comfortable cleaning data, building exploratory dashboards, and writing production-ready scripts. Generative AI section The second track focuses on the hands-on use of GenAI. Instead of deep theory, we’ll dive straight into how language, vision and multimodal models solve everyday problems in finance, retail, healthcare and more. Participants will build and deploy small-scale projects with popular frameworks such as OpenAI’s API, Hugging Face Transformers, and LangChain, learning prompt-engineering techniques, fine-tuning workflows, and ethical guardrails along the way. Deliverables • A structured curriculum that...
I need an intermediate-level, causal multimodal agent that can take raw chest X-ray images, link them with the corresponding clinical notes and output a clear diagnosis pipeline. The workflow must cover three core functions: automatic image analysis, robust disease prediction focused on thoracic findings, and the generation of both text and graphical reports (heat-maps, saliency overlays, or similar visual explanations). All processing will involve X-rays only; CT and MRI are outside the project scope. The system must ship with fully commented, runnable source code and deliver reliable, end-to-end results on a small validation set so I can demonstrate functionality immediately. Deliverables • Clean, modular code (Python preferred) that loads chest X-rays, parses the paired ...
...model interprets it, generates a response, and possibly invokes “tools” (e.g. web search, document analysis, plugin APIs) as needed. The system uses safety and moderation filters to prevent disallowed content. In some cases, the app or backend uses “memory” (i.e. context saved across sessions) to remember details you ask it to, so future interactions are more personalized. 3. Multimodal Input and Output Text: You type a prompt and get a text response (classic chat). Voice / Speech: You can speak your question and have ChatGPT speak back (i.e. the app uses speech recognition and text-to-speech). Image Input: You can upload or take photos; ChatGPT can analyze them (e.g. “what’s in this image?”, “read text from thi...
...Installable application (desktop or web) with the full feature set above. 2. Source code and build instructions. 3. A concise user guide that walks through creating both a short film and a short cartoon from mixed media. 4. Post-delivery support for setup issues and a quick patch window for critical bugs discovered in the first month. If you have prior work in AI-driven video editors, multimodal diffusion models, or real-time animation pipelines, highlight it—those skills will be vital for achieving the top-quality finish I’m aiming for....
Objective: Develop a multimodal emotion recognition system that integrates audio, video, and text modalities using advanced deep learning models, cross-modal fusion, and meta-learning techniques (MAML/Reptile). Responsibilities: Implement feature extraction pipelines using pre-trained models: Visual → Vision Transformer (ViT) for facial features Audio → Wav2Vec 2.0 for speech features Text → BERT for contextual embeddings Design and implement a Cross-Modal Transformer with cross-attention for fusion of modalities. Integrate a meta-learning framework (MAML/Reptile) for few-shot adaptation. Preprocess datasets (IEMOCAP, CMU-MOSEI, MELD, etc.) and handle data imbalance. Optimize model for real-time and efficient processing (lightweight models, pruning, frame selection). ...
...AI avatar (voice + synchronized animation). • Real-time interaction (questions to the student + answers). • Basic features: quizzes, progress tracking, dashboard for student and parent. Indicative timeline: 4 months. Full Platform Development After MVP validation, the client plans a dedicated budget for extending the project with all planned features (adaptive learning, advanced gamification, multimodal content, etc.). Request for Quotation The client requests from the provider: • A detailed quotation for the development of the demo and the MVP with an interactive avatar. • An estimated delivery timeline. • Any technological proposals (AI stack, voice, avatar, cloud). Additional Requirement To ensure the right developer or software company is selected,...
...stack (LiveKit, Twilio, WebRTC). Optimize latency, scalability, call flows, and bandwidth for healthcare-grade reliability. Collaborate with data engineers for ML pipelines & real-time processing. Ensure HIPAA-compliant handling of data across AI and telephony systems. Troubleshoot, monitor, and continuously improve AI/voice services in production. Research and implement new trends in GenAI, RAG, multimodal AI, and RTC infrastructure. Requirements: 5+ years in AI/ML engineering or VoIP/RTC engineering (candidates with hybrid skills strongly preferred). Hands-on with: AI/ML Tools: Gemini, GPT-style models, STT/TTS, TensorFlow/PyTorch, LangChain, HuggingFace. Voice/RTC Tools: LiveKit, Twilio APIs, WebRTC, SIP, RTP/RTCP, STUN/TURN servers. Strong coding skills in Python (plus...
...Company financial reports • Economic indicators I will supply sample datasets as CSV, XBRL, and API endpoints; you may suggest additional publicly available feeds if they strengthen performance. Scope • Design the data pipeline, including cleaning, feature extraction, and secure storage. • Fine-tune or train an LLM (e.g., GPT-J, Llama-2, or a comparable open-source model) to handle multimodal numeric-text inputs. • Implement evaluation routines—back-testing for market forecasts, MAPE or SMAPE for revenue projections, and directional accuracy on macro indicators. • Expose the model through a lightweight REST or gRPC service with clear inference examples in Python. • Provide concise documentation covering setup, data refresh, and ...
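The evaluation metrics named in the scope above can be sketched in a few lines. This is a minimal illustration under common textbook definitions, not the client's evaluation routine: the function names are hypothetical, SMAPE uses the |a| + |f| denominator variant, and directional accuracy compares the sign of the actual and forecast change relative to the previous actual value.

```python
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent. Actuals must be non-zero."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric MAPE, in percent; a 0/0 term is counted as zero error."""
    terms = []
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        terms.append(0.0 if denom == 0 else 2.0 * abs(f - a) / denom)
    return 100.0 * sum(terms) / len(terms)


def _sign(x: float) -> int:
    return (x > 0) - (x < 0)


def directional_accuracy(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Share of steps where the forecast moves the same way as the actual series."""
    hits = 0
    steps = len(actual) - 1
    for t in range(1, len(actual)):
        actual_move = _sign(actual[t] - actual[t - 1])
        forecast_move = _sign(forecast[t] - actual[t - 1])
        hits += actual_move == forecast_move
    return hits / steps


# Made-up numbers, for illustration only.
actual = [100.0, 110.0, 105.0, 120.0]
forecast = [102.0, 108.0, 110.0, 115.0]
scores = (mape(actual, forecast), smape(actual, forecast),
          directional_accuracy(actual, forecast))
```

MAPE penalises errors on small actual values heavily and breaks on zeros, which is one reason a revenue-projection task might report SMAPE alongside it.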
...We are looking to leverage the power of Google's Gemini AI to create an intelligent automation and data analysis layer on top of our QuickBooks Online account. This project is central to our strategy to increase operational efficiency and unlock data-driven insights. The project has two primary, interconnected components: Automated Invoice Processing: An end-to-end workflow that uses Gemini's multimodal capabilities to scan vendor invoices (PDFs/images), extract data, and automatically create corresponding Bills in QuickBooks. Natural Language Data Analysis: A backend system that allows us to ask plain-English questions about our financial data (e.g., "What were our top 5 expenses last quarter?") and receive intelligent, summarized answers powered by Gemini an...
... LLaMA 3, Mistral). Implement effective prompt strategies and retrieval-augmented generation (RAG) pipelines for contextual responses. Data Pipelines & Knowledge Management: Build secure data pipelines to ingest, embed, and serve tenant-specific knowledge bases (FAQs, scripts, product docs) using vector databases (e.g., Pinecone, Weaviate). Voice & Text Interfaces: Implement and optimize multimodal agents (text + voice) using ASR (e.g., Whisper), TTS (e.g., Polly), and NLP for automated qualification and call handling. Conversational Flow Orchestration: Design dynamic, stateful conversations that can take actions (e.g., book meetings, update CRM records) using tools like LangChain, Temporal, or n8n. Platform Scalability: Ensure models and agent workflows scale acr...
A multi-agent investment decision-making framework that includes analysts, fund managers, and risk controllers throughout the entire process. The framework simulates an efficient, rational, and disciplined research organization. By analyzing K-line charts, technical indicators, news texts, financial reports in PDF format, research reports, and other multimodal information, it generates investment recommendations and provides technical support for investment decisions. It can quickly generate comprehensive investment analysis reports that include research perspectives from the different agent types together with a range of quantitative indicators.