Speech synthesis jobs
...Responsibilities The freelancer will help complete and finalize the Android version of the application. Tasks include: Android Build & Environment Troubleshoot Android builds in Android Studio Resolve Gradle or dependency conflicts Configure Capacitor Android integration Fix any build or runtime errors App Functionality Assist with troubleshooting Android-specific issues such as: Text-to-Speech announcer functionality Avatar rendering or loading In-app purchase validation Device compatibility issues Google Play Store Preparation Configure Google Play Console integration Set up service account / API keys Verify AAB bundle signing Assist with internal testing deployment Testing Ensure app works correctly across Android devices Fix crashes or UI inconsistenc...
This project is about ...the Desk Hub, a compact, high‑end box that lives on a desk like a small instrument. It has a small front display, a few physical buttons, a subtle light ring or LED bar, a microphone, and a speaker. Internally it runs on an ESP32‑class microcontroller with Wi‑Fi and Bluetooth. The Hub is always powered via USB‑C and always connected to my cloud backend, where all speech‑to‑text, language model processing, tools, and text‑to‑speech live. From the user’s perspective, they can press a button or say the wake word (for example “Hey Assistant”), speak naturally, and hear/see the assistant’s response. The screen shows short status messages like “Listening”, “Thinking”, “Reply sent to email”, or simp...
I have a text-only manuscript in PDF format that must become fully accessible. The top priority is seamless text-to-speech support: every heading, paragraph, footnote and page break needs to be properly tagged so a screen-reader voices the content in the correct order without odd pauses or repeated characters. Because many readers enlarge text, the file also has to accommodate adjustable text size; when someone zooms to 200 % or higher the layout should reflow cleanly, with no need for horizontal scrolling. No images, tables or equations appear in the document, so your attention can stay on accurate structural tagging, embedded fonts and metadata. Deliverables • A single PDF/UA-compliant, fully tagged PDF that passes Adobe Acrobat Accessibility Checker and PAC 3 • T...
Integrated Treatment Services is a UK Speech and Language Therapy service working with schools, therapists and families across the country. We are seeking a Digital Marketing & Content Assistant to support the publishing and organisation of our marketing and website content. The Director regularly develops ideas, course content and marketing campaigns using tools such as ChatGPT and other AI tools. Your role will be to turn this content into polished, published material across our website, email campaigns and social media platforms. This role focuses on implementation and organisation, rather than marketing strategy. Hours Approximately 4–8 hours per month - open to UK based freelancers only due to the aspect of the role and safeguarding content. Workload may occasional...
We are looking for native Dutch speakers to participate in a voice recording project. The task involves recording short sentences using a mobile application. The collected recordings will be used to improve speech recognition and language technology systems. The task is simple and can be completed within 30–45 minutes.
I’m building a bilingual speech dataset for an AI project and need a voice that sounds natural in both Hindi and English. You’ll receive short, ready-made scripts; all you have to do is read them exactly as written, in a calm, neutral tone, and supply the recordings as clean WAV files. Quality matters more than studio gear. As long as you record in a quiet room, keep background noise to a minimum, and export at 44.1 kHz / 16-bit WAV, we’re set. No post-processing is necessary beyond cutting out mistakes and long silences—I need raw, intelligible takes with clear pronunciation. Deliverables • One WAV file per sentence, named according to the script ID • A quick note if any sentence felt awkward or unclear while reading, so I can adjust the prom...
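The export requirement in this listing (44.1 kHz / 16-bit WAV) could be checked automatically before delivery. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav_format` and its parameters are illustrative assumptions, not part of the listing.

```python
import wave


def check_wav_format(path, expected_rate=44100, expected_sample_width_bytes=2):
    """Return a list of format problems; an empty list means the file matches.

    A 16-bit sample is 2 bytes wide, which is what getsampwidth() reports.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        if rate != expected_rate:
            problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
        if width != expected_sample_width_bytes:
            problems.append(
                f"sample width is {width * 8}-bit, "
                f"expected {expected_sample_width_bytes * 8}-bit"
            )
    return problems
```

Running it over every file in a delivery folder and printing any non-empty result would catch a wrong export setting early.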
...automatically create an Odoo record. Captured fields (field: description): caller_number: incoming phone number; contact_id: matched contact; staff_member: answering user; call_duration: seconds; call_recording: audio file; transcript: speech-to-text; ai_summary: AI generated summary; call_category: IVR selection. 15. Call Recording Storage. Call recordings must be stored as Odoo attachments. Flow: Twilio → middleware → Odoo attachment. 16. Call Transcription. Pipeline: Twilio recording created → middleware downloads audio → audio sent to OpenAI transcription → transcript stored in Odoo. Note: Speech recognition for Maltese may be unreliable. Fallback procedure: store recording; allow manual staff summary. 17. AI Call Summary. The AI system must generate a structured summary. ...
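The transcription pipeline and its fallback in this listing might be sketched roughly as follows. This is an illustration under stated assumptions: `CallRecord`, `process_recording`, and the `needs_manual_summary` flag are hypothetical names, and `transcribe` is a stand-in callable for whatever transcription service the middleware uses, so no real Twilio, OpenAI, or Odoo API is called here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CallRecord:
    # Field names follow the listing; the flag below is an added assumption.
    caller_number: str
    call_duration: int  # seconds
    call_recording: bytes
    transcript: Optional[str] = None
    needs_manual_summary: bool = False


def process_recording(
    caller_number: str,
    call_duration: int,
    audio: bytes,
    transcribe: Callable[[bytes], Optional[str]],
) -> CallRecord:
    """Always keep the recording; fall back to a manual summary if transcription fails."""
    record = CallRecord(caller_number, call_duration, audio)
    try:
        text = transcribe(audio) or ""
    except Exception:
        # Any transcription error triggers the fallback path.
        text = ""
    if text.strip():
        record.transcript = text
    else:
        record.needs_manual_summary = True
    return record
```

The recording is stored on the record before transcription is attempted, so a failed or empty transcript never loses the audio.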
I have a video of myself delivering a speech and want the entire piece re-imagined as a Pixar-inspired 3D animation. The final cartoon should keep my original voice track and match every gesture, pause, and facial movement, so lip sync and timing need to feel natural. The character must look unmistakably like me while living comfortably in a Pixar-like world: large expressive eyes, clean stylised surfaces, and rich, film-quality shading. I’m after a highly detailed, realistic finish rather than something simplified. Please pay special attention to my clothing and accessories; these items are the main visual cues I’d like pushed forward. Core deliverables • A fully-rigged 3D model that resembles me in Pixar style, with clothing and accessories accurately recreate...
I have a manuscript that needs to move from paper (or basic digital) into an accessible, reader-friendly PDF. My priority is accessibility: the final file must be fully searchable and compatible with text-to-speech software so that visually-impaired readers can navigate it with ease. Here is what I need from you: • A clean, professionally laid-out PDF created from my original manuscript. • OCR or native text formatting so every word is searchable. • Proper tagging and structure for screen-reader / text-to-speech tools (headings, alt-text where images appear, logical reading order). If you already work with Adobe Acrobat Pro, InDesign, or similar accessibility-focused tools, that will help you hit the required standards quickly. I am open to your creati...
I'm looking for a live-action, 30-second promotional video to boost brand awareness on social media. The video should be engaging, high-quality, and tailored for platforms like Instagram, Facebook, and Twitter. Key Requirements: - Live-action footage - Engaging and high-quality production - Tailored for social media platforms - Strong storytelling and brand messaging Ideal Skills and Experience: - Experience in producing live-action promotional videos - Expertise in social media content optimization - Professional video editing skills - Creative storytelling abilities - Voice over & speech text shown on screen Please share relevant portfolios and pa...
Our company has developed a dictation and translation platform tailored for the Maharashtra market. The software converts speech to text and offers instant translation across Marathi, Hindi, and English, making it ideal for professionals who juggle multiple languages every day. The commercial model is straightforward: you earn purely on commission for every paid licence or subscription you close. I will supply product training, demo accounts, marketing collateral, and prompt technical support; you bring the drive, local network, and on-ground selling skills necessary to turn interest into revenue. Key expectations • Source and qualify leads throughout Maharashtra • Conduct in-person or virtual demos, highlighting the multi-language (Marathi-Hindi-English) dictation and...
...settings that feel lived-in. Team Up: Work closely with our artists and designers to make sure your script translates perfectly to the final page. Control the Pacing: Know how a comic page breathes—when to drop a massive splash page for impact and when to use tighter panels for conversations. What We're Looking For Comic Experience: You need to know how comic grammar works (panels, gutters, speech balloons). Screenwriting helps, but knowing how to write specifically for a visual, static medium is a must. Genre Fan: You should genuinely love the medium, from mainstream epics to the daring creativity of the indie and NSFW scenes. Comfortable with NSFW: You need to be 100% comfortable writing explicit adult content and able to switch effortlessly between that and sta...
I’m building a real-time, two-way translator dedicated to business calls and need an expert who can take it from concept to a working product. The first release must handle English ⇄ Spanish flawlessly, converting speech to text, translating it, then rendering clear synthesized speech back to both parties with minimal latency. Compatibility is non-negotiable: the same core engine has to run inside a web browser, a mobile app (iOS and Android), and a lightweight desktop client. I value reusable backend services—WebRTC for voice transport, a robust ASR + NMT pipeline (DeepSpeech, Whisper, or similar paired with a proven translation model), and near real-time TTS. Security, call recording toggles, and an admin dashboard for basic analytics should round out the fea...
I’m collecting clean, native-accent Dutch speech for a voice-recognition training set. Once you join the project I’ll send you a small mobile app; each sentence pops up on your smartphone screen and you simply read it aloud, one by one, in a calm, natural tone. What I need from you • 251 separate recordings, captured directly in the app. • A quiet environment so there is no background noise competing with your voice. • A consistent speaking pace—no rushing or theatrical pauses. The app labels and uploads every clip automatically, so there’s no manual file handling. I’ll verify the audio quality after a short 20-sentence pilot you submit first; if everything sounds clear and natively Dutch, you’ll continue with the remaining ...
I need an experienced audio engineer to clean up a radio speech recording. The audio has issues with background noise, echo, and distortion. The final cleaned recording should be in MP3 format. Ideal Skills and Experience: - Proficiency in audio editing software (e.g., Audacity, Adobe Audition) - Experience in noise reduction and echo cancellation - Ability to work with distortion and deliver high-quality audio - Attention to detail and good listening skills Please provide samples of previous work and estimated turnaround time.
...Participate in a brief recorded or live phone/voice call • Speak naturally using everyday Argentine Spanish, including local slang and expressions (lunfardo and regional colloquialisms are a plus) • Follow simple conversational prompts during the call REQUIREMENTS • Must be a native Argentine Spanish speaker (born and raised in Argentina) • Must be fluent in authentic local slang and informal speech patterns • Reliable internet or phone connection • Available for a scheduled call within the next few days • No prior freelancing experience required PROJECT DETAILS • Duration: 10–15 minutes (one-time) • Format: Phone or voice call • Language: Argentine Spanish only WHEN BIDDING, PLEASE INCLUDE 1. Which city/province of Argen...
1. Project Overview Thank you for participating in the Dutch Speech Data Collection Project. All recordings must strictly follow the standards below. Any deviation may result in rejection or reduced payment. 2. Recording App - All recordings must be completed using our designated app. - Ensure the app is updated to the latest version. - Log in with the correct assigned ID. - Record only your assigned sentences. - Do not share your account or tasks. 3. Participant Requirements 3.1 Language Requirements - Must be a native Dutch speaker. - Must be born and raised in the Netherlands (or specified Dutch-speaking region). - Must use standard Dutch pronunciation. - Strong regional accents are not allowed. - Do not mix other languages. 3.2 Voice Requirements - Voice must be clear, n...
I need three distinct background graphics that will frame my short English-learning podcast episodes on YouTube and Instagram Reels. The overall vibe should feel modern and sleek, not cartoonish, so think clean lines and a balanced composition. Content & style • Imagery that instantly signals “learning”: subtle icons like graduation caps, open books, light bulbs, or speech bubbles can sit in the corners or form a repeating motif. • A few abstract shapes and patterns will help break up any flat areas and add depth without distracting from the speaker’s video window. • Including small photographs or stylised illustrations of books or notebooks is welcome, provided they harmonise with the icons rather than compete with them. Colour direction...
...visit with a historical figure played by AI-generated video. You generate the character clips, assemble the scene, add subtitles for non-English dialogue, and deliver a finished episode. What You Need to Know Descript — transcript-based editing, multi-track assembly Runway Gen-4 or Kling or similar — AI character video generation, scene consistency across clips ElevenLabs or similar — voice synthesis and lip sync for AI characters Green screen compositing — I shoot on green screen in my studio Midjourney or equivalent — character portrait generation Short-form clip extraction for Instagram/YouTube Shorts What I'll Send You Raw camera footage (DSLR, DJI lav mic, green screen) Slide screen recording Script with speaker notes Character reference ...
...assignment and need a full literature review written to APA 7th-edition standards. The discipline is social psychology; I am still weighing up whether to centre it on group dynamics, social influence, or interpersonal relationships, so please be ready to recommend which of those themes offers the richest and most current body of research. What I expect from you is a carefully structured, critical synthesis of recent peer-reviewed journal articles (books or conference papers may be cited only when they add clear value). Every citation, heading level, table, figure, and reference entry must comply with APA 7 formatting—including running head, page numbers, and correct DOI presentation. Deliverables • A polished literature review (Word document) at a postgraduate standa...
...clinician speech into text, 2) recognise medical terminology with clinical-grade precision, and 3) push structured data straight into an EHR through FHIR or HL7. Transcription accuracy must sit in the 95–100 % range, so please be comfortable fine-tuning models, adding language-specific dictionaries, and building post-correction logic. The system must be entirely web-based, deployable to my cloud account, and compliant with HIPAA best practices. Python, FastAPI, React, WebRTC, or comparable stacks are all acceptable as long as latency stays low and security stays tight. Start your message with the name of the similar project you have already delivered and, if possible, a demo link or client reference—this will be my first filter. Deliverables • Production-ready...
I...mixed-sensitivity formulation, and tune the H∞ controller so it meets industry-standard robustness and performance margins. Please implement and simulate everything in MATLAB/Simulink, documenting the exact toolbox functions and scripts you use so I can reproduce every step. When the design is closed, deliver: • Clean, well-commented MATLAB/Simulink files for the plant model, weighting functions, synthesis, and time-/frequency-domain analyses • A concise technical note that explains design choices, achieved γ value, gain/phase margins, and key plots (step, Bode, Nichols, Monte-Carlo uncertainty sweeps) I will run an independent Monte-Carlo batch on my end; acceptance hinges on the controller keeping the closed-loop system stable and within my envelope ...
...feature is hands-free use: I want to press one button, speak naturally—“remind me to drink water at 3 p.m.”—and have the app parse the command, schedule it, and confirm out loud or with a brief on-screen prompt. Think of the voice flow you experience with Alexa or Google Assistant; I’m after that same friction-free feel, but inside a dedicated app. Scope of the first release • Voice capture and speech-to-text using any reliable low-code or no-code tools you’re comfortable with (Flutterflow, Adalo, React Native with Expo Voice, etc.). • Time-based reminders only, with options for one-off or recurring schedules (daily, weekly, custom interval). • Local notifications that fire even if the device is locked, plus a concise hist...
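Once speech-to-text has produced a sentence such as the listing's example, the "parse the command" step could start from something like the sketch below. It is deliberately narrow: the regular expression, the function name `parse_reminder`, and the supported phrasing ("remind me to … at H[:MM] am/pm") are assumptions for illustration, and a real app would need to handle many more phrasings.

```python
import re
from datetime import time
from typing import Optional, Tuple

# Matches e.g. "remind me to drink water at 3 p.m." or "... at 7:45 am".
_REMINDER = re.compile(
    r"remind me to (?P<task>.+?) at (?P<hour>\d{1,2})"
    r"(?::(?P<minute>\d{2}))?\s*(?P<ampm>a\.?m\.?|p\.?m\.?)",
    re.IGNORECASE,
)


def parse_reminder(utterance: str) -> Optional[Tuple[str, time]]:
    """Return (task, time of day) or None if the utterance is not understood."""
    match = _REMINDER.search(utterance)
    if match is None:
        return None
    hour = int(match.group("hour"))
    minute = int(match.group("minute") or 0)
    if not (1 <= hour <= 12 and 0 <= minute <= 59):
        return None
    is_pm = match.group("ampm").lower().startswith("p")
    # 12 a.m. maps to hour 0 and 12 p.m. to hour 12.
    hour_24 = hour % 12 + (12 if is_pm else 0)
    return match.group("task").strip(), time(hour_24, minute)
```

Returning `None` for anything unrecognised gives the app a clear point at which to ask the user to repeat the command instead of scheduling a wrong reminder.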
...With VoiceUI, users can: • navigate a website using voice • control sections and page scrolling • open elements of the interface • fill forms using natural voice commands Instead of traditional clicking, a website can be operated entirely by voice. Technology VoiceUI is implemented directly in the website code, not as a plugin. Main technologies used: • React • TypeScript • Web Speech API VoiceUI is part of the front-end architecture of the website. Important VoiceUI is implemented only in new websites built from scratch (greenfield projects). The technology is not integrated into existing websites. Each project is developed from start to finish with full control over architecture and code. Project Value Typical VoiceUI proj...
I am coordinating a long-term voice-data collection initiative and need many native speakers of Vietnamese, Indonesian, Lao, and Burmese. The task is straightforward: you will read prompted sentences in your own accent, capture them in a quiet environment, and provide the raw audio. Requirements: No voice-over experience is required. Simply read aloud from the text; no emotional expression is needed. You can record using a computer or a mobile phone. 100 % native pronunciation for the listed languages. The use of AI-generated speech is strictly prohibited. Please review the attached test file. If you are capable of undertaking this task, please complete and reply. We look forward to hearing fro...
I'm looking for a professional to evaluate the impact of new carpet flooring on our church's audio quality in the main sanctuary hall. We are considering tile and/or ca...flooring after it is determined that there is no danger of asbestos contamination. We would like a consultation on what direction we should take with the new flooring. The demographic of our congregation is mostly elderly, for which we would like to choose flooring that will not be detrimental to the sound quality of our audio equipment. Key concerns include: - Echo and reverberation - Clarity of speech Ideal skills and experience: - Acoustic engineering - Experience with church audio systems - Knowledge of flooring impact on sound Please provide a detailed assessment and recommendations to mitigate a...
The proposed translational pipeline for peptide‑based AD therapeutics, beginning with AI‑enabled target identification and de novo sequence generation, followed by medicinal chemistry optimization and stability engineering, scalable synthesis (including fast‑flow platforms), BBB‑competent delivery strategies, and finally clinical trial design and regulatory evaluation. Feedback arrows indicate that data from in vitro assays, in vivo models, and early‑phase trials iteratively refine upstream AI models and design criteria.
...generate videos without technical skills. 5. **Documentation & training:** * Deliver clear instructions for generating videos, managing the platform, and integrating new AI models if needed. * Optional: a short tutorial video for my team. **Skills Required:** * Experience with **self-hosted AI/UGC video platforms** (OneUGC Studio or similar). * Knowledge of **text-to-video, avatars, voice synthesis, and AI video generation**. * Familiarity with **API integration and server deployment** (VPS, AWS, DigitalOcean). * Experience in **cost-optimization for AI platforms** is a strong plus. **Budget:** Looking for a **one-time setup solution**. Must ensure **running cost of AI/video generation ≤ $50/month**. **Timeline:** Setup should be completed **within 1 week**. ...
I’m collecting a set of 300 short Thai sentences for speech-training research and I’d like a native speaker to record them directly in our mobile app. You’ll be working with a smartphone—any operating system is fine, even alternatives beyond iOS or Android—and you should be able to record in a completely quiet space so there’s no background noise on the clips. Once you accept, I’ll send login credentials, a step-by-step recording guideline, and a link to the app. The workflow is straightforward: open the script inside the app, tap to record each sentence, review the waveform for clarity, and save. The system automatically uploads every take, so there’s no post-processing required on your side. Deliverable • 300 clearly spoken ...
...the video: Intro screen "New company registered" Company information Company Name City / Canton Industry Closing screen "Source: Commercial Register Data" Target video length: 10–20 seconds Technical Requirements The solution should preferably use: Python or Node.js Possible tools / libraries: OpenAI API (text generation) FFmpeg or programmatic video generation YouTube Data API Text-to-speech engine (optional) The system must run automatically via script or scheduler. Important We are NOT looking for manual video editing. This must be a fully automated system. Deliverables Working script or application that: * processes input data * generates videos * uploads them to YouTube automatically Source code must be provided. To Apply Please...
I’m collecting 360 short “wake-up word” sentences for a speech-recognition dataset and need native voices from the UK or Canada. The process is straightforward: install my mobile app, follow the brief in-app tutorial, read each sentence exactly as shown, and submit. Most participants finish in 30 minutes or less. Any age is welcome. While I don’t insist on studio-level silence, please choose a reasonably quiet spot so the words are clear. Deliverable: • One complete session of 360 correctly read sentences uploaded through the app. Once I receive and verify the submission I release payment immediately, so you can start and finish today.
I have a continuous stream of general German recordings that need to be converted into polished, error-free text. Your day-to-day work will follow three clear steps: first, create an accurate transcript from scratch; next, open the automatic speech-recognition draft in our online portal and correct every mismatch; finally, rate the transcript against the guidelines using the built-in rubric. All recordings are general-topic conversations, so you should feel comfortable handling anything from casual chats to short interviews. The interface, instructions, and rubric are entirely in English, which is why solid written English matters almost as much as native-level German. Deliverables for each assigned file: • A clean, time-aligned German native transcript • An AI-correct...
I have a continuous stream of general Bengali recordings that need to be converted into polished, error-free text. Your day-to-day work will follow three clear steps: first, create an accurate transcript from scratch; next, open the automatic speech-recognition draft in our online portal and correct every mismatch; finally, rate the transcript against the guidelines using the built-in rubric. All recordings are general-topic conversations, so you should feel comfortable handling anything from casual chats to short interviews. The interface, instructions, and rubric are entirely in English, which is why solid written English matters almost as much as native-level Bengali. Deliverables for each assigned file: • A clean, time-aligned Bengali transcript • An AI-corrected v...
...Core workflow • The dialer should pull a lead from the CRM, place the call automatically, open a tailored script, and capture everything the prospect says through real-time speech recognition. • Natural language processing must understand intent and sentiment, while AI-driven responses keep the conversation moving until a meeting is booked or a hand-off to a human rep is required. • Once the call ends, the transcript, outcome, and next-step task should save to the same CRM record without anyone touching a keyboard. Required features – Automated dialing (single or progressive) with adjustable pacing – Speech recognition, NLP, and dynamic response logic running live during the call – Seamless two-way CRM integration for lead pull...
...make it fun, fresh, and genuinely entertaining from the first line to the closing call-to-action. The core brief is simple: shape my key ideas into a lively speech that resonates with teens and young adults, keeps their attention, and sparks conversation afterward. Expect plenty of pop-culture references, clear story arcs, and language that feels current without sounding forced. I’ll provide the high-level outline, personal anecdotes, and any factual points that must be woven in; you transform them into a polished script with natural transitions, humor, and an upbeat rhythm that works well on stage. Deliverables • One complete speech draft written for oral delivery • One round of revisions after my initial feedback • A brief speaker’s no...
We are looking for pairs of native English speakers from the United States of America to participate in an AI speech research project. The task involves recording natural, unscripted conversations with a partner. These recordings will be used only for AI speech training and research purposes. You will need to find a friend to hold the conversation with you. Important Eligibility Requirements: Participants must be native English speakers who grew up in the USA and learned English as their first language. Applicants who were born or raised outside the USA will not be accepted, even if they currently live in the USA. USA natives currently living abroad are also welcome to apply. We are specifically looking for participants with authentic native USA accents. Conver...
...(Microsoft 365 / Outlook) Provide general information about the company (FAQ-style responses) Send SMS confirmations for bookings, updates, or cancellations Optionally transfer calls to a human agent when required Required Technology Stack (Microsoft Solutions Only) The system should primarily use Microsoft technologies such as: Microsoft Azure Azure OpenAI Service Azure AI Speech (Speech-to-Text and Text-to-Speech) Azure Communication Services (Voice & SMS) Azure Bot Service / Conversational AI Azure Functions or Azure App Services Microsoft Graph API (for Microsoft Calendar integration) Azure Key Vault Azure Monitor / Application Insights Azure Storage or Azure SQL for secure data storage Microsoft Entra ID (Azure AD) for authentication Compli...
I’m collecting beginner-level Japanese↔English dialogue to expand a language-training dataset. I need two speakers to hold a natural conversation on six everyday themes—think travel, health, sport, film & television, and two similar topics you pick—recording 15 minutes per theme for a total of 1 ½ ...for Zoom, one for the recorder. Before the main session I’ll run a quick 2-minute test snippet right after the contract is awarded. Once the audio quality and flow are approved, we move straight into the full recording. Payment is a flat $60 per pair ($30 each). If you already have a partner, great—come as a team. If not, I’ll match you with another applicant. I’m looking for clear, relaxed speech, natural turn-taking, and no ...
I need an engaging, informative speech on health and wellness. The speech is for women over 30. Requirements: - Focus on health and wellness - Tailored for women over 30 - Engaging and informative Ideal Skills: - Experience in speech writing - Strong understanding of health and wellness topics - Ability to engage a mature audience
I am assembling a new Spanish-language speech dataset and need a few native speakers from Spain to read and record 334 very short sentences. The whole session usually takes less than half an hour and can be done straight from your smartphone—iOS or Android, whichever you prefer. Here is what the job involves: • I will send you a link to our web-recorder. • You log in on your phone, read each sentence once, and submit when all 334 are finished. • Recordings must sound natural, with clear Castilian pronunciation and no background noise; please make sure you are in a quiet room before you start. Deliverable • One completed recording session containing all 334 sentences, accepted by our quality-control system. Once your session passes QC, I relea...
I'm seeking an experienced AI developer with a background in music technology. The goal is to create an AI system that can replicate the sound of my musical instrument with a focus on realistic sound production. Key Requirements: - AI to replicate sound production of my instrument. - Output must be realistic, not experimental or synthesized. - Tailored specifically for Funk genre. Ideal Skills and Experience: - Proficiency in sound synthesis and AI sound modeling. - Strong understanding of Funk music nuances. - Experience with music production software and tools. Looking for a developer who can deliver high-quality, genre-specific sou...
...across the major medical databases (MEDLINE, Embase, Cochrane, Scopus) plus grey-literature checks. Peer-reviewed journal articles are my primary focus, but I am open to conference abstracts or book chapters whenever they fill a gap in the data landscape. After the search you will handle screening, data extraction, risk-of-bias assessment, and synthesis. If the numbers allow, a meta-analysis would be ideal; otherwise a structured narrative synthesis is fine. Deliverables • Search strategy strings for each database • PRISMA flow diagram • Extraction table with study characteristics, measurement techniques, outcome metrics, and complications • Risk-of-bias/quality-assessment tables (ROB 2, NOS, or tailored tools as appropriate) • Draft man...
I’m building a browser-based demo where a pre-made 3D character reacts to anything the user says. When the visitor presses a “talk” button, the microphone audio is captured and passed to the character’s mouth so the lips match the speech, i.e. live audio-to-lip-sync for the character. This could mean that the audio is analysed, phoneme / viseme data is extracted, and the character’s blend shapes / mouth poses update in sync. I’m open to whichever phoneme-detection approach you prefer so long as the lip sync looks believable. We can buy a 3D model / avatar if you share a link. It could be just a head with the lips moving. Scope you’ll handle – Integrate Web Audio for capture. – Implement or wire in the phoneme-mapping log...
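The phoneme-to-viseme step this listing describes can be illustrated independently of the browser. The sketch below is in Python for illustration only (the actual demo would run in the browser), and the phoneme symbols, viseme names, and `to_viseme_track` function are hypothetical; a real avatar rig defines its own blend-shape names.

```python
from typing import Dict, List, Tuple

# Hypothetical phoneme -> viseme table; real rigs use their own viseme sets.
PHONEME_TO_VISEME: Dict[str, str] = {
    "P": "closed", "B": "closed", "M": "closed",
    "F": "lip_teeth", "V": "lip_teeth",
    "AA": "open", "AE": "open",
    "OW": "round", "UW": "round",
}


def to_viseme_track(phonemes: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Turn (phoneme, start_seconds) pairs into (viseme, start_seconds) keyframes.

    Unknown phonemes map to a neutral "rest" pose, and consecutive identical
    visemes are merged so the mouth does not re-trigger the same pose.
    """
    track: List[Tuple[str, float]] = []
    for phoneme, start in phonemes:
        viseme = PHONEME_TO_VISEME.get(phoneme.upper(), "rest")
        if track and track[-1][0] == viseme:
            continue
        track.append((viseme, start))
    return track
```

Each keyframe could then drive the weight of the matching blend shape at its start time, with interpolation between keyframes handled by the rendering layer.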
Data Collection Specification Mock-up call center conversation data 1. The speech data will be recorded by data collectors speaking naturally while acting out bank call center conversations between the Client and Agent roles, based on the provided script. 2. Reading the script aloud is not acceptable. Collectors should first familiarize themselves with the script and understand it, then hold a free conversation based on it without straying from the topic. The recordings should include tone words, natural pauses, and overlapping speech. Please note that you cannot read the script verbatim; otherwise we will consider the audio invalid. 3. The data collectors should be native speakers. 4. Each pair of collectors can record at most 60 minutes of audio....
...document and image understanding Implement Text-to-Speech (TTS), Speech-to-Text (STT), and Speech-to-Speech (STS) pipelines Fine-tune LLMs to create offline, self-hostable AI models Architect and develop a scalable backend system for AI workloads Create end-to-end AI pipelines optimized for performance and scalability Integrate third-party APIs and AI services where required Collaborate closely with product and engineering teams to turn ideas into working solutions Required Skills & Experience Strong experience with LLMs (OpenAI, LLaMA, Mistral, Gemma, etc.) Hands-on experience with RAG frameworks (LangChain, LlamaIndex, Haystack, etc.) Experience in OCR & Vision (Tesseract, OpenCV, Vision Transformers, multimodal models) Knowledge of sp...
Project: Custom Gemini-Powered App with Voice Recognition and Document Processing Goal: To develop a new application (or enhance an existing one) that integrates Gemini's intelligence with advanced voice capabilities. Key Features: Voice Interaction: Full voice-to-voice support. The app will capture user speech (Speech-to-Text), process it via Gemini, and provide both a written and a spoken response (Text-to-Speech). Custom "Gem" Logic: Replicating Gemini Gem functionality by providing custom instructions through System Prompts and a dedicated knowledge base. Data Ingestion: The ability to "train" or inform the AI's context using uploaded PDFs, text files, or live web links. Implementation: This can be built as a standalone application wit...
Our labeling process involves precise segmentation of audio waveforms, identification of the speaker’s role and relevant attributes based on their voice, and transcription of the speech using our TMS (Transcription Management System) platform. Please share any prior experience you may have with German text and audio annotation, if applicable. If you have any questions, feel free to contact us via Freelancer. We look forward to reviewing your application. Thank you.
...Core workflow • The dialer should pull a lead from the CRM, place the call automatically, open a tailored script, and capture everything the prospect says through real-time speech recognition. • Natural language processing must understand intent and sentiment, while AI-driven responses keep the conversation moving until a meeting is booked or a hand-off to a human rep is required. • Once the call ends, the transcript, outcome, and next-step task should save to the same CRM record without anyone touching a keyboard. Required features – Automated dialing (single or progressive) with adjustable pacing – Speech recognition, NLP, and dynamic response logic running live during the call – Seamless two-way CRM integration for lead pull...
I'm looking for a live-action promotional video to boost brand awareness on social media. The video should be engaging, high-quality, and tailored for platforms like Instagram, Facebook, and Twitter. Key Requirements: - Live-action footage - Engaging and high-quality production - Tailored for social media platforms - Strong storytelling and brand messaging Ideal Skills and Experience: - Experience in producing live-action promotional videos - Expertise in social media content optimization - Professional video editing skills - Creative storytelling abilities - Voice over & speech text shown on screen Please share relevant portfolios and pa...