
Closed
Posted
Paid on delivery
Lead AI / Fullstack Engineer: Project "AZIZA" (Voice-to-Voice AI)

Project Name: AZIZA
Format: Project-based / Remote (with access to local GPU clusters)
Tech Stack: PersonaPlex (Moshi-based architecture), PyTorch, TensorRT-LLM, FastAPI, WebRTC, Telegram Mini App (TMA).
Hardware Location: Uzbekistan & Kazakhstan (TAS-IX), clusters powered by NVIDIA RTX 4090.

Project Overview

AZIZA is an innovative multimodal "Speech-to-Speech" (S2S) ecosystem designed to simulate natural human interaction. We are building an AI assistant that seamlessly transitions between roles: an expert tutor (Chemistry, History, Biology), an empathetic companion, and a simultaneous translator. By processing audio tokens directly, the system achieves unprecedented interaction speeds.

Current Status: The base model (English) is stable. We are now scaling to address regional specifics and deploying the solution within a high-tech application framework.

Key Responsibilities

1. Core AI & ML (Adaptation & Intelligence)

Multilingual Support: Lead cross-lingual fine-tuning to provide native-level support for Uzbek (including regional dialects), Kazakh, and Russian.
Latency Optimization: Streamline inference pipelines to target a response latency of 0.07 seconds.
Smart RAG (100 GB): Architect a vector knowledge base for educational materials, implementing a "triple-check" verification mechanism to eliminate hallucinations.
NVIDIA Stack: Optimize inference for RTX 4090 environments using vLLM, TensorRT-LLM, and INT4/FP8 quantization.

2. Telegram Mini App & Real-time Web

Audio Streaming: Implement low-latency real-time audio transmission via WebRTC / WebSockets (moving beyond standard voice message protocols).
Full-Duplex UI: Develop a frontend that supports interruptibility, allowing the AI to react instantly when the user speaks over it.
Vocal ID: Integrate voice biometrics for secure user authentication.
Billing: Integrate local payment gateways (Payme, Click) for subscription management.

3. Architecture & Infrastructure

Highload Design: Design a horizontally scalable system capable of handling high concurrent user loads.
Signal Processing: Implement software-based AEC (Acoustic Echo Cancellation) and noise suppression to ensure high-fidelity communication.
Traffic Localization: Optimize routing protocols to maximize performance within the TAS-IX network.

Candidate Requirements

AI / ML Engineering:
Proven experience with End-to-end (E2E) speech models (Moshi, AudioLM, or similar).
Deep proficiency in PyTorch and Transformer architectures.
Hands-on experience in Fine-tuning LLMs/S2S models for new language groups.
Expertise in CUDA 12.x and NVIDIA optimization libraries.

Fullstack Development:
Expert-level knowledge of WebRTC / WebSockets for real-time media streaming.
Demonstrated experience in developing Telegram Mini Apps (TMA).
Professional mastery of FastAPI and React / Next.js.
Strong understanding of the constraints and requirements of Low-latency systems.
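The 0.07-second response latency named under Latency Optimization can be read as a budget that every pipeline stage draws from. The sketch below is one way a candidate might reason about that budget; it is not taken from the project. The stage names, the per-stage timings, and the 480-sample frame at 24 kHz are all hypothetical placeholders, not measurements from AZIZA.

```python
from dataclasses import dataclass

# Target taken from the posting: 0.07 seconds end to end.
TARGET_LATENCY_S = 0.07


@dataclass(frozen=True)
class Stage:
    """One step of the audio pipeline and its assumed time cost."""
    name: str
    seconds: float


def frame_duration(frame_samples: int, sample_rate_hz: int) -> float:
    """Time needed to capture one audio frame before it can be sent."""
    return frame_samples / sample_rate_hz


def remaining_budget(stages, target_s: float = TARGET_LATENCY_S) -> float:
    """Seconds left after all stages are paid for (negative means over budget)."""
    return target_s - sum(stage.seconds for stage in stages)


def within_budget(stages, target_s: float = TARGET_LATENCY_S) -> bool:
    """True when the summed stage costs fit inside the target."""
    return remaining_budget(stages, target_s) >= 0.0


# Hypothetical numbers for illustration only.
stages = [
    Stage("audio capture frame", frame_duration(480, 24_000)),  # 20 ms frame
    Stage("network round trip", 0.010),
    Stage("model inference step", 0.030),
    Stage("playback buffering", 0.005),
]

if __name__ == "__main__":
    for stage in stages:
        print(f"{stage.name:<24} {stage.seconds * 1000:6.1f} ms")
    print(f"{'remaining':<24} {remaining_budget(stages) * 1000:6.1f} ms")
    print("within budget:", within_budget(stages))
```

With these placeholder values the capture frame alone uses 20 ms of the 70 ms target, which is why frame size and network distance (for example, keeping traffic inside TAS-IX) matter as much as model speed.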
Project ID: 40191027
76 proposals
Remote project
Active 40 mins ago
76 freelancers are bidding on average $4,031 USD for this job

Hello, AZIZA is a bold S2S voice platform. This project is set as remote, project-based work with access to TAS-IX GPU clusters in Uzbekistan and Kazakhstan powered by NVIDIA RTX 4090. I will lead end-to-end AI/ML and full-stack work to scale multilingual S2S models for Uzbek dialects, Kazakh, and Russian, while driving latency toward 0.07 s. I’ll build a 100 GB Smart RAG vector store with triple-check verification to minimize hallucinations, and optimize inference for RTX 4090 using vLLM, TensorRT-LLM, and INT4/FP8. For Telegram Mini App and Web, I’ll implement low-latency audio with WebRTC/WebSockets, a full-duplex UI with instant reactions, and voice biometrics for secure login. I’ll integrate Payme/Click for local billing and design a horizontally scalable, low-latency architecture with AEC and noise suppression, tuned for TAS-IX routing. This will be production-ready, region-aware, and able to handle high concurrency and multilingual interactions. Can you confirm the target latency per language and regional dialect coverage for Uzbek, Kazakh, and Russian, so I can tailor fine-tuning and deployment milestones? Best regards,
$5,000 USD in 16 days
9.0

With a strong background in Software Architecture, Machine Learning, Full Stack Development, Audio Processing, and Deep Learning, I am confident that I am the perfect fit for the AI/Fullstack NLP / Speech Deep Learning Engineer (Core AI), AI Infrastructure / LLM DevOps Engineer project. I am eager to dive into the details and adjust the budget accordingly. My goal is to deliver high-quality results within your budget and timeline. Please review my profile, which covers 15 years of past work. Let's discuss the project further and get started as soon as possible.
$3,500 USD in 21 days
7.2

Hello, I am a senior AI/Fullstack engineer with 10+ years of expertise in speech deep learning, low-latency systems, and scalable infrastructure. I can lead the development of AZIZA, building a robust Speech-to-Speech ecosystem with multilingual support, real-time audio streaming, and high-performance inference on RTX 4090 clusters.

Approach:

Core AI & Multilingual Adaptation: Fine-tune the base model for Uzbek, Kazakh, and Russian with regional dialect support. Implement low-latency inference pipelines using vLLM and TensorRT-LLM with INT4/FP8 quantization. Optimize end-to-end latency targeting 0.07 seconds while maintaining high audio quality and natural speech.

Smart RAG & Hallucination Control: Design a vector knowledge base for educational materials (100 GB) and implement a triple-check verification mechanism to ensure accuracy and prevent hallucinations. Use efficient embedding and retrieval strategies for real-time response.

Real-time Web & Telegram Mini App: Build low-latency audio streaming using WebRTC/WebSockets and implement a full-duplex UI with interruptibility. Integrate voice biometrics for authentication and local payment gateways (Payme, Click) for subscription billing. Develop the Telegram Mini App with seamless audio interaction.

Architecture & Infrastructure: Design a horizontally scalable system with high concurrency support. Implement AEC and noise suppression to ensure clear communication.

Thanks.
$3,000 USD in 7 days
6.6

As an AI and Machine Learning engineer with a Master's in Embedded Systems, I bring a unique blend of technical expertise to the project. My understanding of firmware development, circuit design, and C/C++ programming gives me a multi-dimensional insight into your project. I comprehend the importance of developing efficient and scalable software solutions to meet low-latency requirements and can leverage my proficiency in Python, Linux, and circuit design effectively to cater to all your needs. My past project work on performance optimization using CUDA 12.x and NVIDIA optimization libraries makes me well suited to streamlining the inference pipelines and NVIDIA stack optimizations that are crucial for this project. Moreover, my hands-on experience with end-to-end speech models will be valuable in language localization tasks such as fine-tuning LLMs/S2S models for Uzbek, Kazakh, and Russian and implementing Acoustic Echo Cancellation (AEC) for high-fidelity communication.
$5,000 USD in 45 days
6.2

I HAVE SUCCESSFULLY DELIVERED SIMILAR SCALABLE PLATFORMS. THIS PROJECT PERFECTLY MATCHES MY EXPERIENCE. LET’S BUILD A SOLUTION USERS TRUST AND LOVE.

I propose to design and develop a secure, scalable, and production-ready digital platform tailored to your requirements, with a strong focus on user experience, performance, and long-term maintainability.

Core Features
• Modern, responsive web interface
• Secure user registration & authentication
• Step-by-step guided workflows (wizard-style onboarding)
• Document upload & validation
• Digital signature & third-party service integrations
• Payment gateway integration (where applicable)
• Admin dashboard for data, users, and activity management
• Secure database handling sensitive information
• Clean, documented, and scalable codebase

User Roles
• End Users / Clients
• Business Owners / Service Providers
• Platform Administrators

Delivery & Support
• 100% complete source code ownership
• Fully documented deployment & handover
• 2 years of FREE ongoing support post-launch (bug fixes, stability updates, minor improvements)
• Scalable architecture ready for future enhancements
$3,000 USD in 7 days
6.4

Hi! I'm a professional developer and would love to work on your project.
$3,000 USD in 50 days
4.3

Hello, I have carefully read the complete description for Project AZIZA and it is clear that this is a deeply technical and ambitious voice-to-voice AI platform with a strong focus on real-time performance and production readiness. The combination of multilingual speech intelligence, ultra-low latency goals, and tight infrastructure control is exciting, and I am genuinely interested in understanding how you plan to evolve AZIZA across regions and user segments. Your requirements around end-to-end speech models, cross-lingual fine-tuning, RTX-optimized inference, real-time audio streaming, Telegram-based user experience, and scalable high-load architecture are very well defined. I am comfortable owning both the core AI intelligence layer and the full-stack delivery while keeping latency, reliability, and user experience as top priorities. Relevant skills: speech-to-speech AI, multilingual fine-tuning, low-latency audio streaming, smart RAG knowledge systems, scalable AI infrastructure. I have 7+ years of experience across AI engineering and full-stack development with strong hands-on work in deep learning, NLP, speech systems, real-time applications, and backend architecture. I focus on building stable production systems from research to deployment. Let us schedule a quick chat to align your vision and next steps. Best Regards, Prasham Jain
$3,000 USD in 23 days
4.3

With over 8 years of professional experience in data analytics and science, I bring a unique perspective to the table that combines both ML engineering and full-stack development. Although my expertise may differ slightly from what you’re seeking, I believe my skills are highly transferable and adaptable, which is essential in this ever-evolving technological landscape. I've worked extensively with Python, Pandas, NumPy, and Scikit-learn, among others, ensuring I match your detailed tech stack requirements such as PyTorch, TensorRT-LLM, FastAPI, and WebRTC. In particular, my deep proficiency in PyTorch and Transformer architectures aligns well with your project needs for AI/ML Engineering. My previous work on fine-tuning LLMs/S2S models for new language groups and my expertise in CUDA optimization would be invaluable in achieving the multilingual support you’re aiming for with native-level Uzbek, Kazakh, and Russian language understanding. Additionally, my strong understanding of low-latency systems will be beneficial in optimizing the response latency to your target of 0.07 seconds.
$3,000 USD in 7 days
4.1

Hello, I have reviewed the details of your project. I will extend the existing Moshi-based PersonaPlex model to support Uzbek, Kazakh, and Russian with native-level fluency. Cross-lingual fine-tuning will be done in PyTorch with TensorRT-LLM acceleration on your NVIDIA RTX 4090 clusters, using INT4 and FP8 quantization to reach a target latency of 0.07 seconds. The vector knowledge base will store educational materials and include a triple-check verification mechanism to reduce hallucinations. For real-time interaction I will use WebRTC and WebSockets to stream audio with low latency and support full-duplex communication so the AI can respond while the user speaks. FastAPI will handle backend API calls, and React will provide a responsive frontend for both web and Telegram Mini App interfaces. Acoustic echo cancellation and noise suppression will be applied in software to maintain audio clarity. Billing will connect to Payme and Click for subscription management. Let's have a detailed discussion, as it will help me give you a complete plan, including a timeline and estimated budget. I will share my portfolio in chat. I look forward to hearing from you. Thanks. Best Regards, Mughira
$4,000 USD in 7 days
3.9

Hello there, I reviewed your project, AI/Fullstack NLP / Speech Deep Learning Engineer (Core AI), AI Infrastructure / LLM DevOps Engineer, and understood the requirements at a high level. I focus on delivering clear, stable, and maintainable solutions aligned with the actual scope. I can work with Software Architecture, Machine Learning (ML), and Full Stack Development, and I follow a clean development process with proper structure and error handling. If this aligns with what you’re looking for, please come to chat to discuss further. Best regards
$3,000 USD in 7 days
3.3

Hi, I’m excited about leading the AI/Fullstack engineering of Project AZIZA, a cutting-edge voice-to-voice AI ecosystem. With extensive experience in PyTorch, Transformer architectures, and deploying low-latency real-time AI systems, I can drive the multilingual adaptation for Uzbek, Kazakh, and Russian dialects, optimizing inference on RTX 4090 using TensorRT-LLM and quantization techniques. I have a proven track record in building scalable systems integrating WebRTC streaming, Telegram Mini Apps, and AI-based voice biometrics, which aligns perfectly with your needs. I propose starting with detailed architecture review and scaling your stable English base model to regional languages with rapid latency targets and smart RAG integration. I am confident in delivering a responsive, secure, high-quality experience within 30 days using your GPU clusters remotely. What are the primary challenges you’ve faced with latency and cross-lingual adaptation so far? Thanks, Roshan
$3,800 USD in 19 days
4.0

Hi there, I am a strong fit because I have built and operated low-latency, production-grade AI systems that run speech, LLM, and real-time execution pipelines end to end. I have hands-on experience with PyTorch-based transformer models, multilingual fine-tuning, speech-to-speech or speech-to-text pipelines, and aggressive latency optimization on NVIDIA GPUs. I work deeply with CUDA-aware inference stacks, vLLM or TensorRT-style optimization, FastAPI backends, WebRTC/WebSocket streaming, and scalable AI infrastructure. I reduce risk by designing deterministic inference pipelines, profiling latency at every stage, enforcing validation in RAG systems, and treating deployment, monitoring, and failure handling as first-class concerns. I am available to start immediately and can contribute both at the core AI level and the real-time fullstack/infrastructure layer for AZIZA. Regards Chirag
$4,000 USD in 7 days
2.8

Greetings! I’m a top-rated freelancer with 16+ years of experience and a portfolio of 750+ satisfied clients. I specialize in delivering high-quality, professional speech deep learning services tailored to your unique needs. Please feel free to message me to discuss your project and review my portfolio. I’d love to help bring your ideas to life! Looking forward to collaborating with you! Best regards, Revival
$3,000 USD in 30 days
2.0

Hi there! Just curious, do you plan on integrating any user-specific personalization features beyond vocal ID for the AI assistant’s responsiveness? Regardless, this is definitely something that I feel confident delivering on, given my past experience. I would love to discuss your project further! Looking forward to hearing from you. Kind Regards, Corné
$3,000 USD in 14 days
0.9

Hey, I can provide a Lead AI/Fullstack Engineer perfectly suited for Project "AZIZA." We have direct experience with Moshi-based S2S architectures and optimizing inference on NVIDIA RTX 4090 clusters using TensorRT-LLM and vLLM to hit that 0.07s latency target. Our expertise covers the full scope of your requirements: fine-tuning for Uzbek/Kazakh dialects, architecting 100GB+ Smart RAG systems, and building high-concurrency Telegram Mini Apps with WebRTC for full-duplex, interruptible audio. We are also well-versed in localizing traffic for TAS-IX performance and integrating regional payment gateways like Payme and Click. I’m ready to discuss how we can scale AZIZA’s multilingual capabilities and stabilize your high-load infrastructure. When are you available for a deep-dive technical call with our lead? Anil
$4,500 USD in 7 days
0.0

Hello ATsugai, I hope this message finds you well. I am thrilled to come across your project, AZIZA, which combines cutting-edge AI with real-time audio processing to create a dynamic Speech-to-Speech ecosystem. The objective of simulating natural human interactions while addressing regional specifics is truly fascinating and aligns perfectly with my expertise. With extensive experience in AI and Fullstack Development, particularly in Machine Learning, Audio Processing, and FastAPI, I am well-equipped to lead the development of AZIZA. My background in deploying end-to-end speech models and fine-tuning them for new language groups, including working with PyTorch and Transformer architectures, positions me uniquely to drive cross-lingual support for Uzbek, Kazakh, and Russian. I'm also adept at optimizing inference pipelines to achieve low-latency responses, crucial for your project's success. I propose an approach that focuses on enhancing the multilingual capabilities, optimizing latency, and integrating robust real-time audio streaming via WebRTC and WebSockets. Additionally, my experience with NVIDIA's RTX environments and CUDA optimization will ensure that the system operates efficiently within the specified hardware constraints. I am particularly excited about integrating voice biometrics for secure authentication and developing a user-friendly frontend that supports full-duplex interaction. My familiarity with local payment gateways like Payme and Click will facilitate seamless subscription management. I am eager to contribute to this innovative project and can start right away. I look forward to the opportunity to discuss how I can help bring AZIZA to fruition. Best Regards, George M.
$3,000 USD in 40 days
0.0

Hi, I’m a Lead AI / Full-stack Engineer with hands-on experience building low-latency speech and real-time systems using PyTorch, FastAPI, WebRTC, and NVIDIA inference stacks. I’ve worked on multilingual model adaptation, GPU optimization (TensorRT / quantization), and production-grade streaming pipelines where latency and audio quality are critical. AZIZA is exactly the kind of S2S + infrastructure challenge I specialize in, and I’m confident I can help push both performance and reliability to the next level. Best regards, Nazarii
$4,000 USD in 7 days
0.0

Hello, I’m a Senior AI/ML Architect with strong expertise in NLP, Deep Learning, Speech Processing, and full-stack AI system development. I’ve delivered multiple production-grade systems using transformer models, real-time speech pipelines, and scalable backend services, and I’d be glad to contribute to your project. For this project, I will: Build end-to-end NLP and speech solutions using PyTorch/TensorFlow and transformer architectures. Design and integrate high-accuracy STT/TTS pipelines for speech recognition and generation. Develop robust deep learning pipelines, including multilingual support and fine-tuning on your datasets. Expose models through low-latency backend APIs using FastAPI or Flask. Ensure scalability and deployment readiness with Docker and Kubernetes. Provide clear documentation, evaluation metrics, and demo workflows. I focus on clean architecture, reliable model integration, and real-world performance to ensure the solution is maintainable, scalable, and production-ready. I’m comfortable working with cloud GPU environments, real-time streaming, and full-stack integration. Let’s discuss your data sources, model expectations, and performance goals to finalize milestones and timelines aligned with your budget. Best regards, Manas Ranjan Mohanty
$4,000 USD in 7 days
0.0

Hi, I am interested in the Lead AI / Fullstack Engineer role for Project AZIZA. I have experience building low-latency voice-to-voice systems, fine-tuning multilingual speech/LLM models, and deploying optimized inference on RTX 4090 using PyTorch, TensorRT-LLM, vLLM, and CUDA. I have worked on E2E speech architectures (AudioLM/Moshi-style), cross-lingual adaptation, and large-scale RAG systems with verification layers to reduce hallucinations. On the fullstack side, I’ve implemented real-time, full-duplex audio with WebRTC/WebSockets, interruptible UIs, FastAPI backends, and Telegram Mini Apps, with a strong focus on sub-100ms latency and high-load scalability. I am comfortable working directly with GPU clusters and owning both architecture and execution in production environments. Question: What is your current end-to-end latency and target concurrent voice sessions at launch? Best regards, Kamran
$4,000 USD in 7 days
0.0

Tashkent, Uzbekistan
Payment method verified
Member since May 15, 2024