
Completed
Posted
Paid on delivery
Lead AI / Fullstack Engineer: Project "AZIZA" (Voice-to-Voice AI)

Project Name: AZIZA
Format: Project-based / Remote (with access to local GPU clusters)
Tech Stack: PersonaPlex (Moshi-based architecture), PyTorch, TensorRT-LLM, FastAPI, WebRTC, Telegram Mini App (TMA)
Hardware Location: Uzbekistan & Turkey clusters powered by NVIDIA L40S

Project Overview
AZIZA is a multimodal "Speech-to-Speech" (S2S) ecosystem designed to simulate natural human interaction. We are building an AI assistant that seamlessly transitions between roles: an expert tutor (Chemistry, History, Biology), an empathetic companion, and a simultaneous translator. The system processes audio tokens directly, which keeps interaction latency very low.
Current Status: The base model (English) is stable. We are now scaling to address regional specifics and deploying the solution within a high-tech application framework.

Key Responsibilities

1. Core AI & ML (Adaptation & Intelligence)
• Multilingual Support: Lead cross-lingual fine-tuning to provide native-level support for Uzbek (including regional dialects), Kazakh, Russian, and Tajik.
• Latency Optimization: Streamline inference pipelines to target a response latency of 180-300 milliseconds.
• Smart RAG (100 GB): Architect a vector knowledge base for educational materials, implementing a "triple-check" verification mechanism to eliminate hallucinations.
• NVIDIA Stack: Optimize inference for L40S environments using vLLM, TensorRT-LLM, and INT4/FP8 quantization.

2. Telegram Mini App & Real-time Web
• Audio Streaming: Implement low-latency real-time audio transmission via WebRTC / WebSockets (moving beyond standard voice message protocols).
• Full-Duplex UI: Develop a frontend that supports interruptibility, allowing the AI to react instantly when the user speaks over it.
• Billing: Integrate local payment gateways (Payme, Click) for subscription management.

3. Architecture & Infrastructure
• High-load Design: Design a horizontally scalable system capable of handling high concurrent user loads.
• Signal Processing: Implement software-based AEC (Acoustic Echo Cancellation) and noise suppression to ensure high-fidelity communication.
• Traffic Localization: Optimize routing protocols to maximize performance within the TAS-IX network.

Candidate Requirements

AI / ML Engineering:
• Proven experience with end-to-end (E2E) speech models (Moshi, AudioLM, or similar).
• Deep proficiency in PyTorch and Transformer architectures.
• Hands-on experience in fine-tuning LLMs/S2S models for new language groups.
• Expertise in CUDA 12.x and NVIDIA optimization libraries.

Fullstack Development:
• Expert-level knowledge of WebRTC / WebSockets for real-time media streaming.
• Demonstrated experience in developing Telegram Mini Apps (TMA).
• Professional mastery of FastAPI and React / Next.js.
• Strong understanding of the constraints and requirements of low-latency systems.
Project ID: 40213483
62 proposals
Remote project
Active 29 days ago
62 freelancers are bidding on average $1,221 USD for this job

Hi there, I’ve read AZIZA’s goal to deliver a fast, multilingual, voice-to-voice AI assistant with high reliability and low latency. I’ve built end-to-end speech systems and real-time apps using PyTorch, CUDA optimization, FastAPI, and WebRTC, including multilingual fine-tuning and deployment on NVIDIA stacks. I’ll lead from model adaptation to full-stack delivery: refine cross-lingual S2S models for Uzbek (with dialects), Kazakh, Russian, and Tajik; optimize inference on L40S with vLLM, TensorRT-LLM, and INT4/FP8; implement a robust 100 GB RAG with triple-check validation to curb hallucinations; build low-latency audio streaming for Telegram Mini App and web, including full-duplex UI and interruptible responses; and design a horizontally scalable Infra with AEC, noise suppression, and TAS-IX aware routing. I’ll also integrate local payment gateways and ensure the architecture supports high concurrency. Next steps: share existing baseline eval metrics, dataset split, and any regulatory or safety constraints; I’ll provide a detailed plan with milestones and a 2-week proof-of-concept. What are the top three regional dialects and any mandatory compliance or safety constraints I should prioritize during multilingual fine-tuning?
$1,500 USD in 19 days
9.0

As a Lead AI/Fullstack Engineer with extensive experience in AI/ML and deep learning, I understand the complexities and challenges that come with developing a cutting-edge project like AZIZA. Your vision for a multimodal "Speech-to-Speech" ecosystem is truly innovative, and I am excited about the opportunity to contribute to its success. With a track record of delivering successful projects in the AI and ML domain, I am well-equipped to lead the Core AI & ML aspects of AZIZA. My expertise in fine-tuning models, optimizing latency, and building robust knowledge bases aligns perfectly with the key responsibilities outlined for this project. Furthermore, my experience in developing real-time web applications, integrating payment gateways, and designing highload systems positions me as a strong candidate to handle the Telegram Mini App and Architecture & Infrastructure components of AZIZA. I am confident that my skills in PyTorch, FastAPI, WebRTC, and GPU optimization libraries will be valuable assets to your project. If you are looking for a dedicated and experienced engineer to drive the success of AZIZA, I am eager to collaborate with you. Let's bring your vision to life together.
$1,200 USD in 20 days
7.3

⭐⭐⭐⭐⭐ AZIZA’s Moshi-based S2S vision aligns with our proven delivery of low-latency multimodal systems. I will lead multilingual fine-tuning for Uzbek, Kazakh, Russian, and Tajik using PyTorch pipelines, optimize L40S inference with TensorRT-LLM, vLLM, and INT4/FP8, and design a 100 GB Smart-RAG with triple validation to minimize hallucinations while maintaining 180–300 ms latency. Full-duplex WebRTC streaming, FastAPI microservices, and a responsive Telegram Mini App with interruptible UI and Payme/Click billing will be implemented with a scalable high-load architecture, AEC, and TAS-IX traffic localization. CnELIndia will provide GPU orchestration, DevOps automation, QA, and deployment governance across the Uzbekistan and Turkey clusters, ensuring reliability and performance tuning. Raman Ladhani will drive solution architecture, CUDA optimization, cross-lingual model adaptation, and real-time media engineering, ensuring seamless delivery from research to production and measurable user experience outcomes.
$1,125 USD in 7 days
7.5

Hello, I trust you're doing well. I have nearly a decade of hands-on experience with machine learning algorithms. My expertise lies in developing various artificial intelligence algorithms, including the one you require, using Matlab, Python, and similar tools. I hold a doctorate from Tohoku University and have a number of publications in the same subject. My portfolio, which showcases my past work, is available for your review. Your project piqued my interest, and I would be delighted to be part of it. Let's connect to discuss in detail. Warm regards. Please check my portfolio link: https://www.freelancer.com/u/sajjadtaghvaeifr
$1,125 USD in 7 days
7.2

Hi there, AZIZA is a real-time multilingual Speech-to-Speech AI assistant acting as a tutor, companion, and translator, requiring low-latency audio processing, multilingual fine-tuning, and scalable NVIDIA GPU-based infrastructure. How do you plan to handle low-latency multilingual speech processing for AZIZA across Uzbek, Kazakh, Russian, and Tajik? I’d love to hear more about your approach. Would you prefer fully embedded real-time streaming or a hybrid WebRTC approach? And for the knowledge base, do you want content updates to be manual or automated? With over 15 years of experience in AI/ML engineering and fullstack development, I specialize in delivering real-time Speech-to-Speech AI systems, optimized for PyTorch, TensorRT-LLM, WebRTC, and high-concurrency GPU clusters. The amount I’ve put in is just a placeholder; we can talk about the budget via chat. Happy to share my portfolio in chat upon request. Sincerely, Muhammad Abrar
$1,225 USD in 7 days
6.9

Hi I can lead AZIZA from a stable English S2S base into a production-grade, multilingual, low-latency voice-to-voice system optimized for your L40S clusters. I’ve worked hands-on with end-to-end speech models and low-latency inference stacks, combining PyTorch fine-tuning with TensorRT-LLM/vLLM to push response times into the sub-300 ms range. A common failure point in real-time S2S systems is latency spikes caused by mismatched audio chunking, GPU scheduling, and network transport. I address this by co-designing the audio token pipeline, inference batching, and WebRTC streaming so model inference, AEC/noise suppression, and UI interruptibility stay fully synchronized. For multilingual expansion, I’ll lead cross-lingual fine-tuning for Uzbek (including dialectal variance), Kazakh, Russian, and Tajik with evaluation focused on conversational naturalness, not just WER. On the platform side, I can implement a full-duplex Telegram Mini App experience with FastAPI backends, real-time audio streaming, and local billing integrations (Payme/Click). You’ll get a scalable, well-documented architecture that’s optimized for TAS-IX routing and ready for high concurrent load. Thanks, Hercules
$1,200 USD in 7 days
6.6

With my broad skill set in Full Stack Development and Machine Learning, I bring not just the technical expertise needed for this project, but also 13 years of solid experience in the IT industry. Throughout my career, I've developed a strong proficiency with PyTorch and Transformer architectures, both key tools for your "AZIZA" project. My previous work on real-time media streaming with WebRTC / WebSockets will certainly come in handy for your low-latency requirement. I am particularly excited about the multilingual aspect of your project, as I've had hands-on experience in fine-tuning LLMs/S2S models for new language groups. My track record of architecting vector knowledge bases similar to the one you need for educational materials gives me confidence that I can deliver a reliable "triple-check" verification mechanism. On top of my skill set, I offer dedication and commitment to client satisfaction. I assure you of quality, timeliness, and a professional approach in delivering on all my responsibilities, which is exactly what your project demands. Invest in my skills and my years in the industry for an ambitious project like AZIZA.
$1,125 USD in 7 days
6.6

Hello, I specialize in voice-to-voice AI systems and have built and customized large scale real-time speech platforms. The main challenge here is keeping conversations natural while staying under 300 ms and handling many languages smoothly. I am certified in PyTorch and NVIDIA TensorRT development, and I will solve this by optimizing S2S models with INT4/FP8, fast audio streaming, and a clean FastAPI + WebRTC flow. I’ve worked with GPU clusters and low-latency pipelines before. A few questions: Which language needs priority first? Can users interrupt the AI anytime? Should knowledge updates be live or scheduled? Do payments need fallback options? I’ll focus on speed, clarity, and human-like response. Best regards, Dev S.
$1,500 USD in 15 days
6.1

Hello, I have created similar apps before and can show you. I reviewed the AZIZA brief in detail and clearly understand the scope: a low-latency, multilingual Speech-to-Speech system built on a Moshi-style architecture and optimized for NVIDIA L40S clusters. I have 10+ years of experience across deep learning, speech/NLP systems, and full-stack AI infrastructure, with hands-on work in E2E speech models, PyTorch, Transformer-based architectures, and multilingual fine-tuning. I’ve optimized inference pipelines using TensorRT-LLM, vLLM, CUDA 12.x, and INT4/FP8 quantization, and designed systems targeting sub-300 ms end-to-end latency. On the real-time side, I have strong experience with WebRTC/WebSockets, full-duplex audio streaming, interruptible UIs, and FastAPI-based low-latency backends, as well as deploying production systems on GPU clusters with horizontal scalability. I’m comfortable implementing RAG at scale, vector databases, hallucination-mitigation strategies, and high-load routing optimizations. I can contribute across core model adaptation, latency optimization, infrastructure design, and real-time application integration, and work effectively with distributed GPU environments in Uzbekistan and Turkey. I’m happy to discuss architecture decisions, benchmarks, and next milestones. I eagerly await your positive response. Thanks
$750 USD in 7 days
6.4

AZIZA is exactly the kind of real-time voice-to-voice system I enjoy building. I have worked on low-latency speech and LLM pipelines where sub-second interaction and multilingual performance are critical, including GPU-optimized inference and streaming architectures.
My approach would focus first on latency and stability. I would optimize the current Moshi-based pipeline using TensorRT-LLM and quantization on L40S to consistently reach the 180 to 300 ms response target. For multilingual expansion, I would lead fine-tuning and evaluation for Uzbek, Russian, Kazakh, and Tajik with strong phoneme and dialect coverage plus alignment tuning for natural speech output. I can also design the 100 GB RAG layer with vector search and multi-pass verification to reduce hallucinations for educational content.
On the system side, I would build a full-duplex WebRTC audio pipeline with interruptible responses, Telegram Mini App integration, and a FastAPI backend with scalable GPU routing across clusters. I have strong experience designing horizontally scalable AI services with observability and billing integration.
Availability: Full-time or project-based remote collaboration.
Timeline:
• Core latency and multilingual expansion: 2 to 3 weeks
• Full production scaling and Telegram integration: 4 to 5 weeks
I would be happy to review your current architecture and propose a concrete optimization plan for the next phase.
$1,500 USD in 25 days
5.5

Hello! I am writing to introduce myself. I'm Engineer Toriqul Islam. I was born and grew up in Bangladesh, and I speak and write English at a native level. I hold a B.Sc. in Computer Science & Engineering from Rajshahi University of Engineering & Technology (RUET). I love working on web design and development projects.
Web Design & Development: I am a full-stack web developer with more than 10 years of experience. My design approach is always modern and simple, which attracts people to it. I have built websites for a wide variety of industries and worked with many companies. All clients have left good reviews about me. Client satisfaction is my first priority.
Technologies we use (custom website development, full stack):
1. HTML5
2. CSS3
3. Bootstrap4
4. jQuery
5. JavaScript
6. Angular JS
7. React JS
8. Node JS
9. WordPress
10. PHP
11. Ruby on Rails
12. MySQL
13. Laravel
14. .Net
15. CodeIgniter
16. React Native
17. SQL / MySQL
18. Mobile app development
19. Python
20. MongoDB
What you'll get:
• Fully responsive website on all devices
• Reusable components
• Quick response
• Clean, tested, and documented code
• Deadlines and requirements fully met
• Clear communication
You are cordially welcome to discuss your project. Thank you! Best Regards, Toriqul Islam
$750 USD in 7 days
5.2

Nice to meet you. My name is Anthony Muñoz. I am interested in working on your project: after carefully reading the requirements, I have concluded that they match my area of knowledge and skills. I am currently the lead engineer for the IT agency DSPro and I have more than 10 years of experience in the field. I have successfully completed a large number of similar jobs, and I see your project as a challenge I would like to take on and make a reality. Please feel free to contact me; it will be my pleasure to help you. I greatly appreciate your time and remain available for any questions or concerns. Greetings
$2,206 USD in 7 days
4.6

I will lead the AI and full-stack development of AZIZA, focusing on core AI/ML adaptation, latency optimization, and smart RAG implementation. This includes ensuring native-level support for multilingual users, optimizing inference pipelines, and architecting a vector knowledge base. I will also handle Telegram Mini App development, audio streaming, and the full-duplex UI, drawing on expertise in PyTorch, TensorRT-LLM, WebRTC, and FastAPI, and I can adapt to the proposed budget and timeline. Waiting for your response in chat! Best Regards.
$1,125 USD in 3 days
4.6

Hi, I have reviewed the details of your project. We have handled similar projects successfully, and I am confident we can deliver high-quality results for you. I will first understand exactly what you need, then plan everything step by step to make sure the work runs smoothly. We prefer clear communication and regular updates so that the project stays on track and meets your expectations. Let's have a detailed discussion, as it will help me give you a complete plan, including a timeline and estimated budget. I will share my portfolio in the chat to show relevant examples of our past work. Looking forward to your response. Mughiraa
$1,125 USD in 7 days
3.9

Hello ATsugai, I checked your project, and it looks interesting. This is something we already work on, so the requirements are clear from the start. We mainly work on Software Architecture, Machine Learning (ML), Full Stack Development, Audio Processing, Deep Learning, FastAPI, and Natural Language Processing. We focus on making things simple, reliable, and actually useful in real life, not overcomplicated stuff. Let’s connect in chat and see if we’re a good fit for this. Best Regards, Ali nawaz
$750 USD in 8 days
3.4

Hello, I’m a senior AI/Fullstack engineer with deep experience in real-time voice systems, multimodal AI, and low-latency architectures. AZIZA’s speech-to-speech vision, multilingual expansion, GPU-optimized inference, and S2S interaction model align perfectly with my background in PyTorch, Transformer models, CUDA optimization, TensorRT-LLM, vector databases, and RAG systems. I’ve led cross-lingual fine-tuning, latency-critical pipelines, and high-load AI infrastructures, delivering production systems that operate reliably under real-time constraints. I can architect AZIZA’s full stack—from model optimization on L40S clusters and multilingual adaptation to WebRTC streaming, FastAPI services, Telegram Mini App integration, and scalable infrastructure design. My focus is building fast, stable, human-like AI systems with clean architecture, performance-first engineering, and production-grade reliability, ensuring AZIZA scales as a true next-gen voice intelligence platform. Best Regards, Abhijeet
$1,500 USD in 7 days
3.6

Hello, the AZIZA model is an interesting and clever idea, especially for local users. The outcome depends on your data. If you provide me with clean voice and text data, I can complete your model. Of course, this will be done with your coordination and supervision regarding exactly what should be trained.
$1,000 USD in 20 days
3.2

Hi, this role is an excellent match for my background as a core AI + full-stack real-time systems engineer. I have hands-on experience building end-to-end speech systems (ASR → reasoning → TTS / S2S), working deeply with PyTorch, Transformer architectures, and multilingual fine-tuning. I’ve optimized low-latency inference pipelines using quantization (INT4/FP8), vLLM-style serving, and GPU-aware batching, and I’m comfortable tuning for strict latency targets in the sub-300 ms range. On the systems side, I’ve built full-duplex, interruptible voice interactions using WebRTC/WebSockets, integrated FastAPI backends, and deployed horizontally scalable, high-load architectures. I’ve also worked with RAG systems at scale, including verification layers to reduce hallucinations, and real-time audio processing (noise suppression, echo handling).
I can contribute across:
• Multilingual S2S adaptation (Uzbek, Kazakh, Russian, Tajik)
• GPU-optimized inference on NVIDIA stacks
• Low-latency audio streaming + Telegram Mini App integration
• Scalable backend architecture and deployment
I’m comfortable owning complex subsystems end-to-end and collaborating closely on a first-of-its-kind product. Happy to discuss concrete milestones, benchmarks, and architecture next. Best, Chirag
$1,125 USD in 7 days
2.8

I specialize in high-performance Voice-to-Voice AI architectures and LLM/NLP pipelines, having successfully deployed similar core AI projects centered on real-time speech processing. My background directly aligns with leading Project AZIZA, spanning the full stack from model training (Deep Learning Engineer) to optimized production infrastructure (LLM DevOps). I'd stabilize the core speech model using fine-tuned ASR/TTS architectures (e.g., Transformer variants via PyTorch/Hugging Face). Infrastructure will focus on low-latency inference using optimized quantization and robust GPU utilization (e.g., Triton Inference Server). I will implement scalable LLM DevOps using Kubernetes/Docker for containerized deployment and automated CI/CD for seamless model retraining and versioning. What is the target latency requirement for the Voice-to-Voice interaction? Are we leveraging existing proprietary datasets, or starting with public domain pre-training models? Let’s schedule a quick chat to fully align on the AZIZA project roadmap and performance benchmarks.
$1,274.63 USD in 21 days
2.1

Greetings! I’m a top-rated freelancer with 16+ years of experience and a portfolio of 750+ satisfied clients. I specialize in delivering high-quality, professional AI, full-stack, NLP, and speech deep learning services tailored to your unique needs. Please feel free to message me to discuss your project and review my portfolio. I’d love to help bring your ideas to life! Looking forward to collaborating with you! Best regards, Revival
$750 USD in 14 days
2.0

Tashkent, Uzbekistan
Payment method verified
Member since May 15, 2024