Location: Cluj, Romania
Job Type: Full Time
Salary: 14,100 RON/month
Position Overview:
We are looking for an experienced Senior Developer with a strong background in real-time audio streaming, low-latency systems, and AI-powered speech processing to join our Speech-to-Speech Translation team.
In this role, you will design and develop components that power real-time, bidirectional audio translation — from live voice capture and speech recognition to translation, synthesis, and playback. You’ll work with cutting-edge technologies like WebRTC, RTP, WebSockets, and AI models (OpenAI Realtime, Azure Speech) to deliver seamless multilingual voice communication between callers and agents.
You will collaborate closely with architects, AI engineers, and DevOps to build fault-tolerant, scalable streaming pipelines and real-time orchestration services for our telco-grade Speech-to-Speech solution.
Key Responsibilities:
- Design and implement real-time audio streaming and processing components using Python (and optionally Go).
- Build low-latency data pipelines for Speech-to-Text (STT) → Machine Translation (MT) → Text-to-Speech (TTS) processing.
- Develop and optimize WebSocket, WebRTC, or RTP connections for full-duplex audio transmission.
- Integrate AI services such as OpenAI Realtime API, Azure Speech Services, and internal LLM translation modules.
- Implement audio session orchestration (session binding, queueing, failover, retries, latency control).
- Work with CCaaS integration components (Twilio TaskRouter, Amazon Connect Streams API, LiveKit, Hello Media Server).
- Collaborate with DevOps on observability, scalability, and infrastructure automation in Azure (Terraform / AKS).
- Ensure security and reliability of media and signaling channels, including mTLS, token-based access, and session isolation.
- Participate in architecture discussions, performance profiling, and design reviews focused on real-time latency and throughput optimization.
- Mentor mid-level developers and contribute to best practices for Python/Go development and real-time distributed systems.
Qualifications & Skills:
Required:
- 6–10 years of professional software development experience.
- Strong programming skills in Python (asyncio, FastAPI, or custom networking frameworks).
- Experience in real-time streaming systems using WebRTC, RTP, or WebSocket protocols.
- Understanding of audio processing pipelines, such as jitter buffering, packet loss concealment, voice activity detection (VAD), or audio codecs.
- Familiarity with AI-based speech systems: STT, MT, TTS, and real-time model inference APIs.
- Experience with event-driven architectures and microservices.
- Proficiency with Docker, Kubernetes, and CI/CD pipelines (GitHub Actions / Azure DevOps).
- Solid understanding of networking, TLS/mTLS, and secure data streaming.
- Strong problem-solving skills and experience optimizing for low latency and high throughput.
Optional / Nice to Have:
- Proficiency in Go (Golang) for performance-critical components or concurrency-heavy workloads.
- Experience integrating telephony / CCaaS APIs (Twilio, Amazon Connect, Genesys, NICE).
- Knowledge of Temporal, GraphQL (Hasura), or message brokers (Kafka, NATS).
- Familiarity with AI orchestration (prompt-based or context-driven translation flows).
- Experience with observability stacks (Prometheus, Grafana, Application Insights).
- Prior work in telecommunications, contact center, or streaming media domains.