Job Title: Senior Backend AI Engineer
Location: Fully Remote
Company Overview
Founded in 2019, Elife Transfer is a fast-growing startup headquartered in San Francisco, California, at the heart of Silicon Valley. We are an all-in-one global ground transportation marketplace, enabling travelers to book airport transfers, ride-hailing, shared rides, private cars and rail tickets. Trusted by over 40 million travelers across 182+ countries, we are rapidly scaling to become the world’s go-to platform for seamless, end-to-end ground mobility.
Key Responsibilities:
Agent Architecture & Pipeline Design
Design and operate multi-agent systems with orchestrator and specialist agents covering planning, coding, testing, and review
Build feedback loops so agents can detect failures, read error output, and self-correct without human intervention
Define agent tool APIs: shell execution, code interpreter, file system access, git operations, CI triggers
Implement sandboxed execution environments (Docker, Firecracker, E2B) for safe autonomous code execution
Automated Software Delivery
Build pipelines where AI autonomously generates unit, integration, and regression tests from specifications
Integrate agents with GitHub/GitLab: branch creation, PR lifecycle management, automated review bots
Implement CI pipeline agents that interpret test results, triage failures, and propose fixes
Automate agent-driven code review: style checking, correctness analysis, and security scanning
LLM & Reasoning Stack
Design prompting strategies for code generation, test synthesis, PR narration, and review response
Build and maintain RAG pipelines over codebases using vector databases with AST-aware chunking
Manage context windows for long-horizon tasks that span multiple files and subsystems
Implement fine-tuning and RLHF pipelines to specialize models for domain-specific code generation
Evaluation & Quality
Define success metrics for agent output before any system ships to production
Build and maintain evaluation harnesses that test agent quality systematically across scenarios
Benchmark agent performance to catch regressions on each model update or pipeline change
Track token costs, latency, and failure rates per agent run through structured observability
Safety & Reliability
Apply least-privilege execution to every autonomous agent: scope permissions to the minimum required
Implement human-in-the-loop gates for destructive or irreversible actions
Defend against prompt injection in tool-call pipelines exposed to untrusted content
Ensure all agent actions are idempotent, so that retries cause no additional side effects
Define and enforce token budgets and cost throttling per agent run
Requirements:
Non-negotiable experience
5+ years of backend engineering in production systems
2+ years designing or building agentic AI systems (not chatbots or AI autocomplete)
At least one production multi-agent system shipped and maintained end-to-end
Direct experience working with agent output in settings where AI authored the majority of code diffs
Hands-on experience with an LLM evaluation harness for code quality assessment
Technical skills
Python as primary language; Go or Rust for performance-critical components
API design for agent tool endpoints (REST, async queues, event-driven architectures)
Kubernetes-based agent orchestration and scaling
Vector databases and embedding pipelines (Weaviate, Qdrant, Pinecone)
Distributed tracing and observability tooling (OpenTelemetry, Datadog, LangSmith)
Nice to have
• Experience with SWE-bench, HumanEval, or other code generation benchmarks
• Contributions to open-source agentic frameworks
• Knowledge of formal verification or property-based testing for agent output validation
• Experience deploying LLMs on custom hardware (A100/H100 clusters)
• Background in compiler design or static analysis (useful for AST-level code understanding)