Research Engineer

Research ✦ Bengaluru / Remote ✦ Full-time ✦ ₹55–90L + meaningful equity

We route every run across models, and we need to know — measurably, repeatably — which model should get which step, when an agent's output is degrading, and how to evaluate agents that act rather than chat. That is applied research with a four-week loop to production, not a paper mill.

What you will do

[01]

Build the evaluation harness for agent runs — task success, side-effect safety, cost — and make it the gate every routing change passes through.

[02]

Own model routing research: when does the small model win, and how do we know before the user does?

[03]

Detect drift and degradation in production runs automatically, before it becomes a support ticket.

[04]

Design experiments with the statistical hygiene to kill our own ideas, then write up what died and why.

[05]

Publish what we can — honestly, including negative results.

What we need

[01]

Strong empirical ML background — 3+ years in applied research or ML engineering with experiments that shipped.

[02]

You are fluent in the current LLM landscape: capabilities, costs, context behavior, and where the benchmarks lie.

[03]

Production-grade Python; you profile before you optimize.

[04]

Statistical rigor — you know what a fair baseline is and you flag your own confounds first.

[05]

You would rather be correct than impressive, in that order, every time.

Nice to have

[01]

Published work on evaluation, routing, or multi-step agent behavior.

[02]

Experience fine-tuning or distilling models for narrow tasks.

[03]

You have built an eval suite a team actually trusted.

Apply — we reply to everyone