Technical architecture

How Kira finds, ranks, and generates clinical answers.

A hybrid retrieval-augmented generation pipeline with multi-stage search, cross-encoder reranking, agentic tool orchestration, and constitutional safety — evaluated across 438 gold queries with bootstrap confidence intervals.


Search pipeline

Eight stages from query to grounded response.

Each query passes through synonym expansion, dual-path (vector and BM25) retrieval, cross-encoder reranking, and agentic tool orchestration before it reaches the LLM, where constitutional safety principles are enforced in the prompt.
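To make the first step concrete, here is a minimal sketch of synonym expansion. The three mappings, the `SYNONYMS` table, and the `expandQuery` function are hypothetical illustrations and not Kira's actual table or API; the idea is simply that known abbreviations get their long forms appended so both retrieval paths can match them.

```typescript
// Hypothetical synonym table (illustrative entries only).
const SYNONYMS = new Map<string, string[]>([
  ["mdd", ["major depressive disorder"]],
  ["ocd", ["obsessive-compulsive disorder"]],
  ["gad", ["generalized anxiety disorder"]],
]);

// Append the expansions of every known term found in the query.
// The original query text is kept intact so exact matches still work.
function expandQuery(
  query: string,
  table: Map<string, string[]> = SYNONYMS,
): string {
  const tokens = query.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  const extras: string[] = [];
  for (const token of tokens) {
    for (const synonym of table.get(token) ?? []) {
      if (extras.indexOf(synonym) === -1) extras.push(synonym);
    }
  }
  return extras.length > 0 ? `${query} ${extras.join(" ")}` : query;
}
```

Appending rather than replacing means the lexical (BM25) path can match either the abbreviation or the full term.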

Stage	Description	Latency
Query	User input parsed and classified	0ms
Expansion	80+ clinical synonym mappings	~1ms
Embedding	384-dim dense vectors	~50ms
Hybrid Search	60% vector + 40% BM25 fusion	~20ms
Reranking	Cross-encoder 20 → 8	~300ms
Agentic Loop	4 tools, max 3 rounds	~1.5s
Safety	Constitutional + output guard	In-prompt
Response	SSE stream with sources	~2s total
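The agentic loop stage is described only as "4 tools, max 3 rounds". One way such a bounded loop could be structured is sketched below. The `ToolCall`, `Tool`, and `Planner` types and the `runToolLoop` function are assumptions made for illustration; in a real system the planner would be the LLM and the tools would likely be asynchronous.

```typescript
// Hypothetical shapes: a planner proposes tool calls, tools return text.
type ToolCall = { tool: string; input: string };
type Tool = (input: string) => string;
type Planner = (query: string, observations: string[]) => ToolCall[];

const MAX_ROUNDS = 3; // "max 3 rounds" from the stage summary

// Run up to maxRounds planning rounds, collecting tool outputs.
// The loop ends early as soon as the planner requests no more calls.
function runToolLoop(
  query: string,
  tools: Map<string, Tool>,
  plan: Planner,
  maxRounds: number = MAX_ROUNDS,
): string[] {
  const observations: string[] = [];
  for (let round = 0; round < maxRounds; round++) {
    const calls = plan(query, observations);
    if (calls.length === 0) break; // planner has enough context
    for (const call of calls) {
      const tool = tools.get(call.tool);
      // Unknown tool names are recorded rather than thrown, so the
      // planner can see the failure in the next round.
      observations.push(tool ? tool(call.input) : `unknown tool: ${call.tool}`);
    }
  }
  return observations;
}
```

The hard round cap is what keeps the latency of this stage bounded regardless of what the planner asks for.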

Retrieval benchmarks

Measured against 438 gold queries across five evaluation categories.

Scope, clinical depth, differential diagnosis, safety, and edge cases — each with bootstrap 95% confidence intervals.
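For readers unfamiliar with bootstrap intervals, the sketch below shows a percentile bootstrap over per-query scores. It assumes each query yields a single numeric score (for example 1 if an expected source was retrieved, 0 otherwise); the function names, the seeded generator, and the percentile method are illustrative choices, not a description of the actual eval harness.

```typescript
// Small seeded PRNG (mulberry32) so runs are reproducible.
function seededRandom(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Percentile bootstrap CI for the mean of per-query scores.
function bootstrapCI(
  scores: number[],
  resamples: number = 1000,
  alpha: number = 0.05,
  rng: () => number = Math.random,
): { mean: number; lower: number; upper: number } {
  if (scores.length === 0) throw new Error("need at least one score");
  const n = scores.length;
  const means: number[] = [];
  for (let b = 0; b < resamples; b++) {
    // Resample n scores with replacement and record the mean.
    let sum = 0;
    for (let i = 0; i < n; i++) sum += scores[Math.floor(rng() * n)];
    means.push(sum / n);
  }
  means.sort((x, y) => x - y);
  const lowerIdx = Math.floor((alpha / 2) * (resamples - 1));
  const upperIdx = Math.ceil((1 - alpha / 2) * (resamples - 1));
  const mean = scores.reduce((s, x) => s + x, 0) / n;
  return { mean, lower: means[lowerIdx], upper: means[upperIdx] };
}
```

Calling `bootstrapCI(perQueryScores, 1000, 0.05, seededRandom(42))` returns the observed mean together with the 2.5th and 97.5th percentiles of the resampled means.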

Run the eval harness to generate benchmark data:

npm run eval

System architecture

The numbers behind the pipeline.

Knowledge Base

DSM-5-TR disorders

8,753

Search chunks

Personality disorders

Screener instruments

Search Pipeline

384d

Embedding dimensions

60/40

Vector / BM25 weight

80+

Synonym mappings

20→8

Candidates → results

Safety Pipeline

3-tier

Safety classification

988

Crisis escalation

20/min

Rate limit burst

In-prompt

Constitutional principles

Evaluation

438

Gold evaluation queries

92%

Recall@5

4.55

Groundedness (of 5)

4.91

Relevance (of 5)
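The page does not spell out how Recall@5 is computed, so the following is a sketch under a common definition: the fraction of a query's expected sources that appear in the top k results, with reciprocal rank included because MRR appears in the ablation table. `recallAtK` and `reciprocalRank` are illustrative names, and document IDs are assumed to be plain strings.

```typescript
// Fraction of the expected sources that appear in the top-k results.
function recallAtK(ranked: string[], relevant: Set<string>, k: number): number {
  if (relevant.size === 0) return 0;
  const topK = new Set(ranked.slice(0, k));
  const hits = Array.from(relevant).filter((id) => topK.has(id)).length;
  return hits / relevant.size;
}

// 1 / rank of the first relevant result, or 0 if none was retrieved.
// Averaging this value over all queries gives MRR.
function reciprocalRank(ranked: string[], relevant: Set<string>): number {
  const index = ranked.findIndex((id) => relevant.has(id));
  return index === -1 ? 0 : 1 / (index + 1);
}
```

If the harness instead counts a query as a hit when at least one expected source is in the top k, the per-query value would be 0 or 1 rather than a fraction.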

Ablation study

Each component earns its place in the pipeline.

Retrieval quality for each search method, measured on queries with known expected sources; confidence intervals are bootstrapped.

Method	Recall@3	Recall@5	Recall@8	MRR	NDCG@10
BM25 Only	72.3%	83.5%	88.2%	68.4%	71.2%
Hybrid (Vector + BM25) (production)	89.9%	92.0%	93.1%	87.7%	87.9%
Hybrid + Reranking	90.4%	92.3%	93.3%	88.5%	88.5%

Hybrid fusion combines the vector and BM25 rankings with reciprocal rank fusion, weighted 60/40. Bootstrap 95% CIs are computed over 1,000 resamples of the 107-query test split. Reranking adds ~300ms of latency for marginal gains (+0.3 points Recall@5, +0.8 points MRR over hybrid alone), so it is disabled in production.
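A weighted form of reciprocal rank fusion can be sketched as follows. The 0.6 and 0.4 weights come from the text above; the smoothing constant `k = 60` is a commonly used default and is an assumption here, as are the function name and the use of string document IDs.

```typescript
// Weighted reciprocal rank fusion over two rankings (best first).
// Each document scores weight / (k + rank) per list it appears in.
function weightedRrf(
  vectorRanking: string[],
  bm25Ranking: string[],
  vectorWeight: number = 0.6,
  bm25Weight: number = 0.4,
  k: number = 60, // assumed smoothing constant, not stated in the source
): string[] {
  const scores = new Map<string, number>();
  const accumulate = (ranking: string[], weight: number): void => {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  };
  accumulate(vectorRanking, vectorWeight);
  accumulate(bm25Ranking, bm25Weight);
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

For example, `weightedRrf(["a", "b", "c"], ["b", "c", "d"])` ranks `b` first because it places well in both lists, while `a` (top of the vector list only) falls behind `c`. Because the fusion uses ranks rather than raw scores, the two retrievers do not need comparable score scales.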

Knowledge graph

Clinical relationships extracted from structured DSM-5-TR data.

Interactive visualization of comorbidity links, screening tool associations, differential rule-outs, and diagnostic category membership across 57 conditions and 44 instruments.
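The four relationship types named above map naturally onto a typed edge list. The sketch below is illustrative only: the `EdgeKind` labels, the `Edge` shape, the `neighbors` helper, and the handful of example edges are assumptions and are not taken from Kira's actual graph data.

```typescript
// Hypothetical edge labels for the four relationship types.
type EdgeKind = "comorbid_with" | "screened_by" | "rule_out" | "member_of";

interface Edge {
  from: string;
  to: string;
  kind: EdgeKind;
}

// Illustrative edges only; not Kira's data.
const EDGES: Edge[] = [
  { from: "major depressive disorder", to: "generalized anxiety disorder", kind: "comorbid_with" },
  { from: "major depressive disorder", to: "PHQ-9", kind: "screened_by" },
  { from: "major depressive disorder", to: "bipolar disorder", kind: "rule_out" },
  { from: "major depressive disorder", to: "Depressive Disorders", kind: "member_of" },
  { from: "generalized anxiety disorder", to: "GAD-7", kind: "screened_by" },
];

// Nodes linked to `node` by edges of the given kind.
// Comorbidity is treated as symmetric; other kinds are directed.
function neighbors(edges: Edge[], node: string, kind: EdgeKind): string[] {
  const result: string[] = [];
  for (const edge of edges) {
    if (edge.kind !== kind) continue;
    if (edge.from === node) result.push(edge.to);
    else if (kind === "comorbid_with" && edge.to === node) result.push(edge.from);
  }
  return result;
}
```

A lookup such as `neighbors(EDGES, "major depressive disorder", "screened_by")` then returns the instruments attached to that condition.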


Explore the system

Try Kira on hard clinical questions and see the pipeline in action.

Ask about differential diagnosis, comorbidity patterns, screening interpretation, or treatment mechanisms — every answer is grounded in the knowledge base with source citations.
