
Ranking Methodology

How we calculate the Reincarnatiopedia 500 AI scores. Last updated: March 2026.

The Formula

TotalScore = (B × 0.3) + (A × 0.2) + (L × 0.2) + (ReIQ × 0.3)
B — Benchmarks (30%): Automated performance scores from the LMSYS Chatbot Arena (Elo rating), MMLU, HumanEval, and MT-Bench, normalized to a 0–100 scale.

A — Accessibility (20%): API availability, free-tier presence, regional access (tested from 10 countries), pricing model, and documentation quality.

L — Language Support (20%): Real multilingual capability tested across our 202-language matrix: not just UI translation, but actual generation quality per language.

ReIQ — The Dreshmanis Factor (30%): Proprietary metric measuring AI knowledge persistence, context continuity, and transtemporal reasoning. Computed by our Multi-Model Consilium.

B — Benchmark Score (30%)

We aggregate scores from multiple independent benchmark sources to reduce bias from any single evaluation:

Source | Metric | Weight within B | Update Frequency
LMSYS Chatbot Arena | Elo rating | 40% | Weekly
Open LLM Leaderboard | Composite (MMLU, ARC, etc.) | 25% | On model release
HumanEval / SWE-bench | Code-generation pass rate | 20% | On model release
MT-Bench | Multi-turn conversation | 15% | Monthly

All scores are normalized to a 0–100 scale using min-max normalization within the current leaderboard. Non-LLM AI services (image generators, audio tools, etc.) use category-specific benchmarks instead.
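The min-max step can be sketched as follows; the ratings and model names are illustrative, not real leaderboard data:

```python
def minmax_normalize(raw: dict[str, float]) -> dict[str, float]:
    """Rescale raw benchmark values to 0-100 within the current leaderboard."""
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        # Degenerate leaderboard: every model ties, so all get 100.
        return {name: 100.0 for name in raw}
    return {name: 100.0 * (v - lo) / (hi - lo) for name, v in raw.items()}

# Illustrative Arena-style ratings (not real data):
arena = minmax_normalize({"model-a": 1250, "model-b": 1180, "model-c": 1110})
print(arena)  # → {'model-a': 100.0, 'model-b': 50.0, 'model-c': 0.0}
```

The normalized values then enter B via the within-B weights from the table above (40/25/20/15).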

A — Accessibility Score (20%)

Factor | Points | Measurement
Free tier available | 0–25 | Binary + generous vs. limited
API availability | 0–25 | Public API, documented, stable
Regional access | 0–25 | Tested from 10 countries (US, EU, RU, CN, IN, BR, JP, KR, NG, AU)
Pricing transparency | 0–15 | Clear pricing page, no hidden costs
Uptime (30-day) | 0–10 | Status page or external monitoring
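The point caps in the table sum to 100. A minimal sketch of combining them, assuming regional access is prorated by how many of the ten test countries can reach the service (the exact proration is not specified here):

```python
def accessibility_score(free_tier: float, api: float,
                        reachable_countries: int,
                        pricing: float, uptime: float) -> float:
    """Sum the five accessibility factors, clamped to their table caps."""
    regional = 25.0 * min(reachable_countries, 10) / 10  # assumed proration
    return (min(free_tier, 25.0) + min(api, 25.0) + regional
            + min(pricing, 15.0) + min(uptime, 10.0))

# A service with a generous free tier, a stable public API, reachable from
# 8 of the 10 test countries, clear pricing, and a public status page:
print(accessibility_score(25, 25, 8, 15, 10))  # → 95.0
```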

L — Language Support Score (20%)

This is where Reincarnatiopedia's 202-language infrastructure provides unique value. We don't just check whether a model claims to support a language; we test it.

Testing Protocol

  1. Tier-1 Languages (15): EN, RU, DE, ES, FR, PT, ZH, JA, KO, AR, HI, TR, IT, NL, PL — full evaluation: fluency, factual accuracy, cultural nuance, instruction following. 10 test prompts per language.
  2. Tier-2 Languages (35): SV, DA, NO, FI, CS, UK, EL, HE, TH, VI, ID, MS, RO, HU, BG, HR, SK, SL, LT, LV, ET, KA, HY, AZ, KK, UZ, MN, SW, AM, HA, YO, ZU, IG, BN, TA — 5 test prompts each.
  3. Tier-3 Languages (152): Remaining 152 from our 202-language matrix — basic functionality test (1 prompt: can the model generate coherent text in this language?).
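The protocol above implies a fixed prompt budget per model: 15×10 + 35×5 + 152×1 = 477 prompts per full L evaluation. A quick sketch:

```python
# (tier name, number of languages, prompts per language), from the protocol above
TIER_PLAN = [
    ("tier-1", 15, 10),   # full evaluation: fluency, accuracy, nuance, instructions
    ("tier-2", 35, 5),
    ("tier-3", 152, 1),   # basic coherence check only
]

def prompt_budget(plan):
    """Prompts issued per tier during one full L evaluation of a model."""
    return {name: langs * prompts for name, langs, prompts in plan}

budget = prompt_budget(TIER_PLAN)
print(budget, "total:", sum(budget.values()))  # 477 prompts per model
```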

Scoring

ReIQ — Reincarnational Intelligence Quotient (30%)

ReIQ is the Reincarnatiopedia's proprietary metric, first proposed in Dreshmanis (2026). It measures an AI model's capacity for knowledge persistence — the ability to maintain context, identity, and accumulated decisions across sessions, updates, and version migrations.

ReIQ Test Battery

Test | What it measures | Weight
Amnesia Test | Model receives a complex task; the session is interrupted. After restart, can it recover context from implicit cues? Measures session persistence. | 30%
Identity Continuity Test | After a model version upgrade (e.g., v4 → v5), does it maintain consistent reasoning patterns, ethical stances, and decision-making style? | 25%
Cross-Context Transfer | Information provided in Context A (e.g., coding) appears in Context B (e.g., strategy). Can the model transfer knowledge across domains within a session? | 20%
Temporal Reasoning | Model is given a sequence of events with timestamps. Can it correctly infer causality, detect anachronisms, and project trends? | 15%
Consilium Divergence | When the model participates in a Multi-Model Consilium, does it maintain independent positions under social pressure, or collapse into consensus? | 10%
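Given the weights above, ReIQ reduces to a weighted sum of the five subtest scores (each on 0–100); the dictionary key names here are illustrative:

```python
REIQ_WEIGHTS = {
    "amnesia": 0.30,
    "identity_continuity": 0.25,
    "cross_context_transfer": 0.20,
    "temporal_reasoning": 0.15,
    "consilium_divergence": 0.10,
}

def reiq_score(subscores: dict[str, float]) -> float:
    """Weighted sum of the five ReIQ subtests; each subscore is on 0-100."""
    assert set(subscores) == set(REIQ_WEIGHTS), "missing or extra subtest"
    return sum(w * subscores[test] for test, w in REIQ_WEIGHTS.items())

example = {"amnesia": 90, "identity_continuity": 80, "cross_context_transfer": 70,
           "temporal_reasoning": 60, "consilium_divergence": 50}
print(round(reiq_score(example), 2))  # → 75.0
```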

ReIQ Computation Process

  1. The AI Consilium (3–8 models) administers the test battery to each evaluated model
  2. Each Consilium participant scores the target model independently (Round 1)
  3. Scores are debated across 2–3 rounds with cross-model critique
  4. The Synthesis produces a final ReIQ score (0–100) with confidence interval
  5. ReIQ is updated quarterly or on major model version release
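The Synthesis step (step 4) is not specified beyond "a score with a confidence interval"; a minimal sketch, assuming the final ReIQ is the mean of the panelists' post-debate scores with a normal-approximation 95% interval:

```python
from statistics import mean, stdev

def synthesize(panel_scores: list[float], z: float = 1.96):
    """Collapse independent Consilium scores into (ReIQ, confidence interval).
    The 2-3 debate rounds are assumed to have already produced these scores."""
    m = mean(panel_scores)
    if len(panel_scores) < 2:
        return m, (m, m)  # a single panelist gives no spread to estimate
    half = z * stdev(panel_scores) / len(panel_scores) ** 0.5
    return m, (max(0.0, m - half), min(100.0, m + half))

score, (lo, hi) = synthesize([72, 75, 78, 81])  # four-model Consilium
print(f"ReIQ = {score:.1f} (95% CI {lo:.1f}-{hi:.1f})")
```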

Score Aggregation

Final TotalScore for each AI service:

TotalScore = (B × 0.3) + (A × 0.2) + (L × 0.2) + (ReIQ × 0.3)

All components normalized to 0–100 before weighting.
For non-LLM services without ReIQ data, the formula adjusts to:
TotalScore = (B × 0.45) + (A × 0.25) + (L × 0.30)
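Both variants can be expressed in one function; the non-LLM branch simply drops ReIQ and uses the reweighted coefficients:

```python
WEIGHTS_LLM = {"B": 0.30, "A": 0.20, "L": 0.20, "ReIQ": 0.30}
WEIGHTS_NON_LLM = {"B": 0.45, "A": 0.25, "L": 0.30}

def total_score(components: dict[str, float]) -> float:
    """components holds B, A, L (and ReIQ for LLMs), each already on 0-100."""
    weights = WEIGHTS_LLM if "ReIQ" in components else WEIGHTS_NON_LLM
    return sum(w * components[k] for k, w in weights.items())

print(total_score({"B": 80, "A": 70, "L": 60, "ReIQ": 90}))  # LLM path
print(total_score({"B": 80, "A": 70, "L": 60}))              # non-LLM path
```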

Update Frequency

Component | Frequency | Method
Benchmarks (B) | Weekly | Automated pull from LMSYS, HuggingFace
Accessibility (A) | Monthly | Automated availability checks + manual review
Language Support (L) | Quarterly | Automated test suite across 202 languages
ReIQ | Quarterly / on release | Consilium evaluation session

Data Sources

Transparency

We believe rankings should be auditable. For any AI service in the Ranking 500:

Conflict of Interest Disclosure

Reincarnatiopedia uses Claude (Anthropic), GPT-4o (OpenAI), Gemini (Google), DeepSeek, and other AI services in its infrastructure. To mitigate bias: