Appwrite Arena

An LLM benchmarking leaderboard that evaluates how well AI models understand Appwrite services. It compares model performance with and without skill-file context across 70 questions spanning 7 Appwrite product categories.

Live at arena.appwrite.network

How It Works

The benchmark tests leading AI models on their knowledge of Appwrite through two modes:

With skills — Models receive comprehensive Appwrite documentation as context
Without skills — Models answer based solely on their training data
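
As a rough illustration of the difference between the two modes, the sketch below assembles chat messages with or without a documentation block. The `Mode` type, the `buildMessages` helper, and the prompt wording are assumptions made for this example; they are not taken from the benchmark's source.

```typescript
// Hypothetical shapes for illustration; the repository's real types may differ.
type Mode = "with-skills" | "without-skills";

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// In "with-skills" mode the documentation is prepended as a system message.
// In "without-skills" mode the model only sees the question itself.
function buildMessages(mode: Mode, questionText: string, skillsText: string): ChatMessage[] {
  const messages: ChatMessage[] = [];
  if (mode === "with-skills") {
    messages.push({
      role: "system",
      content: `Use the following Appwrite documentation as context:\n\n${skillsText}`,
    });
  }
  messages.push({ role: "user", content: questionText });
  return messages;
}

const sampleQuestion = "Which Appwrite service sends push notifications?";
const sampleSkills = "Messaging: email, SMS, push notifications.";

const withSkillsMessages = buildMessages("with-skills", sampleQuestion, sampleSkills);
const withoutSkillsMessages = buildMessages("without-skills", sampleQuestion, sampleSkills);
console.log(withSkillsMessages.length, withoutSkillsMessages.length);
```

The question text is identical in both modes, so any difference in score can be attributed to the added context.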

Questions are split into 57 multiple-choice (auto-scored) and 13 free-form (AI-judged by Claude Sonnet 4.6) across these categories:

Category	Topics
Foundation	Core Appwrite concepts and architecture
Auth	Authentication, users, teams, OAuth
TablesDB	Tables, rows, queries, permissions
Functions	Serverless functions, runtimes, triggers
Storage	File uploads, buckets, previews
Sites	Web hosting and deployment
Messaging	Email, SMS, push notifications
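
One way to picture the scoring is sketched below: multiple-choice answers are compared against the correct option, and scores are then averaged per category. The `ScoredAnswer` shape, the letter-matching rule, and both function names are assumptions for illustration. Free-form scores would come from the AI judge and are treated here as plain numbers.

```typescript
// Hypothetical record of one scored answer.
interface ScoredAnswer {
  category: string;
  score: number; // 0 or 1 for multiple choice; a judge-assigned value for free-form
}

// Auto-score a multiple-choice answer by comparing normalized option letters.
function scoreMultipleChoice(modelAnswer: string, correctOption: string): number {
  return modelAnswer.trim().toUpperCase() === correctOption.trim().toUpperCase() ? 1 : 0;
}

// Average the scores within each category.
function accuracyByCategory(answers: ScoredAnswer[]): Record<string, number> {
  const sums: Record<string, number> = {};
  const counts: Record<string, number> = {};
  for (const answer of answers) {
    sums[answer.category] = (sums[answer.category] ?? 0) + answer.score;
    counts[answer.category] = (counts[answer.category] ?? 0) + 1;
  }
  const averages: Record<string, number> = {};
  for (const category of Object.keys(sums)) {
    averages[category] = sums[category] / counts[category];
  }
  return averages;
}

const sampleAnswers: ScoredAnswer[] = [
  { category: "Auth", score: scoreMultipleChoice(" b ", "B") },
  { category: "Auth", score: scoreMultipleChoice("C", "B") },
  { category: "Storage", score: scoreMultipleChoice("A", "A") },
];
const categoryAccuracy = accuracyByCategory(sampleAnswers);
console.log(categoryAccuracy);
```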

Models Tested

Claude Opus 4.7 — Anthropic
GPT 5.5 — OpenAI
Gemini 3.1 Pro (Preview) and Gemini 3.1 Flash Lite (Preview) — Google
Grok 4.3 — xAI
DeepSeek V4 Flash — DeepSeek
Qwen 3.6 Plus — Alibaba
GLM 5.1 — Zhipu
MiniMax M2.7 — MiniMax
Mistral Large 3 2512 — Mistral
Kimi K2.6 — MoonshotAI

All models are accessed via OpenRouter with temperature set to 0 to keep results as reproducible as possible.
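
For orientation, a request of this kind could be prepared as in the sketch below, which targets OpenRouter's OpenAI-compatible chat completions endpoint. The helper name, the `PreparedRequest` type, the placeholder API key, and the model identifier are all assumptions for this example; the benchmark's actual runner may build its requests differently.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds (but does not send) a chat completion request with temperature 0.
function buildOpenRouterRequest(
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): PreparedRequest {
  return {
    url: "https://openrouter.ai/api/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model, messages, temperature: 0 }),
    },
  };
}

// Placeholder key and model ID; substitute real values before sending.
const exampleRequest = buildOpenRouterRequest("sk-placeholder", "example-vendor/example-model", [
  { role: "user", content: "What is an Appwrite bucket?" },
]);
console.log(exampleRequest.url);
```

In Bun or Node.js 18+, the prepared request can be sent with `fetch(exampleRequest.url, exampleRequest.init)`.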

Tech Stack

Frontend: React, TanStack Start, Tailwind CSS, Vite, TypeScript

Benchmark: Bun, OpenRouter

Getting Started

Prerequisites

Node.js 18+
Bun (for benchmark scripts and pre-build step)

Development

npm install
npm run dev

The app runs at http://localhost:3000.

Production Build

npm run build
npm run preview

Linting & Formatting

npm run lint
npm run format
npm run check

Tests

npm run test

Running Benchmarks

The benchmark suite lives in the benchmark/ directory and requires an OpenRouter API key.

cd benchmark
cp .env.example .env
# Fill in your API key in .env

# Run both modes
bun run bench:all

# Or run individually
bun run bench:with-skills
bun run bench:without-skills

To minimize cost, the benchmark only fills in data that is missing from the result JSON files. To re-run the benchmark on existing results, delete the contents of those JSON files first.
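
The fill-only-what-is-missing behavior can be pictured with the sketch below. The `Results` shape (scores keyed by model and then by question ID) and the `findMissing` helper are assumptions for illustration; the real result files may be organized differently.

```typescript
// Hypothetical result layout: results[modelId][questionId] = score.
type Results = Record<string, Record<string, number>>;

// Return every (model, question) pair that has no stored score yet,
// so that only those pairs trigger a paid API call.
function findMissing(
  results: Results,
  modelIds: string[],
  questionIds: string[],
): Array<[string, string]> {
  const missing: Array<[string, string]> = [];
  for (const modelId of modelIds) {
    for (const questionId of questionIds) {
      if (results[modelId]?.[questionId] === undefined) {
        missing.push([modelId, questionId]);
      }
    }
  }
  return missing;
}

const storedResults: Results = { "model-a": { q1: 1 } };
const pendingPairs = findMissing(storedResults, ["model-a", "model-b"], ["q1", "q2"]);
console.log(pendingPairs.length);
```

Under this layout, clearing a file makes every pair count as missing, which is why deleting the contents forces a full re-run.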

Results are saved to src/data/results-with-skills.json and src/data/results-without-skills.json, which the frontend reads at build time.
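
Once both files exist, the effect of skill context can be estimated per model by subtracting the two scores. The sketch below assumes each file has already been reduced to a hypothetical `Summary` (model name mapped to the fraction of questions answered correctly). That reduction step and the `skillLift` helper are not part of the documented project.

```typescript
// Hypothetical summary shape: model name -> fraction of questions answered correctly.
type Summary = Record<string, number>;

// Compute the score gain from skill context for models present in both
// summaries, sorted from largest gain to smallest.
function skillLift(
  withSkills: Summary,
  withoutSkills: Summary,
): Array<{ model: string; lift: number }> {
  return Object.keys(withSkills)
    .filter((model) => model in withoutSkills)
    .map((model) => ({ model, lift: withSkills[model] - withoutSkills[model] }))
    .sort((a, b) => b.lift - a.lift);
}

// Made-up numbers for demonstration only, not benchmark results.
const liftRanking = skillLift(
  { "model-a": 0.75, "model-b": 1 },
  { "model-a": 0.5, "model-b": 0.5 },
);
console.log(liftRanking);
```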

Project Structure

├── src/                    # Frontend application
│   ├── components/         # React UI components
│   ├── routes/             # File-based routes (TanStack Router)
│   ├── data/               # Static benchmark result JSON files
│   └── lib/                # Types, utilities, site config
├── benchmark/              # Benchmark suite
│   ├── src/
│   │   ├── questions/      # 70 questions across 7 categories
│   │   ├── skills/         # Appwrite documentation for context mode
│   │   ├── runner.ts       # Test execution logic
│   │   ├── judge.ts        # AI judge for free-form answers
│   │   └── config.ts       # Model definitions and settings
│   └── package.json
├── scripts/                # Build-time scripts (GitHub stars fetcher)
└── public/                 # Static assets

License

MIT — see LICENSE.
