evaluating-llms-harness

Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use it when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. The underlying harness is an industry standard used by EleutherAI, HuggingFace, and major labs. Supports HuggingFace models, vLLM, and API-based models.

v1.0.0 (New)

Signing

Signed (SLSA L2)

Signed by: skills-hub.ai distributor
Method: Distributor-signed by skills-hub.ai (cryptographically signed with the skills-hub.ai distributor key at publish time).
Signed: May 3, 2026, 2:16 AM

Install this skill

Run this command in your terminal. No account required — it auto-detects your AI tool and installs the skill file.

npx @skills-hub-ai/cli install ai-research-evaluating-llms-harness


Setup by platform

Claude Code

~/.claude/skills/<skill>/SKILL.md


Install

One-click setup for your editor

Run in your project root

npx @skills-hub-ai/cli install ai-research-evaluating-llms-harness --target claude-code
