agent-platform-eval-flywheel

Measures and improves the quality of AI models and agents on Google Cloud using the Eval Quality Flywheel methodology. Use when evaluating an agent or model, building an eval dataset, picking or writing evaluation metrics, analyzing failures, comparing results before and after a fix, or when guidance is needed on Agent Platform eval methodology — including dataset schema, LLM-as-judge scoring, and common failure causes. For fine-tuning, use agent-platform-tuning. For general production deployment, use agent-platform-deploy.

v1.0.1 (New)

#google-cloud-skills

Signing

Signed, SLSA L2

Signed by: skills-hub.ai distributor
Method: Distributor-signed by skills-hub.ai. Cryptographically signed by the skills-hub.ai distributor key at publish time.
Signed: Jul 16, 2026, 8:07 PM

Install this skill

Run this command in your terminal. No account required — it auto-detects your AI tool and installs the skill file.

npx @skills-hub-ai/cli install google-cloud-skills-agent-platform-eval-flywheel

Setup by platform

Claude Code

~/.claude/skills/<skill>/SKILL.md

Run in your project root

npx @skills-hub-ai/cli install google-cloud-skills-agent-platform-eval-flywheel --target claude-code

Instructions

This skill doesn’t include stateful context yet; it contains instructions only.


Frequently asked questions about agent-platform-eval-flywheel

What does the agent-platform-eval-flywheel skill do?

How do I install the agent-platform-eval-flywheel skill?

Run `npx @skills-hub-ai/cli install google-cloud-skills-agent-platform-eval-flywheel` from your terminal. The CLI writes the SKILL.md to the correct location for your AI tool (e.g. ~/.claude/skills/google-cloud-skills-agent-platform-eval-flywheel/ for Claude Code or ~/.cursor/skills/ for Cursor with --target cursor) and adds it to your project's .skills.json lockfile.

Which AI tools does agent-platform-eval-flywheel work with?

agent-platform-eval-flywheel runs in Claude Code. It follows the open Agent Skills standard (SKILL.md), so the same skill works in every supported tool without modification.

Is the agent-platform-eval-flywheel skill free?

Yes. Every skill on skills-hub.ai is free and open-source. There are no premium tiers, paywalls, or usage limits. You only pay for whatever AI assistant you're already using.

How do I use agent-platform-eval-flywheel after installing it?

In Claude Code, type `/google-cloud-skills-agent-platform-eval-flywheel` (or whatever slash command the skill registers) and the AI follows the skill's instructions immediately. You can also reference it by name in natural language; your AI loads the skill into context when relevant.

Can I share the agent-platform-eval-flywheel skill with my team?

Yes. Commit your project's .skills.json lockfile and teammates run `npx @skills-hub-ai/cli install` (no args) to install every skill at the exact version you pinned. Organization-scoped installs work via skills-hub.ai organizations.