spark-python-data-source

Build custom Python data sources for Apache Spark using the PySpark DataSource API — batch and streaming readers/writers for external systems. Use this skill whenever someone wants to connect Spark to an external system (database, API, message queue, custom protocol), build a Spark connector or plugin in Python, implement a DataSourceReader or DataSourceWriter, pull data from or push data to a system via Spark, or work with the PySpark DataSource API in any way. Even if they just say "read from X in Spark" or "write DataFrame to Y" and there's no native connector, this skill applies.

v1.0.0New

#databricks

Unsigned, install at your own risk

Unverified

This skill has no cryptographic signature attached. We can't verify the contents match what the publisher intended.

Install this skill

Run this command in your terminal. No account required — it auto-detects your AI tool and installs the skill file.

npx @skills-hub-ai/cli install databricks-spark-python-data-source

Or download directly:

Browse all CLI commands →

Setup by platform

Claude Code

~/.claude/skills/<skill>/SKILL.md

Setup guide →

Install

One-click setup for your editor

Run in your project root

npx @skills-hub-ai/cli install databricks-spark-python-data-source --target claude-code

Instructions

This skill doesn’t include stateful context yet, instructions only. Learn about stateful skills.

Security

Loading security scan...

Reviews (0)

View full changelog & diffs →

Browse all →

databricks-synthetic-data-genGenerate realistic synthetic data using Spark + Faker (strongly recommended). Supports serverless execution, multiple output formats (Parquet/JSON/CSV/Delta), and scales from thousands to millions of rows. For small datasets (<10K rows), can optionally generate locally and upload to volumes. Use when user mentions 'synthetic data', 'test data', 'generate data', 'demo dataset', 'Faker', or 'sample data'.0 installs databricks-zerobus-ingestBuild Zerobus Ingest clients for near real-time data ingestion into Databricks Delta tables via gRPC. Use when creating producers that write directly to Unity Catalog tables without a message bus, working with the Zerobus Ingest SDK in Python/Java/Go/TypeScript/Rust, generating Protobuf schemas from UC tables, or implementing stream-based ingestion with ACK handling and retry logic.0 installs databricks-ai-runtimeDatabricks AI Runtime (`air`) CLI — the command-line tool for submitting and managing GPU training workloads on Databricks serverless compute. Use for: running `air` workloads, custom Docker image setup, environment configuration, and troubleshooting `air` jobs.0 installs databricks-ai-functionsUse Databricks built-in AI Functions (ai_classify, ai_extract, ai_summarize, ai_mask, ai_translate, ai_fix_grammar, ai_gen, ai_analyze_sentiment, ai_similarity, ai_parse_document, ai_query, ai_forecast) to add AI capabilities directly to SQL and PySpark pipelines without managing model endpoints. Also covers document parsing and building custom RAG pipelines (parse → chunk → index → query).0 installs databricks-dbsqlDatabricks SQL (DBSQL) advanced features and SQL warehouse capabilities. This skill MUST be invoked when the user mentions: "DBSQL", "Databricks SQL", "SQL warehouse", "SQL scripting", "stored procedure", "CALL procedure", "materialized view", "CREATE MATERIALIZED VIEW", "pipe syntax", "|>", "geospatial", "H3", "ST_", "spatial SQL", "collation", "COLLATE", "ai_query", "ai_classify", "ai_extract", "ai_gen", "AI function", "http_request", "remote_query", "read_files", "Lakehouse Federation", "recursive CTE", "WITH RECURSIVE", "multi-statement transaction", "temp table", "temporary view", "pipe operator". SHOULD also invoke when the user asks about SQL best practices, data modeling patterns, or advanced SQL features on Databricks.0 installs databricksDatabricks CLI operations: auth, profiles, data exploration, and bundles. Contains up-to-date guidelines for Databricks-related CLI tasks.0 installs

More from Databricks

View source →

databricks-python-sdkDatabricks development guidance including Python SDK, Databricks Connect, CLI, and REST API. Use when working with databricks-sdk, databricks-connect, or Databricks APIs.1 installs databricks-coreDatabricks CLI operations and the parent/entry-point skill for Databricks CLI use: authentication, profile selection, and bundles. Load this first for CLI, auth, profile, and bundle tasks, then load the matching product skill. For finding or exploring data, answering questions about the data, or generating SQL, load the databricks-data-discovery skill (it routes to Genie One). Contains up-to-date guidelines for Databricks-related CLI tasks.0 installs databricks-dabsCreate, configure, validate, deploy, run, and manage Declarative Automation Bundles (DABs, formerly Databricks Asset Bundles). Use when working with Databricks resources via DABs including dashboards, jobs, pipelines, alerts, volumes, and apps.0 installs databricks-lakebaseDatabricks Lakebase Postgres: projects, scaling, connectivity, Lakebase synced tables, and Data API. Use when asked about Lakebase databases, OLTP storage, or connecting apps to Postgres on Databricks.0 installs databricks-app-designDesign the UX of custom-code Databricks Apps (AppKit/React) data screens — KPI/overview pages, reports, charts, tables, and Genie/chat data assistants — mapped to concrete AppKit components. Use when BUILDING or reviewing the UI of an AppKit/React app that displays data or answers data questions: choosing genre, layout, charts, KPIs, semantic color, required states (loading/empty/error), IBCS notation, and AI-result trust (showing generated SQL/sources for Genie/chat). A plain "create a dashboard" request means a managed AI/BI (Lakeview) dashboard → use databricks-aibi-dashboards, NOT this skill. Also NOT for non-data frontend (forms, settings, auth, marketing) or scaffolding/build/deploy (→ databricks-apps). Complements databricks-apps; use it alongside whenever a custom app has a chart, table, KPI, report, or Genie/chat/AI surface.0 installs databricks-pipelinesDevelop Lakeflow Spark Declarative Pipelines (formerly Delta Live Tables) on Databricks. Use when building batch or streaming data pipelines with Python or SQL. Invoke BEFORE starting implementation.0 installs

More Build skills

Browse category →

ui-design-systemUI design system toolkit for Senior UI Designer including design token generation, component documentation, responsive design calculations, and developer handoff tools. Use for creating design systems, maintaining visual consistency, and facilitating design-dev collaboration.56 installs self-improving-agentCurate Claude Code's auto-memory into durable project knowledge. Analyze MEMORY.md for patterns, promote proven learnings to CLAUDE.md and .claude/rules/, extract recurring solutions into reusable skills. Use when: (1) reviewing what Claude has learned about your project, (2) graduating a pattern from notes to enforced rules, (3) turning a debugging solution into a skill, (4) checking memory health and capacity.30 installs senior-frontendFrontend development skill for React, Next.js, TypeScript, and Tailwind CSS applications. Use when building React components, optimizing Next.js performance, analyzing bundle sizes, scaffolding frontend projects, implementing accessibility, or reviewing frontend code quality.18 installs frontend-designGuidance for distinctive, intentional visual design when building new UI or reshaping an existing one. Helps with aesthetic direction, typography, and making choices that don't read as templated defaults.17 installs using-superpowersUse when starting any conversation - establishes how to find and use skills, requiring skill invocation before ANY response including clarifying questions17 installs senior-backendDesigns and implements backend systems including REST APIs, microservices, database architectures, authentication flows, and security hardening. Use when the user asks to "design REST APIs", "optimize database queries", "implement authentication", "build microservices", "review backend code", "set up GraphQL", "handle database migrations", or "load test APIs". Covers Node.js/Express/Fastify development, PostgreSQL optimization, API security, and backend architecture patterns.12 installs

Frequently asked questions about spark-python-data-source

What does the spark-python-data-source skill do?

How do I install the spark-python-data-source skill?

Run `npx @skills-hub-ai/cli install databricks-spark-python-data-source` from your terminal. The CLI writes the SKILL.md to the correct location for your AI tool (e.g. ~/.claude/skills/databricks-spark-python-data-source/ for Claude Code or ~/.cursor/skills/ for Cursor with --target cursor) and adds it to your project's .skills.json lockfile.

Which AI tools does spark-python-data-source work with?

spark-python-data-source runs in Claude Code. It follows the open Agent Skills standard (SKILL.md), so the same skill works in every supported tool without modification.

Is the spark-python-data-source skill free?

Yes. Every skill on skills-hub.ai is free and open-source. There are no premium tiers, paywalls, or usage limits. You only pay for whatever AI assistant you're already using.

How do I use spark-python-data-source after installing it?

In Claude Code, type `/databricks-spark-python-data-source` (or whatever slash command the skill registers) and the AI follows the skill's instructions immediately. You can also reference it by name in natural language, your AI loads the skill into context when relevant.

Can I share the spark-python-data-source skill with my team?

Yes. Commit your project's .skills.json lockfile and teammates run `npx @skills-hub-ai/cli install` (no args) to install every skill at the exact version you pinned. Organization-scoped installs work via skills-hub.ai organizations.