| title | library_name | pypi_package | category | python_compatibility | last_updated | official_docs | official_repository | maintenance_status |
|---|---|---|---|---|---|---|---|---|
| Datasette: Instant JSON API for Your SQLite Data | datasette | datasette | data_exploration | 3.10+ | 2025-11-02 | https://docs.datasette.io | https://github.com/simonw/datasette | active |
Datasette - Instant Data Publishing and Exploration
Executive Summary
Datasette is an open-source tool for exploring and publishing data. It transforms any SQLite database into an interactive website with a full JSON API, requiring zero code. Designed for data journalists, museum curators, archivists, local governments, scientists, and researchers, Datasette makes data sharing and exploration accessible to anyone with data to publish.
Core Value Proposition: Take data of any shape or size and instantly publish it as an explorable website with a corresponding API, without writing application code.
Official Information
- Repository: https://github.com/simonw/datasette @ simonw/datasette
- PyPI: datasette @ https://pypi.org/project/datasette/
- Current Development Version: 1.0a19 (alpha)
- Current Stable Version: 0.65.1
- Documentation: https://docs.datasette.io/ @ docs.datasette.io
- License: Apache License 2.0 @ https://github.com/simonw/datasette/blob/main/LICENSE
- Maintenance Status: Actively maintained (647 open issues, last updated 2025-10-21)
- Community: Discord @ https://datasette.io/discord, Newsletter @ https://datasette.substack.com/
What Problem Does Datasette Solve?
The Problem
Organizations and individuals have valuable data in SQLite databases, CSV files, or other formats, but:
- Building a web interface to explore data requires significant development effort
- Creating APIs for data access requires backend development expertise
- Publishing data in an accessible, explorable format is time-consuming
- Sharing data insights requires custom visualization tools
- Data exploration often requires SQL knowledge or specialized tools
The Solution
Datasette provides:
- Instant Web Interface: Automatic web UI for any SQLite database
- Automatic API: Full JSON API with no code required (see the sketch after this list)
- SQL Query Interface: Built-in SQL editor with query sharing
- Plugin Ecosystem: 300+ plugins for extending functionality @ https://datasette.io/plugins
- One-Command Publishing: Deploy to cloud platforms with a single command
- Zero-Setup Exploration: Browse, filter, and facet data immediately
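To make the automatic API point concrete: once a database is served with datasette data.db, every table gets JSON endpoints whose paging, filtering, and search are controlled by query-string parameters. A minimal sketch using only the standard library; the events table name and the default localhost:8001 address are assumptions for illustration:
import json
from urllib.request import urlopen

BASE = "http://localhost:8001"  # default address after: datasette data.db

# Whole table as JSON; _shape=array returns a plain list of row objects
rows = json.load(urlopen(BASE + "/data/events.json?_shape=array&_size=10"))

# Column filters are plain query-string parameters (column__exact, column__gt, ...)
logins = json.load(urlopen(BASE + "/data/events.json?_shape=array&event_type__exact=login"))

# Full-text search works once FTS is enabled on the table (e.g. via sqlite-utils enable-fts)
hits = json.load(urlopen(BASE + "/data/events.json?_shape=array&_search=error"))

print(len(rows), len(logins), len(hits))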
What Would Be Reinventing the Wheel
Without Datasette, you would need to build:
- Custom web application for data browsing
- RESTful API endpoints for data access
- SQL query interface with security controls
- Data export functionality (JSON, CSV)
- Full-text search integration
- Authentication and authorization system
- Pagination and filtering logic
- Deployment configuration and hosting setup
Example: Publishing a dataset of 100,000 records would require weeks of development work. With Datasette it is a single command:
datasette publish cloudrun mydata.db --service=mydata
Real-World Usage Patterns
Pattern 1: Publishing Open Data (Government/Research)
Context: @ https://github.com/simonw/covid-19-datasette
# Convert CSV to SQLite
csvs-to-sqlite covid-data.csv covid.db
# Publish to Cloud Run with metadata
datasette publish cloudrun covid.db \
--service=covid-tracker \
--metadata metadata.json \
--install=datasette-vega
Use Case: Local governments publishing COVID-19 statistics, election results, or public records.
Pattern 2: Personal Data Archives (Dogsheep Pattern)
Context: @ https://github.com/dogsheep
# Export Twitter data to SQLite
twitter-to-sqlite user-timeline twitter.db
# Export GitHub activity
github-to-sqlite repos github.db
# Export Apple Health data
healthkit-to-sqlite export.zip health.db
# Explore everything together
datasette twitter.db github.db health.db --crossdb
Use Case: Personal data liberation - exploring your own data from various platforms.
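The --crossdb flag attaches every database to the special _memory database, so a single SQL query can join across them. A hedged sketch of querying that endpoint over HTTP; the table names below are illustrative rather than the exact schemas the Dogsheep tools create:
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumes: datasette twitter.db github.db health.db --crossdb  (listening on localhost:8001)
sql = """
SELECT 'tweet' AS kind, created_at FROM [twitter].tweets
UNION ALL
SELECT 'repo' AS kind, created_at FROM [github].repos
ORDER BY created_at DESC
LIMIT 20
"""
url = "http://localhost:8001/_memory.json?" + urlencode({"sql": sql, "_shape": "array"})
for row in json.load(urlopen(url)):
    print(row["kind"], row["created_at"])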
Pattern 3: Data Journalism and Investigation
Context: @ https://github.com/simonw/laion-aesthetic-datasette
# Python: load and index LAION metadata with sqlite-utils
import sqlite_utils

db = sqlite_utils.Database("images.db")
db["images"].insert_all(image_data)  # image_data: an iterable of dicts
db["images"].enable_fts(["caption", "url"])

# Shell: launch with custom templates
datasette images.db \
  --template-dir templates/ \
  --metadata metadata.json
Use Case: Exploring large datasets like Stable Diffusion training data, analyzing patterns.
Pattern 4: Internal Tools and Dashboards
Context: @ https://github.com/rclement/datasette-dashboards
# datasette.yaml - Configure dashboards
databases:
  analytics:
    queries:
      daily_users:
        sql: |
          SELECT date, count(*) as users
          FROM events
          WHERE event_type = 'login'
          GROUP BY date
          ORDER BY date DESC
        title: Daily Active Users
Installation:
datasette install datasette-dashboards
datasette analytics.db --config datasette.yaml
Use Case: Building internal analytics dashboards without BI tools.
Pattern 5: API Backend for Applications
Context: @ https://github.com/simonw/datasette-graphql
# Install GraphQL plugin
datasette install datasette-graphql
# Launch with authentication
datasette data.db \
--root \
--cors \
--setting default_cache_ttl 3600
GraphQL Query:
{
products(first: 10, where: { price_gt: 100 }) {
nodes {
id
name
price
}
}
}
Use Case: Using Datasette as a read-only API backend for mobile/web apps.
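A client application consumes the plugin's /graphql endpoint like any other HTTP API. A minimal sketch with the standard library, assuming datasette-graphql is installed, the server above is listening on localhost:8001, and a products table exists (both are illustrative assumptions):
import json
from urllib.request import Request, urlopen

query = """
{
  products(first: 10) {
    nodes { id name price }
  }
}
"""
req = Request(
    "http://localhost:8001/graphql",
    data=json.dumps({"query": query}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
data = json.load(urlopen(req))
print(data["data"]["products"]["nodes"])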
Integration Patterns
Core Data Integrations
- SQLite Native:
import sqlite3
conn = sqlite3.connect('data.db')
# Datasette reads directly
- CSV/JSON Import via sqlite-utils @ https://github.com/simonw/sqlite-utils:
sqlite-utils insert data.db records records.json
csvs-to-sqlite *.csv data.db
- Database Migration via db-to-sqlite @ https://github.com/simonw/db-to-sqlite:
# Export from PostgreSQL
db-to-sqlite "postgresql://user:pass@host/db" data.db --table=events
# Export from MySQL
db-to-sqlite "mysql://user:pass@host/db" data.db --all
Companion Libraries
- sqlite-utils: Database manipulation @ https://github.com/simonw/sqlite-utils
- csvs-to-sqlite: CSV import @ https://github.com/simonw/csvs-to-sqlite
- datasette-extract: AI-powered data extraction @ https://github.com/datasette/datasette-extract
- datasette-parquet: Parquet/DuckDB support @ https://github.com/cldellow/datasette-parquet
Deployment Patterns
Cloud Run @ https://docs.datasette.io/en/stable/publish.html:
datasette publish cloudrun data.db \
--service=myapp \
--install=datasette-vega \
--install=datasette-cluster-map \
--metadata metadata.json
Vercel via datasette-publish-vercel @ https://github.com/simonw/datasette-publish-vercel:
pip install datasette-publish-vercel
datasette publish vercel data.db --project my-data
Fly.io via datasette-publish-fly @ https://github.com/simonw/datasette-publish-fly:
pip install datasette-publish-fly
datasette publish fly data.db --app=my-datasette
Docker:
FROM datasetteproject/datasette
COPY *.db /data/
RUN datasette install datasette-vega
CMD datasette serve /data/*.db --host 0.0.0.0 --cors
Python Version Compatibility
Official Support Matrix
| Python Version | Status | Notes |
|---|---|---|
| 3.10 | Minimum Required | Declared via python_requires=">=3.10" in setup.py |
| 3.11 | ✅ Fully Supported | Recommended for production |
| 3.12 | ✅ Fully Supported | Tested in CI |
| 3.13 | ✅ Fully Supported | Tested in CI |
| 3.14 | ✅ Fully Supported | Tested in CI |
| 3.9 and below | ❌ Not Supported | Deprecated as of v1.0 |
Version-Specific Considerations
Python 3.10+:
- Uses importlib.metadata for plugin loading
- Native match/case statements in codebase (likely in v1.0+)
- Type hints using modern syntax
Python 3.11+ Benefits:
- Better async performance (important for ASGI)
- Faster startup times
- Improved error messages
No Breaking Changes Expected: Datasette maintains backward compatibility within major versions.
Usage Examples
Basic Usage
# Install
pip install datasette
# or
brew install datasette
# Serve a database
datasette data.db
# Open in browser automatically
datasette data.db -o
# Serve multiple databases
datasette db1.db db2.db db3.db
# Enable cross-database queries
datasette db1.db db2.db --crossdb
Configuration Example
metadata.json @ https://docs.datasette.io/en/stable/metadata.html:
{
"title": "My Data Project",
"description": "Exploring public datasets",
"license": "CC BY 4.0",
"license_url": "https://creativecommons.org/licenses/by/4.0/",
"source": "Data Sources",
"source_url": "https://example.com/sources",
"databases": {
"mydb": {
"tables": {
"events": {
"title": "Event Log",
"description": "System event records",
"hidden": false
}
}
}
}
}
datasette.yaml @ https://docs.datasette.io/en/stable/configuration.html:
settings:
  default_page_size: 50
  sql_time_limit_ms: 3500
  max_returned_rows: 2000
plugins:
  datasette-cluster-map:
    latitude_column: lat
    longitude_column: lng
databases:
  mydb:
    queries:
      popular_events:
        sql: |
          SELECT event_type, COUNT(*) as count
          FROM events
          GROUP BY event_type
          ORDER BY count DESC
          LIMIT 10
        title: Most Popular Events
Plugin Development Example
Simple Plugin @ https://docs.datasette.io/en/stable/writing_plugins.html:
from datasette import hookimpl

@hookimpl
def prepare_connection(conn):
    """Add custom SQL functions"""
    conn.create_function("is_even", 1, lambda x: x % 2 == 0)

@hookimpl
def extra_template_vars(request):
    """Add variables to templates"""
    return {
        "custom_message": "Hello from plugin!"
    }
setup.py:
from setuptools import setup

setup(
    name="datasette-my-plugin",
    version="0.1",
    py_modules=["datasette_my_plugin"],
    entry_points={
        "datasette": [
            "my_plugin = datasette_my_plugin"
        ]
    },
    install_requires=["datasette>=0.60"],
)
Advanced: Python API Usage
Programmatic Access @ https://docs.datasette.io/en/stable/internals.html:
from datasette.app import Datasette
import asyncio

async def explore_data():
    # Initialize Datasette
    ds = Datasette(files=["data.db"])
    # Execute query
    result = await ds.execute(
        "data",
        "SELECT * FROM users WHERE age > :age",
        {"age": 18},
    )
    # Access rows
    for row in result.rows:
        print(dict(row))
    # Get table info
    db = ds.get_database("data")
    tables = await db.table_names()
    print(f"Tables: {tables}")

asyncio.run(explore_data())
Testing Plugins
pytest Example @ https://docs.datasette.io/en/stable/testing_plugins.html:
import pytest
from datasette.app import Datasette
from datasette.database import Database

@pytest.mark.asyncio
async def test_homepage():
    ds = Datasette(memory=True)
    await ds.invoke_startup()
    response = await ds.client.get("/")
    assert response.status_code == 200
    assert "<!DOCTYPE html>" in response.text

@pytest.mark.asyncio
async def test_json_api():
    ds = Datasette(memory=True)
    # Create test data in a named in-memory database
    db = ds.add_database(Database(ds, memory_name="test"))
    await db.execute_write(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)"
    )
    # Query via the automatic JSON API
    response = await ds.client.get("/test/items.json")
    assert response.status_code == 200
    data = response.json()
    assert data["rows"] == []
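To exercise a plugin hook such as the prepare_connection example above without packaging and installing it, the testing docs describe registering a plugin class with the plugin manager for the duration of a test. A minimal sketch along those lines:
import pytest
from datasette import hookimpl
from datasette.app import Datasette
from datasette.plugins import pm

@pytest.mark.asyncio
async def test_is_even_function():
    class IsEvenPlugin:
        __name__ = "IsEvenPlugin"

        @hookimpl
        def prepare_connection(self, conn):
            conn.create_function("is_even", 1, lambda x: x % 2 == 0)

    pm.register(IsEvenPlugin(), name="undo")
    try:
        ds = Datasette(memory=True)
        result = await ds.execute("_memory", "SELECT is_even(42) AS answer")
        assert result.rows[0]["answer"] == 1
    finally:
        pm.unregister(name="undo")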
When NOT to Use Datasette
❌ Scenarios Where Datasette Is Inappropriate
- High-Write Applications
  - Datasette is optimized for read-heavy workloads
  - SQLite has write limitations with concurrent access
  - Better Alternative: PostgreSQL with PostgREST, or Django REST Framework
- Real-Time Collaborative Editing
  - No built-in support for concurrent data editing
  - Read-only by default (writes require plugins)
  - Better Alternative: Airtable, Retool, or a custom CRUD application
- Large-Scale Data Warehousing
  - SQLite works well up to ~100GB, struggles beyond
  - Not designed for massive analytical workloads
  - Better Alternative: DuckDB with MotherDuck, or BigQuery with Looker
- Complex BI Dashboards
  - Limited visualization capabilities without plugins
  - Not a replacement for full BI platforms
  - Better Alternative: Apache Superset @ https://github.com/apache/superset, Metabase @ https://github.com/metabase/metabase, or Grafana
- Transactional Systems
  - Not designed for OLTP workloads
  - Limited transaction support
  - Better Alternative: Django ORM with PostgreSQL, or FastAPI with SQLAlchemy
- User Authentication and Authorization
  - Basic auth support, but not a full auth system
  - RBAC requires plugins and configuration
  - Better Alternative: run Datasette behind a proxy that handles auth, or use Metabase for built-in user management
- Non-Relational Data
  - Optimized for relational SQLite data
  - Document stores require workarounds
  - Better Alternative: MongoDB with Mongo Express, or Elasticsearch with Kibana
⚠️ Use With Caution
- Sensitive Data Without Proper Access Controls
  - Default is public access
  - Requires careful permission configuration
  - Mitigation: Use --root for admin access and configure permissions @ https://docs.datasette.io/en/stable/authentication.html (see the sketch after this list)
- Production Without Rate Limiting
  - No built-in rate limiting
  - Can be overwhelmed by traffic
  - Mitigation: Deploy behind a reverse proxy with rate limiting, or use Cloud Run with concurrency limits
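As a sketch of what locking an instance down can look like, the top-level allow block from the authentication docs can be placed in metadata.json or passed programmatically; the root actor id below comes from the --root login flow and would change with other auth plugins:
from datasette.app import Datasette

# Only the actor with id "root" may view this instance; everyone else gets 403.
# The same {"allow": {"id": "root"}} block can sit at the top level of metadata.json,
# or inside a specific database/table block to protect just that data.
ds = Datasette(
    files=["private.db"],
    metadata={"allow": {"id": "root"}},
)
# Equivalent CLI form: datasette private.db --metadata metadata.json --root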
Decision Matrix
✅ Use Datasette When
| Scenario | Why Datasette Excels |
|---|---|
| Publishing static/semi-static datasets | Zero-code instant publication |
| Data journalism and investigation | SQL interface + full-text search + shareable queries |
| Personal data exploration (Dogsheep) | Cross-database queries, plugin ecosystem |
| Internal read-only dashboards | Fast setup, minimal infrastructure |
| Prototyping data APIs | Instant JSON API, no backend code |
| Open data portals | Built-in metadata, documentation, CSV export |
| SQLite file exploration | Best-in-class SQLite web interface |
| Low-traffic reference data | Excellent for datasets < 100GB |
❌ Don't Use Datasette When
| Scenario | Why It's Not Suitable | Better Alternative |
|---|---|---|
| Building a CRUD application | Read-focused, limited write support | Django, FastAPI + SQLAlchemy |
| Real-time analytics | Not designed for streaming data | InfluxDB, TimescaleDB |
| Multi-tenant SaaS app | Limited isolation, no row-level security | PostgreSQL + RLS |
| Heavy concurrent writes | SQLite write limitations | PostgreSQL, MySQL |
| Terabyte-scale data | SQLite size constraints | DuckDB, BigQuery, Snowflake |
| Enterprise BI with governance | Limited data modeling layer | Looker, dbt + Metabase |
| Complex visualization needs | Basic charts without plugins | Apache Superset, Tableau |
| Document/graph data | Relational focus | MongoDB, Neo4j |
Comparison with Alternatives
vs. Apache Superset @ https://github.com/apache/superset
When to use Superset over Datasette:
- Need advanced visualizations (50+ chart types vs. basic plugins)
- Enterprise BI with complex dashboards
- Multiple data source types (not just SQLite)
- Large team collaboration with RBAC
When to use Datasette over Superset:
- Simpler deployment and setup
- Focus on data exploration over dashboarding
- Primarily working with SQLite databases
- Want instant API alongside web interface
vs. Metabase @ https://github.com/metabase/metabase
When to use Metabase over Datasette:
- Need business user-friendly query builder
- Want built-in email reports and scheduling
- Require user management and permissions UI
- Need mobile app support
When to use Datasette over Metabase:
- Working primarily with SQLite
- Want plugin extensibility
- Need instant deployment (lighter weight)
- Want API-first design
vs. Custom Flask/FastAPI Application
When to build custom over Datasette:
- Complex business logic required
- Heavy write operations
- Custom authentication flows
- Specific UX requirements
When to use Datasette over custom:
- Rapid prototyping (hours vs. weeks)
- Standard data exploration needs
- Focus on data, not application development
- Leverage plugin ecosystem
Key Insights and Recommendations
Core Strengths
- Speed to Value: From data to published website in minutes
- Plugin Ecosystem: 300+ plugins for extending functionality @ https://datasette.io/plugins
- API-First Design: JSON API is a first-class citizen
- Deployment Simplicity: One command to cloud platforms
- Open Source Community: Active development, responsive maintainer
Best Practices
- Use sqlite-utils for data prep @ https://github.com/simonw/sqlite-utils:
  sqlite-utils insert data.db table data.json --pk=id
  sqlite-utils enable-fts data.db table column1 column2
- Configure permissions properly:
  databases:
    private:
      allow:
        id: admin_user
- Use immutable mode for static data:
  datasette data.db --immutable
- Leverage canned queries for common patterns:
  queries:
    search:
      sql: SELECT * FROM items WHERE name LIKE :query
- Install datasette-hashed-urls for caching @ https://github.com/simonw/datasette-hashed-urls:
  datasette install datasette-hashed-urls
Migration Path
From spreadsheets to Datasette:
csvs-to-sqlite data.csv data.db
datasette data.db
From PostgreSQL to Datasette:
db-to-sqlite "postgresql://user:pass@host/db" data.db
datasette data.db
From Datasette to production app:
- Use Datasette for prototyping and exploration
- Migrate to FastAPI/Django when write operations become critical
- Keep Datasette as the read-only reporting interface (see the sketch below)
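One hedged way to do that last step is to serve Datasette's ASGI application alongside the main app, so the reporting surface stays zero-maintenance; this sketch assumes the documented Datasette.app() entry point, uvicorn installed, and an illustrative reports.db file:
import uvicorn
from datasette.app import Datasette

# Read-only reporting instance over the SQLite file the production app maintains.
ds = Datasette(files=["reports.db"])

if __name__ == "__main__":
    # Roughly equivalent to: datasette reports.db --port 8002
    uvicorn.run(ds.app(), host="127.0.0.1", port=8002)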
Summary
Datasette excels at making data instantly explorable and shareable. It's the fastest path from data to published website with API. Use it for read-heavy workflows, data journalism, personal data archives, and rapid prototyping. Avoid it for write-heavy applications, enterprise BI, or large-scale data warehousing.
TL;DR: If you have data and want to publish it or explore it quickly without writing application code, use Datasette. If you need complex transactions, real-time collaboration, or enterprise BI features, choose a different tool.
References
- Official Documentation @ https://docs.datasette.io/
- GitHub Repository @ https://github.com/simonw/datasette
- Plugin Directory @ https://datasette.io/plugins
- Context7 Documentation @ /simonw/datasette (949 code snippets)
- Dogsheep Project @ https://github.com/dogsheep (Personal data toolkit)
- Datasette Lite (WebAssembly) @ https://lite.datasette.io/
- Community Discord @ https://datasette.io/discord
- Newsletter @ https://datasette.substack.com/