---
title: "Datasette: Instant JSON API for Your SQLite Data"
library_name: datasette
pypi_package: datasette
category: data_exploration
python_compatibility: "3.10+"
last_updated: "2025-11-02"
official_docs: "https://docs.datasette.io"
official_repository: "https://github.com/simonw/datasette"
maintenance_status: "active"
---

# Datasette - Instant Data Publishing and Exploration

## Executive Summary

Datasette is an open-source tool for exploring and publishing data. It transforms any SQLite database into an interactive website with a full JSON API, requiring zero code. Designed for data journalists, museum curators, archivists, local governments, scientists, and researchers, Datasette makes data sharing and exploration accessible to anyone with data to publish.

**Core Value Proposition**: Take data of any shape or size and instantly publish it as an explorable website with a corresponding API, without writing application code.

## Official Information

- **Repository**: [simonw/datasette](https://github.com/simonw/datasette)
- **PyPI**: `datasette`
- **Current Development Version**: 1.0a19 (alpha)
- **Current Stable Version**: 0.65.1
- **Documentation**: [docs.datasette.io](https://docs.datasette.io)
- **License**: Apache License 2.0
- **Maintenance Status**: Actively maintained (647 open issues, last updated 2025-10-21)
- **Community**: Discord, Newsletter

## What Problem Does Datasette Solve?

### The Problem

Organizations and individuals have valuable data in SQLite databases, CSV files, or other formats, but:

- Building a web interface to explore data requires significant development effort
- Creating APIs for data access requires backend development expertise
- Publishing data in an accessible, explorable format is time-consuming
- Sharing data insights requires custom visualization tools
- Data exploration often requires SQL knowledge or specialized tools

### The Solution

Datasette provides:

1. **Instant Web Interface**: Automatic web UI for any SQLite database
2. **Automatic API**: Full JSON API with no code required
3. **SQL Query Interface**: Built-in SQL editor with query sharing
4. **Plugin Ecosystem**: 300+ plugins for extending functionality
5. **One-Command Publishing**: Deploy to cloud platforms with a single command
6. **Zero-Setup Exploration**: Browse, filter, and facet data immediately

### What Would Be Reinventing the Wheel

Without Datasette, you would need to build:

- Custom web application for data browsing
- RESTful API endpoints for data access
- SQL query interface with security controls
- Data export functionality (JSON, CSV)
- Full-text search integration
- Authentication and authorization system
- Pagination and filtering logic
- Deployment configuration and hosting setup

**Example**: Publishing a dataset of 100,000 records would require weeks of development work. With Datasette: `datasette publish cloudrun mydata.db --service=mydata`

## Real-World Usage Patterns

### Pattern 1: Publishing Open Data (Government/Research)

```bash
# Convert CSV to SQLite
csvs-to-sqlite covid-data.csv covid.db

# Publish to Cloud Run with metadata
datasette publish cloudrun covid.db \
  --service=covid-tracker \
  --metadata metadata.json \
  --install=datasette-vega
```

**Use Case**: Local governments publishing COVID-19 statistics, election results, or public records.
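The `--metadata metadata.json` flag above points at a file describing the published data. A minimal sketch for this example, with every value a placeholder (the full set of supported keys appears in the Configuration Example later in this document):

```bash
# Write a minimal metadata file (all values are placeholders)
cat > metadata.json <<'EOF'
{
  "title": "COVID-19 Tracker",
  "description": "Daily case counts published by a local government",
  "license": "CC BY 4.0",
  "license_url": "https://creativecommons.org/licenses/by/4.0/",
  "source": "County health department",
  "source_url": "https://example.com/covid-data"
}
EOF
```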
### Pattern 2: Personal Data Archives (Dogsheep Pattern)

```bash
# Export Twitter data to SQLite
twitter-to-sqlite user-timeline twitter.db

# Export GitHub activity
github-to-sqlite repos github.db

# Export Apple Health data
healthkit-to-sqlite export.zip health.db

# Explore everything together
datasette twitter.db github.db health.db --crossdb
```

**Use Case**: Personal data liberation - exploring your own data from various platforms.

### Pattern 3: Data Journalism and Investigation

```python
# Load and explore LAION training data
import sqlite_utils

db = sqlite_utils.Database("images.db")
# image_data: a list of dicts loaded elsewhere (e.g. from a JSON export)
db["images"].insert_all(image_data)
db["images"].enable_fts(["caption", "url"])
```

```bash
# Launch with custom templates
datasette images.db \
  --template-dir templates/ \
  --metadata metadata.json
```

**Use Case**: Exploring large datasets like Stable Diffusion training data, analyzing patterns.

### Pattern 4: Internal Tools and Dashboards

```yaml
# datasette.yaml - Configure dashboards
databases:
  analytics:
    queries:
      daily_users:
        sql: |
          SELECT date, count(*) as users
          FROM events
          WHERE event_type = 'login'
          GROUP BY date
          ORDER BY date DESC
        title: Daily Active Users
```

**Installation and launch**:

```bash
datasette install datasette-dashboards
datasette analytics.db --config datasette.yaml
```

**Use Case**: Building internal analytics dashboards without BI tools.

### Pattern 5: API Backend for Applications

```bash
# Install GraphQL plugin
datasette install datasette-graphql

# Launch with authentication
datasette data.db \
  --root \
  --cors \
  --setting default_cache_ttl 3600
```

**GraphQL Query**:

```graphql
{
  products(first: 10, where: { price_gt: 100 }) {
    nodes {
      id
      name
      price
    }
  }
}
```

**Use Case**: Using Datasette as a read-only API backend for mobile/web apps.
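Clients that do not need GraphQL can consume the built-in JSON API directly. A minimal sketch, assuming the instance above is running locally on the default port 8001 and that `data.db` contains a hypothetical `products` table:

```bash
# Table rows as JSON; _shape=objects returns each row as an object keyed by column name
curl "http://127.0.0.1:8001/data/products.json?_shape=objects&_size=10"

# Column filters map to querystring operators, e.g. price greater than 100
curl "http://127.0.0.1:8001/data/products.json?_shape=objects&price__gt=100"
```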
## Integration Patterns

### Core Data Integrations

1. **SQLite Native**:

   ```python
   import sqlite3

   conn = sqlite3.connect('data.db')  # Datasette reads the same file directly
   ```

2. **CSV/JSON Import** via `sqlite-utils`:

   ```bash
   sqlite-utils insert data.db records records.json
   csvs-to-sqlite *.csv data.db
   ```

3. **Database Migration** via `db-to-sqlite`:

   ```bash
   # Export from PostgreSQL
   db-to-sqlite "postgresql://user:pass@host/db" data.db --table=events

   # Export from MySQL
   db-to-sqlite "mysql://user:pass@host/db" data.db --all
   ```

### Companion Libraries

- **sqlite-utils**: Database manipulation
- **csvs-to-sqlite**: CSV import
- **datasette-extract**: AI-powered data extraction
- **datasette-parquet**: Parquet/DuckDB support

### Deployment Patterns

**Cloud Run**:

```bash
datasette publish cloudrun data.db \
  --service=myapp \
  --install=datasette-vega \
  --install=datasette-cluster-map \
  --metadata metadata.json
```

**Vercel** via `datasette-publish-vercel`:

```bash
pip install datasette-publish-vercel
datasette publish vercel data.db --project my-data
```

**Fly.io** via `datasette-publish-fly`:

```bash
pip install datasette-publish-fly
datasette publish fly data.db --app=my-datasette
```

**Docker**:

```dockerfile
FROM datasetteproject/datasette
COPY *.db /data/
RUN datasette install datasette-vega
CMD datasette serve /data/*.db --host 0.0.0.0 --cors
```

## Python Version Compatibility

### Official Support Matrix

| Python Version | Status               | Notes                                    |
| -------------- | -------------------- | ---------------------------------------- |
| 3.10           | **Minimum Required** | `python_requires=">=3.10"` in `setup.py` |
| 3.11           | ✅ Fully Supported   | Recommended for production               |
| 3.12           | ✅ Fully Supported   | Tested in CI                             |
| 3.13           | ✅ Fully Supported   | Tested in CI                             |
| 3.14           | ✅ Fully Supported   | Tested in CI                             |
| 3.9 and below  | ❌ Not Supported     | Deprecated as of v1.0                    |

### Version-Specific Considerations

**Python 3.10+**:

- Uses `importlib.metadata` for plugin loading
- Native `match/case` statements in codebase (likely in v1.0+)
- Type hints using modern syntax

**Python 3.11+ Benefits**:

- Better async performance (important for ASGI)
- Faster startup times
- Improved error messages

**No Breaking Changes Expected**: Datasette maintains backward compatibility within major versions.

## Usage Examples

### Basic Usage

```bash
# Install
pip install datasette
# or
brew install datasette

# Serve a database
datasette data.db

# Open in browser automatically
datasette data.db -o

# Serve multiple databases
datasette db1.db db2.db db3.db

# Enable cross-database queries
datasette db1.db db2.db --crossdb
```

### Configuration Example

**metadata.json**:

```json
{
  "title": "My Data Project",
  "description": "Exploring public datasets",
  "license": "CC BY 4.0",
  "license_url": "https://creativecommons.org/licenses/by/4.0/",
  "source": "Data Sources",
  "source_url": "https://example.com/sources",
  "databases": {
    "mydb": {
      "tables": {
        "events": {
          "title": "Event Log",
          "description": "System event records",
          "hidden": false
        }
      }
    }
  }
}
```

**datasette.yaml**:

```yaml
settings:
  default_page_size: 50
  sql_time_limit_ms: 3500
  max_returned_rows: 2000
plugins:
  datasette-cluster-map:
    latitude_column: lat
    longitude_column: lng
databases:
  mydb:
    queries:
      popular_events:
        sql: |
          SELECT event_type, COUNT(*) as count
          FROM events
          GROUP BY event_type
          ORDER BY count DESC
          LIMIT 10
        title: Most Popular Events
```

### Plugin Development Example

**Simple Plugin**:

```python
from datasette import hookimpl


@hookimpl
def prepare_connection(conn):
    """Add custom SQL functions"""
    conn.create_function("is_even", 1, lambda x: x % 2 == 0)


@hookimpl
def extra_template_vars(request):
    """Add variables to templates"""
    return {
        "custom_message": "Hello from plugin!"
    }
```
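Once the plugin is installed, the `is_even()` function registered in `prepare_connection()` is available to any SQL query, both in the web SQL editor and over the JSON API. A quick check against a local instance (the host and the `data` database name are placeholders):

```bash
# Run a one-off SQL query through the JSON API to exercise the custom function
curl "http://127.0.0.1:8001/data.json?sql=select+is_even(2)+as+result"
```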
**setup.py**:

```python
from setuptools import setup

setup(
    name="datasette-my-plugin",
    version="0.1",
    py_modules=["datasette_my_plugin"],
    entry_points={
        "datasette": [
            "my_plugin = datasette_my_plugin"
        ]
    },
    install_requires=["datasette>=0.60"],
)
```

### Advanced: Python API Usage

**Programmatic Access**:

```python
import asyncio

from datasette.app import Datasette


async def explore_data():
    # Initialize Datasette
    ds = Datasette(files=["data.db"])

    # Execute a query against the "data" database
    result = await ds.execute(
        "data",
        "SELECT * FROM users WHERE age > :age",
        {"age": 18}
    )

    # Access rows
    for row in result.rows:
        print(dict(row))

    # Get table info
    db = ds.get_database("data")
    tables = await db.table_names()
    print(f"Tables: {tables}")


asyncio.run(explore_data())
```

### Testing Plugins

**pytest Example**:

```python
import pytest

from datasette.app import Datasette
from datasette.database import Database


@pytest.mark.asyncio
async def test_homepage():
    ds = Datasette(memory=True)
    await ds.invoke_startup()
    response = await ds.client.get("/")
    assert response.status_code == 200
    assert "Datasette" in response.text


@pytest.mark.asyncio
async def test_json_api():
    ds = Datasette(memory=True)
    # Create test data
    db = ds.add_database(Database(ds, memory_name="test"))
    await db.execute_write(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)"
    )
    # Query via API
    response = await ds.client.get("/test/items.json")
    assert response.status_code == 200
    data = response.json()
    assert data["rows"] == []
```

## When NOT to Use Datasette

### ❌ Scenarios Where Datasette Is Inappropriate

1. **High-Write Applications**
   - Datasette is optimized for read-heavy workloads
   - SQLite has write limitations with concurrent access
   - **Better Alternative**: PostgreSQL with PostgREST, or Django REST Framework

2. **Real-Time Collaborative Editing**
   - No built-in support for concurrent data editing
   - Read-only by default (writes require plugins)
   - **Better Alternative**: Airtable, Retool, or a custom CRUD application

3. **Large-Scale Data Warehousing**
   - SQLite works well up to ~100GB, struggles beyond
   - Not designed for massive analytical workloads
   - **Better Alternative**: DuckDB with MotherDuck, or BigQuery with Looker

4. **Complex BI Dashboards**
   - Limited visualization capabilities without plugins
   - Not a replacement for full BI platforms
   - **Better Alternative**: Apache Superset, Metabase, or Grafana

5. **Transactional Systems**
   - Not designed for OLTP workloads
   - Limited transaction support
   - **Better Alternative**: Django ORM with PostgreSQL, or FastAPI with SQLAlchemy

6. **User Authentication and Authorization**
   - Basic auth support, but not a full auth system
   - RBAC requires plugins and configuration
   - **Better Alternative**: Run Datasette behind a proxy that handles auth, or use Metabase for built-in user management

7. **Non-Relational Data**
   - Optimized for relational SQLite data
   - Document stores require workarounds
   - **Better Alternative**: MongoDB with Mongo Express, or Elasticsearch with Kibana

### ⚠️ Use With Caution

1. **Sensitive Data Without Proper Access Controls**
   - Default is public access
   - Requires careful permission configuration
   - **Mitigation**: Use `--root` for admin access and configure permissions
2. **Production Without Rate Limiting**
   - No built-in rate limiting
   - Can be overwhelmed by traffic
   - **Mitigation**: Deploy behind a reverse proxy with rate limiting, or use Cloud Run with concurrency limits

## Decision Matrix

### ✅ Use Datasette When

| Scenario                               | Why Datasette Excels                                  |
| -------------------------------------- | ----------------------------------------------------- |
| Publishing static/semi-static datasets | Zero-code instant publication                         |
| Data journalism and investigation      | SQL interface + full-text search + shareable queries  |
| Personal data exploration (Dogsheep)   | Cross-database queries, plugin ecosystem              |
| Internal read-only dashboards          | Fast setup, minimal infrastructure                    |
| Prototyping data APIs                  | Instant JSON API, no backend code                     |
| Open data portals                      | Built-in metadata, documentation, CSV export          |
| SQLite file exploration                | Best-in-class SQLite web interface                    |
| Low-traffic reference data             | Excellent for datasets < 100GB                        |

### ❌ Don't Use Datasette When

| Scenario                      | Why It's Not Suitable                    | Better Alternative           |
| ----------------------------- | ---------------------------------------- | ---------------------------- |
| Building a CRUD application   | Read-focused, limited write support      | Django, FastAPI + SQLAlchemy |
| Real-time analytics           | Not designed for streaming data          | InfluxDB, TimescaleDB        |
| Multi-tenant SaaS app         | Limited isolation, no row-level security | PostgreSQL + RLS             |
| Heavy concurrent writes       | SQLite write limitations                 | PostgreSQL, MySQL            |
| Terabyte-scale data           | SQLite size constraints                  | DuckDB, BigQuery, Snowflake  |
| Enterprise BI with governance | Limited data modeling layer              | Looker, dbt + Metabase       |
| Complex visualization needs   | Basic charts without plugins             | Apache Superset, Tableau     |
| Document/graph data           | Relational focus                         | MongoDB, Neo4j               |

## Comparison with Alternatives

### vs. Apache Superset

**When to use Superset over Datasette**:

- Need advanced visualizations (50+ chart types vs. basic plugins)
- Enterprise BI with complex dashboards
- Multiple data source types (not just SQLite)
- Large team collaboration with RBAC

**When to use Datasette over Superset**:

- Simpler deployment and setup
- Focus on data exploration over dashboarding
- Primarily working with SQLite databases
- Want an instant API alongside the web interface

### vs. Metabase

**When to use Metabase over Datasette**:

- Need a business-user-friendly query builder
- Want built-in email reports and scheduling
- Require a user management and permissions UI
- Need mobile app support

**When to use Datasette over Metabase**:

- Working primarily with SQLite
- Want plugin extensibility
- Need instant deployment (lighter weight)
- Want an API-first design

### vs. Custom Flask/FastAPI Application

**When to build custom over Datasette**:

- Complex business logic required
- Heavy write operations
- Custom authentication flows
- Specific UX requirements

**When to use Datasette over custom**:

- Rapid prototyping (hours vs. weeks)
- Standard data exploration needs
- Focus on data, not application development
- Leverage the plugin ecosystem

## Key Insights and Recommendations

### Core Strengths

1. **Speed to Value**: From data to published website in minutes
2. **Plugin Ecosystem**: 300+ plugins for extending functionality
3. **API-First Design**: The JSON API is a first-class citizen (see the example after this list)
4. **Deployment Simplicity**: One command to cloud platforms
5. **Open Source Community**: Active development, responsive maintainer
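To make the API-first point concrete: every table and query page has JSON and CSV counterparts, and arbitrary read-only SQL can be run over HTTP. The host, database name (`data`), and table name (`events`) below are placeholders:

```bash
# Table rows as JSON objects
curl "https://example.com/data/events.json?_shape=objects"

# CSV export of the same table
curl "https://example.com/data/events.csv?_size=max"

# Arbitrary read-only SQL against the database
curl "https://example.com/data.json?sql=select+count(*)+from+events"
```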
### Best Practices

1. **Use sqlite-utils for data prep**:

   ```bash
   sqlite-utils insert data.db table data.json --pk=id
   sqlite-utils enable-fts data.db table column1 column2
   ```

2. **Configure permissions properly**:

   ```yaml
   databases:
     private:
       allow:
         id: admin_user
   ```

3. **Use immutable mode for static data**:

   ```bash
   datasette --immutable data.db
   ```

4. **Leverage canned queries for common patterns**:

   ```yaml
   queries:
     search:
       sql: SELECT * FROM items WHERE name LIKE :query
   ```

5. **Install datasette-hashed-urls for caching**:

   ```bash
   datasette install datasette-hashed-urls
   ```

### Migration Path

**From spreadsheets to Datasette**:

```bash
csvs-to-sqlite data.csv data.db
datasette data.db
```

**From PostgreSQL to Datasette**:

```bash
db-to-sqlite "postgresql://user:pass@host/db" data.db --all
datasette data.db
```

**From Datasette to a production app**:

- Use Datasette for prototyping and exploration
- Migrate to FastAPI/Django when write operations become critical
- Keep Datasette as the read-only reporting interface

## Summary

Datasette excels at making data instantly explorable and shareable. It is the fastest path from data to a published website with an API. Use it for read-heavy workflows, data journalism, personal data archives, and rapid prototyping. Avoid it for write-heavy applications, enterprise BI, or large-scale data warehousing.

**TL;DR**: If you have data and want to publish or explore it quickly without writing application code, use Datasette. If you need complex transactions, real-time collaboration, or enterprise BI features, choose a different tool.

## References

- Official Documentation: https://docs.datasette.io
- GitHub Repository: https://github.com/simonw/datasette
- Plugin Directory: https://datasette.io/plugins
- Dogsheep Project (personal data toolkit): https://dogsheep.github.io/
- Datasette Lite (WebAssembly): https://lite.datasette.io/
- Community Discord
- Newsletter