# LaminDB Setup & Deployment

This document covers installation, configuration, instance management, storage options, and deployment strategies for LaminDB.

## Installation

### Basic Installation

```bash
# Install LaminDB
pip install lamindb

# Or with pip3
pip3 install lamindb
```

### Installation with Extras

Install optional dependencies for specific functionality:

```bash
# Google Cloud Platform support
pip install 'lamindb[gcp]'

# Flow cytometry formats
pip install 'lamindb[fcs]'

# Array storage and streaming (Zarr support)
pip install 'lamindb[zarr]'

# AWS S3 support (usually included by default)
pip install 'lamindb[aws]'

# Multiple extras
pip install 'lamindb[gcp,zarr,fcs]'
```

### Module Plugins

```bash
# Biological ontologies (Bionty)
pip install bionty

# Wet lab functionality
pip install lamindb-wetlab

# Clinical data (OMOP CDM)
pip install lamindb-clinical
```

### Verify Installation

```python
import lamindb as ln
print(ln.__version__)

# Check available modules
import bionty as bt
print(bt.__version__)
```

## Authentication

### Creating an Account

1. Visit https://lamin.ai
2. Sign up for a free account
3. Navigate to account settings to generate an API key

### Logging In

```bash
# Login with API key
lamin login
# You'll be prompted to enter your API key
# The API key is stored locally at ~/.lamin/
```

### Authentication Details

**Data privacy:** LaminDB authentication only collects basic metadata (email, user information). Your actual data remains private and is not sent to LaminDB servers.

**Local vs cloud:** Authentication is required even for local-only usage to enable collaboration features and instance management.

## Instance Initialization

### Local SQLite Instance

For local development and small datasets:

```bash
# Initialize in the current directory
lamin init --storage ./mydata

# Initialize in a specific directory
lamin init --storage /path/to/data

# Initialize with a specific module
lamin init --storage ./mydata --modules bionty

# Initialize with multiple modules
lamin init --storage ./mydata --modules bionty,wetlab
```

### Cloud Storage with SQLite

Use cloud storage with a local SQLite database:

```bash
# AWS S3
lamin init --storage s3://my-bucket/path

# Google Cloud Storage
lamin init --storage gs://my-bucket/path

# S3-compatible (MinIO, Cloudflare R2)
lamin init --storage 's3://bucket?endpoint_url=http://endpoint:9000'
```

### Cloud Storage with PostgreSQL

For production deployments:

```bash
# S3 + PostgreSQL
lamin init --storage s3://my-bucket/path \
  --db postgresql://user:password@hostname:5432/dbname \
  --modules bionty

# GCS + PostgreSQL
lamin init --storage gs://my-bucket/path \
  --db postgresql://user:password@hostname:5432/dbname \
  --modules bionty
```

### Instance Naming

```bash
# Specify an instance name
lamin init --storage ./mydata --name my-project

# By default, the name is taken from the directory name
lamin init --storage ./mydata  # instance name: "mydata"
```

## Connecting to Instances

### Connect to Your Own Instance

```bash
# By name
lamin connect my-project

# By full path
lamin connect account_handle/my-project
```

### Connect to a Shared Instance

```bash
# Connect to someone else's instance
# (requires appropriate permissions)
lamin connect other-user/their-project
```

### Switching Between Instances

```bash
# Show current instance info
lamin info

# Switch instance
lamin connect another-instance

# Close current instance
lamin close
```

## Storage Configuration

### Local Storage

**Advantages:**

- Fast access
- No internet required
- Simple setup

**Setup:**

```bash
lamin init --storage ./data
```
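To confirm that a fresh local instance works end to end, a minimal smoke test can register and retrieve one artifact. This is a sketch using the API calls shown elsewhere in this document; the file name `example.csv` and the key `examples/example.csv` are hypothetical placeholders:

```python
from pathlib import Path
import lamindb as ln

# Create a tiny file and register it as an artifact in the new instance.
Path("example.csv").write_text("a,b\n1,2\n")
artifact = ln.Artifact("example.csv", key="examples/example.csv").save()

# Retrieve it again by key; .path points inside the ./data storage root.
fetched = ln.Artifact.get(key="examples/example.csv")
print(fetched.path)
```

If this round trip succeeds, both the SQLite metadata database and the storage location are wired up correctly.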
### AWS S3 Storage

**Advantages:**

- Scalable
- Collaborative
- Durable

**Setup:**

```bash
# Set credentials
export AWS_ACCESS_KEY_ID=your_key_id
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_DEFAULT_REGION=us-east-1

# Initialize
lamin init --storage s3://my-bucket/project-data \
  --db postgresql://user:pwd@host:5432/db
```

**S3 permissions required:**

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-bucket/*",
        "arn:aws:s3:::my-bucket"
      ]
    }
  ]
}
```

### Google Cloud Storage

**Setup:**

```bash
# Authenticate
gcloud auth application-default login

# Or use a service account
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json

# Initialize
lamin init --storage gs://my-bucket/project-data \
  --db postgresql://user:pwd@host:5432/db
```

### S3-Compatible Storage

For MinIO, Cloudflare R2, or other S3-compatible services:

```bash
# MinIO example
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
lamin init --storage 's3://my-bucket?endpoint_url=http://minio.example.com:9000'

# Cloudflare R2 example
export AWS_ACCESS_KEY_ID=your_r2_access_key
export AWS_SECRET_ACCESS_KEY=your_r2_secret_key
lamin init --storage 's3://bucket?endpoint_url=https://account-id.r2.cloudflarestorage.com'
```

## Database Configuration

### SQLite (Default)

**Advantages:**

- No separate database server
- Simple setup
- Good for development

**Limitations:**

- Not suitable for concurrent writes
- Limited scalability

**Setup:**

```bash
# SQLite is the default
lamin init --storage ./data
# Database stored at ./data/.lamindb/
```

### PostgreSQL

**Advantages:**

- Production-ready
- Concurrent access
- Better performance at scale

**Setup:**

```bash
# Full connection string
lamin init --storage s3://bucket/path \
  --db postgresql://username:password@hostname:5432/database

# With SSL
lamin init --storage s3://bucket/path \
  --db "postgresql://user:pwd@host:5432/db?sslmode=require"
```

**PostgreSQL versions:** Compatible with PostgreSQL 12+.

### Database Schema Management

```bash
# Check current schema version
lamin migrate check

# Upgrade schema
lamin migrate deploy

# View migration history
lamin migrate history
```

## Cache Configuration

### Cache Directory

LaminDB maintains a local cache for cloud files:

```python
import lamindb as ln

# View cache location
print(ln.settings.cache_dir)
```

### Configure Cache Location

```bash
# Set cache directory
lamin cache set /path/to/cache

# View current cache settings
lamin cache get
```

### System-Wide Cache (Multi-User)

For shared systems with multiple users:

```bash
# Create system settings file
sudo mkdir -p /system/settings
sudo nano /system/settings/system.env
```

Add to `system.env`:

```bash
lamindb_cache_path=/shared/cache/lamindb
```

Ensure permissions:

```bash
sudo chmod 755 /shared/cache/lamindb
sudo chown -R shared-user:shared-group /shared/cache/lamindb
```

### Cache Management

```python
import lamindb as ln

# Clear cache for a specific artifact
artifact = ln.Artifact.get(key="data.h5ad")
artifact.delete_cache()

# Check whether an artifact is cached
if artifact.is_cached():
    print("Already cached")

# Manually clear the entire cache
import shutil
shutil.rmtree(ln.settings.cache_dir)
```
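`shutil.rmtree` wipes the cache wholesale. If you only need to reclaim space, a gentler option is to evict stale files. A minimal sketch, assuming everything under `cache_dir` can be re-downloaded on demand; the 30-day cutoff is an arbitrary illustration:

```python
import time
from pathlib import Path
import lamindb as ln

cache = Path(ln.settings.cache_dir)
files = [p for p in cache.rglob("*") if p.is_file()]
total = sum(p.stat().st_size for p in files)
print(f"cache holds {len(files)} files, {total / 1e9:.2f} GB")

# Evict files not accessed in the last 30 days.
cutoff = time.time() - 30 * 24 * 3600
for p in files:
    if p.stat().st_atime < cutoff:
        p.unlink()
```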
## Settings Management

### View Current Settings

```python
import lamindb as ln

# User settings
print(ln.setup.settings.user)
# User(handle='username', email='user@email.com', name='Full Name')

# Instance settings
print(ln.setup.settings.instance)
# Instance(name='my-project', storage='s3://bucket/path')
```

### Configure Settings

```bash
# Set development directory for relative keys
lamin settings set dev-dir /path/to/project

# Configure git sync
lamin settings set sync-git-repo https://github.com/user/repo.git

# View all settings
lamin settings
```

### Environment Variables

```bash
# Cache directory
export LAMIN_CACHE_DIR=/path/to/cache

# Settings directory
export LAMIN_SETTINGS_DIR=/path/to/settings

# Git sync
export LAMINDB_SYNC_GIT_REPO=https://github.com/user/repo.git
```

## Instance Management

### Viewing Instance Information

```bash
# Current instance info
lamin info

# List all instances
lamin ls

# View instance details
lamin instance details
```

### Instance Collaboration

```bash
# Set instance visibility (requires LaminHub)
lamin instance set-visibility public
lamin instance set-visibility private

# Invite collaborators (requires LaminHub)
lamin instance invite user@email.com
```

### Instance Migration

```bash
# Back up the instance
lamin backup create

# Restore from a backup
lamin backup restore backup_id

# Export instance metadata
lamin export instance-metadata.json
```

### Deleting Instances

```bash
# Delete an instance (preserves data, removes metadata)
lamin delete --force instance-name

# This only removes the LaminDB metadata;
# the actual data in the storage location remains.
```

## Production Deployment Patterns

### Pattern 1: Local Development → Cloud Production

**Development:**

```bash
# Local development
lamin init --storage ./dev-data --modules bionty
```

**Production:**

```bash
# Cloud production
lamin init --storage s3://prod-bucket/data \
  --db postgresql://user:pwd@db-host:5432/prod-db \
  --modules bionty \
  --name production
```

**Migration:** export artifacts from dev, then import them into prod. The sketch below materializes each artifact via the local cache and assumes file (not folder) artifacts:

```python
import shutil
from pathlib import Path
import lamindb as ln

# Export from dev: materialize each artifact locally
export_dir = Path("/tmp/export/")
export_dir.mkdir(parents=True, exist_ok=True)
for artifact in ln.Artifact.filter():
    local_path = artifact.cache()  # downloads to the cache if needed
    shutil.copy(local_path, export_dir / local_path.name)

# Switch to prod (in the shell): lamin connect production

# Import to prod
for file in Path("/tmp/export/").glob("*"):
    ln.Artifact(str(file), key=file.name).save()
```

### Pattern 2: Multi-Region Deployment

Deploy instances in multiple regions for data sovereignty:

```bash
# US instance
lamin init --storage s3://us-bucket/data \
  --db postgresql://user:pwd@us-db:5432/db \
  --name us-production

# EU instance
lamin init --storage s3://eu-bucket/data \
  --db postgresql://user:pwd@eu-db:5432/db \
  --name eu-production
```

### Pattern 3: Shared Storage, Personal Instances

Multiple users, shared data:

```bash
# Shared storage with user-specific databases
lamin init --storage s3://shared-bucket/data \
  --db postgresql://user1:pwd@host:5432/user1_db \
  --name user1-workspace

lamin init --storage s3://shared-bucket/data \
  --db postgresql://user2:pwd@host:5432/user2_db \
  --name user2-workspace
```

## Performance Optimization

### Database Performance

- Use connection pooling for PostgreSQL; configure it in the database server settings or via a pooler.
- LaminDB creates indexes automatically for common queries, so most workloads need no manual index tuning.

### Storage Performance

```bash
# Use appropriate storage classes:
# S3: STANDARD for frequent access, INTELLIGENT_TIERING for mixed access

# Configure multipart upload thresholds and concurrency (AWS CLI)
aws configure set default.s3.multipart_threshold 100MB
aws configure set default.s3.max_concurrent_requests 20
```

### Cache Optimization

```python
import lamindb as ln

# Pre-cache frequently used artifacts
artifacts = ln.Artifact.filter(key__startswith="reference/")
for artifact in artifacts:
    artifact.cache()  # download to cache

# Use backed mode for large arrays
adata = artifact.backed()  # don't load into memory
```
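Pre-caching sequentially is network-bound, so a small thread pool can hide download latency. A sketch under the assumptions that the artifacts under the hypothetical `reference/` prefix are independent files and that concurrent downloads to the cache are safe in your setup:

```python
from concurrent.futures import ThreadPoolExecutor
import lamindb as ln

artifacts = list(ln.Artifact.filter(key__startswith="reference/"))

# Each cache() call is an independent download, so run them in parallel;
# tune max_workers to your available bandwidth.
with ThreadPoolExecutor(max_workers=4) as pool:
    paths = list(pool.map(lambda a: a.cache(), artifacts))

for path in paths:
    print("cached:", path)
```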
## Security Best Practices

1. **Credentials Management:**
   - Use environment variables, not hardcoded credentials
   - Use IAM roles on AWS/GCP instead of access keys
   - Rotate credentials regularly
2. **Access Control:**
   - Use PostgreSQL for multi-user access control
   - Configure storage bucket policies
   - Enable audit logging
3. **Network Security:**
   - Use SSL/TLS for database connections
   - Use VPCs for cloud deployments
   - Restrict IP addresses where possible
4. **Data Protection:**
   - Enable encryption at rest (S3, GCS)
   - Use encryption in transit (HTTPS, SSL)
   - Implement backup strategies

## Monitoring and Maintenance

### Health Checks

```python
from pathlib import Path
import lamindb as ln

# Check database connection
try:
    ln.Artifact.filter().count()
    print("✓ Database connected")
except Exception as e:
    print(f"✗ Database error: {e}")

# Check storage access: write a probe file, register it, then remove it
try:
    Path("test.txt").write_text("healthcheck")
    test_artifact = ln.Artifact("test.txt", key="healthcheck.txt").save()
    test_artifact.delete(permanent=True)
    print("✓ Storage accessible")
except Exception as e:
    print(f"✗ Storage error: {e}")
```

### Logging

```python
# Enable debug logging
import logging
logging.basicConfig(level=logging.DEBUG)

# LaminDB operations will now produce detailed logs
```

### Backup Strategy

```bash
# Regular database backups (PostgreSQL)
pg_dump -h hostname -U username -d database > backup_$(date +%Y%m%d).sql

# Storage backups (enable S3 versioning)
aws s3api put-bucket-versioning \
  --bucket my-bucket \
  --versioning-configuration Status=Enabled

# Metadata export
lamin export metadata_backup.json
```

## Troubleshooting

### Common Issues

**Issue: cannot connect to instance**

```bash
# Check that the instance exists
lamin ls

# Verify authentication
lamin login

# Re-connect
lamin connect instance-name
```

**Issue: storage permissions denied**

```bash
# Check AWS credentials
aws s3 ls s3://your-bucket/

# Check GCS credentials
gsutil ls gs://your-bucket/

# Then verify IAM permissions for the failing operation
```

**Issue: database connection error**

```bash
# Test the PostgreSQL connection
psql postgresql://user:pwd@host:5432/db

# Check database schema compatibility
lamin migrate check
```

**Issue: cache full**

```python
# Clear the cache
import lamindb as ln
import shutil
shutil.rmtree(ln.settings.cache_dir)
```

```bash
# Or point the cache at a larger disk
lamin cache set /larger/disk/cache
```

## Upgrade and Migration

### Upgrading LaminDB

```bash
# Upgrade to the latest version
pip install --upgrade lamindb

# Upgrade the database schema
lamin migrate deploy
```

### Schema Compatibility

Check the compatibility matrix to ensure your database schema version is compatible with your installed LaminDB version.

### Breaking Changes

Major version upgrades may require migration:

```bash
# Check for breaking changes
lamin migrate check

# Review the migration plan
lamin migrate plan

# Execute the migration
lamin migrate deploy
```

## Best Practices

1. **Start local, scale cloud**: Develop locally, deploy to cloud for production
2. **Use PostgreSQL for production**: SQLite is only for development
3. **Configure an appropriate cache**: Size the cache based on your working set
4. **Enable versioning**: Use S3/GCS versioning for data protection
5. **Monitor costs**: Track storage and compute costs in cloud deployments
6. **Document configuration**: Keep infrastructure-as-code for reproducibility
7. **Test backups**: Regularly verify backup and restore procedures (see the sketch below)
8. **Set up monitoring**: Implement health checks and alerting
9. **Use modules strategically**: Only install needed plugins to reduce complexity
10. **Plan for scale**: Consider concurrent users and data growth
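For best practice 7: a backup you have never restored is not a backup. A minimal verification sketch, assuming a PostgreSQL-backed instance and `pg_dump` on the PATH; the connection string is a placeholder:

```python
import subprocess
from datetime import date
from pathlib import Path

DB_URL = "postgresql://user:pwd@host:5432/db"  # placeholder: substitute your own
backup = Path(f"backup_{date.today():%Y%m%d}.sql")

# Dump the LaminDB metadata database.
subprocess.run(["pg_dump", "--dbname", DB_URL, "--file", str(backup)], check=True)

# A dump that exists but is near-empty almost certainly failed silently.
if backup.stat().st_size < 1024:
    raise RuntimeError(f"{backup} looks empty -- investigate before trusting it")
print(f"✓ wrote {backup} ({backup.stat().st_size} bytes)")
```

A fuller test would restore the dump into a scratch database and run the health checks above against it.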