Initial commit
405
skills/llm-docs-optimizer/examples/sample_llmstxt.md
Normal file
@@ -0,0 +1,405 @@
# Example: llms.txt Generation for Different Project Types

This document shows examples of llms.txt files generated for different types of projects, demonstrating how to structure the file based on project characteristics.

---

## Example 1: Python Library (Data Processing)

### Project Context
A Python library called "DataFlow" for stream data processing with multiple output formats.

### Generated llms.txt

```markdown
# DataFlow

> DataFlow is a Python library for processing data streams with real-time transformations
> and multiple output formats. It provides efficient stream processing with lazy evaluation
> and built-in error handling.

Key features:
- Fast stream processing with lazy evaluation
- Support for CSV, JSON, Parquet, and custom formats
- Built-in error handling and recovery
- Zero-dependency core library
- Extensible plugin system

## Documentation

- [Quick Start Guide](https://github.com/example/dataflow/blob/main/docs/quickstart.md): Get up and running in 5 minutes
- [Core Concepts](https://github.com/example/dataflow/blob/main/docs/concepts.md): Understanding streams, transformations, and processing
- [Configuration Guide](https://github.com/example/dataflow/blob/main/docs/configuration.md): All configuration options explained

## API Reference

- [Stream API](https://github.com/example/dataflow/blob/main/docs/api/stream.md): Stream creation and manipulation methods
- [Transformations](https://github.com/example/dataflow/blob/main/docs/api/transforms.md): Built-in transformation functions
- [Exports](https://github.com/example/dataflow/blob/main/docs/api/exports.md): Output format specifications

## Examples

- [Basic Usage](https://github.com/example/dataflow/blob/main/examples/basic.md): Simple stream processing examples
- [Common Patterns](https://github.com/example/dataflow/blob/main/examples/patterns.md): Filtering, mapping, and aggregation
- [Error Handling](https://github.com/example/dataflow/blob/main/examples/errors.md): Handling failures and recovery
- [Advanced Usage](https://github.com/example/dataflow/blob/main/examples/advanced.md): Parallel processing and custom plugins

## Development

- [Contributing Guide](https://github.com/example/dataflow/blob/main/CONTRIBUTING.md): How to contribute to DataFlow
- [Development Setup](https://github.com/example/dataflow/blob/main/docs/development.md): Setting up local development environment
- [Testing](https://github.com/example/dataflow/blob/main/docs/testing.md): Running and writing tests

## Optional

- [DataFlow Blog](https://dataflow.example.com/blog/): Latest updates and tutorials
- [Changelog](https://github.com/example/dataflow/blob/main/CHANGELOG.md): Version history and release notes
- [Performance Benchmarks](https://github.com/example/dataflow/blob/main/docs/performance.md): Benchmark results and optimization tips
```

### Why This Structure?

- **Blockquote**: Clearly explains what DataFlow is and its main value proposition
- **Key Features**: Bullet list highlights important capabilities
- **Documentation**: Essential guides for getting started and understanding core concepts
- **API Reference**: Organized by major components (Stream, Transformations, Exports)
- **Examples**: Progressive from basic to advanced, includes error handling
- **Development**: Resources for contributors
- **Optional**: Secondary resources like blog and benchmarks
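
The layout described above (H1, blockquote, feature bullets, then H2 sections of described links) can also be assembled programmatically. The following is a minimal sketch, not part of the skill itself: `Link`, `LlmsTxt`, and `render_llms_txt` are hypothetical names introduced here for illustration, and sections are emitted in the order they are supplied.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Link:
    """One entry in a section: title, full URL, and a short description."""
    title: str
    url: str
    description: str


@dataclass
class LlmsTxt:
    """Hypothetical container for the parts of an llms.txt file."""
    name: str
    summary: str
    features: list[str] = field(default_factory=list)
    sections: dict[str, list[Link]] = field(default_factory=dict)


def render_llms_txt(doc: LlmsTxt) -> str:
    """Render the H1, blockquote, feature bullets, and H2 link sections."""
    lines = [f"# {doc.name}", "", f"> {doc.summary}", ""]
    if doc.features:
        lines.append("Key features:")
        lines += [f"- {feature}" for feature in doc.features]
        lines.append("")
    for heading, links in doc.sections.items():
        lines += [f"## {heading}", ""]
        lines += [f"- [{l.title}]({l.url}): {l.description}" for l in links]
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


example = LlmsTxt(
    name="DataFlow",
    summary="DataFlow is a Python library for processing data streams.",
    features=["Fast stream processing with lazy evaluation"],
    sections={
        "Documentation": [
            Link(
                "Quick Start Guide",
                "https://github.com/example/dataflow/blob/main/docs/quickstart.md",
                "Get up and running in 5 minutes",
            )
        ]
    },
)
print(render_llms_txt(example))
```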

---

## Example 2: CLI Tool (Developer Tool)

### Project Context
A command-line tool called "BuildKit" for managing build processes and deployment pipelines.

### Generated llms.txt

```markdown
# BuildKit

> BuildKit is a CLI tool for managing build processes, running tests, and deploying
> applications across multiple environments. It provides a unified interface for common
> development workflows.

BuildKit follows these principles:
- Convention over configuration
- Fast feedback loops
- Environment parity
- Reproducible builds

## Getting Started

- [Installation](https://buildkit.dev/docs/install.md): Installing BuildKit on macOS, Linux, and Windows
- [Quick Start](https://buildkit.dev/docs/quickstart.md): Your first BuildKit project in 5 minutes
- [Core Concepts](https://buildkit.dev/docs/concepts.md): Understanding tasks, pipelines, and environments

## Commands

- [build](https://buildkit.dev/docs/commands/build.md): Build your project with automatic dependency detection
- [test](https://buildkit.dev/docs/commands/test.md): Run tests with parallel execution
- [deploy](https://buildkit.dev/docs/commands/deploy.md): Deploy to staging or production
- [watch](https://buildkit.dev/docs/commands/watch.md): Watch for changes and rebuild automatically
- [All Commands](https://buildkit.dev/docs/commands/): Complete command reference

## Configuration

- [buildkit.yml](https://buildkit.dev/docs/config.md): Configuration file reference
- [Environment Variables](https://buildkit.dev/docs/env.md): Environment-specific configuration
- [Plugins](https://buildkit.dev/docs/plugins.md): Extending BuildKit with custom plugins

## Examples

- [Node.js Projects](https://buildkit.dev/examples/nodejs.md): Building and deploying Node.js apps
- [Python Projects](https://buildkit.dev/examples/python.md): Python application workflows
- [Monorepos](https://buildkit.dev/examples/monorepo.md): Managing multiple packages
- [CI/CD Integration](https://buildkit.dev/examples/ci.md): Using BuildKit in CI/CD pipelines

## Optional

- [BuildKit Blog](https://buildkit.dev/blog/): Tutorials and case studies
- [Plugin Directory](https://buildkit.dev/plugins/): Community plugins
- [Troubleshooting](https://buildkit.dev/docs/troubleshooting.md): Common issues and solutions
```

### Why This Structure?

- **Principles**: Shows design philosophy upfront
- **Getting Started**: Installation and quickstart are priority for CLI tools
- **Commands**: Individual command documentation (most important for CLI tools)
- **Configuration**: Clear section for config files and customization
- **Examples**: Language/framework-specific guides
- **Optional**: Community resources and troubleshooting

---

## Example 3: Web Framework

### Project Context
A web framework called "FastWeb" for building modern web applications.

### Generated llms.txt

```markdown
# FastWeb

> FastWeb is a modern web framework for building full-stack applications with TypeScript.
> It provides server-side rendering, API routes, and built-in database support with
> zero configuration required.

FastWeb features:
- File-based routing with automatic code splitting
- Server-side rendering (SSR) and static site generation (SSG)
- Built-in API routes and middleware
- Real-time capabilities with WebSockets
- TypeScript-first with excellent type inference

## Documentation

- [Getting Started](https://fastweb.dev/docs/getting-started.md): Create your first FastWeb app
- [Routing](https://fastweb.dev/docs/routing.md): File-based routing and dynamic routes
- [Data Fetching](https://fastweb.dev/docs/data.md): Loading data on server and client
- [Rendering](https://fastweb.dev/docs/rendering.md): SSR, SSG, and client-side rendering
- [API Routes](https://fastweb.dev/docs/api.md): Building REST and GraphQL APIs

## Guides

- [Authentication](https://fastweb.dev/guides/auth.md): User authentication and authorization
- [Database Integration](https://fastweb.dev/guides/database.md): Working with databases
- [Deployment](https://fastweb.dev/guides/deployment.md): Deploying to production
- [Testing](https://fastweb.dev/guides/testing.md): Unit and integration testing
- [Performance](https://fastweb.dev/guides/performance.md): Optimization best practices

## API Reference

- [Configuration](https://fastweb.dev/api/config.md): fastweb.config.js options
- [CLI](https://fastweb.dev/api/cli.md): Command-line interface reference
- [Components](https://fastweb.dev/api/components.md): Built-in components
- [Hooks](https://fastweb.dev/api/hooks.md): React-style hooks API
- [Utilities](https://fastweb.dev/api/utils.md): Helper functions and utilities

## Examples

- [Blog](https://fastweb.dev/examples/blog.md): Building a blog with markdown
- [E-commerce](https://fastweb.dev/examples/ecommerce.md): Product catalog and checkout
- [Dashboard](https://fastweb.dev/examples/dashboard.md): Admin dashboard with charts
- [Real-time Chat](https://fastweb.dev/examples/chat.md): WebSocket-based chat app

## Integrations

- [Databases](https://fastweb.dev/integrations/databases.md): PostgreSQL, MySQL, MongoDB
- [CSS Frameworks](https://fastweb.dev/integrations/css.md): Tailwind, Bootstrap, etc.
- [Analytics](https://fastweb.dev/integrations/analytics.md): Google Analytics, Plausible
- [CMS](https://fastweb.dev/integrations/cms.md): Headless CMS integrations

## Optional

- [FastWeb Blog](https://fastweb.dev/blog/): Tutorials and announcements
- [Showcase](https://fastweb.dev/showcase/): Sites built with FastWeb
- [Community](https://fastweb.dev/community/): Discord, GitHub discussions
- [Changelog](https://fastweb.dev/changelog/): Version history
```

### Why This Structure?

- **Framework Features**: Lists core capabilities upfront
- **Documentation**: Core framework concepts and features
- **Guides**: Task-oriented how-to guides (authentication, deployment, etc.)
- **API Reference**: Technical reference for configuration and APIs
- **Examples**: Complete application examples
- **Integrations**: Third-party tool integration guides
- **Optional**: Community and showcase resources

---

## Example 4: Claude Skill

### Project Context
A Claude skill for optimizing documentation (this project!).

### Generated llms.txt

```markdown
# c7score-optimizer

> A Claude skill that optimizes project documentation and README files to score highly
> on Context7's c7score benchmark, making docs more effective for AI-assisted coding tools.
> Also generates llms.txt files for projects.

The skill provides:
- Documentation analysis and quality assessment
- Question-driven content restructuring
- Code snippet enhancement with context
- llms.txt file generation
- Python analysis script for automated scanning

## Documentation

- [README](https://github.com/example/c7score-optimizer/blob/main/README.md): Overview, installation, and usage
- [Skill Definition](https://github.com/example/c7score-optimizer/blob/main/SKILL.md): Complete skill workflow and instructions
- [Changelog](https://github.com/example/c7score-optimizer/blob/main/CHANGELOG.md): Version history and updates

## Reference Materials

- [C7Score Metrics](https://github.com/example/c7score-optimizer/blob/main/references/c7score_metrics.md): Understanding the c7score benchmark
- [Optimization Patterns](https://github.com/example/c7score-optimizer/blob/main/references/optimization_patterns.md): 20+ transformation patterns
- [llms.txt Format](https://github.com/example/c7score-optimizer/blob/main/references/llmstxt_format.md): Complete llms.txt specification

## Examples

- [README Optimization](https://github.com/example/c7score-optimizer/blob/main/examples/sample_readme.md): Before/after documentation transformation
- [llms.txt Generation](https://github.com/example/c7score-optimizer/blob/main/examples/sample_llmstxt.md): Generated llms.txt examples

## Development

- [Analysis Script](https://github.com/example/c7score-optimizer/blob/main/scripts/analyze_docs.py): Python tool for documentation scanning
- [Contributing](https://github.com/example/c7score-optimizer/blob/main/CONTRIBUTING.md): How to contribute improvements

## Optional

- [Context7 c7score](https://www.context7.ai/c7score): Official c7score benchmark
- [llmstxt.org](https://llmstxt.org/): Official llms.txt specification
- [Claude Code Docs](https://docs.claude.com/claude-code): Claude Code documentation
```

### Why This Structure?

- **Skill Capabilities**: Clear explanation of what the skill does
- **Documentation**: Essential files (README, SKILL.md, CHANGELOG)
- **Reference Materials**: Detailed specifications and patterns
- **Examples**: Practical before/after demonstrations
- **Development**: Tools and contribution guides
- **Optional**: External resources and official documentation

---

## Key Patterns Across All Examples

### 1. Strong Opening
Every example has:
- Clear H1 with project name
- Informative blockquote explaining what it is
- Key features/principles in bullets

### 2. Logical Section Progression
Common pattern:
1. **Getting Started / Documentation** (high priority)
2. **API / Commands / Core Features** (high priority)
3. **Guides / Examples** (practical applications)
4. **Development / Contributing** (for contributors)
5. **Optional** (secondary resources)

### 3. Descriptive Links
All links include:
- Clear, action-oriented titles
- Helpful descriptions after colons
- Context about what each resource contains

### 4. Full URLs
All examples use complete URLs with protocol:
- ✅ `https://example.com/docs/guide.md`
- ❌ `/docs/guide.md`
- ❌ `../guide.md`
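
When the source documentation uses relative links, they can be resolved against the page they came from before being written into llms.txt. A small sketch using the standard library's `urllib.parse`; the `to_absolute` helper and the base URLs are illustrative assumptions, not part of the skill.

```python
from urllib.parse import urljoin, urlparse


def to_absolute(link: str, base_url: str) -> str:
    """Return link unchanged if it already has an http(s) scheme,
    otherwise resolve it against base_url."""
    if urlparse(link).scheme in ("http", "https"):
        return link
    return urljoin(base_url, link)


# A root-relative path keeps only the host of the base URL.
print(to_absolute("/docs/guide.md", "https://example.com/docs/"))
# -> https://example.com/docs/guide.md

# A parent-relative path is resolved against the base directory.
print(to_absolute("../guide.md", "https://example.com/docs/api/"))
# -> https://example.com/docs/guide.md
```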

### 5. Markdown-First
Prefer linking to `.md` files:
- ✅ `docs/guide.md`
- ⚠️ `docs/guide.html` (acceptable if no .md available)

---

## Decision Tree: What Sections to Include?

### For Libraries/Packages
- **Must have**: Documentation, API Reference, Examples
- **Should have**: Getting Started, Development
- **Nice to have**: Guides, Integrations, Optional

### For CLI Tools
- **Must have**: Getting Started, Commands, Examples
- **Should have**: Configuration, Development
- **Nice to have**: Plugins, Troubleshooting, Optional

### For Frameworks
- **Must have**: Documentation, Guides, API Reference, Examples
- **Should have**: Integrations, Getting Started
- **Nice to have**: Showcase, Optional

### For Skills/Plugins
- **Must have**: Documentation, Reference Materials
- **Should have**: Examples, Development
- **Nice to have**: Optional (external resources)
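
The "Must have" rows above can be turned into a quick check on a generated file. This is a sketch under assumptions: the `REQUIRED` table copies the must-have sections listed above, while the project-type keys (`"library"`, `"cli"`, `"framework"`, `"skill"`) and the `missing_sections` function are names invented for this example. It matches H2 headings by exact title only.

```python
import re

# Must-have sections per project type, taken from the lists above.
REQUIRED = {
    "library": ["Documentation", "API Reference", "Examples"],
    "cli": ["Getting Started", "Commands", "Examples"],
    "framework": ["Documentation", "Guides", "API Reference", "Examples"],
    "skill": ["Documentation", "Reference Materials"],
}


def missing_sections(llms_txt: str, project_type: str) -> list:
    """List the must-have H2 sections that the llms.txt text lacks."""
    present = {
        match.group(1).strip()
        for match in re.finditer(r"^## (.+)$", llms_txt, re.MULTILINE)
    }
    return [name for name in REQUIRED[project_type] if name not in present]


sample = "# BuildKit\n\n## Getting Started\n\n## Examples\n"
print(missing_sections(sample, "cli"))  # -> ['Commands']
```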

---

## Common Customizations by Project Type

### Open Source Project
Add to Optional:
- Contributing guide
- Code of conduct
- Governance
- Roadmap

### Commercial Product
Add sections:
- Pricing/Plans
- Support
- Enterprise features
- Migration guides

### Educational Resource
Add sections:
- Tutorials
- Video courses
- Exercises
- Certification

### Research Project
Add sections:
- Papers
- Datasets
- Experiments
- Citations

---

## Anti-Patterns to Avoid

### ❌ Too Granular
```markdown
## Installation
- [macOS Install](url)
- [Linux Install](url)
- [Windows Install](url)
- [Docker Install](url)
```
Better: One "Installation" link covering all platforms

### ❌ No Descriptions
```markdown
- [Guide](url)
- [Docs](url)
- [API](url)
```
Better: Add helpful context after colons

### ❌ Outdated Links
```markdown
- [Guide](https://example.com/v1/guide.md)
```
Better: Link to latest version or version-agnostic URLs

### ❌ Relative URLs
```markdown
- [Guide](../docs/guide.md)
```
Better: Use full URLs with protocol

### ❌ Too Much Content
Don't paste entire documentation. Link to it.
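
Two of these anti-patterns (missing descriptions and relative URLs) are mechanical enough to detect automatically. The following is a rough sketch rather than a full validator: `lint_llms_txt` and `LINK_RE` are hypothetical names, and the pattern only recognises link lines written as `- [title](url): description`.

```python
import re
from urllib.parse import urlparse

# Matches a list item of the form "- [title](url)" plus whatever follows it.
LINK_RE = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?P<rest>.*)$")


def lint_llms_txt(text: str) -> list:
    """Report link lines with a non-absolute URL or no description."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = LINK_RE.match(line)
        if not match:
            continue
        if urlparse(match["url"]).scheme not in ("http", "https"):
            problems.append(f"line {number}: URL is not absolute: {match['url']}")
        rest = match["rest"]
        if not rest.startswith(":") or not rest[1:].strip():
            problems.append(f"line {number}: link has no description")
    return problems


sample = (
    "- [Guide](../docs/guide.md)\n"
    "- [API](https://example.com/api.md): Stream creation methods\n"
)
for problem in lint_llms_txt(sample):
    print(problem)
```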

---

Use these examples as templates when generating llms.txt files for different project types!
352
skills/llm-docs-optimizer/examples/sample_readme.md
Normal file
@@ -0,0 +1,352 @@
# Example: Before and After C7Score Optimization

This example demonstrates the transformation of a typical README into an optimized version that scores higher on the c7score benchmark.

---

## BEFORE Optimization (Low c7score)

````markdown
# DataFlow

Last updated: March 15, 2024

DataFlow is a Python library for processing data streams.

## Features

- Stream processing
- Data transformation
- Multiple output formats
- Error handling

## Installation

```bash
pip install dataflow
```

## Usage

```python
from dataflow import Stream
s = Stream()
```

## API

See the API documentation for details.

## License

MIT
````

### Issues with the BEFORE version:
- ❌ Metadata badges and timestamps (reduce c7score)
- ❌ Vague descriptions without concrete examples
- ❌ Code snippets lack context and imports
- ❌ No question-driven structure
- ❌ Missing error handling examples
- ❌ No progressive complexity
- ❌ Features listed without demonstrations
- ❌ Incomplete quickstart example

---

## AFTER Optimization (High c7score)

````markdown
# DataFlow

A Python library for processing data streams with real-time transformations and multiple output formats.

## Quick Start

### How do I install DataFlow?

```bash
pip install dataflow
```

For development installation:
```bash
git clone https://github.com/example/dataflow.git
cd dataflow
pip install -e ".[dev]"
```

### How do I process my first data stream?

```python
from dataflow import Stream, Processor

# Create a stream and add data
stream = Stream()
stream.add_data([1, 2, 3, 4, 5])

# Process the stream
processor = Processor()
result = processor.transform(stream, lambda x: x * 2)

print(result.to_list())  # Output: [2, 4, 6, 8, 10]
```

### How do I handle different data sources?

```python
from dataflow import Stream, FileSource, APISource

# Load from a file
file_stream = Stream.from_source(
    FileSource('data.csv', format='csv')
)

# Load from an API
api_stream = Stream.from_source(
    APISource('https://api.example.com/data', auth_token='your_token')
)

# Iterate over a stream (api_stream is consumed the same way)
for item in file_stream:
    print(f"Processing: {item}")
```

## Common Use Cases

### How do I transform data in a stream?

```python
from dataflow import Stream

stream = Stream.from_list([
    {'name': 'Alice', 'age': 30},
    {'name': 'Bob', 'age': 25}
])

# Apply transformations
transformed = stream.map(lambda x: {
    **x,
    'age_group': 'adult' if x['age'] >= 18 else 'minor'
})

for person in transformed:
    print(person)
# Output: {'name': 'Alice', 'age': 30, 'age_group': 'adult'}
#         {'name': 'Bob', 'age': 25, 'age_group': 'adult'}
```

### How do I filter data in a stream?

```python
from dataflow import Stream

stream = Stream.from_list([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Filter even numbers
evens = stream.filter(lambda x: x % 2 == 0)

print(evens.to_list())  # Output: [2, 4, 6, 8, 10]
```

### How do I export stream data to different formats?

```python
from dataflow import Stream

stream = Stream.from_list([
    {'id': 1, 'value': 100},
    {'id': 2, 'value': 200}
])

# Export to JSON
stream.to_json('output.json', indent=2)

# Export to CSV
stream.to_csv('output.csv', headers=True)

# Export to parquet (requires pyarrow)
stream.to_parquet('output.parquet', compression='snappy')
```

## Error Handling

### How do I handle errors during stream processing?

```python
from dataflow import Stream, ProcessingError

# None cannot be multiplied, so the third item fails
stream = Stream.from_list([1, 2, None, 4, 5])

try:
    result = stream.map(lambda x: x * 2)
except ProcessingError as e:
    print(f"Error processing stream: {e}")
    print(f"Failed at item: {e.item}")
    print(f"Position: {e.position}")
```

### How do I skip invalid items in a stream?

```python
from dataflow import Stream

stream = Stream.from_list([1, 2, None, 4, 5])

# Skip invalid items with error handling
def safe_transform(x):
    try:
        return x * 2
    except TypeError:  # None * 2 raises TypeError
        return None

result = stream.map(safe_transform).filter(lambda x: x is not None)
print(result.to_list())  # Output: [2, 4, 8, 10]
```

## Advanced Usage

### How do I chain multiple transformations?

```python
from dataflow import Stream

result = (
    Stream.from_list([1, 2, 3, 4, 5])
    .filter(lambda x: x > 2)       # Keep values > 2
    .map(lambda x: x * 2)          # Double them
    .map(lambda x: {'value': x})   # Convert to dict
    .to_list()
)

print(result)
# Output: [{'value': 6}, {'value': 8}, {'value': 10}]
```

### How do I process streams in parallel?

```python
from dataflow import Stream, ParallelProcessor

def expensive_operation(x):
    # Placeholder for a slow, CPU- or I/O-heavy function
    return x ** 2

stream = Stream.from_list(range(1000))

# Process with 4 worker threads
processor = ParallelProcessor(workers=4)
result = processor.map(stream, expensive_operation)

for item in result:
    print(item)
```

### How do I aggregate data from a stream?

```python
from dataflow import Stream

stream = Stream.from_list([1, 2, 3, 4, 5])

# Calculate sum
total = stream.reduce(lambda acc, x: acc + x, initial=0)
print(f"Sum: {total}")  # Output: Sum: 15

# Calculate average
count = stream.count()
average = total / count
print(f"Average: {average}")  # Output: Average: 3.0
```

## Performance

### How do I optimize stream processing performance?

```python
from dataflow import BufferedStream

# large_data_source, transform_fn, and filter_fn are placeholders
# for your own data source and callables

# Use buffering for better performance
stream = BufferedStream.from_source(
    source=large_data_source,
    buffer_size=1000  # Process in chunks of 1000
)

# Enable lazy evaluation
result = stream.lazy().map(transform_fn).filter(filter_fn)

# Only evaluate when needed
final_data = result.to_list()
```

## Requirements

- Python 3.7 or higher
- Optional dependencies:
  - `pyarrow` for Parquet support
  - `pandas` for DataFrame integration

## Installation Options

Standard installation:
```bash
pip install dataflow
```

With optional dependencies:
```bash
pip install "dataflow[parquet]"  # For Parquet support
pip install "dataflow[pandas]"   # For pandas integration
pip install "dataflow[all]"      # All optional features
```

## License

MIT License - see LICENSE file for details
````

### Improvements in the AFTER version:
- ✅ Removed metadata badges and timestamps
- ✅ Question-driven headers throughout
- ✅ Complete code examples with imports and context
- ✅ Progressive complexity (basic → advanced)
- ✅ Error handling examples
- ✅ Multiple use cases demonstrated
- ✅ Concrete outputs shown in comments
- ✅ Installation options clearly explained
- ✅ Common questions answered with working code
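
The claim that every example is complete, with imports, can be spot-checked without installing the documented library. The sketch below is an illustration only (`check_snippets` is a hypothetical helper, not the skill's analysis script): it pulls each fenced Python block out of a README, parses it with the standard `ast` module, and flags blocks that fail to parse or contain no import statement. Parsing checks syntax only; it does not run the code.

```python
import ast
import re

FENCE = "`" * 3  # built at runtime so this example nests cleanly in Markdown
FENCE_RE = re.compile(
    r"^" + FENCE + r"python\n(.*?)^" + FENCE,
    re.DOTALL | re.MULTILINE,
)


def check_snippets(markdown: str) -> list:
    """Flag Python snippets that do not parse or that lack an import."""
    report = []
    for index, match in enumerate(FENCE_RE.finditer(markdown), start=1):
        try:
            tree = ast.parse(match.group(1))
        except SyntaxError as exc:
            report.append(f"snippet {index}: syntax error: {exc.msg}")
            continue
        has_import = any(
            isinstance(node, (ast.Import, ast.ImportFrom))
            for node in ast.walk(tree)
        )
        if not has_import:
            report.append(f"snippet {index}: no import statement")
    return report


readme = (
    f"{FENCE}python\nfrom dataflow import Stream\ns = Stream()\n{FENCE}\n\n"
    f"{FENCE}python\nprint(result.to_list())\n{FENCE}\n"
)
print(check_snippets(readme))  # -> ['snippet 2: no import statement']
```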

---

## C7Score Impact Estimate

### BEFORE Version Metrics:
- Question-Snippet Matching: ~40/100 (incomplete examples, poor alignment)
- LLM Evaluation: ~50/100 (vague descriptions)
- Formatting: ~70/100 (basic markdown, code blocks present)
- Metadata Removal: ~30/100 (badges and timestamps present)
- Initialization Examples: ~50/100 (incomplete quickstart)

**Estimated BEFORE c7score: ~45/100**

### AFTER Version Metrics:
- Question-Snippet Matching: ~90/100 (excellent Q&A alignment)
- LLM Evaluation: ~95/100 (comprehensive, clear)
- Formatting: ~95/100 (proper structure, complete blocks)
- Metadata Removal: ~100/100 (all noise removed)
- Initialization Examples: ~95/100 (complete, progressive)

**Estimated AFTER c7score: ~92/100**
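
For orientation, the sketch below combines the five per-metric estimates into one number. The weighting used by the benchmark is not stated in this document, so the function defaults to equal weights, which is purely an assumption; `overall_score` is a hypothetical helper. With equal weights the plain mean comes out at 48 and 95, slightly above the ~45 and ~92 quoted above, which suggests those estimates weight the metrics unequally.

```python
def overall_score(metrics, weights=None):
    """Weighted mean of per-metric scores.

    Equal weights are an assumption made for illustration; pass a
    weights dict keyed by metric name to use a different weighting.
    """
    if weights is None:
        weights = {name: 1.0 for name in metrics}
    total_weight = sum(weights[name] for name in metrics)
    return sum(metrics[name] * weights[name] for name in metrics) / total_weight


before = {
    "question_snippet_matching": 40,
    "llm_evaluation": 50,
    "formatting": 70,
    "metadata_removal": 30,
    "initialization_examples": 50,
}
after = {
    "question_snippet_matching": 90,
    "llm_evaluation": 95,
    "formatting": 95,
    "metadata_removal": 100,
    "initialization_examples": 95,
}

print(overall_score(before))  # -> 48.0
print(overall_score(after))   # -> 95.0
```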

---

## Key Transformation Patterns Used

1. **Question Headers**: "Installation" → "How do I install DataFlow?"
2. **Complete Examples**: Added imports, setup, and expected outputs
3. **Progressive Complexity**: Basic → Common → Advanced sections
4. **Error Scenarios**: Dedicated error handling examples
5. **Concrete Outputs**: Included actual output in code comments
6. **Noise Removal**: Stripped badges and timestamps
7. **Context Addition**: Every snippet is runnable as-is
8. **Multiple Paths**: Showed different ways to achieve goals
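
Of these patterns, noise removal (pattern 6) is the easiest to automate. A minimal sketch, assuming badges appear as lines made up only of Markdown images (optionally wrapped in links) and timestamps as lines beginning with "Last updated:"; `strip_noise` and both patterns are illustrative, and a standalone diagram image on its own line would be removed as well.

```python
import re

IMAGE = r"!\[[^\]]*\]\([^)]*\)"            # ![alt](src)
LINKED_IMAGE = rf"\[{IMAGE}\]\([^)]*\)"     # [![alt](src)](href)
BADGE_LINE = re.compile(rf"^\s*(?:(?:{LINKED_IMAGE}|{IMAGE})\s*)+$")
TIMESTAMP_LINE = re.compile(r"^\s*last updated:.*$", re.IGNORECASE)


def strip_noise(markdown: str) -> str:
    """Drop lines that hold only badge images or a 'Last updated' stamp."""
    kept = [
        line
        for line in markdown.splitlines()
        if not BADGE_LINE.match(line) and not TIMESTAMP_LINE.match(line)
    ]
    return "\n".join(kept)


readme = (
    "# DataFlow\n"
    "[![build](https://img.example/build.svg)](https://ci.example/dataflow)\n"
    "Last updated: March 15, 2024\n"
    "\n"
    "DataFlow is a Python library for processing data streams.\n"
)
print(strip_noise(readme))
```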

Use this example as a template for optimizing your own documentation!