# Example: Before and After C7Score Optimization

This example demonstrates the transformation of a typical README into an optimized version that scores higher on the c7score benchmark.

---

## BEFORE Optimization (Low c7score)
````markdown
# DataFlow



Last updated: March 15, 2024

DataFlow is a Python library for processing data streams.

## Features

- Stream processing
- Data transformation
- Multiple output formats
- Error handling

## Installation

```bash
pip install dataflow
```

## Usage

```python
from dataflow import Stream
s = Stream()
```

## API

See the API documentation for details.

## License

MIT
````
### Issues with the BEFORE version:

- ❌ Metadata badges and timestamps (reduce c7score)
- ❌ Vague descriptions without concrete examples
- ❌ Code snippets lack context and imports
- ❌ No question-driven structure
- ❌ Missing error handling examples
- ❌ No progressive complexity
- ❌ Features listed without demonstrations
- ❌ Incomplete quickstart example
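Metadata noise like the badges and timestamp above can be stripped mechanically before any deeper rewriting. A minimal sketch in Python; the patterns are illustrative assumptions, not c7score's actual detection rules:

```python
import re

# Lines that typically count as metadata noise: badge images
# (shields.io and similar) and freshness timestamps.
NOISE_PATTERNS = [
    re.compile(r"!\[[^\]]*\]\([^)]*(?:badge|shields\.io)[^)]*\)"),  # badge images
    re.compile(r"^Last updated:", re.IGNORECASE),                   # timestamps
]

def strip_metadata_noise(markdown: str) -> str:
    """Drop badge and timestamp lines, keep everything else verbatim."""
    kept = []
    for line in markdown.splitlines():
        if any(pattern.search(line) for pattern in NOISE_PATTERNS):
            continue
        kept.append(line)
    return "\n".join(kept)

readme = """# DataFlow


Last updated: March 15, 2024

DataFlow is a Python library for processing data streams."""

print(strip_metadata_noise(readme))
```

A real pass would also collapse the blank lines left behind; this sketch only shows the filtering step.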
---

## AFTER Optimization (High c7score)
````markdown
# DataFlow

A Python library for processing data streams with real-time transformations and multiple output formats.

## Quick Start

### How do I install DataFlow?

```bash
pip install dataflow
```

For development installation:
```bash
git clone https://github.com/example/dataflow.git
cd dataflow
pip install -e ".[dev]"
```

### How do I process my first data stream?

```python
from dataflow import Stream, Processor

# Create a stream and add data
stream = Stream()
stream.add_data([1, 2, 3, 4, 5])

# Process the stream
processor = Processor()
result = processor.transform(stream, lambda x: x * 2)

print(result.to_list())  # Output: [2, 4, 6, 8, 10]
```

### How do I handle different data sources?

```python
from dataflow import Stream, FileSource, APISource

# Load from a file
file_stream = Stream.from_source(
    FileSource('data.csv', format='csv')
)

# Load from an API
api_stream = Stream.from_source(
    APISource('https://api.example.com/data', auth_token='your_token')
)

# Process both streams
for item in file_stream:
    print(f"Processing: {item}")
```

## Common Use Cases

### How do I transform data in a stream?

```python
from dataflow import Stream, Transformer

stream = Stream.from_list([
    {'name': 'Alice', 'age': 30},
    {'name': 'Bob', 'age': 25}
])

# Apply transformations
transformed = stream.map(lambda x: {
    **x,
    'age_group': 'adult' if x['age'] >= 18 else 'minor'
})

for person in transformed:
    print(person)
# Output: {'name': 'Alice', 'age': 30, 'age_group': 'adult'}
#         {'name': 'Bob', 'age': 25, 'age_group': 'adult'}
```

### How do I filter data in a stream?

```python
from dataflow import Stream

stream = Stream.from_list([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Filter even numbers
evens = stream.filter(lambda x: x % 2 == 0)

print(evens.to_list())  # Output: [2, 4, 6, 8, 10]
```

### How do I export stream data to different formats?

```python
from dataflow import Stream

stream = Stream.from_list([
    {'id': 1, 'value': 100},
    {'id': 2, 'value': 200}
])

# Export to JSON
stream.to_json('output.json', indent=2)

# Export to CSV
stream.to_csv('output.csv', headers=True)

# Export to Parquet (requires pyarrow)
stream.to_parquet('output.parquet', compression='snappy')
```

## Error Handling

### How do I handle errors during stream processing?

```python
from dataflow import Stream, ProcessingError

stream = Stream.from_list([1, 2, None, 4, 5])  # None is an invalid record

try:
    result = stream.map(lambda x: x * 2)
except ProcessingError as e:
    print(f"Error processing stream: {e}")
    print(f"Failed at item: {e.item}")
    print(f"Position: {e.position}")
```

### How do I skip invalid items in a stream?

```python
from dataflow import Stream

stream = Stream.from_list([1, 2, None, 4, 5])  # None is an invalid record

# Skip invalid items with error handling
def safe_transform(x):
    try:
        return x * 2
    except TypeError:
        return None

result = stream.map(safe_transform).filter(lambda x: x is not None)
print(result.to_list())  # Output: [2, 4, 8, 10]
```

## Advanced Usage

### How do I chain multiple transformations?

```python
from dataflow import Stream

result = (
    Stream.from_list([1, 2, 3, 4, 5])
    .filter(lambda x: x > 2)      # Keep values > 2
    .map(lambda x: x * 2)         # Double them
    .map(lambda x: {'value': x})  # Convert to dict
    .to_list()
)

print(result)
# Output: [{'value': 6}, {'value': 8}, {'value': 10}]
```

### How do I process streams in parallel?

```python
from dataflow import Stream, ParallelProcessor

stream = Stream.from_list(range(1000))

# Process with 4 worker threads
processor = ParallelProcessor(workers=4)
result = processor.map(stream, lambda x: expensive_operation(x))

for item in result:
    print(item)
```

### How do I aggregate data from a stream?

```python
from dataflow import Stream

stream = Stream.from_list([1, 2, 3, 4, 5])

# Calculate sum
total = stream.reduce(lambda acc, x: acc + x, initial=0)
print(f"Sum: {total}")  # Output: Sum: 15

# Calculate average
count = stream.count()
average = total / count
print(f"Average: {average}")  # Output: Average: 3.0
```

## Performance

### How do I optimize stream processing performance?

```python
from dataflow import Stream, BufferedStream

# Use buffering for better performance
stream = BufferedStream.from_source(
    source=large_data_source,
    buffer_size=1000  # Process in chunks of 1000
)

# Enable lazy evaluation
result = stream.lazy().map(transform_fn).filter(filter_fn)

# Only evaluate when needed
final_data = result.to_list()
```

## Requirements

- Python 3.7 or higher
- Optional dependencies:
  - `pyarrow` for Parquet support
  - `pandas` for DataFrame integration

## Installation Options

Standard installation:
```bash
pip install dataflow
```

With optional dependencies:
```bash
pip install dataflow[parquet]  # For Parquet support
pip install dataflow[pandas]   # For pandas integration
pip install dataflow[all]      # All optional features
```

## License

MIT License - see LICENSE file for details
````
### Improvements in the AFTER version:

- ✅ Removed metadata badges and timestamps
- ✅ Question-driven headers throughout
- ✅ Complete code examples with imports and context
- ✅ Progressive complexity (basic → advanced)
- ✅ Error handling examples
- ✅ Multiple use cases demonstrated
- ✅ Concrete outputs shown in comments
- ✅ Installation options clearly explained
- ✅ Common questions answered with working code
---

## C7Score Impact Estimate
### BEFORE Version Metrics:

- Question-Snippet Matching: ~40/100 (incomplete examples, poor alignment)
- LLM Evaluation: ~50/100 (vague descriptions)
- Formatting: ~70/100 (basic markdown, code blocks present)
- Metadata Removal: ~30/100 (badges and timestamps present)
- Initialization Examples: ~50/100 (incomplete quickstart)

**Estimated BEFORE c7score: ~45/100**
### AFTER Version Metrics:

- Question-Snippet Matching: ~90/100 (excellent Q&A alignment)
- LLM Evaluation: ~95/100 (comprehensive, clear)
- Formatting: ~95/100 (proper structure, complete blocks)
- Metadata Removal: ~100/100 (all noise removed)
- Initialization Examples: ~95/100 (complete, progressive)

**Estimated AFTER c7score: ~92/100**
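Folding the sub-scores into a single number can be sketched as a weighted average. The equal weights below are an assumption, not c7score's published aggregation, which is why the sketch's results (48 and 95) only approximate the ~45 and ~92 estimates above:

```python
def estimate_c7score(metrics, weights=None):
    """Combine per-metric scores into one estimate.

    Equal weights are an assumption; the real c7score
    aggregation is not specified here.
    """
    if weights is None:
        weights = {name: 1.0 for name in metrics}
    total_weight = sum(weights[name] for name in metrics)
    return sum(metrics[name] * weights[name] for name in metrics) / total_weight

before = {
    "question_snippet_matching": 40,
    "llm_evaluation": 50,
    "formatting": 70,
    "metadata_removal": 30,
    "initialization_examples": 50,
}
after = {
    "question_snippet_matching": 90,
    "llm_evaluation": 95,
    "formatting": 95,
    "metadata_removal": 100,
    "initialization_examples": 95,
}

print(estimate_c7score(before))  # 48.0
print(estimate_c7score(after))   # 95.0
```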
---

## Key Transformation Patterns Used
1. **Question Headers**: "Installation" → "How do I install DataFlow?"
2. **Complete Examples**: Added imports, setup, and expected outputs
3. **Progressive Complexity**: Basic → Common → Advanced sections
4. **Error Scenarios**: Dedicated error handling examples
5. **Concrete Outputs**: Included actual output in code comments
6. **Noise Removal**: Stripped badges and timestamps
7. **Context Addition**: Every snippet is runnable as-is
8. **Multiple Paths**: Showed different ways to achieve goals
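Pattern 1 can be partly automated with a small rewrite pass. A minimal sketch; the header-to-question table is an illustrative assumption (a real tool would need a larger table or an LLM rewrite step):

```python
import re

# Illustrative mapping from noun-style headers to question form.
QUESTION_FORMS = {
    "Installation": "How do I install {project}?",
    "Usage": "How do I use {project}?",
    "Configuration": "How do I configure {project}?",
}

def questionify_headers(markdown: str, project: str) -> str:
    """Rewrite known section headers into question-driven form."""
    out = []
    for line in markdown.splitlines():
        match = re.match(r"^(#+)\s+(.*)$", line)
        if match and match.group(2) in QUESTION_FORMS:
            question = QUESTION_FORMS[match.group(2)].format(project=project)
            line = f"{match.group(1)} {question}"
        out.append(line)
    return "\n".join(out)

readme = "# DataFlow\n\n## Installation\n\n## Usage\n"
print(questionify_headers(readme, "DataFlow"))
```

Headers not in the table pass through unchanged, so the pass is safe to run on a whole README.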
Use this example as a template for optimizing your own documentation!