# Example: Before and After C7Score Optimization

This example demonstrates the transformation of a typical README into an optimized version that scores higher on the c7score benchmark.

---

## BEFORE Optimization (Low c7score)
````markdown
# DataFlow



Last updated: March 15, 2024

DataFlow is a Python library for processing data streams.

## Features

- Stream processing
- Data transformation
- Multiple output formats
- Error handling

## Installation

```bash
pip install dataflow
```

## Usage

```python
from dataflow import Stream
s = Stream()
```

## API

See the API documentation for details.

## License

MIT
````
### Issues with the BEFORE version:

- ❌ Metadata badges and timestamps (reduce c7score)
- ❌ Vague descriptions without concrete examples
- ❌ Code snippets lack context and imports
- ❌ No question-driven structure
- ❌ Missing error handling examples
- ❌ No progressive complexity
- ❌ Features listed without demonstrations
- ❌ Incomplete quickstart example
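Metadata noise like the badges and timestamp above can be stripped mechanically before any deeper rewriting. A minimal sketch in Python; the patterns are illustrative assumptions, not c7score's actual detection rules:

```python
import re

# Lines that typically count as metadata noise: badge images
# (shields.io and similar) and freshness timestamps.
NOISE_PATTERNS = [
    re.compile(r"!\[[^\]]*\]\([^)]*(?:badge|shields\.io)[^)]*\)"),  # badge images
    re.compile(r"^Last updated:", re.IGNORECASE),                   # timestamps
]

def strip_metadata_noise(markdown: str) -> str:
    """Drop badge and timestamp lines, keep everything else verbatim."""
    kept = []
    for line in markdown.splitlines():
        if any(pattern.search(line) for pattern in NOISE_PATTERNS):
            continue
        kept.append(line)
    return "\n".join(kept)

readme = """# DataFlow


Last updated: March 15, 2024

DataFlow is a Python library for processing data streams."""

print(strip_metadata_noise(readme))
```

A real pass would also collapse the blank lines left behind; this sketch only shows the filtering step.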
---

## AFTER Optimization (High c7score)
````markdown
# DataFlow

A Python library for processing data streams with real-time transformations and multiple output formats.

## Quick Start

### How do I install DataFlow?

```bash
pip install dataflow
```

For development installation:
```bash
git clone https://github.com/example/dataflow.git
cd dataflow
pip install -e ".[dev]"
```

### How do I process my first data stream?

```python
from dataflow import Stream, Processor

# Create a stream and add data
stream = Stream()
stream.add_data([1, 2, 3, 4, 5])

# Process the stream
processor = Processor()
result = processor.transform(stream, lambda x: x * 2)

print(result.to_list())  # Output: [2, 4, 6, 8, 10]
```

### How do I handle different data sources?

```python
from dataflow import Stream, FileSource, APISource

# Load from a file
file_stream = Stream.from_source(
    FileSource('data.csv', format='csv')
)

# Load from an API
api_stream = Stream.from_source(
    APISource('https://api.example.com/data', auth_token='your_token')
)

# Process both streams
for item in file_stream:
    print(f"Processing: {item}")
```

## Common Use Cases

### How do I transform data in a stream?

```python
from dataflow import Stream, Transformer

stream = Stream.from_list([
    {'name': 'Alice', 'age': 30},
    {'name': 'Bob', 'age': 25}
])

# Apply transformations
transformed = stream.map(lambda x: {
    **x,
    'age_group': 'adult' if x['age'] >= 18 else 'minor'
})

for person in transformed:
    print(person)
# Output: {'name': 'Alice', 'age': 30, 'age_group': 'adult'}
#         {'name': 'Bob', 'age': 25, 'age_group': 'adult'}
```

### How do I filter data in a stream?

```python
from dataflow import Stream

stream = Stream.from_list([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Filter even numbers
evens = stream.filter(lambda x: x % 2 == 0)

print(evens.to_list())  # Output: [2, 4, 6, 8, 10]
```

### How do I export stream data to different formats?

```python
from dataflow import Stream

stream = Stream.from_list([
    {'id': 1, 'value': 100},
    {'id': 2, 'value': 200}
])

# Export to JSON
stream.to_json('output.json', indent=2)

# Export to CSV
stream.to_csv('output.csv', headers=True)

# Export to Parquet (requires pyarrow)
stream.to_parquet('output.parquet', compression='snappy')
```

## Error Handling

### How do I handle errors during stream processing?

```python
from dataflow import Stream, ProcessingError

stream = Stream.from_list([1, 2, None, 4, 5])  # None is an invalid record

try:
    result = stream.map(lambda x: x * 2)
except ProcessingError as e:
    print(f"Error processing stream: {e}")
    print(f"Failed at item: {e.item}")
    print(f"Position: {e.position}")
```

### How do I skip invalid items in a stream?

```python
from dataflow import Stream

stream = Stream.from_list([1, 2, None, 4, 5])  # None is an invalid record

# Skip invalid items with error handling
def safe_transform(x):
    try:
        return x * 2
    except TypeError:
        return None

result = stream.map(safe_transform).filter(lambda x: x is not None)
print(result.to_list())  # Output: [2, 4, 8, 10]
```

## Advanced Usage

### How do I chain multiple transformations?

```python
from dataflow import Stream

result = (
    Stream.from_list([1, 2, 3, 4, 5])
    .filter(lambda x: x > 2)      # Keep values > 2
    .map(lambda x: x * 2)         # Double them
    .map(lambda x: {'value': x})  # Convert to dict
    .to_list()
)

print(result)
# Output: [{'value': 6}, {'value': 8}, {'value': 10}]
```

### How do I process streams in parallel?

```python
from dataflow import Stream, ParallelProcessor

stream = Stream.from_list(range(1000))

# Process with 4 worker threads
processor = ParallelProcessor(workers=4)
result = processor.map(stream, lambda x: expensive_operation(x))

for item in result:
    print(item)
```

### How do I aggregate data from a stream?

```python
from dataflow import Stream

stream = Stream.from_list([1, 2, 3, 4, 5])

# Calculate sum
total = stream.reduce(lambda acc, x: acc + x, initial=0)
print(f"Sum: {total}")  # Output: Sum: 15

# Calculate average
count = stream.count()
average = total / count
print(f"Average: {average}")  # Output: Average: 3.0
```

## Performance

### How do I optimize stream processing performance?

```python
from dataflow import Stream, BufferedStream

# Use buffering for better performance
stream = BufferedStream.from_source(
    source=large_data_source,
    buffer_size=1000  # Process in chunks of 1000
)

# Enable lazy evaluation
result = stream.lazy().map(transform_fn).filter(filter_fn)

# Only evaluate when needed
final_data = result.to_list()
```

## Requirements

- Python 3.7 or higher
- Optional dependencies:
  - `pyarrow` for Parquet support
  - `pandas` for DataFrame integration

## Installation Options

Standard installation:
```bash
pip install dataflow
```

With optional dependencies:
```bash
pip install dataflow[parquet]  # For Parquet support
pip install dataflow[pandas]   # For pandas integration
pip install dataflow[all]      # All optional features
```

## License

MIT License - see LICENSE file for details
````
### Improvements in the AFTER version:

- ✅ Removed metadata badges and timestamps
- ✅ Question-driven headers throughout
- ✅ Complete code examples with imports and context
- ✅ Progressive complexity (basic → advanced)
- ✅ Error handling examples
- ✅ Multiple use cases demonstrated
- ✅ Concrete outputs shown in comments
- ✅ Installation options clearly explained
- ✅ Common questions answered with working code
---

## C7Score Impact Estimate
### BEFORE Version Metrics:

- Question-Snippet Matching: ~40/100 (incomplete examples, poor alignment)
- LLM Evaluation: ~50/100 (vague descriptions)
- Formatting: ~70/100 (basic markdown, code blocks present)
- Metadata Removal: ~30/100 (badges and timestamps present)
- Initialization Examples: ~50/100 (incomplete quickstart)

**Estimated BEFORE c7score: ~45/100**
### AFTER Version Metrics:

- Question-Snippet Matching: ~90/100 (excellent Q&A alignment)
- LLM Evaluation: ~95/100 (comprehensive, clear)
- Formatting: ~95/100 (proper structure, complete blocks)
- Metadata Removal: ~100/100 (all noise removed)
- Initialization Examples: ~95/100 (complete, progressive)

**Estimated AFTER c7score: ~92/100**
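Folding the sub-scores into a single number can be sketched as a weighted average. The equal weights below are an assumption, not c7score's published aggregation, which is why the sketch's results (48 and 95) only approximate the ~45 and ~92 estimates above:

```python
def estimate_c7score(metrics, weights=None):
    """Combine per-metric scores into one estimate.

    Equal weights are an assumption; the real c7score
    aggregation is not specified here.
    """
    if weights is None:
        weights = {name: 1.0 for name in metrics}
    total_weight = sum(weights[name] for name in metrics)
    return sum(metrics[name] * weights[name] for name in metrics) / total_weight

before = {
    "question_snippet_matching": 40,
    "llm_evaluation": 50,
    "formatting": 70,
    "metadata_removal": 30,
    "initialization_examples": 50,
}
after = {
    "question_snippet_matching": 90,
    "llm_evaluation": 95,
    "formatting": 95,
    "metadata_removal": 100,
    "initialization_examples": 95,
}

print(estimate_c7score(before))  # 48.0
print(estimate_c7score(after))   # 95.0
```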
---

## Key Transformation Patterns Used
1. **Question Headers**: "Installation" → "How do I install DataFlow?"
2. **Complete Examples**: Added imports, setup, and expected outputs
3. **Progressive Complexity**: Basic → Common → Advanced sections
4. **Error Scenarios**: Dedicated error handling examples
5. **Concrete Outputs**: Included actual output in code comments
6. **Noise Removal**: Stripped badges and timestamps
7. **Context Addition**: Every snippet is runnable as-is
8. **Multiple Paths**: Showed different ways to achieve goals
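Pattern 1 can be partly automated with a small rewrite pass. A minimal sketch; the header-to-question table is an illustrative assumption (a real tool would need a larger table or an LLM rewrite step):

```python
import re

# Illustrative mapping from noun-style headers to question form.
QUESTION_FORMS = {
    "Installation": "How do I install {project}?",
    "Usage": "How do I use {project}?",
    "Configuration": "How do I configure {project}?",
}

def questionify_headers(markdown: str, project: str) -> str:
    """Rewrite known section headers into question-driven form."""
    out = []
    for line in markdown.splitlines():
        match = re.match(r"^(#+)\s+(.*)$", line)
        if match and match.group(2) in QUESTION_FORMS:
            question = QUESTION_FORMS[match.group(2)].format(project=project)
            line = f"{match.group(1)} {question}"
        out.append(line)
    return "\n".join(out)

readme = "# DataFlow\n\n## Installation\n\n## Usage\n"
print(questionify_headers(readme, "DataFlow"))
```

Headers not in the table pass through unchanged, so the pass is safe to run on a whole README.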
Use this example as a template for optimizing your own documentation!