Initial commit

This commit is contained in:
Zhongwei Li
2025-11-30 08:35:51 +08:00
commit d6a8021492
6 changed files with 740 additions and 0 deletions

View File

@@ -0,0 +1,164 @@
---
name: xml-element-extractor
description: Extract specific XML elements from source files using Python and optionally format with xmllint. This skill should be used when users need to isolate a single XML element (like InstrumentVector with Id="0") from a larger XML document, preserving the complete element structure including opening and closing tags.
---
# XML Element Extractor
## Overview
This skill enables precise extraction of XML elements from source files using Python's standard library (xml.etree.ElementTree), with optional xmllint formatting for clean output. Use this skill when you need to isolate a specific XML element from a larger document while maintaining its complete structure.
## ⚠️ Agent Guidelines
**DO NOT READ XML FILES DIRECTLY**
- **Never use Read tool on XML files** - Large files can exceed context limits
- **Never display XML content directly** - Always use the extraction script
- **Always use the Python extraction script** - Optimized for efficient XML processing
- **Process XML externally** - Let the script handle parsing, not the agent
This prevents context overflow and maintains performance.
**Key advantages of the Python implementation:**
- **Maximum compatibility**: Uses only Python standard library, no third-party dependencies
- **Robust parsing**: Proper XML parsing handles complex structures, special characters, and encoding
- **Attribute order insensitive**: Works regardless of attribute order in XML tags
- **Better error handling**: Provides clear error messages and graceful failure handling
- **Cross-platform**: Works consistently across different operating systems
## Quick Start
To extract an XML element:
1. Identify the source XML file path
2. Choose destination file path for the extracted element
3. Specify the exact opening tag (including attributes) of the element to extract
4. Execute the extraction script
5. Verify the output file contains the desired element
## Extraction Process
### Step 1: Prepare Input Parameters
Gather the following parameters:
- **Source XML file**: Path to the input XML file containing the element to extract
- **Destination XML file**: Path where the extracted element will be saved
- **Element tag**: Exact opening tag including all attributes (e.g., `<InstrumentVector Id="0">`)
### Step 2: Execute Extraction Script
Run the extraction script with the three parameters:
```bash
python3 scripts/extract_xml_element.py <source_file> <dest_file> <element_tag>
```
Example:
```bash
python3 scripts/extract_xml_element.py source.xml dest.xml '<InstrumentVector Id="0">'
```
**Note:** The script is implemented in Python but maintains the `.py` extension for compatibility with existing workflows. The shebang line ensures it executes with Python.
### Step 3: Verify Output
The script will:
- Extract the first matching XML element from the source file
- Save it to the destination file
- Optionally format the output with xmllint if available
- Report success or failure with appropriate error messages
## Error Handling
The script provides basic error handling for common issues:
- **Missing source file**: Displays error if source file doesn't exist
- **Unreadable source file**: Displays error if source file cannot be read
- **No matching element**: Displays error if the specified element tag is not found
- **Failed extraction**: Displays error if the extraction process fails
## XML Formatting
If xmllint is available on the system, the script will automatically format the extracted XML with:
- 2-space indentation
- Proper XML structure validation
- Clean, readable output format
If xmllint is not available, the extracted element will be saved in its original formatting.
## Usage Examples
### Basic Extraction
```bash
# Extract InstrumentVector with Id="0"
python3 scripts/extract_xml_element.py live_set.xml instrument_vector.xml '<InstrumentVector Id="0">'
# Extract MidiTrack element
python3 scripts/extract_xml_element.py tracks.xml midi_track.xml '<MidiTrack Id="42">'
# Extract complex elements with quotes and special characters
python3 scripts/extract_xml_element.py live_set.xml device.xml '<MxDeviceAudioEffect Id="5">'
```
### Error Cases
```bash
# This will fail - source file doesn't exist
python3 scripts/extract_xml_element.py missing.xml output.xml '<Element>'
# This will fail - no matching element found
python3 scripts/extract_xml_element.py source.xml output.xml '<NonExistentTag>'
# This will fail - malformed XML
python3 scripts/extract_xml_element.py malformed.xml output.xml '<Element>'
```
## Resources
### scripts/
**extract_xml_element.py**: Main Python script that performs XML element extraction using Python's standard library. The script:
- Takes three parameters: source file, destination file, and element tag
- Validates input parameters and file accessibility
- Uses xml.etree.ElementTree for robust XML parsing and element extraction
- Handles complex XML structures, special characters, and attribute order variations
- Optionally formats output with xmllint if available, with Python minidom fallback
- Provides clear error messages for common failure cases
- Cross-platform compatible (Windows, macOS, Linux)
### references/
**python_xml.md**: Reference documentation for XML element extraction process, including explanations of the Python parsing logic and troubleshooting guides for common extraction issues.
## Technical Details
### Python XML Processing
The script uses Python's standard library with the following key components:
**XML Parsing:**
- `xml.etree.ElementTree.parse()`: Parses XML files into structured element trees
- `tree.iter(tag_name)`: Iterates through all elements with matching tag names
- `ET.tostring()`: Converts elements back to XML strings
**Tag Matching Algorithm:**
- **Attribute order normalization**: Sorts attributes for consistent comparison
- **Regex-based tag parsing**: Extracts tag names and attributes using regular expressions
- **Exact matching**: Ensures precise element identification including all attributes
**Error Detection:**
- XML parsing validation using `ET.ParseError`
- File I/O error handling with proper exception catching
- Empty output file detection and cleanup
- Timeout handling for external tool calls
**Formatting Options:**
- **Primary**: xmllint with `XMLLINT_INDENT` environment variable for 2-space indentation
- **Fallback**: Python's `xml.dom.minidom` for cross-platform compatibility
### Cross-Platform Compatibility
The Python implementation provides consistent behavior across:
- **Windows**: Uses subprocess calls with proper path handling
- **macOS**: Native Unix-style subprocess execution
- **Linux**: Standard Python subprocess behavior
No platform-specific tools are required, making the skill truly portable.