--- name: xml-element-extractor description: Extract specific XML elements from source files using Python and optionally format with xmllint. This skill should be used when users need to isolate a single XML element (like InstrumentVector with Id="0") from a larger XML document, preserving the complete element structure including opening and closing tags. --- # XML Element Extractor ## Overview This skill enables precise extraction of XML elements from source files using Python's standard library (xml.etree.ElementTree), with optional xmllint formatting for clean output. Use this skill when you need to isolate a specific XML element from a larger document while maintaining its complete structure. ## ⚠️ Agent Guidelines **DO NOT READ XML FILES DIRECTLY** - **Never use Read tool on XML files** - Large files can exceed context limits - **Never display XML content directly** - Always use the extraction script - **Always use the Python extraction script** - Optimized for efficient XML processing - **Process XML externally** - Let the script handle parsing, not the agent This prevents context overflow and maintains performance. **Key advantages of the Python implementation:** - **Maximum compatibility**: Uses only Python standard library, no third-party dependencies - **Robust parsing**: Proper XML parsing handles complex structures, special characters, and encoding - **Attribute order insensitive**: Works regardless of attribute order in XML tags - **Better error handling**: Provides clear error messages and graceful failure handling - **Cross-platform**: Works consistently across different operating systems ## Quick Start To extract an XML element: 1. Identify the source XML file path 2. Choose destination file path for the extracted element 3. Specify the exact opening tag (including attributes) of the element to extract 4. Execute the extraction script 5. Verify the output file contains the desired element ## Extraction Process ### Step 1: Prepare Input Parameters Gather the following parameters: - **Source XML file**: Path to the input XML file containing the element to extract - **Destination XML file**: Path where the extracted element will be saved - **Element tag**: Exact opening tag including all attributes (e.g., ``) ### Step 2: Execute Extraction Script Run the extraction script with the three parameters: ```bash python3 scripts/extract_xml_element.py ``` Example: ```bash python3 scripts/extract_xml_element.py source.xml dest.xml '' ``` **Note:** The script is implemented in Python but maintains the `.py` extension for compatibility with existing workflows. The shebang line ensures it executes with Python. ### Step 3: Verify Output The script will: - Extract the first matching XML element from the source file - Save it to the destination file - Optionally format the output with xmllint if available - Report success or failure with appropriate error messages ## Error Handling The script provides basic error handling for common issues: - **Missing source file**: Displays error if source file doesn't exist - **Unreadable source file**: Displays error if source file cannot be read - **No matching element**: Displays error if the specified element tag is not found - **Failed extraction**: Displays error if the extraction process fails ## XML Formatting If xmllint is available on the system, the script will automatically format the extracted XML with: - 2-space indentation - Proper XML structure validation - Clean, readable output format If xmllint is not available, the extracted element will be saved in its original formatting. ## Usage Examples ### Basic Extraction ```bash # Extract InstrumentVector with Id="0" python3 scripts/extract_xml_element.py live_set.xml instrument_vector.xml '' # Extract MidiTrack element python3 scripts/extract_xml_element.py tracks.xml midi_track.xml '' # Extract complex elements with quotes and special characters python3 scripts/extract_xml_element.py live_set.xml device.xml '' ``` ### Error Cases ```bash # This will fail - source file doesn't exist python3 scripts/extract_xml_element.py missing.xml output.xml '' # This will fail - no matching element found python3 scripts/extract_xml_element.py source.xml output.xml '' # This will fail - malformed XML python3 scripts/extract_xml_element.py malformed.xml output.xml '' ``` ## Resources ### scripts/ **extract_xml_element.py**: Main Python script that performs XML element extraction using Python's standard library. The script: - Takes three parameters: source file, destination file, and element tag - Validates input parameters and file accessibility - Uses xml.etree.ElementTree for robust XML parsing and element extraction - Handles complex XML structures, special characters, and attribute order variations - Optionally formats output with xmllint if available, with Python minidom fallback - Provides clear error messages for common failure cases - Cross-platform compatible (Windows, macOS, Linux) ### references/ **python_xml.md**: Reference documentation for XML element extraction process, including explanations of the Python parsing logic and troubleshooting guides for common extraction issues. ## Technical Details ### Python XML Processing The script uses Python's standard library with the following key components: **XML Parsing:** - `xml.etree.ElementTree.parse()`: Parses XML files into structured element trees - `tree.iter(tag_name)`: Iterates through all elements with matching tag names - `ET.tostring()`: Converts elements back to XML strings **Tag Matching Algorithm:** - **Attribute order normalization**: Sorts attributes for consistent comparison - **Regex-based tag parsing**: Extracts tag names and attributes using regular expressions - **Exact matching**: Ensures precise element identification including all attributes **Error Detection:** - XML parsing validation using `ET.ParseError` - File I/O error handling with proper exception catching - Empty output file detection and cleanup - Timeout handling for external tool calls **Formatting Options:** - **Primary**: xmllint with `XMLLINT_INDENT` environment variable for 2-space indentation - **Fallback**: Python's `xml.dom.minidom` for cross-platform compatibility ### Cross-Platform Compatibility The Python implementation provides consistent behavior across: - **Windows**: Uses subprocess calls with proper path handling - **macOS**: Native Unix-style subprocess execution - **Linux**: Standard Python subprocess behavior No platform-specific tools are required, making the skill truly portable.