Initial commit
18
.claude-plugin/plugin.json
Normal file
@@ -0,0 +1,18 @@
{
  "name": "breenix-development",
  "description": "Core kernel development: fast debug loops, log analysis, systematic debugging, code quality checks, memory debugging, boot analysis, and legacy migration",
  "version": "0.0.0-2025.11.28",
  "author": {
    "name": "Ryan Breen",
    "email": "ryan@breen.com"
  },
  "skills": [
    "./skills/breenix-kernel-debug-loop",
    "./skills/breenix-log-analysis",
    "./skills/breenix-systematic-debugging",
    "./skills/breenix-code-quality-check",
    "./skills/breenix-memory-debugging",
    "./skills/breenix-boot-analysis",
    "./skills/breenix-legacy-migration"
  ]
}
3
README.md
Normal file
@@ -0,0 +1,3 @@
# breenix-development

Core kernel development: fast debug loops, log analysis, systematic debugging, code quality checks, memory debugging, boot analysis, and legacy migration
72
plugin.lock.json
Normal file
@@ -0,0 +1,72 @@
{
  "$schema": "internal://schemas/plugin.lock.v1.json",
  "pluginId": "gh:ryanbreen/breenix:breenix-development",
  "normalized": {
    "repo": null,
    "ref": "refs/tags/v20251128.0",
    "commit": "1774fe2fa832b81198a10163252b61a2f786a03d",
    "treeHash": "53708199f6e8b8d96c27c695d5938812dc5ef908cbe1c63b00abd89e3327051d",
    "generatedAt": "2025-11-28T10:28:06.263570Z",
    "toolVersion": "publish_plugins.py@0.2.0"
  },
  "origin": {
    "remote": "git@github.com:zhongweili/42plugin-data.git",
    "branch": "master",
    "commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
    "repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
  },
  "manifest": {
    "name": "breenix-development",
    "description": "Core kernel development: fast debug loops, log analysis, systematic debugging, code quality checks, memory debugging, boot analysis, and legacy migration"
  },
  "content": {
    "files": [
      {
        "path": "README.md",
        "sha256": "fd814454a9d8445f54cce46434285199b3e4065e3e45c932c038e8ac8fc9e3e0"
      },
      {
        "path": ".claude-plugin/plugin.json",
        "sha256": "2475f2943eeb402024b90730c34593f12587c46106bf86de292f90449bcd2149"
      },
      {
        "path": "skills/breenix-systematic-debugging/SKILL.md",
        "sha256": "37e1669de393c5ad61be59df5d86cbbbd42b7f383544ee648861b911157cb9ee"
      },
      {
        "path": "skills/breenix-boot-analysis/SKILL.md",
        "sha256": "4acaf29352fe284bf7a2199b2fe20ac81ecad23efa1fed451533159e78d87283"
      },
      {
        "path": "skills/breenix-legacy-migration/SKILL.md",
        "sha256": "007b9836ebf39e5545b532a5aa4a0b2e51fd2b6c2402be1233935f7a69251b81"
      },
      {
        "path": "skills/breenix-code-quality-check/SKILL.md",
        "sha256": "451d904ee65531e188d8771082c64796601e8f6366261f116ee70eb43df20ffb"
      },
      {
        "path": "skills/breenix-memory-debugging/SKILL.md",
        "sha256": "05074786424014efcbd4f4ff2e5ed43dccd57d786667d6841de08ba530f7bd92"
      },
      {
        "path": "skills/breenix-kernel-debug-loop/SKILL.md",
        "sha256": "b0f7e323c15fed53778e159f2d177e0570f9e41b781c5bd1bcc97541a28101b9"
      },
      {
        "path": "skills/breenix-kernel-debug-loop/scripts/quick_debug.py",
        "sha256": "11122a8cc2ea1d4eeca89e3d6c5f89feef07933db08c54e1383c2b8753a7a507"
      },
      {
        "path": "skills/breenix-log-analysis/SKILL.md",
        "sha256": "2a0acdab042e81091c370be19e642fb18332857fbb3c52b837f2b5d7526f9781"
      }
    ],
    "dirSha256": "53708199f6e8b8d96c27c695d5938812dc5ef908cbe1c63b00abd89e3327051d"
  },
  "security": {
    "scannedAt": null,
    "scannerVersion": null,
    "flags": []
  }
}
622
skills/breenix-boot-analysis/SKILL.md
Normal file
@@ -0,0 +1,622 @@
---
name: boot-analysis
description: This skill should be used when analyzing the Breenix kernel boot sequence, verifying initialization order, timing boot stages, identifying boot failures, optimizing boot time, or understanding the boot process from bootloader handoff to kernel ready state.
---

# Boot Sequence Analysis for Breenix

Analyze and optimize the kernel boot process from bootloader to kernel ready.

## Purpose

Understanding the boot sequence is critical for debugging initialization issues, optimizing boot time, and ensuring proper subsystem ordering. This skill provides tools for analyzing boot logs, verifying checkpoint progression, and identifying boot failures.

## When to Use

- **Boot failures**: Kernel hangs or crashes during initialization
- **Initialization order issues**: Subsystems initialized in wrong order
- **Boot time optimization**: Reducing time from bootloader to ready
- **Checkpoint verification**: Confirming all subsystems initialize correctly
- **Boot regression analysis**: New code breaks boot sequence
- **Understanding boot flow**: Learning how kernel initialization works

## Breenix Boot Sequence

### Phase 1: Bootloader Handoff

**What happens**:
- Bootloader (bootloader crate) loads kernel
- Sets up initial page tables
- Provides memory map
- Transfers control to kernel entry point

**Entry point**: `kernel/src/main.rs` `kernel_main()`

**Initial state**:
```rust
// CPU in Long Mode (64-bit)
// Interrupts disabled
// Paging enabled (bootloader setup)
// Stack ready
// Physical memory mapped at offset
```

**Typical log output**:
```
[Bootloader messages]
Loading kernel...
Jumping to kernel entry point...
```

### Phase 2: Early Initialization

**Subsystems initialized** (in order):

**1. Logger**
```
[ INFO] Breenix OS starting...
```
- Serial output configured
- Framebuffer initialized
- Log level set

**2. GDT (Global Descriptor Table)**
```
[ INFO] GDT initialized
```
- Kernel/user code segments
- Kernel/user data segments
- TSS (Task State Segment)

**3. IDT (Interrupt Descriptor Table)**
```
[ INFO] IDT initialized
```
- Exception handlers (divide by zero, page fault, etc.)
- Interrupt handlers (timer, keyboard)
- Double fault handler with IST stack

**4. PIC (Programmable Interrupt Controller)**
```
[ INFO] PIC initialized
```
- Remapped to avoid conflicts
- All interrupts masked initially

### Phase 3: Memory Subsystem

**5. Frame Allocator**
```
[ INFO] Physical memory: 94 MiB usable
[DEBUG] Frame allocator initialized
```
- Reads bootloader memory map
- Identifies usable regions
- Initializes frame tracking

**6. Heap Allocator**
```
[ INFO] Heap: 1024 KiB
```
- Sets up kernel heap
- Enables dynamic allocation
- `#[global_allocator]` now functional

**7. Virtual Memory**
```
[DEBUG] Page table initialized
```
- Kernel page table setup
- Higher-half kernel mapping
- Recursive mapping if used

**8. Kernel Stacks**
```
[DEBUG] Kernel stack allocator initialized
```
- Stack bitmap allocator
- Guard pages configured
- IST stacks for exceptions

### Phase 4: Device Drivers

**9. Timer (PIT)**
```
[ INFO] Timer initialized at 100 Hz
```
- Configures Programmable Interval Timer
- Sets interrupt frequency
- Starts tick counting

**10. RTC (Real-Time Clock)**
```
[ INFO] RTC initialized: 2025-10-23 12:34:56 UTC
```
- Reads hardware clock
- Caches boot time
- Enables wall-clock time APIs

**11. Serial Input**
```
[ INFO] Serial input interrupts enabled
```
- UART receive interrupts
- Input buffer ready
- Command processing available

**12. Keyboard**
```
[ INFO] Keyboard initialized
```
- PS/2 keyboard driver
- Scancode processing
- Key event generation

### Phase 5: System Infrastructure

**13. Interrupts Enabled**
```
[ INFO] Enabling interrupts...
```
- Unmasks timer interrupt
- Unmasks keyboard interrupt
- System becomes responsive

**14. System Calls**
```
[ INFO] System call infrastructure initialized
```
- INT 0x80 handler registered
- Syscall dispatcher ready
- SWAPGS configured

**15. Threading**
```
[ INFO] Threading subsystem initialized
```
- Scheduler initialized
- Idle thread created
- Context switch infrastructure ready

**16. Process Management**
```
[ INFO] Process management initialized
```
- Process manager ready
- PID allocation working
- Fork/exec infrastructure initialized

### Phase 6: Testing (if enabled)

**17. POST (Power-On Self Test)**
```
[ INFO] Running POST tests...
=== Memory Test ===
✅ MEMORY TEST COMPLETE
...
🎯 KERNEL_POST_TESTS_COMPLETE 🎯
```
- Validates subsystems
- Runs self-checks
- Confirms kernel health

**18. Userspace Tests (if configured)**
```
RING3_SMOKE: creating hello_time userspace process
[ INFO] Process created: PID 1
USERSPACE OUTPUT: Hello from userspace!
```
- Creates test processes
- Verifies userspace execution
- Tests system calls

### Phase 7: Kernel Ready

**Final state**:
```
[ INFO] Kernel initialization complete
[ INFO] System ready
```
- All subsystems operational
- Ready for interactive use or more tests
- Idle loop or wait for input

## Boot Analysis Techniques

### Technique 1: Extract Boot Timeline

**Using log-analysis skill:**

```bash
# Get all initialization messages in order
grep "initialized\|INITIALIZED\|Initializing" logs/breenix_20251023_*.log

# Or more comprehensive
grep -E "INFO|WARN|ERROR" logs/latest.log | less
```

**Expected sequence**:
1. GDT initialized
2. IDT initialized
3. PIC initialized
4. Physical memory info
5. Timer initialized
6. RTC initialized
7. Interrupts enabled
8. Threading initialized
9. Process management initialized
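
The expected sequence above can also be checked mechanically. The following is a minimal sketch, not part of the Breenix tooling: `check_boot_order` is a hypothetical helper, and the marker strings passed to it should be copied from real log lines (the ones used in this document's samples may differ from actual kernel output).

```bash
# check_boot_order LOGFILE MARKER...
# Hypothetical helper: verifies that each MARKER appears in LOGFILE and that
# the first occurrences come in the order given. Prints one status line per
# marker; returns 1 at the first missing or out-of-order marker.
check_boot_order() {
  log=$1
  shift
  prev=0
  for marker in "$@"; do
    # Line number of the first occurrence (empty if the marker is absent).
    line=$(grep -n -m1 -F -- "$marker" "$log" | cut -d: -f1)
    if [ -z "$line" ]; then
      echo "MISSING: $marker"
      return 1
    fi
    if [ "$line" -le "$prev" ]; then
      echo "OUT OF ORDER: $marker (line $line)"
      return 1
    fi
    echo "ok: $marker (line $line)"
    prev=$line
  done
}
```

A call such as `check_boot_order logs/latest.log "GDT initialized" "IDT initialized" "PIC initialized"` would stop at the first checkpoint that is absent or misplaced, which is usually the place to start debugging.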

### Technique 2: Find Boot Checkpoint Failures

**Identify last successful checkpoint:**

```bash
# Find last "initialized" message
grep "initialized" logs/breenix_*.log | tail -10

# Or find last successful operation
grep "SUCCESS\|✅\|complete" logs/breenix_*.log | tail -10
```

**If boot hangs**:
- Last checkpoint shows how far boot progressed
- Next subsystem is where hang occurs
- Focus debugging on that subsystem

### Technique 3: Compare Boot Sequences

**Working vs broken boot:**

```bash
# Extract initialization sequence
grep "initialized\|Initializing" working.log > working_boot.txt
grep "initialized\|Initializing" broken.log > broken_boot.txt

# Compare
diff -u working_boot.txt broken_boot.txt
```

**Look for**:
- Missing initialization steps
- Different initialization order
- New error messages
- Stops at different point

### Technique 4: Time Boot Stages

**Add timing checkpoints:**

```rust
let start = kernel::time::get_monotonic_ms();

// Initialize subsystem
gdt::init();

let elapsed = kernel::time::get_monotonic_ms() - start;
log::info!("GDT initialization took {}ms", elapsed);
```

**Analyze timing:**
- Which stages are slow?
- Where can we optimize?
- Any unexpected delays?
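
Once several subsystems log their duration, the slow stages can be ranked directly from the log. This sketch assumes every timing line follows the `<stage> took <N>ms` wording of the `log::info!` call above, optionally preceded by a bracketed level tag such as `[ INFO]`; `slowest_stages` is a hypothetical helper name.

```bash
# slowest_stages LOGFILE
# Hypothetical helper: prints "<ms> <stage>" for every "<stage> took <N>ms"
# line in LOGFILE, slowest stage first.
slowest_stages() {
  awk '{
    if (match($0, / took [0-9]+ms/)) {
      # Text before " took " is the stage name, minus any "[LEVEL] " tag.
      stage = substr($0, 1, RSTART - 1)
      sub(/^\[[^]]*] */, "", stage)
      # Skip the 6 characters of " took " and drop the trailing "ms".
      ms = substr($0, RSTART + 6, RLENGTH - 8)
      print ms, stage
    }
  }' "$1" | sort -rn
}
```

Running `slowest_stages logs/latest.log | head -n 5` would list the five most expensive timed stages, which gives a starting point for the questions above.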

### Technique 5: Verify Subsystem Dependencies

**Check initialization order:**

```rust
// Memory must be initialized before heap
assert!(frame_allocator.is_initialized());
heap::init();  // Safe now

// GDT must be before IDT
gdt::init();
idt::init();  // Can reference GDT segments

// Interrupts must be off during sensitive operations
assert!(!x86_64::instructions::interrupts::are_enabled());
```

## Common Boot Issues

### Issue 1: Boot Hang

**Symptoms**:
- Kernel boots partway then stops
- No error message, just hangs
- Some subsystems initialized, others not

**Diagnosis**:
```bash
# Find last successful operation
grep "initialized\|complete" logs/latest.log | tail -1

# Check if interrupts were enabled prematurely
grep "Enabling interrupts" logs/latest.log

# Look for warnings or errors logged before the hang
grep "WARN\|ERROR" logs/latest.log
```

**Common causes**:
1. **Interrupts enabled too early**
   - Timer interrupt fires before handler ready
   - Solution: Ensure all handlers registered before enabling

2. **Deadlock during initialization**
   - Lock acquired, never released
   - Solution: Check lock usage during boot

3. **Infinite loop in subsystem init**
   - Waiting for condition that never happens
   - Solution: Add timeouts or debug why condition fails

**Fix patterns**:
```rust
// Add checkpoint logging
log::info!("About to initialize subsystem X");
subsystem_x::init();
log::info!("Subsystem X initialized successfully");

// If hangs between checkpoints, focus on subsystem_x::init()
```

### Issue 2: Boot Panic

**Symptoms**:
```
PANIC: [message]
Stack trace: ...
```

**Diagnosis**:
```bash
# Get panic message
grep "PANIC" logs/latest.log

# Get context
grep -B20 "PANIC" logs/latest.log
```

**Common causes**:
1. **Assertion failure**
   ```rust
   assert!(condition);  // Failed during boot
   ```
   Check if assertion is correct or if precondition not met

2. **Unwrap on None/Err**
   ```rust
   let value = option.unwrap();  // Panic if None
   ```
   Use proper error handling during boot

3. **Out of memory**
   ```
   allocation error: Layout { ... }
   ```
   Increase heap size or defer allocation

### Issue 3: Wrong Initialization Order

**Symptoms**:
- Later subsystem fails
- "Not initialized" error
- Double fault or page fault

**Example**:
```rust
// BAD - heap used before initialization
use alloc::vec::Vec;
// Allocation fails: heap not initialized yet
// (note that `Vec::new()` alone does not allocate)
let v: Vec<u8> = Vec::with_capacity(16);
heap::init();

// GOOD
heap::init();
use alloc::vec::Vec;
let v: Vec<u8> = Vec::with_capacity(16);  // OK
```

**Diagnosis**:
- Check dependency chain
- Verify order matches requirements
- Look for use-before-init patterns

### Issue 4: Boot Regression

**Symptoms**:
- Kernel booted before, now doesn't
- Recent change broke boot
- Used to reach checkpoint X, now stops at Y

**Diagnosis**:
```bash
# Find which commit broke it
git bisect start
git bisect bad HEAD
git bisect good last_working_commit

# Test each commit, then mark it with `git bisect good` or `git bisect bad`
kernel-debug-loop/scripts/quick_debug.py \
  --signal "KERNEL READY" \
  --timeout 15
```
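
The manual loop above can be handed to `git bisect run`, which marks each commit from a command's exit status: 0 means good, 125 means skip, and any other status from 1 to 127 means bad. The sketch below wraps that contract in a hypothetical `bisect_step` helper. It assumes that `quick_debug.py` exits with status 0 when the signal is found and non-zero on timeout, which this document does not confirm; the build command is likewise left to the caller.

```bash
# bisect_step BUILD_CMD TEST_CMD
# Hypothetical helper for use with `git bisect run`.
# Returns 125 (skip) if the build fails, 1 (bad) if the test fails,
# and 0 (good) otherwise.
bisect_step() {
  # A commit that does not build says nothing about the regression: skip it.
  sh -c "$1" >/dev/null 2>&1 || return 125
  # Assumption: TEST_CMD exits 0 only when the boot signal was seen.
  sh -c "$2" >/dev/null 2>&1 || return 1
  return 0
}
```

Placed in a script whose last command is a `bisect_step` call with the project's build command and the `quick_debug.py` invocation shown above, it could be driven with `git bisect run ./that-script` after the `git bisect start`, `bad`, and `good` commands.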

**Fix**:
- Identify the breaking commit
- Understand what it changed
- Fix or revert

## Boot Optimization

### Current Boot Time

**Measure with kernel-debug-loop:**

```bash
# Time to specific checkpoint
kernel-debug-loop/scripts/quick_debug.py \
  --signal "Kernel initialization complete" \
  --timeout 30
```

**Typical times** (approximate):
- Early init (GDT, IDT, PIC): ~10ms
- Memory subsystem: ~50ms
- Drivers (timer, keyboard): ~20ms
- Threading/processes: ~10ms
- POST tests: ~100ms (if enabled)
- Total boot: ~200-500ms to ready state

### Optimization Strategies

**1. Defer non-critical initialization**
```rust
// Don't initialize during boot if not needed
// keyboard::init();  // Defer until first use

// Or lazy initialization (sketch: assumes a `Once` type such as
// `spin::Once` and a `KEYBOARD` static defined elsewhere)
pub fn get_keyboard() -> &'static Keyboard {
    static INIT: Once = Once::new();
    INIT.call_once(|| {
        keyboard::init();
    });
    &KEYBOARD
}
```

**2. Parallelize independent operations**
```rust
// Currently serial:
timer::init();
rtc::init();
keyboard::init();

// Could be parallel (if truly independent):
// Note: Difficult in kernel without threading during boot
```

**3. Reduce logging verbosity**
```rust
// Debug builds: verbose
#[cfg(debug_assertions)]
log::debug!("Detailed info");

// Release builds: minimal
#[cfg(not(debug_assertions))]
log::info!("Essential info only");
```

**4. Optimize expensive operations**
```rust
// Identify slow operations with timing
let start = time::get_monotonic_ms();
expensive_operation();
let elapsed = time::get_monotonic_ms() - start;
if elapsed > 10 {
    log::warn!("Slow operation: {}ms", elapsed);
}
```

## Integration with Other Skills

### With kernel-debug-loop
```bash
# Fast iteration on boot fixes
kernel-debug-loop/scripts/quick_debug.py \
  --signal "BOOT_CHECKPOINT" \
  --timeout 10
```

### With log-analysis
```bash
# Extract boot sequence
echo '"initialized"' > /tmp/log-query.txt
./scripts/find-in-logs

# Find boot failures
echo '"PANIC|FAULT|ERROR"' > /tmp/log-query.txt
./scripts/find-in-logs
```

### With systematic-debugging
Document boot issues:
```markdown
# Problem
Kernel hangs after "Enabling interrupts"

# Root Cause
Timer interrupt handler called before scheduler ready

# Solution
Initialize scheduler before enabling interrupts

# Evidence
Before: Hang
After: Boot completes successfully
```

## Boot Checkpoints Reference

Essential checkpoints every boot should reach:

```
[✓] GDT initialized
[✓] IDT initialized
[✓] PIC initialized
[✓] Physical memory detected
[✓] Heap initialized
[✓] Timer initialized
[✓] Interrupts enabled
[✓] Threading initialized
[✓] Kernel ready
```

If boot stops before reaching a checkpoint, debug that subsystem.

## Best Practices

1. **Log initialization**: Every subsystem should log successful init
2. **Check prerequisites**: Verify dependencies before initializing
3. **Fail fast**: Panic early if critical init fails
4. **Add checkpoints**: Mark progress through boot sequence
5. **Time operations**: Identify bottlenecks
6. **Test changes**: Verify boot still works after changes
7. **Document order**: Comment why init order matters

## Quick Reference

### Key Boot Files
```
kernel/src/main.rs                    - Entry point, boot orchestration
kernel/src/gdt.rs                     - GDT initialization
kernel/src/interrupts/mod.rs          - IDT initialization
kernel/src/memory/frame_allocator.rs  - Physical memory
kernel/src/time/timer.rs              - Timer initialization
kernel/src/time/rtc.rs                - RTC initialization
```

### Boot Signals to Watch
```
"GDT initialized"
"IDT initialized"
"Physical memory:"
"Heap:"
"Timer initialized"
"Enabling interrupts"
"Threading subsystem initialized"
"Kernel initialization complete"
```

## Summary

Boot analysis requires:
- Understanding the complete boot sequence
- Identifying checkpoints and dependencies
- Using logs to diagnose failures
- Comparing working vs broken boots
- Optimizing slow operations
- Ensuring proper initialization order

A well-understood boot sequence makes debugging initialization issues straightforward.
206
skills/breenix-code-quality-check/SKILL.md
Normal file
@@ -0,0 +1,206 @@
---
name: code-quality-check
description: This skill should be used before committing code to ensure it meets Breenix quality standards. Use for running clippy checks, fixing compiler warnings, verifying no log side-effects, checking for dead code, and enforcing project coding standards from CLAUDE.md.
---

# Code Quality Checks for Breenix

Pre-commit code quality verification for Breenix kernel development.

## Purpose

Breenix enforces strict code quality standards. This skill provides the checks and fixes required before committing code.

## Core Quality Standards (from CLAUDE.md)

1. **Fix ALL compiler warnings before committing**
2. **Fix ALL clippy warnings**
3. **Use proper patterns** (e.g., `Once`) to avoid unsafe warnings
4. **Only `#[allow(dead_code)]`** for legitimate API functions

## Pre-Commit Checklist

Before every commit:

```bash
# 1. Build kernel and check for warnings
cd kernel
cargo build --target x86_64-unknown-none 2>&1 | grep warning

# 2. Run clippy
cargo clippy --target x86_64-unknown-none

# 3. Run tests (if modifying core subsystems)
cd ..
cargo test

# 4. Check for log side-effects (manual)
grep -R "log::trace!.*(" kernel/src/ | grep -vE '\".*\"' | grep -vE '\.(as_|to_|into_|len|is_|get)'
```
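
Step 1 above only prints warnings; it does not stop a commit. A minimal sketch of a gate that fails when any warning line is present follows. `require_no_warnings` is a hypothetical helper, and it assumes compiler and clippy diagnostics start their line with `warning`, so cargo's trailing summary line is matched as well.

```bash
# require_no_warnings
# Hypothetical helper: reads build output on stdin, echoes the warning lines
# to stderr, and returns 1 if there were any.
require_no_warnings() {
  # grep exits non-zero when nothing matches; "|| true" keeps that from
  # aborting scripts that run under "set -e".
  warnings=$(grep '^warning' || true)
  if [ -n "$warnings" ]; then
    printf '%s\n' "$warnings" >&2
    return 1
  fi
}
```

It would be used as `cargo build --target x86_64-unknown-none 2>&1 | require_no_warnings`, for example from a pre-commit hook, so that a non-zero exit status blocks the commit.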

## Clippy Configuration

### Project-Specific Clippy Flags

```bash
cd kernel
RUSTFLAGS="-Aclippy::redundant_closure_for_method_calls" \
cargo clippy --target x86_64-unknown-none \
  -- -Dclippy::debug_assert_with_mut_call \
     -Dclippy::print_stdout \
     -Wclippy::suspicious_operation_groupings
```

### What These Check

- **`debug_assert_with_mut_call`**: Prevent side-effects in debug assertions
- **`print_stdout`**: No `print!`/`println!` in kernel (use `log` macros)
- **`suspicious_operation_groupings`**: Catch likely logic errors

## Common Issues and Fixes

### Compiler Warnings

**Unused imports**:
```rust
// BAD
use x86_64::{VirtAddr, PageTable, PageTableFlags};  // PageTable unused

// GOOD
use x86_64::{VirtAddr, PageTableFlags};
```

**Unused variables**:
```rust
// BAD
let result = some_function();  // result unused

// GOOD
let _result = some_function();  // Explicitly unused
// OR
some_function();  // Don't capture if not needed
```

**Dead code**:
```rust
// BAD - function never called
fn helper_function() { ... }

// GOOD - remove it
// OR add #[allow(dead_code)] if it's part of a public API

// GOOD - legitimate API function
#[allow(dead_code)]  // Part of public allocator API
pub fn dealloc_stack(&mut self, stack_id: usize) { ... }
```

### Clippy Warnings

**Redundant closure**:
```rust
// BAD (flagged by clippy, though this project allows it via RUSTFLAGS)
items.map(|x| x.to_string())

// GOOD
items.map(ToString::to_string)
```

**Debug assert with mutation**:
```rust
// BAD - side effect in assertion
debug_assert!(list.pop().is_some());

// GOOD - separate the effect
let item = list.pop();
debug_assert!(item.is_some());
```

### Log Side-Effects

**Problem**: The `log` macros evaluate their arguments only when the level is enabled, so a function call with side effects inside a log statement makes kernel behavior depend on the log level.

```rust
// BAD - get_state() runs only when TRACE is enabled
log::trace!("State: {:?}", get_state());

// GOOD - the call always runs; only the formatting depends on the level
let state = get_state();
log::trace!("State: {:?}", state);

// BETTER - for expensive operations without side effects
if log::log_enabled!(log::Level::Trace) {
    let state = expensive_get_state();
    log::trace!("State: {:?}", state);
}
```

## CI Code Quality Workflow

The `.github/workflows/code-quality.yml` runs these checks automatically:

1. **Clippy checks** with project-specific flags
2. **Log side-effects scan** for trace statements with function calls
3. **Complex log expression check** for multi-argument format strings
4. **Log level regression guard** - ensures feature flags control log level

## Quick Reference

### Before Commit

```bash
# Full quality check
cd kernel
cargo build --target x86_64-unknown-none 2>&1 | tee /tmp/build-warnings.txt
cargo clippy --target x86_64-unknown-none 2>&1 | tee /tmp/clippy-warnings.txt

# Review warnings
grep warning /tmp/build-warnings.txt
less /tmp/clippy-warnings.txt

# Fix all warnings before committing!
```

### Common Warning Fixes

| Warning | Fix |
|---------|-----|
| unused import | Remove from `use` statement |
| unused variable | Prefix with `_` or remove |
| dead code | Remove or add `#[allow(dead_code)]` for API |
| redundant closure | Allow via RUSTFLAGS or fix |
| print_stdout | Replace `print!` with `log::info!` |
| debug_assert mutation | Extract to separate statement |

## Integration with Git Workflow

```bash
# Before committing
git status  # See what you're about to commit
# Run clippy in a subshell so later paths stay relative to the repo root
(cd kernel && cargo clippy --target x86_64-unknown-none)  # Fix all warnings

# Then commit
git add kernel/src/...
git commit -m "Fix: ..."
```

## Best Practices

1. **Fix warnings as you go**: Don't accumulate them
2. **Run clippy frequently**: Catch issues early
3. **Use proper logging**: `log` macros, not `print!`
4. **Avoid side-effects in logs**: Especially in trace/debug
5. **Comment allowed dead code**: Explain why it's part of the API
6. **Use feature flags**: Control debug vs release behavior
7. **Test before committing**: `cargo test` if touching core code

## Summary

Code quality standards enforce:
- Zero compiler warnings
- Zero clippy warnings
- No side-effects in log statements
- Appropriate use of `#[allow]` attributes
- Proper logging practices

Run checks before every commit to maintain high code quality.
221
skills/breenix-kernel-debug-loop/SKILL.md
Normal file
@@ -0,0 +1,221 @@
---
name: kernel-debug-loop
description: This skill should be used when performing fast iterative kernel debugging, running time-bound kernel sessions to detect specific log signals or test kernel behavior. Use for rapid feedback cycles during kernel development, boot sequence analysis, or feature verification.
---

# Kernel Debug Loop

Fast iterative kernel debugging with signal detection and time-bounded execution.

## Purpose

This skill provides a rapid feedback loop for kernel development by running the Breenix kernel for short, time-bounded sessions (default 15 seconds) while monitoring logs in real-time for specific signals. The kernel terminates immediately when the expected signal is detected, or when the timeout expires, enabling fast iteration cycles during debugging.

## When to Use This Skill

Use this skill when:

- **Iterative debugging**: Testing kernel changes with quick feedback loops
- **Boot sequence analysis**: Verifying the kernel reaches specific initialization checkpoints
- **Signal detection**: Waiting for specific kernel log messages before proceeding
- **Behavior verification**: Confirming the kernel responds correctly to tests or inputs
- **Fast failure detection**: Identifying boot failures or hangs quickly without waiting for full timeout
- **Checkpoint validation**: Ensuring the kernel reaches expected states during execution

## How to Use

### Basic Usage

The skill provides the `quick_debug.py` script for time-bounded kernel runs with optional signal detection.

**Run kernel with signal detection:**

```bash
kernel-debug-loop/scripts/quick_debug.py --signal "KERNEL_INITIALIZED"
```

This runs the kernel for up to 15 seconds, terminating immediately when "KERNEL_INITIALIZED" appears in the logs.

**Run kernel with custom timeout:**

```bash
kernel-debug-loop/scripts/quick_debug.py --timeout 30
```

Runs the kernel for up to 30 seconds without specific signal detection.

**Run in BIOS mode:**

```bash
kernel-debug-loop/scripts/quick_debug.py --signal "Boot complete" --mode bios
```

**Quiet mode (kernel output only):**

```bash
kernel-debug-loop/scripts/quick_debug.py --signal "READY" --quiet
```

Suppresses progress messages, showing only kernel output.

### Common Signals to Watch For

Based on Breenix's test infrastructure, common signals include:

- `🎯 KERNEL_POST_TESTS_COMPLETE 🎯` - All runtime tests completed
- `KERNEL_INITIALIZED` - Basic kernel initialization complete
- `USER_PROCESS_STARTED` - User process execution began
- `MEMORY_MANAGER_READY` - Memory management subsystem initialized
- Custom checkpoint markers added for specific debugging needs
|
||||
|
||||
### Workflow Patterns
|
||||
|
||||
#### Pattern 1: Fast Iteration During Development
|
||||
|
||||
When making changes to kernel initialization:
|
||||
|
||||
1. Make code change
|
||||
2. Run: `kernel-debug-loop/scripts/quick_debug.py --signal "TARGET_CHECKPOINT" --timeout 10`
|
||||
3. Verify signal appears or analyze why it didn't
|
||||
4. Iterate
|
||||
|
||||
This provides feedback in ~10-15 seconds instead of waiting for full kernel execution or manual termination.
|
||||
|
||||
#### Pattern 2: Boot Sequence Verification
|
||||
|
||||
When debugging boot issues:
|
||||
|
||||
1. Identify the checkpoint expected to be reached
|
||||
2. Run with that checkpoint as the signal
|
||||
3. If timeout occurs, the kernel failed to reach that point
|
||||
4. Examine the output buffer to see how far boot progressed
|
||||
5. Add intermediate checkpoints to narrow down the failure point
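
The "how far did boot progress" step can be sketched in Python. This is a minimal illustration and not part of `quick_debug.py`: the function name `last_checkpoint_reached` and the ordered checkpoint list are assumptions, and matching uses the same per-line substring test the script applies.

```python
def last_checkpoint_reached(output, checkpoints):
    """Return the last checkpoint (in expected boot order) found in the output.

    `checkpoints` is a hypothetical, ordered list of marker strings.
    Returns None if not even the first checkpoint appeared.
    """
    reached = None
    lines = output.splitlines()
    for marker in checkpoints:
        # Same substring test that quick_debug.py applies to each line
        if any(marker in line for line in lines):
            reached = marker
        else:
            break  # boot did not get this far; stop at the first missing marker
    return reached


if __name__ == "__main__":
    sample = "KERNEL_INITIALIZED\nMEMORY_MANAGER_READY\n"
    order = ["KERNEL_INITIALIZED", "MEMORY_MANAGER_READY", "USER_PROCESS_STARTED"]
    print(last_checkpoint_reached(sample, order))
```

The first missing marker after the returned one is where an intermediate checkpoint is most likely to help.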
|
||||
|
||||
#### Pattern 3: Regression Testing
|
||||
|
||||
When verifying fixes:
|
||||
|
||||
1. Run with the signal that was previously failing to appear
|
||||
2. Success (signal found) confirms the fix worked
|
||||
3. Failure (timeout) indicates the issue persists
|
||||
4. The output buffer contains diagnostic information
|
||||
|
||||
#### Pattern 4: Performance Checkpoint Analysis
|
||||
|
||||
When optimizing boot time:
|
||||
|
||||
1. Run with a specific checkpoint signal
|
||||
2. Note the elapsed time when signal is found
|
||||
3. Make optimization changes
|
||||
4. Re-run to measure improvement
|
||||
5. The script reports exact elapsed time for comparison
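
When not run with `--quiet`, the script's summary on stderr includes a line of the form `Time: 3.21s / 15.0s`. The sketch below, under the assumption that the summary text has been captured (for example with `2> summary.txt`), extracts the elapsed time and compares two runs. The helper names `parse_elapsed` and `improvement_percent` are illustrative, not part of the skill.

```python
import re

# Matches the "Time: <elapsed>s / <timeout>s" line of the session summary
_TIME_RE = re.compile(r"Time:\s*([0-9]+(?:\.[0-9]+)?)s\s*/")


def parse_elapsed(summary):
    """Extract elapsed seconds from the summary text, or None if absent."""
    match = _TIME_RE.search(summary)
    return float(match.group(1)) if match else None


def improvement_percent(before, after):
    """Percentage reduction in elapsed time from `before` to `after`."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    baseline = parse_elapsed("Status: SUCCESS\nTime: 4.00s / 15.0s\n")
    optimized = parse_elapsed("Status: SUCCESS\nTime: 3.00s / 15.0s\n")
    print(improvement_percent(baseline, optimized))
```

Because the timer starts before `cargo run`, the measured time includes any rebuild, so it may be worth comparing runs only after a warm build.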
|
||||
|
||||
### Integration with Claude Workflows
|
||||
|
||||
When assisting with kernel debugging:
|
||||
|
||||
1. **Suggest checkpoints**: Recommend adding strategic log markers at key points
|
||||
2. **Run quick tests**: Use this script to verify changes before full test suite
|
||||
3. **Analyze output**: Parse the output buffer to diagnose issues
|
||||
4. **Iterate rapidly**: Chain multiple quick debug runs to test hypotheses
|
||||
5. **Report findings**: Summarize what signals were found and timing information
|
||||
|
||||
### Script Output
|
||||
|
||||
The script provides:
|
||||
|
||||
- **Real-time kernel output**: All kernel logs stream to stdout during execution
|
||||
- **Status indicators**: Visual feedback on signal detection and timeout
|
||||
- **Session summary**: Success/failure status, timing, and output statistics
|
||||
- **Exit code**: 0 if the signal was found (or no signal was specified); 1 if the run timed out or the kernel exited before the signal appeared
|
||||
|
||||
### Output Buffer Analysis
|
||||
|
||||
After a debug session, the entire kernel output is available for analysis:
|
||||
|
||||
- Search for error messages or warnings
|
||||
- Verify initialization sequence order
|
||||
- Check memory allocation patterns
|
||||
- Analyze interrupt handling
|
||||
- Examine test results
|
||||
|
||||
### Advanced Usage
|
||||
|
||||
**Multiple checkpoint verification:**
|
||||
|
||||
Run sequential sessions to verify a series of checkpoints:
|
||||
|
||||
```bash
for signal in "PHASE1" "PHASE2" "PHASE3"; do
    kernel-debug-loop/scripts/quick_debug.py --signal "$signal" --quiet || break
done
```
|
||||
|
||||
**Capture output for analysis:**
|
||||
|
||||
```bash
|
||||
kernel-debug-loop/scripts/quick_debug.py --signal "READY" > kernel_output.log 2>&1
|
||||
```
|
||||
|
||||
**Integration with test scripts:**
|
||||
|
||||
```python
import subprocess

result = subprocess.run(
    ['kernel-debug-loop/scripts/quick_debug.py', '--signal', 'TEST_COMPLETE'],
    capture_output=True,
    text=True
)

if result.returncode == 0:
    print("Test passed!")
else:
    print("Test failed or timed out")
    # Show the tail of the captured kernel output to see how far the run got
    print("\n".join(result.stdout.splitlines()[-20:]))
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Add strategic checkpoints**: Insert log markers at key kernel execution points
|
||||
2. **Use descriptive signals**: Make signal patterns unique and meaningful
|
||||
3. **Set appropriate timeouts**: Balance between waiting long enough and fast iteration
|
||||
4. **Check exit codes**: Use return codes in scripts for automation
|
||||
5. **Save output for analysis**: Redirect output when debugging complex issues
|
||||
6. **Start broad, narrow down**: If a checkpoint isn't reached, add earlier checkpoints
|
||||
7. **Combine with full tests**: Use for quick iteration, then validate with full test suite
|
||||
|
||||
## Technical Details
|
||||
|
||||
- **Timeout**: Default 15 seconds, configurable via `--timeout`
|
||||
- **Signal detection**: Performs substring matching on each output line
|
||||
- **Termination**: Graceful SIGTERM followed by SIGKILL if needed
|
||||
- **Output buffering**: Line-buffered for real-time display
|
||||
- **Exit codes**: 0 for success (signal found or no signal specified), 1 for timeout/failure
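
The termination behavior described above (graceful stop first, forced kill if needed) can be sketched with the standard `subprocess` API. This is an illustration under stated assumptions rather than the script's own code: `stop_process` and `grace_seconds` are hypothetical names, and the sleeping Python child merely stands in for the kernel process.

```python
import subprocess
import sys


def stop_process(process, grace_seconds=2.0):
    """Terminate a child process, escalating to kill if it does not exit."""
    if process.poll() is not None:
        return process.returncode  # already exited
    process.terminate()  # SIGTERM on POSIX
    try:
        process.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        process.kill()  # SIGKILL on POSIX
        process.wait()
    return process.returncode


if __name__ == "__main__":
    # Stand-in child process: a Python interpreter that sleeps
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    print(stop_process(child))
```

The grace period keeps shutdown fast while still giving the child a chance to exit cleanly.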
|
||||
|
||||
## Examples
|
||||
|
||||
**Verify kernel reaches user mode:**
|
||||
|
||||
```bash
|
||||
kernel-debug-loop/scripts/quick_debug.py --signal "USER_PROCESS_STARTED" --timeout 20
|
||||
```
|
||||
|
||||
**Quick sanity check after changes:**
|
||||
|
||||
```bash
|
||||
kernel-debug-loop/scripts/quick_debug.py --timeout 5
|
||||
```
|
||||
|
||||
**Debug memory initialization:**
|
||||
|
||||
```bash
|
||||
kernel-debug-loop/scripts/quick_debug.py --signal "MEMORY_MANAGER_READY" --quiet > mem_init.log
|
||||
```
|
||||
|
||||
**Test both UEFI and BIOS modes:**
|
||||
|
||||
```bash
|
||||
kernel-debug-loop/scripts/quick_debug.py --signal "BOOT_COMPLETE" --mode uefi
|
||||
kernel-debug-loop/scripts/quick_debug.py --signal "BOOT_COMPLETE" --mode bios
|
||||
```
|
||||
206
skills/breenix-kernel-debug-loop/scripts/quick_debug.py
Executable file
@@ -0,0 +1,206 @@
|
||||
#!/usr/bin/env python3
"""
Fast kernel debug loop with signal detection.

Runs the Breenix kernel for up to a specified timeout (default 15s),
monitoring logs in real-time for specific signals. Terminates immediately
when the signal is found or when the timeout expires.
"""

import argparse
import queue
import signal as sig
import subprocess
import sys
import threading
import time
from pathlib import Path


class DebugSession:
    def __init__(self, signal_pattern=None, timeout=15, mode="uefi", quiet=False):
        self.signal_pattern = signal_pattern
        self.timeout = timeout
        self.mode = mode
        self.quiet = quiet
        self.process = None
        self.output_buffer = []
        self.signal_found = False
        self.start_time = None
        self._lines = queue.Queue()

    def run(self):
        """Execute the debug session."""
        project_root = Path(__file__).parent.parent.parent.resolve()

        # Determine the cargo command; the two modes differ only in the binary
        binary = "qemu-bios" if self.mode == "bios" else "qemu-uefi"
        cmd = ["cargo", "run", "--release", "--features", "testing",
               "--bin", binary, "--", "-serial", "stdio", "-display", "none"]

        if not self.quiet:
            print(f"🔍 Starting kernel debug session ({self.timeout}s timeout)", file=sys.stderr)
            if self.signal_pattern:
                print(f"   Watching for signal: {self.signal_pattern}", file=sys.stderr)
            print("", file=sys.stderr)

        self.start_time = time.time()

        # Start the process
        self.process = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            cwd=project_root,
            text=True,
            bufsize=1,  # Line buffered
        )

        # Read output on a background thread so that a silent (hung) kernel
        # cannot block the timeout check in _monitor_output()
        threading.Thread(target=self._read_output, daemon=True).start()

        # Set up signal handler for clean termination
        sig.signal(sig.SIGINT, self._signal_handler)
        sig.signal(sig.SIGTERM, self._signal_handler)

        try:
            self._monitor_output()
        finally:
            self._cleanup()

        return self._generate_report()

    def _read_output(self):
        """Forward each output line to the queue; None marks end of output."""
        try:
            for line in self.process.stdout:
                self._lines.put(line)
        finally:
            self._lines.put(None)

    def _monitor_output(self):
        """Monitor process output in real-time."""
        while True:
            # Check timeout
            remaining = self.timeout - (time.time() - self.start_time)
            if remaining <= 0:
                if not self.quiet:
                    print(f"\n⏱️  Timeout reached ({self.timeout}s)", file=sys.stderr)
                break

            # Wait briefly for the next line so the timeout is re-checked often
            try:
                line = self._lines.get(timeout=min(remaining, 0.1))
            except queue.Empty:
                continue

            if line is None:
                # The process closed its output, i.e. it has exited
                break

            line = line.rstrip('\n')
            self._process_line(line)

            # Check for signal
            if self.signal_pattern and self.signal_pattern in line:
                self.signal_found = True
                if not self.quiet:
                    print(f"\n✅ Signal found: {self.signal_pattern}", file=sys.stderr)
                break

    def _process_line(self, line):
        """Process a single line of output."""
        self.output_buffer.append(line)
        # Kernel output is always shown; --quiet only suppresses progress messages
        print(line, flush=True)

    def _cleanup(self):
        """Clean up the subprocess."""
        if self.process and self.process.poll() is None:
            if not self.quiet:
                print("\n🛑 Terminating kernel...", file=sys.stderr)

            # Try graceful termination first
            self.process.terminate()
            try:
                self.process.wait(timeout=2)
            except subprocess.TimeoutExpired:
                # Force kill if needed
                self.process.kill()
                self.process.wait()

    def _signal_handler(self, signum, frame):
        """Handle interrupt signals."""
        if not self.quiet:
            print("\n\n⚠️  Interrupted by user", file=sys.stderr)
        self._cleanup()
        sys.exit(1)

    def _generate_report(self):
        """Generate a debug report from the session."""
        elapsed = time.time() - self.start_time

        report = {
            'success': self.signal_found if self.signal_pattern else True,
            'signal_found': self.signal_found,
            'signal_pattern': self.signal_pattern,
            'elapsed_time': elapsed,
            'timeout': self.timeout,
            'output_lines': len(self.output_buffer),
            'output': '\n'.join(self.output_buffer),
        }

        if not self.quiet:
            print("\n" + "="*60, file=sys.stderr)
            print("📊 Debug Session Summary", file=sys.stderr)
            print("="*60, file=sys.stderr)
            print(f"Status: {'✅ SUCCESS' if report['success'] else '❌ TIMEOUT'}", file=sys.stderr)
            if self.signal_pattern:
                print(f"Signal: {'Found' if self.signal_found else 'Not found'}", file=sys.stderr)
            print(f"Time: {elapsed:.2f}s / {self.timeout}s", file=sys.stderr)
            print(f"Output lines: {len(self.output_buffer)}", file=sys.stderr)
            print("="*60, file=sys.stderr)

        return report


def main():
    parser = argparse.ArgumentParser(
        description='Fast kernel debug loop with signal detection'
    )
    parser.add_argument(
        '--signal',
        help='Signal pattern to watch for in kernel output'
    )
    parser.add_argument(
        '--timeout',
        type=float,
        default=15.0,
        help='Maximum time to run (seconds, default: 15)'
    )
    parser.add_argument(
        '--mode',
        choices=['uefi', 'bios'],
        default='uefi',
        help='Boot mode (default: uefi)'
    )
    parser.add_argument(
        '--quiet',
        action='store_true',
        help='Suppress progress output, only show kernel output'
    )

    args = parser.parse_args()

    session = DebugSession(
        signal_pattern=args.signal,
        timeout=args.timeout,
        mode=args.mode,
        quiet=args.quiet
    )

    report = session.run()

    # Exit with appropriate code
    sys.exit(0 if report['success'] else 1)


if __name__ == '__main__':
    main()
|
||||
480
skills/breenix-legacy-migration/SKILL.md
Normal file
@@ -0,0 +1,480 @@
|
||||
---
|
||||
name: legacy-migration
|
||||
description: This skill should be used when migrating features from src.legacy/ to the new kernel implementation or removing legacy code after reaching feature parity. Use for systematic legacy code removal, updating FEATURE_COMPARISON.md, verifying feature equivalence, and ensuring safe code retirement.
|
||||
---
|
||||
|
||||
# Legacy Code Migration for Breenix
|
||||
|
||||
Systematically migrate features from legacy kernel and remove old code when parity is reached.
|
||||
|
||||
## Purpose
|
||||
|
||||
Breenix is transitioning from a legacy kernel (src.legacy/) to a modern implementation (kernel/). This skill provides patterns for safely migrating features, verifying parity, and removing legacy code.
|
||||
|
||||
## When to Use
|
||||
|
||||
- **Feature migration**: Porting legacy features to new kernel
|
||||
- **Parity verification**: Confirming new implementation matches legacy behavior
|
||||
- **Legacy removal**: Safely removing old code after feature completion
|
||||
- **Documentation updates**: Keeping FEATURE_COMPARISON.md current
|
||||
- **Risk assessment**: Evaluating what can be safely removed
|
||||
|
||||
## Legacy Migration Principle (from CLAUDE.md)
|
||||
|
||||
```
|
||||
When new implementation reaches parity:
|
||||
1. Remove code from src.legacy/
|
||||
2. Update FEATURE_COMPARISON.md
|
||||
3. Include removal in same commit as feature completion
|
||||
```
|
||||
|
||||
**Key Point**: Don't accumulate dead code. Remove legacy as soon as parity is reached.
|
||||
|
||||
## Migration Workflow
|
||||
|
||||
### Phase 1: Identify Feature for Migration
|
||||
|
||||
**Review FEATURE_COMPARISON.md:**
|
||||
|
||||
```bash
|
||||
# See what's in legacy but not new
|
||||
grep "❌" docs/planning/legacy-migration/FEATURE_COMPARISON.md
|
||||
|
||||
# See what's partially implemented
|
||||
grep "🚧" docs/planning/legacy-migration/FEATURE_COMPARISON.md
|
||||
```
|
||||
|
||||
**Common patterns:**
|
||||
- ✅ Fully implemented (safe to remove if in both)
|
||||
- 🚧 Partially implemented (needs work)
|
||||
- ❌ Not implemented (needs migration or decision)
|
||||
- 🔄 Different approach (verify equivalence)
|
||||
|
||||
### Phase 2: Analyze Legacy Implementation
|
||||
|
||||
**Locate the legacy code:**
|
||||
|
||||
```bash
|
||||
# Find legacy implementation
|
||||
find src.legacy -name "*feature_name*"
|
||||
|
||||
# Search for specific functionality
|
||||
grep -r "feature_function" src.legacy/
|
||||
```
|
||||
|
||||
**Understand the implementation:**
|
||||
1. What does it do? (API, behavior, edge cases)
|
||||
2. Why does it exist? (requirements it satisfies)
|
||||
3. How does it work? (algorithm, data structures)
|
||||
4. What depends on it? (other modules, tests)
|
||||
|
||||
**Extract key characteristics:**
|
||||
- Public API surface
|
||||
- Critical behavior
|
||||
- Edge case handling
|
||||
- Error conditions
|
||||
- Test coverage
|
||||
|
||||
### Phase 3: Implement in New Kernel
|
||||
|
||||
**Follow Breenix standards:**
|
||||
|
||||
```rust
|
||||
// 1. Add to appropriate module in kernel/src/
|
||||
// 2. Use modern Rust patterns
|
||||
// 3. Add #[cfg(feature = "testing")] for test code
|
||||
// 4. Write comprehensive tests
|
||||
// 5. Document with clear comments
|
||||
```
|
||||
|
||||
**Quality checklist:**
|
||||
- [ ] Matches legacy API (if public)
|
||||
- [ ] Handles all edge cases
|
||||
- [ ] Error handling implemented
|
||||
- [ ] Tests written and passing
|
||||
- [ ] Documentation complete
|
||||
- [ ] No compiler warnings
|
||||
- [ ] Clippy clean
|
||||
|
||||
### Phase 4: Verify Parity
|
||||
|
||||
**Functional equivalence:**
|
||||
|
||||
```bash
|
||||
# Run tests for the feature
|
||||
cargo test feature_name
|
||||
|
||||
# Check behavior matches legacy
|
||||
# (Compare outputs, test edge cases)
|
||||
|
||||
# Run full test suite
|
||||
cargo test
|
||||
```
|
||||
|
||||
**API compatibility:**
|
||||
- If API is public: Must match exactly
|
||||
- If internal: Can improve design
|
||||
- Document any intentional differences
|
||||
|
||||
**Behavioral parity checklist:**
|
||||
- [ ] Same inputs produce same outputs
|
||||
- [ ] Edge cases handled identically
|
||||
- [ ] Error conditions match
|
||||
- [ ] Performance acceptable
|
||||
- [ ] Integration with other subsystems works
|
||||
|
||||
### Phase 5: Update Documentation
|
||||
|
||||
**Update FEATURE_COMPARISON.md:**
|
||||
|
||||
```markdown
|
||||
### Feature Category
|
||||
| Feature | Legacy | New | Notes |
|
||||
|---------|--------|-----|-------|
|
||||
| Feature X | ~~✅ Full~~ (removed) | ✅ | Migrated in PR #123, legacy removed |
|
||||
```
|
||||
|
||||
**Patterns:**
|
||||
- Change legacy column to `~~✅ Full~~ (removed)`
|
||||
- Update new column to ✅
|
||||
- Add note about migration PR
|
||||
- Include date if significant
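
To check which features still need work, the table can be scanned programmatically. The sketch below assumes every row follows the four-column layout shown above (Feature, Legacy, New, Notes); the real FEATURE_COMPARISON.md may differ, and the function names are hypothetical.

```python
def parse_feature_rows(markdown):
    """Yield (feature, legacy, new, notes) tuples from 4-column table rows."""
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 4:
            continue
        # Skip the header row and the |---|---| separator row
        if cells[0] == "Feature" or set(cells[0]) <= set("-: "):
            continue
        yield tuple(cells)


def pending_features(markdown):
    """Features whose 'New' column is not yet marked with a check mark."""
    return [feature
            for feature, _legacy, new, _notes in parse_feature_rows(markdown)
            if "✅" not in new]


if __name__ == "__main__":
    table = (
        "| Feature | Legacy | New | Notes |\n"
        "|---------|--------|-----|-------|\n"
        "| Feature X | ~~✅ Full~~ (removed) | ✅ | legacy removed |\n"
        "| Event system | ✅ | ❌ | not started |\n"
    )
    print(pending_features(table))
```

A row whose notes contain a literal `|` would be skipped by this simple splitter, so it is only suitable for quick checks.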
|
||||
|
||||
**Document any differences:**
|
||||
|
||||
```markdown
|
||||
## Implementation Differences
|
||||
|
||||
### Feature X
|
||||
- **Legacy**: Used approach A
|
||||
- **New**: Uses approach B (reason)
|
||||
- **Rationale**: Cleaner design, better performance, etc.
|
||||
```
|
||||
|
||||
### Phase 6: Remove Legacy Code
|
||||
|
||||
**In the SAME commit as feature completion:**
|
||||
|
||||
```bash
|
||||
# Remove the legacy files
|
||||
git rm src.legacy/path/to/feature.rs
|
||||
|
||||
# Or if removing entire module
|
||||
git rm -r src.legacy/module/
|
||||
|
||||
# Stage FEATURE_COMPARISON.md changes
|
||||
git add docs/planning/legacy-migration/FEATURE_COMPARISON.md
|
||||
|
||||
# Commit together
|
||||
git commit -m "Complete Feature X implementation and remove legacy
|
||||
|
||||
- Implement Feature X in kernel/src/module/feature.rs
|
||||
- Full parity with legacy implementation
|
||||
- Remove legacy code from src.legacy/
|
||||
- Update FEATURE_COMPARISON.md
|
||||
|
||||
Tested with: cargo test feature_x
|
||||
"
|
||||
```
|
||||
|
||||
**Critical**: Legacy removal MUST be in the same commit to maintain atomicity.
|
||||
|
||||
## Legacy Code Categories
|
||||
|
||||
### 1. Direct Migration
|
||||
|
||||
**What**: Feature can be ported directly with minimal changes
|
||||
|
||||
**Example**: VGA text mode removed after framebuffer complete
|
||||
|
||||
**Process**:
|
||||
1. Understand legacy implementation
|
||||
2. Port to new codebase
|
||||
3. Test thoroughly
|
||||
4. Remove legacy
|
||||
5. Update docs
|
||||
|
||||
### 2. Reimplementation
|
||||
|
||||
**What**: New approach taken, but achieves same goals
|
||||
|
||||
**Example**: Timer system (different RTC implementation)
|
||||
|
||||
**Process**:
|
||||
1. Identify requirements from legacy
|
||||
2. Design new approach
|
||||
3. Implement with modern patterns
|
||||
4. Verify equivalent behavior
|
||||
5. Remove legacy
|
||||
6. Document differences
|
||||
|
||||
### 3. Obsolete Features
|
||||
|
||||
**What**: Feature no longer needed or superseded
|
||||
|
||||
**Example**: VGA text after framebuffer works
|
||||
|
||||
**Process**:
|
||||
1. Verify feature truly obsolete
|
||||
2. Check no dependencies
|
||||
3. Remove from legacy
|
||||
4. Update FEATURE_COMPARISON.md with rationale
|
||||
|
||||
### 4. Deferred Features
|
||||
|
||||
**What**: Features not yet needed in new kernel
|
||||
|
||||
**Example**: Network stack (not current priority)
|
||||
|
||||
**Process**:
|
||||
1. Document decision to defer
|
||||
2. Mark as ❌ in FEATURE_COMPARISON.md
|
||||
3. Leave in legacy as reference
|
||||
4. Add to future roadmap
|
||||
|
||||
## Common Migration Patterns
|
||||
|
||||
### Pattern: Device Driver
|
||||
|
||||
```rust
// Legacy: src.legacy/drivers/device_x.rs
// New: kernel/src/drivers/device_x.rs

// 1. Port driver structure
pub struct DeviceX {
    // ... fields
}

// 2. Port initialization
impl DeviceX {
    pub fn new() -> Self { ... }
}

// 3. Port public API
impl DeviceX {
    pub fn operation(&mut self) { ... }
}

// 4. Add tests
#[cfg(test)]
mod tests {
    #[test]
    fn test_device_x() { ... }
}
```
|
||||
|
||||
### Pattern: System Call
|
||||
|
||||
```rust
// Legacy: src.legacy/syscall/handler.rs (mostly commented out)
// New: kernel/src/syscall/handler.rs (full implementation)

// 1. Define syscall number
pub const SYS_FEATURE: u64 = N;

// 2. Add to dispatcher
pub fn syscall_handler(num: u64, args: ...) {
    match num {
        SYS_FEATURE => sys_feature(args),
        // ...
    }
}

// 3. Implement handler
fn sys_feature(args: ...) -> u64 {
    // Implementation
}

// 4. Test from userspace
// userspace/tests/feature_test.rs
```
|
||||
|
||||
### Pattern: Infrastructure
|
||||
|
||||
```rust
|
||||
// Legacy: Multiple files implementing async
|
||||
// New: Consolidated in kernel/src/task/
|
||||
|
||||
// 1. Analyze legacy architecture
|
||||
// 2. Design improved structure
|
||||
// 3. Implement with better patterns
|
||||
// 4. Migrate tests
|
||||
// 5. Document improvements
|
||||
```
|
||||
|
||||
## Risk Assessment
|
||||
|
||||
Before removing legacy code, assess:
|
||||
|
||||
### High Risk (Don't Remove Yet)
|
||||
- Features not yet implemented in new kernel
|
||||
- Complex subsystems (network, filesystem)
|
||||
- Code with unique algorithms or logic
|
||||
- Reference implementations for future work
|
||||
|
||||
### Medium Risk (Remove with Caution)
|
||||
- Features with partial new implementation
|
||||
- Code with subtle edge cases
|
||||
- Infrastructure with many dependencies
|
||||
|
||||
### Low Risk (Safe to Remove)
|
||||
- Features fully implemented and tested
|
||||
- Obsolete approaches (VGA text mode)
|
||||
- Dead code (never called)
|
||||
- Superseded implementations
|
||||
|
||||
## Integration with Development
|
||||
|
||||
### During Feature Development
|
||||
|
||||
```bash
|
||||
# 1. Check if legacy has this feature
|
||||
grep -r "feature_name" src.legacy/
|
||||
|
||||
# 2. If found, analyze it
|
||||
less src.legacy/path/to/feature.rs
|
||||
|
||||
# 3. Implement in new kernel
|
||||
# ... development work ...
|
||||
|
||||
# 4. Test thoroughly
|
||||
cargo test feature_name
|
||||
|
||||
# 5. Remove legacy in same commit
|
||||
git rm src.legacy/path/to/feature.rs
|
||||
|
||||
# 6. Update FEATURE_COMPARISON.md
|
||||
# ... edit ...
|
||||
|
||||
# 7. Commit together
|
||||
git commit -m "Implement feature_name and remove legacy"
|
||||
```
|
||||
|
||||
### PR Review Checklist
|
||||
|
||||
When reviewing PRs that claim feature parity:
|
||||
|
||||
- [ ] New implementation tested
|
||||
- [ ] Legacy code removed
|
||||
- [ ] FEATURE_COMPARISON.md updated
|
||||
- [ ] All changes in one atomic commit
|
||||
- [ ] No regression in related features
|
||||
- [ ] Documentation complete
|
||||
|
||||
## Current Migration Status
|
||||
|
||||
Based on FEATURE_COMPARISON.md (as of latest):
|
||||
|
||||
**Completed Migrations:**
|
||||
- Memory management (frame allocator, paging, heap) ✅
|
||||
- Async executor and task management ✅
|
||||
- Timer system (PIT + RTC) ✅
|
||||
- Keyboard driver ✅
|
||||
- Serial output ✅
|
||||
- Test infrastructure ✅
|
||||
- Syscall infrastructure ✅
|
||||
- Fork/exec system calls ✅
|
||||
|
||||
**Not Yet Migrated:**
|
||||
- Network drivers (Intel E1000, RTL8139) ❌
|
||||
- PCI bus support ❌
|
||||
- Interrupt statistics tracking ❌
|
||||
- Event system ❌
|
||||
|
||||
**Different Approach:**
|
||||
- Print macros (log system vs direct print) 🔄
|
||||
- Display (framebuffer vs VGA text) 🔄
|
||||
|
||||
## Special Cases
|
||||
|
||||
### When Legacy Has Better Implementation
|
||||
|
||||
**Scenario**: Legacy code is actually better designed
|
||||
|
||||
**Action**:
|
||||
1. Port legacy approach to new kernel
|
||||
2. Improve if possible
|
||||
3. Remove legacy
|
||||
4. Document that you used legacy as reference
|
||||
|
||||
### When API Must Change
|
||||
|
||||
**Scenario**: Legacy API is poor, new needs different design
|
||||
|
||||
**Action**:
|
||||
1. Design better API
|
||||
2. Document differences in FEATURE_COMPARISON.md
|
||||
3. Explain rationale in commit message
|
||||
4. Remove legacy
|
||||
|
||||
### When Uncertain
|
||||
|
||||
**Scenario**: Not sure if new implementation is equivalent
|
||||
|
||||
**Action**:
|
||||
1. Write comprehensive tests
|
||||
2. Compare outputs on same inputs
|
||||
3. Ask for review
|
||||
4. Document any known differences
|
||||
5. Only remove legacy when confident
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Remove in same commit**: Legacy removal with feature completion
|
||||
2. **Update docs immediately**: Don't accumulate documentation debt
|
||||
3. **Test thoroughly**: Verify parity before removing legacy
|
||||
4. **Document differences**: Explain any intentional changes
|
||||
5. **Keep reference**: For complex features, document algorithm before removing
|
||||
6. **Atomic operations**: Feature + removal + docs in one commit
|
||||
7. **Review carefully**: PRs that remove legacy need extra scrutiny
|
||||
|
||||
## Example Migration Session
|
||||
|
||||
```bash
|
||||
# Identify target feature
|
||||
grep "❌" docs/planning/legacy-migration/FEATURE_COMPARISON.md
|
||||
|
||||
# Found: Event system not yet implemented
|
||||
|
||||
# Analyze legacy
|
||||
less src.legacy/events/mod.rs
|
||||
grep -r "Event" src.legacy/
|
||||
|
||||
# Implement in new kernel
|
||||
# ... create kernel/src/events/mod.rs ...
|
||||
# ... write tests ...
|
||||
|
||||
# Verify
|
||||
cargo test events
|
||||
|
||||
# Remove legacy and update docs
|
||||
git rm -r src.legacy/events/
|
||||
# Edit FEATURE_COMPARISON.md
|
||||
|
||||
# Commit atomically
|
||||
git add kernel/src/events/ tests/test_events.rs
|
||||
git add docs/planning/legacy-migration/FEATURE_COMPARISON.md
|
||||
git commit -m "Implement event system and remove legacy
|
||||
|
||||
- Add event system in kernel/src/events/
|
||||
- Full parity with legacy implementation
|
||||
- Enhanced with better error handling
|
||||
- Remove src.legacy/events/
|
||||
- Update FEATURE_COMPARISON.md
|
||||
|
||||
Tested with: cargo test events
|
||||
All tests passing, no regressions.
|
||||
"
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
Legacy code migration requires:
|
||||
- Systematic analysis of legacy implementation
|
||||
- Full parity verification with tests
|
||||
- Atomic commits (feature + removal + docs)
|
||||
- FEATURE_COMPARISON.md updates
|
||||
- Risk assessment before removal
|
||||
- Documentation of differences
|
||||
|
||||
The goal: Clean codebase with no dead code accumulation.
|
||||
236
skills/breenix-log-analysis/SKILL.md
Normal file
@@ -0,0 +1,236 @@
|
||||
---
|
||||
name: log-analysis
|
||||
description: This skill should be used when analyzing Breenix kernel logs for debugging, testing verification, or understanding kernel behavior. Use for searching timestamped logs, finding checkpoint signals, tracing execution flow, identifying errors or panics, and extracting diagnostic information.
|
||||
---
|
||||
|
||||
# Kernel Log Analysis for Breenix
|
||||
|
||||
Search, analyze, and extract information from Breenix kernel logs for debugging and testing.
|
||||
|
||||
## Purpose
|
||||
|
||||
Breenix logs all kernel runs to `logs/breenix_YYYYMMDD_HHMMSS.log`. This skill provides patterns for searching these logs efficiently, finding checkpoint signals, tracing execution, and diagnosing issues.
|
||||
|
||||
## When to Use
|
||||
|
||||
- **Finding test signals**: Locate checkpoint markers like `🎯 KERNEL_POST_TESTS_COMPLETE 🎯`
|
||||
- **Tracing execution**: Follow kernel boot sequence or specific subsystem initialization
|
||||
- **Debugging failures**: Find panics, faults, or error messages
|
||||
- **Verifying behavior**: Confirm expected operations occurred
|
||||
- **Performance analysis**: Check timing of operations via log timestamps
|
||||
|
||||
## Log Location and Format
|
||||
|
||||
```bash
|
||||
# Logs stored in
|
||||
logs/breenix_YYYYMMDD_HHMMSS.log
|
||||
|
||||
# View latest log
|
||||
ls -t logs/*.log | head -1 | xargs less
|
||||
|
||||
# View specific log
|
||||
less logs/breenix_20250120_143022.log
|
||||
```
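
For scripted analysis, the newest log can also be located from Python. This sketch relies on the `breenix_YYYYMMDD_HHMMSS.log` naming described above; the function name `latest_log` is an assumption. Unlike `ls -t`, it orders by the timestamp in the file name rather than by modification time.

```python
from pathlib import Path


def latest_log(log_dir="logs"):
    """Return the newest breenix_*.log by name, or None if there are none."""
    # YYYYMMDD_HHMMSS timestamps sort chronologically as plain strings
    candidates = sorted(Path(log_dir).glob("breenix_*.log"))
    return candidates[-1] if candidates else None


if __name__ == "__main__":
    print(latest_log())
```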
|
||||
|
||||
### Log Format
|
||||
```
|
||||
[ INFO] kernel::memory: Physical memory: 94 MiB usable
|
||||
[DEBUG] kernel::memory: Frame allocator initialized
|
||||
[ WARN] kernel::process: No processes ready
|
||||
```
|
||||
|
||||
Levels: `TRACE`, `DEBUG`, `INFO`, `WARN`, `ERROR`
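
A line in this format can be split into level, target, and message with a small regular expression. The sketch below is based only on the three sample lines above, so the real logs may contain variations it does not handle; `LogRecord` and `parse_log_line` are illustrative names.

```python
import re
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    level: str
    target: str
    message: str


# The level is right-aligned inside the brackets, e.g. "[ INFO]" or "[DEBUG]"
_LINE_RE = re.compile(r"^\[\s*(TRACE|DEBUG|INFO|WARN|ERROR)\]\s+([\w:]+):\s(.*)$")


def parse_log_line(line: str) -> Optional[LogRecord]:
    """Parse one log line; return None for lines that do not match the format."""
    match = _LINE_RE.match(line)
    if match is None:
        return None
    return LogRecord(*match.groups())


if __name__ == "__main__":
    print(parse_log_line("[ INFO] kernel::memory: Physical memory: 94 MiB usable"))
```

Returning `None` for non-matching lines lets callers skip raw userspace output or panic dumps without special handling.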
|
||||
|
||||
## Search Using find-in-logs Script
|
||||
|
||||
The `scripts/find-in-logs` tool searches recent logs:
|
||||
|
||||
```bash
|
||||
# Create search query (avoids approval prompts)
|
||||
echo '-A50 "Creating user process"' > /tmp/log-query.txt
|
||||
./scripts/find-in-logs
|
||||
|
||||
# The script reads from /tmp/log-query.txt and searches logs
|
||||
```
|
||||
|
||||
### Common Search Patterns
|
||||
|
||||
```bash
|
||||
# Find panics
|
||||
echo '-i "panic"' > /tmp/log-query.txt
|
||||
./scripts/find-in-logs
|
||||
|
||||
# Find page faults
|
||||
echo '-i "page fault"' > /tmp/log-query.txt
|
||||
./scripts/find-in-logs
|
||||
|
||||
# Find context around checkpoint
|
||||
echo '-A20 -B10 "KERNEL_POST_TESTS_COMPLETE"' > /tmp/log-query.txt
|
||||
./scripts/find-in-logs
|
||||
|
||||
# Find process creation
|
||||
echo '"Creating user process"' > /tmp/log-query.txt
|
||||
./scripts/find-in-logs
|
||||
```
|
||||
|
||||
## Direct grep Usage
|
||||
|
||||
```bash
|
||||
# Find specific error
|
||||
grep -n "ERROR" logs/breenix_20250120_*.log
|
||||
|
||||
# Find with context
|
||||
grep -A10 -B5 "Double Fault" logs/breenix_20250120_*.log
|
||||
|
||||
# Case-insensitive search
|
||||
grep -i "memory" logs/breenix_20250120_*.log
|
||||
|
||||
# Multiple patterns
|
||||
grep -E "panic|fault|error" logs/breenix_20250120_*.log
|
||||
|
||||
# Count occurrences
|
||||
grep -c "Timer interrupt" logs/breenix_20250120_*.log
|
||||
```
|
||||
|
||||
## Common Checkpoint Signals
|
||||
|
||||
```bash
|
||||
# Test completion
|
||||
grep "🎯 KERNEL_POST_TESTS_COMPLETE 🎯" logs/*.log
|
||||
|
||||
# Userspace execution
|
||||
grep "USERSPACE OUTPUT:" logs/*.log
|
||||
grep "Hello from userspace" logs/*.log
|
||||
|
||||
# System calls
|
||||
grep "🎉 USERSPACE SYSCALL" logs/*.log
|
||||
|
||||
# Initialization checkpoints
|
||||
grep "initialized\|INITIALIZED" logs/*.log
|
||||
|
||||
# Process creation
|
||||
grep "Process created: PID" logs/*.log
|
||||
```
|
||||
|
||||
## Execution Flow Tracing
|
||||
|
||||
### Boot Sequence
|
||||
```bash
|
||||
# Full boot trace
|
||||
grep -E "Boot|GDT|IDT|PIC|Memory|Heap|Timer|Keyboard" logs/latest.log
|
||||
|
||||
# Memory subsystem only
|
||||
grep "memory\|page table\|frame allocator" logs/latest.log
|
||||
|
||||
# Process subsystem
|
||||
grep "process\|fork\|exec\|PID" logs/latest.log
|
||||
```
|
||||
|
||||
### Subsystem Analysis
|
||||
```bash
|
||||
# Timer subsystem
|
||||
grep -n "timer\|RTC\|tick" logs/latest.log
|
||||
|
||||
# Interrupt handling
|
||||
grep -n "interrupt\|IRQ\|IDT" logs/latest.log
|
||||
|
||||
# System calls
|
||||
grep -n "syscall\|sys_\|INT 0x80" logs/latest.log
|
||||
```
|
||||
|
||||
## Error and Fault Analysis
|
||||
|
||||
### Finding Faults
|
||||
```bash
|
||||
# Double faults
|
||||
grep -A20 "DOUBLE FAULT" logs/*.log
|
||||
|
||||
# Page faults
|
||||
grep -A10 "PAGE FAULT" logs/*.log
|
||||
|
||||
# General panics
|
||||
grep -B10 -A20 "PANIC" logs/*.log
|
||||
```
|
||||
|
||||
### Error Context
|
||||
```bash
|
||||
# Find errors with context
|
||||
grep -A15 -B5 "ERROR" logs/latest.log
|
||||
|
||||
# Find warnings that might indicate problems
|
||||
grep -A5 "WARN" logs/latest.log
|
||||
```
|
||||
|
||||
## Log Analysis Patterns
|
||||
|
||||
### Timeline Analysis
|
||||
```bash
|
||||
# Extract just log levels and messages for overview
|
||||
grep -E "\[(INFO|WARN|ERROR|DEBUG)\]" logs/latest.log | less
|
||||
|
||||
# Filter to specific subsystem
|
||||
grep "\[.*\] kernel::process:" logs/latest.log
|
||||
```
|
||||
|
||||
### Success/Failure Detection
|
||||
```bash
|
||||
# Check if test completed
|
||||
if grep -q "KERNEL_POST_TESTS_COMPLETE" logs/latest.log; then
|
||||
    echo "Test completed"
|
||||
else
|
||||
    echo "Test did not complete"
|
||||
    # Find last successful checkpoint
|
||||
    grep "SUCCESS\|initialized\|completed" logs/latest.log | tail -10
|
||||
fi
|
||||
```
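The same check can be done programmatically when a test harness needs to report how far a boot got. The following is a minimal sketch, not part of Breenix: `last_checkpoint_reached` is a hypothetical helper, and the caller is assumed to supply the expected checkpoint strings in boot order.

```rust
/// Returns the last checkpoint from `expected` that appears in `log`,
/// requiring the checkpoints to appear in the given order.
/// Returns `None` if not even the first checkpoint was logged.
/// (Hypothetical helper for illustration; not a Breenix API.)
pub fn last_checkpoint_reached<'a>(log: &str, expected: &[&'a str]) -> Option<&'a str> {
    let mut pos = 0; // byte offset just past the previous match
    let mut last = None;
    for &checkpoint in expected {
        match log[pos..].find(checkpoint) {
            Some(offset) => {
                // Continue searching after this match so order is enforced.
                pos += offset + checkpoint.len();
                last = Some(checkpoint);
            }
            // First missing checkpoint: the boot stopped before it.
            None => break,
        }
    }
    last
}
```

The first checkpoint after the returned one is where the investigation should start.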
|
||||
|
||||
### Performance Markers
|
||||
```bash
|
||||
# Find timing information
|
||||
grep -E "took|elapsed|[0-9]+ ?ms|seconds" logs/latest.log
|
||||
|
||||
# Specific operations
|
||||
grep "Context switch\|schedule\|preempt" logs/latest.log
|
||||
```
|
||||
|
||||
## Integration with Other Skills
|
||||
|
||||
### With kernel-debug-loop
|
||||
```bash
|
||||
# Run quick test
|
||||
kernel-debug-loop/scripts/quick_debug.py --signal "TARGET_CHECKPOINT"
|
||||
|
||||
# Then analyze its output
|
||||
grep "TARGET_CHECKPOINT" logs/latest.log
|
||||
```
|
||||
|
||||
### With ci-failure-analysis
|
||||
```bash
|
||||
# Analyze CI logs
|
||||
ci-failure-analysis/scripts/analyze_ci_failure.py target/xtask_*_output.txt
|
||||
|
||||
# Then search for specific patterns found
|
||||
grep "PATTERN" target/xtask_*_output.txt
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Use specific patterns**: Narrow searches to relevant subsystems
|
||||
2. **Add context**: Use `-A` (after) and `-B` (before) flags
|
||||
3. **Check latest first**: `ls -t logs/*.log | head -1`
|
||||
4. **Save search queries**: Use `/tmp/log-query.txt` for complex patterns
|
||||
5. **Look for the first error**: Later errors are often cascading failures
|
||||
6. **Check initialization**: Ensure subsystems initialized before use
|
||||
7. **Verify checkpoints**: Confirm expected signals appear
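Practice 5 can be automated in a few lines. This sketch assumes the fault markers used in the grep examples (`PANIC`, `DOUBLE FAULT`, `PAGE FAULT`, `ERROR`); the function name `first_error` is hypothetical.

```rust
/// Markers that indicate a failure, taken from the grep patterns in this guide.
const ERROR_MARKERS: [&str; 4] = ["PANIC", "DOUBLE FAULT", "PAGE FAULT", "ERROR"];

/// Returns the 1-based line number and text of the first log line that
/// contains any error marker, or `None` if the log is clean.
/// (Hypothetical helper for illustration.)
pub fn first_error(log: &str) -> Option<(usize, &str)> {
    log.lines()
        .enumerate()
        .find(|(_, line)| ERROR_MARKERS.iter().any(|marker| line.contains(*marker)))
        .map(|(index, line)| (index + 1, line))
}
```

Everything logged after the returned line should be read with suspicion, since it may be a consequence of the first failure rather than a separate bug.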
|
||||
|
||||
## Summary
|
||||
|
||||
Effective log analysis requires:
|
||||
- Knowing checkpoint signals
|
||||
- Using grep with context flags
|
||||
- Understanding log levels and formats
|
||||
- Tracing execution flow
|
||||
- Finding first failures
|
||||
- Verifying expected behavior
|
||||
|
||||
Logs are the primary window into kernel behavior - use them liberally during development and debugging.
|
||||
550
skills/breenix-memory-debugging/SKILL.md
Normal file
@@ -0,0 +1,550 @@
|
||||
---
|
||||
name: memory-debugging
|
||||
description: This skill should be used when debugging memory-related issues in the Breenix kernel including page faults, double faults, frame allocation problems, page table issues, heap allocation failures, stack overflows, and virtual memory mapping errors.
|
||||
---
|
||||
|
||||
# Memory Debugging for Breenix
|
||||
|
||||
Debug kernel memory issues including page faults, allocator problems, and page table errors.
|
||||
|
||||
## Purpose
|
||||
|
||||
Memory bugs in kernel development are among the most difficult to debug. This skill provides systematic approaches for diagnosing page faults, double faults, allocator issues, and page table problems specific to Breenix.
|
||||
|
||||
## When to Use
|
||||
|
||||
- **Page faults**: Accessing unmapped or incorrectly mapped memory
|
||||
- **Double faults**: Stack issues or cascading exceptions
|
||||
- **Frame allocation failures**: Out of memory or allocator bugs
|
||||
- **Page table problems**: Wrong mappings, missing entries, incorrect flags
|
||||
- **Heap allocation issues**: OOM, corruption, leaks
|
||||
- **Stack overflows**: Exceeding stack size, missing guard pages
|
||||
- **Virtual address conflicts**: Multiple processes mapping same address
|
||||
|
||||
## Memory Subsystems in Breenix
|
||||
|
||||
### 1. Physical Memory (Frame Allocator)
|
||||
|
||||
**Location**: `kernel/src/memory/frame_allocator.rs`
|
||||
|
||||
**What it does**: Manages physical memory frames (4 KiB each)
|
||||
|
||||
**Common issues**:
|
||||
- Running out of frames
|
||||
- Double allocation of same frame
|
||||
- Frames not freed properly
|
||||
- Initialization failures
|
||||
|
||||
**Debug approach**:
|
||||
```rust
|
||||
// Add logging to allocation/deallocation
|
||||
log::debug!("Allocating frame at {:?}", frame);
|
||||
log::debug!("Frame allocator: {} frames used", count);
|
||||
|
||||
// Check allocator state
|
||||
log::info!("Physical memory: {} MiB usable", memory_mb);
|
||||
```
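Logging alone does not catch a frame that is handed out twice. One option is to shadow the allocator with a small debug bitmap. The sketch below is not the Breenix frame allocator: `FrameTracker`, its 4096-frame capacity, and the method names are hypothetical, and frames are identified by plain `u64` physical addresses.

```rust
/// Size of one physical frame in bytes (4 KiB).
const FRAME_SIZE: u64 = 4096;
/// Number of frames the debug tracker can shadow (hypothetical limit).
const TRACKED_FRAMES: usize = 4096;

/// Outcome of recording an allocator event.
#[derive(Debug, PartialEq, Eq)]
pub enum TrackResult {
    Ok,
    DoubleAllocation,
    DoubleFree,
    OutOfRange,
}

/// Debug-only shadow bitmap: one bit per frame, set while the frame is in use.
pub struct FrameTracker {
    bits: [u64; TRACKED_FRAMES / 64],
}

impl FrameTracker {
    pub const fn new() -> Self {
        Self { bits: [0; TRACKED_FRAMES / 64] }
    }

    /// Maps a physical address to (word index, bit mask) if it is tracked.
    fn slot(addr: u64) -> Option<(usize, u64)> {
        let frame = addr / FRAME_SIZE;
        if frame >= TRACKED_FRAMES as u64 {
            return None;
        }
        let frame = frame as usize;
        Some((frame / 64, 1u64 << (frame % 64)))
    }

    /// Call right after the real allocator hands out the frame at `addr`.
    pub fn record_alloc(&mut self, addr: u64) -> TrackResult {
        let (word, mask) = match Self::slot(addr) {
            Some(slot) => slot,
            None => return TrackResult::OutOfRange,
        };
        if self.bits[word] & mask != 0 {
            return TrackResult::DoubleAllocation;
        }
        self.bits[word] |= mask;
        TrackResult::Ok
    }

    /// Call right before the frame at `addr` is returned to the allocator.
    pub fn record_free(&mut self, addr: u64) -> TrackResult {
        let (word, mask) = match Self::slot(addr) {
            Some(slot) => slot,
            None => return TrackResult::OutOfRange,
        };
        if self.bits[word] & mask == 0 {
            return TrackResult::DoubleFree;
        }
        self.bits[word] &= !mask;
        TrackResult::Ok
    }

    /// Number of tracked frames currently marked as allocated.
    pub fn frames_in_use(&self) -> u32 {
        self.bits.iter().map(|word| word.count_ones()).sum()
    }
}
```

The tracker uses a fixed-size array so that it needs no heap and can run before the heap is initialized.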
|
||||
|
||||
### 2. Virtual Memory (Page Tables)
|
||||
|
||||
**Location**: `kernel/src/memory/` (process_memory.rs, kernel_page_table.rs)
|
||||
|
||||
**What it does**: Maps virtual addresses to physical frames
|
||||
|
||||
**Common issues**:
|
||||
- Missing page table entries
|
||||
- Wrong flags (PRESENT, WRITABLE, USER_ACCESSIBLE)
|
||||
- Shared page table entries causing conflicts
|
||||
- Kernel mappings not copied to process page tables
|
||||
|
||||
**Debug approach**:
|
||||
```rust
|
||||
// Log page table operations
|
||||
log::debug!("Mapping page {:?} to frame {:?} with flags {:?}",
|
||||
            page, frame, flags);
|
||||
|
||||
// Verify mappings
|
||||
let result = page_table.translate_addr(addr);
|
||||
log::debug!("Address {:?} translates to {:?}", addr, result);
|
||||
```
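When a mapping exists but access still faults, decoding the raw page table entry shows which flag is missing. This is a minimal sketch that works on the raw `u64` entry value using the standard x86_64 bit positions; `DecodedEntry`, `decode_entry`, and `missing_flags` are hypothetical names, not Breenix or `x86_64` crate APIs.

```rust
/// Flag bits of an x86_64 page table entry.
pub const PTE_PRESENT: u64 = 1 << 0;
pub const PTE_WRITABLE: u64 = 1 << 1;
pub const PTE_USER_ACCESSIBLE: u64 = 1 << 2;
pub const PTE_NO_EXECUTE: u64 = 1 << 63;
/// Bits 12..=51 of an entry hold the physical frame address.
pub const PTE_ADDR_MASK: u64 = 0x000f_ffff_ffff_f000;

/// Human-readable view of one raw page table entry.
#[derive(Debug, PartialEq, Eq)]
pub struct DecodedEntry {
    pub present: bool,
    pub writable: bool,
    pub user_accessible: bool,
    pub no_execute: bool,
    pub frame_addr: u64,
}

/// Splits a raw entry into its flags and frame address.
pub fn decode_entry(raw: u64) -> DecodedEntry {
    DecodedEntry {
        present: raw & PTE_PRESENT != 0,
        writable: raw & PTE_WRITABLE != 0,
        user_accessible: raw & PTE_USER_ACCESSIBLE != 0,
        no_execute: raw & PTE_NO_EXECUTE != 0,
        frame_addr: raw & PTE_ADDR_MASK,
    }
}

/// Returns the subset of `required` flag bits that `raw` does not have set.
pub fn missing_flags(raw: u64, required: u64) -> u64 {
    required & !raw
}
```

For a user stack page, for example, `missing_flags(raw, PTE_PRESENT | PTE_WRITABLE | PTE_USER_ACCESSIBLE)` should return `0`.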
|
||||
|
||||
### 3. Heap Allocator
|
||||
|
||||
**Location**: Uses Rust's `#[global_allocator]`
|
||||
|
||||
**Size**: 1024 KiB
|
||||
|
||||
**Common issues**:
|
||||
- Out of heap memory
|
||||
- Heap corruption
|
||||
- Allocations during early boot (before heap init)
|
||||
|
||||
**Debug approach**:
|
||||
```rust
|
||||
// Check heap size
|
||||
log::info!("Heap: 1024 KiB");
|
||||
|
||||
// Log allocations if needed
|
||||
// (Note: logging inside the allocator must not itself allocate, or it will recurse!)
|
||||
```
|
||||
|
||||
### 4. Kernel Stacks
|
||||
|
||||
**Location**: `kernel/src/memory/kernel_stack.rs`
|
||||
|
||||
**Layout**: 8 KiB stacks with 4 KiB guard pages at `0xffffc900_0000_0000`
|
||||
|
||||
**Common issues**:
|
||||
- Stack overflow into guard page
|
||||
- Kernel stack not mapped in process page table
|
||||
- IST stack issues for double faults
|
||||
|
||||
**Debug approach**:
|
||||
```rust
|
||||
// Log stack allocation
|
||||
log::debug!("Allocated kernel stack {} at {:?}", id, addr);
|
||||
|
||||
// Check stack bounds
|
||||
log::debug!("Stack bottom: {:?}, top: {:?}", bottom, top);
|
||||
|
||||
// Verify stack is mapped
|
||||
```
|
||||
|
||||
## Common Memory Errors
|
||||
|
||||
### Error 1: Page Fault
|
||||
|
||||
**Symptoms**:
|
||||
```
|
||||
PAGE FAULT at 0x... Error Code: 0x...
|
||||
```
|
||||
|
||||
**Error Code Decoding**:
|
||||
```
|
||||
Bit 0 (P): 0 = Page not present
|
||||
           1 = Protection violation
|
||||
Bit 1 (W): 0 = Read access
|
||||
           1 = Write access
|
||||
Bit 2 (U): 0 = Kernel mode
|
||||
           1 = User mode
|
||||
Bit 3 (R): 1 = Reserved bit set
|
||||
Bit 4 (I): 1 = Instruction fetch
|
||||
```
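The bit table above translates directly into a decoder that a page fault handler could call before logging. This is a sketch under the assumption that the handler has the raw error code as an integer; `PageFaultCause` and `decode_page_fault` are hypothetical names.

```rust
/// Decoded x86_64 page fault error code (bits 0 through 4).
#[derive(Debug, PartialEq, Eq)]
pub struct PageFaultCause {
    pub protection_violation: bool, // bit 0 (P)
    pub write: bool,                // bit 1 (W)
    pub user_mode: bool,            // bit 2 (U)
    pub reserved_bit: bool,         // bit 3 (R)
    pub instruction_fetch: bool,    // bit 4 (I)
}

/// Splits the raw error code into its individual bits.
pub fn decode_page_fault(error_code: u64) -> PageFaultCause {
    PageFaultCause {
        protection_violation: error_code & (1 << 0) != 0,
        write: error_code & (1 << 1) != 0,
        user_mode: error_code & (1 << 2) != 0,
        reserved_bit: error_code & (1 << 3) != 0,
        instruction_fetch: error_code & (1 << 4) != 0,
    }
}

impl PageFaultCause {
    /// Kind of access that faulted.
    pub fn access(&self) -> &'static str {
        if self.instruction_fetch {
            "instruction fetch"
        } else if self.write {
            "write"
        } else {
            "read"
        }
    }

    /// Why the access was rejected.
    pub fn reason(&self) -> &'static str {
        if self.reserved_bit {
            "reserved bit set"
        } else if self.protection_violation {
            "protection violation"
        } else {
            "page not present"
        }
    }

    /// Privilege level at the time of the fault.
    pub fn mode(&self) -> &'static str {
        if self.user_mode { "user" } else { "kernel" }
    }
}
```

The methods return `&'static str` so the decoder needs no heap allocation and is safe to call from inside an exception handler.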
|
||||
|
||||
**Common Causes**:
|
||||
|
||||
**1. Accessing unmapped memory**
|
||||
```rust
|
||||
// Problem: Address not mapped in page table
|
||||
let ptr = 0x12345000 as *const u64;
|
||||
let _value = unsafe { *ptr }; // PAGE FAULT - not mapped
|
||||
```
|
||||
|
||||
**Diagnosis**:
|
||||
- Check if address should be mapped
|
||||
- Verify page table has entry for this address
|
||||
- Confirm physical frame was allocated
|
||||
|
||||
**Fix**: Map the page before accessing
|
||||
|
||||
**2. Writing to read-only page**
|
||||
```rust
|
||||
// Problem: Page mapped without WRITABLE flag
|
||||
let ptr = read_only_page as *mut u64;
|
||||
unsafe { *ptr = 42; } // PAGE FAULT - write to read-only
|
||||
```
|
||||
|
||||
**Diagnosis**:
|
||||
- Check page table flags
|
||||
- Verify WRITABLE flag is set
|
||||
- Confirm not writing to kernel code/data
|
||||
|
||||
**Fix**: Add WRITABLE flag or don't write to read-only pages
|
||||
|
||||
**3. User accessing kernel page**
|
||||
```rust
|
||||
// Problem: Userspace trying to access kernel memory
|
||||
// (from Ring 3)
|
||||
let ptr = 0xFFFF_8000_0000_0000 as *const u64; // Kernel address
|
||||
let _value = unsafe { *ptr }; // PAGE FAULT - user accessing kernel
|
||||
```
|
||||
|
||||
**Diagnosis**:
|
||||
- Check if address is in kernel space (upper half)
|
||||
- Verify page doesn't have USER_ACCESSIBLE flag
|
||||
- Confirm userspace should not access this
|
||||
|
||||
**Fix**: Don't allow userspace to access kernel memory
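On the kernel side, the usual counterpart is to validate any pointer received from userspace before dereferencing it. The sketch below uses the user and kernel address ranges listed in this guide's Quick Reference; `classify_address` and `user_range_ok` are hypothetical helpers, not existing Breenix functions.

```rust
/// Highest valid user-space (lower half) address.
pub const USER_SPACE_END: u64 = 0x0000_7FFF_FFFF_FFFF;
/// Lowest kernel-space (upper half) address.
pub const KERNEL_SPACE_START: u64 = 0xFFFF_8000_0000_0000;

#[derive(Debug, PartialEq, Eq)]
pub enum AddressRegion {
    User,
    Kernel,
    /// In the gap between the two halves; any access faults.
    NonCanonical,
}

/// Classifies a virtual address by the half of the address space it is in.
pub fn classify_address(addr: u64) -> AddressRegion {
    if addr <= USER_SPACE_END {
        AddressRegion::User
    } else if addr >= KERNEL_SPACE_START {
        AddressRegion::Kernel
    } else {
        AddressRegion::NonCanonical
    }
}

/// Returns true if the buffer `[start, start + len)` lies entirely in
/// user space and the length computation does not overflow.
pub fn user_range_ok(start: u64, len: u64) -> bool {
    match start.checked_add(len) {
        // `end` is exclusive, so it may equal USER_SPACE_END + 1.
        Some(end) => end <= USER_SPACE_END + 1,
        None => false,
    }
}
```

A range check like this only proves the buffer is in the user half; it does not prove the pages are mapped, so a page table lookup is still needed before the access.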
|
||||
|
||||
**4. Accessing kernel stack not mapped in process page table**
|
||||
```rust
|
||||
// Problem: Kernel stack mapped in kernel PT but not process PT
|
||||
// Ring 3 -> Ring 0 transition tries to use unmapped kernel stack
|
||||
// This was the DIRECT_EXECUTION_FIX issue!
|
||||
```
|
||||
|
||||
**Diagnosis**:
|
||||
- Check if kernel stack is mapped in process page table
|
||||
- Verify TSS RSP0 points to valid kernel stack
|
||||
- Look for double fault during syscalls (int 0x80)
|
||||
|
||||
**Fix**: Copy kernel stack mappings to process page table
|
||||
|
||||
### Error 2: Double Fault
|
||||
|
||||
**Symptoms**:
|
||||
```
|
||||
DOUBLE FAULT - Error Code: 0x...
|
||||
Instruction Pointer: 0x...
|
||||
Stack Pointer: 0x...
|
||||
```
|
||||
|
||||
**What it means**: A second exception occurred while the CPU was still trying to deliver an earlier one to its handler
|
||||
|
||||
**Common Causes**:
|
||||
|
||||
**1. Kernel stack not mapped during exception**
|
||||
```
|
||||
Sequence:
|
||||
1. Exception occurs (page fault, etc.)
|
||||
2. CPU tries to switch to kernel stack
|
||||
3. Kernel stack not mapped in current page table
|
||||
4. Page fault accessing kernel stack
|
||||
5. DOUBLE FAULT
|
||||
```
|
||||
|
||||
**Diagnosis**:
|
||||
- Check which exception triggered the double fault
|
||||
- Verify kernel stack is mapped
|
||||
- Check TSS RSP0 value
|
||||
- Look at instruction pointer (where was CPU when it faulted?)
|
||||
|
||||
**Fix**: Ensure kernel stack mapped in all page tables
|
||||
|
||||
**2. Stack overflow**
|
||||
```
|
||||
Sequence:
|
||||
1. Recursive function or large stack allocation
|
||||
2. Stack exceeds allocated size
|
||||
3. Writes into guard page
|
||||
4. Page fault (guard page not mapped)
|
||||
5. Page fault handler needs stack
|
||||
6. DOUBLE FAULT
|
||||
```
|
||||
|
||||
**Diagnosis**:
|
||||
- Check stack pointer value
|
||||
- Compare against stack bounds
|
||||
- Look for recursive calls
|
||||
- Check for large stack allocations
|
||||
|
||||
**Fix**: Increase stack size or fix code causing overflow
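Comparing the reported stack pointer against the stack bounds can be done mechanically. This sketch assumes the layout stated earlier (8 KiB stacks, 4 KiB guard pages) and additionally assumes the guard page sits directly below the stack bottom; `check_stack_pointer` and `StackStatus` are hypothetical names.

```rust
/// Kernel stack size (8 KiB, per the layout described in this guide).
pub const STACK_SIZE: u64 = 8 * 1024;
/// Guard page size (4 KiB); assumed to sit directly below the stack.
pub const GUARD_SIZE: u64 = 4 * 1024;

#[derive(Debug, PartialEq, Eq)]
pub enum StackStatus {
    /// RSP is inside the stack; `bytes_free` is the room left before overflow.
    InBounds { bytes_free: u64 },
    /// RSP is inside the guard page: the stack has overflowed.
    InGuardPage,
    /// RSP belongs to neither this stack nor its guard page.
    OutsideStack,
}

/// Classifies `rsp` relative to the stack that starts at `stack_bottom`
/// (lowest address) and grows down from `stack_bottom + STACK_SIZE`.
pub fn check_stack_pointer(rsp: u64, stack_bottom: u64) -> StackStatus {
    let stack_top = stack_bottom.saturating_add(STACK_SIZE);
    let guard_start = stack_bottom.saturating_sub(GUARD_SIZE);
    if rsp >= stack_bottom && rsp <= stack_top {
        StackStatus::InBounds { bytes_free: rsp - stack_bottom }
    } else if rsp >= guard_start && rsp < stack_bottom {
        StackStatus::InGuardPage
    } else {
        StackStatus::OutsideStack
    }
}
```

A small `bytes_free` in normal operation is an early warning that a deeper call chain or a larger local buffer will overflow.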
|
||||
|
||||
**3. Exception handler itself faults**
|
||||
```
|
||||
Sequence:
|
||||
1. Exception occurs
|
||||
2. Handler tries to access unmapped memory
|
||||
3. Page fault inside handler
|
||||
4. DOUBLE FAULT
|
||||
```
|
||||
|
||||
**Diagnosis**:
|
||||
- Review exception handler code
|
||||
- Check what handler was executing
|
||||
- Verify handler doesn't access invalid addresses
|
||||
|
||||
**Fix**: Fix bug in exception handler
|
||||
|
||||
### Error 3: Page Already Mapped
|
||||
|
||||
**Symptoms**:
|
||||
```
|
||||
Error: Attempted to map already-mapped page
|
||||
```
|
||||
|
||||
**Common Causes**:
|
||||
|
||||
**1. Shared page table levels**
|
||||
```rust
|
||||
// Problem: Multiple processes share L3 table
|
||||
// Second process tries to map page in shared table
|
||||
// This was the PAGE_TABLE_FIX issue!
|
||||
```
|
||||
|
||||
**Diagnosis**:
|
||||
- Check if page table levels are shared between processes
|
||||
- Verify each process has independent L3/L2/L1 tables
|
||||
- Look at PML4 entry copying code
|
||||
|
||||
**Fix**: Deep copy page table levels, don't share
|
||||
|
||||
**2. Mapping same address twice**
|
||||
```rust
|
||||
// Problem: Code tries to map a page that's already mapped
|
||||
page_table.map_to(page, frame, flags, allocator)?;
|
||||
page_table.map_to(page, frame, flags, allocator)?; // Error!
|
||||
```
|
||||
|
||||
**Diagnosis**:
|
||||
- Check if page is already mapped before mapping
|
||||
- Look for duplicate mapping calls
|
||||
- Verify cleanup properly unmaps pages
|
||||
|
||||
**Fix**: Check before mapping or unmap first
|
||||
|
||||
### Error 4: Out of Memory
|
||||
|
||||
**Symptoms**:
|
||||
```
|
||||
Error: Frame allocator out of memory
|
||||
```
|
||||
|
||||
**Common Causes**:
|
||||
|
||||
**1. Too many allocations**
|
||||
```rust
|
||||
// Problem: Allocating too many frames
|
||||
loop {
|
||||
    allocator.allocate_frame(); // Eventually runs out
|
||||
}
|
||||
```
|
||||
|
||||
**Diagnosis**:
|
||||
- Log total memory available
|
||||
- Count allocations vs deallocations
|
||||
- Check for memory leaks
|
||||
|
||||
**Fix**: Free frames when done, or increase memory
|
||||
|
||||
**2. Memory leaks**
|
||||
```rust
|
||||
// Problem: Frames allocated but never freed
|
||||
let frame = allocator.allocate_frame()?;
|
||||
// ... use frame ...
|
||||
// Forget to deallocate - LEAK!
|
||||
```
|
||||
|
||||
**Diagnosis**:
|
||||
- Track allocation/deallocation counts
|
||||
- Look for allocations without corresponding frees
|
||||
- Use systematic allocation patterns
|
||||
|
||||
**Fix**: Properly free all allocated frames
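Tracking allocation and deallocation counts needs only two atomic counters. The following is a sketch with hypothetical names (`FrameStats`, `leaked_between`); it assumes the real allocator calls `record_alloc` and `record_free` from its allocate and deallocate paths.

```rust
use core::sync::atomic::{AtomicUsize, Ordering};

/// Lock-free allocation counters that are safe to update from any context.
pub struct FrameStats {
    allocated: AtomicUsize,
    freed: AtomicUsize,
}

impl FrameStats {
    pub const fn new() -> Self {
        Self {
            allocated: AtomicUsize::new(0),
            freed: AtomicUsize::new(0),
        }
    }

    /// Call once per successful frame allocation.
    pub fn record_alloc(&self) {
        self.allocated.fetch_add(1, Ordering::Relaxed);
    }

    /// Call once per frame deallocation.
    pub fn record_free(&self) {
        self.freed.fetch_add(1, Ordering::Relaxed);
    }

    /// Frames allocated and not yet freed.
    pub fn outstanding(&self) -> usize {
        let allocated = self.allocated.load(Ordering::Relaxed);
        let freed = self.freed.load(Ordering::Relaxed);
        allocated.saturating_sub(freed)
    }
}

/// Frames leaked by an operation, given `outstanding()` snapshots taken
/// before and after it. Zero means the operation freed what it allocated.
pub fn leaked_between(before: usize, after: usize) -> usize {
    after.saturating_sub(before)
}
```

Taking a snapshot before and after creating and destroying a process is a quick way to check whether process teardown returns all of its frames.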
|
||||
|
||||
## Debugging Techniques
|
||||
|
||||
### Technique 1: Add Checkpoint Logging
|
||||
|
||||
Add logging at critical memory operations:
|
||||
|
||||
```rust
|
||||
log::debug!("CHECKPOINT: Before page table operation");
|
||||
page_table.map_to(page, frame, flags, allocator)?;
|
||||
log::debug!("CHECKPOINT: After page table operation");
|
||||
|
||||
log::debug!("CHECKPOINT: Before memory access");
|
||||
unsafe { *(addr as *const u64) };
|
||||
log::debug!("CHECKPOINT: After memory access");
|
||||
```
|
||||
|
||||
If the crash happens between two checkpoints, you know where to focus.
|
||||
|
||||
### Technique 2: Verify Assumptions
|
||||
|
||||
Check assumptions about memory state:
|
||||
|
||||
```rust
|
||||
// Verify address is mapped before accessing
|
||||
match page_table.translate_addr(addr) {
|
||||
    Some(phys) => log::debug!("Address {:?} mapped to {:?}", addr, phys),
|
||||
    None => log::warn!("Address {:?} NOT MAPPED", addr),
|
||||
}
|
||||
|
||||
// Verify frame was allocated
|
||||
log::debug!("Allocated frame: {:?}", frame);
|
||||
assert!(frame.start_address().as_u64() > 0);
|
||||
|
||||
// Verify flags are correct
|
||||
log::debug!("Page mapped with flags: {:?}", flags);
|
||||
assert!(flags.contains(PageTableFlags::PRESENT));
|
||||
```
|
||||
|
||||
### Technique 3: Dump Page Table State
|
||||
|
||||
Add functions to dump page table state:
|
||||
|
||||
```rust
|
||||
pub fn dump_page_table_entry(page_table: &PageTable, addr: VirtAddr) {
|
||||
    let result = page_table.translate_addr(addr);
|
||||
    log::debug!("Address: {:?}", addr);
|
||||
    log::debug!("  Translation: {:?}", result);
|
||||
|
||||
    // Walk page table levels
|
||||
    // Log each level's entry
|
||||
}
|
||||
```
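To walk the levels, the dump function first needs the index into each table. On x86_64 with 4-level paging these are four 9-bit fields of the virtual address, followed by a 12-bit page offset. The sketch below computes them from a raw `u64`; `TableIndices` and `table_indices` are hypothetical names.

```rust
/// Indices into each page table level for one virtual address
/// (x86_64 4-level paging, 4 KiB pages).
#[derive(Debug, PartialEq, Eq)]
pub struct TableIndices {
    pub l4: u16,     // PML4 index, address bits 39..=47
    pub l3: u16,     // address bits 30..=38
    pub l2: u16,     // address bits 21..=29
    pub l1: u16,     // address bits 12..=20
    pub offset: u16, // byte offset within the page, bits 0..=11
}

/// Splits a virtual address into its per-level table indices.
pub fn table_indices(virt: u64) -> TableIndices {
    TableIndices {
        l4: ((virt >> 39) & 0x1ff) as u16,
        l3: ((virt >> 30) & 0x1ff) as u16,
        l2: ((virt >> 21) & 0x1ff) as u16,
        l1: ((virt >> 12) & 0x1ff) as u16,
        offset: (virt & 0xfff) as u16,
    }
}
```

Two addresses with the same `l4` index share a PML4 entry, which is useful to know when checking whether two processes accidentally share lower-level tables.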
|
||||
|
||||
### Technique 4: Use kernel-debug-loop
|
||||
|
||||
Fast iteration for memory issues:
|
||||
|
||||
```bash
|
||||
# Test fix quickly
|
||||
kernel-debug-loop/scripts/quick_debug.py \
|
||||
    --signal "MEMORY OPERATION COMPLETE" \
|
||||
    --timeout 10
|
||||
```
|
||||
|
||||
### Technique 5: Compare Working vs Broken State
|
||||
|
||||
If something used to work:
|
||||
|
||||
```bash
|
||||
# Run working version
|
||||
git checkout working_commit
|
||||
kernel-debug-loop/scripts/quick_debug.py ... > working.log
|
||||
|
||||
# Run broken version
|
||||
git checkout broken_commit
|
||||
kernel-debug-loop/scripts/quick_debug.py ... > broken.log
|
||||
|
||||
# Compare
|
||||
diff -u working.log broken.log
|
||||
```
|
||||
|
||||
## Memory Issue Patterns
|
||||
|
||||
### Pattern: Syscall Page Fault
|
||||
|
||||
**Scenario**: Page fault when userspace calls `int 0x80`
|
||||
|
||||
**Diagnosis**:
|
||||
1. Kernel stack not mapped in process page table
|
||||
2. Ring 3 → Ring 0 transition fails
|
||||
|
||||
**Fix**: `copy_kernel_stack_to_process()`
|
||||
|
||||
**Reference**: DIRECT_EXECUTION_FIX.md
|
||||
|
||||
### Pattern: Context Switch Double Fault
|
||||
|
||||
**Scenario**: Double fault during context switch between processes
|
||||
|
||||
**Diagnosis**:
|
||||
1. Wrong page table activated
|
||||
2. Current stack not mapped in new page table
|
||||
|
||||
**Fix**: Ensure kernel stacks globally mapped
|
||||
|
||||
**Reference**: Global kernel page table architecture
|
||||
|
||||
### Pattern: Process Creation "Already Mapped"
|
||||
|
||||
**Scenario**: Second process creation fails with "page already mapped"
|
||||
|
||||
**Diagnosis**:
|
||||
1. Processes sharing page table levels
|
||||
2. First process's mappings conflict with second
|
||||
|
||||
**Fix**: Deep copy page table levels
|
||||
|
||||
**Reference**: PAGE_TABLE_FIX.md
|
||||
|
||||
### Pattern: Heap Allocation Panic
|
||||
|
||||
**Scenario**: Panic during heap allocation
|
||||
|
||||
**Diagnosis**:
|
||||
1. Out of heap memory
|
||||
2. Heap not initialized yet
|
||||
3. Heap corruption
|
||||
|
||||
**Fix**: Increase heap size, defer allocation, or fix corruption
|
||||
|
||||
## Integration with Other Skills
|
||||
|
||||
### With kernel-debug-loop
|
||||
```bash
|
||||
# Fast iteration on memory fixes
|
||||
kernel-debug-loop/scripts/quick_debug.py \
|
||||
    --signal "MEMORY TEST COMPLETE" \
|
||||
    --timeout 15
|
||||
```
|
||||
|
||||
### With systematic-debugging
|
||||
Document complex memory bugs:
|
||||
```markdown
|
||||
# Problem
|
||||
Page fault at 0x10001082 during sys_write
|
||||
|
||||
# Root Cause
|
||||
User buffer not mapped in process page table
|
||||
|
||||
# Solution
|
||||
Verify user addresses before access
|
||||
|
||||
# Evidence
|
||||
Before: PAGE FAULT
|
||||
After: Successful syscall
|
||||
```
|
||||
|
||||
### With log-analysis
|
||||
```bash
|
||||
# Find memory-related errors
|
||||
echo '"PAGE FAULT"' > /tmp/log-query.txt
|
||||
./scripts/find-in-logs
|
||||
|
||||
# Find allocation patterns
|
||||
echo '"Allocated\|Deallocated"' > /tmp/log-query.txt
|
||||
./scripts/find-in-logs
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Log extensively**: Memory operations should be well-logged
|
||||
2. **Verify before access**: Check addresses are mapped
|
||||
3. **Use checkpoints**: Narrow down failure location
|
||||
4. **Test incrementally**: Add small changes, test frequently
|
||||
5. **Understand architecture**: Know how page tables work
|
||||
6. **Reference working code**: Look at similar working code
|
||||
7. **Document patterns**: Save solutions for future reference
|
||||
|
||||
## Quick Reference
|
||||
|
||||
### Page Fault Error Codes
|
||||
```
|
||||
0x0: Read from unmapped page (kernel mode)
|
||||
0x1: Read protection violation on a present page (kernel mode)
|
||||
0x2: Write to unmapped page (kernel mode)
|
||||
0x3: Write protection violation on a present page (kernel mode)
|
||||
0x4: Read from unmapped page (user mode)
|
||||
0x6: Write to unmapped page (user mode)
|
||||
```
|
||||
|
||||
### Memory Regions
|
||||
```
|
||||
0x0000_0000_0000_0000 - 0x0000_7FFF_FFFF_FFFF: User space
|
||||
0xFFFF_8000_0000_0000 - 0xFFFF_FFFF_FFFF_FFFF: Kernel space
|
||||
0xFFFF_C900_0000_0000: Kernel stacks
|
||||
```
|
||||
|
||||
### Key Files
|
||||
```
|
||||
kernel/src/memory/frame_allocator.rs - Physical memory
|
||||
kernel/src/memory/process_memory.rs - Process page tables
|
||||
kernel/src/memory/kernel_page_table.rs - Kernel mappings
|
||||
kernel/src/memory/kernel_stack.rs - Kernel stack allocator
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
Memory debugging requires:
|
||||
- Understanding memory subsystems (frame allocator, page tables, heap, stacks)
|
||||
- Systematic diagnosis of page faults and double faults
|
||||
- Checkpoint logging to isolate failures
|
||||
- Verification of page table state and mappings
|
||||
- Reference to past fixes (DIRECT_EXECUTION_FIX, PAGE_TABLE_FIX)
|
||||
- Integration with fast iteration tools
|
||||
|
||||
Memory bugs are complex, but systematic debugging makes them tractable.
|
||||
326
skills/breenix-systematic-debugging/SKILL.md
Normal file
@@ -0,0 +1,326 @@
|
||||
---
|
||||
name: systematic-debugging
|
||||
description: This skill should be used when debugging complex kernel issues requiring systematic investigation and documentation. Use for documenting problem analysis, root cause investigation, solution implementation, and evidence collection following Breenix's Problem→Root Cause→Solution→Evidence pattern.
|
||||
---
|
||||
|
||||
# Systematic Debugging for Breenix
|
||||
|
||||
Document-driven debugging workflow for kernel issues.
|
||||
|
||||
## Purpose
|
||||
|
||||
Complex kernel bugs require systematic investigation and documentation. This skill provides the pattern used in Breenix debugging docs like TIMER_INTERRUPT_INVESTIGATION.md, DIRECT_EXECUTION_FIX.md, and PAGE_TABLE_FIX.md.
|
||||
|
||||
## The Four-Phase Pattern
|
||||
|
||||
All debugging documents follow this structure:
|
||||
|
||||
1. **Problem**: What's broken? Observable symptoms
|
||||
2. **Root Cause**: Why is it broken? Deep analysis
|
||||
3. **Solution**: What fixes it? Implementation details
|
||||
4. **Evidence**: How do you know it's fixed? Before/after proof
|
||||
|
||||
## When to Use
|
||||
|
||||
- **Complex kernel issues**: Not simple typos or obvious bugs
|
||||
- **Architectural problems**: Issues requiring design changes
|
||||
- **Recurring failures**: Problems that reappear or are hard to reproduce
|
||||
- **Learning opportunities**: Bugs that teach important lessons
|
||||
- **CI investigations**: Failed tests requiring deep analysis
|
||||
|
||||
## Debugging Workflow
|
||||
|
||||
### Phase 1: Problem Definition
|
||||
|
||||
**Document observable symptoms:**
|
||||
|
||||
```markdown
|
||||
# Problem Summary
|
||||
|
||||
[Brief description of what's failing]
|
||||
|
||||
## Symptoms
|
||||
|
||||
- What fails? (test, boot, specific operation)
|
||||
- When does it fail? (always, intermittently, specific conditions)
|
||||
- Error messages or behavior observed
|
||||
- What works vs what doesn't
|
||||
```
|
||||
|
||||
**Example from DIRECT_EXECUTION_FIX.md:**
|
||||
```markdown
|
||||
# Problem Summary
|
||||
Direct userspace execution was failing with a double fault at `int 0x80`
|
||||
instruction (`0x10000019`).
|
||||
|
||||
## Symptoms
|
||||
- Userspace processes boot successfully
|
||||
- Calling int 0x80 triggers double fault
|
||||
- Error occurs during Ring 3 → Ring 0 transition
|
||||
```
|
||||
|
||||
### Phase 2: Root Cause Analysis
|
||||
|
||||
**Investigate systematically:**
|
||||
|
||||
1. **Reproduce consistently**
|
||||
```bash
|
||||
# Use kernel-debug-loop for fast iteration
|
||||
kernel-debug-loop/scripts/quick_debug.py --signal "FAILURE_POINT" --timeout 10
|
||||
```
|
||||
|
||||
2. **Add diagnostic logging**
|
||||
```rust
|
||||
log::debug!("About to perform operation X");
|
||||
log::debug!("Variable state: {:?}", state);
|
||||
log::debug!("After operation X");
|
||||
```
|
||||
|
||||
3. **Narrow down location**
|
||||
- Binary search: Add checkpoint in middle of suspect code
|
||||
- If reached: problem is after
|
||||
- If not reached: problem is before
|
||||
- Repeat until isolated
|
||||
|
||||
4. **Analyze state**
|
||||
- What values are variables?
|
||||
- What should they be?
|
||||
- What assumptions are violated?
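The binary search in step 3 can be written down precisely. In this sketch, the suspect code is modeled as `steps` numbered positions, and `reached(i)` stands for "rebuild with a checkpoint at position `i`, boot, and report whether the checkpoint appeared in the log". Both the function name and this model are assumptions made for illustration.

```rust
/// Returns the index of the first position whose checkpoint is NOT reached,
/// or `steps` if every position is reached.
///
/// Assumes `reached` is monotonic: once a position is not reached, no later
/// position is reached either (execution stops at the failure).
pub fn first_unreached_step(steps: usize, mut reached: impl FnMut(usize) -> bool) -> usize {
    // Invariant: every position < lo is reached; every position >= hi is not.
    let (mut lo, mut hi) = (0, steps);
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if reached(mid) {
            lo = mid + 1; // failure is after `mid`
        } else {
            hi = mid; // failure is at or before `mid`
        }
    }
    lo
}
```

Each call to `reached` costs one build-and-boot cycle, so the logarithmic number of probes is what makes this practical: 64 candidate positions need at most 7 boots.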
|
||||
|
||||
**Document findings:**
|
||||
|
||||
```markdown
|
||||
## Root Cause Analysis
|
||||
|
||||
1. **Sequence of Events**:
|
||||
- Step 1 happens
|
||||
- Step 2 happens
|
||||
- Step 3 fails because X
|
||||
|
||||
2. **Technical Details**:
|
||||
- Specific memory addresses, registers, flags
|
||||
- Code paths taken
|
||||
- Assumptions violated
|
||||
|
||||
3. **Why It Happens**:
|
||||
- Fundamental reason for the failure
|
||||
- What design assumption was wrong
|
||||
```
|
||||
|
||||
### Phase 3: Solution Implementation
|
||||
|
||||
**Document the fix:**
|
||||
|
||||
```markdown
|
||||
## Solution
|
||||
|
||||
### 1. [Component] Fix
|
||||
**File**: `path/to/file.rs`
|
||||
**Lines**: X-Y
|
||||
|
||||
[Explanation of what changed and why]
|
||||
|
||||
```rust
|
||||
// Code snippet showing the fix
|
||||
```
|
||||
|
||||
### 2. [Another Component] Fix
|
||||
**File**: `path/to/another/file.rs`
|
||||
**Lines**: X-Y
|
||||
|
||||
[Explanation]
|
||||
```
|
||||
|
||||
**Example structure:**
|
||||
- Identify all files that need changes
|
||||
- For each change:
|
||||
- File path
|
||||
- Line numbers
|
||||
- Explanation of change
|
||||
- Code snippet
|
||||
- Rationale
|
||||
|
||||
### Phase 4: Evidence Collection
|
||||
|
||||
**Prove it works:**
|
||||
|
||||
```markdown
|
||||
## Evidence
|
||||
|
||||
### Before Fix:
|
||||
```
|
||||
[Log output or error messages showing failure]
|
||||
```
|
||||
|
||||
### After Fix:
|
||||
```
|
||||
[Log output showing success]
|
||||
```
|
||||
|
||||
### Test Results:
|
||||
- Test X: PASS
|
||||
- Test Y: PASS
|
||||
- Feature Z: Working as expected
|
||||
```
|
||||
|
||||
## Integration with Tools
|
||||
|
||||
### With kernel-debug-loop
|
||||
|
||||
Fast iteration during investigation:
|
||||
|
||||
```bash
|
||||
# Test hypothesis quickly
|
||||
kernel-debug-loop/scripts/quick_debug.py \
|
||||
    --signal "CHECKPOINT_AFTER_FIX" \
|
||||
    --timeout 15
|
||||
```
|
||||
|
||||
### With log-analysis
|
||||
|
||||
Extract evidence from logs:
|
||||
|
||||
```bash
|
||||
# Find before/after comparison
|
||||
echo '"Error pattern"' > /tmp/log-query.txt
|
||||
./scripts/find-in-logs
|
||||
|
||||
echo '"Success pattern"' > /tmp/log-query.txt
|
||||
./scripts/find-in-logs
|
||||
```
|
||||
|
||||
### With ci-failure-analysis
|
||||
|
||||
Analyze CI test failures:
|
||||
|
||||
```bash
|
||||
ci-failure-analysis/scripts/analyze_ci_failure.py \
|
||||
    --context target/xtask_*_output.txt
|
||||
```
|
||||
|
||||
## Debug Document Template
|
||||
|
||||
```markdown
|
||||
# [Issue Name] Fix
|
||||
|
||||
Date: [YYYY-MM-DD]
|
||||
|
||||
## Problem Summary
|
||||
|
||||
[What's broken - one paragraph]
|
||||
|
||||
## Symptoms
|
||||
|
||||
- Symptom 1
|
||||
- Symptom 2
|
||||
- Error messages or behavior
|
||||
|
||||
## Root Cause Analysis
|
||||
|
||||
### Sequence of Events
|
||||
1. [Step by step what happens]
|
||||
2. [Leading to failure]
|
||||
|
||||
### Technical Details
|
||||
- Memory addresses, registers, etc.
|
||||
- Code paths taken
|
||||
- State at time of failure
|
||||
|
||||
### Why It Happens
|
||||
[Fundamental explanation]
|
||||
|
||||
## Solution
|
||||
|
||||
### 1. [First Change]
|
||||
**File**: `path/to/file.rs`
|
||||
**Lines**: X-Y
|
||||
|
||||
[Explanation]
|
||||
|
||||
```rust
|
||||
// Code change
|
||||
```
|
||||
|
||||
### 2. [Second Change]
|
||||
**File**: `path/to/file2.rs`
|
||||
**Lines**: X-Y
|
||||
|
||||
[Explanation]
|
||||
|
||||
## Evidence
|
||||
|
||||
### Before Fix:
|
||||
```
|
||||
[Error output]
|
||||
```
|
||||
|
||||
### After Fix:
|
||||
```
|
||||
[Success output]
|
||||
```
|
||||
|
||||
## Lessons Learned
|
||||
|
||||
1. [Key insight 1]
|
||||
2. [Key insight 2]
|
||||
3. [Patterns to apply in future]
|
||||
|
||||
## Related Issues
|
||||
|
||||
- [Link to similar past bugs]
|
||||
- [Related design decisions]
|
||||
```
|
||||
|
||||
## Example: Real Debugging Session
|
||||
|
||||
Based on TIMER_INTERRUPT_INVESTIGATION.md:
|
||||
|
||||
**Problem**: Kernel hanging after enabling interrupts
|
||||
|
||||
**Investigation**:
|
||||
1. Compare with other OS implementations (blog_os, xv6, Linux)
|
||||
2. Identify what they do (minimal timer handlers)
|
||||
3. Identify what Breenix does (complex timer handler with locks)
|
||||
4. Hypothesis: Timer handler too complex
|
||||
|
||||
**Solution**: Create simple_timer.rs with minimal handler
|
||||
|
||||
**Evidence**:
|
||||
- Before: Kernel hangs immediately
|
||||
- After: Kernel boots and reaches testing menu
|
||||
|
||||
**Lesson**: Interrupt handlers must be TRULY minimal
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Document as you debug**: Don't wait until after
|
||||
2. **Include evidence**: Logs, test results, screenshots
|
||||
3. **Explain reasoning**: Why you investigated X, not Y
|
||||
4. **Note dead ends**: What you tried that didn't work
|
||||
5. **Extract lessons**: What to remember for next time
|
||||
6. **Update related docs**: If this reveals design issues
|
||||
7. **Create regression tests**: Prevent this bug from returning
|
||||
|
||||
## When to Create a Debug Document
|
||||
|
||||
Create a document when:
|
||||
- Bug took >2 hours to solve
|
||||
- Solution required design changes
|
||||
- Bug could recur if its cause is not understood
|
||||
- Lessons applicable to future development
|
||||
- Multiple components involved
|
||||
- Fix not immediately obvious from code change
|
||||
|
||||
## Summary
|
||||
|
||||
Systematic debugging follows:
|
||||
1. Problem - Clear symptom description
|
||||
2. Root Cause - Deep technical analysis
|
||||
3. Solution - Implementation with rationale
|
||||
4. Evidence - Before/after proof
|
||||
|
||||
This pattern ensures:
|
||||
- Thorough understanding
|
||||
- Proper fixes (not workarounds)
|
||||
- Knowledge preservation
|
||||
- Prevention of similar bugs
|
||||