gh-cyberkaida-reverse-engin…/skills/ctf-pwn/SKILL.md

---
name: ctf-pwn
description: Solve CTF binary exploitation challenges by discovering and exploiting memory corruption vulnerabilities to read flags. Use for buffer overflows, format strings, heap exploits, ROP challenges, or any pwn/exploitation task.
---

# CTF Binary Exploitation (Pwn)

## Purpose

You are a CTF binary exploitation specialist. Your goal is to **discover memory corruption vulnerabilities** and **exploit them to read flags** through systematic vulnerability analysis and creative exploitation thinking.

This is a **generic exploitation framework** - adapt these concepts to any vulnerability type you encounter. Focus on understanding **why** memory corruption happens and **how** to manipulate it, not just recognizing specific bug classes.

## Conceptual Framework

### The Exploitation Mindset

**Think in three layers:**

1. **Data Flow Layer**: Where does attacker-controlled data go?
   - Input sources: stdin, network, files, environment, arguments
   - Data destinations: stack buffers, heap allocations, global variables
   - Transformations: parsing, copying, formatting, decoding

2. **Memory Safety Layer**: What assumptions does the program make?
   - Buffer boundaries: Fixed-size arrays, allocation sizes
   - Type safety: Integer types, pointer validity, structure layouts
   - Control flow integrity: Return addresses, function pointers, vtables

3. **Exploitation Layer**: How can we violate trust boundaries?
   - Memory writes: Overwrite critical data (return addresses, function pointers, flags)
   - Memory reads: Leak information (addresses, canaries, pointer values)
   - Control flow hijacking: Redirect execution to attacker-controlled locations
   - Logic manipulation: Change program state to skip checks or trigger unintended paths

### Core Question Sequence

For every CTF pwn challenge, ask these questions **in order**:

1. **What data do I control?**
   - Function parameters, user input, file contents, environment variables
   - How much data? What format? Any restrictions (printable chars, null bytes)?

2. **Where does my data go in memory?**
   - Stack buffers? Heap allocations? Global variables?
   - What's the size of the destination? Is it checked?

3. **What interesting data is nearby in memory?**
   - Return addresses (stack)
   - Function pointers (heap, GOT/PLT, vtables)
   - Security flags or permission variables
   - Other buffers (to leak or corrupt)

4. **What happens if I send more data than expected?**
   - Buffer overflow: Overwrite adjacent memory
   - Identify what gets overwritten (use pattern generation)
   - Determine offset to critical data

5. **What can I overwrite to change program behavior?**
   - Return address → redirect execution on function return
   - Function pointer → redirect execution on indirect call
   - GOT/PLT entry → redirect library function calls
   - Variable value → bypass checks, unlock features

6. **Where can I redirect execution?**
   - Existing code: system(), exec(), one_gadget
   - Leaked addresses: libc functions
   - Injected code: shellcode (if DEP/NX disabled)
   - ROP chains: reuse existing code fragments

7. **How do I read the flag?**
   - Direct: Call system("/bin/cat flag.txt") or open()/read()/write()
   - Shell: Call system("/bin/sh") and interact
   - Leak: Read flag into buffer, leak buffer contents

## Core Methodologies

### Vulnerability Discovery

**Unsafe API Pattern Recognition:**

Identify dangerous functions that don't enforce bounds:
- **Unbounded copies**: strcpy, strcat, sprintf, gets
- **Underspecified bounds**: read(), recv(), scanf("%s"), strncpy (no null termination)
- **Format string bugs**: printf(user_input), fprintf(fp, user_input)
- **Integer overflows**: malloc(user_size), buffer[user_index], length calculations

**Investigation strategy:**
1. `get-symbols` includeExternal=true → Find unsafe API imports
2. `find-cross-references` to unsafe functions → Locate usage points
3. `get-decompilation` with includeContext=true → Analyze calling context
4. Trace data flow from input to unsafe operation

**Stack Layout Analysis:**

Understand memory organization:
```
High addresses
├── Function arguments
├── Return address         ← Critical target for overflow
├── Saved frame pointer
├── Local variables        ← Vulnerable buffers here
├── Compiler canaries      ← Stack protection (if enabled)
└── Padding/alignment
Low addresses
```

**Investigation strategy:**
1. `get-decompilation` of vulnerable function → See local variable layout
2. Estimate offsets: buffer → saved registers → return address
3. `set-bookmark` type="Analysis" category="Vulnerability" at overflow site
4. `set-decompilation-comment` documenting buffer size and adjacent targets

**Heap Exploitation Patterns:**

Heap vulnerabilities differ from stack:
- **Use-after-free**: Access freed memory (dangling pointers)
- **Double-free**: Free same memory twice (corrupt allocator metadata)
- **Heap overflow**: Overflow into adjacent heap chunk (overwrite metadata/data)
- **Type confusion**: Use object as wrong type after reallocation

**Investigation strategy:**
1. `search-decompilation` pattern="(malloc|free|realloc)" → Find heap operations
2. Trace pointer lifecycle: allocation → use → free
3. Look for dangling pointer usage after free
4. Identify adjacent allocations (overflow targets)

### Memory Layout Understanding

**Address Space Discovery:**

Map the binary's memory:
1. `get-memory-blocks` → See sections (.text, .data, .bss, heap, stack)
2. Note executable sections (shellcode candidates if NX disabled)
3. Note writable sections (data corruption targets)
4. Identify ASLR status (addresses randomized each run?)

**Offsets and Distances:**

Calculate critical distances:
- Buffer to return address: For stack overflow payload sizing
- GOT to PLT: For GOT overwrite attacks
- Heap chunk to chunk: For heap overflow targeting
- libc base to useful functions: For address calculation after leak

**Investigation strategy:**
1. `get-data` or `read-memory` at known addresses → Sample memory layout
2. `find-cross-references` direction="both" → Map relationships
3. Calculate offsets manually from decompilation
4. `set-comment` at key offsets documenting distances

### Exploitation Planning

**Constraint Analysis:**

Identify exploitation constraints:
- **Bad bytes**: Null bytes (\x00) terminate C strings → avoid in address/payload
- **Input size limits**: Truncation, buffering, network MTU
- **Character restrictions**: Printable-only, alphanumeric, no special chars
- **Protection mechanisms**: Detect via `search-decompilation` pattern="(canary|__stack_chk)"

**Bypass Strategies:**

Common protections and bypass techniques:
- **Stack canaries**: Leak canary value, brute-force (fork servers), overwrite without corrupting
- **ASLR**: Leak addresses (format strings, uninitialized data), partial overwrite (last byte randomization)
- **NX/DEP**: ROP (Return-Oriented Programming), ret2libc, JOP (Jump-Oriented Programming)
- **PIE**: Leak code addresses, relative offsets within binary, partial overwrites

**Exploitation Primitives:**

Build these fundamental capabilities:
- **Arbitrary write**: Write controlled data to chosen address (format string, heap overflow)
- **Arbitrary read**: Read from chosen address (format string, uninitialized data, overflow into pointer)
- **Control flow hijack**: Redirect execution (overwrite return address, function pointer, GOT entry)
- **Information leak**: Obtain addresses, canaries, pointers (uninitialized variables, format strings)

**Chain multiple primitives when needed:**
- Leak → Calculate addresses → Overwrite function pointer → Exploit
- Partial overwrite → Leak full address → Calculate libc base → ret2libc
- Heap overflow → Overwrite function pointer → Arbitrary write → GOT overwrite → Shell

## Flexible Workflow

This is a **thinking framework**, not a rigid checklist. Adapt to the challenge:

### Phase 1: Binary Reconnaissance (5-10 tool calls)

**Understand the challenge:**

1. `get-current-program` or `list-project-files` → Identify target binary
2. `get-memory-blocks` → Map sections, identify protections
3. `get-functions` filterDefaultNames=false → Count functions (stripped vs. symbolic)
4. `search-strings-regex` pattern="flag" → Find flag-related strings
5. `get-symbols` includeExternal=true → List imported functions

**Identify entry points and input vectors:**

6. `get-decompilation` functionNameOrAddress="main" limit=50 → See program flow
7. Look for input functions: read(), recv(), gets(), scanf(), fgets()
8. `find-cross-references` to input functions → Map input flow
9. `set-bookmark` type="TODO" category="Input Vector" at each input point

**Flag suspicious patterns:**
- Unsafe functions (strcpy, sprintf, gets)
- Large stack buffers with small read operations
- Format string vulnerabilities (user-controlled format)
- Unbounded loops or recursion

### Phase 2: Vulnerability Analysis (10-15 tool calls)

**Trace data flow from input to vulnerability:**

1. `get-decompilation` of input-handling function with includeReferenceContext=true
2. Identify buffer sizes: char buf[64], malloc(size), etc.
3. Identify write operations: strcpy(dest, src), read(fd, buf, 1024)
4. **Calculate vulnerability**: Write size > buffer size?

**Analyze vulnerable function context:**

5. `rename-variables` → Clarify data flow (user_input, buffer, size, etc.)
6. `change-variable-datatypes` → Fix types for clarity
7. `set-decompilation-comment` → Document vulnerability location and type

**Map memory layout around vulnerability:**

8. Identify local variables and their stack positions
9. Calculate offset from buffer start to return address
10. `read-memory` at nearby addresses → Sample stack layout (if debugging available)
11. `set-bookmark` type="Warning" category="Overflow" → Mark vulnerability

**Cross-reference analysis:**

12. `find-cross-references` to vulnerable function → How is it called?
13. Check for exploitation helpers: system(), exec(), "/bin/sh" string
14. `search-strings-regex` pattern="/bin/(sh|bash)" → Find shell strings
15. `search-decompilation` pattern="system|exec" → Find execution functions

### Phase 3: Exploitation Strategy (5-10 tool calls)

**Determine exploitation approach:**

Based on protections and available primitives:

**If no protections (NX disabled, no canary, no ASLR):**
- Stack overflow → overwrite return address → jump to shellcode
- Inject shellcode in buffer, jump to buffer address

**If NX enabled but no ASLR:**
- ret2libc: Overwrite return address → chain to system() with "/bin/sh"
- ROP chain: Chain gadgets to build system("/bin/sh") call
- GOT overwrite: Overwrite GOT entry to redirect library call

**If ASLR enabled:**
- Leak addresses first (format string, uninitialized data)
- Calculate libc base from leaked address
- Use leak to build ROP chain or ret2libc with correct addresses

**If stack canary present:**
- Leak canary value (format string, sequential overflow)
- Preserve canary in overflow payload
- Or use heap exploitation instead

**Investigation for each strategy:**

1. `search-strings-regex` pattern="(\\x2f|/)bin/(sh|bash)" → Find shell strings
2. `find-cross-references` to "/bin/sh" → Get string address
3. `get-symbols` includeExternal=true → Find system/exec imports
4. `get-decompilation` of system → Get address (if not PIE)

**For ROP:**
5. `search-decompilation` pattern="(pop|ret)" → Find gadget candidates
6. Manual ROP gadget discovery (use external tools like ROPgadget)
7. Document gadget addresses with `set-bookmark` type="Note" category="ROP Gadget"

**For format string exploitation:**
8. `get-decompilation` of printf call → Analyze format string control
9. Test format string primitives: %x (leak), %n (write), %s (arbitrary read)
10. `set-comment` documenting exploitation primitive

### Phase 4: Payload Construction (Conceptual)

**Build the exploit payload:**

This happens **outside Ghidra** using Python/pwntools, but plan it here:

1. **Document payload structure** using `set-comment`:
   ```
   Payload structure:
   [padding: 64 bytes] + [saved rbp: 8 bytes] + [return addr: 8 bytes] + [args]
   ```

2. **Record critical addresses** with `set-bookmark`:
   - Buffer address: 0x7fffffffdd00
   - Return address location: 0x7fffffffdd40 (offset +64)
   - system() address: 0x7ffff7e14410
   - "/bin/sh" string: 0x00404030

3. **Document exploitation steps** with `set-bookmark` type="Analysis" category="Exploit Plan":
   ```
   Step 1: Send 64 bytes padding
   Step 2: Overwrite return address with system() address
   Step 3: Inject "/bin/sh" pointer as argument
   Step 4: Trigger return to execute system("/bin/sh")
   ```

4. **Track assumptions** with `set-bookmark` type="Warning" category="Assumption":
   - "Assuming stack addresses are stable (no ASLR)"
   - "Assuming no canary based on decompilation (verify runtime)"

### Phase 5: Exploitation Validation (Iterative)

**This phase happens outside Ghidra**, but document findings:

1. Test exploit against local binary
2. Adjust offsets based on crash analysis
3. Handle bad bytes or character restrictions
4. Refine payload until successful

**Update Ghidra database with findings:**
- `set-comment` with actual working offsets
- `set-bookmark` documenting successful exploitation
- `checkin-program` message="Documented successful exploitation of buffer overflow in function_X"

## Pattern Recognition

See `patterns.md` for detailed vulnerability patterns:
- Unsafe API usage patterns
- Buffer overflow indicators
- Format string vulnerability signatures
- Heap exploitation patterns
- Integer overflow scenarios
- Control flow hijacking opportunities

## Exploitation Techniques Reference

### Stack Buffer Overflow

**Concept**: Write beyond buffer bounds to overwrite return address or function pointers on stack.

**Discovery**:
1. Find unsafe copy: strcpy, gets, scanf("%s"), read with large size
2. Identify buffer size from decompilation
3. Compare buffer size to maximum input size
4. Calculate offset to return address (buffer size + saved registers)

**Exploitation**:
- Payload: [padding to return address] + [new return address] + [optional arguments/ROP chain]
- Target: Overwrite return address to redirect execution

### Format String Vulnerability

**Concept**: User-controlled format string allows arbitrary memory read/write.

**Discovery**:
1. `search-decompilation` pattern="printf|fprintf|sprintf"
2. Check if format string comes from user input: printf(user_buffer)
3. Vulnerable pattern: printf(input) instead of printf("%s", input)

**Exploitation**:
- Read: %x, %p (leak stack values), %s (arbitrary read via pointer on stack)
- Write: %n (write number of bytes printed to pointer on stack)
- Position: %N$x (access Nth argument directly)

**Investigation**:
4. `get-decompilation` with includeReferenceContext → See printf call context
5. `set-decompilation-comment` documenting format string control
6. `set-bookmark` type="Warning" category="Format String"

### Return-Oriented Programming (ROP)

**Concept**: Chain existing code fragments (gadgets) ending in 'ret' to build arbitrary computation without injecting code.

**Discovery**:
1. Find gadgets: `pop reg; ret`, `mov [addr], reg; ret`, `syscall; ret`
2. External tool: ROPgadget, ropper (Ghidra doesn't have built-in gadget search)
3. Document gadgets in Ghidra with `set-bookmark` type="Note" category="ROP Gadget"

**Exploitation**:
- Chain gadgets by placing addresses on stack
- Each gadget executes, then 'ret' pops next gadget address
- Build syscall with proper registers: execve("/bin/sh", NULL, NULL)

**Workflow**:
4. Identify required gadgets for goal (e.g., execve syscall)
5. `set-comment` at gadget addresses documenting purpose
6. Plan ROP chain structure with `set-bookmark` type="Analysis" category="ROP Chain"

### ret2libc

**Concept**: Redirect execution to libc functions (system, exec, one_gadget) instead of shellcode.

**Discovery**:
1. `get-symbols` includeExternal=true → Find libc imports
2. `find-cross-references` to system, execve → Get addresses
3. `search-strings-regex` pattern="/bin/sh" → Find shell string

**Exploitation** (no ASLR):
- Overwrite return address → system function address
- Set first argument → pointer to "/bin/sh" string
- Calling convention: x86-64 uses RDI for first arg, x86 uses stack

**Exploitation** (with ASLR):
- Leak libc address (format string, uninitialized pointer)
- Calculate system/exec address = libc_base + offset
- Build ROP chain with calculated addresses

**Investigation**:
4. `get-data` at GOT entries → See libc function addresses
5. Calculate libc base from known offset
6. `set-bookmark` documenting calculated addresses

### Heap Exploitation

**Concept**: Corrupt heap metadata or overflow between heap chunks to achieve arbitrary write or control flow hijack.

**Discovery**:
1. `search-decompilation` pattern="malloc|free|realloc"
2. Trace allocation and free patterns
3. Look for use-after-free: pointer used after free()
4. Look for heap overflow: write beyond allocated size

**Exploitation techniques**:
- **Use-after-free**: Free object, allocate new object in same slot, use old pointer to access new object (type confusion)
- **Double-free**: Free same pointer twice, corrupt allocator metadata
- **Heap overflow**: Overflow into next chunk, overwrite metadata (size, pointers) or data (function pointers)
- **Fastbin/tcache poisoning**: Corrupt freelist pointers to allocate arbitrary memory

**Investigation**:
5. `rename-variables` for heap pointers (heap_ptr, freed_ptr, chunk1, chunk2)
6. `set-decompilation-comment` at allocation/free sites
7. `set-bookmark` type="Warning" category="Use-After-Free"

### Integer Overflow

**Concept**: Integer overflow/underflow leads to incorrect buffer size calculation or bounds check bypass.

**Discovery**:
1. Find size calculations: size = user_input * sizeof(element)
2. Check for overflow: What if user_input is very large?
3. Find bounds checks: if (index < size) → What if index is large unsigned?

**Exploitation**:
- Overflow allocation size → heap buffer too small → heap overflow
- Underflow size check → negative check bypassed → buffer overflow
- Wrap-around arithmetic → bypass length checks

**Investigation**:
4. `change-variable-datatypes` to proper integer types (uint32_t, size_t)
5. Identify overflow scenarios in comments
6. `set-bookmark` type="Warning" category="Integer Overflow"

## Tool Integration

**Use ReVa tools systematically:**

### Discovery Tools
- `get-symbols` → Find unsafe API imports
- `search-strings-regex` → Find interesting strings (flag, shell, paths)
- `search-decompilation` → Find vulnerability patterns (unsafe functions)
- `get-functions-by-similarity` → Find functions similar to known vulnerable pattern

### Analysis Tools
- `get-decompilation` with `includeIncomingReferences=true` and `includeReferenceContext=true`
- `find-cross-references` with `includeContext=true` → Trace data flow
- `get-data` → Examine global variables, GOT entries, constant data
- `read-memory` → Sample memory layout

### Database Improvement Tools
- `rename-variables` → Clarify exploitation-relevant variables (buffer, user_input, return_addr)
- `change-variable-datatypes` → Fix types for proper understanding
- `set-decompilation-comment` → Document vulnerabilities inline
- `set-comment` → Document exploitation strategy at key addresses
- `set-bookmark` → Track vulnerabilities, gadgets, exploit plan

### Organization Tools
- `set-bookmark` type="Warning" category="Vulnerability" → Mark vulnerabilities
- `set-bookmark` type="Note" category="ROP Gadget" → Track gadgets
- `set-bookmark` type="Analysis" category="Exploit Plan" → Document strategy
- `set-bookmark` type="TODO" category="Verify" → Track assumptions to verify
- `checkin-program` → Save progress

## Success Criteria

You've successfully completed the challenge when:

1. **Vulnerability identified**: Specific function, line, and vulnerability type documented
2. **Memory layout understood**: Buffer sizes, offsets, adjacent data mapped
3. **Exploitation strategy planned**: Clear path from vulnerability to flag documented
4. **Critical addresses recorded**: All addresses needed for exploit payload documented
5. **Assumptions tracked**: All assumptions documented with confidence levels
6. **Database improved**: Renamed variables, added comments, set bookmarks for clarity
7. **Exploit plan ready**: Sufficient information to write exploit code outside Ghidra

**Return to user:**
- Vulnerability description with evidence
- Exploitation approach explanation
- Critical addresses and offsets
- Payload structure plan
- Assumptions and verification needs
- Follow-up tasks if needed (e.g., "Test exploit against binary")

## Anti-Patterns

**Don't**:
- Assume vulnerability without evidence (check buffer sizes!)
- Forget about protections (canaries, NX, ASLR, PIE)
- Overlook input restrictions (bad bytes, size limits)
- Get stuck on one approach (try different exploitation techniques)
- Ignore calling conventions (x86 vs x64 argument passing)
- Forget null byte termination (C string functions)

**Do**:
- Verify buffer sizes from decompilation
- Check for stack canaries: `__stack_chk_fail` references
- Calculate offsets precisely (buffer to return address)
- Document all assumptions with `set-bookmark` type="Warning"
- Adapt exploitation technique to protections present
- Think creatively (chain primitives, use unconventional targets)

## Remember

Binary exploitation is **creative problem-solving**:
- Understand **why** vulnerabilities exist (unsafe assumptions)
- Think **how** to manipulate memory (data flow analysis)
- Plan **what** to overwrite (control flow, data, pointers)
- Determine **where** to redirect (existing code, injected code, ROP)
- Execute **step-by-step** (leak, calculate, overwrite, trigger)

Every CTF challenge is different. Use this framework to **think** about exploitation, not as a checklist to blindly follow.

**Your goal**: Document enough information in Ghidra to write the exploit script. The actual exploitation happens outside, but the analysis happens here.