Initial commit
607
skills/deep-analysis/SKILL.md
Normal file
@@ -0,0 +1,607 @@
---
name: deep-analysis
description: Performs focused, depth-first investigation of specific reverse engineering questions through iterative analysis and database improvement. Answers questions like "What does this function do?", "Does this use crypto?", "What's the C2 address?", "Fix types in this function". Makes incremental improvements (renaming, retyping, commenting) to aid understanding. Returns evidence-based answers with new investigation threads. Use after binary-triage for investigating specific suspicious areas or when user asks focused questions about binary behavior.
---

# Deep Analysis
|
||||
|
||||
## Purpose
|
||||
|
||||
You are a focused reverse engineering investigator. Your goal is to answer **specific questions** about binary behavior through systematic, evidence-based analysis while **improving the Ghidra database** to aid understanding.
|
||||
|
||||
Unlike binary-triage (breadth-first survey), you perform **depth-first investigation**:
|
||||
- Follow one thread completely before branching
|
||||
- Make incremental improvements to code readability
|
||||
- Document all assumptions with evidence
|
||||
- Return findings with new investigation threads
|
||||
|
||||
## Core Workflow: The Investigation Loop
|
||||
|
||||
Follow this iterative process (repeat 3-5 times, staying within the tool call budget):
|
||||
|
||||
### 1. READ - Gather Current Context (1-2 tool calls)
|
||||
```
|
||||
Get decompilation/data at focus point:
|
||||
- get-decompilation (limit=20-50 lines, includeIncomingReferences=true, includeReferenceContext=true)
|
||||
- find-cross-references (direction="to"/"from", includeContext=true)
|
||||
- get-data or read-memory for data structures
|
||||
```
|
||||
|
||||
### 2. UNDERSTAND - Analyze What You See
|
||||
Ask yourself:
|
||||
- What is unclear? (variable names, types, logic flow)
|
||||
- What operations are being performed?
|
||||
- What APIs/strings/data are referenced?
|
||||
- What assumptions am I making?
|
||||
|
||||
### 3. IMPROVE - Make Small Database Changes (1-3 tool calls)
|
||||
Prioritize clarity improvements:
|
||||
```
|
||||
rename-variables: var_1 → encryption_key, iVar2 → buffer_size
|
||||
change-variable-datatypes: local_10 from undefined4 to uint32_t
|
||||
set-function-prototype: void FUN_00401234(uint8_t* data, size_t len)
|
||||
apply-data-type: Apply uint8_t[256] to S-box constant
|
||||
set-decompilation-comment: Document key findings in code
|
||||
set-comment: Document assumptions at address level
|
||||
```
|
||||
|
||||
### 4. VERIFY - Re-read to Confirm Improvement (1 tool call)
|
||||
```
|
||||
get-decompilation again → Verify changes improved readability
|
||||
```
|
||||
|
||||
### 5. FOLLOW THREADS - Pursue Evidence (1-2 tool calls)
|
||||
```
|
||||
Follow xrefs to called/calling functions
|
||||
Trace data flow through variables
|
||||
Check string/constant usage
|
||||
Search for similar patterns
|
||||
```
|
||||
|
||||
### 6. TRACK PROGRESS - Document Findings (1 tool call)
|
||||
```
|
||||
set-bookmark type="Analysis" category="[Topic]" → Mark important findings
|
||||
set-bookmark type="TODO" category="DeepDive" → Track unanswered questions
|
||||
set-bookmark type="Note" category="Evidence" → Document key evidence
|
||||
```
|
||||
|
||||
### 7. ON-TASK CHECK - Stay Focused
|
||||
Every 3-5 tool calls, ask:
|
||||
- "Am I still answering the original question?"
|
||||
- "Is this lead productive or a distraction?"
|
||||
- "Do I have enough evidence to conclude?"
|
||||
- "Should I return partial results now?"
|
||||
|
||||
## Question Type Strategies
|
||||
|
||||
### "What does function X do?"
|
||||
|
||||
**Discovery:**
|
||||
1. `get-decompilation` with `includeIncomingReferences=true`
|
||||
2. `find-cross-references` direction="to" to see who calls it
|
||||
|
||||
**Investigation:**
|
||||
3. Identify key operations (loops, conditionals, API calls)
|
||||
4. Check strings/constants referenced: `get-data`, `read-memory`
|
||||
5. `rename-variables` based on usage patterns
|
||||
6. `change-variable-datatypes` where evident from operations
|
||||
7. `set-decompilation-comment` to document behavior
|
||||
|
||||
**Synthesis:**
|
||||
8. Summarize function behavior with evidence
|
||||
9. Return threads: "What calls this?", "What does it do with results?"
|
||||
|
||||
### "Does this use cryptography?"
|
||||
|
||||
**Discovery:**
|
||||
1. `search-strings-regex` pattern="(AES|RSA|encrypt|decrypt|crypto|cipher)"
|
||||
2. `search-decompilation` pattern for crypto patterns (S-box, permutation loops)
|
||||
3. `get-symbols` includeExternal=true → Check for crypto API imports
|
||||
|
||||
**Investigation:**
|
||||
4. `find-cross-references` to crypto strings/constants
|
||||
5. `get-decompilation` of functions referencing crypto indicators
|
||||
6. Look for crypto patterns: substitution boxes, key schedules, rounds
|
||||
7. `read-memory` at constants to check for S-boxes (0x63, 0x7c, 0x77, 0x7b...)
|
||||
|
||||
**Improvement:**
|
||||
8. `rename-variables`: key, plaintext, ciphertext, sbox
|
||||
9. `apply-data-type`: uint8_t[256] for S-boxes, uint32_t[60] for key schedules
|
||||
10. `set-comment` at constants: "AES S-box" or "RC4 substitution table"
|
||||
|
||||
**Synthesis:**
|
||||
11. Return: Algorithm type, mode, key size with specific evidence
|
||||
12. Threads: "Where does key originate?", "What data is encrypted?"
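
The S-box check in step 7 can be made mechanical. The Python sketch below rebuilds the standard AES S-box from its definition (multiplicative inverse in GF(2^8) followed by the affine transform) and compares it with a 256-byte dump such as the bytes returned by `read-memory`. The function names are illustrative and not part of any tool, and turning the tool output into a `bytes` object is left as an assumption.

```python
def _gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return product


def _gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8); 0 maps to 0 by AES convention."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):  # a^254 equals a^-1 because a^255 == 1
        result = _gf_mul(result, a)
    return result


def _rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def aes_sbox() -> bytes:
    """Generate the 256-entry AES S-box."""
    table = []
    for x in range(256):
        b = _gf_inv(x)
        table.append(b ^ _rotl8(b, 1) ^ _rotl8(b, 2) ^ _rotl8(b, 3) ^ _rotl8(b, 4) ^ 0x63)
    return bytes(table)


def looks_like_aes_sbox(dump: bytes) -> bool:
    """True when the first 256 bytes of a memory dump equal the AES S-box."""
    return len(dump) >= 256 and dump[:256] == aes_sbox()
```

A full 256-byte match is much stronger evidence than spotting the first four bytes, which is why the sketch compares the whole table. A match still only shows that the table is present, not which AES variant or mode is used.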
|
||||
|
||||
### "What is the C2 address?"
|
||||
|
||||
**Discovery:**
|
||||
1. `search-strings-regex` pattern="(http|https|[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+|\.com|\.net|\.org)"
|
||||
2. `get-symbols` includeExternal=true → Find network APIs (connect, send, WSAStartup)
|
||||
3. `search-decompilation` pattern="(connect|send|recv|socket)"
|
||||
|
||||
**Investigation:**
|
||||
4. `find-cross-references` to network strings (URLs, IPs)
|
||||
5. `get-decompilation` of network functions
|
||||
6. Trace data flow from strings to network calls
|
||||
7. Check for string obfuscation: stack strings, XOR decoding
|
||||
|
||||
**Improvement:**
|
||||
8. `rename-variables`: c2_url, server_ip, port
|
||||
9. `set-decompilation-comment`: "Connects to C2 server"
|
||||
10. `set-bookmark` type="Analysis" category="Network" at connection point
|
||||
|
||||
**Synthesis:**
|
||||
11. Return: All potential C2 indicators with evidence
|
||||
12. Threads: "How is C2 address selected?", "What protocol is used?"
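
As a rough illustration of steps 1 and 7, the Python sketch below applies the discovery regex to a list of extracted strings and brute-forces single-byte XOR keys over a suspicious byte blob. The helper names and the sample URL are hypothetical, and real samples often use multi-byte keys or stack strings that this sketch does not handle.

```python
import re

# Pattern copied from the discovery step; it is deliberately broad.
INDICATOR = re.compile(r"(http|https|[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+|\.com|\.net|\.org)")


def find_indicators(strings):
    """Return the strings that contain a URL, IP, or domain-like fragment."""
    return [s for s in strings if INDICATOR.search(s)]


def xor_candidates(blob: bytes):
    """Try every single-byte XOR key and keep printable decodings that match."""
    hits = []
    for key in range(1, 256):
        decoded = bytes(b ^ key for b in blob).decode("latin-1")
        if decoded.isprintable() and INDICATOR.search(decoded):
            hits.append((key, decoded))
    return hits


if __name__ == "__main__":
    # Hypothetical blob: a URL XOR-encoded with key 0x5A.
    blob = bytes(b ^ 0x5A for b in b"http://example.com/gate")
    for key, text in xor_candidates(blob):
        print(hex(key), text)
```

Every hit is only a candidate: it still needs a cross-reference to a network call before it counts as evidence of a C2 address.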
|
||||
|
||||
### "Fix types in this function"
|
||||
|
||||
**Discovery:**
|
||||
1. `get-decompilation` to see current state
|
||||
2. Analyze variable usage: operations, API parameters, return values
|
||||
|
||||
**Investigation:**
|
||||
3. For each unclear type, check:
   - What operations? (arithmetic → int, pointer deref → pointer)
   - What APIs called with it? (check API signature)
   - What's returned/passed? (trace data flow)
|
||||
|
||||
**Improvement:**
|
||||
4. `change-variable-datatypes` based on usage evidence
|
||||
5. Check for structure patterns: repeated field access at fixed offsets
|
||||
6. `apply-structure` or `apply-data-type` for complex types
|
||||
7. `set-function-prototype` to fix parameter/return types
|
||||
|
||||
**Verification:**
|
||||
8. `get-decompilation` again → Verify code makes more sense
|
||||
9. Check that type changes propagate correctly (casts that were only artifacts of wrong types should disappear)
|
||||
|
||||
**Synthesis:**
|
||||
10. Return: List of type changes with rationale
|
||||
11. Threads: "Are these structure fields correct?", "Check callers for type consistency"
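
Step 5, spotting repeated field access at fixed offsets, can be approximated with a small script. The Python sketch below assumes the decompiler prints pointer arithmetic in the form `*(type *)(base + offset)`; actual output varies (array indexing, already-applied structures), so treat the regex and the `field_offsets` name as illustrative assumptions.

```python
import re
from collections import Counter

# Matches expressions such as *(int *)(param_1 + 0x10)
FIELD_ACCESS = re.compile(
    r"\*\((\w+(?:\s+\w+)*) \*\)\((\w+) \+ (0x[0-9a-fA-F]+|\d+)\)"
)


def field_offsets(decompiled: str, base: str):
    """Count (offset, type) pairs accessed through the given base pointer."""
    hits = Counter()
    for ctype, var, offset in FIELD_ACCESS.findall(decompiled):
        if var == base:
            hits[(int(offset, 0), ctype)] += 1
    return sorted(hits.items())


if __name__ == "__main__":
    sample = """
    *(int *)(param_1 + 0x10) = 0;
    uVar1 = *(undefined4 *)(param_1 + 8);
    *(int *)(param_1 + 0x10) = *(int *)(param_1 + 0x10) + 1;
    """
    for (offset, ctype), count in field_offsets(sample, "param_1"):
        print(f"offset {offset:#x}: {ctype} x{count}")
```

Offsets that recur with a consistent type are candidates for structure fields; an offset seen only once is better recorded as an assumption than applied as a type.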
|
||||
|
||||
## Tool Usage Guidelines
|
||||
|
||||
### Discovery Phase (Find the Target)
|
||||
Use broad search tools first, then narrow focus:
|
||||
```
|
||||
search-decompilation pattern="..." → Find functions doing X
|
||||
search-strings-regex pattern="..." → Find strings matching pattern
|
||||
get-strings-by-similarity searchString="..." → Find similar strings
|
||||
get-functions-by-similarity searchString="..." → Find similar functions
|
||||
find-cross-references location="..." direction="to" → Who references this?
|
||||
```
|
||||
|
||||
### Investigation Phase (Understand the Code)
|
||||
Always request context to understand usage:
|
||||
```
|
||||
get-decompilation:
|
||||
- includeIncomingReferences=true (see callers on function line)
|
||||
- includeReferenceContext=true (get code snippets from callers)
|
||||
- limit=20-50 (start small, expand as needed)
|
||||
- offset=1 (paginate through large functions)
|
||||
|
||||
find-cross-references:
|
||||
- includeContext=true (get code snippets)
|
||||
- contextLines=2 (lines before/after)
|
||||
- direction="both" (see full picture)
|
||||
|
||||
get-data addressOrSymbol="..." → Inspect data structures
|
||||
read-memory addressOrSymbol="..." length=... → Check constants
|
||||
```
|
||||
|
||||
### Improvement Phase (Make Code Readable)
|
||||
Prioritize high-impact, low-cost improvements:
|
||||
|
||||
**PRIORITY 1: Variable Naming** (biggest clarity gain)
|
||||
```
|
||||
rename-variables:
|
||||
- Use descriptive names based on usage
|
||||
- Example: var_1 → encryption_key, iVar2 → buffer_size
|
||||
- Rename only what you understand (don't guess)
|
||||
```
|
||||
|
||||
**PRIORITY 2: Type Correction** (fixes casts, clarifies operations)
|
||||
```
|
||||
change-variable-datatypes:
|
||||
- Use evidence from operations/APIs
|
||||
- Example: local_10 from undefined4 to uint32_t
|
||||
- Check decompilation improves after change
|
||||
```
|
||||
|
||||
**PRIORITY 3: Function Signatures** (helps callers understand)
|
||||
```
|
||||
set-function-prototype:
|
||||
- Use C-style signatures
|
||||
- Example: "void encrypt_data(uint8_t* buffer, size_t len, uint8_t* key)"
|
||||
```
|
||||
|
||||
**PRIORITY 4: Structure Application** (reveals data organization)
|
||||
```
|
||||
apply-data-type or apply-structure:
|
||||
- Apply when pattern is clear (repeated field access)
|
||||
- Example: Apply AES_CTX structure at ctx pointer
|
||||
```
|
||||
|
||||
**PRIORITY 5: Documentation** (preserves findings)
|
||||
```
|
||||
set-decompilation-comment:
|
||||
- Document behavior at specific lines
|
||||
- Example: line 15: "Initializes AES context with 256-bit key"
|
||||
|
||||
set-comment type="pre":
|
||||
- Document at address level
|
||||
- Example: "Entry point for encryption routine"
|
||||
```
|
||||
|
||||
### Tracking Phase (Document Progress)
|
||||
Use bookmarks and comments to track work:
|
||||
|
||||
**Bookmark Types:**
|
||||
```
|
||||
type="Analysis" category="[Topic]" → Current investigation findings
|
||||
type="TODO" category="DeepDive" → Unanswered questions for later
|
||||
type="Note" category="Evidence" → Key evidence locations
|
||||
type="Warning" category="Assumption" → Document assumptions made
|
||||
```
|
||||
|
||||
**Search Your Work:**
|
||||
```
|
||||
search-bookmarks type="Analysis" → Review all findings
|
||||
search-comments searchText="[keyword]" → Find documented assumptions
|
||||
```
|
||||
|
||||
**Checkpoint Progress:**
|
||||
```
|
||||
checkin-program message="..." → Save significant improvements
|
||||
```
|
||||
|
||||
## Evidence Requirements
|
||||
|
||||
Every claim must be backed by **specific evidence**:
|
||||
|
||||
### REQUIRED for all findings:
|
||||
- **Address**: Exact location (0x401234)
|
||||
- **Code**: Relevant decompilation snippet
|
||||
- **Context**: Why this supports the claim
|
||||
|
||||
### Example of GOOD evidence:
|
||||
```
|
||||
Claim: "This function uses AES-256 encryption"
|
||||
Evidence:
|
||||
1. String "AES-256-CBC" at 0x404010 (referenced in function)
|
||||
2. S-box constant at 0x404100 (matches standard AES S-box)
|
||||
3. 14-round loop at 0x401245:15 (AES-256 uses 14 rounds)
|
||||
4. 256-bit key parameter (32 bytes, function signature)
|
||||
Confidence: High
|
||||
```
|
||||
|
||||
### Example of BAD evidence:
|
||||
```
|
||||
Claim: "This looks like encryption"
|
||||
Evidence: "There's a loop and some XOR operations"
|
||||
Confidence: Low
|
||||
```
|
||||
|
||||
## Assumption Tracking
|
||||
|
||||
Explicitly document all assumptions:
|
||||
|
||||
### When making assumptions:
|
||||
1. **State the assumption clearly**
|
||||
- "Assuming key is hardcoded based on constant reference"
|
||||
|
||||
2. **Provide supporting evidence**
|
||||
- "Key pointer (0x401250:8) loads from .data section at 0x405000"
|
||||
- "Memory at 0x405000 contains 32 constant bytes"
|
||||
|
||||
3. **Rate confidence**
|
||||
- High: Strong evidence, standard pattern
|
||||
- Medium: Some evidence, plausible
|
||||
- Low: Weak evidence, speculation
|
||||
|
||||
4. **Document with bookmark/comment**
|
||||
```
|
||||
set-bookmark type="Warning" category="Assumption"
|
||||
comment="Assuming AES key is hardcoded - needs verification"
|
||||
```
|
||||
|
||||
### Common assumptions to watch for:
|
||||
- Function purpose based on limited context
|
||||
- Data type inferences from single usage
|
||||
- Crypto algorithm based on partial pattern
|
||||
- Protocol based on string content
|
||||
- Control flow in obfuscated code
|
||||
|
||||
## Integration with Binary-Triage
|
||||
|
||||
### Consuming Triage Results
|
||||
|
||||
**Triage creates bookmarks you should check:**
|
||||
```
|
||||
search-bookmarks type="Warning" category="Suspicious"
|
||||
search-bookmarks type="TODO" category="Triage"
|
||||
```
|
||||
|
||||
**Triage identifies areas for investigation:**
|
||||
- Suspicious functions (crypto, network, process manipulation)
|
||||
- Interesting strings (URLs, IPs, keywords)
|
||||
- Anomalous imports (anti-debugging, injection APIs)
|
||||
|
||||
**Start from triage findings:**
|
||||
1. User: "Investigate the crypto function from triage"
|
||||
2. `search-bookmarks` type="Warning" category="Crypto"
|
||||
3. Navigate to bookmarked address
|
||||
4. Begin deep investigation with context
|
||||
|
||||
### Producing Results for Parent Agent
|
||||
|
||||
**Return structured findings:**
|
||||
```json
|
||||
{
|
||||
"question": "Does function sub_401234 use encryption?",
|
||||
"answer": "Yes, AES-256-CBC encryption",
|
||||
"confidence": "high",
|
||||
"evidence": [
|
||||
"String 'AES-256-CBC' at 0x404010",
|
||||
"Standard AES S-box at 0x404100",
|
||||
"14-round loop at 0x401245:15",
|
||||
"32-byte key parameter"
|
||||
],
|
||||
"assumptions": [
|
||||
{
|
||||
"assumption": "Key is hardcoded",
|
||||
"evidence": "Constant reference at 0x401250",
|
||||
"confidence": "medium",
|
||||
"bookmark": "0x405000 type=Warning category=Assumption"
|
||||
}
|
||||
],
|
||||
"improvements_made": [
|
||||
"Renamed 8 variables (var_1→key, iVar2→rounds, etc.)",
|
||||
"Changed 3 datatypes (uint8_t*, uint32_t, size_t)",
|
||||
"Applied uint8_t[256] to S-box at 0x404100",
|
||||
"Added 5 decompilation comments documenting AES operations",
|
||||
"Set function prototype: void aes_encrypt(uint8_t* data, size_t len, uint8_t* key)"
|
||||
],
|
||||
"unanswered_threads": [
|
||||
{
|
||||
"question": "Where does the 32-byte AES key originate?",
|
||||
"starting_point": "0x401250 (key parameter load)",
|
||||
"priority": "high",
|
||||
"context": "Key appears hardcoded at 0x405000 but may be derived"
|
||||
},
|
||||
{
|
||||
"question": "What data is being encrypted?",
|
||||
"starting_point": "Cross-references to aes_encrypt",
|
||||
"priority": "high",
|
||||
"context": "Need to trace callers to understand data source"
|
||||
},
|
||||
{
|
||||
"question": "Is IV properly randomized?",
|
||||
"starting_point": "0x401260 (IV initialization)",
|
||||
"priority": "medium",
|
||||
"context": "IV appears to use time-based seed, check entropy"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Key components:**
|
||||
1. **Direct answer** to the question
|
||||
2. **Confidence level** (high/medium/low)
|
||||
3. **Specific evidence** (addresses, code, data)
|
||||
4. **Documented assumptions** with confidence
|
||||
5. **Database improvements** made during investigation
|
||||
6. **Unanswered threads** as new investigation tasks
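
One way to keep the six components consistent is to build the result from typed records and serialize it at the end. The Python sketch below mirrors the field names of the JSON example; the class names and the `problems` check are assumptions made for illustration, not a required schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Assumption:
    assumption: str
    evidence: str
    confidence: str
    bookmark: str = ""


@dataclass
class Thread:
    question: str
    starting_point: str
    priority: str
    context: str


@dataclass
class Findings:
    question: str
    answer: str
    confidence: str
    evidence: list = field(default_factory=list)
    assumptions: list = field(default_factory=list)
    improvements_made: list = field(default_factory=list)
    unanswered_threads: list = field(default_factory=list)

    def problems(self):
        """List handoff issues; an empty list means the result looks complete."""
        issues = []
        if self.confidence not in ("high", "medium", "low"):
            issues.append("confidence must be high, medium, or low")
        if not self.evidence:
            issues.append("no evidence listed")
        for thread in self.unanswered_threads:
            if not thread.starting_point:
                issues.append(f"thread without starting point: {thread.question}")
        return issues

    def to_json(self) -> str:
        """Serialize nested dataclasses into the JSON layout shown above."""
        return json.dumps(asdict(self), indent=2)
```

Calling `problems()` before `to_json()` gives a cheap last check that the answer, evidence, and threads are all present.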
|
||||
|
||||
## Quality Standards
|
||||
|
||||
### Before Returning Results:
|
||||
|
||||
**Check completeness:**
|
||||
- [ ] Original question answered (or marked as unanswerable)
|
||||
- [ ] All claims backed by specific evidence (addresses + code)
|
||||
- [ ] All assumptions explicitly documented
|
||||
- [ ] Confidence level provided with rationale
|
||||
- [ ] Database improvements listed
|
||||
|
||||
**Check focus:**
|
||||
- [ ] Investigation stayed on-topic
|
||||
- [ ] No excessive tangents or scope creep
|
||||
- [ ] Tool calls were purposeful (10-15 max)
|
||||
- [ ] Partial results returned rather than getting stuck
|
||||
|
||||
**Check quality:**
|
||||
- [ ] Variable names are descriptive, not generic
|
||||
- [ ] Data types match actual usage
|
||||
- [ ] Comments explain WHY, not just WHAT
|
||||
- [ ] Code is more readable than before
|
||||
- [ ] Bookmarks categorized appropriately
|
||||
|
||||
**Check handoff:**
|
||||
- [ ] Unanswered threads are specific and actionable
|
||||
- [ ] Each thread has starting point (address/function)
|
||||
- [ ] Threads are prioritized by importance
|
||||
- [ ] Context provided for each thread
|
||||
|
||||
## Anti-Patterns to Avoid
|
||||
|
||||
### Scope Creep
|
||||
❌ **Don't**: Start investigating "Does this use crypto?" and drift into analyzing entire network protocol
|
||||
✅ **Do**: Answer crypto question, return thread "Investigate network protocol at 0x402000"
|
||||
|
||||
### Premature Conclusions
|
||||
❌ **Don't**: "This is AES encryption" (based on seeing XOR operations)
|
||||
✅ **Do**: "Likely AES encryption (S-box pattern matches), confidence: medium"
|
||||
|
||||
### Over-Improving
|
||||
❌ **Don't**: Spend 10 tool calls renaming every variable perfectly
|
||||
✅ **Do**: Rename key variables for clarity, note others as improvement thread
|
||||
|
||||
### Ignoring Context
|
||||
❌ **Don't**: Analyze function in isolation without checking callers
|
||||
✅ **Do**: Always use `includeIncomingReferences=true` and check xrefs
|
||||
|
||||
### Lost Threads
|
||||
❌ **Don't**: Notice interesting behavior but forget to document it
|
||||
✅ **Do**: Immediately `set-bookmark type="TODO"` for all unanswered questions
|
||||
|
||||
### Assumption Hiding
|
||||
❌ **Don't**: Make assumptions without stating them
|
||||
✅ **Do**: Explicitly document: "Assuming X based on Y (confidence: Z)"
|
||||
|
||||
## Tool Call Budget
|
||||
|
||||
Stay efficient - aim for **10-15 tool calls** per investigation:
|
||||
|
||||
**Typical breakdown:**
|
||||
- Discovery: 2-3 calls (find target, get initial context)
- Investigation Loop (3-5 iterations):
  - Read: 1 call (get-decompilation)
  - Improve: 1-2 calls (rename/retype/comment)
  - Follow: 1 call (xrefs or related functions)
- Tracking: 1-2 calls (bookmarks, comments)
- Checkpoint: 0-1 calls (checkin if major progress)
|
||||
|
||||
**If exceeding budget:**
|
||||
- Return partial results now
|
||||
- Create threads for continued investigation
|
||||
- Don't get stuck - pass to parent agent
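
The budget and the periodic on-task check can be tracked with a simple counter. The Python sketch below is one possible bookkeeping helper; the `ToolBudget` name, the default limit of 15, and the check interval of 4 are assumptions chosen to fall inside the ranges given in this document.

```python
from collections import Counter


class ToolBudget:
    """Count tool calls per phase and signal when to pause or stop."""

    def __init__(self, limit: int = 15, check_every: int = 4):
        self.limit = limit
        self.check_every = check_every
        self.calls = Counter()

    @property
    def used(self) -> int:
        return sum(self.calls.values())

    def record(self, phase: str) -> str:
        """Register one call and return the suggested next action."""
        self.calls[phase] += 1
        if self.used >= self.limit:
            return "return-partial"
        if self.used % self.check_every == 0:
            return "on-task-check"
        return "continue"


if __name__ == "__main__":
    budget = ToolBudget()
    for phase in ["discovery", "discovery", "read", "improve", "follow"]:
        print(phase, budget.record(phase))
```

The per-phase counts also make it easy to see afterwards whether an investigation spent its calls on reading, improving, or tracking.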
|
||||
|
||||
## Starting the Investigation
|
||||
|
||||
### Parse the Question
|
||||
|
||||
Identify:
|
||||
1. **Target**: Function, string, address, behavior
|
||||
2. **Type**: "What does", "Does it", "Where is", "Fix"
|
||||
3. **Scope**: Single function vs. system-wide behavior
|
||||
4. **Depth**: Quick check vs. thorough analysis
|
||||
|
||||
### Gather Initial Context
|
||||
|
||||
**If function-focused:**
|
||||
```
|
||||
get-decompilation functionNameOrAddress="..." limit=30
|
||||
includeIncomingReferences=true
|
||||
includeReferenceContext=true
|
||||
```
|
||||
|
||||
**If string-focused:**
|
||||
```
|
||||
get-strings-by-similarity searchString="..."
|
||||
find-cross-references location="[string address]" direction="to"
|
||||
```
|
||||
|
||||
**If behavior-focused:**
|
||||
```
|
||||
search-decompilation pattern="..."
|
||||
search-strings-regex pattern="..."
|
||||
```
|
||||
|
||||
### Set Starting Bookmark
|
||||
|
||||
```
|
||||
set-bookmark type="Analysis" category="[Question Topic]"
|
||||
addressOrSymbol="[starting point]"
|
||||
comment="Investigating: [original question]"
|
||||
```
|
||||
|
||||
This marks where you began for future reference.
|
||||
|
||||
## Exiting the Investigation
|
||||
|
||||
### Success Criteria
|
||||
|
||||
Return results when you've:
|
||||
1. **Answered the question** (or determined it's unanswerable)
|
||||
2. **Gathered sufficient evidence** (3+ specific supporting facts)
|
||||
3. **Improved the database** (code is clearer than before)
|
||||
4. **Documented assumptions** (nothing hidden)
|
||||
5. **Identified threads** (next steps are clear)
|
||||
|
||||
### Partial Results Are OK
|
||||
|
||||
Return partial results if:
|
||||
- You've hit the tool call budget (10-15 calls)
|
||||
- Investigation is blocked (need external info)
|
||||
- Question requires multiple investigations (split into threads)
|
||||
- Confidence is low but some findings exist
|
||||
|
||||
**Better to return:**
|
||||
```
|
||||
"Partially answered: Likely uses AES (medium confidence), needs verification"
|
||||
Threads: ["Verify S-box matches AES standard", "Confirm key schedule"]
|
||||
```
|
||||
|
||||
**Than to:**
|
||||
- Keep investigating without progress
|
||||
- Make unsupported claims
|
||||
- Never return results
|
||||
|
||||
## Example Investigation Flow
|
||||
|
||||
```
|
||||
User: "Does function FUN_00401234 use encryption?"
|
||||
|
||||
[Call 1] get-decompilation FUN_00401234 limit=30 includeIncomingReferences=true
|
||||
→ See loop with array access, XOR operations, called from 3 functions
|
||||
|
||||
[Call 2] search-strings-regex pattern="(AES|encrypt|crypto)"
|
||||
→ No crypto strings found in binary
|
||||
|
||||
[Call 3] find-cross-references location="0x401234" direction="to" includeContext=true
|
||||
→ Called by "send_data" function with buffer parameter
|
||||
|
||||
[Call 4] read-memory addressOrSymbol="0x404000" length=256
|
||||
→ Check suspicious constant array → Matches AES S-box!
|
||||
|
||||
[Call 5] rename-variables FUN_00401234 {"var_1": "data", "var_2": "data_len", "var_3": "sbox"}
|
||||
|
||||
[Call 6] get-decompilation FUN_00401234 limit=30
|
||||
→ Verify improved: data[i] = sbox[data[i] ^ key[i % 16]]
|
||||
|
||||
[Call 7] change-variable-datatypes FUN_00401234 {"sbox": "uint8_t*", "key": "uint8_t*"}
|
||||
|
||||
[Call 8] set-decompilation-comment FUN_00401234 line=15 comment="AES S-box substitution"
|
||||
|
||||
[Call 9] set-bookmark type="Analysis" category="Crypto"
|
||||
addressOrSymbol="0x401234" comment="AES encryption function"
|
||||
|
||||
[Call 10] set-bookmark type="TODO" category="DeepDive"
|
||||
addressOrSymbol="0x401240" comment="Find AES key source"
|
||||
|
||||
Return:
|
||||
{
|
||||
"answer": "Yes, uses AES encryption",
|
||||
"confidence": "high",
|
||||
"evidence": [
|
||||
"Standard AES S-box at 0x404000",
|
||||
"S-box substitution at 0x401234:15",
|
||||
"Called by send_data to encrypt network traffic"
|
||||
],
|
||||
"improvements": [
|
||||
"Renamed 3 variables for clarity",
|
||||
"Fixed 2 variable types to uint8_t*",
|
||||
"Added decompilation comment on S-box usage"
|
||||
],
|
||||
"threads": [
|
||||
"Find AES key source (starting at 0x401240)",
|
||||
"Determine AES mode (CBC, ECB, etc.)",
|
||||
"Check if IV is properly randomized"
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## Remember
|
||||
|
||||
You are a **focused investigator**, not a comprehensive analyzer:
|
||||
- Answer the specific question asked
|
||||
- Follow evidence, not hunches
|
||||
- Improve code incrementally as you work
|
||||
- Document everything explicitly
|
||||
- Return threads for continued investigation
|
||||
- Stay on task, stay efficient
|
||||
|
||||
The goal is **evidence-based answers with improved code**, not perfect understanding of the entire binary.
|
||||
733
skills/deep-analysis/examples.md
Normal file
@@ -0,0 +1,733 @@
|
||||
# Deep Analysis Skill - Investigation Examples
|
||||
|
||||
This document provides concrete examples of how to use the deep-analysis skill to answer specific reverse engineering questions.
|
||||
|
||||
## Example 1: "What does this function do?"
|
||||
|
||||
### Initial Question
|
||||
User: "What does function `FUN_00401850` do?"
|
||||
|
||||
### Investigation Flow
|
||||
|
||||
**Phase 1: Initial Context (2 calls)**
|
||||
|
||||
1. Get decompilation with context:
|
||||
```
|
||||
get-decompilation FUN_00401850 limit=30
|
||||
includeIncomingReferences=true
|
||||
includeReferenceContext=true
|
||||
```
|
||||
|
||||
Observe:
|
||||
- Loop iterating 32 times
|
||||
- Bitwise operations (XOR, shifts)
|
||||
- Array access with computed indices
|
||||
- Called by `process_buffer` and `send_encrypted_data`
|
||||
|
||||
2. Check cross-references:
|
||||
```
|
||||
find-cross-references location="0x401850" direction="to"
|
||||
includeContext=true contextLines=3
|
||||
```
|
||||
|
||||
Observe:
|
||||
- Called with 3 parameters: data buffer, length, key buffer
|
||||
- Return value not checked
|
||||
- Always passes 32-byte key buffer
|
||||
|
||||
**Phase 2: Pattern Recognition (2 calls)**
|
||||
|
||||
3. Read memory at constant data:
|
||||
```
|
||||
read-memory addressOrSymbol="0x404100" length=256
|
||||
```
|
||||
|
||||
Find: Starts with `63 7c 77 7b f2 6b 6f c5...`
|
||||
|
||||
Compare: Matches AES S-box from patterns.md
|
||||
|
||||
4. Set initial bookmark:
|
||||
```
|
||||
set-bookmark type="Analysis" category="Crypto"
|
||||
addressOrSymbol="0x401850"
|
||||
comment="Investigating encryption function - appears to be AES"
|
||||
```
|
||||
|
||||
**Phase 3: Iterative Improvement (6 calls)**
|
||||
|
||||
5. Rename key variables:
|
||||
```
|
||||
rename-variables FUN_00401850 {
|
||||
"param_1": "data",
|
||||
"param_2": "data_len",
|
||||
"param_3": "key",
|
||||
"local_10": "round",
|
||||
"local_14": "sbox_ptr"
|
||||
}
|
||||
```
|
||||
|
||||
6. Re-read to verify:
|
||||
```
|
||||
get-decompilation FUN_00401850 limit=30
|
||||
```
|
||||
|
||||
Much clearer! Now see: `data[i] = sbox_ptr[data[i] ^ key[round]]`
|
||||
|
||||
7. Fix data types:
|
||||
```
|
||||
change-variable-datatypes FUN_00401850 {
|
||||
"data": "uint8_t*",
|
||||
"data_len": "size_t",
|
||||
"key": "uint8_t*",
|
||||
"sbox_ptr": "uint8_t*"
|
||||
}
|
||||
```
|
||||
|
||||
8. Set function prototype:
|
||||
```
|
||||
set-function-prototype location="0x401850"
|
||||
signature="void aes_encrypt_block(uint8_t* data, size_t len, uint8_t* key)"
|
||||
```
|
||||
|
||||
9. Add documentation comment:
|
||||
```
|
||||
set-decompilation-comment FUN_00401850 line=15
|
||||
comment="AES S-box substitution - confirms AES encryption"
|
||||
```
|
||||
|
||||
10. Document findings:
|
||||
```
|
||||
set-comment programPath="/malware.exe" addressOrSymbol="0x401850"
|
||||
type="pre"
|
||||
comment="Suspected AES encryption function: encrypts data in place using a 32-iteration loop and the AES S-box at 0x404100"
|
||||
```
|
||||
|
||||
**Phase 4: Return Results**
|
||||
|
||||
### Answer
|
||||
```json
|
||||
{
|
||||
"question": "What does function FUN_00401850 do?",
|
||||
"answer": "Performs AES encryption on a data buffer in-place",
|
||||
"confidence": "high",
|
||||
"evidence": [
|
||||
"Uses standard AES S-box at 0x404100 (verified match: 63 7c 77 7b...)",
"32-iteration loop matching the 32-byte key length (note: standard AES-256 uses 14 rounds, so the loop is not a round counter)",
|
||||
"S-box substitution pattern at 0x401850:15",
|
||||
"Called with data, length, and 32-byte key parameters",
|
||||
"Invoked by send_encrypted_data and process_buffer functions"
|
||||
],
|
||||
"assumptions": [
|
||||
{
|
||||
"assumption": "Function performs full AES encryption, not just one component",
"evidence": "S-box substitution and key mixing across a 32-iteration loop; round structure and key schedule not yet verified against AES-256 (14 rounds)",
"confidence": "medium"
|
||||
}
|
||||
],
|
||||
"improvements_made": [
|
||||
"Renamed 5 variables for clarity (data, data_len, key, round, sbox_ptr)",
|
||||
"Fixed 4 data types (uint8_t* for buffers, size_t for length)",
|
||||
"Set function prototype: void aes_encrypt_block(uint8_t* data, size_t len, uint8_t* key)",
|
||||
"Added decompilation comment on S-box substitution line",
|
||||
"Added address-level comment documenting AES usage"
|
||||
],
|
||||
"unanswered_threads": [
|
||||
{
|
||||
"question": "Where does the 32-byte encryption key come from?",
|
||||
"starting_point": "Cross-references to aes_encrypt_block - check callers",
|
||||
"priority": "high",
|
||||
"context": "Key appears to be passed from calling functions, need to trace origin"
|
||||
},
|
||||
{
|
||||
"question": "What data is being encrypted?",
|
||||
"starting_point": "Functions process_buffer (0x402100) and send_encrypted_data (0x402500)",
|
||||
"priority": "high",
|
||||
"context": "These callers determine what data gets encrypted"
|
||||
},
|
||||
{
|
||||
"question": "Is this using proper AES mode (CBC, CTR, etc.) or just ECB?",
|
||||
"starting_point": "Examine calling context for IV usage or block chaining",
|
||||
"priority": "medium",
|
||||
"context": "Need to check if callers implement proper mode of operation"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Example 2: "Does this binary use encryption?"
|
||||
|
||||
### Initial Question
|
||||
User: "Does this binary use encryption?"
|
||||
|
||||
### Investigation Flow
|
||||
|
||||
**Phase 1: Broad Discovery (3 calls)**
|
||||
|
||||
1. Search for crypto-related strings:
|
||||
```
|
||||
search-strings-regex pattern="(AES|RSA|encrypt|decrypt|crypto|cipher|key)"
|
||||
caseSensitive=false
|
||||
```
|
||||
|
||||
Result: No obvious crypto strings found
|
||||
|
||||
2. Search decompiled code for patterns:
|
||||
```
|
||||
search-decompilation pattern="(\^|&\s*0x(FF|ff)|<<|>>)"
|
||||
maxResults=20
|
||||
```
|
||||
|
||||
Find: Multiple functions with XOR and bitwise operations in loops
|
||||
|
||||
3. Check for suspicious imports:
|
||||
```
|
||||
get-symbols includeExternal=true filterDefaultNames=true startIndex=0 maxCount=200
|
||||
```
|
||||
|
||||
Find: No obvious crypto library imports
|
||||
|
||||
**Phase 2: Investigate Candidates (4 calls)**
|
||||
|
||||
4. Check most suspicious function from search:
|
||||
```
|
||||
get-decompilation FUN_00402340 limit=40
|
||||
includeIncomingReferences=true
|
||||
```
|
||||
|
||||
Observe: Nested loops, array indexing, modulo operations
|
||||
|
||||
5. Check for constant arrays:
|
||||
```
|
||||
find-cross-references location="0x402340" direction="from" limit=50
|
||||
```
|
||||
|
||||
Find: References to 0x405000 (large data section)
|
||||
|
||||
6. Read suspicious memory region:
|
||||
```
|
||||
read-memory addressOrSymbol="0x405000" length=256
|
||||
```
|
||||
|
||||
Find: Initialized array 0-255 in sequence, then modified
|
||||
|
||||
Pattern match: Looks like RC4 initialization!
|
||||
|
||||
7. Bookmark finding:
|
||||
```
|
||||
set-bookmark type="Analysis" category="Crypto"
|
||||
addressOrSymbol="0x402340"
|
||||
comment="Suspected RC4 cipher - 256-byte state array"
|
||||
```
|
||||
|
||||
**Phase 3: Confirm Pattern (3 calls)**
|
||||
|
||||
8. Improve understanding:
|
||||
```
|
||||
rename-variables FUN_00402340 {
|
||||
"local_10": "state_array",
|
||||
"local_14": "i",
|
||||
"local_18": "j",
|
||||
"param_1": "key",
|
||||
"param_2": "key_len"
|
||||
}
|
||||
```
|
||||
|
||||
9. Re-examine:
|
||||
```
|
||||
get-decompilation FUN_00402340 limit=40
|
||||
```
|
||||
|
||||
Clear RC4 key scheduling algorithm (KSA) pattern:
|
||||
```c
/* Decompiled fragment after renaming; state_array is the 256-byte RC4 state. */
for (i = 0; i < 256; i++) {
    state_array[i] = i;
}
j = 0;
for (i = 0; i < 256; i++) {
    j = (j + state_array[i] + key[i % key_len]) % 256;
    /* swap state_array[i] and state_array[j] */
    tmp = state_array[i];
    state_array[i] = state_array[j];
    state_array[j] = tmp;
}
```
|
||||
|
||||
10. Document:
|
||||
```
|
||||
set-comment programPath="/binary.exe" addressOrSymbol="0x402340"
|
||||
type="pre"
|
||||
comment="RC4 Key Scheduling Algorithm (KSA) - initializes cipher state from key"
|
||||
```
|
||||
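
To confirm that a suspected function really is RC4 rather than a lookalike, it can help to compare its behavior against a small reference implementation. The following is a minimal Python sketch of the KSA seen above plus the standard PRGA; the function names `rc4_ksa` and `rc4_crypt` are illustrative and not part of any tool used in this skill.

```python
def rc4_ksa(key: bytes) -> list[int]:
    """Key scheduling: identity-initialize the state, then key-dependent swaps."""
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    return state


def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """PRGA: generate keystream bytes and XOR them with the input.

    RC4 is symmetric, so the same call encrypts and decrypts.
    """
    state = rc4_ksa(key)
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)


if __name__ == "__main__":
    # Widely published RC4 test vector: key "Key", plaintext "Plaintext".
    print(rc4_crypt(b"Key", b"Plaintext").hex())
```

If a key and a ciphertext buffer can be recovered from the binary, feeding them through `rc4_crypt` is a quick way to test the RC4 hypothesis before spending more tool calls on the PRGA function.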

**Phase 4: Return Results**

### Answer
```json
{
  "question": "Does this binary use encryption?",
  "answer": "Yes, uses RC4 stream cipher",
  "confidence": "high",
  "evidence": [
    "RC4 KSA (Key Scheduling Algorithm) at 0x402340",
    "256-byte state array initialization (0-255 sequence)",
    "Characteristic RC4 swap operations with modulo arithmetic",
    "State array modified based on key input",
    "Pattern matches RC4 specification exactly"
  ],
  "assumptions": [
    {
      "assumption": "RC4 PRGA (encryption loop) exists in separate function",
      "evidence": "KSA alone doesn't encrypt, must have companion function",
      "confidence": "medium"
    }
  ],
  "improvements_made": [
    "Renamed 5 variables in RC4 KSA function",
    "Documented RC4 KSA at 0x402340",
    "Set bookmark for crypto analysis"
  ],
  "unanswered_threads": [
    {
      "question": "Where is the RC4 PRGA (actual encryption) function?",
      "starting_point": "Check cross-references from KSA, likely called next",
      "priority": "high",
      "context": "KSA initializes state, PRGA uses it to encrypt"
    },
    {
      "question": "What is the RC4 key and where does it come from?",
      "starting_point": "Analyze callers of RC4 KSA at 0x402340",
      "priority": "high",
      "context": "Key parameter origin determines if hardcoded or derived"
    },
    {
      "question": "What data is being encrypted with RC4?",
      "starting_point": "Find PRGA function, then check its callers",
      "priority": "medium",
      "context": "Need to identify what sensitive data is being protected"
    }
  ]
}
```

---

## Example 3: "What is the C2 server address?"

### Initial Question
User: "What is the command and control server address?"

### Investigation Flow

**Phase 1: Search for Network Indicators (3 calls)**

1. Search for URLs and IPs:
```
search-strings-regex pattern="(https?://|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}|\.com|\.net)"
  caseSensitive=false
```

Find: No obvious domain/IP strings

2. Search for network-related strings:
```
search-strings-regex pattern="(connect|server|host|port|callback|beacon)"
  caseSensitive=false
```

Find: String "host" at 0x404500

3. Check memory around "host" string:
```
read-memory addressOrSymbol="0x404500" length=64
```

Find: Encoded data following "host" string

**Phase 2: Find String Usage (3 calls)**

4. Find cross-references to suspicious string:
```
find-cross-references location="0x404500" direction="to"
  includeContext=true contextLines=4
```

Find: Referenced in FUN_00403000, looks like string decoding

5. Investigate decode function:
```
get-decompilation FUN_00403000 limit=30
  includeIncomingReferences=true
```

Observe: XOR loop decoding strings from 0x404500 region

6. Bookmark decode function:
```
set-bookmark type="Analysis" category="Obfuscation"
  addressOrSymbol="0x403000"
  comment="String decoding function - XOR with 0x42"
```

**Phase 3: Understand Decoding (4 calls)**

7. Improve decode function:
```
rename-variables FUN_00403000 {
  "param_1": "encoded_str",
  "param_2": "output_buf",
  "local_8": "i",
  "local_c": "str_len"
}
```

8. Add comment on decode logic:
```
set-decompilation-comment FUN_00403000 line=8
  comment="XOR decode with key 0x42 - reveals hidden strings"
```

9. Find where decoded string is used:
```
find-cross-references location="0x403000" direction="to"
  includeContext=true
```

Find: Called by FUN_00403500 (network initialization function)

10. Check network function:
```
get-decompilation FUN_00403500 limit=40
  includeIncomingReferences=true
  includeReferenceContext=true
```

Observe:
```c
decode_string(encoded_c2, hostname_buf); // 0x403000
connect_to_server(hostname_buf, 443);
```

**Phase 4: Decode C2 Address (3 calls)**

11. Read encoded C2 address:
```
read-memory addressOrSymbol="0x404520" length=32
```

Get bytes: `2a 36 36 32 31 78 6d 6d 27 34 2b 2e 6f 2f 23 2e 35 23 30 27 6c 21 2d 2f 00`

Manual decode (XOR with 0x42):
- 0x2a ^ 0x42 = 0x68 = 'h'
- 0x36 ^ 0x42 = 0x74 = 't'
- 0x36 ^ 0x42 = 0x74 = 't'
- 0x32 ^ 0x42 = 0x70 = 'p'
- Result: "https://evil-malware.com"

12. Document C2:
```
set-comment programPath="/malware.exe" addressOrSymbol="0x404520"
  type="pre"
  comment="Encoded C2 address (XOR 0x42): https://evil-malware.com"
```

13. Document network function:
```
set-comment programPath="/malware.exe" addressOrSymbol="0x403500"
  type="pre"
  comment="C2 connection function: decodes and connects to https://evil-malware.com:443"
```
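
Manual XOR decoding is error-prone beyond a few bytes, so it is worth scripting the check. The sketch below assumes the single-byte key 0x42 identified in the decode function; `xor_bytes` and `decode_cstring` are hypothetical helper names, and the encoded buffer is built from the expected plaintext so the example verifies itself.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (encode and decode are identical)."""
    return bytes(b ^ key for b in data)


def decode_cstring(raw: bytes, key: int) -> str:
    """Decode up to the first unencoded NUL terminator, if one is present."""
    end = raw.find(b"\x00")
    if end != -1:
        raw = raw[:end]
    return xor_bytes(raw, key).decode("ascii", errors="replace")


KEY = 0x42
expected = b"https://evil-malware.com"

# Simulate what read-memory would return: encoded string plus NUL terminator.
encoded = xor_bytes(expected, KEY) + b"\x00"

if __name__ == "__main__":
    print(encoded.hex(" "))
    print(decode_cstring(encoded, KEY))
```

Pasting the real bytes from `read-memory` into `decode_cstring` gives the decoded string directly, and any byte that does not decode to printable ASCII is a hint that the key or the region boundary is wrong.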

**Phase 5: Return Results**

### Answer
```json
{
  "question": "What is the command and control server address?",
  "answer": "https://evil-malware.com (port 443)",
  "confidence": "high",
  "evidence": [
    "Encoded string at 0x404520 decodes to domain",
    "String decode function at 0x403000 uses XOR key 0x42",
    "Network init function at 0x403500 decodes and uses domain",
    "Connection established on port 443 (HTTPS)",
    "Manual XOR 0x42 decode of the bytes at 0x404520 yields https://evil-malware.com"
  ],
  "assumptions": [
    {
      "assumption": "This is the only C2 address",
      "evidence": "Only encoded network string found so far",
      "confidence": "medium",
      "note": "May have fallback/backup C2 addresses"
    }
  ],
  "improvements_made": [
    "Renamed variables in decode function (0x403000)",
    "Added comments on XOR decode logic",
    "Documented decoded C2 address at storage location",
    "Documented C2 connection function"
  ],
  "unanswered_threads": [
    {
      "question": "Are there backup/fallback C2 addresses?",
      "starting_point": "Search for other encoded strings, check for C2 lists",
      "priority": "high",
      "context": "Malware often has multiple C2 servers for redundancy"
    },
    {
      "question": "What protocol is used for C2 communication?",
      "starting_point": "Analyze functions called after connect_to_server",
      "priority": "medium",
      "context": "Connects on port 443, likely HTTPS but may be custom"
    },
    {
      "question": "Is the C2 address hardcoded or can it be updated?",
      "starting_point": "Check for config file reading or domain generation",
      "priority": "medium",
      "context": "Dynamic C2 would indicate more sophisticated malware"
    }
  ]
}
```

---

## Example 4: "Fix the types in this function"

### Initial Question
User: "Fix the types in function `FUN_00402800` - the decompilation is confusing"

### Investigation Flow

**Phase 1: Understand Current State (2 calls)**

1. Get current decompilation:
```
get-decompilation FUN_00402800 limit=50
```

Observe confusing code:
```c
undefined4 FUN_00402800(int param_1, undefined4 param_2) {
    int iVar1;
    undefined4 *puVar2;
    undefined4 uVar3;

    iVar1 = *(int *)(param_1 + 4);
    puVar2 = *(undefined4 **)(param_1 + 8);
    uVar3 = *puVar2;
    // ... more confusing code ...
}
```

2. Check cross-references for usage context:
```
find-cross-references location="0x402800" direction="to"
  includeContext=true contextLines=5
```

Observe: Called with pointer to structure, second param looks like a size

**Phase 2: Analyze Usage Patterns (3 calls)**

3. Check what fields are accessed:
   - `param_1 + 0`: read as int
   - `param_1 + 4`: read as int
   - `param_1 + 8`: read as pointer
   - Pattern: Structure with int, int, pointer fields

4. Check second parameter usage:
```
Search in decompilation for param_2 usage
```

Find: Used in comparison `if (iVar1 < param_2)` and loop counter
Conclusion: param_2 is a count/size, should be `size_t` or `uint32_t`

5. Check return value usage:
```
Check caller context from xrefs
```

Find: Return value compared to 0 and 1
Conclusion: Returns success/failure (bool or int status)

**Phase 3: Iterative Type Fixing (5 calls)**

6. Create structure for param_1 first, so the type exists before the prototype references it:
```
parse-c-structure programPath="/binary.exe"
  cDefinition="struct item_list { uint32_t current_count; uint32_t total_count; void** items; };"
```

7. Set the function signature using the new structure:
```
set-function-prototype location="0x402800"
  signature="int process_items(struct item_list* list, uint32_t max_count)"
  createIfNotExists=false
```

8. Apply structure to callers (check one caller):
```
Get address of a caller from xrefs: 0x403100
```

9. Fix variable types in function:
```
change-variable-datatypes FUN_00402800 {
  "iVar1": "uint32_t",
  "puVar2": "void**"
}
```

10. Rename variables for clarity:
```
rename-variables FUN_00402800 {
  "param_1": "list",
  "param_2": "max_count",
  "iVar1": "current_count",
  "puVar2": "items_array",
  "uVar3": "current_item"
}
```
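
Before committing a structure definition to the database, it can be useful to sanity-check that the proposed fields land on the offsets observed in the decompilation (0, 4, 8). The sketch below models the `item_list` layout with `ctypes`, assuming a 32-bit little-endian target where a pointer occupies 4 bytes; the class name `ItemList32` and the sample values are hypothetical.

```python
import ctypes
import struct


class ItemList32(ctypes.LittleEndianStructure):
    """Hypothetical model of item_list on a 32-bit target."""
    _fields_ = [
        ("current_count", ctypes.c_uint32),  # offset 0
        ("total_count", ctypes.c_uint32),    # offset 4
        ("items", ctypes.c_uint32),          # offset 8: 32-bit pointer kept as an integer
    ]


def field_offsets(struct_type) -> dict[str, int]:
    """Map each field name to its byte offset inside the structure."""
    return {name: getattr(struct_type, name).offset for name, _ in struct_type._fields_}


if __name__ == "__main__":
    print(field_offsets(ItemList32), ctypes.sizeof(ItemList32))
    # Parse 12 raw bytes as they might come back from a memory read.
    raw = struct.pack("<III", 2, 5, 0x00500000)
    parsed = ItemList32.from_buffer_copy(raw)
    print(parsed.current_count, parsed.total_count, hex(parsed.items))
```

If the computed offsets disagree with the accesses seen in the decompilation, the structure definition needs padding or different field sizes before it is applied.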

**Phase 4: Verify and Document (3 calls)**

11. Re-read decompilation:
```
get-decompilation FUN_00402800 limit=50
```

Much better:
```c
int process_items(item_list* list, uint32_t max_count) {
    uint32_t current_count;
    void** items_array;
    void* current_item;

    current_count = list->total_count;
    items_array = list->items;
    current_item = *items_array;
    // ... clearer logic now ...
}
```

12. Add documentation:
```
set-decompilation-comment FUN_00402800 line=5
  comment="Iterates through item list up to max_count limit"
```

13. Document structure usage:
```
set-comment programPath="/binary.exe" addressOrSymbol="0x402800"
  type="pre"
  comment="Processes item_list structure: validates count and iterates items array"
```

**Phase 5: Return Results**

### Answer
```json
{
  "question": "Fix the types in function FUN_00402800",
  "answer": "Fixed - function processes item_list structure with count validation",
  "confidence": "high",
  "evidence": [
    "Structure pattern identified: int + int + pointer at offsets 0, 4, 8",
    "Second parameter used as loop bound (confirmed uint32_t)",
    "Return value used as boolean (0/1 success code)",
    "Cross-references confirm structure pointer usage",
    "Decompilation much clearer after type fixes"
  ],
  "improvements_made": [
    "Created item_list structure: { uint32_t current_count; uint32_t total_count; void** items; }",
    "Set function prototype: int process_items(item_list* list, uint32_t max_count)",
    "Fixed 2 variable types (uint32_t, void**)",
    "Renamed 5 variables for clarity (list, max_count, current_count, items_array, current_item)",
    "Added decompilation comment documenting behavior",
    "Added address-level comment on structure usage"
  ],
  "assumptions": [
    {
      "assumption": "items array contains void pointers (generic items)",
      "evidence": "No type-specific operations on items, just pointer manipulation",
      "confidence": "medium",
      "note": "Actual item type may be more specific"
    },
    {
      "assumption": "Structure is consistently used across codebase",
      "evidence": "Multiple callers use same access pattern",
      "confidence": "high"
    }
  ],
  "unanswered_threads": [
    {
      "question": "What is the actual type of items in the array?",
      "starting_point": "Check where items are allocated and how they're used",
      "priority": "medium",
      "context": "Currently typed as void** but may be more specific struct"
    },
    {
      "question": "Should item_list structure be applied at allocation sites?",
      "starting_point": "Find where item_list structures are created (malloc calls)",
      "priority": "low",
      "context": "Applying structure type at allocation improves consistency"
    },
    {
      "question": "Are there other functions using this structure that need fixing?",
      "starting_point": "Search for similar offset access patterns (param+0, param+4, param+8)",
      "priority": "medium",
      "context": "Consistent type usage across codebase aids understanding"
    }
  ]
}
```

---

## Key Takeaways from Examples

### Common Patterns Across Investigations

1. **Start broad, narrow focus**
   - Search/scan first
   - Identify candidates
   - Zoom into specific functions

2. **Iterate: Read → Improve → Verify**
   - Get decompilation
   - Rename/retype
   - Re-read to confirm improvement

3. **Follow the evidence**
   - Cross-references show usage
   - Memory reads reveal constants
   - Pattern matching confirms algorithms

4. **Document as you go**
   - Bookmarks for waypoints
   - Comments for findings
   - Keeps investigation organized

5. **Return actionable threads**
   - Always have next steps
   - Specific starting points
   - Prioritized by importance

### Tool Call Efficiency

Each example stayed within 10-15 tool calls:
- Example 1: 10 calls
- Example 2: 10 calls
- Example 3: 13 calls
- Example 4: 13 calls

This demonstrates staying focused and efficient while still gathering sufficient evidence and making meaningful improvements.

### Evidence-Based Conclusions

Every answer includes:
- Specific addresses
- Code patterns or constants found
- Cross-reference evidence
- Confidence level with rationale

This makes findings verifiable and trustworthy.
720
skills/deep-analysis/patterns.md
Normal file
@@ -0,0 +1,720 @@
# Reverse Engineering Patterns Reference

This document contains higher-level patterns and concepts to recognize during deep analysis. Focus on algorithmic patterns, behavioral patterns, and code structure rather than platform-specific implementation details.

## Cryptographic Algorithm Patterns

### Block Cipher Recognition

**Conceptual characteristics:**
- **Substitution-Permutation Network (SPN)**: Repeated rounds of substitution (S-boxes) and permutation (bit shuffling)
- **Feistel Network**: Split data in half, operate on one half using the other as key input, swap halves, repeat
- **Fixed block size**: Typically 64 bits (DES, Blowfish) or 128 bits (AES)
- **Multiple rounds**: 8-16+ iterations of core transformation
- **Key schedule**: Derive round keys from master key

**What to look for in decompiled code:**
```
Nested loops:
  Outer: rounds (8, 10, 12, 14, 16, 32 iterations)
  Inner: processing blocks of fixed size

Array lookups (S-boxes):
  result = table[input_byte]
  Often 256-element arrays (0x100 size)

Bit manipulation:
  XOR, rotation (>> combined with <<), permutation

State updates:
  Array or struct representing current cipher state
  Transformed each round
```

**Telltale signs:**
- Large constant arrays (256+ bytes) that look like random data
- Fixed iteration counts (not data-dependent)
- Heavy use of XOR operations
- Byte-level array indexing: `array[data[i]]`

**Investigation strategy:**
1. `read-memory` at constant arrays - compare to known S-boxes
2. Count loop iterations - indicates cipher type/key size
3. `search-strings-regex` for algorithm names
4. Check cross-references to constants - find cipher initialization
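
Step 1 of the investigation strategy (comparing constant arrays to known S-boxes) can be partly automated once the bytes have been dumped. The sketch below checks a blob for the first row of the AES S-box and offers a generic heuristic for byte-permutation tables; the function names are hypothetical, and only the AES prefix is a published constant.

```python
# First 16 bytes of the AES forward S-box (FIPS 197).
AES_SBOX_PREFIX = bytes.fromhex("637c777bf26b6fc53001672bfed7ab76")


def find_aes_sbox(blob: bytes) -> int:
    """Return the offset of the AES S-box prefix in blob, or -1 if absent."""
    return blob.find(AES_SBOX_PREFIX)


def looks_like_sbox(table: bytes) -> bool:
    """Heuristic: a 256-byte table containing every byte value exactly once.

    Many byte-oriented S-boxes are permutations of 0..255, but so is the
    RC4 state array, so treat a match as a lead rather than a conclusion.
    """
    return len(table) == 256 and sorted(table) == list(range(256))


if __name__ == "__main__":
    dump = b"\x00" * 32 + AES_SBOX_PREFIX + b"\xff" * 8
    print(find_aes_sbox(dump))
    print(looks_like_sbox(bytes(reversed(range(256)))))
```

A hit from `find_aes_sbox` is strong evidence for AES; a `looks_like_sbox` match without a known prefix suggests a custom or less common cipher and warrants a closer look at the round structure.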

### Stream Cipher Recognition

**Conceptual characteristics:**
- **Keystream generation**: Produce pseudo-random byte stream from key
- **Simple combination**: XOR plaintext with keystream
- **State-based**: Internal state evolves as keystream is produced
- **No fixed blocks**: Can encrypt arbitrary lengths

**What to look for:**
```
State initialization:
  Array or struct setup from key
  Often 256-byte arrays

Keystream generation loop:
  State updates via modular arithmetic
  Index computations: i = (i + 1) % N
  Swap operations common

XOR combination:
  output[i] = input[i] ^ keystream[i]
  Simple, obvious pattern
```

**Telltale signs:**
- Array swap operations: `temp = a[i]; a[i] = a[j]; a[j] = temp`
- Modulo operations: `% 256` or `& 0xFF`
- XOR in simple loop
- Smaller code footprint than block ciphers (no large constants)

### Public Key Cryptography Recognition

**Conceptual characteristics:**
- **Large integer arithmetic**: Numbers hundreds or thousands of bits
- **Modular exponentiation**: `result = base^exponent mod modulus`
- **Performance**: Very slow compared to symmetric crypto (indicates usage for key exchange, not bulk data)

**What to look for:**
```
Multi-precision arithmetic:
  Arrays representing big integers
  Functions for add/subtract/multiply on arrays

Square-and-multiply pattern:
  Loop over exponent bits
  Square operation each iteration
  Conditional multiply based on bit value

Modulo operations on large numbers:
  Division with large divisors
  Barrett reduction or Montgomery multiplication
```

**Telltale signs:**
- Very large buffers (128, 256, 512 bytes+)
- Bit-by-bit exponent processing
- Characteristic magic constants (e.g., 0x10001 = 65537 for RSA)
- Slow execution (thousands of operations per byte)
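
The square-and-multiply pattern is easier to spot in decompiled output after seeing it in compact form. Below is a minimal right-to-left sketch using Python's arbitrary-precision integers; real binaries implement the same loop over big-integer arrays, and the function name `modexp` is illustrative only.

```python
def modexp(base: int, exponent: int, modulus: int) -> int:
    """Right-to-left square-and-multiply: base**exponent mod modulus."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:                   # conditional multiply on the current bit
            result = (result * base) % modulus
        base = (base * base) % modulus     # square every iteration
        exponent >>= 1                     # move to the next exponent bit
    return result


if __name__ == "__main__":
    print(modexp(4, 13, 497))
    # Toy RSA-style use with the common public exponent 0x10001.
    print(modexp(65, 0x10001, 3233) == pow(65, 0x10001, 3233))
```

In a decompilation, the same shape shows up as a loop that shifts the exponent, always calls a "multiply then reduce" routine on the base with itself, and calls it on the accumulator only when the tested bit is set.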

### Hash Function Recognition

**Conceptual characteristics:**
- **Compression function**: Transform fixed-size input to fixed-size output
- **Block processing**: Process data in chunks (512 bits typical)
- **State accumulation**: Running state updated with each block
- **Padding**: Add bytes to make input multiple of block size
- **One-way**: Lots of mixing, no reversibility

**What to look for:**
```
Initialization:
  Fixed magic constants
  MD5: 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476
  SHA-1: 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, 0xc3d2e1f0
  SHA-256: 8 different constants

Round function:
  Fixed iteration count (64, 80 rounds)
  Lots of bitwise operations (rotations, XOR, AND, OR)
  State mixing (each output bit depends on many input bits)

Padding logic:
  Append 0x80 byte
  Length encoding at end
```

**Telltale signs:**
- Characteristic initialization constants
- Fixed 64 or 80 round loops
- Bitwise rotation: `(x << n) | (x >> (32-n))`
- Message schedule computation (W array expansion)
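
Because the initialization constants are fixed, a dumped code or data region can be scanned for them mechanically. The sketch below is one way to do that, assuming 32-bit words stored little-endian by default; the dictionary labels and function names are hypothetical, while the constants are the published MD5, SHA-1, and SHA-256 initial values.

```python
import struct

HASH_INIT_CONSTANTS = {
    "MD5/SHA-1 state words A-D": (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476),
    "SHA-1 fifth state word": (0xC3D2E1F0,),
    "SHA-256 first state word": (0x6A09E667,),
}


def scan_for_hash_constants(blob: bytes, little_endian: bool = True) -> dict[str, list[int]]:
    """Report constant groups whose words all occur somewhere in blob."""
    fmt = "<I" if little_endian else ">I"
    hits = {}
    for name, words in HASH_INIT_CONSTANTS.items():
        offsets = [blob.find(struct.pack(fmt, word)) for word in words]
        if all(offset >= 0 for offset in offsets):
            hits[name] = offsets
    return hits


def rotl32(x: int, n: int) -> int:
    """32-bit rotate left, the (x << n) | (x >> (32 - n)) idiom with masking."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


if __name__ == "__main__":
    sample = b"\x90" * 7 + b"".join(
        struct.pack("<I", w) for w in HASH_INIT_CONSTANTS["MD5/SHA-1 state words A-D"]
    )
    print(scan_for_hash_constants(sample))
    print(hex(rotl32(0x80000001, 1)))
```

Finding the four shared words without `0xc3d2e1f0` points toward MD5, while finding all five points toward SHA-1. Compilers may embed constants as instruction immediates, so scanning both code and data regions is worthwhile.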

### Simple XOR Obfuscation

**Conceptual characteristics:**
- **Trivial operation**: `output = input XOR key`
- **Symmetric**: Encryption and decryption identical
- **Weak security**: Easy to break, used for obfuscation not protection

**What to look for:**
```
Single-byte key:
  for (i = 0; i < len; i++)
    data[i] ^= 0x42;

Multi-byte key:
  for (i = 0; i < len; i++)
    data[i] ^= key[i % keylen];

Rolling key:
  key = seed;
  for (i = 0; i < len; i++) {
    data[i] ^= key;
    key = update_key(key);  // LCG or similar
  }
```

**Telltale signs:**
- Very short functions (5-10 lines)
- XOR with constants or simple patterns
- Often applied to strings or config data
- Paired with static data arrays that need decoding
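
When the key is not visible in the decompilation, a single-byte XOR key can usually be recovered by trying all 256 values and looking for an expected substring (a "crib"). The sketch below shows that approach along with the multi-byte variant; the helper names and the sample string are hypothetical.

```python
def xor_single(data: bytes, key: int) -> bytes:
    """XOR every byte with one key byte."""
    return bytes(b ^ key for b in data)


def xor_repeating(data: bytes, key: bytes) -> bytes:
    """XOR with a repeating multi-byte key (data[i] ^ key[i % keylen])."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def recover_single_byte_key(data: bytes, crib: bytes) -> list[int]:
    """Return every key byte whose decoding contains the expected crib."""
    return [key for key in range(256) if crib in xor_single(data, key)]


if __name__ == "__main__":
    secret = xor_single(b"config: http://example.test/a", 0x42)
    print([hex(k) for k in recover_single_byte_key(secret, b"http")])
    print(xor_repeating(xor_repeating(b"hello world", b"\x13\x37"), b"\x13\x37"))
```

Useful cribs for malware configuration data include `http`, `.dll`, `cmd`, and path separators. An empty result suggests a multi-byte or rolling key rather than a single constant.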

---

## Control Flow Patterns

### State Machine Recognition

**Conceptual characteristics:**
- **Explicit states**: Enumeration or integer representing current state
- **State transitions**: Switch/if-else on state variable
- **Event-driven**: External input triggers transitions

**What to look for:**
```
State variable:
  int state = INITIAL_STATE;

Dispatch loop:
  while (running) {
    switch (state) {
      case STATE_A: /* handle A, maybe transition to B */
      case STATE_B: /* handle B, maybe transition to C */
      ...
    }
  }

State tables (more advanced):
  next_state = transition_table[current_state][input];
  action = action_table[current_state][input];
```

**Telltale signs:**
- Large switch statements with many cases
- State variable repeatedly assigned new values
- Enumeration or #define constants for states
- Patterns like IDLE, CONNECTING, CONNECTED, DISCONNECTED

**Common uses:**
- Network protocol handling
- Parser implementation
- UI event handling
- Command processing
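
Once the states and transitions have been recovered from a switch statement or transition table, writing them down as data makes the protocol logic much easier to reason about. The sketch below models a table-driven state machine using the connection states mentioned above; the event names and the `run` helper are assumptions made for illustration.

```python
# (current_state, event) -> next_state; recovered pairs would be filled in
# from the switch cases or the transition table found in the binary.
TRANSITIONS = {
    ("IDLE", "connect"): "CONNECTING",
    ("CONNECTING", "connected"): "CONNECTED",
    ("CONNECTING", "error"): "DISCONNECTED",
    ("CONNECTED", "close"): "DISCONNECTED",
    ("DISCONNECTED", "connect"): "CONNECTING",
}


def run(events: list[str], state: str = "IDLE") -> list[str]:
    """Replay events and return the visited states; unknown events are ignored."""
    trace = [state]
    for event in events:
        state = TRANSITIONS.get((state, event), state)
        trace.append(state)
    return trace


if __name__ == "__main__":
    print(run(["connect", "connected", "close", "connect"]))
```

Replaying plausible event sequences against the recovered table is a cheap way to find unreachable states or missing transitions, both of which are good candidates for follow-up investigation threads.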

### Command Dispatcher Recognition

**Conceptual characteristics:**
- **Command codes**: Numeric identifiers for operations
- **Handler lookup**: Map command ID to handler function
- **Extensibility**: Easy to add new commands

**What to look for:**
```
Command dispatch table:
  switch (command_id) {
    case CMD_EXECUTE: handle_execute(params); break;
    case CMD_UPLOAD: handle_upload(params); break;
    case CMD_DOWNLOAD: handle_download(params); break;
    ...
  }

Function pointer table:
  handler = command_table[command_id];
  handler(params);

String-based dispatch:
  if (strcmp(cmd, "exec") == 0) handle_execute();
  else if (strcmp(cmd, "upload") == 0) handle_upload();
```

**Telltale signs:**
- Large switch on integer or string
- Array of function pointers
- Command ID constants or strings
- Common command names: exec, upload, download, shell, sleep, etc.

**Common uses:**
- Remote access tools (RAT)
- Backdoor command handling
- Plugin systems
- IPC/RPC mechanisms
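
The function pointer table variant is the one most often misread in decompiled output, because the handlers never appear as direct calls. The sketch below shows the same idea as a lookup table; the command IDs and handler names are invented for illustration and would be replaced with values recovered from the binary.

```python
def handle_execute(params: str) -> str:
    return f"execute:{params}"


def handle_upload(params: str) -> str:
    return f"upload:{params}"


def handle_download(params: str) -> str:
    return f"download:{params}"


# Command ID -> handler, mirroring command_table[command_id] in the binary.
COMMAND_TABLE = {
    0x01: handle_execute,
    0x02: handle_upload,
    0x03: handle_download,
}


def dispatch(command_id: int, params: str) -> str:
    """Look up the handler for command_id and call it; reject unknown IDs."""
    handler = COMMAND_TABLE.get(command_id)
    if handler is None:
        return "unknown"
    return handler(params)


if __name__ == "__main__":
    print(dispatch(0x02, "report.txt"))
    print(dispatch(0x7F, ""))
```

During analysis, reading the table's memory and labeling each pointer target as a handler function recovers the full command set even when no handler is referenced anywhere else.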

### Callback Pattern Recognition

**Conceptual characteristics:**
- **Inversion of control**: Library calls your code, not you calling library
- **Function pointers**: Pass address of your function to framework
- **Asynchronous**: Often used for async operations

**What to look for:**
```
Callback registration:
  library_set_callback(MY_EVENT, my_handler_function);

Callback function signature:
  void my_callback(event_type, data, user_context)

Common callback contexts:
  - Network data received
  - Timer expired
  - File I/O complete
  - User interaction
```

**Telltale signs:**
- Function pointers passed as parameters
- Functions with generic names like "handler", "callback", "on_event"
- Often have opaque pointer parameter (void* user_data)

### Loop Patterns

**Simple iteration:**
```
for (i = 0; i < count; i++)
- Linear processing
- Transform/encrypt each element
```

**Nested loops (2D processing):**
```
for (i = 0; i < height; i++)
  for (j = 0; j < width; j++)
- Image processing
- Matrix operations
- Block cipher on 2D state
```

**Do-while patterns:**
```
do {
  read_chunk();
  process_chunk();
} while (more_data);
- File/network processing
- Guaranteed first execution
```

**While-true with break:**
```
while (1) {
  if (condition) break;
  process();
}
- Server loops
- State machines
- Event loops
```

---

## Data Structure Patterns

### Buffer Management

**Fixed-size buffers:**
```
char buffer[1024];
read(fd, buffer, sizeof(buffer));
- Stack-allocated
- Size known at compile time
- Often seen with unsafe functions (strcpy, sprintf)
```

**Dynamic buffers:**
```
size = calculate_size();
buffer = malloc(size);
- Heap-allocated
- Size determined at runtime
- Look for malloc/free pairs or memory leaks
```

**Ring buffers (circular):**
```
write_pos = (write_pos + 1) % BUFFER_SIZE;
read_pos = (read_pos + 1) % BUFFER_SIZE;
- Fixed-size, reusable
- Modulo arithmetic for wrap-around
- Used in queues, streaming
```
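The two modulo updates above are the whole signature of a ring buffer, but the full/empty bookkeeping around them is what usually appears in decompiled code. The following is a small sketch of that logic with an explicit element count; the class and method names are hypothetical.

```python
class RingBuffer:
    """Fixed-capacity FIFO using wrap-around read and write positions."""

    def __init__(self, capacity: int) -> None:
        self._buf = [None] * capacity
        self._capacity = capacity
        self._read_pos = 0
        self._write_pos = 0
        self._count = 0

    def push(self, item) -> bool:
        """Store item and advance write_pos; return False when full."""
        if self._count == self._capacity:
            return False
        self._buf[self._write_pos] = item
        self._write_pos = (self._write_pos + 1) % self._capacity
        self._count += 1
        return True

    def pop(self):
        """Remove the oldest item and advance read_pos."""
        if self._count == 0:
            raise IndexError("pop from empty ring buffer")
        item = self._buf[self._read_pos]
        self._read_pos = (self._read_pos + 1) % self._capacity
        self._count -= 1
        return item


if __name__ == "__main__":
    rb = RingBuffer(3)
    for value in (1, 2, 3):
        rb.push(value)
    print(rb.push(4))          # buffer is full
    print(rb.pop(), rb.pop())  # oldest items come out first
    rb.push(5)                 # write position wraps to index 0
    print(rb.pop(), rb.pop())
```

Some implementations drop the count field and instead compare `read_pos` with `write_pos`, keeping one slot unused to distinguish full from empty; both variants show the same `% BUFFER_SIZE` updates.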

### Linked Structures

**Linked list:**
```
struct node {
    data_type data;
    struct node* next;  // singly-linked
    struct node* prev;  // doubly-linked (optional)
};
```

**Recognition:**
- Pointer fields in structures
- Traversal loops: `while (node != NULL) { node = node->next; }`
- Insertion/deletion operations

**Tree structures:**
```
struct tree_node {
    data_type data;
    struct tree_node* left;
    struct tree_node* right;
};
```

**Recognition:**
- Two pointer fields (left/right)
- Recursive functions
- Comparison operations for ordering

### String Handling Patterns

**Length-prefixed strings:**
```
struct {
    uint32_t length;
    char data[];
}
```

**Null-terminated strings:**
```
while (*str != '\0') str++;  // strlen pattern
```

**Wide strings:**
```
wchar_t* wstr;
uint16_t* utf16_str;
- 2 or 4 bytes per character
- String operations work on larger units
```

**Detection:**
- Character-by-character loops
- Null byte checks
- String manipulation function calls
- UTF-8/UTF-16 encoding/decoding

---

## Network Protocol Patterns

### Protocol Structure Recognition

**Request-Response:**
```
send_request(command, params);
response = receive_response();
process_response(response);
```

**Characteristics:**
- Client initiates
- Server responds
- Blocking or polling wait for response
- Examples: HTTP, DNS, RPC

**Continuous Stream:**
```
while (connected) {
    data = receive_data();
    process_chunk(data);
}
```

**Characteristics:**
- Persistent connection
- Data flows continuously
- No strict request-response pairing
- Examples: video streaming, log shipping

**Message-Oriented:**
```
while (true) {
    message = receive_message();  // reads length, then payload
    dispatch_message(message);
}
```

**Characteristics:**
- Discrete messages with boundaries
- Length prefix or delimiter
- Message type/ID field
- Examples: custom C2 protocols, message queues
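
For message-oriented protocols, reconstructing the framing is often the first step toward decoding captured traffic. The sketch below assumes a hypothetical header of a 4-byte big-endian payload length followed by a 1-byte message type; real protocols will differ in field order, width, and endianness, which the decompiled `receive_message` logic should reveal.

```python
import struct

# Hypothetical header: 4-byte big-endian payload length, then 1-byte message type.
HEADER = struct.Struct(">IB")


def pack_message(msg_type: int, payload: bytes) -> bytes:
    """Frame a payload with the length/type header."""
    return HEADER.pack(len(payload), msg_type) + payload


def parse_messages(stream: bytes) -> tuple[list[tuple[int, bytes]], bytes]:
    """Split a byte stream into complete (type, payload) messages.

    Returns the parsed messages and any trailing bytes that do not yet
    form a complete message.
    """
    messages = []
    offset = 0
    while offset + HEADER.size <= len(stream):
        length, msg_type = HEADER.unpack_from(stream, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(stream):
            break  # incomplete payload, wait for more data
        messages.append((msg_type, stream[start:end]))
        offset = end
    return messages, stream[offset:]


if __name__ == "__main__":
    data = pack_message(1, b"hello") + pack_message(2, b"") + pack_message(3, b"partial")[:-2]
    parsed, remainder = parse_messages(data)
    print(parsed, len(remainder))
```

The "leftover bytes" return value mirrors how real receivers buffer partial reads, which is why decompiled receive loops often carry an offset or remaining-length variable across iterations.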

### Serialization Patterns

**Binary serialization:**
```
Write primitives in sequence:
  write_uint32(length);
  write_bytes(data, length);
  write_uint8(flags);
```

**Characteristics:**
- Dense, efficient
- Fixed byte order (endianness)
- Magic numbers for structure identification
- Version fields for compatibility

**Text-based serialization:**
```
JSON: {"key": "value", "num": 42}
XML: <root><item>value</item></root>
```

**Characteristics:**
- Human-readable
- Delimiter characters ({}, <>, quotes)
- String parsing and generation code
- Less efficient but more flexible

**Detection strategies:**
1. Look for sprintf/snprintf for text generation
2. Check for JSON/XML parsing libraries
3. Find memcpy sequences for binary packing
4. Identify byte-swapping (htonl/ntohl pattern)

### Connection Management

**Connection establishment pattern:**
```
Create socket
→ Connect to server
→ Send handshake/authentication
→ Receive acknowledgment
→ Enter main communication loop
```

**Connection pooling pattern:**
```
maintain pool of N connections
when request arrives:
  if free_connection available:
    use it
  else:
    create new connection (up to max)
after request:
  return connection to pool
```

**Reconnection pattern:**
```
max_retries = 5;
while (retries < max_retries) {
    if (connect_success) break;
    sleep(backoff_time);
    backoff_time *= 2;  // exponential backoff
    retries++;
}
```

**Telltale signs:**
- Retry loops with delays
- Connection state checking
- Timeout handling
- Fallback server lists
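
The reconnection pseudocode above leaves out initialization and the connect attempt itself, so a runnable version can help when matching it against decompiled retry loops. This sketch takes the connect operation and the sleep function as parameters so the timing can be observed without waiting; all names and default values are assumptions.

```python
import time
from typing import Callable, Optional


def connect_with_backoff(
    connect: Callable[[], bool],
    max_retries: int = 5,
    initial_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Try connect() up to max_retries times, doubling the delay after each failure.

    Returns the 1-based attempt number that succeeded, or None if all failed.
    """
    delay = initial_delay
    for attempt in range(1, max_retries + 1):
        if connect():
            return attempt
        if attempt < max_retries:
            sleep(delay)
            delay *= 2  # exponential backoff
    return None


if __name__ == "__main__":
    outcomes = iter([False, False, True])
    delays = []
    print(connect_with_backoff(lambda: next(outcomes), sleep=delays.append))
    print(delays)
```

In a binary, the doubling often appears as a left shift by one or a multiply by two on the value passed to the sleep call, and the sequence of delays is a useful network-level indicator for detection rules.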
|
||||

---

## Behavioral Patterns

### Encryption + Network (Data Exfiltration)

**Pattern sequence:**
```
1. Collect files/data
2. Compress (optional)
3. Encrypt
4. Send over network
5. Clean up local copies
```

**What to look for:**
- File enumeration → encryption function → network send
- Temporary file creation → processing → deletion
- Cross-reference encryption function to network functions

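
One way to act on the cross-referencing advice is to treat exported call relationships as a graph and ask which functions can reach both a crypto API and a network API. The sketch below assumes a hand-written `CALL_GRAPH` dictionary; the internal function names in it are hypothetical, and in practice the graph would be built from the cross-reference data your tooling provides.

```python
from collections import deque

# Hypothetical call graph: caller -> callees. Internal function names are
# invented for this example; only the API names are real imports.
CALL_GRAPH = {
    "main": ["collect_files", "upload_loop"],
    "collect_files": ["FindFirstFileW", "FindNextFileW"],
    "upload_loop": ["encrypt_buffer", "send_chunk"],
    "encrypt_buffer": ["CryptEncrypt"],
    "send_chunk": ["send"],
    "log_status": ["send"],
}


def reachable(graph: dict, start: str) -> set:
    """Return every function reachable from start via breadth-first search."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for callee in graph.get(node, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def callers_reaching_both(graph: dict, crypto_apis: set, network_apis: set) -> list:
    """List functions whose call tree touches both crypto and network APIs."""
    hits = []
    for func in graph:
        targets = reachable(graph, func)
        if targets & crypto_apis and targets & network_apis:
            hits.append(func)
    return sorted(hits)
```

With this sample graph, `callers_reaching_both(CALL_GRAPH, {"CryptEncrypt"}, {"send"})` flags `main` and `upload_loop`; the lowest function in that set is usually the best place to start reading.
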
### Decrypt + Execute (Payload Loading)

**Pattern sequence:**
```
1. Read encrypted payload from resource/file/network
2. Decrypt in memory
3. Execute (direct call, injection, or create process)
```

**What to look for:**
- Buffer allocated with execute permissions
- Decryption function → function pointer cast → indirect call
- XOR loop → memory copy → execution transfer

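
When the XOR loop uses a single-byte key, the analyst can often recover it statically from an expected plaintext prefix. The following is a minimal sketch under that assumption; the `b"MZ"` default (the start of a PE file) and both function names are illustrative choices, and multi-byte or rolling keys need a different approach.

```python
from typing import Optional


def xor_decode(data: bytes, key: int) -> bytes:
    """Apply a single-byte XOR key to every byte of data."""
    return bytes(b ^ key for b in data)


def find_single_byte_key(data: bytes, expected_prefix: bytes = b"MZ") -> Optional[int]:
    """Try all 256 keys and return the one that yields the expected prefix."""
    head = data[:len(expected_prefix)]
    for key in range(256):
        if xor_decode(head, key) == expected_prefix:
            return key
    return None
```

A recovered key is still only a hypothesis: confirm it by decoding the whole buffer and checking that the rest of the structure is consistent.
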
### Time-Based Triggering

**Pattern:**
```
while (true) {
    current_time = get_time();
    if (current_time >= trigger_time) {
        execute_payload();
        break;
    }
    sleep(check_interval);
}
```

**What to look for:**
- Time/date API calls
- Comparison with specific dates
- Sleep/delay in loops
- Activation conditions based on temporal logic

### Polymorphic Behavior

**Pattern:**
```
code_variant = select_variant(seed);
decrypt_code(code_variant);
execute_decrypted_code();
reencrypt_code(new_seed);
```

**What to look for:**
- Self-modifying code
- Multiple code variants
- Decryption before execution
- Encryption after execution
- Memory protection changes (read/write/execute toggling)


---

## Code Quality Indicators

### Hand-Written vs. Generated Code

**Hand-written characteristics:**
- Inconsistent formatting
- Comments (if not stripped)
- Meaningful variable names (if symbols present)
- Idiomatic patterns for the language
- Error handling mixed with logic

**Generated/compiled characteristics:**
- Very consistent structure
- Compiler optimization patterns
- Systematic variable naming (if stripped)
- Uniform error handling
- Recognizable library code patterns

### Obfuscated Code Indicators

**Deliberately obscured:**
- Meaningless variable/function names
- Unnecessary complexity
- Dead code branches
- Opaque predicates (always true/false conditions)
- Indirect calls through pointer manipulations
- String obfuscation

**Compiler optimizations (benign):**
- Loop unrolling
- Function inlining
- Constant folding
- Dead code elimination
- Register allocation patterns

**Distinction:** Obfuscation creates complexity without performance benefit; optimization creates complexity for performance.

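
As a small illustration of an opaque predicate and the dead branch it guards, consider the sketch below. The functions are hypothetical examples written for this section; the predicate relies on the fact that the product of two consecutive integers is always even, so the condition is always true even though it looks data-dependent.

```python
def opaque_predicate(x: int) -> bool:
    """Always True: x * (x + 1) is the product of consecutive integers, hence even."""
    return (x * (x + 1)) % 2 == 0


def obfuscated_abs(x: int) -> int:
    """Computes abs(x), padded with a branch that can never run."""
    if opaque_predicate(x):
        return x if x >= 0 else -x
    # Dead code: unreachable because the predicate never evaluates to False.
    return x ^ 0x5A5A
```

Once a condition like this is recognized as constant, the unreachable branch can be ignored and the function annotated with its real behavior.
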
### Library Code vs. Custom Code

**Library code:**
- Standard algorithms (qsort, hash functions)
- Consistent with open-source implementations
- Well-structured, parameterized
- Minimal dependencies on surrounding code

**Custom code:**
- Unique patterns
- Integrated with application logic
- Application-specific data structures
- More likely to have bugs/vulnerabilities

**Investigation priority:** Focus on custom code - that's where unique behavior lives.


---

## Using This Reference

### Pattern Matching Workflow

1. **Observe structure** - What loops, branches, data structures appear?
2. **Compare to patterns** - Does this match known algorithmic patterns?
3. **Verify with evidence** - Check for characteristic constants, operations, structure
4. **Document pattern** - Bookmark with pattern name for reference
5. **Improve code** - Rename variables/functions to reflect pattern (e.g., `aes_encrypt`, `rc4_keystream`)

### Example Investigation

```
Observation: Function with nested loops, array lookups, XOR operations

Compare: Matches "Block Cipher" or "Stream Cipher" patterns

Verify:
  - Check for large constant array (S-box?)
  - Count outer loop iterations (rounds?)
  - Look for key schedule function

Find: 256-byte array starting 63 7c 77 7b...
      14 iterations in outer loop

Conclusion: AES-256 (14 rounds, standard S-box)

Improve:
  rename-variables: state→aes_state, table→aes_sbox
  set-function-prototype: void aes_encrypt(uint8_t* data, uint8_t* key)
  set-comment: "AES-256 encryption using standard S-box"
```

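
The "standard S-box" check in the example can be made rigorous by comparing all 256 bytes rather than only the first four. The sketch below regenerates the AES S-box from its definition (multiplicative inverse in GF(2^8) followed by the affine transform) so that no table has to be copied by hand. The function names, including `matches_aes_sbox`, are hypothetical helpers for this illustration.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce by the AES polynomial
        b >>= 1
    return result


def gf_inverse(a: int) -> int:
    """Multiplicative inverse in GF(2^8); 0 maps to 0 by AES convention."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):  # a^254 equals a^-1 in a field of 256 elements
        result = gf_mul(result, a)
    return result


def rotl8(x: int, shift: int) -> int:
    """Rotate an 8-bit value left by shift bits."""
    return ((x << shift) | (x >> (8 - shift))) & 0xFF


def aes_sbox() -> bytes:
    """Build the 256-byte AES S-box: field inverse, then affine transform."""
    box = []
    for value in range(256):
        inv = gf_inverse(value)
        box.append(
            inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63
        )
    return bytes(box)


def matches_aes_sbox(candidate: bytes) -> bool:
    """True if the first 256 bytes of candidate equal the AES S-box."""
    return candidate[:256] == aes_sbox()
```

Passing the 256 bytes read from the suspected table to `matches_aes_sbox` turns "starts with 63 7c 77 7b" into a full-table match. A mismatch does not rule out AES, since some implementations use combined T-tables or the inverse S-box instead.
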
### Pattern Combination

Real-world code combines multiple patterns:

**Example: Malware C2 Communication**
```
[Command Dispatcher] receives command from network
    ↓
[State Machine] tracks connection state
    ↓
[Callback Functions] handle specific commands
    ↓
[Buffer Management] manages received data
    ↓
[Encryption] protects command payloads
```

When you identify one pattern, look for related patterns in:
- Functions that call this one (higher-level orchestration)
- Functions called by this one (lower-level primitives)
- Cross-references to shared data structures

### Progressive Understanding

You don't need to identify every pattern perfectly on the first attempt:

**First pass:** "This looks like crypto (lots of XOR and loops)"
**Second pass:** "Probably a stream cipher (simple state, no large tables)"
**Third pass:** "Matches RC4 pattern (256-byte init, swap operations)"
**Fourth pass:** "Confirmed RC4 (found KSA and PRGA pattern)"

Each pass refines understanding and guides further investigation.
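
For reference when making the third and fourth passes, this is a compact sketch of the two RC4 phases named above: the key-scheduling algorithm (KSA) with its 256-byte init and swaps, and the pseudo-random generation algorithm (PRGA). The function names are chosen for this example; decompiled code will show the same shape with anonymous variables.

```python
def rc4_ksa(key: bytes) -> list[int]:
    """Key-scheduling algorithm: identity init of 256 bytes, then key-driven swaps."""
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    return state


def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """PRGA: generate keystream bytes and XOR them with the data."""
    state = rc4_ksa(key)
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)
```

Because RC4 is symmetric, running `rc4_crypt` twice with the same key returns the original data, which gives a quick way to confirm an identification against a captured buffer and a recovered key.
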