Initial commit

This commit is contained in:
Zhongwei Li
2025-11-29 18:17:15 +08:00
commit 31df8711f2
13 changed files with 6197 additions and 0 deletions

@@ -0,0 +1,720 @@
# Reverse Engineering Patterns Reference
This document contains higher-level patterns and concepts to recognize during deep analysis. Focus on algorithmic patterns, behavioral patterns, and code structure rather than platform-specific implementation details.
## Cryptographic Algorithm Patterns
### Block Cipher Recognition
**Conceptual characteristics:**
- **Substitution-Permutation Network (SPN)**: Repeated rounds of substitution (S-boxes) and permutation (bit shuffling)
- **Feistel Network**: Split data in half, operate on one half using the other as key input, swap halves, repeat
- **Fixed block size**: Typically 64 bits (DES, Blowfish) or 128 bits (AES)
- **Multiple rounds**: 8-16+ iterations of core transformation
- **Key schedule**: Derive round keys from master key
**What to look for in decompiled code:**
```
Nested loops:
Outer: rounds (8, 10, 12, 14, 16, 32 iterations)
Inner: processing blocks of fixed size
Array lookups (S-boxes):
result = table[input_byte]
Often 256-element arrays (0x100 size)
Bit manipulation:
XOR, rotation (>> combined with <<), permutation
State updates:
Array or struct representing current cipher state
Transformed each round
```
**Telltale signs:**
- Large constant arrays (256+ bytes) that look like random data
- Fixed iteration counts (not data-dependent)
- Heavy use of XOR operations
- Byte-level array indexing: `array[data[i]]`
**Investigation strategy:**
1. `read-memory` at constant arrays - compare to known S-boxes
2. Count loop iterations - indicates cipher type/key size
3. `search-strings-regex` for algorithm names
4. Check cross-references to constants - find cipher initialization
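Step 1 of the strategy above can be partly automated once the table bytes have been dumped. The following is a minimal Python sketch rather than a definitive detector. It assumes the dumped array is available as a `bytes` object; the helper names `is_byte_permutation` and `identify_table` are hypothetical, and only the first 16 bytes of the standard AES S-box are compared.

```python
# First 16 bytes of the standard AES forward S-box.
AES_SBOX_PREFIX = bytes.fromhex("637c777bf26b6fc53001672bfed7ab76")


def is_byte_permutation(table: bytes) -> bool:
    """True if the table is 256 bytes holding each value 0..255 exactly once.

    Many byte-wise S-boxes (AES included) are bijective, so this is a cheap
    hint that a random-looking array is a substitution table.
    """
    return len(table) == 256 and len(set(table)) == 256


def identify_table(table: bytes) -> str:
    """Return a rough label for a dumped constant array."""
    if table.startswith(AES_SBOX_PREFIX):
        return "aes-sbox-candidate"
    if is_byte_permutation(table):
        return "unknown-bijective-sbox"
    return "unknown"
```

A prefix match is only a starting point; comparing the full 256 bytes against a reference implementation is what confirms the identification.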
### Stream Cipher Recognition
**Conceptual characteristics:**
- **Keystream generation**: Produce pseudo-random byte stream from key
- **Simple combination**: XOR plaintext with keystream
- **State-based**: Internal state evolves as keystream is produced
- **No fixed blocks**: Can encrypt arbitrary lengths
**What to look for:**
```
State initialization:
Array or struct setup from key
Often 256-byte arrays
Keystream generation loop:
State updates via modular arithmetic
Index computations: i = (i + 1) % N
Swap operations common
XOR combination:
output[i] = input[i] ^ keystream[i]
Simple, obvious pattern
```
**Telltale signs:**
- Array swap operations: `temp = a[i]; a[i] = a[j]; a[j] = temp`
- Modulo operations: `% 256` or `& 0xFF`
- XOR in simple loop
- Smaller code footprint than block ciphers (no large constants)
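RC4 is a well-known cipher that shows all of these signs at once, so it helps to have its shape in mind. The sketch below is a plain Python rendering of RC4 for comparison against decompiled code; the function name `rc4` and its signature are choices made for this example, and a real sample may use a modified variant.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data with RC4 (the operation is symmetric)."""
    # Key scheduling: fill a 256-byte state, then shuffle it using the key.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]

    # Keystream generation: keep swapping, emit one keystream byte per input byte.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        keystream_byte = state[(state[i] + state[j]) % 256]
        out.append(byte ^ keystream_byte)
    return bytes(out)
```

The two loops map directly onto the "state initialization" and "keystream generation loop" items above: a 256-iteration setup loop with a swap, followed by a data-length loop with a swap and an XOR.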
### Public Key Cryptography Recognition
**Conceptual characteristics:**
- **Large integer arithmetic**: Numbers hundreds or thousands of bits
- **Modular exponentiation**: `result = base^exponent mod modulus`
- **Performance**: Very slow compared to symmetric crypto, so it is typically used for key exchange or signatures rather than bulk data
**What to look for:**
```
Multi-precision arithmetic:
Arrays representing big integers
Functions for add/subtract/multiply on arrays
Square-and-multiply pattern:
Loop over exponent bits
Square operation each iteration
Conditional multiply based on bit value
Modulo operations on large numbers:
Division with large divisors
Barrett reduction or Montgomery multiplication
```
**Telltale signs:**
- Very large buffers (128, 256, 512 bytes+)
- Bit-by-bit exponent processing
- Characteristic magic constants (e.g., 0x10001 = 65537 for RSA)
- Slow execution (thousands of operations per byte)
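The square-and-multiply pattern is easier to spot once you have seen it written out. This is a minimal sketch using Python's built-in big integers; real implementations operate on arrays of machine words, and the name `modexp` is chosen for this example only.

```python
def modexp(base: int, exponent: int, modulus: int) -> int:
    """Compute base**exponent % modulus by scanning exponent bits (LSB first)."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:                      # conditional multiply on a set bit
            result = (result * base) % modulus
        base = (base * base) % modulus        # square on every iteration
        exponent >>= 1                        # move to the next exponent bit
    return result
```

In decompiled code the same structure appears as a loop that shifts or tests one exponent bit per iteration, always calls a big-number "square" routine, and calls "multiply" only inside a branch.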
### Hash Function Recognition
**Conceptual characteristics:**
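- **Compression function**: Core primitive that mixes one fixed-size message block into a fixed-size state; the overall hash maps arbitrary-length input to a fixed-size digest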
- **Block processing**: Process data in chunks (512 bits typical)
- **State accumulation**: Running state updated with each block
- **Padding**: Add bytes to make input multiple of block size
- **One-way**: Heavy mixing with no inverse operation, so there is no matching "decrypt" routine
**What to look for:**
```
Initialization:
Fixed magic constants
MD5: 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476
SHA-1: 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, 0xc3d2e1f0
SHA-256: 8 words, beginning 0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a
Round function:
Fixed iteration count (64, 80 rounds)
Lots of bitwise operations (rotations, XOR, AND, OR)
State mixing (each output bit depends on many input bits)
Padding logic:
Append 0x80 byte
Length encoding at end
```
**Telltale signs:**
- Characteristic initialization constants
- Fixed 64 or 80 round loops
- Bitwise rotation: `(x << n) | (x >> (32-n))`
- Message schedule computation (W array expansion)
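The initialization constants listed above lend themselves to a simple scan over a dumped function or data section. The following Python sketch is one possible way to do that, assuming the bytes are available as a `bytes` object; `contains_word` and `guess_hash` are hypothetical helper names, and the result is only a hint because other algorithms reuse the same words.

```python
import struct

# Initialization words listed above; SHA-1 uses the MD5 words plus a fifth.
MD5_INIT = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)
SHA1_EXTRA = 0xC3D2E1F0


def contains_word(blob: bytes, word: int) -> bool:
    """True if the 32-bit word occurs in the blob in either byte order."""
    return struct.pack("<I", word) in blob or struct.pack(">I", word) in blob


def guess_hash(blob: bytes) -> str:
    """Guess MD5 vs SHA-1 from which initialization words are present."""
    if not all(contains_word(blob, w) for w in MD5_INIT):
        return "no-match"
    return "sha1-candidate" if contains_word(blob, SHA1_EXTRA) else "md5-candidate"


def rotl32(x: int, n: int) -> int:
    """32-bit left rotation, the `(x << n) | (x >> (32-n))` idiom."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF
```

Checking both byte orders matters because the constants may appear as instruction immediates or as a data table, depending on the compiler and target.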
### Simple XOR Obfuscation
**Conceptual characteristics:**
- **Trivial operation**: `output = input XOR key`
- **Symmetric**: Encryption and decryption identical
- **Weak security**: Easy to break, used for obfuscation not protection
**What to look for:**
```
Single-byte key:
for (i = 0; i < len; i++)
data[i] ^= 0x42;
Multi-byte key:
for (i = 0; i < len; i++)
data[i] ^= key[i % keylen];
Rolling key:
key = seed;
for (i = 0; i < len; i++) {
data[i] ^= key;
key = update_key(key); // LCG or similar
}
```
**Telltale signs:**
- Very short functions (5-10 lines)
- XOR with constants or simple patterns
- Often applied to strings or config data
- Paired with static data arrays that need decoding
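When a static array is suspected of being single-byte XOR encoded, trying all 256 keys is usually faster than tracing the key. The sketch below assumes the decoded data is printable ASCII (a reasonable guess for strings and config data, but still an assumption); the helper names are hypothetical.

```python
import string

PRINTABLE = set(string.printable.encode("ascii"))


def xor_single_byte(data: bytes, key: int) -> bytes:
    """XOR every byte with the same one-byte key."""
    return bytes(b ^ key for b in data)


def brute_force_xor(data: bytes) -> list:
    """Try all 256 keys; keep (key, decoded) pairs that are fully printable."""
    candidates = []
    for key in range(256):
        decoded = xor_single_byte(data, key)
        if decoded and all(b in PRINTABLE for b in decoded):
            candidates.append((key, decoded))
    return candidates
```

Several keys can yield printable output for short inputs, so the candidate list still needs a human look.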
---
## Control Flow Patterns
### State Machine Recognition
**Conceptual characteristics:**
- **Explicit states**: Enumeration or integer representing current state
- **State transitions**: Switch/if-else on state variable
- **Event-driven**: External input triggers transitions
**What to look for:**
```
State variable:
int state = INITIAL_STATE;
Dispatch loop:
while (running) {
switch (state) {
case STATE_A: /* handle A, maybe transition to B */
case STATE_B: /* handle B, maybe transition to C */
...
}
}
State tables (more advanced):
next_state = transition_table[current_state][input];
action = action_table[current_state][input];
```
**Telltale signs:**
- Large switch statements with many cases
- State variable repeatedly assigned new values
- Enumeration or #define constants for states
- Patterns like IDLE, CONNECTING, CONNECTED, DISCONNECTED
**Common uses:**
- Network protocol handling
- Parser implementation
- UI event handling
- Command processing
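The table-driven variant is the harder one to recognize, because the logic lives in data rather than in a switch. A minimal Python sketch follows; the state and event names are hypothetical, borrowed from the IDLE/CONNECTING/CONNECTED pattern mentioned above.

```python
# Hypothetical transition table for a connection handler:
# (current_state, event) -> next_state
TRANSITIONS = {
    ("IDLE", "connect"): "CONNECTING",
    ("CONNECTING", "ack"): "CONNECTED",
    ("CONNECTING", "timeout"): "IDLE",
    ("CONNECTED", "close"): "DISCONNECTED",
}


def run(events, state="IDLE"):
    """Feed events through the table; unknown (state, event) pairs are ignored."""
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state
```

In a binary the same idea shows up as a two-dimensional array indexed by the state variable and an input code, so cross-references to that array lead back to the dispatch loop.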
### Command Dispatcher Recognition
**Conceptual characteristics:**
- **Command codes**: Numeric identifiers for operations
- **Handler lookup**: Map command ID to handler function
- **Extensibility**: Easy to add new commands
**What to look for:**
```
Command dispatch table:
switch (command_id) {
case CMD_EXECUTE: handle_execute(params); break;
case CMD_UPLOAD: handle_upload(params); break;
case CMD_DOWNLOAD: handle_download(params); break;
...
}
Function pointer table:
handler = command_table[command_id];
handler(params);
String-based dispatch:
if (strcmp(cmd, "exec") == 0) handle_execute();
else if (strcmp(cmd, "upload") == 0) handle_upload();
```
**Telltale signs:**
- Large switch on integer or string
- Array of function pointers
- Command ID constants or strings
- Common command names: exec, upload, download, shell, sleep, etc.
**Common uses:**
- Remote access tools (RAT)
- Backdoor command handling
- Plugin systems
- IPC/RPC mechanisms
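The function pointer table form can be sketched as a lookup followed by an indirect call. This Python version is illustrative only: the command IDs and the `handle_sleep` and `handle_echo` handlers are hypothetical and deliberately harmless.

```python
def handle_sleep(params):
    return f"sleep {params}"


def handle_echo(params):
    return f"echo {params}"


# Hypothetical command IDs mapped to handlers (the "function pointer table").
COMMAND_TABLE = {0x01: handle_sleep, 0x02: handle_echo}


def dispatch(command_id, params):
    """Look up the handler for command_id and call it indirectly."""
    handler = COMMAND_TABLE.get(command_id)
    if handler is None:
        return "unknown command"   # analogous to a bounds check before the call
    return handler(params)
```

Enumerating the entries of such a table is often the quickest way to list everything a sample can be told to do.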
### Callback Pattern Recognition
**Conceptual characteristics:**
- **Inversion of control**: The library or framework invokes the application's code, rather than the application calling into the library
- **Function pointers**: Pass address of your function to framework
- **Asynchronous**: Often used for async operations
**What to look for:**
```
Callback registration:
library_set_callback(MY_EVENT, my_handler_function);
Callback function signature:
void my_callback(event_type, data, user_context)
Common callback contexts:
- Network data received
- Timer expired
- File I/O complete
- User interaction
```
**Telltale signs:**
- Function pointers passed as parameters
- Functions with generic names like "handler", "callback", "on_event"
- Often have opaque pointer parameter (void* user_data)
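Registration and invocation are usually far apart in a binary, which is what makes callbacks hard to follow. The sketch below shows both halves side by side, assuming a hypothetical `EventSource` class that stores one handler per event type along with an opaque user context.

```python
class EventSource:
    """Hypothetical framework object that stores and later invokes callbacks."""

    def __init__(self):
        self._callbacks = {}

    def set_callback(self, event_type, handler, user_context=None):
        # Registration: the handler address and context are only stored here.
        self._callbacks[event_type] = (handler, user_context)

    def fire(self, event_type, data):
        # Invocation: the framework calls back with the stored context.
        entry = self._callbacks.get(event_type)
        if entry is None:
            return None
        handler, user_context = entry
        return handler(event_type, data, user_context)
```

When analyzing, cross-references to the registration call reveal every handler, and the type of the context argument often exposes the application's main state structure.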
### Loop Patterns
**Simple iteration:**
```
for (i = 0; i < count; i++)
- Linear processing
- Transform/encrypt each element
```
**Nested loops (2D processing):**
```
for (i = 0; i < height; i++)
for (j = 0; j < width; j++)
- Image processing
- Matrix operations
- Block cipher on 2D state
```
**Do-while patterns:**
```
do {
read_chunk();
process_chunk();
} while (more_data);
- File/network processing
- Guaranteed first execution
```
**While-true with break:**
```
while (1) {
if (condition) break;
process();
}
- Server loops
- State machines
- Event loops
```
---
## Data Structure Patterns
### Buffer Management
**Fixed-size buffers:**
```
char buffer[1024];
read(fd, buffer, sizeof(buffer));
- Stack-allocated
- Size known at compile time
- Often seen with unsafe functions (strcpy, sprintf)
```
**Dynamic buffers:**
```
size = calculate_size();
buffer = malloc(size);
- Heap-allocated
- Size determined at runtime
- Look for malloc/free pairs or memory leaks
```
**Ring buffers (circular):**
```
write_pos = (write_pos + 1) % BUFFER_SIZE;
read_pos = (read_pos + 1) % BUFFER_SIZE;
- Fixed-size, reusable
- Modulo arithmetic for wrap-around
- Used in queues, streaming
```
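A complete ring buffer adds full and empty checks around the two modulo updates shown above. This is a minimal sketch; the `RingBuffer` class and its `count` field are assumptions of this example, and real implementations often detect full/empty by comparing the two positions instead.

```python
class RingBuffer:
    """Fixed-capacity FIFO using wrap-around read and write positions."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.read_pos = 0
        self.write_pos = 0
        self.count = 0

    def push(self, item):
        if self.count == self.capacity:
            return False                      # buffer full, item rejected
        self.buf[self.write_pos] = item
        self.write_pos = (self.write_pos + 1) % self.capacity
        self.count += 1
        return True

    def pop(self):
        if self.count == 0:
            return None                       # buffer empty
        item = self.buf[self.read_pos]
        self.read_pos = (self.read_pos + 1) % self.capacity
        self.count -= 1
        return item
```

When the capacity is a power of two, compilers commonly replace the `%` with a bitwise AND against `capacity - 1`, so look for masks such as `& 0x3FF` as well.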
### Linked Structures
**Linked list:**
```
struct node {
data_type data;
struct node* next; // singly-linked
struct node* prev; // doubly-linked (optional)
};
```
**Recognition:**
- Pointer fields in structures
- Traversal loops: `while (node != NULL) { node = node->next; }`
- Insertion/deletion operations
**Tree structures:**
```
struct tree_node {
data_type data;
struct tree_node* left;
struct tree_node* right;
};
```
**Recognition:**
- Two pointer fields (left/right)
- Recursive functions
- Comparison operations for ordering
### String Handling Patterns
**Length-prefixed strings:**
```
struct {
uint32_t length;
char data[];
};
```
**Null-terminated strings:**
```
while (*str != '\0') str++; // strlen pattern
```
**Wide strings:**
```
wchar_t* wstr;
uint16_t* utf16_str;
- 2 or 4 bytes per character
- String operations work on larger units
```
**Detection:**
- Character-by-character loops
- Null byte checks
- String manipulation function calls
- UTF-8/UTF-16 encoding/decoding
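The three string layouts above can be told apart by how a reader for each one would have to work. The following Python sketch shows one reader per layout; the little-endian 32-bit length and the function names are assumptions made for this example.

```python
import struct


def read_length_prefixed(buf: bytes, offset: int = 0) -> bytes:
    """Read a uint32 length (assumed little-endian) followed by that many bytes."""
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    if start + length > len(buf):
        raise ValueError("declared length exceeds buffer")
    return buf[start:start + length]


def read_c_string(buf: bytes, offset: int = 0) -> bytes:
    """Read bytes up to (not including) the first null byte."""
    end = buf.index(b"\x00", offset)
    return buf[offset:end]


def read_utf16le_string(buf: bytes, offset: int = 0) -> str:
    """Scan two bytes at a time until a 16-bit zero terminator."""
    end = offset
    while end + 2 <= len(buf) and buf[end:end + 2] != b"\x00\x00":
        end += 2
    return buf[offset:end].decode("utf-16-le")
```

The step size of the scanning loop (1, 2, or 4 bytes) is the quickest indicator of character width in decompiled code.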
---
## Network Protocol Patterns
### Protocol Structure Recognition
**Request-Response:**
```
send_request(command, params);
response = receive_response();
process_response(response);
```
**Characteristics:**
- Client initiates
- Server responds
- Blocking or polling wait for response
- Examples: HTTP, DNS, RPC
**Continuous Stream:**
```
while (connected) {
data = receive_data();
process_chunk(data);
}
```
**Characteristics:**
- Persistent connection
- Data flows continuously
- No strict request-response pairing
- Examples: video streaming, log shipping
**Message-Oriented:**
```
while (true) {
message = receive_message(); // reads length, then payload
dispatch_message(message);
}
```
**Characteristics:**
- Discrete messages with boundaries
- Length prefix or delimiter
- Message type/ID field
- Examples: custom C2 protocols, message queues
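Message-oriented protocols need a framing step that cuts the byte stream into messages. Here is a minimal sketch of that step, assuming a hypothetical 5-byte header (big-endian `uint32` payload length followed by a `uint8` message type); actual protocols will differ in field order and sizes.

```python
import struct

# Assumed header layout: uint32 payload length, uint8 message type (big-endian).
HEADER = struct.Struct(">IB")


def split_messages(stream: bytes):
    """Return (messages, leftover); messages is a list of (type, payload)."""
    messages = []
    offset = 0
    while offset + HEADER.size <= len(stream):
        length, msg_type = HEADER.unpack_from(stream, offset)
        end = offset + HEADER.size + length
        if end > len(stream):
            break                             # incomplete message, wait for more
        messages.append((msg_type, stream[offset + HEADER.size:end]))
        offset = end
    return messages, stream[offset:]
```

In a binary, the equivalent code typically performs a small fixed-size `recv`, reads a length out of it, and then loops on a second `recv` until that many bytes have arrived.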
### Serialization Patterns
**Binary serialization:**
```
Write primitives in sequence:
write_uint32(length);
write_bytes(data, length);
write_uint8(flags);
```
**Characteristics:**
- Dense, efficient
- Fixed byte order (endianness)
- Magic numbers for structure identification
- Version fields for compatibility
**Text-based serialization:**
```
JSON: {"key": "value", "num": 42}
XML: <root><item>value</item></root>
```
**Characteristics:**
- Human-readable
- Delimiter characters ({}, <>, quotes)
- String parsing and generation code
- Less efficient but more flexible
**Detection strategies:**
1. Look for sprintf/snprintf for text generation
2. Check for JSON/XML parsing libraries
3. Find memcpy sequences for binary packing
4. Identify byte-swapping (htonl/ntohl pattern)
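On little-endian targets, `htonl`/`ntohl` are frequently inlined, so step 4 means recognizing the shift-and-mask idiom rather than a call. The sketch below reproduces that idiom in Python for reference; the name `bswap32` is chosen for this example.

```python
def bswap32(x: int) -> int:
    """Reverse the byte order of a 32-bit value using shifts and masks."""
    return (
        ((x >> 24) & 0x000000FF)
        | ((x >> 8) & 0x0000FF00)
        | ((x << 8) & 0x00FF0000)
        | ((x << 24) & 0xFF000000)
    )
```

Seeing this expression applied to a field just before a send, or just after a receive, suggests the field is a multi-byte integer in network byte order.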
### Connection Management
**Connection establishment pattern:**
```
Create socket
→ Connect to server
→ Send handshake/authentication
→ Receive acknowledgment
→ Enter main communication loop
```
**Connection pooling pattern:**
```
maintain pool of N connections
when request arrives:
if free_connection available:
use it
else:
create new connection (up to max)
after request:
return connection to pool
```
**Reconnection pattern:**
```
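max_retries = 5;
retries = 0;
backoff_time = 1;               // initial delay (units depend on sleep API)
while (retries < max_retries) {
    if (try_connect()) break;   // hypothetical helper; stop on success
    sleep(backoff_time);
    backoff_time *= 2;          // exponential backoff
    retries++;
}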
```
**Telltale signs:**
- Retry loops with delays
- Connection state checking
- Timeout handling
- Fallback server lists
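Once the initial delay and retry limit have been recovered from a reconnection loop, the full delay sequence follows directly. A small sketch, assuming a doubling factor as in the pattern above; the `cap` parameter is an optional extra that not every implementation has.

```python
def backoff_schedule(initial_delay, max_retries, cap=None):
    """Return the list of delays an exponential-backoff loop would sleep for."""
    delays = []
    delay = initial_delay
    for _ in range(max_retries):
        delays.append(delay if cap is None else min(delay, cap))
        delay *= 2                            # exponential backoff
    return delays
```

Summing the schedule gives the longest period the program can stay silent before giving up or moving to a fallback server, which is useful when planning dynamic analysis.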
---
## Behavioral Patterns
### Encryption + Network (Data Exfiltration)
**Pattern sequence:**
```
1. Collect files/data
2. Compress (optional)
3. Encrypt
4. Send over network
5. Clean up local copies
```
**What to look for:**
- File enumeration → encryption function → network send
- Temporary file creation → processing → deletion
- Cross-reference encryption function to network functions
### Decrypt + Execute (Payload Loading)
**Pattern sequence:**
```
1. Read encrypted payload from resource/file/network
2. Decrypt in memory
3. Execute (direct call, injection, or create process)
```
**What to look for:**
- Buffer allocated with execute permissions
- Decryption function → function pointer cast → indirect call
- XOR loop → memory copy → execution transfer
### Time-Based Triggering
**Pattern:**
```
while (true) {
current_time = get_time();
if (current_time >= trigger_time) {
execute_payload();
break;
}
sleep(check_interval);
}
```
**What to look for:**
- Time/date API calls
- Comparison with specific dates
- Sleep/delay in loops
- Activation conditions based on temporal logic
### Polymorphic Behavior
**Pattern:**
```
code_variant = select_variant(seed);
decrypt_code(code_variant);
execute_decrypted_code();
reencrypt_code(new_seed);
```
**What to look for:**
- Self-modifying code
- Multiple code variants
- Decryption before execution
- Encryption after execution
- Memory protection changes (read/write/execute toggling)
---
## Code Quality Indicators
### Hand-Written vs. Generated Code
**Hand-written characteristics:**
- Inconsistent formatting
- Comments (if not stripped)
- Meaningful variable names (if symbols present)
- Idiomatic patterns for the language
- Error handling mixed with logic
**Generated/compiled characteristics:**
- Very consistent structure
- Compiler optimization patterns
- Systematic variable naming (if stripped)
- Uniform error handling
- Recognizable library code patterns
### Obfuscated Code Indicators
**Deliberately obscured:**
- Meaningless variable/function names
- Unnecessary complexity
- Dead code branches
- Opaque predicates (always true/false conditions)
- Indirect calls through pointer manipulations
- String obfuscation
**Compiler optimizations (benign):**
- Loop unrolling
- Function inlining
- Constant folding
- Dead code elimination
- Register allocation patterns
**Distinction:** Obfuscation creates complexity without performance benefit; optimization creates complexity for performance.
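An opaque predicate is easiest to understand from a concrete case. The sketch below uses one classic example, the parity of a product of consecutive integers; the function names are hypothetical, and obfuscators use many other identities.

```python
def opaque_true(x: int) -> bool:
    """Always True: x * (x + 1) is a product of consecutive integers, so even."""
    return (x * (x + 1)) % 2 == 0


def guarded(x: int) -> str:
    """The branch looks data-dependent but only one side can ever run."""
    if opaque_true(x):
        return "real path"
    return "dead path"   # unreachable; exists only to mislead analysis
```

If a condition depends on an input yet evaluates the same way for every value you try, suspect an opaque predicate and treat the other branch as dead code.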
### Library Code vs. Custom Code
**Library code:**
- Standard algorithms (qsort, hash functions)
- Consistent with open-source implementations
- Well-structured, parameterized
- Minimal dependencies on surrounding code
**Custom code:**
- Unique patterns
- Integrated with application logic
- Application-specific data structures
- More likely to have bugs/vulnerabilities
**Investigation priority:** Focus on custom code - that's where unique behavior lives.
---
## Using This Reference
### Pattern Matching Workflow
1. **Observe structure** - What loops, branches, data structures appear?
2. **Compare to patterns** - Does this match known algorithmic patterns?
3. **Verify with evidence** - Check for characteristic constants, operations, structure
4. **Document pattern** - Bookmark with pattern name for reference
5. **Improve code** - Rename variables/functions to reflect pattern (e.g., `aes_encrypt`, `rc4_keystream`)
### Example Investigation
```
Observation: Function with nested loops, array lookups, XOR operations
Compare: Matches "Block Cipher" or "Stream Cipher" patterns
Verify:
- Check for large constant array (S-box?)
- Count outer loop iterations (rounds?)
- Look for key schedule function
Find: 256-byte array starting 63 7c 77 7b...
14 iterations in outer loop
Conclusion: AES-256 (14 rounds, standard S-box)
Improve:
rename-variables: state→aes_state, table→aes_sbox
set-function-prototype: void aes_encrypt(uint8_t* data, uint8_t* key)
set-comment: "AES-256 encryption using standard S-box"
```
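The "count outer loop iterations" step in the example relies on the fixed relationship between AES round count and key size. A minimal lookup, with `aes_variant` as a hypothetical helper name:

```python
# Standard AES round counts and the key sizes they imply.
AES_ROUNDS_TO_KEY_BITS = {10: 128, 12: 192, 14: 256}


def aes_variant(outer_loop_iterations: int):
    """Map an observed round count to an AES variant, or None if no match."""
    bits = AES_ROUNDS_TO_KEY_BITS.get(outer_loop_iterations)
    return f"AES-{bits}" if bits else None
```

Implementations often handle the final round outside the main loop, so an observed count one lower than expected (9, 11, or 13) does not rule AES out.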
### Pattern Combination
Real-world code combines multiple patterns:
**Example: Malware C2 Communication**
```
[Command Dispatcher] receives command from network
[State Machine] tracks connection state
[Callback Functions] handle specific commands
[Buffer Management] manages received data
[Encryption] protects command payloads
```
When you identify one pattern, look for related patterns in:
- Functions that call this one (higher-level orchestration)
- Functions called by this one (lower-level primitives)
- Cross-references to shared data structures
### Progressive Understanding
You don't need to identify every pattern perfectly on the first pass:
**First pass:** "This looks like crypto (lots of XOR and loops)"
**Second pass:** "Probably a stream cipher (simple state, no large tables)"
**Third pass:** "Matches RC4 pattern (256-byte init, swap operations)"
**Fourth pass:** "Confirmed RC4 (found KSA and PRGA pattern)"
Each pass refines understanding and guides further investigation.