Reverse Engineering Patterns Reference
This document contains higher-level patterns and concepts to recognize during deep analysis. Focus on algorithmic patterns, behavioral patterns, and code structure rather than platform-specific implementation details.
Cryptographic Algorithm Patterns
Block Cipher Recognition
Conceptual characteristics:
- Substitution-Permutation Network (SPN): Repeated rounds of substitution (S-boxes) and permutation (bit shuffling)
- Feistel Network: Split the block in half, mix one half with a round function of the other half and the round key (usually via XOR), swap halves, repeat
- Fixed block size: Typically 64 bits (DES, Blowfish) or 128 bits (AES)
- Multiple rounds: 8-16+ iterations of core transformation
- Key schedule: Derive round keys from master key
What to look for in decompiled code:
Nested loops:
Outer: rounds (8, 10, 12, 14, 16, 32 iterations)
Inner: processing blocks of fixed size
Array lookups (S-boxes):
result = table[input_byte]
Often 256-element arrays (0x100 size)
Bit manipulation:
XOR, rotation (>> combined with <<), permutation
State updates:
Array or struct representing current cipher state
Transformed each round
Telltale signs:
- Large constant arrays (256+ bytes) that look like random data
- Fixed iteration counts (not data-dependent)
- Heavy use of XOR operations
- Byte-level array indexing: array[data[i]]
Investigation strategy:
- read-memory at constant arrays - compare to known S-boxes
- Count loop iterations - indicates cipher type/key size
- search-strings-regex for algorithm names
- Check cross-references to constants - find cipher initialization
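The "compare to known S-boxes" step can be sketched as follows. This is a minimal illustration, assuming the candidate table has already been dumped into a byte buffer; the function names `looks_like_aes_sbox` and `is_byte_permutation` are hypothetical helpers, not part of any tool. The 16-byte prefix is the first row of the standard AES S-box.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* First 16 bytes of the standard AES S-box. */
static const uint8_t AES_SBOX_PREFIX[16] = {
    0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5,
    0x30, 0x01, 0x67, 0x2b, 0xfe, 0xd7, 0xab, 0x76
};

/* Returns 1 if the table is at least 256 bytes long and starts with the
 * AES S-box prefix, 0 otherwise. (Hypothetical helper name.) */
int looks_like_aes_sbox(const uint8_t *table, size_t len)
{
    if (table == NULL || len < 256)
        return 0;
    return memcmp(table, AES_SBOX_PREFIX, sizeof(AES_SBOX_PREFIX)) == 0;
}

/* Returns 1 if a 256-byte table contains every byte value exactly once.
 * Bijective byte S-boxes (such as the AES one) pass this check; random
 * data almost never does. */
int is_byte_permutation(const uint8_t *table, size_t len)
{
    uint8_t seen[256] = {0};
    if (table == NULL || len != 256)
        return 0;
    for (size_t i = 0; i < 256; i++) {
        if (seen[table[i]])
            return 0;
        seen[table[i]] = 1;
    }
    return 1;
}
```

A prefix match is strong evidence but not proof; a full 256-byte comparison against a reference table is the safer confirmation.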
Stream Cipher Recognition
Conceptual characteristics:
- Keystream generation: Produce pseudo-random byte stream from key
- Simple combination: XOR plaintext with keystream
- State-based: Internal state evolves as keystream is produced
- No fixed blocks: Can encrypt arbitrary lengths
What to look for:
State initialization:
Array or struct setup from key
Often 256-byte arrays
Keystream generation loop:
State updates via modular arithmetic
Index computations: i = (i + 1) % N
Swap operations common
XOR combination:
output[i] = input[i] ^ keystream[i]
Simple, obvious pattern
Telltale signs:
- Array swap operations: temp = a[i]; a[i] = a[j]; a[j] = temp
- Modulo operations: % 256 or & 0xFF
- XOR in simple loop
- Smaller code footprint than block ciphers (no large constants)
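For reference, RC4 is a well-known cipher that shows every trait listed above: a 256-byte state set up from the key, swap operations, modulo-256 index arithmetic (here via `uint8_t` wrap-around), and a plain XOR with the keystream. The sketch below is a compact rendering for pattern comparison, not a vetted implementation; the type and function names are illustrative, and `keylen` is assumed to be greater than zero.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t S[256]; /* permutation state */
    uint8_t i, j;   /* keystream indices; uint8_t arithmetic wraps mod 256 */
} rc4_state;

/* Key scheduling: fill S with 0..255, then shuffle it using the key.
 * Assumes keylen > 0. */
void rc4_init(rc4_state *st, const uint8_t *key, size_t keylen)
{
    uint8_t j = 0;
    for (int n = 0; n < 256; n++)
        st->S[n] = (uint8_t)n;
    for (int n = 0; n < 256; n++) {
        j = (uint8_t)(j + st->S[n] + key[(size_t)n % keylen]);
        uint8_t tmp = st->S[n];
        st->S[n] = st->S[j];
        st->S[j] = tmp;
    }
    st->i = 0;
    st->j = 0;
}

/* Keystream generation: one swap and one lookup per byte, XORed in place.
 * The same call both encrypts and decrypts. */
void rc4_crypt(rc4_state *st, uint8_t *data, size_t len)
{
    for (size_t n = 0; n < len; n++) {
        st->i = (uint8_t)(st->i + 1);
        st->j = (uint8_t)(st->j + st->S[st->i]);
        uint8_t tmp = st->S[st->i];
        st->S[st->i] = st->S[st->j];
        st->S[st->j] = tmp;
        data[n] ^= st->S[(uint8_t)(st->S[st->i] + st->S[st->j])];
    }
}
```

In decompiler output the `uint8_t` wrap-around usually appears as an explicit `& 0xFF` or `% 256`.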
Public Key Cryptography Recognition
Conceptual characteristics:
- Large integer arithmetic: Numbers hundreds or thousands of bits
- Modular exponentiation: result = base^exponent mod modulus
- Performance: Very slow compared to symmetric crypto (indicates usage for key exchange, not bulk data)
What to look for:
Multi-precision arithmetic:
Arrays representing big integers
Functions for add/subtract/multiply on arrays
Square-and-multiply pattern:
Loop over exponent bits
Square operation each iteration
Conditional multiply based on bit value
Modulo operations on large numbers:
Division with large divisors
Barrett reduction or Montgomery multiplication
Telltale signs:
- Very large buffers (128, 256, 512 bytes+)
- Bit-by-bit exponent processing
- Characteristic magic constants (e.g., 0x10001 = 65537 for RSA)
- Slow execution (thousands of operations per byte)
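The square-and-multiply loop described above has a very recognizable shape. The sketch below uses machine-word integers so it stays short; real implementations run the same loop over multi-precision arrays. It assumes the modulus fits in 32 bits so that intermediate products fit in a `uint64_t`. The name `modexp` is illustrative.

```c
#include <stdint.h>

/* Computes (base ^ exponent) mod modulus with right-to-left
 * square-and-multiply. Assumes a 32-bit modulus so products fit in 64 bits.
 * Returns 0 for modulus 0 or 1. */
uint32_t modexp(uint32_t base, uint32_t exponent, uint32_t modulus)
{
    if (modulus <= 1)
        return 0;

    uint64_t result = 1;
    uint64_t b = base % modulus;

    while (exponent > 0) {
        if (exponent & 1)               /* conditional multiply on bit value */
            result = (result * b) % modulus;
        b = (b * b) % modulus;          /* square every iteration */
        exponent >>= 1;                 /* move to the next exponent bit */
    }
    return (uint32_t)result;
}
```

For example, `modexp(4, 13, 497)` evaluates to 445.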
Hash Function Recognition
Conceptual characteristics:
- Compression function: Transform fixed-size input to fixed-size output
- Block processing: Process data in chunks (512 bits typical)
- State accumulation: Running state updated with each block
- Padding: Add bytes to make input multiple of block size
- One-way: Lots of mixing, no reversibility
What to look for:
Initialization:
Fixed magic constants
MD5: 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476
SHA-1: 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, 0xc3d2e1f0
SHA-256: 0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19
Round function:
Fixed iteration count (64 for MD5 and SHA-256, 80 for SHA-1)
Lots of bitwise operations (rotations, XOR, AND, OR)
State mixing (each output bit depends on many input bits)
Padding logic:
Append 0x80 byte
Length encoding at end
Telltale signs:
- Characteristic initialization constants
- Fixed 64 or 80 round loops
- Bitwise rotation: (x << n) | (x >> (32-n))
- Message schedule computation (W array expansion)
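A quick way to act on the initialization constants is to compare the words written into the hash state against the values listed above. The sketch below is one possible triage helper; `guess_hash_from_init` and the enum names are hypothetical. Because MD4 uses the same four words as MD5, and MD5 shares its four with the first four of SHA-1, the result is a hint to confirm by round count, not a final identification.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum {
    HASH_UNKNOWN,
    HASH_MD5_OR_MD4, /* both start from the same four words */
    HASH_SHA1,
    HASH_SHA256
} hash_guess;

/* 32-bit left rotation. n is reduced mod 32 so that n == 0 does not
 * trigger an undefined 32-bit shift. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Guesses the hash family from the first state words found in the
 * initialization routine. */
hash_guess guess_hash_from_init(const uint32_t *w, size_t count)
{
    if (w == NULL || count < 4)
        return HASH_UNKNOWN;

    if (w[0] == 0x6a09e667u && w[1] == 0xbb67ae85u &&
        w[2] == 0x3c6ef372u && w[3] == 0xa54ff53au)
        return HASH_SHA256;

    if (w[0] == 0x67452301u && w[1] == 0xefcdab89u &&
        w[2] == 0x98badcfeu && w[3] == 0x10325476u) {
        /* The fifth word separates SHA-1 from the MD family. */
        if (count >= 5 && w[4] == 0xc3d2e1f0u)
            return HASH_SHA1;
        return HASH_MD5_OR_MD4;
    }
    return HASH_UNKNOWN;
}
```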
Simple XOR Obfuscation
Conceptual characteristics:
- Trivial operation: output = input XOR key
- Symmetric: Encryption and decryption identical
- Weak security: Easy to break, used for obfuscation not protection
What to look for:
Single-byte key:
for (i = 0; i < len; i++)
data[i] ^= 0x42;
Multi-byte key:
for (i = 0; i < len; i++)
data[i] ^= key[i % keylen];
Rolling key:
key = seed;
for (i = 0; i < len; i++) {
data[i] ^= key;
key = update_key(key); // LCG or similar
}
Telltale signs:
- Very short functions (5-10 lines)
- XOR with constants or simple patterns
- Often applied to strings or config data
- Paired with static data arrays that need decoding
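Because single-byte XOR is so weak, the key can often be recovered by trying all 256 values and scoring the output. The sketch below assumes the hidden data is mostly ASCII letters and spaces, which is a heuristic that fits strings and text configs but not binary payloads. The function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Counts how many bytes decode to an ASCII letter or a space under `key`. */
static size_t score_key(const uint8_t *data, size_t len, uint8_t key)
{
    size_t score = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t c = (uint8_t)(data[i] ^ key);
        if (c == ' ' || (c < 0x80 && isalpha(c)))
            score++;
    }
    return score;
}

/* Tries every single-byte key and returns the one with the best score.
 * Ties resolve to the lowest key value. */
uint8_t guess_single_byte_xor_key(const uint8_t *data, size_t len)
{
    uint8_t best_key = 0;
    size_t best_score = 0;

    for (int key = 0; key < 256; key++) {
        size_t s = score_key(data, len, (uint8_t)key);
        if (s > best_score) {
            best_score = s;
            best_key = (uint8_t)key;
        }
    }
    return best_key;
}
```

Short inputs can produce ties or false positives, so the decoded result should still be inspected by eye.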
Control Flow Patterns
State Machine Recognition
Conceptual characteristics:
- Explicit states: Enumeration or integer representing current state
- State transitions: Switch/if-else on state variable
- Event-driven: External input triggers transitions
What to look for:
State variable:
int state = INITIAL_STATE;
Dispatch loop:
while (running) {
switch (state) {
case STATE_A: /* handle A, maybe transition to B */ break;
case STATE_B: /* handle B, maybe transition to C */ break;
...
}
}
State tables (more advanced):
next_state = transition_table[current_state][input];
action = action_table[current_state][input];
Telltale signs:
- Large switch statements with many cases
- State variable repeatedly assigned new values
- Enumeration or #define constants for states
- Patterns like IDLE, CONNECTING, CONNECTED, DISCONNECTED
Common uses:
- Network protocol handling
- Parser implementation
- UI event handling
- Command processing
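The table-driven variant is harder to spot than a switch because the logic lives in data rather than code. A minimal sketch, using the IDLE/CONNECTING/CONNECTED/DISCONNECTED states mentioned above; the event names and the specific transitions are invented for illustration.

```c
#include <stddef.h>

typedef enum { IDLE, CONNECTING, CONNECTED, DISCONNECTED, NUM_STATES } conn_state;
typedef enum { EV_CONNECT, EV_CONNECT_OK, EV_ERROR, EV_CLOSE, NUM_EVENTS } conn_event;

/* Rows are current states, columns are events (illustrative transitions). */
static const conn_state transition_table[NUM_STATES][NUM_EVENTS] = {
    /* IDLE         */ { CONNECTING, IDLE,         IDLE,         IDLE         },
    /* CONNECTING   */ { CONNECTING, CONNECTED,    DISCONNECTED, DISCONNECTED },
    /* CONNECTED    */ { CONNECTED,  CONNECTED,    DISCONNECTED, DISCONNECTED },
    /* DISCONNECTED */ { CONNECTING, DISCONNECTED, DISCONNECTED, DISCONNECTED },
};

/* Looks up the next state; out-of-range inputs leave the state unchanged. */
conn_state next_state(conn_state current, conn_event ev)
{
    if ((unsigned)current >= (unsigned)NUM_STATES ||
        (unsigned)ev >= (unsigned)NUM_EVENTS)
        return current;
    return transition_table[current][ev];
}

/* Feeds a sequence of events through the table and returns the final state. */
conn_state run_events(conn_state start, const conn_event *events, size_t count)
{
    conn_state s = start;
    for (size_t i = 0; i < count; i++)
        s = next_state(s, events[i]);
    return s;
}
```

In a binary this shows up as a small constant 2D array indexed by two integers, with the result stored back into the state variable.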
Command Dispatcher Recognition
Conceptual characteristics:
- Command codes: Numeric identifiers for operations
- Handler lookup: Map command ID to handler function
- Extensibility: Easy to add new commands
What to look for:
Command dispatch table:
switch (command_id) {
case CMD_EXECUTE: handle_execute(params); break;
case CMD_UPLOAD: handle_upload(params); break;
case CMD_DOWNLOAD: handle_download(params); break;
...
}
Function pointer table:
handler = command_table[command_id];
handler(params);
String-based dispatch:
if (strcmp(cmd, "exec") == 0) handle_execute();
else if (strcmp(cmd, "upload") == 0) handle_upload();
Telltale signs:
- Large switch on integer or string
- Array of function pointers
- Command ID constants or strings
- Common command names: exec, upload, download, shell, sleep, etc.
Common uses:
- Remote access tools (RAT)
- Backdoor command handling
- Plugin systems
- IPC/RPC mechanisms
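A function pointer table typically comes with a bounds check before the indirect call, and that pair is a useful fingerprint. The sketch below is a self-contained illustration; the command IDs, handler bodies, and return values are placeholders.

```c
#include <stddef.h>

typedef int (*command_handler)(const char *params);

enum { CMD_EXECUTE = 0, CMD_UPLOAD = 1, CMD_DOWNLOAD = 2, CMD_COUNT };

/* Placeholder handlers: each returns a distinct value so calls can be told apart. */
static int handle_execute(const char *params)  { (void)params; return 100; }
static int handle_upload(const char *params)   { (void)params; return 200; }
static int handle_download(const char *params) { (void)params; return 300; }

static const command_handler command_table[CMD_COUNT] = {
    [CMD_EXECUTE]  = handle_execute,
    [CMD_UPLOAD]   = handle_upload,
    [CMD_DOWNLOAD] = handle_download,
};

/* Validates the command ID, then makes the indirect call.
 * Returns -1 for unknown commands. */
int dispatch_command(unsigned command_id, const char *params)
{
    if (command_id >= CMD_COUNT || command_table[command_id] == NULL)
        return -1;
    return command_table[command_id](params);
}
```

A dispatcher that omits the bounds check is itself worth noting, since an attacker-controlled command ID then indexes outside the table.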
Callback Pattern Recognition
Conceptual characteristics:
- Inversion of control: The library calls your code rather than your code calling the library
- Function pointers: Pass address of your function to framework
- Asynchronous: Often used for async operations
What to look for:
Callback registration:
library_set_callback(MY_EVENT, my_handler_function);
Callback function signature:
void my_callback(event_type, data, user_context)
Common callback contexts:
- Network data received
- Timer expired
- File I/O complete
- User interaction
Telltale signs:
- Function pointers passed as parameters
- Functions with generic names like "handler", "callback", "on_event"
- Often have opaque pointer parameter (void* user_data)
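Both halves of the pattern, registration and later invocation with the opaque context pointer, can be sketched in a few lines. This is a toy "library" for illustration only: `library_fire_event`, the slot table, and the three-argument form of `library_set_callback` are assumptions, not a real API.

```c
#include <stddef.h>

typedef void (*event_callback)(int event_type, const void *data, void *user_data);

typedef struct {
    event_callback cb; /* registered handler, NULL if none */
    void *user_data;   /* opaque context handed back on every call */
} callback_slot;

enum { MAX_EVENTS = 4 };
static callback_slot slots[MAX_EVENTS];

/* Registration: store the function pointer and its context. */
int library_set_callback(int event_type, event_callback cb, void *user_data)
{
    if (event_type < 0 || event_type >= MAX_EVENTS)
        return -1;
    slots[event_type].cb = cb;
    slots[event_type].user_data = user_data;
    return 0;
}

/* Invocation: the library side calls back into user code. */
void library_fire_event(int event_type, const void *data)
{
    if (event_type < 0 || event_type >= MAX_EVENTS)
        return;
    if (slots[event_type].cb != NULL)
        slots[event_type].cb(event_type, data, slots[event_type].user_data);
}

/* Example handler: treats user_data as a pointer to an int counter. */
void count_events(int event_type, const void *data, void *user_data)
{
    (void)event_type;
    (void)data;
    int *counter = user_data;
    (*counter)++;
}
```

When tracing a callback, the cross-reference that matters is the registration call, because the handler itself has no direct callers.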
Loop Patterns
Simple iteration:
for (i = 0; i < count; i++)
- Linear processing
- Transform/encrypt each element
Nested loops (2D processing):
for (i = 0; i < height; i++)
for (j = 0; j < width; j++)
- Image processing
- Matrix operations
- Block cipher on 2D state
Do-while patterns:
do {
read_chunk();
process_chunk();
} while (more_data);
- File/network processing
- Guaranteed first execution
While-true with break:
while (1) {
if (condition) break;
process();
}
- Server loops
- State machines
- Event loops
Data Structure Patterns
Buffer Management
Fixed-size buffers:
char buffer[1024];
read(fd, buffer, sizeof(buffer));
- Stack-allocated
- Size known at compile time
- Often seen with unsafe functions (strcpy, sprintf)
Dynamic buffers:
size = calculate_size();
buffer = malloc(size);
- Heap-allocated
- Size determined at runtime
- Look for malloc/free pairs or memory leaks
Ring buffers (circular):
write_pos = (write_pos + 1) % BUFFER_SIZE;
read_pos = (read_pos + 1) % BUFFER_SIZE;
- Fixed-size, reusable
- Modulo arithmetic for wrap-around
- Used in queues, streaming
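The ring buffer's modulo wrap-around is usually accompanied by full and empty checks. One common design, sketched below, tracks an element count alongside the two positions; other implementations instead leave one slot unused. The struct layout and function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define BUFFER_SIZE 8

typedef struct {
    uint8_t data[BUFFER_SIZE];
    size_t read_pos;  /* next slot to read */
    size_t write_pos; /* next slot to write */
    size_t count;     /* number of stored bytes */
} ring_buffer;

/* Appends one byte. Returns 0 on success, -1 if the buffer is full. */
int ring_push(ring_buffer *rb, uint8_t byte)
{
    if (rb->count == BUFFER_SIZE)
        return -1;
    rb->data[rb->write_pos] = byte;
    rb->write_pos = (rb->write_pos + 1) % BUFFER_SIZE;
    rb->count++;
    return 0;
}

/* Removes the oldest byte. Returns 0 on success, -1 if the buffer is empty. */
int ring_pop(ring_buffer *rb, uint8_t *out)
{
    if (rb->count == 0)
        return -1;
    *out = rb->data[rb->read_pos];
    rb->read_pos = (rb->read_pos + 1) % BUFFER_SIZE;
    rb->count--;
    return 0;
}
```

When BUFFER_SIZE is a power of two, compilers often emit `& (BUFFER_SIZE - 1)` instead of a modulo.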
Linked Structures
Linked list:
struct node {
data_type data;
struct node* next; // singly-linked
struct node* prev; // doubly-linked (optional)
};
Recognition:
- Pointer fields in structures
- Traversal loops: while (node != NULL) { node = node->next; }
- Insertion/deletion operations
Tree structures:
struct tree_node {
data_type data;
struct tree_node* left;
struct tree_node* right;
};
Recognition:
- Two pointer fields (left/right)
- Recursive functions
- Comparison operations for ordering
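The three tree recognition cues (two pointer fields, recursion, ordering comparisons) appear together in even the simplest lookup. A minimal sketch, assuming a binary search tree keyed on an `int`; the function names are illustrative.

```c
#include <stddef.h>

struct tree_node {
    int data;
    struct tree_node *left;
    struct tree_node *right;
};

/* Recursive binary-search lookup: compare, then descend left or right.
 * Returns the matching node, or NULL if the key is absent. */
const struct tree_node *tree_find(const struct tree_node *node, int key)
{
    if (node == NULL || node->data == key)
        return node;
    return tree_find(key < node->data ? node->left : node->right, key);
}

/* Recursive traversal that visits both children: counts all nodes. */
size_t tree_count(const struct tree_node *node)
{
    if (node == NULL)
        return 0;
    return 1 + tree_count(node->left) + tree_count(node->right);
}
```

Optimizing compilers may turn the tail call in `tree_find` into a loop, so the lookup can appear iterative in decompiled output while the full traversal stays recursive.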
String Handling Patterns
Length-prefixed strings:
struct {
uint32_t length;
char data[];
};
Null-terminated strings:
while (*str != '\0') str++; // strlen pattern
Wide strings:
wchar_t* wstr;
uint16_t* utf16_str;
- 2 or 4 bytes per character
- String operations work on larger units
Detection:
- Character-by-character loops
- Null byte checks
- String manipulation function calls
- UTF-8/UTF-16 encoding/decoding
Network Protocol Patterns
Protocol Structure Recognition
Request-Response:
send_request(command, params);
response = receive_response();
process_response(response);
Characteristics:
- Client initiates
- Server responds
- Blocking or polling wait for response
- Examples: HTTP, DNS, RPC
Continuous Stream:
while (connected) {
data = receive_data();
process_chunk(data);
}
Characteristics:
- Persistent connection
- Data flows continuously
- No strict request-response pairing
- Examples: video streaming, log shipping
Message-Oriented:
while (true) {
message = receive_message(); // reads length, then payload
dispatch_message(message);
}
Characteristics:
- Discrete messages with boundaries
- Length prefix or delimiter
- Message type/ID field
- Examples: custom C2 protocols, message queues
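Message framing code usually reads a small fixed header, validates the length, and only then touches the payload. The sketch below assumes a hypothetical wire layout of a 1-byte type, a 4-byte big-endian length, and then the payload; real protocols differ in field order, width, and byte order.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t type;           /* message type/ID field */
    uint32_t length;        /* payload length from the header */
    const uint8_t *payload; /* points into the caller's buffer */
} message_view;

/* Parses one message from buf (assumed layout: type, big-endian length, payload).
 * Returns the number of bytes consumed, 0 if more data is needed,
 * or -1 if the declared length exceeds max_payload. */
long parse_message(const uint8_t *buf, size_t buflen,
                   uint32_t max_payload, message_view *out)
{
    if (buflen < 5)
        return 0; /* header incomplete */

    uint32_t len = ((uint32_t)buf[1] << 24) | ((uint32_t)buf[2] << 16) |
                   ((uint32_t)buf[3] << 8)  |  (uint32_t)buf[4];

    if (len > max_payload)
        return -1; /* reject oversized messages before using the length */
    if (buflen - 5 < len)
        return 0;  /* payload incomplete */

    out->type = buf[0];
    out->length = len;
    out->payload = buf + 5;
    return (long)(5 + len);
}
```

A parser that skips the maximum-length check, or allocates directly from the untrusted length, is a common source of memory-safety bugs.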
Serialization Patterns
Binary serialization:
Write primitives in sequence:
write_uint32(length);
write_bytes(data, length);
write_uint8(flags);
Characteristics:
- Dense, efficient
- Fixed byte order (endianness)
- Magic numbers for structure identification
- Version fields for compatibility
Text-based serialization:
JSON: {"key": "value", "num": 42}
XML: <root><item>value</item></root>
Characteristics:
- Human-readable
- Delimiter characters ({}, <>, quotes)
- String parsing and generation code
- Less efficient but more flexible
Detection strategies:
- Look for sprintf/snprintf for text generation
- Check for JSON/XML parsing libraries
- Find memcpy sequences for binary packing
- Identify byte-swapping (htonl/ntohl pattern)
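The write_uint32 / write_bytes / write_uint8 sequence shown earlier can be fleshed out as below. This is a sketch under assumptions: the `writer` struct, the overflow flag, and the choice of big-endian (network) byte order are illustrative rather than taken from any specific protocol.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t *buf; /* destination buffer */
    size_t cap;   /* buffer capacity in bytes */
    size_t pos;   /* next write offset */
    int overflow; /* set once a write would not fit; later writes are ignored */
} writer;

/* Returns 1 if n more bytes fit, otherwise sets the overflow flag. */
static int writer_reserve(writer *w, size_t n)
{
    if (w->overflow || n > w->cap - w->pos) {
        w->overflow = 1;
        return 0;
    }
    return 1;
}

void write_uint8(writer *w, uint8_t v)
{
    if (writer_reserve(w, 1))
        w->buf[w->pos++] = v;
}

/* Writes the most significant byte first (big-endian). */
void write_uint32(writer *w, uint32_t v)
{
    if (!writer_reserve(w, 4))
        return;
    w->buf[w->pos++] = (uint8_t)(v >> 24);
    w->buf[w->pos++] = (uint8_t)(v >> 16);
    w->buf[w->pos++] = (uint8_t)(v >> 8);
    w->buf[w->pos++] = (uint8_t)v;
}

void write_bytes(writer *w, const void *data, size_t len)
{
    if (!writer_reserve(w, len))
        return;
    memcpy(w->buf + w->pos, data, len);
    w->pos += len;
}
```

Shift-and-store sequences like the one in `write_uint32` are the manual equivalent of htonl and serve the same purpose when identifying byte order.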
Connection Management
Connection establishment pattern:
Create socket
→ Connect to server
→ Send handshake/authentication
→ Receive acknowledgment
→ Enter main communication loop
Connection pooling pattern:
maintain pool of N connections
when request arrives:
if free_connection available:
use it
else:
create new connection (up to max)
after request:
return connection to pool
Reconnection pattern:
max_retries = 5;
retries = 0;
while (retries < max_retries) {
if (connect_success) break;
sleep(backoff_time);
backoff_time *= 2; // exponential backoff
retries++;
}
Telltale signs:
- Retry loops with delays
- Connection state checking
- Timeout handling
- Fallback server lists
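The reconnection loop can be made concrete with the connect and sleep operations passed in as function pointers, which also lets the logic run without real sockets or delays. Everything named here (`connect_with_backoff`, `fake_link`, the cap on the delay) is a hypothetical illustration, not code from any particular sample.

```c
#include <stdint.h>

typedef int (*connect_fn)(void *ctx);             /* returns nonzero on success */
typedef void (*sleep_fn)(uint32_t ms, void *ctx); /* waits for ms milliseconds */

/* Tries to connect up to max_retries times, doubling the delay after each
 * failure and capping it at max_ms. Returns the 1-based attempt number that
 * succeeded, or 0 if every attempt failed. */
int connect_with_backoff(connect_fn try_connect, sleep_fn do_sleep, void *ctx,
                         int max_retries, uint32_t initial_ms, uint32_t max_ms)
{
    uint32_t backoff = initial_ms;

    for (int attempt = 1; attempt <= max_retries; attempt++) {
        if (try_connect(ctx))
            return attempt;
        if (attempt == max_retries)
            break; /* no point sleeping after the last attempt */
        do_sleep(backoff, ctx);
        /* Exponential backoff; the comparison avoids overflow when doubling. */
        backoff = (backoff > max_ms / 2) ? max_ms : backoff * 2;
    }
    return 0;
}

/* Stand-in link for demonstration: fails fail_count times, then succeeds. */
typedef struct {
    int fail_count;
    int attempts;
    uint32_t total_sleep_ms;
} fake_link;

static int fake_connect(void *ctx)
{
    fake_link *l = ctx;
    l->attempts++;
    return l->attempts > l->fail_count;
}

static void fake_sleep(uint32_t ms, void *ctx)
{
    fake_link *l = ctx;
    l->total_sleep_ms += ms; /* record the delay instead of waiting */
}
```

The sequence of sleep durations (for example 100, 200, 400 ms) is often visible in dynamic analysis and helps confirm the pattern.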
Behavioral Patterns
Encryption + Network (Data Exfiltration)
Pattern sequence:
1. Collect files/data
2. Compress (optional)
3. Encrypt
4. Send over network
5. Clean up local copies
What to look for:
- File enumeration → encryption function → network send
- Temporary file creation → processing → deletion
- Cross-reference encryption function to network functions
Decrypt + Execute (Payload Loading)
Pattern sequence:
1. Read encrypted payload from resource/file/network
2. Decrypt in memory
3. Execute (direct call, injection, or create process)
What to look for:
- Buffer allocated with execute permissions
- Decryption function → function pointer cast → indirect call
- XOR loop → memory copy → execution transfer
Time-Based Triggering
Pattern:
while (true) {
current_time = get_time();
if (current_time >= trigger_time) {
execute_payload();
break;
}
sleep(check_interval);
}
What to look for:
- Time/date API calls
- Comparison with specific dates
- Sleep/delay in loops
- Activation conditions based on temporal logic
Polymorphic Behavior
Pattern:
code_variant = select_variant(seed);
decrypt_code(code_variant);
execute_decrypted_code();
reencrypt_code(new_seed);
What to look for:
- Self-modifying code
- Multiple code variants
- Decryption before execution
- Encryption after execution
- Memory protection changes (read/write/execute toggling)
Code Quality Indicators
Hand-Written vs. Generated Code
Hand-written characteristics:
- Inconsistent formatting
- Comments (if not stripped)
- Meaningful variable names (if symbols present)
- Idiomatic patterns for the language
- Error handling mixed with logic
Generated/compiled characteristics:
- Very consistent structure
- Compiler optimization patterns
- Systematic variable naming (if stripped)
- Uniform error handling
- Recognizable library code patterns
Obfuscated Code Indicators
Deliberately obscured:
- Meaningless variable/function names
- Unnecessary complexity
- Dead code branches
- Opaque predicates (always true/false conditions)
- Indirect calls through pointer manipulations
- String obfuscation
Compiler optimizations (benign):
- Loop unrolling
- Function inlining
- Constant folding
- Dead code elimination
- Register allocation patterns
Distinction: Obfuscation creates complexity without performance benefit; optimization creates complexity for performance.
Library Code vs. Custom Code
Library code:
- Standard algorithms (qsort, hash functions)
- Consistent with open-source implementations
- Well-structured, parameterized
- Minimal dependencies on surrounding code
Custom code:
- Unique patterns
- Integrated with application logic
- Application-specific data structures
- More likely to have bugs/vulnerabilities
Investigation priority: Focus on custom code - that's where unique behavior lives.
Using This Reference
Pattern Matching Workflow
- Observe structure - What loops, branches, data structures appear?
- Compare to patterns - Does this match known algorithmic patterns?
- Verify with evidence - Check for characteristic constants, operations, structure
- Document pattern - Bookmark with pattern name for reference
- Improve code - Rename variables/functions to reflect pattern (e.g., aes_encrypt, rc4_keystream)
Example Investigation
Observation: Function with nested loops, array lookups, XOR operations
Compare: Matches "Block Cipher" or "Stream Cipher" patterns
Verify:
- Check for large constant array (S-box?)
- Count outer loop iterations (rounds?)
- Look for key schedule function
Find: 256-byte array starting 63 7c 77 7b...
14 iterations in outer loop
Conclusion: AES-256 (14 rounds, standard S-box)
Improve:
rename-variables: state→aes_state, table→aes_sbox
set-function-prototype: void aes_encrypt(uint8_t* data, uint8_t* key)
set-comment: "AES-256 encryption using standard S-box"
Pattern Combination
Real-world code combines multiple patterns:
Example: Malware C2 Communication
[Command Dispatcher] receives command from network
↓
[State Machine] tracks connection state
↓
[Callback Functions] handle specific commands
↓
[Buffer Management] manages received data
↓
[Encryption] protects command payloads
When you identify one pattern, look for related patterns in:
- Functions that call this one (higher-level orchestration)
- Functions called by this one (lower-level primitives)
- Cross-references to shared data structures
Progressive Understanding
You don't need to identify every pattern perfectly on the first attempt:
First pass: "This looks like crypto (lots of XOR and loops)"
Second pass: "Probably a stream cipher (simple state, no large tables)"
Third pass: "Matches RC4 pattern (256-byte init, swap operations)"
Fourth pass: "Confirmed RC4 (found KSA and PRGA pattern)"
Each pass refines understanding and guides further investigation.