gh-cyberkaida-reverse-engin…/skills/deep-analysis/patterns.md

# Reverse Engineering Patterns Reference

This document contains higher-level patterns and concepts to recognize during deep analysis. Focus on algorithmic patterns, behavioral patterns, and code structure rather than platform-specific implementation details.

## Cryptographic Algorithm Patterns

### Block Cipher Recognition

**Conceptual characteristics:**
- **Substitution-Permutation Network (SPN)**: Repeated rounds of substitution (S-boxes) and permutation (bit shuffling)
- **Feistel Network**: Split data in half, operate on one half using the other as key input, swap halves, repeat
- **Fixed block size**: Typically 64 bits (DES, Blowfish) or 128 bits (AES)
- **Multiple rounds**: 8-16+ iterations of core transformation
- **Key schedule**: Derive round keys from master key

**What to look for in decompiled code:**
```
Nested loops:
  Outer: rounds (8, 10, 12, 14, 16, 32 iterations)
  Inner: processing blocks of fixed size

Array lookups (S-boxes):
  result = table[input_byte]
  Often 256-element arrays (0x100 size)

Bit manipulation:
  XOR, rotation (>> combined with <<), permutation

State updates:
  Array or struct representing current cipher state
  Transformed each round
```

**Telltale signs:**
- Large constant arrays (256+ bytes) that look like random data
- Fixed iteration counts (not data-dependent)
- Heavy use of XOR operations
- Byte-level array indexing: `array[data[i]]`

**Investigation strategy:**
1. `read-memory` at constant arrays - compare to known S-boxes
2. Count loop iterations - indicates cipher type/key size
3. `search-strings-regex` for algorithm names
4. Check cross-references to constants - find cipher initialization

### Stream Cipher Recognition

**Conceptual characteristics:**
- **Keystream generation**: Produce pseudo-random byte stream from key
- **Simple combination**: XOR plaintext with keystream
- **State-based**: Internal state evolves as keystream is produced
- **No fixed blocks**: Can encrypt arbitrary lengths

**What to look for:**
```
State initialization:
  Array or struct setup from key
  Often 256-byte arrays

Keystream generation loop:
  State updates via modular arithmetic
  Index computations: i = (i + 1) % N
  Swap operations common

XOR combination:
  output[i] = input[i] ^ keystream[i]
  Simple, obvious pattern
```

**Telltale signs:**
- Array swap operations: `temp = a[i]; a[i] = a[j]; a[j] = temp`
- Modulo operations: `% 256` or `& 0xFF`
- XOR in simple loop
- Smaller code footprint than block ciphers (no large constants)

### Public Key Cryptography Recognition

**Conceptual characteristics:**
- **Large integer arithmetic**: Numbers hundreds or thousands of bits
- **Modular exponentiation**: `result = base^exponent mod modulus`
- **Performance**: Very slow compared to symmetric crypto (indicates usage for key exchange, not bulk data)

**What to look for:**
```
Multi-precision arithmetic:
  Arrays representing big integers
  Functions for add/subtract/multiply on arrays

Square-and-multiply pattern:
  Loop over exponent bits
  Square operation each iteration
  Conditional multiply based on bit value

Modulo operations on large numbers:
  Division with large divisors
  Barrett reduction or Montgomery multiplication
```

**Telltale signs:**
- Very large buffers (128, 256, 512 bytes+)
- Bit-by-bit exponent processing
- Characteristic magic constants (e.g., 0x10001 = 65537 for RSA)
- Slow execution (thousands of operations per byte)

### Hash Function Recognition

**Conceptual characteristics:**
- **Compression function**: Transform fixed-size input to fixed-size output
- **Block processing**: Process data in chunks (512 bits typical)
- **State accumulation**: Running state updated with each block
- **Padding**: Add bytes to make input multiple of block size
- **One-way**: Lots of mixing, no reversibility

**What to look for:**
```
Initialization:
  Fixed magic constants
  MD5: 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476
  SHA-1: 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, 0xc3d2e1f0
  SHA-256: 8 different constants

Round function:
  Fixed iteration count (64, 80 rounds)
  Lots of bitwise operations (rotations, XOR, AND, OR)
  State mixing (each output bit depends on many input bits)

Padding logic:
  Append 0x80 byte
  Length encoding at end
```

**Telltale signs:**
- Characteristic initialization constants
- Fixed 64 or 80 round loops
- Bitwise rotation: `(x << n) | (x >> (32-n))`
- Message schedule computation (W array expansion)

### Simple XOR Obfuscation

**Conceptual characteristics:**
- **Trivial operation**: `output = input XOR key`
- **Symmetric**: Encryption and decryption identical
- **Weak security**: Easy to break, used for obfuscation not protection

**What to look for:**
```
Single-byte key:
  for (i = 0; i < len; i++)
    data[i] ^= 0x42;

Multi-byte key:
  for (i = 0; i < len; i++)
    data[i] ^= key[i % keylen];

Rolling key:
  key = seed;
  for (i = 0; i < len; i++) {
    data[i] ^= key;
    key = update_key(key);  // LCG or similar
  }
```

**Telltale signs:**
- Very short functions (5-10 lines)
- XOR with constants or simple patterns
- Often applied to strings or config data
- Paired with static data arrays that need decoding

---

## Control Flow Patterns

### State Machine Recognition

**Conceptual characteristics:**
- **Explicit states**: Enumeration or integer representing current state
- **State transitions**: Switch/if-else on state variable
- **Event-driven**: External input triggers transitions

**What to look for:**
```
State variable:
  int state = INITIAL_STATE;

Dispatch loop:
  while (running) {
    switch (state) {
      case STATE_A: /* handle A, maybe transition to B */
      case STATE_B: /* handle B, maybe transition to C */
      ...
    }
  }

State tables (more advanced):
  next_state = transition_table[current_state][input];
  action = action_table[current_state][input];
```

**Telltale signs:**
- Large switch statements with many cases
- State variable repeatedly assigned new values
- Enumeration or #define constants for states
- Patterns like IDLE, CONNECTING, CONNECTED, DISCONNECTED

**Common uses:**
- Network protocol handling
- Parser implementation
- UI event handling
- Command processing

### Command Dispatcher Recognition

**Conceptual characteristics:**
- **Command codes**: Numeric identifiers for operations
- **Handler lookup**: Map command ID to handler function
- **Extensibility**: Easy to add new commands

**What to look for:**
```
Command dispatch table:
  switch (command_id) {
    case CMD_EXECUTE:  handle_execute(params); break;
    case CMD_UPLOAD:   handle_upload(params); break;
    case CMD_DOWNLOAD: handle_download(params); break;
    ...
  }

Function pointer table:
  handler = command_table[command_id];
  handler(params);

String-based dispatch:
  if (strcmp(cmd, "exec") == 0) handle_execute();
  else if (strcmp(cmd, "upload") == 0) handle_upload();
```

**Telltale signs:**
- Large switch on integer or string
- Array of function pointers
- Command ID constants or strings
- Common command names: exec, upload, download, shell, sleep, etc.

**Common uses:**
- Remote access tools (RAT)
- Backdoor command handling
- Plugin systems
- IPC/RPC mechanisms

### Callback Pattern Recognition

**Conceptual characteristics:**
- **Inversion of control**: Library calls your code, not you calling library
- **Function pointers**: Pass address of your function to framework
- **Asynchronous**: Often used for async operations

**What to look for:**
```
Callback registration:
  library_set_callback(MY_EVENT, my_handler_function);

Callback function signature:
  void my_callback(event_type, data, user_context)

Common callback contexts:
  - Network data received
  - Timer expired
  - File I/O complete
  - User interaction
```

**Telltale signs:**
- Function pointers passed as parameters
- Functions with generic names like "handler", "callback", "on_event"
- Often have opaque pointer parameter (void* user_data)

### Loop Patterns

**Simple iteration:**
```
for (i = 0; i < count; i++)
  - Linear processing
  - Transform/encrypt each element
```

**Nested loops (2D processing):**
```
for (i = 0; i < height; i++)
  for (j = 0; j < width; j++)
    - Image processing
    - Matrix operations
    - Block cipher on 2D state
```

**Do-while patterns:**
```
do {
  read_chunk();
  process_chunk();
} while (more_data);
  - File/network processing
  - Guaranteed first execution
```

**While-true with break:**
```
while (1) {
  if (condition) break;
  process();
}
  - Server loops
  - State machines
  - Event loops
```

---

## Data Structure Patterns

### Buffer Management

**Fixed-size buffers:**
```
char buffer[1024];
read(fd, buffer, sizeof(buffer));
  - Stack-allocated
  - Size known at compile time
  - Often seen with unsafe functions (strcpy, sprintf)
```

**Dynamic buffers:**
```
size = calculate_size();
buffer = malloc(size);
  - Heap-allocated
  - Size determined at runtime
  - Look for malloc/free pairs or memory leaks
```

**Ring buffers (circular):**
```
write_pos = (write_pos + 1) % BUFFER_SIZE;
read_pos = (read_pos + 1) % BUFFER_SIZE;
  - Fixed-size, reusable
  - Modulo arithmetic for wrap-around
  - Used in queues, streaming
```

### Linked Structures

**Linked list:**
```
struct node {
  data_type data;
  struct node* next;  // singly-linked
  struct node* prev;  // doubly-linked (optional)
};
```

**Recognition:**
- Pointer fields in structures
- Traversal loops: `while (node != NULL) { node = node->next; }`
- Insertion/deletion operations

**Tree structures:**
```
struct tree_node {
  data_type data;
  struct tree_node* left;
  struct tree_node* right;
};
```

**Recognition:**
- Two pointer fields (left/right)
- Recursive functions
- Comparison operations for ordering

### String Handling Patterns

**Length-prefixed strings:**
```
struct {
  uint32_t length;
  char data[];
}
```

**Null-terminated strings:**
```
while (*str != '\0') str++;  // strlen pattern
```

**Wide strings:**
```
wchar_t* wstr;
uint16_t* utf16_str;
  - 2 or 4 bytes per character
  - String operations work on larger units
```

**Detection:**
- Character-by-character loops
- Null byte checks
- String manipulation function calls
- UTF-8/UTF-16 encoding/decoding

---

## Network Protocol Patterns

### Protocol Structure Recognition

**Request-Response:**
```
send_request(command, params);
response = receive_response();
process_response(response);
```

**Characteristics:**
- Client initiates
- Server responds
- Blocking or polling wait for response
- Examples: HTTP, DNS, RPC

**Continuous Stream:**
```
while (connected) {
  data = receive_data();
  process_chunk(data);
}
```

**Characteristics:**
- Persistent connection
- Data flows continuously
- No strict request-response pairing
- Examples: video streaming, log shipping

**Message-Oriented:**
```
while (true) {
  message = receive_message();  // reads length, then payload
  dispatch_message(message);
}
```

**Characteristics:**
- Discrete messages with boundaries
- Length prefix or delimiter
- Message type/ID field
- Examples: custom C2 protocols, message queues

### Serialization Patterns

**Binary serialization:**
```
Write primitives in sequence:
  write_uint32(length);
  write_bytes(data, length);
  write_uint8(flags);
```

**Characteristics:**
- Dense, efficient
- Fixed byte order (endianness)
- Magic numbers for structure identification
- Version fields for compatibility

**Text-based serialization:**
```
JSON: {"key": "value", "num": 42}
XML: <root><item>value</item></root>
```

**Characteristics:**
- Human-readable
- Delimiter characters ({}, <>, quotes)
- String parsing and generation code
- Less efficient but more flexible

**Detection strategies:**
1. Look for sprintf/snprintf for text generation
2. Check for JSON/XML parsing libraries
3. Find memcpy sequences for binary packing
4. Identify byte-swapping (htonl/ntohl pattern)

### Connection Management

**Connection establishment pattern:**
```
Create socket
→ Connect to server
→ Send handshake/authentication
→ Receive acknowledgment
→ Enter main communication loop
```

**Connection pooling pattern:**
```
maintain pool of N connections
when request arrives:
  if free_connection available:
    use it
  else:
    create new connection (up to max)
after request:
  return connection to pool
```

**Reconnection pattern:**
```
max_retries = 5;
while (retries < max_retries) {
  if (connect_success) break;
  sleep(backoff_time);
  backoff_time *= 2;  // exponential backoff
  retries++;
}
```

**Telltale signs:**
- Retry loops with delays
- Connection state checking
- Timeout handling
- Fallback server lists

---

## Behavioral Patterns

### Encryption + Network (Data Exfiltration)

**Pattern sequence:**
```
1. Collect files/data
2. Compress (optional)
3. Encrypt
4. Send over network
5. Clean up local copies
```

**What to look for:**
- File enumeration → encryption function → network send
- Temporary file creation → processing → deletion
- Cross-reference encryption function to network functions

### Decrypt + Execute (Payload Loading)

**Pattern sequence:**
```
1. Read encrypted payload from resource/file/network
2. Decrypt in memory
3. Execute (direct call, injection, or create process)
```

**What to look for:**
- Buffer allocated with execute permissions
- Decryption function → function pointer cast → indirect call
- XOR loop → memory copy → execution transfer

### Time-Based Triggering

**Pattern:**
```
while (true) {
  current_time = get_time();
  if (current_time >= trigger_time) {
    execute_payload();
    break;
  }
  sleep(check_interval);
}
```

**What to look for:**
- Time/date API calls
- Comparison with specific dates
- Sleep/delay in loops
- Activation conditions based on temporal logic

### Polymorphic Behavior

**Pattern:**
```
code_variant = select_variant(seed);
decrypt_code(code_variant);
execute_decrypted_code();
re-encrypt_code(new_seed);
```

**What to look for:**
- Self-modifying code
- Multiple code variants
- Decryption before execution
- Encryption after execution
- Memory protection changes (read/write/execute toggling)

---

## Code Quality Indicators

### Hand-Written vs. Generated Code

**Hand-written characteristics:**
- Inconsistent formatting
- Comments (if not stripped)
- Meaningful variable names (if symbols present)
- Idiomatic patterns for the language
- Error handling mixed with logic

**Generated/compiled characteristics:**
- Very consistent structure
- Compiler optimization patterns
- Systematic variable naming (if stripped)
- Uniform error handling
- Recognizable library code patterns

### Obfuscated Code Indicators

**Deliberately obscured:**
- Meaningless variable/function names
- Unnecessary complexity
- Dead code branches
- Opaque predicates (always true/false conditions)
- Indirect calls through pointer manipulations
- String obfuscation

**Compiler optimizations (benign):**
- Loop unrolling
- Function inlining
- Constant folding
- Dead code elimination
- Register allocation patterns

**Distinction:** Obfuscation creates complexity without performance benefit; optimization creates complexity for performance.

### Library Code vs. Custom Code

**Library code:**
- Standard algorithms (qsort, hash functions)
- Consistent with open-source implementations
- Well-structured, parameterized
- Minimal dependencies on surrounding code

**Custom code:**
- Unique patterns
- Integrated with application logic
- Application-specific data structures
- More likely to have bugs/vulnerabilities

**Investigation priority:** Focus on custom code - that's where unique behavior lives.

---

## Using This Reference

### Pattern Matching Workflow

1. **Observe structure** - What loops, branches, data structures appear?
2. **Compare to patterns** - Does this match known algorithmic patterns?
3. **Verify with evidence** - Check for characteristic constants, operations, structure
4. **Document pattern** - Bookmark with pattern name for reference
5. **Improve code** - Rename variables/functions to reflect pattern (e.g., `aes_encrypt`, `rc4_keystream`)

### Example Investigation

```
Observation: Function with nested loops, array lookups, XOR operations

Compare: Matches "Block Cipher" or "Stream Cipher" patterns

Verify:
  - Check for large constant array (S-box?)
  - Count outer loop iterations (rounds?)
  - Look for key schedule function

Find: 256-byte array starting 63 7c 77 7b...
      14 iterations in outer loop

Conclusion: AES-256 (14 rounds, standard S-box)

Improve:
  rename-variables: state→aes_state, table→aes_sbox
  set-function-prototype: void aes_encrypt(uint8_t* data, uint8_t* key)
  set-comment: "AES-256 encryption using standard S-box"
```

### Pattern Combination

Real-world code combines multiple patterns:

**Example: Malware C2 Communication**
```
[Command Dispatcher] receives command from network
  ↓
[State Machine] tracks connection state
  ↓
[Callback Functions] handle specific commands
  ↓
[Buffer Management] manages received data
  ↓
[Encryption] protects command payloads
```

When you identify one pattern, look for related patterns in:
- Functions that call this one (higher-level orchestration)
- Functions called by this one (lower-level primitives)
- Cross-references to shared data structures

### Progressive Understanding

Don't need to identify every pattern perfectly:

**First pass:** "This looks like crypto (lots of XOR and loops)"
**Second pass:** "Probably a stream cipher (simple state, no large tables)"
**Third pass:** "Matches RC4 pattern (256-byte init, swap operations)"
**Fourth pass:** "Confirmed RC4 (found KSA and PRGA pattern)"

Each pass refines understanding and guides further investigation.