2025-11-29 18:17:15 +08:00

Reverse Engineering Patterns Reference

This document contains higher-level patterns and concepts to recognize during deep analysis. Focus on algorithmic patterns, behavioral patterns, and code structure rather than platform-specific implementation details.

Cryptographic Algorithm Patterns

Block Cipher Recognition

Conceptual characteristics:

  • Substitution-Permutation Network (SPN): Repeated rounds of substitution (S-boxes) and permutation (bit shuffling)
  • Feistel Network: Split the block in half, run one half through a round function keyed by the round key, XOR the result into the other half, swap halves, repeat
  • Fixed block size: Typically 64 bits (DES, Blowfish) or 128 bits (AES)
  • Multiple rounds: 8-16+ iterations of core transformation
  • Key schedule: Derive round keys from master key

What to look for in decompiled code:

Nested loops:
  Outer: rounds (8, 10, 12, 14, 16, 32 iterations)
  Inner: processing blocks of fixed size

Array lookups (S-boxes):
  result = table[input_byte]
  Often 256-element arrays (0x100 size)

Bit manipulation:
  XOR, rotation (>> combined with <<), permutation

State updates:
  Array or struct representing current cipher state
  Transformed each round

Telltale signs:

  • Large constant arrays (256+ bytes) that look like random data
  • Fixed iteration counts (not data-dependent)
  • Heavy use of XOR operations
  • Byte-level array indexing: array[data[i]]

Investigation strategy:

  1. read-memory at constant arrays - compare to known S-boxes
  2. Count loop iterations - indicates cipher type/key size
  3. search-strings-regex for algorithm names
  4. Check cross-references to constants - find cipher initialization
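
Step 1 of the strategy above can be sketched in C. This is a minimal sketch, assuming the suspect table has already been dumped into a byte buffer; the function names are illustrative and not part of any tool. It checks two things: whether the bytes start with the first row of the standard AES S-box, and whether the table is a permutation of 0..255 (a property of the AES S-box, though not of every cipher's S-boxes).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* First 16 bytes of the standard AES S-box. */
static const uint8_t AES_SBOX_PREFIX[16] = {
    0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5,
    0x30, 0x01, 0x67, 0x2b, 0xfe, 0xd7, 0xab, 0x76
};

/* Returns 1 if the table has 256 entries and each byte value occurs once. */
int is_byte_permutation(const uint8_t *table, size_t len)
{
    uint8_t seen[256] = {0};
    if (len != 256)
        return 0;
    for (size_t i = 0; i < len; i++) {
        if (seen[table[i]])
            return 0;
        seen[table[i]] = 1;
    }
    return 1;
}

/* Returns 1 if the dumped bytes begin with the AES S-box prefix. */
int starts_like_aes_sbox(const uint8_t *table, size_t len)
{
    return len >= sizeof(AES_SBOX_PREFIX) &&
           memcmp(table, AES_SBOX_PREFIX, sizeof(AES_SBOX_PREFIX)) == 0;
}
```

A prefix match is strong evidence for AES. A permutation that does not match any known prefix may point to a custom or modified cipher and deserves a closer look.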

Stream Cipher Recognition

Conceptual characteristics:

  • Keystream generation: Produce pseudo-random byte stream from key
  • Simple combination: XOR plaintext with keystream
  • State-based: Internal state evolves as keystream is produced
  • No fixed blocks: Can encrypt arbitrary lengths

What to look for:

State initialization:
  Array or struct setup from key
  Often 256-byte arrays

Keystream generation loop:
  State updates via modular arithmetic
  Index computations: i = (i + 1) % N
  Swap operations common

XOR combination:
  output[i] = input[i] ^ keystream[i]
  Simple, obvious pattern

Telltale signs:

  • Array swap operations: temp = a[i]; a[i] = a[j]; a[j] = temp
  • Modulo operations: % 256 or & 0xFF
  • XOR in simple loop
  • Smaller code footprint than block ciphers (no large constants)
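
For comparison against decompiler output, here is a compact C sketch of RC4, the stream cipher this reference names later. It shows all the signs listed above: a 256-byte state, swap operations, byte-wide index arithmetic (the uint8_t casts play the role of & 0xFF), and a final XOR. The struct and function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t s[256];  /* permutation state */
    uint8_t i, j;    /* keystream indices */
} rc4_state;

/* Key-scheduling: fill the state with 0..255, then shuffle it with the key.
   keylen must be at least 1. */
void rc4_init(rc4_state *st, const uint8_t *key, size_t keylen)
{
    uint8_t j = 0;
    for (int k = 0; k < 256; k++)
        st->s[k] = (uint8_t)k;
    for (int k = 0; k < 256; k++) {
        j = (uint8_t)(j + st->s[k] + key[(size_t)k % keylen]);
        uint8_t tmp = st->s[k];
        st->s[k] = st->s[j];
        st->s[j] = tmp;
    }
    st->i = 0;
    st->j = 0;
}

/* Keystream generation combined with XOR; the same call encrypts and decrypts. */
void rc4_crypt(rc4_state *st, uint8_t *data, size_t len)
{
    for (size_t n = 0; n < len; n++) {
        st->i = (uint8_t)(st->i + 1);
        st->j = (uint8_t)(st->j + st->s[st->i]);
        uint8_t tmp = st->s[st->i];
        st->s[st->i] = st->s[st->j];
        st->s[st->j] = tmp;
        data[n] ^= st->s[(uint8_t)(st->s[st->i] + st->s[st->j])];
    }
}
```

In a binary, the two loops often appear as separate functions: one called once with the key, one called per buffer.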

Public Key Cryptography Recognition

Conceptual characteristics:

  • Large integer arithmetic: Numbers hundreds or thousands of bits
  • Modular exponentiation: result = base^exponent mod modulus
  • Performance: Orders of magnitude slower than symmetric crypto, so it is typically used for key exchange or signatures rather than bulk data

What to look for:

Multi-precision arithmetic:
  Arrays representing big integers
  Functions for add/subtract/multiply on arrays

Square-and-multiply pattern:
  Loop over exponent bits
  Square operation each iteration
  Conditional multiply based on bit value

Modulo operations on large numbers:
  Division with large divisors
  Barrett reduction or Montgomery multiplication

Telltale signs:

  • Very large buffers (128, 256, 512 bytes+)
  • Bit-by-bit exponent processing
  • Characteristic magic constants (e.g., 0x10001 = 65537 for RSA)
  • Slow execution (thousands of operations per byte)
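
The square-and-multiply pattern is easier to spot once you have seen it in miniature. The sketch below uses 64-bit integers instead of multi-precision arrays and assumes the modulus is nonzero and below 2^32, so intermediate products cannot overflow. Real implementations run the same loop over big-integer buffers. The function name is illustrative.

```c
#include <stdint.h>

/* Computes base^exp mod mod by scanning exponent bits from least significant.
   Assumes 0 < mod < 2^32 so that every product fits in 64 bits. */
uint64_t modpow(uint64_t base, uint64_t exp, uint64_t mod)
{
    uint64_t result = 1 % mod;
    base %= mod;
    while (exp > 0) {
        if (exp & 1)                        /* conditional multiply on a set bit */
            result = (result * base) % mod;
        base = (base * base) % mod;         /* square every iteration */
        exp >>= 1;
    }
    return result;
}
```

In decompiled code the multiply and square steps are usually calls into big-number helpers, but the shift-test-branch skeleton around them looks the same.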

Hash Function Recognition

Conceptual characteristics:

  • Compression function: Combine the running state with one fixed-size message block to produce a new fixed-size state
  • Block processing: Process data in chunks (512 bits typical)
  • State accumulation: Running state updated with each block
  • Padding: Add bytes to make input multiple of block size
  • One-way: Lots of mixing, no reversibility

What to look for:

Initialization:
  Fixed magic constants
  MD5: 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476
  SHA-1: 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, 0xc3d2e1f0
  SHA-256: eight 32-bit words beginning 0x6a09e667, 0xbb67ae85

Round function:
  Fixed iteration count (64 for MD5 and SHA-256, 80 for SHA-1)
  Lots of bitwise operations (rotations, XOR, AND, OR)
  State mixing (each output bit depends on many input bits)

Padding logic:
  Append 0x80 byte
  Length encoding at end

Telltale signs:

  • Characteristic initialization constants
  • Fixed 64 or 80 round loops
  • Bitwise rotation: (x << n) | (x >> (32-n))
  • Message schedule computation (W array expansion)
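
Matching initialization constants can be automated. The sketch below assumes the candidate constants have been read out as 32-bit words in the order the code stores them; the function names are illustrative. Treat the result as a hint rather than proof: MD4 uses the same four words as MD5, and RIPEMD-160 uses the same five as SHA-1.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* MD5 uses the first four words; SHA-1 uses all five. */
static const uint32_t MD5_SHA1_IV[5] = {
    0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, 0xc3d2e1f0
};

static const uint32_t SHA256_IV[8] = {
    0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
    0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19
};

/* Returns a candidate algorithm name, or NULL if nothing matches. */
const char *identify_hash_iv(const uint32_t *words, size_t count)
{
    if (count >= 8 && memcmp(words, SHA256_IV, sizeof(SHA256_IV)) == 0)
        return "SHA-256";
    if (count >= 5 && memcmp(words, MD5_SHA1_IV, 5 * sizeof(uint32_t)) == 0)
        return "SHA-1";
    if (count >= 4 && memcmp(words, MD5_SHA1_IV, 4 * sizeof(uint32_t)) == 0)
        return "MD5";
    return NULL;
}

/* 32-bit left rotation. Masking n avoids the undefined shift by 32
   that the plain (x << n) | (x >> (32 - n)) form hits when n is 0. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}
```

Compilers usually collapse the rotation idiom into a single rotate instruction, so decompilers may show it either as the shift pair or as an intrinsic.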

Simple XOR Obfuscation

Conceptual characteristics:

  • Trivial operation: output = input XOR key
  • Symmetric: Encryption and decryption identical
  • Weak security: Easy to break, used for obfuscation not protection

What to look for:

Single-byte key:
  for (i = 0; i < len; i++)
    data[i] ^= 0x42;

Multi-byte key:
  for (i = 0; i < len; i++)
    data[i] ^= key[i % keylen];

Rolling key:
  key = seed;
  for (i = 0; i < len; i++) {
    data[i] ^= key;
    key = update_key(key);  // LCG or similar
  }

Telltale signs:

  • Very short functions (5-10 lines)
  • XOR with constants or simple patterns
  • Often applied to strings or config data
  • Paired with static data arrays that need decoding
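
Single-byte XOR is weak enough that the key can be recovered from a guessed fragment of plaintext. The sketch below assumes you expect a known substring (a "crib") such as "http" in the decoded data. It tries every nonzero key and reports the first one that makes the crib appear. The function name and the crib approach are illustrative choices, not something the patterns above prescribe.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the single-byte XOR key (1..255) under which `crib` appears
   somewhere in the decoded data, or -1 if no key produces it. */
int find_xor_key(const uint8_t *data, size_t len,
                 const char *crib, size_t crib_len)
{
    if (crib_len == 0 || crib_len > len)
        return -1;
    for (int key = 1; key < 256; key++) {
        for (size_t off = 0; off + crib_len <= len; off++) {
            size_t j = 0;
            while (j < crib_len &&
                   (uint8_t)(data[off + j] ^ key) == (uint8_t)crib[j])
                j++;
            if (j == crib_len)
                return key;
        }
    }
    return -1;
}
```

Short cribs can produce false positives on large inputs, so confirm the candidate key by decoding the whole blob and checking that the result is plausible.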

Control Flow Patterns

State Machine Recognition

Conceptual characteristics:

  • Explicit states: Enumeration or integer representing current state
  • State transitions: Switch/if-else on state variable
  • Event-driven: External input triggers transitions

What to look for:

State variable:
  int state = INITIAL_STATE;

Dispatch loop:
  while (running) {
    switch (state) {
      case STATE_A: /* handle A, maybe set state = STATE_B */ break;
      case STATE_B: /* handle B, maybe set state = STATE_C */ break;
      ...
    }
  }

State tables (more advanced):
  next_state = transition_table[current_state][input];
  action = action_table[current_state][input];

Telltale signs:

  • Large switch statements with many cases
  • State variable repeatedly assigned new values
  • Enumeration or #define constants for states
  • Patterns like IDLE, CONNECTING, CONNECTED, DISCONNECTED

Common uses:

  • Network protocol handling
  • Parser implementation
  • UI event handling
  • Command processing
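
The table-driven variant is harder to read in a decompiler because the logic lives in data rather than in branches. The sketch below shows what such a table encodes, using the IDLE/CONNECTING/CONNECTED/DISCONNECTED states mentioned above. The event names and the specific transitions are hypothetical.

```c
#include <stddef.h>

typedef enum { ST_IDLE, ST_CONNECTING, ST_CONNECTED, ST_DISCONNECTED, ST_COUNT } conn_state;
typedef enum { EV_START, EV_ACK, EV_ERROR, EV_CLOSE, EV_COUNT } conn_event;

/* Rows are current states, columns are events (START, ACK, ERROR, CLOSE). */
static const conn_state TRANSITIONS[ST_COUNT][EV_COUNT] = {
    [ST_IDLE]         = { ST_CONNECTING, ST_IDLE,         ST_IDLE,         ST_IDLE },
    [ST_CONNECTING]   = { ST_CONNECTING, ST_CONNECTED,    ST_DISCONNECTED, ST_DISCONNECTED },
    [ST_CONNECTED]    = { ST_CONNECTED,  ST_CONNECTED,    ST_DISCONNECTED, ST_DISCONNECTED },
    [ST_DISCONNECTED] = { ST_CONNECTING, ST_DISCONNECTED, ST_DISCONNECTED, ST_DISCONNECTED },
};

/* Looks up the next state; out-of-range inputs leave the state unchanged. */
conn_state next_state(conn_state current, conn_event ev)
{
    if ((unsigned)current >= (unsigned)ST_COUNT || (unsigned)ev >= (unsigned)EV_COUNT)
        return current;
    return TRANSITIONS[current][ev];
}

/* Feeds a sequence of events through the table and returns the final state. */
conn_state run_events(conn_state start, const conn_event *events, size_t count)
{
    conn_state s = start;
    for (size_t i = 0; i < count; i++)
        s = next_state(s, events[i]);
    return s;
}
```

In a stripped binary the table shows up as a small constant array indexed by two values. Dumping it and labeling rows and columns often recovers the whole protocol flow.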

Command Dispatcher Recognition

Conceptual characteristics:

  • Command codes: Numeric identifiers for operations
  • Handler lookup: Map command ID to handler function
  • Extensibility: Easy to add new commands

What to look for:

Command dispatch table:
  switch (command_id) {
    case CMD_EXECUTE:  handle_execute(params); break;
    case CMD_UPLOAD:   handle_upload(params); break;
    case CMD_DOWNLOAD: handle_download(params); break;
    ...
  }

Function pointer table:
  handler = command_table[command_id];
  handler(params);

String-based dispatch:
  if (strcmp(cmd, "exec") == 0) handle_execute(params);
  else if (strcmp(cmd, "upload") == 0) handle_upload(params);

Telltale signs:

  • Large switch on integer or string
  • Array of function pointers
  • Command ID constants or strings
  • Common command names: exec, upload, download, shell, sleep, etc.

Common uses:

  • Remote access tools (RAT)
  • Backdoor command handling
  • Plugin systems
  • IPC/RPC mechanisms
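
A function pointer table with a bounds check might look like the following sketch. The handlers are stubs that return a marker value, and the command numbering is hypothetical. What to recognize is the shape: an index comparison followed by an indirect call through an array.

```c
#include <stddef.h>

typedef int (*cmd_handler)(const char *params);

/* Stub handlers; real ones would act on params. */
static int handle_execute(const char *params)  { (void)params; return 1; }
static int handle_upload(const char *params)   { (void)params; return 2; }
static int handle_download(const char *params) { (void)params; return 3; }

enum { CMD_EXECUTE = 0, CMD_UPLOAD = 1, CMD_DOWNLOAD = 2 };

static const cmd_handler COMMAND_TABLE[] = {
    handle_execute, handle_upload, handle_download
};

/* Returns the handler's result, or -1 for an unknown command ID. */
int dispatch(unsigned command_id, const char *params)
{
    size_t count = sizeof(COMMAND_TABLE) / sizeof(COMMAND_TABLE[0]);
    if (command_id >= count)
        return -1;
    return COMMAND_TABLE[command_id](params);
}
```

A missing bounds check before the indirect call is worth flagging: an attacker-controlled command ID then selects an arbitrary pointer.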

Callback Pattern Recognition

Conceptual characteristics:

  • Inversion of control: The library calls your code rather than your code calling the library
  • Function pointers: Pass address of your function to framework
  • Asynchronous: Often used for async operations

What to look for:

Callback registration:
  library_set_callback(MY_EVENT, my_handler_function);

Callback function signature:
  void my_callback(event_type, data, user_context)

Common callback contexts:
  - Network data received
  - Timer expired
  - File I/O complete
  - User interaction

Telltale signs:

  • Function pointers passed as parameters
  • Functions with generic names like "handler", "callback", "on_event"
  • Often have opaque pointer parameter (void* user_data)
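
The registration and invocation sides of a callback can be sketched as below. The event_source type and its functions are hypothetical stand-ins for a library. The detail to notice is that the opaque user_context pointer is stored next to the function pointer and handed back unchanged on every call.

```c
#include <stddef.h>

typedef void (*event_callback)(int event_type, const void *data, void *user_context);

/* Hypothetical library object holding one registered callback. */
typedef struct {
    event_callback cb;
    void *user_context;
} event_source;

/* Registration: store the function pointer and its opaque context. */
void source_set_callback(event_source *src, event_callback cb, void *user_context)
{
    src->cb = cb;
    src->user_context = user_context;
}

/* Invocation: the library side calls back into user code, if registered. */
void source_emit(event_source *src, int event_type, const void *data)
{
    if (src->cb != NULL)
        src->cb(event_type, data, src->user_context);
}

/* Example handler: treats user_context as a pointer to an event counter. */
void count_events(int event_type, const void *data, void *user_context)
{
    int *counter = user_context;
    (void)event_type;
    (void)data;
    (*counter)++;
}
```

When tracing a callback in a binary, follow the stored function pointer back to the registration call to find which handler actually runs.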

Loop Patterns

Simple iteration:

for (i = 0; i < count; i++)
  - Linear processing
  - Transform/encrypt each element

Nested loops (2D processing):

for (i = 0; i < height; i++)
  for (j = 0; j < width; j++)
    - Image processing
    - Matrix operations
    - Block cipher on 2D state

Do-while patterns:

do {
  read_chunk();
  process_chunk();
} while (more_data);
  - File/network processing
  - Guaranteed first execution

While-true with break:

while (1) {
  if (condition) break;
  process();
}
  - Server loops
  - State machines
  - Event loops

Data Structure Patterns

Buffer Management

Fixed-size buffers:

char buffer[1024];
read(fd, buffer, sizeof(buffer));
  - Stack-allocated
  - Size known at compile time
  - Often seen with unsafe functions (strcpy, sprintf)

Dynamic buffers:

size = calculate_size();
buffer = malloc(size);
  - Heap-allocated
  - Size determined at runtime
  - Look for malloc/free pairs or memory leaks

Ring buffers (circular):

write_pos = (write_pos + 1) % BUFFER_SIZE;
read_pos = (read_pos + 1) % BUFFER_SIZE;
  - Fixed-size, reusable
  - Modulo arithmetic for wrap-around
  - Used in queues, streaming
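
A complete ring buffer adds full and empty checks around the two modulo updates shown above. This is a minimal single-threaded sketch with an assumed capacity of 8 bytes and an explicit element count; other implementations tell full from empty by leaving one slot unused.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8

typedef struct {
    uint8_t data[RING_SIZE];
    size_t read_pos;
    size_t write_pos;
    size_t count;      /* number of bytes currently stored */
} ring_buffer;

/* Appends one byte; returns 0 if the buffer is full. */
int ring_put(ring_buffer *rb, uint8_t byte)
{
    if (rb->count == RING_SIZE)
        return 0;
    rb->data[rb->write_pos] = byte;
    rb->write_pos = (rb->write_pos + 1) % RING_SIZE;
    rb->count++;
    return 1;
}

/* Removes the oldest byte into *out; returns 0 if the buffer is empty. */
int ring_get(ring_buffer *rb, uint8_t *out)
{
    if (rb->count == 0)
        return 0;
    *out = rb->data[rb->read_pos];
    rb->read_pos = (rb->read_pos + 1) % RING_SIZE;
    rb->count--;
    return 1;
}
```

When the size is a power of two, compilers typically replace the modulo with a bit mask, so & (BUFFER_SIZE - 1) in decompiled output means the same thing.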

Linked Structures

Linked list:

struct node {
  data_type data;
  struct node* next;  // singly-linked
  struct node* prev;  // doubly-linked (optional)
};

Recognition:

  • Pointer fields in structures
  • Traversal loops: while (node != NULL) { node = node->next; }
  • Insertion/deletion operations

Tree structures:

struct tree_node {
  data_type data;
  struct tree_node* left;
  struct tree_node* right;
};

Recognition:

  • Two pointer fields (left/right)
  • Recursive functions
  • Comparison operations for ordering
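
The three recognition cues for trees appear together in a binary search tree. The sketch below uses an int payload in place of data_type and leaves node allocation to the caller; both are simplifying assumptions.

```c
#include <stddef.h>

struct tree_node {
    int data;
    struct tree_node *left;
    struct tree_node *right;
};

/* Recursive insert: smaller keys go left, everything else goes right.
   Returns the (possibly new) root of the subtree. */
struct tree_node *tree_insert(struct tree_node *root, struct tree_node *node)
{
    if (root == NULL)
        return node;
    if (node->data < root->data)
        root->left = tree_insert(root->left, node);
    else
        root->right = tree_insert(root->right, node);
    return root;
}

/* Iterative lookup: one comparison per level, then follow one pointer. */
int tree_contains(const struct tree_node *root, int key)
{
    while (root != NULL) {
        if (key == root->data)
            return 1;
        root = (key < root->data) ? root->left : root->right;
    }
    return 0;
}
```

In decompiled code the field names are gone, so the clue is two pointer loads at fixed offsets from the same base, with a comparison choosing between them.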

String Handling Patterns

Length-prefixed strings:

struct {
  uint32_t length;
  char data[];
};

Null-terminated strings:

while (*str != '\0') str++;  // strlen pattern

Wide strings:

wchar_t* wstr;
uint16_t* utf16_str;
  - 2 or 4 bytes per character
  - String operations work on larger units

Detection:

  • Character-by-character loops
  • Null byte checks
  • String manipulation function calls
  • UTF-8/UTF-16 encoding/decoding

Network Protocol Patterns

Protocol Structure Recognition

Request-Response:

send_request(command, params);
response = receive_response();
process_response(response);

Characteristics:

  • Client initiates
  • Server responds
  • Blocking or polling wait for response
  • Examples: HTTP, DNS, RPC

Continuous Stream:

while (connected) {
  data = receive_data();
  process_chunk(data);
}

Characteristics:

  • Persistent connection
  • Data flows continuously
  • No strict request-response pairing
  • Examples: video streaming, log shipping

Message-Oriented:

while (true) {
  message = receive_message();  // reads length, then payload
  dispatch_message(message);
}

Characteristics:

  • Discrete messages with boundaries
  • Length prefix or delimiter
  • Message type/ID field
  • Examples: custom C2 protocols, message queues
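
Message framing code tends to follow one shape: read a header, validate the length, then hand off the payload. The sketch below assumes a hypothetical wire layout of a 1-byte type, a 2-byte big-endian length, then the payload. Real protocols differ in field sizes and order.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t type;
    uint16_t length;
    const uint8_t *payload;  /* points into the input buffer */
} message;

/* Parses one frame from buf. Returns the number of bytes consumed,
   or 0 if the buffer does not yet hold a complete frame. */
size_t parse_message(const uint8_t *buf, size_t len, message *out)
{
    if (len < 3)
        return 0;
    uint16_t plen = (uint16_t)((buf[1] << 8) | buf[2]);
    if (len - 3 < plen)          /* bounds check before touching the payload */
        return 0;
    out->type = buf[0];
    out->length = plen;
    out->payload = buf + 3;
    return 3 + (size_t)plen;
}

/* Counts the complete frames at the start of a byte stream. */
size_t count_messages(const uint8_t *buf, size_t len)
{
    size_t count = 0, used;
    message m;
    while ((used = parse_message(buf, len, &m)) != 0) {
        buf += used;
        len -= used;
        count++;
    }
    return count;
}
```

When reversing a real parser, check whether the length comparison exists at all. A missing or signed comparison here is a classic source of memory corruption.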

Serialization Patterns

Binary serialization:

Write primitives in sequence:
  write_uint32(length);
  write_bytes(data, length);
  write_uint8(flags);

Characteristics:

  • Dense, efficient
  • Fixed byte order (endianness)
  • Magic numbers for structure identification
  • Version fields for compatibility

Text-based serialization:

JSON: {"key": "value", "num": 42}
XML: <root><item>value</item></root>

Characteristics:

  • Human-readable
  • Delimiter characters ({}, <>, quotes)
  • String parsing and generation code
  • Less efficient but more flexible

Detection strategies:

  1. Look for sprintf/snprintf for text generation
  2. Check for JSON/XML parsing libraries
  3. Find memcpy sequences for binary packing
  4. Identify byte-swapping (htonl/ntohl pattern)
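
The binary packing sequence from the start of this section (length, data, flags) can be written out with explicit byte order. This sketch assumes big-endian encoding, the order htonl produces; the function names are illustrative. The shift-and-mask form is what you often see when the compiler inlines the byte swap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stores v as 4 bytes, most significant first. */
void write_uint32_be(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Reads 4 bytes, most significant first. */
uint32_t read_uint32_be(const uint8_t *in)
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* Packs [uint32 length][data][uint8 flags] into out.
   Returns bytes written, or 0 if the record does not fit in cap bytes. */
size_t pack_record(uint8_t *out, size_t cap,
                   const uint8_t *data, uint32_t length, uint8_t flags)
{
    if (cap < 5 || length > cap - 5)
        return 0;
    write_uint32_be(out, length);
    memcpy(out + 4, data, length);
    out[4 + (size_t)length] = flags;
    return 5 + (size_t)length;
}
```

Recovering the field order and widths from a packing function like this gives you the wire format directly, which in turn makes captured traffic readable.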

Connection Management

Connection establishment pattern:

Create socket
→ Connect to server
→ Send handshake/authentication
→ Receive acknowledgment
→ Enter main communication loop

Connection pooling pattern:

maintain pool of N connections
when request arrives:
  if free_connection available:
    use it
  else:
    create new connection (up to max)
after request:
  return connection to pool

Reconnection pattern:

max_retries = 5;
retries = 0;
while (retries < max_retries) {
  if (try_connect()) break;  // try_connect: placeholder for the connect attempt
  sleep(backoff_time);
  backoff_time *= 2;  // exponential backoff
  retries++;
}

Telltale signs:

  • Retry loops with delays
  • Connection state checking
  • Timeout handling
  • Fallback server lists
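
Retry delays in real code are usually capped so the doubling does not grow without bound. The sketch below computes the delay for a given attempt without sleeping, which makes the schedule easy to compare against constants found in a binary. The base and cap parameters and the function name are assumptions.

```c
#include <stdint.h>

/* Delay before retry number `attempt` (0-based): base_ms doubled once per
   prior attempt, never exceeding cap_ms. */
uint32_t backoff_delay_ms(unsigned attempt, uint32_t base_ms, uint32_t cap_ms)
{
    uint64_t delay = base_ms;
    /* Stop doubling once the cap is reached; this also bounds the loop. */
    for (unsigned i = 0; i < attempt && delay < cap_ms; i++)
        delay *= 2;
    return delay > cap_ms ? cap_ms : (uint32_t)delay;
}
```

With a 1000 ms base and a 30000 ms cap, the first five delays double from 1000 to 16000 and every later delay stays at 30000. Seeing such a sequence in sleep arguments is a quick way to confirm the pattern.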

Behavioral Patterns

Encryption + Network (Data Exfiltration)

Pattern sequence:

1. Collect files/data
2. Compress (optional)
3. Encrypt
4. Send over network
5. Clean up local copies

What to look for:

  • File enumeration → encryption function → network send
  • Temporary file creation → processing → deletion
  • Cross-reference encryption function to network functions

Decrypt + Execute (Payload Loading)

Pattern sequence:

1. Read encrypted payload from resource/file/network
2. Decrypt in memory
3. Execute (direct call, injection, or create process)

What to look for:

  • Buffer allocated with execute permissions
  • Decryption function → function pointer cast → indirect call
  • XOR loop → memory copy → execution transfer

Time-Based Triggering

Pattern:

while (true) {
  current_time = get_time();
  if (current_time >= trigger_time) {
    execute_payload();
    break;
  }
  sleep(check_interval);
}

What to look for:

  • Time/date API calls
  • Comparison with specific dates
  • Sleep/delay in loops
  • Activation conditions based on temporal logic

Polymorphic Behavior

Pattern:

code_variant = select_variant(seed);
decrypt_code(code_variant);
execute_decrypted_code();
reencrypt_code(new_seed);

What to look for:

  • Self-modifying code
  • Multiple code variants
  • Decryption before execution
  • Encryption after execution
  • Memory protection changes (read/write/execute toggling)

Code Quality Indicators

Hand-Written vs. Generated Code

Hand-written characteristics:

  • Inconsistent formatting
  • Comments (if not stripped)
  • Meaningful variable names (if symbols present)
  • Idiomatic patterns for the language
  • Error handling mixed with logic

Generated/compiled characteristics:

  • Very consistent structure
  • Compiler optimization patterns
  • Systematic variable naming (if stripped)
  • Uniform error handling
  • Recognizable library code patterns

Obfuscated Code Indicators

Deliberately obscured:

  • Meaningless variable/function names
  • Unnecessary complexity
  • Dead code branches
  • Opaque predicates (always true/false conditions)
  • Indirect calls through pointer manipulations
  • String obfuscation

Compiler optimizations (benign):

  • Loop unrolling
  • Function inlining
  • Constant folding
  • Dead code elimination
  • Register allocation patterns

Distinction: Obfuscation creates complexity without performance benefit; optimization creates complexity for performance.
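
An opaque predicate is easier to recognize after seeing one. The sketch below uses a common arithmetic identity: the product of two consecutive integers is always even. The condition is therefore always true, and the else path is dead code inserted only to mislead. The function names are illustrative.

```c
#include <stdint.h>

/* Always returns 1: x * (x + 1) is a product of consecutive integers, so
   its low bit is 0. Unsigned wraparound does not change the low bit. */
int opaque_always_true(uint32_t x)
{
    return ((x * (x + 1u)) & 1u) == 0;
}

/* The real logic is a + b; the subtraction branch can never execute. */
int guarded_add(uint32_t x, int a, int b)
{
    if (opaque_always_true(x))
        return a + b;
    return a - b;  /* dead branch guarded by the opaque predicate */
}
```

A branch whose condition depends only on arithmetic over an unrelated value, and never on program input, is a good candidate for this kind of check.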

Library Code vs. Custom Code

Library code:

  • Standard algorithms (qsort, hash functions)
  • Consistent with open-source implementations
  • Well-structured, parameterized
  • Minimal dependencies on surrounding code

Custom code:

  • Unique patterns
  • Integrated with application logic
  • Application-specific data structures
  • More likely to have bugs/vulnerabilities

Investigation priority: Focus on custom code - that's where unique behavior lives.


Using This Reference

Pattern Matching Workflow

  1. Observe structure - What loops, branches, data structures appear?
  2. Compare to patterns - Does this match known algorithmic patterns?
  3. Verify with evidence - Check for characteristic constants, operations, structure
  4. Document pattern - Bookmark with pattern name for reference
  5. Improve code - Rename variables/functions to reflect pattern (e.g., aes_encrypt, rc4_keystream)

Example Investigation

Observation: Function with nested loops, array lookups, XOR operations

Compare: Matches "Block Cipher" or "Stream Cipher" patterns

Verify:
  - Check for large constant array (S-box?)
  - Count outer loop iterations (rounds?)
  - Look for key schedule function

Find: 256-byte array starting 63 7c 77 7b...
      14 iterations in outer loop

Conclusion: AES-256 (14 rounds, standard S-box)

Improve:
  rename-variables: state→aes_state, table→aes_sbox
  set-function-prototype: void aes_encrypt(uint8_t* data, uint8_t* key)
  set-comment: "AES-256 encryption using standard S-box"
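
The step from round count to key size in this example can be captured in a small helper. AES uses 10, 12, or 14 rounds for 128-, 192-, and 256-bit keys. Many implementations handle the final round outside the loop, so the observed loop count may be one less; the second parameter models that assumption. The function name is illustrative.

```c
/* Maps an observed outer-loop iteration count to an AES key size in bits.
   Set final_round_peeled to 1 if the last round is handled outside the loop.
   Returns 0 if the count does not match any AES variant. */
int aes_key_bits_for_loop(int iterations, int final_round_peeled)
{
    int rounds = iterations + (final_round_peeled ? 1 : 0);
    switch (rounds) {
    case 10: return 128;
    case 12: return 192;
    case 14: return 256;
    default: return 0;
    }
}
```

A round count alone is suggestive rather than conclusive, so confirm with the key length passed to the key schedule where possible.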

Pattern Combination

Real-world code combines multiple patterns:

Example: Malware C2 Communication

[Command Dispatcher] receives command from network
  ↓
[State Machine] tracks connection state
  ↓
[Callback Functions] handle specific commands
  ↓
[Buffer Management] manages received data
  ↓
[Encryption] protects command payloads

When you identify one pattern, look for related patterns in:

  • Functions that call this one (higher-level orchestration)
  • Functions called by this one (lower-level primitives)
  • Cross-references to shared data structures

Progressive Understanding

You don't need to identify every pattern perfectly on the first attempt:

First pass: "This looks like crypto (lots of XOR and loops)"
Second pass: "Probably a stream cipher (simple state, no large tables)"
Third pass: "Matches RC4 pattern (256-byte init, swap operations)"
Fourth pass: "Confirmed RC4 (found KSA and PRGA pattern)"

Each pass refines understanding and guides further investigation.