
NetBox Best Practices for Virgo-Core

NetBox Version: 4.3.0

Audience: Infrastructure automation engineers

Comprehensive best practices for using NetBox as the source of truth for the Matrix cluster infrastructure, including data organization, security, performance, and integration patterns.


Table of Contents

  • Data Organization
  • Naming Conventions
  • IP Address Management
  • Device Management
  • Virtualization
  • Tagging Strategy
  • Security
  • Performance
  • API Integration
  • Automation Patterns
  • Troubleshooting

Data Organization

Hierarchical Structure

Follow this order when setting up infrastructure in NetBox:

1. Sites          → Create physical locations first
2. Prefixes       → Define IP networks (IPAM)
3. VLANs          → Network segmentation
4. Device Types   → Hardware models
5. Device Roles   → Purpose categories
6. Clusters       → Virtualization clusters
7. Devices        → Physical hardware
8. Interfaces     → Network interfaces
9. IP Addresses   → Assign IPs to interfaces
10. VMs           → Virtual machines

Why this order?

  • Parent objects must exist before children
  • Avoids circular dependencies
  • Enables atomic operations
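The ordering above is just a topological sort of the parent/child dependencies. As a minimal illustration (the object names and edges here are simplified assumptions, not NetBox internals), Python's standard-library graphlib can derive a valid creation order:

```python
from graphlib import TopologicalSorter

# Simplified parent -> child dependencies (illustrative, not exhaustive)
DEPS = {
    'prefixes': {'sites'},
    'vlans': {'sites'},
    'devices': {'sites', 'device_types', 'device_roles'},
    'clusters': {'sites'},
    'interfaces': {'devices'},
    'ip_addresses': {'interfaces', 'prefixes'},
    'vms': {'clusters'},
}

order = list(TopologicalSorter(DEPS).static_order())
print(order)  # every parent appears before its children
```

Any order this produces is safe to follow when scripting bulk imports.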

Site Organization

Good:

# One site per physical location
site = nb.dcim.sites.create(
    name="Matrix Cluster",
    slug="matrix",
    description="3-node Proxmox VE cluster at home lab",
    tags=[{"name": "production"}, {"name": "homelab"}]
)

Bad:

# Don't create separate sites for logical groupings
site_proxmox = nb.dcim.sites.create(name="Proxmox Nodes", ...)
site_vms = nb.dcim.sites.create(name="Virtual Machines", ...)

Use device roles and tags for logical grouping, not separate sites.

Consistent Data Entry

Required fields: Always populate these

device = nb.dcim.devices.create(
    name="foxtrot",                    # ✅ Required
    device_type=device_type.id,        # ✅ Required
    role=role.id,                      # ✅ Required (renamed from device_role in NetBox 3.6)
    site=site.id,                      # ✅ Required
    status="active",                   # ✅ Required
    description="AMD Ryzen 9 9955HX",  # ✅ Recommended
    tags=[{"name": "proxmox-node"}]    # ✅ Recommended
)

Optional but recommended:

  • description - Hardware specs, purpose
  • tags - For filtering and automation
  • comments - Additional notes
  • custom_fields - Serial numbers, purchase dates

Naming Conventions

Device Names

Use hostname only (no domain):

device = nb.dcim.devices.create(name="foxtrot", ...)   # ✅ Good
device = nb.dcim.devices.create(name="foxtrot.spaceships.work", ...)  # ❌ Bad

Rationale: Domain goes in DNS name field, not device name.

Interface Names

Match actual OS interface names:

# Linux
interface = nb.dcim.interfaces.create(name="enp1s0f0", ...)  # ✅ Good

# Not generic names
interface = nb.dcim.interfaces.create(name="eth0", ...)      # ❌ Bad (unless actually eth0)

Why? Enables automation that references interfaces by name.

DNS Naming Convention

Follow the Matrix cluster pattern: <service>-<number>-<purpose>.<domain>

# ✅ Good examples
dns_name="docker-01-nexus.spaceships.work"
dns_name="k8s-01-master.spaceships.work"
dns_name="proxmox-foxtrot-mgmt.spaceships.work"

# ❌ Bad examples
dns_name="server1.spaceships.work"         # Not descriptive
dns_name="nexus.spaceships.work"           # Missing number
dns_name="DOCKER-01.spaceships.work"       # Uppercase not allowed

See ../workflows/naming-conventions.md for complete rules.

Slugs

Lowercase with hyphens:

site = nb.dcim.sites.create(slug="matrix", ...)        # ✅ Good
site = nb.dcim.sites.create(slug="Matrix_Cluster", ...)  # ❌ Bad

Pattern: ^[a-z0-9-]+$
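A small helper can normalize display names into that pattern when creating objects from scripts (a sketch; NetBox generates slugs automatically in the UI, so this is only needed for API-driven creation):

```python
import re

def slugify(name: str) -> str:
    """Normalize a display name to a NetBox-style slug matching ^[a-z0-9-]+$."""
    slug = re.sub(r'[^a-z0-9]+', '-', name.lower())  # collapse anything else to '-'
    return slug.strip('-')

print(slugify('Matrix_Cluster'))  # matrix-cluster
```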


IP Address Management

Plan IP Hierarchy

Matrix cluster example:

192.168.0.0/16 (Home network supernet)
├── 192.168.3.0/24 (Management)
│   ├── 192.168.3.1     (Gateway)
│   ├── 192.168.3.5-7   (Proxmox nodes)
│   ├── 192.168.3.10+   (VMs)
│   └── 192.168.3.200+  (Reserved for future)
├── 192.168.5.0/24 (CEPH Public, MTU 9000)
├── 192.168.7.0/24 (CEPH Private, MTU 9000)
└── 192.168.8.0/24 (Corosync, VLAN 9)
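Before entering a hierarchy like this, it's worth verifying that every planned child prefix actually nests under the supernet. Python's standard ipaddress module makes this a one-line check per prefix:

```python
import ipaddress

supernet = ipaddress.ip_network('192.168.0.0/16')
children = ['192.168.3.0/24', '192.168.5.0/24', '192.168.7.0/24', '192.168.8.0/24']

for cidr in children:
    child = ipaddress.ip_network(cidr)
    # subnet_of() is strict containment within the same IP version
    assert child.subnet_of(supernet), f'{cidr} falls outside {supernet}'
print('all prefixes nest correctly')
```

NetBox will compute the same hierarchy for display, but catching a typo before the prefixes are created is cheaper than cleaning up afterward.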

Use Prefix Roles

Create roles for clarity:

# Create roles
role_mgmt = nb.ipam.roles.create(name='Management', slug='management')
role_storage = nb.ipam.roles.create(name='Storage', slug='storage')
role_cluster = nb.ipam.roles.create(name='Cluster', slug='cluster')

# Apply to prefixes
prefix = nb.ipam.prefixes.create(
    prefix='192.168.3.0/24',
    role=role_mgmt.id,
    description='Management network for Matrix cluster'
)

Reserve Important IPs

Explicitly reserve gateway, broadcast, network addresses:

# Gateway
gateway = nb.ipam.ip_addresses.create(
    address='192.168.3.1/24',
    status='active',
    role='anycast',
    description='Management network gateway'
)

# DNS servers
dns1 = nb.ipam.ip_addresses.create(
    address='192.168.3.2/24',
    status='reserved',
    description='Primary DNS server'
)

Use Prefixes as IP Pools

Enable automatic IP assignment:

prefix = nb.ipam.prefixes.create(
    prefix='192.168.3.0/24',
    is_pool=True,  # ✅ Allow automatic IP assignment
    ...
)

# Get next available IP
ip = prefix.available_ips.create(dns_name='docker-02.spaceships.work')

Don't manually track available IPs - let NetBox do it.

IP Status Values

Use appropriate status:

| Status | Use Case |
| --- | --- |
| `active` | Currently in use |
| `reserved` | Reserved for a specific purpose |
| `deprecated` | Planned for decommission |
| `dhcp` | Managed by a DHCP server |

# Production VM
ip = nb.ipam.ip_addresses.create(address='192.168.3.10/24', status='active')

# Future expansion
ip = nb.ipam.ip_addresses.create(address='192.168.3.50/24', status='reserved')

VRF for Isolation

Use VRFs for true isolation:

# Management VRF (enforce unique IPs)
vrf_mgmt = nb.ipam.vrfs.create(
    name='management',
    enforce_unique=True,
    description='Management network VRF'
)

# Lab VRF (allow overlapping IPs)
vrf_lab = nb.ipam.vrfs.create(
    name='lab',
    enforce_unique=False,
    description='Lab/testing VRF'
)

When to use VRFs:

  • Multiple environments (prod, dev, lab)
  • Overlapping IP ranges
  • Security isolation

Device Management

Create Device Types First

Always create device type before devices:

# 1. Create manufacturer
manufacturer = nb.dcim.manufacturers.get(slug='minisforum')
if not manufacturer:
    manufacturer = nb.dcim.manufacturers.create(
        name='MINISFORUM',
        slug='minisforum'
    )

# 2. Create device type
device_type = nb.dcim.device_types.create(
    manufacturer=manufacturer.id,
    model='MS-A2',
    slug='ms-a2',
    u_height=0,  # Not rack mounted
    is_full_depth=False
)

# 3. Create device
device = nb.dcim.devices.create(
    name='foxtrot',
    device_type=device_type.id,
    ...
)

Use Device Roles Consistently

Create specific roles:

roles = [
    ('Proxmox Node', 'proxmox-node', '2196f3'),    # Blue
    ('Docker Host', 'docker-host', '4caf50'),      # Green
    ('K8s Master', 'k8s-master', 'ff9800'),        # Orange
    ('K8s Worker', 'k8s-worker', 'ffc107'),        # Amber
    ('Storage', 'storage', '9c27b0'),              # Purple
]

for name, slug, color in roles:
    nb.dcim.device_roles.create(
        name=name,
        slug=slug,
        color=color,
        vm_role=True  # If role applies to VMs too
    )

Consistent naming helps automation:

# Get all Proxmox nodes
proxmox_nodes = nb.dcim.devices.filter(role='proxmox-node')

# Get all Kubernetes workers
k8s_workers = nb.virtualization.virtual_machines.filter(role='k8s-worker')

Always Set Primary IP

Set primary IP after creating device and IPs:

# Create device
device = nb.dcim.devices.create(name='foxtrot', ...)

# Create interface
iface = nb.dcim.interfaces.create(device=device.id, name='enp2s0', ...)

# Create IP
ip = nb.ipam.ip_addresses.create(
    address='192.168.3.5/24',
    assigned_object_type='dcim.interface',
    assigned_object_id=iface.id
)

# ✅ Set as primary (critical for automation!)
device.primary_ip4 = ip.id
device.save()

Why? Primary IP is used by:

  • Ansible dynamic inventory
  • Monitoring tools
  • DNS automation
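As a sketch of why this matters: once primary IPs are set, inventory tooling only has to read one well-known field per device. Plain dicts stand in for pynetbox records here, and `inventory_hosts` is a hypothetical helper, not part of any library:

```python
def inventory_hosts(devices: list[dict]) -> dict[str, str]:
    """Map device name -> primary IPv4 (mask stripped) for automation tooling."""
    return {
        d['name']: d['primary_ip4'].split('/')[0]
        for d in devices
        if d.get('primary_ip4')  # devices without a primary IP are skipped
    }

devices = [
    {'name': 'foxtrot', 'primary_ip4': '192.168.3.5/24'},
    {'name': 'golf', 'primary_ip4': None},  # no primary IP -> invisible to inventory
]
print(inventory_hosts(devices))  # {'foxtrot': '192.168.3.5'}
```

A device without a primary IP silently drops out of dynamic inventories, which is usually discovered only when a playbook skips it.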

Document Interfaces

Include descriptions:

# Management
mgmt = nb.dcim.interfaces.create(
    device=device.id,
    name='enp2s0',
    type='2.5gbase-t',
    mtu=1500,
    description='Management interface (vmbr0)',
    tags=[{'name': 'management'}]
)

# CEPH public
ceph_pub = nb.dcim.interfaces.create(
    device=device.id,
    name='enp1s0f0',
    type='10gbase-x-sfpp',
    mtu=9000,
    description='CEPH public network (vmbr1)',
    tags=[{'name': 'ceph-public'}, {'name': 'jumbo-frames'}]
)

Virtualization

Create Cluster First

Create cluster before VMs:

# 1. Get/create cluster type
cluster_type = nb.virtualization.cluster_types.get(slug='proxmox')
if not cluster_type:
    cluster_type = nb.virtualization.cluster_types.create(
        name='Proxmox VE',
        slug='proxmox'
    )

# 2. Create cluster
cluster = nb.virtualization.clusters.create(
    name='Matrix',
    type=cluster_type.id,
    site=site.id,
    description='3-node Proxmox VE 9.x cluster'
)

# 3. Create VMs in cluster
vm = nb.virtualization.virtual_machines.create(
    name='docker-01',
    cluster=cluster.id,
    ...
)

Standardize VM Sizing

Use consistent resource allocations:

| Role | vCPUs | Memory (MB) | Disk (GB) |
| --- | --- | --- | --- |
| Small (dev) | 2 | 2048 | 20 |
| Medium (app) | 4 | 8192 | 100 |
| Large (database) | 8 | 16384 | 200 |
| XL (compute) | 16 | 32768 | 500 |

VM_SIZES = {
    'small': {'vcpus': 2, 'memory': 2048, 'disk': 20},
    'medium': {'vcpus': 4, 'memory': 8192, 'disk': 100},
    'large': {'vcpus': 8, 'memory': 16384, 'disk': 200},
}

# Create VM with standard size
vm = nb.virtualization.virtual_machines.create(
    name='docker-01',
    cluster=cluster.id,
    **VM_SIZES['medium']
)

VM Network Configuration

Complete network setup:

# 1. Create VM
vm = nb.virtualization.virtual_machines.create(...)

# 2. Create interface
vm_iface = nb.virtualization.interfaces.create(
    virtual_machine=vm.id,
    name='eth0',
    type='virtual',
    enabled=True,
    mtu=1500
)

# 3. Assign IP from pool
prefix = nb.ipam.prefixes.get(prefix='192.168.3.0/24')
vm_ip = prefix.available_ips.create(
    dns_name='docker-01-nexus.spaceships.work',
    assigned_object_type='virtualization.vminterface',
    assigned_object_id=vm_iface.id,
    tags=[{'name': 'production-dns'}]  # ✅ Triggers PowerDNS sync
)

# 4. Set as primary IP
vm.primary_ip4 = vm_ip.id
vm.save()

Tagging Strategy

Tag Categories

Organize tags by purpose:

Infrastructure Type:

  • proxmox-node, ceph-node, docker-host, k8s-master, k8s-worker

Environment:

  • production, staging, development, lab

DNS Automation:

  • production-dns, lab-dns (triggers PowerDNS sync)

Management:

  • terraform, ansible, manual

Networking:

  • management, ceph-public, ceph-private, jumbo-frames
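These categories can live in one data structure and be flattened into create payloads, so the tag taxonomy stays in a single reviewable place (a sketch; the description format is an assumption):

```python
TAG_CATEGORIES = {
    'infrastructure': ['proxmox-node', 'ceph-node', 'docker-host', 'k8s-master', 'k8s-worker'],
    'environment': ['production', 'staging', 'development', 'lab'],
    'dns': ['production-dns', 'lab-dns'],
    'management': ['terraform', 'ansible', 'manual'],
    'networking': ['management', 'ceph-public', 'ceph-private', 'jumbo-frames'],
}

def tag_payloads(categories: dict[str, list[str]]) -> list[dict]:
    """Flatten the category map into NetBox tag-create payloads."""
    return [
        {'name': name, 'slug': name, 'description': f'{category} tag'}
        for category, names in categories.items()
        for name in names
    ]

# Bulk-create in one API call: nb.extras.tags.create(tag_payloads(TAG_CATEGORIES))
print(len(tag_payloads(TAG_CATEGORIES)))  # 18
```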

Tag Naming Convention

Lowercase with hyphens:

tags = [
    {'name': 'proxmox-node'},      # ✅ Good
    {'name': 'production-dns'},    # ✅ Good
    {'name': 'Proxmox Node'},      # ❌ Bad (spaces, capitals)
    {'name': 'production_dns'},    # ❌ Bad (underscores)
]

Apply Tags Consistently

Tag at multiple levels:

# Tag device
device = nb.dcim.devices.create(
    name='foxtrot',
    tags=[{'name': 'proxmox-node'}, {'name': 'ceph-node'}, {'name': 'production'}]
)

# Tag interface
iface = nb.dcim.interfaces.create(
    device=device.id,
    name='enp1s0f0',
    tags=[{'name': 'ceph-public'}, {'name': 'jumbo-frames'}]
)

# Tag IP
ip = nb.ipam.ip_addresses.create(
    address='192.168.3.5/24',
    tags=[{'name': 'production-dns'}, {'name': 'terraform'}]
)

Why? Enables granular filtering:

# Shell: run a playbook against all CEPH nodes
ansible-playbook -i netbox-inventory.yml setup-ceph.yml --limit tag_ceph_node

# Python: get all production DNS-enabled IPs
ips = nb.ipam.ip_addresses.filter(tag='production-dns')

Security

API Token Management

Store tokens in Infisical (Virgo-Core standard):

from infisical import InfisicalClient

def get_netbox_token() -> str:
    """Get NetBox API token from Infisical."""
    client = InfisicalClient()
    secret = client.get_secret(
        secret_name="NETBOX_API_TOKEN",
        project_id="7b832220-24c0-45bc-a5f1-ce9794a31259",
        environment="prod",
        path="/matrix"
    )
    return secret.secret_value

# Use token
nb = pynetbox.api('https://netbox.spaceships.work', token=get_netbox_token())

Never hardcode tokens:

# ❌ NEVER DO THIS
token = "a1b2c3d4e5f6..."
nb = pynetbox.api(url, token=token)

Use Minimal Permissions

Create tokens with appropriate scopes:

| Use Case | Permissions |
| --- | --- |
| Read-only queries | Read only |
| Terraform automation | Read + Write (DCIM, IPAM, Virtualization) |
| Full automation | Read + Write (all) |
| Emergency admin | Full access |

Create separate tokens for different purposes:

NETBOX_API_TOKEN_READONLY   → Read-only queries
NETBOX_API_TOKEN_TERRAFORM  → Terraform automation
NETBOX_API_TOKEN_ANSIBLE    → Ansible dynamic inventory

HTTPS Only

Always use HTTPS in production:

# ✅ Production
nb = pynetbox.api('https://netbox.spaceships.work', token=token)

# ❌ Never HTTP in production
nb = pynetbox.api('http://netbox.spaceships.work', token=token)

For self-signed certs (dev/lab only):

# ⚠️ Dev/testing only
import requests
import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

session = requests.Session()
session.verify = False  # Only for self-signed certs in lab

nb = pynetbox.api('https://netbox.local', token=token)
nb.http_session = session

Rotate Tokens Regularly

Best practice: Rotate every 90 days

# 1. Create new token in NetBox UI
# 2. Update Infisical secret
infisical secrets set NETBOX_API_TOKEN="new-token-here"

# 3. Test new token
./tools/netbox_api_client.py sites list

# 4. Delete old token in NetBox UI

Audit API Usage

Log API calls in production:

import logging
import os

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    filename='/var/log/netbox-api.log'
)

logger = logging.getLogger(__name__)

def audit_api_call(action: str, resource: str, details: dict):
    """Log API calls for security audit."""
    logger.info(f"API Call: {action} {resource} - User: {os.getenv('USER')} - {details}")

# Usage
ip = nb.ipam.ip_addresses.create(address='192.168.1.1/24')
audit_api_call('CREATE', 'ip-address', {'address': '192.168.1.1/24'})

Performance

Use Filtering Server-Side

Filter on server:

# ✅ Efficient: Server filters results
devices = nb.dcim.devices.filter(site='matrix', status='active')

Don't filter client-side:

# ❌ Inefficient: Downloads all devices then filters
all_devices = nb.dcim.devices.all()
matrix_devices = [d for d in all_devices if d.site.slug == 'matrix']

Request Only Needed Fields

Use brief mode when you only need summary fields:

# Abbreviated representation (id, url, display, name)
devices = nb.dcim.devices.filter(site='matrix', brief=True)

Use Pagination for Large Datasets

Process in batches:

# Paginate automatically
for device in nb.dcim.devices.filter(site='matrix'):
    process_device(device)  # pynetbox handles pagination

# Manual pagination for control
page_size = 100
offset = 0
while True:
    devices = nb.dcim.devices.filter(limit=page_size, offset=offset)
    if not devices:
        break
    for device in devices:
        process_device(device)
    offset += page_size

Cache Lookups

Cache static data:

from functools import lru_cache

@lru_cache(maxsize=128)
def get_site(site_slug: str):
    """Cached site lookup."""
    return nb.dcim.sites.get(slug=site_slug)

@lru_cache(maxsize=256)
def get_device_type(slug: str):
    """Cached device type lookup."""
    return nb.dcim.device_types.get(slug=slug)

Use Bulk Operations

Bulk create is faster:

# ✅ Fast: Bulk create
ips = [
    {'address': f'192.168.3.{i}/24', 'status': 'active'}
    for i in range(10, 20)
]
nb.ipam.ip_addresses.create(ips)

# ❌ Slow: Loop with individual creates
for i in range(10, 20):
    nb.ipam.ip_addresses.create(address=f'192.168.3.{i}/24', status='active')

API Integration

Error Handling

Always handle errors:

import sys

import pynetbox
from requests.exceptions import HTTPError
from rich.console import Console

console = Console()

try:
    device = nb.dcim.devices.get(name='foxtrot')
    if not device:
        console.print("[yellow]Device not found[/yellow]")
        return None

except HTTPError as e:
    if e.response.status_code == 404:
        console.print("[red]Resource not found[/red]")
    elif e.response.status_code == 403:
        console.print("[red]Permission denied[/red]")
    else:
        console.print(f"[red]HTTP Error: {e}[/red]")
    sys.exit(1)

except pynetbox.RequestError as e:
    console.print(f"[red]NetBox API Error: {e.error}[/red]")
    sys.exit(1)

except Exception as e:
    console.print(f"[red]Unexpected error: {e}[/red]")
    sys.exit(1)

Validate Before Creating

Validate input before API calls:

import ipaddress
import re

def validate_ip(ip_str: str) -> bool:
    """Validate IP address format."""
    try:
        ipaddress.ip_interface(ip_str)
        return True
    except ValueError:
        return False

def validate_dns_name(name: str) -> bool:
    """Validate DNS naming convention."""
    pattern = r'^[a-z0-9-]+-\d{2}-[a-z0-9-]+\.[a-z0-9.-]+$'
    return bool(re.match(pattern, name))

# Use before API calls
if not validate_ip(ip_address):
    raise ValueError(f"Invalid IP address: {ip_address}")

if not validate_dns_name(dns_name):
    raise ValueError(f"Invalid DNS name: {dns_name}")

ip = nb.ipam.ip_addresses.create(address=ip_address, dns_name=dns_name)

Check Before Create

Check existence before creating:

# Check if device exists
device = nb.dcim.devices.get(name='foxtrot')

if device:
    console.print("[yellow]Device already exists, updating...[/yellow]")
    device.status = 'active'
    device.save()
else:
    console.print("[green]Creating new device...[/green]")
    device = nb.dcim.devices.create(name='foxtrot', ...)

Automation Patterns

Idempotent Operations

Design operations to be safely re-run:

def ensure_vm_exists(name: str, cluster: str, **kwargs) -> pynetbox.core.response.Record:
    """Ensure VM exists (idempotent)."""
    # Check if exists
    vm = nb.virtualization.virtual_machines.get(name=name)

    if vm:
        # Update if needed
        updated = False
        for key, value in kwargs.items():
            if getattr(vm, key) != value:
                setattr(vm, key, value)
                updated = True

        if updated:
            vm.save()
            console.print(f"[yellow]Updated VM: {name}[/yellow]")
        else:
            console.print(f"[green]VM unchanged: {name}[/green]")

        return vm
    else:
        # Create new
        vm = nb.virtualization.virtual_machines.create(
            name=name,
            cluster=nb.virtualization.clusters.get(name=cluster).id,
            **kwargs
        )
        console.print(f"[green]Created VM: {name}[/green]")
        return vm

Terraform Integration

See terraform-provider-guide.md for complete examples.

Key pattern:

# Use NetBox as data source
data "netbox_prefix" "management" {
  prefix = "192.168.3.0/24"
}

# Create IP in NetBox via Terraform
resource "netbox_ip_address" "vm_ip" {
  ip_address  = cidrhost(data.netbox_prefix.management.prefix, 10)
  dns_name    = "docker-01-nexus.spaceships.work"
  status      = "active"
  tags        = ["terraform", "production-dns"]
}

Ansible Dynamic Inventory

See ../workflows/ansible-dynamic-inventory.md.

Key pattern:

# netbox-dynamic-inventory.yml
plugin: netbox.netbox.nb_inventory
api_endpoint: https://netbox.spaceships.work
token: !vault |
  $ANSIBLE_VAULT;...
group_by:
  - device_roles
  - tags
  - site

Troubleshooting

Common Issues

Problem: "Permission denied" errors

Solution: Check API token permissions

# Test token
curl -H "Authorization: Token YOUR_TOKEN" \
  https://netbox.spaceships.work/api/

Problem: IP not syncing to PowerDNS

Solution: Check tags

# IP must have tag matching zone rules
ip = nb.ipam.ip_addresses.get(address='192.168.3.10/24')
print(f"Tags: {[tag.name for tag in ip.tags]}")
# Must include 'production-dns' or matching tag

Problem: Slow API queries

Solution: Use filtering and pagination

# ❌ Slow
all_devices = nb.dcim.devices.all()

# ✅ Fast
devices = nb.dcim.devices.filter(site='matrix', limit=50)

Debug Mode

Enable verbose logging:

import logging

# Enable debug logging
logging.basicConfig(level=logging.DEBUG)

# Now pynetbox will log all API calls
nb = pynetbox.api('https://netbox.spaceships.work', token=token)
devices = nb.dcim.devices.all()


Next: Review API Integration Patterns