
Collecting Renders from Virtual Cameras

Overview

This tutorial covers two methods for capturing rendered images from virtual cameras in Vuer, with a focus on the recommended ondemand approach.

Methods Overview

Method 1: Frame/Time Mode (Legacy)

Uses event handlers to collect rendered images. Only supported in stream='frame' or stream='time' mode.

Limitations:

  • Less backend control
  • Continuous rendering even when not needed
  • Higher resource usage

Method 2: OnDemand Mode (Recommended)

Uses the request-response grab_render RPC API: each awaited call returns a single render. Only available in stream='ondemand' mode.

Advantages:

  • Superior backend control
  • Renders only when explicitly requested
  • Lower computational overhead
  • Support for depth rendering

Complete Example: OnDemand Mode

import asyncio
import numpy as np
from vuer import Vuer, VuerSession
from vuer.schemas import CameraView, DefaultScene, Box, Urdf
from PIL import Image

app = Vuer()

@app.spawn(start=True)  # start=True launches the server, so no separate app.run() is needed
async def main(session: VuerSession):
    # Set up the scene; DefaultScene provides default lighting and a ground grid
    session.set @ DefaultScene(

        # Add some objects to render
        Box(
            args=[1, 1, 1],
            position=[0, 0.5, 0],
            color="red",
            key="box",
        ),

        Urdf(
            src="/static/robot.urdf",
            position=[2, 0, 0],
            key="robot",
        ),

        # Configure camera with ondemand streaming
        CameraView(
            key="ego",
            fov=50,
            width=640,
            height=480,
            position=[0, 2, 5],
            rotation=[0, 0, 0],
            stream="ondemand",
            renderDepth=True,  # Enable depth rendering
            near=0.1,
            far=100,
        ),
    )

    # Wait for scene to initialize
    await asyncio.sleep(0.5)

    # Capture renders from different positions
    for i in range(10):
        # Update camera position
        x = 5 * np.cos(i * 0.2)
        z = 5 * np.sin(i * 0.2)

        session.update @ CameraView(
            key="ego",
            position=[x, 2, z],
            rotation=[0, i * 0.2, 0],
        )

        # Small delay for camera update
        await asyncio.sleep(0.1)

        # Grab the render
        result = await session.grab_render(downsample=1, key="ego")

        if result:
            # Process RGB image
            rgb_data = result.get("rgb")
            if rgb_data:
                # Convert to numpy array; shape is (height, width, 3),
                # matching the CameraView resolution configured above
                img_array = np.frombuffer(rgb_data, dtype=np.uint8)
                img_array = img_array.reshape((480, 640, 3))

                # Save image
                img = Image.fromarray(img_array)
                img.save(f"render_{i:03d}.png")
                print(f"Saved render_{i:03d}.png")

            # Process depth map
            depth_data = result.get("depth")
            if depth_data:
                depth_array = np.frombuffer(depth_data, dtype=np.float32)
                depth_array = depth_array.reshape((480, 640))

                # Save depth map, normalized by its max for visualization
                # (works whether depth is metric or already in 0-1)
                depth_img = Image.fromarray(
                    (depth_array / depth_array.max() * 255).astype(np.uint8)
                )
                depth_img.save(f"depth_{i:03d}.png")
                print(f"Saved depth_{i:03d}.png")

    print("Finished capturing renders")

    # Keep session alive
    while True:
        await asyncio.sleep(1.0)


Key API: grab_render()

result = await session.grab_render(downsample=1, key="ego")

Parameters

  • downsample: Downsample factor (1 = no downsampling, 2 = half resolution)
  • key: Camera key to capture from

Returns

Dictionary containing:

  • rgb: RGB image data as bytes
  • depth: Depth map data as float32 array (if renderDepth=True)
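
For example, assuming the rgb payload is raw, row-major RGB bytes and the depth payload is raw float32 bytes (as described above), a small decoding helper might look like the sketch below. The function name and its defaults are illustrative, not part of the Vuer API; note that the buffer dimensions shrink by the downsample factor:

import numpy as np

def decode_render(result, width=640, height=480, downsample=1):
    """Decode a grab_render result dict into numpy arrays (sketch)."""
    w, h = width // downsample, height // downsample

    rgb = None
    if result.get("rgb"):
        rgb = np.frombuffer(result["rgb"], dtype=np.uint8).reshape((h, w, 3))

    depth = None
    if result.get("depth"):
        depth = np.frombuffer(result["depth"], dtype=np.float32).reshape((h, w))

    return rgb, depth

rgb, depth = decode_render(
    await session.grab_render(downsample=2, key="ego"), downsample=2
)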

Depth Rendering

Enable depth map capture by setting renderDepth=True:

CameraView(
    key="ego",
    renderDepth=True,
    stream="ondemand",
    # ... other parameters
)

Benefits:

  • Captures depth without changing object materials
  • Available since 2024 update
  • Minimal computational overhead
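
If the captured depth values are metric distances along the camera's view axis (an assumption; the Returns section above only specifies float32 data), a pinhole back-projection using the camera's fov converts a depth map into a camera-space point cloud. A minimal sketch, not part of the Vuer API:

import numpy as np

def depth_to_points(depth, fov_deg=50.0):
    """Back-project a (H, W) metric depth map into camera-space points.

    Assumes a pinhole model with square pixels, where fov_deg is the
    vertical field of view (matching the CameraView fov above).
    """
    h, w = depth.shape
    # Focal length in pixels, derived from the vertical field of view
    f = (h / 2.0) / np.tan(np.deg2rad(fov_deg) / 2.0)

    # Pixel grid centered on the principal point
    u, v = np.meshgrid(np.arange(w) - w / 2.0, np.arange(h) - h / 2.0)

    # Unproject: x right, y down, z forward (image convention)
    return np.stack([depth * u / f, depth * v / f, depth], axis=-1)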

Legacy Method: Event Handler

For stream='frame' or stream='time' mode:

@app.add_handler("CAMERA_VIEW")
async def handle_camera_view(event, session):
    """Handle CAMERA_VIEW events emitted for each streamed frame."""
    if event.key != "ego":
        return

    # Access rendered image
    image_data = event.value.get("image")

    # Process image data
    print(f"Received image: {len(image_data)} bytes")

Multi-Camera Capture

Capture from multiple cameras:

@app.spawn(start=True)
async def main(session: VuerSession):
    # Set up multiple cameras; DefaultScene provides lighting and a ground grid
    session.set @ DefaultScene(

        CameraView(
            key="front-camera",
            position=[0, 1, 5],
            stream="ondemand",
            width=640,
            height=480,
        ),

        CameraView(
            key="top-camera",
            position=[0, 10, 0],
            rotation=[-1.57, 0, 0],
            stream="ondemand",
            width=640,
            height=480,
        ),
    )

    await asyncio.sleep(0.5)

    # Capture from both cameras
    front_render = await session.grab_render(key="front-camera")
    top_render = await session.grab_render(key="top-camera")

    # Process renders...
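
Awaiting grab_render once per camera serializes the captures. If the backend can service overlapping requests (an assumption worth verifying for your setup), asyncio.gather issues them concurrently:

# Sketch: capture both cameras concurrently instead of sequentially.
# Assumes grab_render requests may overlap in flight.
front_render, top_render = await asyncio.gather(
    session.grab_render(key="front-camera"),
    session.grab_render(key="top-camera"),
)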

Best Practices

  1. Use ondemand mode - More efficient for programmatic rendering
  2. Enable depth rendering - Get depth maps without material changes
  3. Add small delays - Wait for camera updates before grabbing (see the helper sketch after this list)
  4. Set appropriate resolution - Balance quality and performance
  5. Use downsampling - Reduce data size when full resolution isn't needed
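
A hypothetical helper that bundles practices 1, 3, and 5, following the move-wait-grab pattern from the complete example above (the name capture_frame and the settle_time parameter are illustrative, not Vuer API):

async def capture_frame(session, key, position, rotation,
                        settle_time=0.1, downsample=1):
    """Move a camera, wait for the update to apply, then grab the render."""
    session.update @ CameraView(key=key, position=position, rotation=rotation)
    await asyncio.sleep(settle_time)  # practice 3: let the camera update land
    return await session.grab_render(downsample=downsample, key=key)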

Performance Considerations

The ondemand approach:

  • Minimizes computational overhead
  • Only renders when explicitly requested
  • Ideal for resource-constrained applications
  • Perfect for dataset generation and batch processing

Source

Documentation: https://docs.vuer.ai/en/latest/tutorials/camera/grab_render_virtual_camera.html