
Performance Monitoring

PLEM's monitoring system publishes 1 kHz control-loop performance data on ROS2 topics. Its wait-free design never blocks the real-time (RT) loop.

Key Topics

| Topic | Rate | QoS | Description |
|---|---|---|---|
| /rt_raw | 1 kHz | BestEffort | Timing, joint state, control values |
| /rt_events | On event | Reliable, Transient | Faults, mode changes, safety triggers |
| /rt_monitor_stats | 10 Hz | Reliable | Queue status, overflow counts |

Multi-Robot Environment

When robot_id is set, every topic is automatically namespaced under it, for example /{robot_id}/rt_raw.
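
The namespacing rule can be sketched as a small helper. This is only an illustration of the naming pattern described above; `namespaced_topic` is a hypothetical name and not part of PLEM.

```python
def namespaced_topic(topic: str, robot_id: str = "") -> str:
    """Return the topic name, prefixed with /{robot_id} when robot_id is set."""
    base = "/" + topic.strip("/")
    if not robot_id:
        # No robot_id configured: topics stay at the root namespace.
        return base
    return "/" + robot_id.strip("/") + base


print(namespaced_topic("/rt_raw"))              # /rt_raw
print(namespaced_topic("/rt_raw", "arm_left"))  # /arm_left/rt_raw
```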

RtSample Key Fields

| Field | Type | Description |
|---|---|---|
| loop_exec_us | float | Loop processing time [µs] |
| loop_jitter_us | float | Deviation from target cycle (1000 µs) [µs] |
| deadline_miss | uint8 | Deadline miss flag (0/1) |
| actual_pos[0..5] | float | Joint position [rad] |
| actual_vel[0..5] | float | Joint velocity [rad/s] |
| actual_torque[0..5] | float | Joint torque [Nm] |
| cmd_torque[0..5] | float | Command torque [Nm] |

Full field list: ros2 topic echo /rt_raw --once

Timing Metrics

  • loop_exec_us: Processing time from loop wakeup to completion. Normal: 30-60 µs
  • loop_jitter_us: actual cycle - target cycle. Positive means late wakeup (kernel scheduling delay). Normal: 5-15 µs
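
As a worked example of the jitter definition (actual cycle minus target cycle), the sketch below derives jitter from a list of loop wakeup timestamps. It assumes the actual cycle is the gap between consecutive wakeups; how PLEM measures it internally is not stated here, and `jitter_us` is a hypothetical helper.

```python
TARGET_CYCLE_US = 1000.0  # 1 kHz loop, i.e. a 1000 µs target cycle


def jitter_us(wakeups_us):
    """Return (actual cycle - target cycle) for each consecutive wakeup pair."""
    return [
        (curr - prev) - TARGET_CYCLE_US
        for prev, curr in zip(wakeups_us, wakeups_us[1:])
    ]


# Third wakeup is 12 µs late, so the following cycle appears 12 µs short.
wakeups = [0.0, 1000.0, 2012.0, 3000.0]
print(jitter_us(wakeups))  # [0.0, 12.0, -12.0]
```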

PlotJuggler Visualization

# Install
sudo apt install ros-humble-plotjuggler-ros

# Run
ros2 run plotjuggler plotjuggler

Setup:

  1. Streaming → Start: ROS2 Topic Subscriber → Check /rt_raw → OK
  2. Drag fields from left panel to plot:
    • loop_jitter_us (timing stability)
    • loop_exec_us (computational load)
  3. Set time window to 10 seconds

Threshold Display: Right-click plot → Add Custom Series for warning line (50µs yellow) and error line (200µs red).

Save Layout: File → Save Layout As... → plem_rt_monitor.xml

Performance Criteria and Thresholds

Performance Thresholds

Take immediate action when a metric crosses its Error threshold. Sustained Warning-level values can also affect system stability.

| Metric | Healthy | Warning | Error | Action |
|---|---|---|---|---|
| loop_jitter_us | < 15 µs | 50-100 µs | > 100 µs | See diagnostic patterns below |
| loop_exec_us | < 60 µs | 80-100 µs | > 100 µs | Review control parameters |
| deadline_miss | 0 | > 0 | > 10/sec | Investigate immediately |
| rt_overflow_delta | 0 | > 0 | > 100/sec | Consider increasing queue size |
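
The timing thresholds above could be applied in a script along these lines. This is a minimal sketch, not PLEM code: `classify` and `THRESHOLDS_US` are hypothetical names, values that fall between the Healthy and Warning bands are reported as "unclassified" because the table does not label them, and comparing the magnitude of jitter (so early wakeups count too) is an assumption.

```python
# (healthy_below, warning_from, error_above) in µs, copied from the table.
THRESHOLDS_US = {
    "loop_jitter_us": (15.0, 50.0, 100.0),
    "loop_exec_us": (60.0, 80.0, 100.0),
}


def classify(metric: str, value_us: float) -> str:
    """Map a timing sample to healthy / warning / error / unclassified."""
    healthy_below, warning_from, error_above = THRESHOLDS_US[metric]
    magnitude = abs(value_us)  # assumption: early and late wakeups treated alike
    if magnitude > error_above:
        return "error"
    if magnitude >= warning_from:
        return "warning"
    if magnitude < healthy_below:
        return "healthy"
    return "unclassified"  # gap between bands; the table does not label it


print(classify("loop_jitter_us", 8.0))    # healthy
print(classify("loop_jitter_us", 75.0))   # warning
print(classify("loop_exec_us", 153.0))    # error
```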

Performance Degradation Diagnosis

High Jitter Patterns

| Pattern | Symptoms | Possible Cause | Solution |
|---|---|---|---|
| Periodic spikes | 180 µs spikes every 5 seconds | Kernel background tasks (RCU, kworker) | Check CPU isolation and kernel thread affinity |
| Gradual increase | 10→80 µs over time | Memory pressure (heap fragmentation, swapping) | Verify memory locking settings |
| Random large spikes | Irregular 843 µs, 1205 µs, etc. | CPU not isolated or RT priority not set | Check RT priority and CPU isolation |
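
To tell periodic spikes from random ones in recorded data, one option is to measure the spacing between spikes. The sketch below assumes a list of loop_jitter_us samples captured at 1 kHz; the function names, the 100 µs spike threshold, and the 10% tolerance are illustrative choices, not PLEM settings.

```python
from statistics import mean, pstdev


def spike_intervals_s(jitter_us, threshold_us=100.0, rate_hz=1000.0):
    """Return seconds between the starts of consecutive spikes above threshold_us."""
    starts = [
        i for i, v in enumerate(jitter_us)
        if v > threshold_us and (i == 0 or jitter_us[i - 1] <= threshold_us)
    ]
    return [(b - a) / rate_hz for a, b in zip(starts, starts[1:])]


def looks_periodic(intervals_s, tolerance=0.1):
    """True when spike spacing deviates by at most `tolerance` of its mean."""
    if len(intervals_s) < 2:
        return False  # too few spikes to judge
    return pstdev(intervals_s) <= tolerance * mean(intervals_s)


# Synthetic 20 s trace: 8 µs baseline with a 180 µs spike every 5 s.
trace = [8.0] * 20000
for idx in (2000, 7000, 12000, 17000):
    trace[idx] = 180.0

intervals = spike_intervals_s(trace)
print(intervals)                  # [5.0, 5.0, 5.0]
print(looks_periodic(intervals))  # True
```

Evenly spaced spikes point toward the "Periodic spikes" row; widely varying spacing is more consistent with the "Random large spikes" row.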

Slow Execution Time Patterns

| Pattern | Symptoms | Possible Cause | Solution |
|---|---|---|---|
| Sustained high execution | Continuously > 80 µs | Expensive control computation | Review control parameters, check optimization options |
| Sudden jump | Step change 52 µs → 153 µs | Mode change (entering TRAJECTORY) | Expected behavior. Verify exec_us stays < 100 µs |

Diagnostic Commands

# Monitor jitter patterns
ros2 topic echo /rt_raw --field loop_jitter_us

# Average execution time (1000 samples; skips the "---" separator printed after each message)
ros2 topic echo /rt_raw --field loop_exec_us | \
awk '$1 != "---" {sum+=$1; count++} count==1000 {print "Average: " sum/count " µs"; exit}'

# Alert on high jitter
ros2 topic echo /rt_raw --field loop_jitter_us | \
awk '$1 > 50 {print "Warning: jitter " $1 " µs"}'

# Correlate with events
ros2 topic echo /rt_events

Queue Management

Data transfer from RT loop to ROS2 topics uses wait-free queues. When the queue is full, new samples are dropped.
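
The drop-when-full policy can be modeled in a few lines. This single-threaded Python model only illustrates the behavior described above (the producer never waits, and overflowing samples are counted and discarded); it is not a wait-free implementation, and `DropNewestQueue` is a hypothetical name.

```python
from collections import deque


class DropNewestQueue:
    """Bounded queue that rejects new samples when full instead of blocking."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = deque()
        self.overflow_count = 0

    def push(self, sample) -> bool:
        if len(self.items) >= self.capacity:
            self.overflow_count += 1  # sample is dropped; the producer never waits
            return False
        self.items.append(sample)
        return True

    def pop(self):
        """Return the oldest sample, or None when the queue is empty."""
        return self.items.popleft() if self.items else None


q = DropNewestQueue(capacity=4)
accepted = [q.push(i) for i in range(6)]
print(accepted)          # [True, True, True, True, False, False]
print(q.overflow_count)  # 2
print(q.pop())           # 0
```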

# Monitor overflow (0 is normal)
ros2 topic echo /rt_monitor_stats --field rt_overflow_delta

| Scenario | Queue Size | Rationale |
|---|---|---|
| Development (default) | 8192 | 8-second buffer, handles debugger pauses |
| Production | 4096 | 4-second buffer, lower memory usage |
| Rosbag Recording | 16384 | 16-second buffer for disk I/O bursts |

Queue Size Recommendation

We recommend using the default (8192) unless RAM is severely constrained.
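
The buffer durations quoted for each queue size follow from dividing the queue size by the 1 kHz sample rate. A quick check, assuming one queue slot per /rt_raw sample (`buffer_seconds` is a hypothetical helper):

```python
RATE_HZ = 1000  # /rt_raw sample rate


def buffer_seconds(queue_size: int, rate_hz: int = RATE_HZ) -> float:
    """Time the queue can absorb samples while the consumer is stalled."""
    return queue_size / rate_hz


for size in (4096, 8192, 16384):
    print(size, buffer_seconds(size))
# 4096 4.096
# 8192 8.192
# 16384 16.384
```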

Quick Reference

# RT performance
ros2 topic hz /rt_raw # Publishing rate (should be ~1kHz)
ros2 topic echo /rt_raw --field loop_jitter_us
ros2 topic echo /rt_raw --field loop_exec_us
ros2 topic echo /rt_monitor_stats

# Event monitoring
ros2 topic echo /rt_events

# Multi-robot environment
ros2 topic echo /arm_left/rt_raw --field loop_jitter_us

Next Steps:

  1. Set up PlotJuggler with recommended layout
  2. Verify baseline metrics for your application
  3. Use diagnostic pattern tables and commands to analyze root causes when issues occur