Humanoid Modeling Using Real Gait Data

Train a humanoid robot to imitate human gait using MuJoCo, OpenSim, dm_control, and Stable-Baselines3.

Project Overview

This project was developed for the Reinforcement Learning for Optimization in Biomechanics (RL4OB) course at the Technical University of Munich (TUM).
Our goal was to teach a humanoid robot to imitate human gait using real motion capture data and an RL agent trained in a physics simulation.

The project combines OpenSim for motion-capture processing with the MuJoCo and dm_control simulation environments and Stable-Baselines3 for policy training.


🔹 Data Collection

  • Human motion data was collected using a Vicon Motion Capture System by TUM MIRMI.
  • Reflective markers were attached to the subject’s body, and their 3D positions (X, Y, Z) were recorded during walking.
  • These trajectories served as target reference motions for our RL agent.

Reference Human Gait (MIRMI Vicon data simulated in OpenSim)


🔹 Data Processing

Static Trial

  • Unnecessary markers were removed, and the coordinates were rotated to align with OpenSim’s reference frame (see the rotation sketch after this list).
  • Data was stored as .trc files.
  • The OpenSim Scale Tool was used to personalize the biomechanical model to the subject’s anatomy.
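A minimal sketch of the coordinate rotation, assuming Vicon exports Z-up marker data while OpenSim expects a Y-up frame; the exact axis mapping depends on the lab setup and is an assumption here:

```python
import numpy as np

# Hedged sketch: rotate Vicon marker coordinates into OpenSim's reference frame.
# Assumes Vicon is Z-up and OpenSim is Y-up; the mapping actually used in the
# project may differ.
def vicon_to_opensim(markers_xyz):
    """markers_xyz: (n_frames, n_markers, 3) array in the Vicon frame."""
    x, y, z = markers_xyz[..., 0], markers_xyz[..., 1], markers_xyz[..., 2]
    # One common mapping: Vicon (X, Y, Z) -> OpenSim (Y, Z, X), so Z-up becomes Y-up.
    return np.stack([y, z, x], axis=-1)
```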

Dynamic Trial

  • Marker trajectories during gait were smoothed with moving mean filters (see the filtering sketch after this list).
  • Non-relevant joints were removed to match the humanoid configuration.
  • Extracted gait cycles were used for Inverse Kinematics (IK) computations in OpenSim.
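A minimal sketch of the moving mean filter applied per coordinate; the window size is an assumption, not the project’s exact setting:

```python
import numpy as np

# Illustrative moving-mean smoothing for one marker trajectory.
def moving_mean(trajectory, window=5):
    """trajectory: (n_frames, 3) array of X, Y, Z positions for one marker."""
    kernel = np.ones(window) / window
    # Filter each coordinate independently; mode="same" keeps the original length.
    return np.stack(
        [np.convolve(trajectory[:, i], kernel, mode="same") for i in range(3)],
        axis=1,
    )
```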

🔹 Simulation and Reinforcement Learning

We used DeepMind’s dm_control Humanoid-v5 environment integrated with the MuJoCo physics engine.
Each simulation step followed the standard RL interaction loop (a minimal sketch follows the list below):

State → Action → Reward → Next State

  • Agent: Humanoid robot
  • State: Joint positions and velocities
  • Action: Torque commands to joints
  • Environment: MuJoCo physics simulation
  • Goal: Maximize total reward over training episodes
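A minimal sketch of this loop, assuming the Gymnasium-style Humanoid-v5 API (MuJoCo backend) that Stable-Baselines3 trains against; the random policy here is only a placeholder:

```python
import gymnasium as gym

# State -> Action -> Reward -> Next State loop with a placeholder random policy.
env = gym.make("Humanoid-v5")
state, info = env.reset(seed=0)

for _ in range(1000):
    action = env.action_space.sample()                 # torque commands to the joints
    state, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:                        # fall or time limit -> restart
        state, info = env.reset()

env.close()
```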

🔹 Reward Function

To teach realistic human-like gait, we designed a multi-component reward:

  • R_track: Encourages humanoid joints to follow real human joint angles
  • R_upright: Rewards keeping torso upright and balanced
  • R_control: Penalizes excessive control effort for smoother, energy-efficient motion

$$R_{total} = w_1 R_{track} + w_2 R_{upright} + w_3 R_{control}$$
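A sketch of how such a weighted reward could be computed; the weights and the exact form of each term are assumptions, not the project’s tuned values:

```python
import numpy as np

# Hypothetical weights; the project's tuned values may differ.
W_TRACK, W_UPRIGHT, W_CONTROL = 1.0, 0.5, 0.1

def total_reward(joint_angles, ref_angles, torso_up_z, action):
    # R_track: imitation term, large when simulated joints match the reference angles.
    r_track = np.exp(-np.sum((joint_angles - ref_angles) ** 2))
    # R_upright: vertical component of the torso's up axis (1.0 = fully upright).
    r_upright = torso_up_z
    # R_control: penalty on squared torques for smoother, energy-efficient motion.
    r_control = -np.sum(np.square(action))
    return W_TRACK * r_track + W_UPRIGHT * r_upright + W_CONTROL * r_control
```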

Each episode was initialized from a real gait-cycle frame extracted from .sto data to ensure realistic starting poses.
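A hedged sketch of reading gait-cycle frames from a .sto file and picking a random one as the starting pose; the file name and the mapping onto the humanoid’s state vector are illustrative only:

```python
import numpy as np

def load_sto(path):
    """Read an OpenSim .sto file: header up to 'endheader', then tab-separated data."""
    with open(path) as f:
        lines = f.readlines()
    end = next(i for i, line in enumerate(lines) if line.strip() == "endheader")
    names = lines[end + 1].strip().split("\t")          # "time" plus joint coordinates
    frames = np.array([[float(v) for v in line.split()]
                       for line in lines[end + 2:] if line.strip()])
    return names, frames

names, frames = load_sto("gait_cycle.sto")              # hypothetical file name
rng = np.random.default_rng(0)
start_pose = frames[rng.integers(len(frames))][1:]      # drop the time column
# start_pose would then be written into the humanoid's initial pose before the rollout.
```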


🔹 Experiments & Results

We trained multiple policies using Proximal Policy Optimization (PPO) with varying parameters:

| Experiment | Timesteps | Gravity | Focus | Demo Link |
|---|---|---|---|---|
| Reference Human Gait | — | — | MIRMI Vicon Data | View |
| PPO_100000 | 100k | -9.8 m/s² | Standing stability | View |
| PPO_1500000 (Upward) | 1.5M | -9.8 m/s² | Standing & partial motion | View |
| PPO_4000000 | 4M | -9.8 m/s² | Gait learning | View |
| PPO_8000000 | 8M | -9.8 m/s² | Extended training | View |
| PPO_500000 (1.0 gravity) | 500k | -1.0 m/s² | Reduced-gravity learning | View |
| PPO_1000000 (1.0 gravity) | 1M | -1.0 m/s² | Extended reduced-gravity training | View |
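A single run could be reproduced along these lines, assuming the Gymnasium Humanoid-v5 environment; the hyperparameters shown are Stable-Baselines3 defaults, not necessarily the ones used here:

```python
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("Humanoid-v5")

# Reduced-gravity variant: MuJoCo exposes gravity through the model options, so the
# -1.0 m/s² experiments can be set up by overwriting the z-component.
env.unwrapped.model.opt.gravity[:] = [0.0, 0.0, -1.0]

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=500_000)                    # e.g. the PPO_500000 run
model.save("ppo_humanoid_reduced_gravity")
```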

The humanoid successfully learned partial gait-like movements.
However, full stable walking was not achieved, which highlights the challenge of combining stability and motion imitation in high-dimensional control.


  • Team: Oğuzhan Eşen and Arif Güvenkaya
  • Course: Reinforcement Learning for Optimization in Biomechanics
  • Supervisor: Gheorghe Lisca
  • Tools: Python, MuJoCo, OpenSim, dm_control, Stable-Baselines3