Humanoid Modeling Using Real Gait Data
Train a humanoid robot to imitate human gait using MuJoCo, OpenSim, dm_control, and Stable-Baselines3.
Project Overview
This project was developed for the Reinforcement Learning for Optimizations in Biomechanics (RL4OB) course at Technical University of Munich (TUM).
Our goal was to teach a humanoid robot to imitate human gait by using real motion capture data and training an RL agent in a physics simulation.
The pipeline combines OpenSim for biomechanical data processing, MuJoCo and dm_control for physics simulation, and Stable-Baselines3 for reinforcement learning.
🔹 Data Collection
- Human motion data was collected using a Vicon Motion Capture System by TUM MIRMI.
- Reflective markers were attached to the subject’s body, and their 3D positions (X, Y, Z) were recorded during walking.
- These trajectories served as target reference motions for our RL agent.
Reference Human Gait (MIRMI Vicon data simulated in OpenSim)
🔹 Data Processing
Static Trial
- Unnecessary markers were removed and coordinates rotated to align with OpenSim’s reference frame.
- Data was stored as .trc files.
- The OpenSim Scale Tool was used to personalize the biomechanical model to the subject’s anatomy (a preprocessing sketch follows this list).
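A minimal sketch of the static-trial preprocessing, assuming the Vicon export is a CSV with one X/Y/Z column triple per marker, in millimetres, in a Z-up lab frame. The marker names, column naming, and exact axis mapping below are assumptions for illustration, not the project's actual conventions.

```python
import numpy as np
import pandas as pd

# Hypothetical subset of markers kept for the scaled OpenSim model
KEEP_MARKERS = ["RASI", "LASI", "RKNE", "LKNE", "RANK", "LANK"]

def vicon_to_opensim(xyz_mm: np.ndarray) -> np.ndarray:
    """Rotate Z-up Vicon coordinates into OpenSim's Y-up frame and convert mm -> m."""
    x, y, z = xyz_mm[..., 0], xyz_mm[..., 1], xyz_mm[..., 2]
    # Y_osim = Z_vicon (vertical); Z_osim = -Y_vicon keeps the frame right-handed
    return np.stack([x, z, -y], axis=-1) / 1000.0

def preprocess_static_trial(csv_path: str) -> dict:
    """Drop unused markers and express the remaining ones in OpenSim's frame."""
    df = pd.read_csv(csv_path)
    markers = {}
    for m in KEEP_MARKERS:
        xyz = df[[f"{m}_X", f"{m}_Y", f"{m}_Z"]].to_numpy()
        markers[m] = vicon_to_opensim(xyz)
    return markers  # ready to be written out as a .trc file for the OpenSim Scale Tool
```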
Dynamic Trial
- Marker trajectories during gait were cleaned using moving-mean filters (see the smoothing sketch after this list).
- Non-relevant joints were removed to match the humanoid configuration.
- Extracted gait cycles were used for Inverse Kinematics (IK) computations in OpenSim.
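A minimal sketch of the moving-mean smoothing step, applied per marker before running the OpenSim IK tool; the window length here is an assumed value, not the one used in the project.

```python
import numpy as np
import pandas as pd

def smooth_marker(xyz: np.ndarray, window: int = 11) -> np.ndarray:
    """Centred moving mean over a (n_frames, 3) marker trajectory."""
    return (
        pd.DataFrame(xyz)
        .rolling(window, center=True, min_periods=1)  # min_periods keeps the first/last frames
        .mean()
        .to_numpy()
    )
```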
🔹 Simulation and Reinforcement Learning
We used DeepMind’s dm_control Humanoid-v5 environment integrated with the MuJoCo physics engine.
Each simulation step followed the standard RL interaction loop (a training sketch follows the list below):
State → Action → Reward → Next State
- Agent: Humanoid robot
- State: Joint positions and velocities
- Action: Torque commands to joints
- Environment: MuJoCo physics simulation
- Goal: Maximize total reward over training episodes
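A minimal sketch of that loop with Stable-Baselines3 PPO, assuming the Gymnasium-registered Humanoid-v5 MuJoCo environment; the hyperparameters and timestep budget are placeholders, not the values used in the experiments.

```python
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("Humanoid-v5")              # MuJoCo physics, torque actions at each joint

model = PPO("MlpPolicy", env, verbose=1)   # policy maps state -> action
model.learn(total_timesteps=100_000)       # state -> action -> reward -> next state, repeated

# Roll out the trained policy for one episode
obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
```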
🔹 Reward Function
To teach realistic human-like gait, we designed a multi-component reward:
- R_track: Encourages humanoid joints to follow real human joint angles
- R_upright: Rewards keeping torso upright and balanced
- R_control: Penalizes excessive control effort for smoother, energy-efficient motion
$$ R_{\text{total}} = w_1 R_{\text{track}} + w_2 R_{\text{upright}} + w_3 R_{\text{control}} $$
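A hedged sketch of how this weighted reward can be computed per step. The weights, the exponential tracking kernel, and the variable names are illustrative assumptions, not the exact formulation used in training.

```python
import numpy as np

W_TRACK, W_UPRIGHT, W_CONTROL = 1.0, 0.5, 0.1   # assumed values for w1, w2, w3

def total_reward(qpos, qpos_ref, torso_up_z, action):
    # R_track: stay close to the reference human joint angles for this gait frame
    r_track = np.exp(-np.sum((qpos - qpos_ref) ** 2))
    # R_upright: reward keeping the torso axis aligned with the world vertical
    r_upright = torso_up_z                       # ~1.0 when perfectly upright
    # R_control: penalise large torques for smoother, energy-efficient motion
    r_control = -np.sum(np.square(action))
    return W_TRACK * r_track + W_UPRIGHT * r_upright + W_CONTROL * r_control
```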
Each episode was initialized from a real gait-cycle frame extracted from .sto data to ensure realistic starting poses.
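A sketch of that reference-state initialization, assuming the OpenSim IK results were exported as a whitespace-delimited .sto file whose coordinate columns can be mapped onto the humanoid's joints (that mapping is model-specific and omitted here).

```python
import numpy as np

def sample_initial_frame(sto_path: str) -> np.ndarray:
    """Pick a random gait-cycle frame from an OpenSim .sto results file."""
    with open(sto_path) as f:
        lines = f.readlines()
    # .sto files have a text header ending with "endheader", then a column-name row
    first_data_row = next(i for i, line in enumerate(lines) if "endheader" in line) + 2
    data = np.loadtxt(lines[first_data_row:])
    return data[np.random.randint(len(data))]   # column 0 is time; the rest are coordinates
```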
🔹 Experiments & Results
We trained multiple policies using Proximal Policy Optimization (PPO) with varying training budgets and gravity settings (a setup sketch follows the table):
| Experiment | Timesteps | Gravity | Focus | Demo Link |
|---|---|---|---|---|
| Reference Human Gait | — | — | MIRMI Vicon Data | View |
| PPO_100000 | 100k | -9.8 m/s² | Standing stability | View |
| PPO_1500000 (Upward) | 1.5M | -9.8 m/s² | Standing & partial motion | View |
| PPO_4000000 | 4M | -9.8 m/s² | Gait learning | View |
| PPO_8000000 | 8M | -9.8 m/s² | Extended training | View |
| PPO_500000 (1.0 gravity) | 500k | -1.0 m/s² | Reduced-gravity learning | View |
| PPO_1000000 (1.0 gravity) | 1M | -1.0 m/s² | Extended reduced-gravity training | View |
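A sketch of how a reduced-gravity run can be configured, assuming the Gymnasium MuJoCo humanoid; gravity is overwritten directly on the underlying MuJoCo model, and the timestep count mirrors the PPO_500000 row above.

```python
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("Humanoid-v5")
env.unwrapped.model.opt.gravity[:] = [0.0, 0.0, -1.0]   # reduced gravity, m/s^2

model = PPO("MlpPolicy", env)
model.learn(total_timesteps=500_000)                    # e.g. the PPO_500000 run
```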
The humanoid successfully learned partial gait-like movements. However, full stable walking was not achieved, highlighting the challenge of combining stability and motion imitation in high-dimensional control.
- Team: Oğuzhan Eşen and Arif Güvenkaya
- Course: Reinforcement Learning for Optimization in Biomechanics
- Supervisor: Gheorghe Lisca
- Tools: Python, MuJoCo, OpenSim, dm_control, Stable-Baselines3