OpenArm Dataset
Overview
OpenArm Dataset is a format for storing data collected by OpenArm. The format is designed to be flexible and easy to use.
Repository: https://github.com/enactic/openarm_dataset
Dataset Structure
dataset
├── episodes
│   ├── 0
│   │   ├── action
│   │   │   ├── arms
│   │   │   │   ├── left
│   │   │   │   │   └── qpos.parquet
│   │   │   │   └── right
│   │   │   │       └── qpos.parquet
│   │   │   └── lifter
│   │   │       └── elevation.parquet
│   │   ├── cameras
│   │   │   ├── ceiling
│   │   │   │   ├── 1778841171518984960.jpeg
│   │   │   │   └── ...
│   │   │   ├── head_left
│   │   │   │   ├── 1778841171520606976.jpeg
│   │   │   │   └── ...
│   │   │   ├── head_right
│   │   │   │   ├── 1778841171520606976.jpeg
│   │   │   │   └── ...
│   │   │   ├── wrist_left
│   │   │   │   ├── 1778841171528291840.jpeg
│   │   │   │   └── ...
│   │   │   └── wrist_right
│   │   │       ├── 1778841171519556096.jpeg
│   │   │       └── ...
│   │   └── obs
│   │       ├── arms
│   │       │   ├── left
│   │       │   │   └── state.parquet
│   │       │   └── right
│   │       │       └── state.parquet
│   │       └── lifter
│   │           └── elevation.parquet
│   ├── 1
│   │   ├── action
│   │   ├── cameras
│   │   └── obs
│   └── ...
└── metadata.yaml
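Because the layout is plain directories and files, it can also be inspected without the library. The following is a minimal sketch, not part of openarm_dataset: `list_camera_frames` and `ns_to_datetime` are hypothetical helpers, and it assumes that each frame's file name is a capture time in nanoseconds since the Unix epoch, which the tree above suggests but does not state.

```python
from datetime import datetime, timezone
from pathlib import Path
import tempfile


def list_camera_frames(dataset_root, episode, camera):
    """Return (timestamp_ns, path) pairs for one camera, sorted by timestamp."""
    camera_dir = Path(dataset_root) / "episodes" / str(episode) / "cameras" / camera
    frames = [(int(p.stem), p) for p in camera_dir.glob("*.jpeg")]
    return sorted(frames, key=lambda frame: frame[0])


def ns_to_datetime(timestamp_ns):
    # Assumption: the file stem is nanoseconds since the Unix epoch.
    seconds, nanos = divmod(timestamp_ns, 1_000_000_000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.replace(microsecond=nanos // 1000)


# Build a tiny fake layout so the sketch runs without a real recording.
with tempfile.TemporaryDirectory() as root:
    ceiling = Path(root) / "episodes" / "0" / "cameras" / "ceiling"
    ceiling.mkdir(parents=True)
    for stem in ("1778841171552318208", "1778841171518984960"):
        (ceiling / f"{stem}.jpeg").touch()

    frames = list_camera_frames(root, 0, "ceiling")
    stamps = [timestamp for timestamp, _ in frames]
    first = ns_to_datetime(stamps[0])
    print(stamps, first.isoformat())
```

For real work, prefer the Dataset API described below; it also reads metadata.yaml, which this sketch ignores.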
Install
pip install openarm_dataset
Or install from source:
git clone https://github.com/enactic/openarm_dataset
cd openarm_dataset
uv sync
Requires Python 3.10+.
Open a dataset
Point Dataset at the root directory of a recording:
>>> import openarm_dataset
>>> dataset = openarm_dataset.Dataset("tests/fixture/dataset_0.3.0")
>>> dataset.num_episodes
2
>>> dataset.meta.episodes
[{'id': '0', 'success': False, 'task_index': 0}, {'id': '3', 'success': True, 'task_index': 0}]
>>> dataset.meta.tasks
[{'prompt': 'Run test.', 'description': 'Longer task description if need.'}]
Load observations and actions
Observations and actions are each returned as a dict of pandas DataFrames keyed by
signal name. Every DataFrame is indexed by timestamp; pass use_unixtime=True to get
a float (Unix time) index instead of a datetime index.
>>> obs = dataset.load_obs(0)
>>> obs.keys()
dict_keys(['arms/right/qpos', 'arms/right/qvel', 'arms/right/qtorque', 'arms/left/qpos', 'arms/left/qvel', 'arms/left/qtorque', 'lifter/elevation'])
>>> obs["arms/right/qpos"].shape
(746, 8)
>>> action = dataset.load_action(0)
>>> action["arms/right/qpos"].columns.tolist()
['joint1', 'joint2', 'joint3', 'joint4', 'joint5', 'joint6', 'joint7', 'gripper']
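Since each signal is an ordinary DataFrame, standard pandas operations apply. As a hedged illustration, the sketch below merges a dict shaped like the one above into a single wide table. The `obs` dict here is synthetic stand-in data with made-up values, not output from `load_obs`, and it assumes all signals share the same timestamps.

```python
import pandas as pd

# Synthetic stand-in for the dict returned by load_obs(); values are made up.
index = pd.to_datetime([0, 100, 200], unit="ms")
obs = {
    "arms/right/qpos": pd.DataFrame(
        {"joint1": [0.0, 0.1, 0.2], "gripper": [1.0, 1.0, 0.5]}, index=index
    ),
    "lifter/elevation": pd.DataFrame(
        {"elevation": [0.30, 0.31, 0.32]}, index=index
    ),
}

# Concatenate column-wise; the signal name becomes the outer column level.
wide = pd.concat(obs, axis=1)

# Optionally flatten the two-level columns into "signal/column" strings.
flat = wide.copy()
flat.columns = ["/".join(col) for col in wide.columns]
print(flat.columns.tolist())
```

If the signals were recorded at different timestamps, `pd.concat` aligns on the union of the indexes and fills the gaps with NaN, so some resampling step is usually needed first.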
Load camera frames
>>> cameras = dataset.load_cameras(0)
>>> cameras.keys()
dict_keys(['left_wrist', 'right_wrist', 'ceiling', 'head'])
>>> ceiling = cameras["ceiling"]
>>> ceiling.num_frames
3
>>> frame = ceiling.get_frame(0).load() # returns a numpy array
>>> frame.shape # (H, W, 3) uint8
(600, 960, 3)
>>> for f in ceiling.frames():  # iterate frames
...     frame = f.load()
>>> for path in ceiling.all_files:  # iterate frame file paths
...     print(path)
Sample synchronized timesteps
dataset.sample aligns observations, actions, and camera frames onto a fixed
rate so a policy can consume them as one timestep:
>>> samples = dataset.sample(hz=30, episode_index=0)
>>> samples[0].timestamp
np.float64(1772010251.6202147)
>>> samples[0].obs.keys()
dict_keys(['arms/right/qpos', 'arms/right/qvel', 'arms/right/qtorque', 'arms/left/qpos', 'arms/left/qvel', 'arms/left/qtorque', 'lifter/elevation'])
>>> [(name, img.load().shape) for name, img in samples[0].cameras.items()]
[('wrist_left', (600, 960, 3)), ('wrist_right', (600, 960, 3)), ('ceiling', (600, 960, 3)), ('head', (600, 960, 3))]
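To make the idea of fixed-rate alignment concrete, here is a small standalone sketch that picks, for every tick of a regular grid, the nearest recorded value from each stream. It is only an illustration of one possible strategy (nearest neighbour); this README does not say which strategy `dataset.sample` uses internally, and `nearest_index` and `resample` are hypothetical names, not library functions.

```python
from bisect import bisect_left


def nearest_index(timestamps, t):
    """Index of the entry in sorted `timestamps` closest to `t`."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbour is closer; ties go to the earlier one.
    return i - 1 if t - timestamps[i - 1] <= timestamps[i] - t else i


def resample(streams, hz):
    """Align several (timestamps, values) streams onto a shared fixed-rate grid."""
    # Only cover the interval in which every stream has data.
    start = max(ts[0] for ts, _ in streams.values())
    end = min(ts[-1] for ts, _ in streams.values())
    steps = int((end - start) * hz) + 1
    samples = []
    for k in range(steps):
        t = start + k / hz
        sample = {"timestamp": t}
        for name, (ts, values) in streams.items():
            sample[name] = values[nearest_index(ts, t)]
        samples.append(sample)
    return samples


# Made-up streams: joint positions at 10 Hz, camera frames at 5 Hz with an offset.
streams = {
    "qpos": ([0.0, 0.1, 0.2, 0.3, 0.4], ["q0", "q1", "q2", "q3", "q4"]),
    "ceiling": ([0.02, 0.22, 0.42], ["f0", "f1", "f2"]),
}
samples = resample(streams, hz=5)
for sample in samples:
    print(sample)
```

Each resulting dict plays the role of one timestep: a timestamp plus one value per stream, which is the shape a policy typically consumes.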
Conversion to other formats
Conversion to LeRobot Dataset (v2.1)
openarm-dataset-convert /path/to/OpenArmDataset /path/to/lerobot_dataset --format lerobot_v2.1