Latent Space Roadmap

Enabling Visual Action Planning for Object Manipulation through Latent Space Roadmap

Martina Lippi*, Petra Poklukar*, Michael C. Welle*, Anastasiia Varava, Hang Yin, Alessandro Marino, and Danica Kragic

We present a framework for visual action planning of complex manipulation tasks with high-dimensional state spaces, focusing on manipulation of deformable objects. We propose a Latent Space Roadmap (LSR) for task planning, a graph-based structure that globally captures the system dynamics in a low-dimensional latent space. Our framework consists of three parts: (1) a Mapping Module (MM) that maps observations, given as images, into a structured latent space to extract the respective states, and that also generates observations from latent states; (2) the LSR, which builds and connects clusters of similar states in order to find latent plans between the start and goal states extracted by the MM; and (3) the Action Proposal Module, which complements the latent plan found by the LSR with the corresponding actions. We present a thorough investigation of our framework on simulated box stacking and rope/box manipulation tasks, and on a folding task executed on a real robot.

*Contributed equally and listed in alphabetical order
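To make the role of the LSR more concrete, the following is a minimal sketch of the idea described in the abstract: cluster latent states, connect clusters that are linked by an observed action, and search the resulting graph for a latent plan. It is not the authors' implementation. The function names, the distance-threshold clustering, the directed edges, and the toy 2-D latent codes are all assumptions made for illustration; the actual procedure and hyperparameters are in the code repository.

```python
from collections import deque
from math import dist


def cluster_states(latents, eps):
    """Group latent points: points closer than eps end up in the same cluster
    (connected components of the eps-neighbourhood graph; an assumed stand-in
    for whatever clustering the real LSR uses)."""
    labels = [-1] * len(latents)
    n_clusters = 0
    for seed in range(len(latents)):
        if labels[seed] != -1:
            continue
        labels[seed] = n_clusters
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for j in range(len(latents)):
                if labels[j] == -1 and dist(latents[i], latents[j]) < eps:
                    labels[j] = n_clusters
                    queue.append(j)
        n_clusters += 1
    return labels


def build_roadmap(labels, action_pairs):
    """Add a directed edge between two clusters when an observed action
    leads from a state in one to a state in the other."""
    edges = {c: set() for c in set(labels)}
    for i, j in action_pairs:
        a, b = labels[i], labels[j]
        if a != b:
            edges[a].add(b)
    return edges


def plan(edges, start, goal):
    """Breadth-first search for a shortest cluster sequence from start to goal.
    Returns None when the goal is unreachable."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(edges[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Toy 2-D "latent" codes (hypothetical): three tight groups stand in for three states.
latents = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.1), (10.0, 0.0)]
# Index pairs (before, after) for which an action was observed (hypothetical data).
action_pairs = [(0, 2), (3, 4)]

labels = cluster_states(latents, eps=1.0)
roadmap = build_roadmap(labels, action_pairs)
latent_plan = plan(roadmap, labels[0], labels[4])
print(latent_plan)
```

In the full framework, the start and goal clusters would come from encoding the start and goal images with the MM, and each step of the latent plan would be passed to the Action Proposal Module to obtain the corresponding action.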

Download preprint

Folding execution videos

Fold 1

Start state

Goal state

Execution videos fold 1

Fold 2

Start state

Goal state

Execution videos fold 2

Fold 3

Start state

Goal state

Execution videos fold 3

Fold 4

Start state

Goal state

Execution videos fold 4

Fold 5

Start state

Goal state

Execution videos fold 5

Fold layer

Start state

Goal state

Execution videos fold layer

Code Repository

The code used to train the VAE, build the graph in the latent space, and train the action proposal network, including all hyperparameters used, can be found in the Git repository:

Code Repository

Contact

Martina Lippi; lippi(at)kth.se, martina.lippi(at)uniroma3.it; KTH Royal Institute of Technology, Sweden and Roma Tre University, Italy

Petra Poklukar; poklukar(at)kth.se; KTH Royal Institute of Technology, Sweden

Michael C. Welle; mwelle(at)kth.se; KTH Royal Institute of Technology, Sweden

Anastasiia Varava; varava(at)kth.se; KTH Royal Institute of Technology, Sweden

Hang Yin; hyin(at)kth.se; KTH Royal Institute of Technology, Sweden

Alessandro Marino; al.marino(at)unicas.it; University of Cassino and Southern Lazio, Italy

Danica Kragic; dani(at)kth.se; KTH Royal Institute of Technology, Sweden
