Abstract: This work addresses action segmentation in videos under sparse timestamp supervision, where only a single frame per action segment—referred to as a timestamp—is labeled during training. We ...