Supercuts [⧉✂︎|>]

Reconstructing Video Through Word-Level ML Segmentation

This project explores video segmentation and reassembly using machine learning and generative techniques. Starting with a source video, I use a speech-to-text model to generate a word-level transcript. Each word is then cut into its own video clip using that word's start and end timecodes.
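A minimal sketch of the cutting step, assuming the transcript is already available as a list of words with start and end times in seconds. The `Word` class, the `clip_commands` helper, the output naming scheme, and the choice of codecs are all illustrative assumptions, not details of the project itself. The function only builds `ffmpeg` argument lists; it does not run them.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcript entry (hypothetical shape; real model output may differ)."""
    text: str
    start: float  # seconds from the start of the source video
    end: float    # seconds from the start of the source video


def clip_commands(source, words, out_dir="clips"):
    """Build one ffmpeg command per word, skipping zero-length entries."""
    commands = []
    for i, w in enumerate(words):
        duration = w.end - w.start
        if duration <= 0:
            continue  # nothing to cut
        # Keep only letters and digits so the word is safe in a filename.
        safe = "".join(ch for ch in w.text.lower() if ch.isalnum()) or "word"
        out_path = f"{out_dir}/{i:05d}_{safe}.mp4"
        commands.append([
            "ffmpeg", "-y",
            "-ss", f"{w.start:.3f}",   # seek to the word's start
            "-i", source,
            "-t", f"{duration:.3f}",   # keep only the word's duration
            "-c:v", "libx264", "-c:a", "aac",  # re-encode so cuts need not fall on keyframes
            out_path,
        ])
    return commands
```

Each returned list could be passed to `subprocess.run(cmd, check=True)`. Re-encoding is slower than stream copying, but word boundaries rarely line up with keyframes, so it is the safer default for clips this short.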

From there, I've experimented with two recomposition strategies:

1. Text-Based Reconstruction: A new sequence of words is generated, and the matching clips are assembled in that order to form the edit.
2. Intensity-Based Sorting: Each clip is assigned a rough loudness value from the RMS amplitude of its audio, and the clips are sorted by that value and reassembled.
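Both strategies reduce to choosing an order for the clips. The sketch below shows one way that ordering could be computed, assuming each clip's audio is available as a list of float samples and that clips are indexed by their lowercase word. The function names, the data shapes, and the policy of skipping words that never occur in the source are assumptions made for illustration.

```python
import math
from collections import defaultdict


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples (0.0 if empty)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def sort_by_loudness(clips, descending=False):
    """Order clip paths by RMS amplitude.

    `clips` is a list of (path, samples) pairs.
    """
    ranked = sorted(clips, key=lambda c: rms(c[1]), reverse=descending)
    return [path for path, _ in ranked]


def sequence_from_text(target_text, clips_by_word):
    """Map a target sentence to a list of clip paths.

    `clips_by_word` maps a lowercase word to the clips in which it is spoken.
    Repeated words cycle through the available clips; words with no clip
    are skipped (an assumed policy).
    """
    used = defaultdict(int)
    sequence = []
    for token in target_text.lower().split():
        options = clips_by_word.get(token)
        if not options:
            continue
        sequence.append(options[used[token] % len(options)])
        used[token] += 1
    return sequence
```

RMS is defined as sqrt(mean(x^2)) over the clip's samples. It tracks signal energy rather than perceived loudness, which matches the "rough" loudness value described above.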