This project explores video segmentation and reassembly using machine learning and generative techniques.
Starting with a source video, I use a speech-to-text model to generate a word-level transcript. Each word is then isolated into a separate video clip using precise timecodes.
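The slicing step can be sketched as follows. The transcript structure (word plus start/end times, as emitted by speech-to-text models that support word timestamps), the file names, and the ffmpeg flags are all illustrative assumptions, not the project's exact pipeline:

```python
# Build one ffmpeg invocation per word, cutting the span [start, end).
# Transcript format and output naming are assumptions for this sketch.

def slice_commands(source, transcript):
    """Return one ffmpeg argument list per word in the transcript."""
    cmds = []
    for i, w in enumerate(transcript):
        duration = w["end"] - w["start"]
        cmds.append([
            "ffmpeg", "-y",
            "-ss", f"{w['start']:.3f}",   # seek to the word's start time
            "-i", source,
            "-t", f"{duration:.3f}",      # keep only the word's duration
            # re-encoding (no "-c copy") keeps the cut frame-accurate
            f"clips/{i:04d}_{w['word']}.mp4",
        ])
    return cmds

transcript = [
    {"word": "hello", "start": 0.32, "end": 0.71},
    {"word": "world", "start": 0.74, "end": 1.10},
]
commands = slice_commands("source.mp4", transcript)
```

Re-encoding each clip is slower than stream copy but avoids snapping cuts to keyframes, which matters at word-level granularity.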
From there, I’ve experimented with two recomposition strategies.
1. Text-Based Reconstruction. A new sequence of words is generated, and the edit is assembled by concatenating the clip that matches each word, in order.
2. Intensity-Based Sorting. Using the audio's RMS amplitude, each clip is assigned a rough loudness value; a separate cut is generated by sorting the clips along an intensity curve.
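The text-based strategy reduces to a lookup from words to clips. A minimal sketch, assuming the transcript structure from above; where the new word sequence comes from (a language model, a hand-written script) is left open, as in the original description:

```python
from collections import defaultdict

def build_index(transcript):
    """Map each spoken word to every clip index where it occurs."""
    index = defaultdict(list)
    for i, w in enumerate(transcript):
        index[w["word"].lower()].append(i)
    return index

def assemble(sequence, index):
    """Return clip indices realizing the new word sequence.

    Repeated words cycle through their available takes; words the
    source never says are skipped.
    """
    used = defaultdict(int)
    edit = []
    for word in sequence:
        takes = index.get(word.lower(), [])
        if not takes:
            continue  # word never spoken in the source video
        edit.append(takes[used[word.lower()] % len(takes)])
        used[word.lower()] += 1
    return edit

transcript = [{"word": w} for w in ["the", "cat", "sat", "the", "mat"]]
index = build_index(transcript)
edit = assemble(["the", "mat", "the", "cat"], index)  # → [0, 4, 3, 1]
```

Cycling through repeated takes (rather than always reusing the first) keeps the cut from feeling like a stutter when a common word recurs.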
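The intensity-based strategy comes down to two steps: compute RMS per clip, then place clips along the target curve by rank, so the quietest clip lands on the curve's lowest point and the loudest on its peak. The toy loudness values and the rise-then-fall curve below are assumptions; real clips would have their audio decoded to samples first:

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a list of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def sort_by_curve(loudness, curve):
    """Assign one clip to each curve position, matched by rank."""
    by_loudness = sorted(range(len(loudness)), key=lambda i: loudness[i])
    by_target = sorted(range(len(curve)), key=lambda t: curve[t])
    order = [0] * len(curve)
    for rank, t in enumerate(by_target):
        order[t] = by_loudness[rank]  # rank-th quietest clip at rank-th lowest slot
    return order

# toy per-clip loudness values standing in for computed RMS
loudness = [0.2, 0.9, 0.5, 0.1]
# rise-then-fall intensity curve over four slots
curve = [0.0, 0.6, 1.0, 0.3]
order = sort_by_curve(loudness, curve)  # → [3, 2, 1, 0]
```

Rank-matching rather than fitting exact values means any monotonic stretch of the curve just produces a crescendo or decrescendo, regardless of how the loudness values are distributed.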