Multi-modal Sports Video Summarization
Built a multi-modal pipeline that uses BERT and CNNs to detect important football moments in match commentary and scoreboard video and align them in time. The final output is a stitched highlight video.
- Fine-tuned BERT to classify the transcribed audio commentary.
- Applied image processing to the scoreboard to extract goals and other match events.
- Combined the timestamps from both modalities with weighted fusion to stitch the final highlights.
- Built the interactive demo UI with Streamlit.
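The scoreboard, commentary and fusion steps above could fit together roughly as in this minimal sketch. It assumes that each commentary segment already has a timestamp and a classifier confidence, and that the scoreboard step yields `(timestamp, home, away)` readings. The names `Event`, `scoreboard_goal_events`, `fuse` and `to_clips`, the 0.6/0.4 weights, the 10-second tolerance, the 0.5 threshold, the clip padding and the sample data are all hypothetical. They illustrate one way to do weighted timestamp fusion and are not the published system's implementation or values.

```python
from dataclasses import dataclass


@dataclass
class Event:
    t: float      # timestamp in seconds
    score: float  # confidence in [0, 1]


def scoreboard_goal_events(readings):
    """Flag a goal whenever the total score read off the scoreboard increases.

    readings: list of (t, home, away) tuples (hypothetical output of the
    scoreboard image-processing step). No smoothing of misreads is done here.
    """
    events, prev_total = [], None
    for t, home, away in readings:
        total = home + away
        if prev_total is not None and total > prev_total:
            events.append(Event(t, 1.0))
        prev_total = total
    return events


def fuse(commentary, scoreboard, w_text=0.6, w_board=0.4, tol=10.0, threshold=0.5):
    """Weighted fusion: score each candidate timestamp by the best confidence
    from each modality within +/- tol seconds and keep those above threshold."""
    candidates = sorted({e.t for e in commentary} | {e.t for e in scoreboard})

    def best_near(events, t):
        return max((e.score for e in events if abs(e.t - t) <= tol), default=0.0)

    return [
        t for t in candidates
        if w_text * best_near(commentary, t) + w_board * best_near(scoreboard, t) >= threshold
    ]


def to_clips(times, pre=15.0, post=10.0):
    """Turn selected timestamps into (start, end) windows, merging overlaps
    so the stitched highlight video does not repeat footage."""
    clips = []
    for t in sorted(times):
        start, end = max(0.0, t - pre), t + post
        if clips and start <= clips[-1][1]:
            clips[-1] = (clips[-1][0], max(clips[-1][1], end))
        else:
            clips.append((start, end))
    return clips


# Hypothetical sample data: commentary classifier confidences and scoreboard readings.
commentary = [Event(120.0, 0.9), Event(300.0, 0.4), Event(900.0, 0.7)]
readings = [(100.0, 0, 0), (130.0, 1, 0), (600.0, 1, 0), (905.0, 1, 1)]

board_events = scoreboard_goal_events(readings)
clips = to_clips(fuse(commentary, board_events))
print(clips)
```

In this example, the commentary peak at 300 s has no matching scoreboard change, so its fused score (0.24) falls below the threshold. The two goals, each confirmed by both modalities, survive as merged clips. Weighting lets a strong signal from one modality partially compensate for a weaker one from the other. A real system would tune the weights and tolerance on annotated matches.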
[Figure: System architecture]

This research was published in the 2025 12th International Conference on Emerging Trends in Engineering & Technology - Signal and Information Processing.