This notebook demonstrates how to use GPT's visual capabilities with a video. GPT-4 doesn't take videos as input directly, but we can use vision and the new 128K context widnow to describe the static frames of a whole video at once. We'll walk through two examples:
- Using GPT-4 to get a description of a video
- Generating a voiceover for a video with GPT-4 and the TTS API