# Video Analyzer – Unlock Insights from Your Videos

> **Status:** Draft – initial contribution by @ochoaughini
> This page introduces a reference application that demonstrates how to use the Gemini API to transcribe, summarise and search video content.

## What the app does

* Drag-and-drop a local video (mp4 / mov / webm).
* Automatically extracts the audio track → **speech-to-text** using Gemini audio transcription.
* Captures still frames every *N* seconds and sends them to the Gemini **multimodal** endpoint for scene description.
* Generates:
  * An SRT subtitle file.
  * A bullet-point **summary** (topics, key moments).
  * An embeddings index allowing **semantic search** over the transcript.

## Quick start

```bash
pip install google-generativeai moviepy ffmpeg-python
python video_analyzer.py --input my_video.mp4 --model gemini-pro-vision
```

The script will write:

* `my_video.srt` – subtitles
* `my_video.summary.txt` – text summary
* `my_video.index.json` – embedding index for search

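Once the transcript has been split into timed segments, producing the `.srt` file is plain string formatting. A minimal sketch – the `write_srt` helper and the `(start_sec, end_sec, text)` segment tuples are illustrative assumptions, not part of the script above:

```python
from pathlib import Path

def to_srt_time(seconds):
    # SRT timestamps use the form HH:MM:SS,mmm
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def write_srt(segments, path):
    # segments: iterable of (start_sec, end_sec, text) tuples (assumed shape).
    blocks = [
        f"{i}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{text}\n"
        for i, (start, end, text) in enumerate(segments, 1)
    ]
    Path(path).write_text("\n".join(blocks), encoding="utf-8")
```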
## Core code snippets

### 1 · Extract audio and transcribe
```python
from pathlib import Path

import google.generativeai as genai
from moviepy.editor import VideoFileClip

video_path = "my_video.mp4"
audio_path = "tmp_audio.mp3"

# Pull the audio track out of the video.
VideoFileClip(video_path).audio.write_audiofile(audio_path, logger=None)

# Audio input needs a multimodal model; text-only gemini-pro cannot transcribe.
# Inline media is passed as a dict part alongside a text prompt.
model = genai.GenerativeModel("gemini-1.5-flash")
transcription = model.generate_content([
    "Transcribe this audio.",
    {"mime_type": "audio/mpeg", "data": Path(audio_path).read_bytes()},
])
```

### 2 · Describe video frames
```python
from pathlib import Path

from PIL import Image

def sample_frames(video_path, every_sec=5):
    clip = VideoFileClip(video_path)
    for t in range(0, int(clip.duration), every_sec):
        frame = clip.get_frame(t)   # numpy array of shape (H, W, 3)
        img = Image.fromarray(frame)
        fname = f"frame_{t:04}.png"
        img.save(fname)
        yield fname

vision_model = genai.GenerativeModel("gemini-pro-vision")

scene_descriptions = []
for frame_file in sample_frames(video_path):
    desc = vision_model.generate_content([
        "Describe this scene.",
        {"mime_type": "image/png", "data": Path(frame_file).read_bytes()},
    ])
    scene_descriptions.append(desc.text)
```

### 3 · Summarise and index
```python
summary = model.generate_content(
    "Summarise this transcript:\n" + transcription.text
).text

# Embeddings come from genai.embed_content, not from a GenerativeModel method.
embeddings = genai.embed_content(
    model="models/embedding-001",
    content=transcription.text.split("\n"),
)
```
| 73 | + |
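With the index on disk, semantic search reduces to cosine similarity between a query embedding and the stored vectors. A minimal sketch, assuming `my_video.index.json` holds a list of `{"text": …, "embedding": […]}` entries (that layout is an assumption, not fixed by the script):

```python
import json
import math
from pathlib import Path

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search(index_path, query_vec, top_k=3):
    # Rank stored transcript lines by similarity to the query embedding.
    entries = json.loads(Path(index_path).read_text())
    ranked = sorted(entries, key=lambda e: cosine(e["embedding"], query_vec), reverse=True)
    return [e["text"] for e in ranked[:top_k]]
```

The query vector would come from the same `embed_content` model used to build the index, so both live in the same embedding space.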
## Folder layout
```
video-analyzer/
├── video_analyzer.py   # main script
├── templates/          # optional web UI
└── README.md           # setup & usage docs
```

## Next steps
* Add a Streamlit front-end.
* Integrate **Gemini function-calling** for automatic action extraction.
* Accept YouTube URLs (download + analyse).

---
**Contributing** – please feel free to open issues or PRs to improve this example.