Youtube video finder using geminillm #32
Merged
iamwatchdogs
merged 8 commits into
Grow-with-Open-Source:main
from
veerababu1729:Youtube_video_finder_using_geminillm
Aug 7, 2025
Changes from 5 commits
Commits
36f5f0e Create app.py (veerababu1729)
a6901bf Delete Youtube_video_frinder_using_GeminiLLM directory (veerababu1729)
afb481b Create app.py (veerababu1729)
658ae5b Add files via upload (veerababu1729)
5a46925 updated readme with proper, correct explanation and workflow (veerababu1729)
f991a1f Update Youtube_video_finder_using_geminillm/README.md (veerababu1729)
8aa32e4 Update Youtube_video_finder_using_geminillm/README.md (veerababu1729)
6bb5880 Update Youtube_video_finder_using_geminillm/app.py (veerababu1729)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
"This is an open-source project initiated by me. You can fork it, edit it, and open a pull request." | ||
|
||
# YouTube Relevance Finder with Gemini AI | ||
|
||
This Python script searches YouTube for recent videos matching a user query and ranks them by relevance using the YouTube Data API and Google's Gemini model. It filters results by recency and duration, asks Gemini to score each video title against the query, and prints the top-ranked videos. | ||
|
||
## 🔍 Features | ||
|
||
- Searches YouTube for videos published in the past 14 days using the publicly available YouTube Data API v3 | ||
- Filters videos by duration (4–20 minutes) | ||
- Uses Gemini AI to score title relevance to a query | ||
- Prints the top relevant video links with scores and metadata | ||
|
||
## 🛠️ Setup | ||
|
||
1. **Clone the repository**: | ||
```bash | ||
git clone https://github.com/yourusername/your-repo-name.git | ||
cd your-repo-name | ||
``` | ||
|
||
2. **Install dependencies**: | ||
|
||
```bash | ||
pip install google-api-python-client google-generativeai | ||
``` | ||
|
||
3. **Set up environment variables**: | ||
Export the keys in your terminal (the script reads them from the environment and does not load a `.env` file by itself): | ||
|
||
```bash | ||
export YT_API_KEY=your_youtube_api_key | ||
export GEMINI_API_KEY=your_gemini_api_key | ||
``` | ||
|
||
## 🚀 Usage | ||
|
||
Run the script: | ||
|
||
```bash | ||
python app.py | ||
``` | ||
|
||
You'll be prompted to enter a search query. The script will then display a list of the top relevant YouTube videos based on that query. | ||
|
||
## 📄 Example Output | ||
|
||
``` | ||
1. | ||
• Title: Learn Python in 10 Minutes | ||
• URL: https://youtu.be/xyz123 | ||
• Score: 9.20 | ||
• Duration: 10m30s | ||
• Published: 2025-05-01T12:34:56Z | ||
``` | ||
|
||
## 📌 Notes | ||
|
||
* Make sure you have valid API keys for both YouTube Data API v3 and Google Gemini. | ||
* The script currently uses the `gemini-1.5-flash-latest` model. | ||
|
||
## 📃 License | ||
|
||
Open source – feel free to use and modify | ||
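The README's 4–20 minute duration filter relies on the ISO 8601 durations that the YouTube Data API returns (for example `PT5M30S`). As a minimal sketch of that step in isolation, the helper below parses such a string with a regular expression and checks the window. The names `iso8601_duration_to_seconds` and `within_duration_window` are hypothetical and do not appear in this PR; the script's own parser uses string splitting instead.

```python
import re

# Hypothetical helpers (not part of this PR): convert a YouTube-style
# ISO 8601 duration such as "PT5M30S" or "PT1H2M" into total seconds.
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def iso8601_duration_to_seconds(duration):
    """Return the total number of seconds encoded in an ISO 8601 duration."""
    match = _DURATION_RE.match(duration)
    if match is None:
        raise ValueError(f"Unrecognized duration: {duration!r}")
    # Missing components (for example no hours) count as zero.
    parts = {name: int(value or 0) for name, value in match.groupdict().items()}
    return (parts["days"] * 86400 + parts["hours"] * 3600
            + parts["minutes"] * 60 + parts["seconds"])


def within_duration_window(duration, min_minutes=4, max_minutes=20):
    """Check whether a duration falls in the inclusive minute window."""
    total = iso8601_duration_to_seconds(duration)
    return min_minutes * 60 <= total <= max_minutes * 60


print(iso8601_duration_to_seconds("PT10M30S"))  # 630
print(within_duration_window("PT45S"))          # False
```

Unlike the string-splitting approach, this version also accepts hour and day components, so a value like `PT1H2M` is converted and then rejected by the window check rather than skipped up front.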
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,180 @@ | ||
import os | ||
import datetime | ||
from googleapiclient.discovery import build | ||
import google.generativeai as genai | ||
|
||
# ——— CONFIG ——— | ||
# Initialize clients with environment variables | ||
yt = build("youtube", "v3", developerKey=os.environ["YT_API_KEY"]) | ||
|
||
# Configure the Google Generative AI client | ||
genai.configure(api_key=os.environ["GEMINI_API_KEY"]) | ||
|
||
# Initialize the Gemini model | ||
model = genai.GenerativeModel('gemini-1.5-flash-latest') | ||
|
||
|
||
def search_videos(query, max_filtered_results=20): | ||
    """ | ||
    Search for YouTube videos matching a query, filtering by recency and duration. | ||
|
||
    This function keeps searching until it finds enough videos that meet the criteria | ||
    or exhausts the search results. | ||
    """ | ||
    # Calculate publishedAfter timestamp (14 days ago) | ||
    fourteen_days_ago = (datetime.datetime.utcnow() | ||
                         - datetime.timedelta(days=14)).isoformat("T") + "Z" | ||
|
||
    filtered_videos = [] | ||
    next_page_token = None | ||
    page_count = 0 | ||
    max_pages = 5  # Limit the number of pages to search to avoid excessive API calls | ||
|
||
    # Continue searching until we have enough filtered videos or run out of results | ||
    while len(filtered_videos) < max_filtered_results and page_count < max_pages: | ||
        # Step 1: Search for videos matching the query | ||
        search_response = yt.search().list( | ||
            q=query, | ||
            part="id,snippet", | ||
            type="video", | ||
            order="relevance", | ||
            publishedAfter=fourteen_days_ago, | ||
            maxResults=50,  # Maximum allowed by the API | ||
            pageToken=next_page_token | ||
        ).execute() | ||
|
||
        page_count += 1 | ||
|
||
        # Step 2: Collect video IDs from this page | ||
        video_ids = [item["id"]["videoId"] for item in search_response.get("items", [])] | ||
|
||
        # Break if no more videos found | ||
        if not video_ids: | ||
            break | ||
|
||
        # Step 3: Get details for the fetched videos | ||
        details = yt.videos().list( | ||
            part="contentDetails,snippet", | ||
            id=",".join(video_ids) | ||
        ).execute() | ||
|
||
        # Step 4: Filter by duration (4–20 minutes) | ||
        for item in details.get("items", []): | ||
            try: | ||
                # Parse duration (ISO 8601 format, e.g. "PT5M30S") | ||
                dur = item["contentDetails"]["duration"].replace("PT", "") | ||
|
||
                # Skip videos with hours or without minutes | ||
                if "H" in dur or "M" not in dur: | ||
                    continue | ||
|
||
                # Split minutes and seconds | ||
                parts = dur.split("M") | ||
                mins = int(parts[0]) | ||
                secs = parts[1].replace("S", "") if len(parts) > 1 else "0" | ||
                seconds = int(secs) if secs else 0 | ||
|
||
                total_seconds = mins * 60 + seconds | ||
|
||
                # Filter by duration (4 to 20 minutes inclusive) | ||
                if 4 * 60 <= total_seconds <= 20 * 60: | ||
                    filtered_videos.append({ | ||
                        "id": item["id"], | ||
                        "title": item["snippet"]["title"], | ||
                        "duration": total_seconds, | ||
                        "publishedAt": item["snippet"]["publishedAt"] | ||
                    }) | ||
|
||
                    # If we've found enough videos, we can stop | ||
                    if len(filtered_videos) >= max_filtered_results: | ||
                        break | ||
            except Exception as e: | ||
                print(f"Could not parse duration for video {item.get('id', 'N/A')}: {e}") | ||
                continue | ||
|
||
        # Check if there are more pages of results | ||
        next_page_token = search_response.get("nextPageToken") | ||
        if not next_page_token: | ||
            break | ||
|
||
        print(f"Found {len(filtered_videos)} qualifying videos so far. Searching next page...") | ||
|
||
    print(f"Search completed. Found {len(filtered_videos)} videos meeting criteria.") | ||
    return filtered_videos | ||
|
||
|
||
def score_title(title, query): | ||
    """Score a video title's relevance to the query using Gemini AI.""" | ||
    prompt = ( | ||
        f"Query: {query}\n" | ||
        f"Title: {title}\n" | ||
        "Rate relevance & quality 1–10 (just give the number)." | ||
    ) | ||
    score_text = ""  # Defined up front so the ValueError handler below can always print it | ||
    try: | ||
        response = model.generate_content(prompt) | ||
        score_text = response.text.strip() | ||
        # Try to extract just the number if there's additional text | ||
        import re | ||
        match = re.search(r'\b([0-9]|10)(\.[0-9]+)?\b', score_text) | ||
        if match: | ||
            score = float(match.group(0)) | ||
        else: | ||
            score = float(score_text) | ||
        return score | ||
    except ValueError: | ||
        print(f"Model returned non-numeric score for '{title}': '{score_text}'") | ||
        return 5.0  # Default middle score instead of 0 | ||
    except Exception as e: | ||
        print(f"Error scoring title '{title}': {e}") | ||
        if 'response' in locals() and hasattr(response, 'text'): | ||
            print(f"API response text: {response.text}") | ||
        return 5.0  # Default middle score | ||
|
||
|
||
def pick_best(query, num_results=20): | ||
    """ | ||
    Find and score the best YouTube videos for a query. | ||
|
||
    Args: | ||
        query: Search query string | ||
        num_results: Number of top videos to print | ||
    """ | ||
    # Get more videos than we need to ensure we have enough after scoring | ||
    # (int() keeps the limit a whole number instead of a float such as 30.0) | ||
    vids = search_videos(query, max_filtered_results=max(30, int(num_results * 1.5))) | ||
|
||
    if not vids: | ||
        print("No suitable videos found after applying filters.") | ||
        return | ||
|
||
    # Score each video | ||
    print(f"Scoring {len(vids)} videos...") | ||
    for i, v in enumerate(vids): | ||
        v["score"] = score_title(v["title"], query) | ||
        print(f" Scored video {i+1}/{len(vids)}: '{v['title']}' - Score: {v['score']:.2f}") | ||
|
||
    # Sort by score in descending order | ||
    vids.sort(key=lambda x: x.get("score", 0.0), reverse=True) | ||
|
||
    # Print the top num_results | ||
    result_count = min(num_results, len(vids)) | ||
    print(f"\n--- Top {result_count} Relevant Videos ---") | ||
|
||
    for i, video in enumerate(vids[:num_results]): | ||
        print(f"\n{i+1}.") | ||
        print(f" • Title: {video.get('title', 'N/A')}") | ||
        print(f" • URL: https://youtu.be/{video.get('id', 'N/A')}") | ||
        print(f" • Score: {video.get('score', 0.0):.2f}") | ||
        duration_sec = video.get('duration', 0) | ||
        print(f" • Duration: {duration_sec // 60}m{duration_sec % 60:02d}s") | ||
        print(f" • Published: {video.get('publishedAt', 'N/A')}") | ||
|
||
|
||
# --- RUN IT --- | ||
if __name__ == "__main__": | ||
    # Check if API keys are set | ||
    if "YT_API_KEY" not in os.environ or "GEMINI_API_KEY" not in os.environ: | ||
        print("Error: YouTube and/or Gemini API keys not set in environment variables.") | ||
    else: | ||
        user_query = input("Enter your search (voice-to-text or text): ") | ||
        # Call pick_best with the desired number of results | ||
        pick_best(user_query, num_results=20) | ||
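`score_title` asks the model for a single number but has to cope with replies that carry extra text. Below is a minimal sketch of that parsing step on its own, assuming the reply is already available as a string. The hypothetical helper `parse_score` (not part of this PR) takes the first number it finds, clamps it to the 1 to 10 range the prompt requests, and falls back to the same default of 5.0 that the script uses.

```python
import re

# Hypothetical helper (not part of this PR): extract the first number from a
# model reply and clamp it to the range requested in the prompt.
_NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def parse_score(reply_text, default=5.0, low=1.0, high=10.0):
    """Return a score in [low, high], or `default` if no number is present."""
    match = _NUMBER_RE.search(reply_text or "")
    if match is None:
        return default
    return min(high, max(low, float(match.group(0))))


print(parse_score("Score: 7.5/10"))  # 7.5
print(parse_score("no idea"))        # 5.0
```

Clamping is a design choice added here for illustration; the script in this PR returns whatever number it extracts.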