This project is an automated web scraping solution that extracts FIFA World Rankings and team statistics from the official FIFA website.
The goal is to collect, clean, and store football data for analysis, visualization, or research purposes.
Key Highlights:
- Automated extraction of FIFA rankings and team information
- Data cleaning and structured storage using Pandas
- Integration of Selenium and BeautifulSoup for dynamic content
- Exporting data to CSV for easy analysis
Technology | Purpose |
---|---|
Python | Scripting and data handling |
Selenium | Browser automation for dynamic content |
BeautifulSoup | HTML parsing and data extraction |
Pandas | Data structuring and CSV export |
ChromeDriver | Browser control for Selenium |
Jupyter Notebook | Development and testing environment |
Manual extraction of FIFA rankings is time-consuming and prone to errors.
This project automates the process to:
- Collect FIFA World Rankings
- Capture team names, ranks, points, and country codes
- Export the data in a structured format for analysis
Approach:
- Use Selenium to open and interact with FIFA's dynamic pages
- Wait for tables to fully load before parsing
- Use BeautifulSoup to extract:
  - Team names
  - Ranking positions
  - Points and statistics
import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Point Selenium at the local ChromeDriver binary
service = Service("path_to_chromedriver.exe")
options = webdriver.ChromeOptions()
options.add_argument("--start-maximized")
driver = webdriver.Chrome(service=service, options=options)

url = "https://www.fifa.com/fifa-world-ranking/"
try:
    driver.get(url)
    time.sleep(5)  # Wait for dynamic content to load
    soup = BeautifulSoup(driver.page_source, "html.parser")
finally:
    driver.quit()  # Always close the browser, even if loading fails

teams = []
for row in soup.find_all("tr", class_="ranking-row"):
    rank = row.find("td", class_="rank")
    team = row.find("td", class_="team-name")
    points = row.find("td", class_="points")
    if rank is None or team is None or points is None:
        continue  # Skip rows with missing cells
    teams.append([rank.text.strip(), team.text.strip(), points.text.strip()])

# Store the scraped rows in a DataFrame and export them to CSV
df = pd.DataFrame(teams, columns=["Rank", "Team", "Points"])
df.to_csv("FIFA_Rankings.csv", index=False)
Challenge | Solution |
---|---|
Dynamic content loading | Added time.sleep() and Selenium waits |
Complex HTML structure | Used browser inspect tools to locate elements |
Missing data | Added checks to skip empty rows |
Large dataset | Stored results in CSV for structured analysis |
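The "checks to skip empty rows" mentioned in the table could look roughly like the sketch below. It runs on a small hand-written HTML snippet rather than the live page. The function name `parse_ranking_rows` and the sample markup are assumptions for illustration, and the class names are the ones used in the script above.

```python
from bs4 import BeautifulSoup


def parse_ranking_rows(html):
    """Return [rank, team, points] lists, skipping incomplete rows."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr", class_="ranking-row"):
        cells = [tr.find("td", class_=name) for name in ("rank", "team-name", "points")]
        if any(cell is None for cell in cells):
            continue  # a cell is missing entirely
        values = [cell.get_text(strip=True) for cell in cells]
        if not all(values):
            continue  # a cell is present but empty
        rows.append(values)
    return rows


# Hand-written sample markup (not copied from the FIFA site)
sample_html = """
<table>
  <tr class="ranking-row">
    <td class="rank">1</td><td class="team-name">Argentina</td><td class="points">1841</td>
  </tr>
  <tr class="ranking-row">
    <td class="rank">2</td><td class="team-name"></td><td class="points">1827</td>
  </tr>
  <tr class="ranking-row"><td class="rank">3</td></tr>
</table>
"""

print(parse_ranking_rows(sample_html))  # [['1', 'Argentina', '1841']]
```

Only the first row survives: the second has an empty team cell and the third is missing two cells.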
Results:
- Successfully scraped all FIFA-ranked teams
- Data exported to FIFA_Rankings.csv
Rank | Team | Points |
---|---|---|
1 | Argentina | 1841 |
2 | France | 1827 |
3 | Brazil | 1818 |
4 | Belgium | 1778 |
5 | England | 1769 |
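Scraped values arrive as strings, so a cleaning pass with Pandas is usually needed before any numeric analysis. The sketch below shows one way to do that, using the five sample rows from the table above. The helper `clean_rankings` is hypothetical and not part of the original script.

```python
import pandas as pd

# Rows as scraped: every value is still a string
raw = pd.DataFrame(
    [
        ["1", "Argentina", "1841"],
        ["2", "France", "1827"],
        ["3", "Brazil", "1818"],
        ["4", "Belgium", "1778"],
        ["5", "England", "1769"],
    ],
    columns=["Rank", "Team", "Points"],
)


def clean_rankings(df):
    """Convert Rank and Points to numbers and drop rows that fail to parse."""
    out = df.copy()
    out["Team"] = out["Team"].str.strip()
    out["Rank"] = pd.to_numeric(out["Rank"], errors="coerce")
    out["Points"] = pd.to_numeric(out["Points"], errors="coerce")
    # Unparseable values became NaN above; remove them and any repeated teams
    out = out.dropna(subset=["Rank", "Points"]).drop_duplicates(subset="Team")
    out["Rank"] = out["Rank"].astype(int)
    return out.sort_values("Rank").reset_index(drop=True)


cleaned = clean_rankings(raw)
# Points gap between the first and fifth team in the sample
print(cleaned["Points"].max() - cleaned["Points"].min())  # 72
```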
Future Enhancements:
- Schedule automatic scraping for real-time updates
- Visualize rankings using matplotlib or seaborn
- Store historical data in a database for trend analysis
- Extend scraping to include team stats, goals, and player rankings
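As an illustration of the historical-storage idea, the sketch below appends each scrape to a SQLite table keyed by date and team. The table layout, the column names, and the helper `save_snapshot` are assumptions rather than part of the project, and the two rows come from the sample output.

```python
import sqlite3
from datetime import date


def save_snapshot(conn, rows, snapshot_date):
    """Insert one day's rankings; re-running for the same date replaces old rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS rankings (
               snapshot_date TEXT NOT NULL,
               team TEXT NOT NULL,
               team_rank INTEGER NOT NULL,
               points REAL NOT NULL,
               PRIMARY KEY (snapshot_date, team)
           )"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO rankings VALUES (?, ?, ?, ?)",
        [(snapshot_date, team, int(rank), float(points)) for rank, team, points in rows],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # use a file path to persist between runs
today = date.today().isoformat()
save_snapshot(conn, [["1", "Argentina", "1841"], ["2", "France", "1827"]], today)

# Trend query: one team's rank and points over time
history = conn.execute(
    "SELECT snapshot_date, team_rank, points FROM rankings "
    "WHERE team = ? ORDER BY snapshot_date",
    ("Argentina",),
).fetchall()
print(history)
```

The composite primary key means a scheduled job can run more than once per day without creating duplicate rows.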
Key Learnings:
- Hands-on experience with Selenium and BeautifulSoup integration
- Understanding dynamic web content and HTML parsing
- Improved Python, automation, and data handling skills
- Learned to handle real-world web scraping challenges
This project demonstrates the ability to automate FIFA ranking extraction, producing structured datasets for analysis or reporting. It showcases skills in Python programming, web scraping, and data management, useful for sports analytics, data science, and research projects.