Comprehensive scraper and analyzer for Model Context Protocol (MCP) servers.
- Comprehensive Data Collection: Scrapes repository metadata, documentation, tools, and technical details
- LLM-Enhanced Analysis: Uses Ollama for intelligent content analysis and classification
- Multiple Output Formats: JSON, CSV, and summary statistics
- Rate Limiting: Respectful scraping with built-in delays
- GitHub API Integration: Enhanced metadata collection
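
The rate limiting mentioned above can be illustrated with a small helper that enforces a minimum gap between requests. This is only a sketch: the class name `MinIntervalLimiter` and its parameters are hypothetical and are not taken from `mcp_scraper.py`, whose actual delay logic may differ.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Hypothetical helper: enforce a minimum delay between consecutive requests."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock      # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Pause until min_interval has elapsed since the previous call.

        Returns the number of seconds actually slept (0.0 if no pause was needed).
        """
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay
```

Calling `limiter.wait()` immediately before each HTTP request keeps requests at least `min_interval` seconds apart, whatever the time spent processing each response.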

pip install -r requirements.txt
python mcp_scraper.py
python mcp_scraper.py --no-ollama
python mcp_scraper.py --output my_data/

The scraper extracts the following information for each MCP server:
- Name and repository URL
- Description and category (reference/third-party/official)
- Stars, forks, last updated
- Programming language and license
- Author/organization
- Available tools and descriptions
- Installation type (local/API/both)
- API key requirements
- Free vs paid status
- Target platform/service
- README content analysis
- Examples and documentation availability
- Configuration requirements
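
One way to picture a single scraped entry is as a record with one field per item in the list above. The dataclass below is a sketch under assumptions: the class name `MCPServerRecord` and every field name are illustrative, and the keys actually written by the scraper may be named differently.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional


@dataclass
class MCPServerRecord:
    """Hypothetical shape of one scraped MCP server entry (field names are assumed)."""

    name: str
    repo_url: str
    description: str = ""
    category: Optional[str] = None           # reference / third-party / official
    stars: int = 0
    forks: int = 0
    last_updated: Optional[str] = None
    language: Optional[str] = None
    license: Optional[str] = None
    author: Optional[str] = None
    tools: Dict[str, str] = field(default_factory=dict)  # tool name -> description
    installation_type: Optional[str] = None  # local / API / both
    requires_api_key: Optional[bool] = None
    is_free: Optional[bool] = None
    target_platform: Optional[str] = None
    has_examples: Optional[bool] = None
    has_documentation: Optional[bool] = None
    configuration: Optional[str] = None


# asdict() turns a record into a plain dict that can be serialized to JSON or CSV.
example = asdict(MCPServerRecord(name="example-server", repo_url="https://example.com/repo"))
```

Fields that the scraper cannot determine stay `None`, which keeps "unknown" distinct from "false" for values such as `requires_api_key`.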
- mcp_servers.json: Complete dataset in JSON format
- mcp_servers.csv: Tabular data for analysis
- summary_stats.json: Aggregate statistics and insights
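
As a rough illustration of how the three output files could be produced from a list of records, the sketch below uses only the standard library. The function name `write_outputs`, the `category` key, and the contents of the summary are assumptions; the statistics the scraper actually computes are not specified here.

```python
import csv
import json
from collections import Counter
from pathlib import Path
from typing import Any, Dict, List


def write_outputs(records: List[Dict[str, Any]], output_dir: str) -> Dict[str, Any]:
    """Hypothetical writer for mcp_servers.json, mcp_servers.csv and summary_stats.json."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Complete dataset, nested values preserved.
    (out / "mcp_servers.json").write_text(json.dumps(records, indent=2), encoding="utf-8")

    # Tabular view: one column per key seen in any record; missing keys become empty cells.
    fieldnames = sorted({key for rec in records for key in rec})
    with open(out / "mcp_servers.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        for rec in records:
            # Lists and dicts are JSON-encoded so that each value fits in a single cell.
            writer.writerow(
                {k: json.dumps(v) if isinstance(v, (list, dict)) else v for k, v in rec.items()}
            )

    # Aggregate statistics (assumed: total count and a per-category breakdown).
    summary = {
        "total_servers": len(records),
        "by_category": dict(Counter(rec.get("category") or "unknown" for rec in records)),
    }
    (out / "summary_stats.json").write_text(json.dumps(summary, indent=2), encoding="utf-8")
    return summary
```

Calling `write_outputs(records, "my_data/")` would mirror the `--output my_data/` invocation, creating the directory if it does not exist.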
The tool uses Ollama (llama3 model) to:
- Classify installation types
- Detect API requirements
- Identify pricing models
- Extract technical dependencies
- Analyze tool functionality
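
The classification tasks above could be driven through Ollama's local HTTP API roughly as follows. This is a sketch under assumptions, not the scraper's actual code: the prompt wording, the JSON keys, and the function names are hypothetical, and the endpoint is Ollama's default `http://localhost:11434/api/generate`.

```python
import json
import urllib.request
from typing import Any, Dict

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_prompt(readme_text: str) -> str:
    """Assumed prompt: ask the model for a small JSON classification of a README."""
    return (
        "You are analyzing the README of an MCP server.\n"
        "Reply with a JSON object with the keys installation_type (local, api, or both), "
        "requires_api_key (true or false), and pricing (free, paid, or unknown).\n\n"
        "README:\n" + readme_text[:4000]  # truncate long READMEs to keep the prompt small
    )


def parse_classification(raw: str) -> Dict[str, Any]:
    """Parse the model's reply, falling back to 'unknown' values on malformed output."""
    defaults = {"installation_type": "unknown", "requires_api_key": None, "pricing": "unknown"}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return defaults
    if not isinstance(data, dict):
        return defaults
    return {key: data.get(key, default) for key, default in defaults.items()}


def classify_readme(readme_text: str, model: str = "llama3") -> Dict[str, Any]:
    """Send one non-streaming generate request to a locally running Ollama server."""
    payload = json.dumps(
        {"model": model, "prompt": build_prompt(readme_text), "stream": False, "format": "json"}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.loads(response.read().decode("utf-8"))
    return parse_classification(body.get("response", ""))
```

Keeping the parsing separate from the network call means a malformed model reply degrades to "unknown" values instead of aborting a long scraping run.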
Ensure Ollama is installed and the llama3 model is available:
ollama pull llama3
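
Before a long run it may be useful to confirm programmatically that the model has been pulled. The sketch below queries Ollama's `/api/tags` endpoint, which lists locally available models; the function names are hypothetical, and the default port 11434 is assumed.

```python
import json
import urllib.request
from typing import Any, Dict

TAGS_URL = "http://localhost:11434/api/tags"  # lists models available to the local Ollama server


def model_available(tags: Dict[str, Any], model: str = "llama3") -> bool:
    """Return True if `model` (with any tag, e.g. llama3:latest) appears in a tags payload."""
    for entry in tags.get("models", []):
        name = entry.get("name", "")
        if name == model or name.startswith(model + ":"):
            return True
    return False


def fetch_tags(url: str = TAGS_URL, timeout: float = 10.0) -> Dict[str, Any]:
    """Fetch the tags payload; raises URLError if the Ollama server is not running."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A caller could run `model_available(fetch_tags())` at startup and fall back to the `--no-ollama` behavior when it returns `False` or the request fails.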