Enterprise-grade web scraper with advanced anti-detection capabilities
A sophisticated web scraping tool that uses cutting-edge TLS fingerprinting, human behavior simulation, and intelligent evasion techniques to bypass modern anti-bot systems. Built with Go for high performance and reliability.
- 📋 Complete Command Reference
- 🏗️ Architecture Overview
- 🎯 Feature Implementation Status
- 📚 Enhanced Configuration Reference
- 🛡️ Advanced TLS Fingerprinting: uTLS library with Chrome, Firefox, Safari, Edge signatures
- 🌐 HTTP/2 Support: Full HTTP/2 transport with enhanced authenticity
- 🧠 JavaScript Engine: Headless browser automation with chromedp
- 🎭 Behavioral Simulation: Human-like interaction patterns with 4 behavior types
- 🔄 Intelligent Proxy Management: Health monitoring and smart rotation
- 🧩 CAPTCHA Integration: Multi-service solving with 4 major providers
- 🚄 Concurrent Processing: Worker pools with intelligent request management
- 💾 Memory Optimization: Automatic garbage collection and resource management
- 📋 Priority Queueing: Smart request prioritization and load balancing
- 🔗 Connection Pooling: HTTP connection reuse for optimal performance
- 💻 Comprehensive CLI: 45+ configuration flags for complete control
- 📊 Real-time Monitoring: Performance, memory, and queue statistics
- 🔧 Flexible Configuration: JSON headers, file-based data, proxy rotation
- 📈 Detailed Analytics: Success rates, latency tracking, error analysis
# Clone the repository
git clone https://github.com/AbdullahSaidAbdeaaziz/anti-bot-scraper.git
cd anti-bot-scraper
# Build the scraper
go build -o bin/scraper.exe ./cmd/scraper
# Test basic functionality
./bin/scraper.exe -url https://httpbin.org/headers -verbose# Simple GET request with Chrome fingerprint
./bin/scraper.exe -url https://httpbin.org/headers
# POST request with custom data
./bin/scraper.exe -url https://httpbin.org/post -method POST -data '{"test":"data"}'
# Advanced browser simulation with JavaScript
./bin/scraper.exe -url https://example.com -browser firefox -enable-js -js-mode behavior
# High-performance concurrent scraping
./bin/scraper.exe -url https://httpbin.org/headers -enable-concurrent -max-concurrent 20 -show-performance-stats# Single URL processing
./scraper -url https://httpbin.org/headers -verbose
# Multiple URLs from file
./scraper -urls-file examples/urls.txt -num-requests 3 -verbose
# Multiple requests per URL with delays
./scraper -url https://httpbin.org/ip -num-requests 5 \
-delay-min 1s -delay-max 3s -delay-randomize# Fixed TLS profile
./scraper -url https://httpbin.org/headers -tls-profile chrome -verbose
# Randomized TLS profiles across requests
./scraper -urls-file examples/urls.txt -tls-randomize -num-requests 3 -verbose
# TLS randomization with specific timing
./scraper -url https://httpbin.org/headers -tls-randomize \
-num-requests 10 -delay-min 500ms -delay-max 2s# Automatic header profile matching TLS fingerprint
./scraper -url https://httpbin.org/headers -header-mimicry \
-header-profile auto -enable-sec-headers -verbose
# Custom header configuration
./scraper -url https://httpbin.org/headers -header-mimicry \
-header-profile firefox -accept-language "en-US,en;q=0.5" \
-accept-encoding "gzip, deflate, br" -enable-sec-headers=false
# Override User-Agent with custom value
./scraper -url https://httpbin.org/user-agent \
-custom-user-agent "Mozilla/5.0 (Custom Browser)" -verbose# Session-based cookie persistence
./scraper -url https://httpbin.org/cookies/set/test/value \
-cookie-jar -cookie-persistence session -verbose
# Proxy-based cookie isolation
./scraper -url https://httpbin.org/cookies \
-cookie-jar -cookie-persistence proxy -proxy-file examples/proxies.txt
# Advanced redirect handling
./scraper -url https://httpbin.org/redirect/5 \
-follow-redirects -max-redirects 10 -redirect-timeout 30s -verbose
# Cookie file persistence
./scraper -url https://httpbin.org/cookies/set/persistent/data \
-cookie-file session.cookies -cookie-jar -verbose# Load proxies from file with round-robin rotation
./scraper -urls-file examples/urls.txt -proxy-file examples/proxies.txt \
-num-requests 5 -verbose
# Combined with enhanced evasion features
./scraper -url https://httpbin.org/ip -proxy-file examples/proxies.txt \
-tls-randomize -header-mimicry -num-requests 3 \
-delay-min 1s -delay-max 3s -verbose# All enhanced features combined
./scraper -urls-file examples/urls.txt \
-num-requests 2 \
-tls-randomize \
-header-mimicry \
-header-profile auto \
-enable-sec-headers \
-cookie-jar \
-cookie-persistence session \
-follow-redirects \
-max-redirects 5 \
-delay-min 800ms \
-delay-max 2500ms \
-delay-randomize \
-proxy-file examples/proxies.txt \
-accept-language "en-US,en;q=0.9" \
-output json \
-verbose# HTTP/2 with Chrome fingerprint
./bin/scraper.exe -url https://httpbin.org/headers -http-version 2 -browser chrome
# Automatic protocol selection
./bin/scraper.exe -url https://httpbin.org/headers -http-version auto -verbose# Standard JavaScript execution
./bin/scraper.exe -url https://example.com -enable-js -js-mode standard
# Human behavior simulation
./bin/scraper.exe -url https://example.com -enable-js -js-mode behavior -viewport 1920x1080
# Wait for specific elements
./bin/scraper.exe -url https://example.com -enable-js -js-mode wait-element -js-wait-selector ".content"# Normal human behavior
./bin/scraper.exe -url https://example.com -enable-behavior -behavior-type normal
# Cautious browsing pattern
./bin/scraper.exe -url https://example.com -enable-behavior -behavior-type cautious -show-behavior-info
# Random behavior with custom timing
./bin/scraper.exe -url https://example.com -enable-behavior -behavior-type random \
-behavior-min-delay 800ms -behavior-max-delay 3s -enable-random-activity# Health-aware proxy rotation
./bin/scraper.exe -url https://httpbin.org/ip \
-proxies "proxy1:8080,proxy2:8080,proxy3:8080" \
-proxy-rotation health-aware -enable-proxy-health -show-proxy-metrics
# Custom health monitoring
./bin/scraper.exe -url https://httpbin.org/ip -proxies "proxy1:8080,proxy2:8080" \
-enable-proxy-health -proxy-health-interval 2m -proxy-health-timeout 5s# 2captcha integration
./bin/scraper.exe -url https://example.com/captcha \
-enable-captcha -captcha-service 2captcha -captcha-api-key "YOUR_API_KEY" -show-captcha-info
# Multiple service support
./bin/scraper.exe -url https://example.com/recaptcha \
-enable-captcha -captcha-service anticaptcha -captcha-api-key "YOUR_KEY" \
-captcha-min-score 0.7 -captcha-timeout 3m# Enterprise-grade concurrent scraping
./bin/scraper.exe -url https://httpbin.org/headers \
-enable-concurrent -max-concurrent 50 -worker-pool-size 10 \
-requests-per-second 15 -show-performance-stats
# Memory-optimized processing
./bin/scraper.exe -url https://httpbin.org/json \
-enable-concurrent -enable-memory-optimization -max-memory-mb 256 \
-enable-intelligent-queue -show-memory-stats
# Full performance suite
./bin/scraper.exe -url https://httpbin.org/headers \
-enable-concurrent -enable-memory-optimization -enable-intelligent-queue \
-max-concurrent 30 -worker-pool-size 8 -max-memory-mb 512 \
-connection-pool-size 50 -show-performance-stats -show-memory-stats| Flag | Description | Default | Example |
|---|---|---|---|
-url |
Target URL (required) | - | https://example.com |
-browser |
Browser fingerprint | chrome |
chrome, firefox, safari, edge |
-method |
HTTP method | GET |
GET, POST |
-data |
POST data (JSON or @file) | - | '{"key":"value"}' or @data.json |
-headers |
Custom headers (JSON or @file) | - | '{"Auth":"Bearer token"}' |
-output |
Output format | text |
text, json |
| Flag | Description | Default | Example |
|---|---|---|---|
-http-version |
HTTP protocol version | 1.1 |
1.1, 2, auto |
-user-agent |
Custom User-Agent | - | "Custom Bot 1.0" |
-timeout |
Request timeout | 30s |
60s, 2m |
-retries |
Number of retries | 3 |
5 |
-rate-limit |
Rate limit between requests | 1s |
500ms, 2s |
-follow-redirect |
Follow redirects | true |
true, false |
| Flag | Description | Default | Example |
|---|---|---|---|
-proxy |
Single proxy URL | - | http://proxy:8080 |
-proxies |
Multiple proxies (comma-separated) | - | proxy1:8080,proxy2:8080 |
-proxy-rotation |
Rotation mode | per-request |
per-request, on-error, health-aware |
-enable-proxy-health |
Enable health monitoring | false |
true |
-proxy-health-interval |
Health check interval | 5m |
2m, 10m |
-proxy-health-timeout |
Health check timeout | 10s |
5s, 30s |
-proxy-health-test-url |
Health test URL | https://httpbin.org/ip |
Custom URL |
-proxy-max-failures |
Max failures before disable | 3 |
5 |
-show-proxy-metrics |
Show proxy statistics | false |
true |
| Flag | Description | Default | Example |
|---|---|---|---|
-enable-js |
Enable JavaScript engine | false |
true |
-js-mode |
JavaScript execution mode | standard |
standard, behavior, wait-element |
-js-code |
Custom JavaScript code | - | "console.log('test')" |
-js-timeout |
JavaScript timeout | 30s |
60s |
-js-wait-selector |
CSS selector to wait for | - | ".content", "#result" |
-headless |
Run browser headless | true |
true, false |
-no-images |
Disable image loading | false |
true |
-viewport |
Browser viewport size | 1920x1080 |
1366x768 |
| Flag | Description | Default | Example |
|---|---|---|---|
-enable-behavior |
Enable behavior simulation | false |
true |
-behavior-type |
Behavior pattern | normal |
normal, cautious, aggressive, random |
-behavior-min-delay |
Minimum action delay | 500ms |
300ms |
-behavior-max-delay |
Maximum action delay | 2s |
5s |
-enable-mouse-movement |
Enable mouse simulation | true |
true, false |
-enable-scroll-simulation |
Enable scroll simulation | true |
true, false |
-enable-typing-delay |
Enable typing delays | true |
true, false |
-enable-random-activity |
Enable random interactions | false |
true |
-show-behavior-info |
Show behavior details | false |
true |
| Flag | Description | Default | Example |
|---|---|---|---|
-enable-captcha |
Enable CAPTCHA solving | false |
true |
-captcha-service |
CAPTCHA service provider | 2captcha |
2captcha, deathbycaptcha, anticaptcha, capmonster |
-captcha-api-key |
Service API key | - | "your-api-key-here" |
-captcha-timeout |
Solving timeout | 5m |
3m, 10m |
-captcha-poll-interval |
Solution polling interval | 5s |
3s, 10s |
-captcha-max-retries |
Max solving retries | 3 |
5 |
-captcha-min-score |
Min reCAPTCHA v3 score | 0.3 |
0.7 |
-show-captcha-info |
Show CAPTCHA details | false |
true |
| Flag | Description | Default | Example |
|---|---|---|---|
-enable-concurrent |
Enable concurrent processing | false |
true |
-max-concurrent |
Maximum concurrent requests | 10 |
50 |
-worker-pool-size |
Number of worker goroutines | 5 |
20 |
-requests-per-second |
Rate limit (RPS) | 5.0 |
15.0 |
-connection-pool-size |
HTTP connection pool size | 20 |
100 |
-max-idle-conns |
Maximum idle connections | 10 |
50 |
-idle-conn-timeout |
Idle connection timeout | 90s |
2m |
-queue-size |
Request queue buffer size | 1000 |
5000 |
-show-performance-stats |
Show performance metrics | false |
true |
| Flag | Description | Default | Example |
|---|---|---|---|
-enable-memory-optimization |
Enable memory management | false |
true |
-max-memory-mb |
Maximum memory usage (MB) | 512 |
1024 |
-enable-intelligent-queue |
Enable priority queueing | false |
true |
-show-memory-stats |
Show memory statistics | false |
true |
| Flag | Description | Default | Example |
|---|---|---|---|
-verbose |
Verbose output | false |
true |
-show-headers |
Show response headers | false |
true |
-version |
Show version information | false |
true |
┌─────────────────────────────────────────────────────────────────┐
│ Anti-Bot Scraper Architecture │
├─────────────────────────────────────────────────────────────────┤
│ CLI Interface (45+ Configuration Flags) │
├─────────────────────────────────────────────────────────────────┤
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ TLS Fingerprint │ │ JavaScript │ │ Behavior │ │
│ │ • uTLS Library │ │ • chromedp │ │ • Human Actions │ │
│ │ • HTTP/2 │ │ • 3 Exec Modes │ │ • 4 Patterns │ │
│ │ • 4 Browsers │ │ • Element Wait │ │ • Timing Sim │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Proxy Management│ │ CAPTCHA Solver │ │ Performance │ │
│ │ • Health Monitor│ │ • 4 Services │ │ • Worker Pools │ │
│ │ • Smart Rotation│ │ • 7 Types │ │ • Memory Opt │ │
│ │ • Failover │ │ • Auto Detect │ │ • Conn Pooling │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ Core HTTP Engine (uTLS + HTTP/2 + Connection Pooling) │
└─────────────────────────────────────────────────────────────────┘
- 🛡️ TLS Fingerprinting: Advanced uTLS with 4 browser signatures
- 🌐 HTTP/2 Support: Full HTTP/2 transport with automatic fallback
- 🧠 JavaScript Engine: Complete chromedp integration with 3 execution modes
- 🎭 Behavioral Simulation: 4 behavior types with realistic human patterns
- 🔄 Proxy Management: Health monitoring with intelligent rotation
- 🧩 CAPTCHA Integration: 4 services supporting 7 CAPTCHA types
- ⚡ Concurrent Processing: Worker pools with intelligent request management
- 💾 Memory Optimization: Automatic garbage collection and resource management
- 📋 Priority Queueing: Smart request prioritization and load balancing
- 🔗 Connection Pooling: HTTP connection reuse for optimal performance
- 💻 Professional CLI: 45+ configuration flags for complete control
- 📊 Real-time Monitoring: Performance, memory, and queue statistics
- 🎨 Dynamic Fingerprints: Browser version rotation
- 🌍 Geolocation Awareness: IP-based country detection
- 📊 Web Dashboard: Real-time monitoring interface
- 💾 Database Integration: Result persistence and querying
- 🔧 Config Files: YAML/TOML configuration support
# Monitor product prices with human-like behavior
./bin/scraper.exe -url "https://shop.example.com/product/123" \
-enable-js -js-mode behavior -enable-behavior -behavior-type cautious \
-browser firefox -http-version 2 -verbose# Extract content with JavaScript execution and CAPTCHA solving
./bin/scraper.exe -url "https://social.example.com/posts" \
-enable-js -js-mode wait-element -js-wait-selector ".post-content" \
-enable-captcha -captcha-service 2captcha -captcha-api-key "YOUR_KEY" \
-enable-behavior -behavior-type normal# High-performance concurrent data collection
./bin/scraper.exe -url "https://api.example.com/data" \
-enable-concurrent -max-concurrent 100 -worker-pool-size 25 \
-requests-per-second 50 -enable-memory-optimization -max-memory-mb 1024 \
-proxies "proxy1:8080,proxy2:8080,proxy3:8080" -proxy-rotation health-aware# Track product availability with intelligent queueing
./bin/scraper.exe -url "https://store.example.com/inventory" \
-enable-concurrent -enable-intelligent-queue -max-concurrent 20 \
-enable-js -js-mode standard -browser chrome -http-version auto \
-show-performance-stats -show-memory-stats# Headers from file
echo '{"Authorization":"Bearer token","Custom-Header":"value"}' > headers.json
./bin/scraper.exe -url https://api.example.com -headers @headers.json
# POST data from file
echo '{"user":"admin","action":"login"}' > data.json
./bin/scraper.exe -url https://api.example.com/login -method POST -data @data.json# Health-aware rotation with custom settings
./bin/scraper.exe -url https://httpbin.org/ip \
-proxies "proxy1.example.com:8080,proxy2.example.com:8080,proxy3.example.com:8080" \
-proxy-rotation health-aware -enable-proxy-health \
-proxy-health-interval 30s -proxy-health-timeout 5s \
-proxy-max-failures 2 -show-proxy-metrics# Complete browser simulation with all features
./bin/scraper.exe -url "https://example.com" \
-browser firefox -http-version 2 -enable-js -js-mode behavior \
-enable-behavior -behavior-type normal -enable-mouse-movement \
-enable-scroll-simulation -enable-typing-delay -viewport 1366x768 \
-behavior-min-delay 800ms -behavior-max-delay 3s# Benchmark concurrent vs sequential performance
# Sequential (baseline)
time ./bin/scraper.exe -url https://httpbin.org/delay/1
# Concurrent (10x faster)
time ./bin/scraper.exe -url https://httpbin.org/delay/1 \
-enable-concurrent -max-concurrent 10 -show-performance-stats# Monitor memory usage during high-volume operations
./bin/scraper.exe -url https://httpbin.org/json \
-enable-concurrent -max-concurrent 50 -enable-memory-optimization \
-max-memory-mb 256 -show-memory-stats -verbose# Test TLS fingerprinting detection
./bin/scraper.exe -url https://tls.browserleaks.com/json -browser chrome -verbose
# Test HTTP/2 support
./bin/scraper.exe -url https://http2.akamai.com/demo -http-version 2 -verbose
# Test proxy functionality
./bin/scraper.exe -url https://httpbin.org/ip -proxy "proxy.example.com:8080" -verbose
# Test JavaScript capabilities
./bin/scraper.exe -url https://httpbin.org/html -enable-js -js-mode standard -verbose# Comprehensive performance testing
./bin/scraper.exe -url https://httpbin.org/headers \
-enable-concurrent -enable-memory-optimization -enable-intelligent-queue \
-show-performance-stats -show-memory-stats -verbose \
-max-concurrent 25 -worker-pool-size 10 -requests-per-second 20-
CAPTCHA Solving Failures
# Check API key and service status ./bin/scraper.exe -url https://example.com -enable-captcha \ -captcha-service 2captcha -captcha-api-key "YOUR_KEY" \ -show-captcha-info -verbose
-
Proxy Connection Issues
# Test proxy health and rotation ./bin/scraper.exe -url https://httpbin.org/ip \ -proxies "proxy1:8080,proxy2:8080" -enable-proxy-health \ -show-proxy-metrics -verbose
-
JavaScript Execution Problems
# Debug JavaScript issues ./bin/scraper.exe -url https://example.com -enable-js \ -js-mode standard -headless=false -verbose -
Memory Usage Optimization
# Monitor and optimize memory usage ./bin/scraper.exe -url https://example.com -enable-concurrent \ -enable-memory-optimization -max-memory-mb 256 \ -show-memory-stats -verbose
- Respect robots.txt: Always check and follow website policies
- Rate Limiting: Use appropriate delays to avoid overwhelming servers
- Terms of Service: Ensure compliance with website terms
- Legal Compliance: Follow applicable laws and regulations
- API Keys: Store CAPTCHA service keys securely
- Proxy Authentication: Use secure proxy credentials
- Data Handling: Protect scraped data according to privacy laws
- Network Security: Use VPNs and secure connections when necessary
- uTLS: Advanced TLS fingerprinting library
- chromedp: Chrome DevTools Protocol for JavaScript execution
- golang.org/x/net: Extended networking libraries
- Go 1.21+: Modern Go runtime with HTTP/2 support
- Operating System: Windows, macOS, Linux
- Memory: Minimum 512MB RAM (2GB+ recommended for concurrent processing)
- Network: Stable internet connection
- Chrome/Chromium: Required for JavaScript engine functionality
# Verify Go version
go version # Should be 1.21 or higher
# Install Chrome/Chromium (for JavaScript engine)
# Windows: Download from google.com/chrome
# macOS: brew install --cask google-chrome
# Linux: apt-get install chromium-browser
# Build the scraper
go build -o bin/scraper.exe ./cmd/scraper| Flag | Description | Default | Example |
|---|---|---|---|
-url |
Single URL to scrape | - | https://example.com |
-urls-file |
File containing multiple URLs | - | examples/urls.txt |
-num-requests |
Number of requests per URL | 1 |
5 |
-proxy-file |
File containing proxy list | - | examples/proxies.txt |
| Flag | Description | Default | Options |
|---|---|---|---|
-tls-profile |
TLS profile for fingerprinting | chrome |
chrome, firefox, safari, edge |
-tls-randomize |
Randomize TLS profiles across requests | false |
true, false |
-browser |
Browser fingerprint (legacy) | chrome |
chrome, firefox, safari, edge |
| Flag | Description | Default | Example |
|---|---|---|---|
-delay-min |
Minimum delay between requests | 1s |
500ms, 2s |
-delay-max |
Maximum delay between requests | 3s |
1s, 5s |
-delay-randomize |
Randomize delays within range | true |
true, false |
-rate-limit |
Rate limit between requests | 1s |
500ms, 2s |
| Flag | Description | Default | Options |
|---|---|---|---|
-header-mimicry |
Enable browser-consistent headers | true |
true, false |
-header-profile |
Header profile to use | auto |
auto, chrome, firefox, safari, edge |
-custom-user-agent |
Custom User-Agent override | - | Custom string |
-enable-sec-headers |
Include security headers | true |
true, false |
-accept-language |
Accept-Language header value | auto |
en-US,en;q=0.9 |
-accept-encoding |
Accept-Encoding header value | auto |
gzip, deflate, br |
| Flag | Description | Default | Options |
|---|---|---|---|
-cookie-jar |
Enable in-memory cookie jar | true |
true, false |
-cookie-persistence |
Cookie persistence mode | session |
session, proxy, none |
-cookie-file |
File to save/load cookies | - | session.cookies |
-clear-cookies |
Clear cookies before each request | false |
true, false |
| Flag | Description | Default | Options |
|---|---|---|---|
-follow-redirects |
Follow HTTP redirects | true |
true, false |
-max-redirects |
Maximum redirects to follow | 10 |
5, 20 |
-redirect-timeout |
Timeout for redirect chains | 30s |
10s, 60s |
https://httpbin.org/headers
https://httpbin.org/user-agent
https://httpbin.org/ip
# Comments start with #
https://httpbin.org/get
http://proxy1.example.com:8080
http://user:pass@proxy2.example.com:3128
socks5://proxy3.example.com:1080
# HTTP and SOCKS5 proxies supported
socks5://user:pass@proxy4.example.com:1080
# Single URL with enhanced features
./scraper -url https://httpbin.org/headers -header-mimicry -verbose
# Multiple URLs with randomization
./scraper -urls-file examples/urls.txt -tls-randomize -num-requests 2# Comprehensive evasion setup
./scraper -urls-file examples/urls.txt \
-num-requests 3 \
-tls-randomize \
-header-mimicry \
-header-profile auto \
-delay-min 1s \
-delay-max 3s \
-delay-randomize \
-cookie-jar \
-follow-redirects \
-proxy-file examples/proxies.txt \
-verboseMIT License - see LICENSE file for details.
- uTLS Team: For the excellent TLS fingerprinting library
- chromedp Team: For browser automation capabilities
- Go Community: For robust networking and HTTP libraries
- Security Researchers: For insights into anti-bot detection methods
Built with ❤️ for ethical web scraping and security research
Last Updated: July 26, 2025