Tired of manually browsing Amazon for the best deals? 🌐 Meet Amazon Product Search — your trusty Python library to scrape product details from Amazon's search results with just a few lines of code. Powered by BeautifulSoup4 (bs4), Requests, and multithreading for speed, this library helps you efficiently gather product titles, prices, reviews, images, and direct links. 🎉
- Product Search: Search for products by name, type, brand, and price range. 📱💻
- Detailed Data: Scrape titles, prices, reviews, images, and URLs. 🎯
- Fast and Efficient: Uses multithreading to speed up data extraction.
- Easy-to-use: Simple API for quick integration. ✨
Get started with Amazon Product Search by installing it via PyPI or GitHub.
The easiest way to install the library is using pip from PyPI:
pip install amazon-product-search-v2This installs the latest stable release. Note the package name is now amazon-product-search-v2.
If you want the very latest development version (which may have new features or bug fixes, but could also be less stable), clone the repository and install it in editable mode:
git clone --depth 1 https://github.com/ManojPanda3/amazon-product-search
cd amazon-product-search
pip install -e .This allows you to modify the code and have the changes immediately reflected without reinstalling.
First, import the Amazon class from the amazon_product_search module:
from amazon_product_search import AmazonImportant: Even though the package name on PyPI is amazon-product-search-v2, you still import the module as amazon_product_search. The code inside the package hasn't changed its import paths.
The core functionality is provided by the Amazon class.
amazon = Amazon(is_debuging=False) # Set is_debuging to True for verbose outputresults = amazon.search(productName="iPhone", productType="electronics", brand="Apple", priceRange="80000-100000")Parameters:
productName(str, required): The search term (e.g., "iPhone", "laptop").productType(str, optional): Filters by product type (e.g., "electronics", "books").brand(str, optional): Filters by brand (e.g., "Apple", "Samsung").priceRange(str, optional): Filters by price range using the format "min_price-max_price" (e.g., "100-200").
Returns:
list[dict]: A list of dictionaries, where each dictionary represents a product and contains the following keys:"title"(str | None): The product title."link"(str | None): The URL to the product page."review"(str | None): A string representing the product review (e.g., "4.5 out of 5 stars")."price"(str | None): The product price."image"(str | None): The URL of the product image.
from amazon_product_search import Amazon
amazon = Amazon()
products = amazon.search("iPhone", productType="electronics", brand="Apple", priceRange="80000-100000")
for product in products:
print(f"Title: {product['title']}")
print(f"Price: {product['price']}")
print(f"Review: {product['review']}")
print(f"Image: {product['image']}")
print(f"Link: {product['link']}")
print("-" * 40)This library works by:
- Constructing a Search URL: It builds a URL for Amazon's search results page based on the provided search parameters.
- Making an HTTP Request: It sends an HTTP GET request to the Amazon search URL using the
requestslibrary. It includes headers to mimic a web browser. - Parsing the HTML: It uses
BeautifulSoup4to parse the HTML response and extract the relevant product information from the search result elements. - Multithreading: It uses
concurrent.futures.ThreadPoolExecutorto process multiple search result elements concurrently, significantly speeding up the data extraction. - Returning Data: It returns the extracted data as a list of dictionaries.
- Rate Limiting: Amazon may rate-limit or block your IP address if you make too many requests in a short period. Use this library responsibly. Consider adding delays or using proxies if you need to scrape a large amount of data. The library includes a
timeoutin the request to help prevent hanging. - Terms of Service: Scraping may be against Amazon's Terms of Service. Use this tool for personal and educational purposes only, and be aware of the potential legal and ethical implications.
- Website Changes: Amazon frequently updates its website structure. If the scraping stops working, the HTML parsing logic may need to be adjusted.
- Error Handling: The library includes basic error handling (e.g., for network errors), but you may need to add more robust error handling for production use.
ValueError: Error product Name is required: You must provide aproductNamewhen calling thesearch()method.Exception: Error while geting data from Amazon: This indicates a problem fetching data from Amazon. It could be a network issue, a problem with your request, or Amazon blocking your request. Enable debugging (is_debuging=True) for more details.- Empty Results: If you get an empty list, it could be that no products matched your search criteria, or that Amazon's HTML structure has changed, and the parsing logic needs to be updated.
- Missing Data (None Values): If some fields (like
revieworprice) areNone, it means the library couldn't find that specific data for that product on the page. This is normal, as Amazon's page structure can vary. ModuleNotFoundError: No module named 'amazon_product_search': Make sure you've installed the package correctly usingpip install amazon-product-search-v2. If you installed from GitHub, make sure you're in the correct virtual environment and that you installed withpip install -e ..
Contributions are welcome! If you find a bug, have a feature request, or want to improve the code, please open an issue or submit a pull request on GitHub.
This project is licensed under the MIT License - see the LICENSE file for details.