Reddit Comments Scraper

Extract complete Reddit comment threads with full conversation context, user details, and engagement metrics. This scraper makes it easy to collect, analyze, and visualize Reddit discussions for research, monitoring, or automation.

Created by Bitbash, built to showcase our approach to scraping and automation.
If you are looking for a Reddit comments scraper, you've just found your team. Let’s chat.

Introduction

The Reddit Comments Scraper is a data extraction tool that collects Reddit comments, including all nested replies, from a given post URL. For each comment it captures user info, the comment's position in the thread, its timestamp, and engagement stats.

Why It’s Useful

Collects all levels of conversation, preserving context and hierarchy.
Helps researchers, developers, and analysts understand community sentiment.
Ideal for market research, content monitoring, and academic analysis.

Features

Feature	Description
Full Thread Extraction	Captures every comment and nested reply to maintain discussion hierarchy.
User Information	Includes author name, avatar URL, and profile link.
Engagement Metrics	Tracks upvotes and other interaction statistics.
Content Detection	Identifies content type (text, image, etc.).
Duplicate Filtering	Prevents repeated comment entries for clean datasets.
Proxy Support	Optional proxy configuration for large-scale scraping.
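
As an illustration of the Duplicate Filtering feature, here is a minimal sketch of one way repeated entries could be dropped, assuming each record carries the comment_id field described in this README. The function name dedupe_comments is hypothetical and is not taken from the project's source.

```python
from typing import Iterable, Iterator


def dedupe_comments(comments: Iterable[dict]) -> Iterator[dict]:
    """Yield each comment once, keyed on comment_id (first occurrence wins)."""
    seen: set[str] = set()
    for comment in comments:
        cid = comment.get("comment_id")
        if cid is None or cid in seen:
            continue  # skip records without an ID, and skip repeats
        seen.add(cid)
        yield comment


if __name__ == "__main__":
    rows = [
        {"comment_id": "t1_lhk1f7n", "upvotes": 1},
        {"comment_id": "t1_lhkeis2", "upvotes": 1434},
        {"comment_id": "t1_lhk1f7n", "upvotes": 1},  # repeated entry
    ]
    print([c["comment_id"] for c in dedupe_comments(rows)])
```

Keeping the first occurrence preserves the original thread order, which matters when the records are later reassembled into a hierarchy.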

What Data This Scraper Extracts

Field Name	Field Description
comment_id	Unique identifier for the comment.
post_id	Reddit post identifier associated with the comment.
author	Username of the commenter.
permalink	Direct link to the comment.
upvotes	Number of upvotes the comment received.
content_type	Type of content (e.g., text, image).
parent_id	ID of the parent comment if it’s a reply.
author_avatar	URL to the author’s profile image.
userUrl	Direct link to the Reddit user profile.
contentText	The actual text content of the comment.
created_time	Timestamp of when the comment was created (ISO format).
replies	Array containing nested reply objects.
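
The created_time values in the example output use a UTC offset written without a colon (+0000), which some versions of Python's datetime.fromisoformat reject. The sketch below shows one way to parse them with an explicit format string; the helper name parse_created_time is hypothetical.

```python
from datetime import datetime


def parse_created_time(value: str) -> datetime:
    """Parse a created_time string such as 2024-08-11T07:12:09.272000+0000."""
    # %f reads the fractional seconds, %z reads the +0000 style offset.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")


if __name__ == "__main__":
    ts = parse_created_time("2024-08-11T07:12:09.272000+0000")
    print(ts.isoformat())  # timezone-aware datetime in UTC
```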

Example Output

[
  {
    "comment_id": "t1_lhk1f7n",
    "post_id": "t3_1epeshq",
    "author": "AutoModerator",
    "permalink": "https://www.reddit.com/r/ChatGPT/comments/1epeshq/comment/lhk1f7n/",
    "upvotes": 1,
    "content_type": "text",
    "parent_id": null,
    "author_avatar": "https://styles.redditmedia.com/t5_1yz875/styles/profileIcon_klqlly9fc4l41.png",
    "userUrl": "https://www.reddit.com/user/AutoModerator",
    "contentText": "Moderator Announcement\nHey u/Maxie445!\nIf your post is a screenshot of a ChatGPT conversation...",
    "created_time": "2024-08-11T07:12:09.272000+0000"
  },
  {
    "comment_id": "t1_lhkeis2",
    "post_id": "t3_1epeshq",
    "author": "Alternative_Lynx_155",
    "upvotes": 1434,
    "content_type": "text",
    "contentText": "That is crazy. When I was younger I thought thispersondoesnotexist.com was scary...",
    "created_time": "2024-08-11T09:39:54.843000+0000",
    "replies": [
      {
        "comment_id": "t1_lhmhxjf",
        "author": "who_am_i_to_say_so",
        "upvotes": 279,
        "contentText": "I just spent 30 mins f5'ing that page. It's so addicting!"
      }
    ]
  }
]
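
Because replies are nested inside each comment, downstream analysis often needs to walk the tree. The following is a minimal sketch of a depth-first traversal over output shaped like the example above; walk_comments is a hypothetical helper, and the sample data is a trimmed copy of the example records.

```python
from typing import Iterator


def walk_comments(comments: list[dict], depth: int = 0) -> Iterator[tuple[int, dict]]:
    """Yield (depth, comment) pairs in thread order, parents before replies."""
    for comment in comments:
        yield depth, comment
        # "replies" may be missing on leaf comments, so fall back to an empty list.
        yield from walk_comments(comment.get("replies") or [], depth + 1)


if __name__ == "__main__":
    thread = [
        {"comment_id": "t1_lhk1f7n", "author": "AutoModerator", "upvotes": 1},
        {
            "comment_id": "t1_lhkeis2",
            "author": "Alternative_Lynx_155",
            "upvotes": 1434,
            "replies": [
                {"comment_id": "t1_lhmhxjf", "author": "who_am_i_to_say_so", "upvotes": 279}
            ],
        },
    ]
    for depth, c in walk_comments(thread):
        print("  " * depth + f'{c["author"]} ({c["upvotes"]})')
```

The depth value can be stored alongside each flattened row so that the hierarchy remains recoverable after export to a tabular format.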

Directory Structure Tree

reddit-comments-scraper/
├── src/
│   ├── main.py
│   ├── extractors/
│   │   ├── reddit_parser.py
│   │   └── utils_date.py
│   ├── outputs/
│   │   └── exporters.py
│   └── config/
│       └── settings.example.json
├── data/
│   ├── inputs.sample.json
│   └── sample_output.json
├── requirements.txt
└── README.md

Use Cases

Researchers use it to analyze community sentiment so they can publish insights about online behavior.
Marketers use it to monitor product feedback threads and improve engagement strategies.
Developers use it to train NLP models using authentic conversational data.
Content moderators use it to track harmful or spammy replies and enhance moderation tools.
Analysts use it to study topic trends across subreddits for market or social analysis.

FAQs

Q: Does it capture deleted or removed comments? A: No. It only retrieves active comments visible in the public thread.

Q: Can I limit the number of comments scraped? A: Yes, use the maxItems parameter to define how many comments you’d like to collect.

Q: What formats can I export the data to? A: You can export data in JSON, JSONL, CSV, XML, HTML, or Excel.

Q: Is authentication required? A: No. It works on publicly accessible Reddit posts without login credentials.
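
To make the maxItems and export answers more concrete, here is a minimal sketch of how a cap and two of the listed formats (JSONL and CSV) could be applied to already-collected records using only the standard library. It is an assumption-laden illustration: the function names, the FIELDS column list, and the way the cap is applied are hypothetical and do not describe the scraper's actual exporters.

```python
import csv
import io
import json
from itertools import islice
from typing import Iterable, Iterator, Optional

# Hypothetical column selection for the CSV export.
FIELDS = ["comment_id", "author", "upvotes", "contentText"]


def limit(comments: Iterable[dict], max_items: Optional[int]) -> Iterator[dict]:
    """Stop after max_items records; None means no cap."""
    return islice(comments, max_items)


def to_jsonl(comments: Iterable[dict]) -> str:
    """Serialize one JSON object per line."""
    return "".join(json.dumps(c, ensure_ascii=False) + "\n" for c in comments)


def to_csv(comments: Iterable[dict], fields: list[str] = FIELDS) -> str:
    """Write flat rows; keys outside `fields` (such as replies) are ignored."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for comment in comments:
        writer.writerow(comment)  # missing keys become empty cells
    return buf.getvalue()


if __name__ == "__main__":
    rows = [
        {"comment_id": "t1_lhk1f7n", "author": "AutoModerator", "upvotes": 1},
        {"comment_id": "t1_lhkeis2", "author": "Alternative_Lynx_155", "upvotes": 1434},
        {"comment_id": "t1_lhmhxjf", "author": "who_am_i_to_say_so", "upvotes": 279},
    ]
    print(to_jsonl(limit(rows, 2)))
    print(to_csv(limit(rows, 2)))
```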

Performance Benchmarks and Results

Primary Metric: Average scraping speed of around 200 comments per minute on standard connections.
Reliability Metric: 99% success rate on valid Reddit post URLs.
Efficiency Metric: Lightweight and stable under concurrent thread extractions.
Quality Metric: 98% data completeness across metadata fields and comment nesting.

“Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time.”

Nathan Pennington
Marketer
★★★★★

“Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on.”

Eliza
SEO Affiliate Expert
★★★★★

“Exceptional results, clear communication, and flawless delivery. Bitbash nailed it.”

Syed
Digital Strategist
★★★★★

nichoxLashall/reddit-comments-scraper
