LangChain Text Splitters

This repository provides examples and usage of LangChain text splitters, a fundamental tool for splitting large documents into smaller, manageable chunks that language models can process effectively.

Splitting text is critical when working with LLMs (Large Language Models) because they have limited context windows. Good splitting helps keep each chunk meaningful and coherent, which improves downstream tasks such as question answering, summarization, embedding generation, and retrieval-augmented generation (RAG).

📌 Supported Text Splitters

1. Character Text Splitter

Overview: A simple splitter that breaks text into chunks based on a fixed character size.
Usage: Useful for quick prototyping or when structure does not matter.
Pros:
- Very fast and straightforward
- Works on arbitrary text without assumptions about its structure
Cons:
- May split mid-sentence or mid-word
- Does not preserve semantic meaning
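A minimal sketch of fixed-size character splitting, assuming plain Python with no LangChain install. The `split_by_characters` helper is hypothetical and only mimics the idea: cut windows of at most `chunk_size` characters, with consecutive windows sharing `chunk_overlap` characters. LangChain's own `CharacterTextSplitter` (from `langchain_text_splitters`) works differently under the hood: it splits on a single `separator` (default `"\n\n"`) and then merges the pieces up to `chunk_size`, so treat this as an illustration rather than its implementation.

```python
# Hypothetical helper for illustration; not LangChain's CharacterTextSplitter.
def split_by_characters(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. Word and sentence
    boundaries are ignored, so cuts can land mid-word.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "LangChain splits long documents into chunks."
for chunk in split_by_characters(sample, chunk_size=16, chunk_overlap=4):
    print(repr(chunk))
# 'LangChain splits'
# 'lits long docume'
# 'cuments into chu'
# ' chunks.'
```

The output shows the mid-word cuts listed under Cons: the second chunk both starts and ends inside a word, and each chunk repeats the last 4 characters of the previous one.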

2. Recursive Character Text Splitter (Most Commonly Used)

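Overview: A more advanced splitter that tries a prioritized list of separators (by default paragraph breaks, then line breaks, then spaces, then individual characters). Any piece that is still larger than the chunk size is split again recursively with the next separator in the list.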
Why it matters: This is the most widely used text splitter in LangChain, because it keeps chunks within the size limit while preserving natural text boundaries wherever possible.
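As a rough illustration of the recursive strategy, the stdlib-only sketch below uses a hypothetical `recursive_split` function with the same default separator order as LangChain's `RecursiveCharacterTextSplitter` (`"\n\n"`, `"\n"`, `" "`, `""`). It omits chunk overlap and the library's other options, so it is a sketch of the idea under those assumptions, not the library's implementation. In practice you would call `RecursiveCharacterTextSplitter(chunk_size=..., chunk_overlap=...).split_text(text)` from `langchain_text_splitters`.

```python
# Hypothetical sketch of recursive splitting; not LangChain's implementation.
DEFAULT_SEPARATORS = ("\n\n", "\n", " ", "")  # paragraphs, lines, words, characters


def recursive_split(text: str, chunk_size: int,
                    separators: tuple[str, ...] = DEFAULT_SEPARATORS) -> list[str]:
    """Split text into chunks of at most chunk_size characters (chunk_size >= 1).

    The first separator that occurs in the text is used; neighbouring pieces
    are merged while they fit, and any piece that is still too long is
    re-split with the remaining, finer-grained separators.
    """
    if len(text) <= chunk_size:
        return [text] if text else []

    # Choose the coarsest separator present; fall back to single characters.
    sep, finer = "", ()
    for i, candidate_sep in enumerate(separators):
        if candidate_sep == "" or candidate_sep in text:
            sep, finer = candidate_sep, separators[i + 1:]
            break
    pieces = list(text) if sep == "" else text.split(sep)

    chunks: list[str] = []
    current = ""
    for piece in pieces:
        merged = piece if not current else current + sep + piece
        if len(merged) <= chunk_size:
            current = merged  # keep growing the current chunk
            continue
        if current:
            chunks.append(current)  # flush the chunk that is full
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too big: recurse with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = "Text splitters matter.\n\nThey keep chunks small enough for an LLM."
for chunk in recursive_split(sample, chunk_size=30):
    print(repr(chunk))
# 'Text splitters matter.'
# 'They keep chunks small enough'
# 'for an LLM.'
```

The first paragraph fits within 30 characters, so it survives intact; only the longer second paragraph falls back to word-level splitting.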
Capabilities:
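- Text structure splitting (paragraphs, lines, and words with the default separators)
- Programming language splitting (by functions, classes, code blocks)
- Markdown splitting (headings, code blocks, horizontal rules)
- Semantic-aware splitting (keeping related meaning together; this goes beyond separators and is usually done with an embedding-based chunker such as SemanticChunker from langchain_experimental)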
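In LangChain, code-aware splitting is typically done with `RecursiveCharacterTextSplitter.from_language(Language.PYTHON, ...)`, which relies on Python-specific separators such as `"\nclass "` and `"\ndef "`. The sketch below swaps in a different technique for illustration: it uses the standard-library `ast` module to cut a Python file before each top-level function or class. The `split_python_source` name is hypothetical, and the sketch applies no size limit, so a very large function would stay in a single chunk.

```python
import ast


# Hypothetical helper for illustration; not LangChain's from_language splitter.
def split_python_source(source: str) -> list[str]:
    """Cut Python source before each top-level function or class.

    Decorators stay attached to the definition they decorate. Code before the
    first definition (imports, constants) becomes its own chunk, and top-level
    statements that follow a definition stay in that definition's chunk.
    """
    tree = ast.parse(source)
    lines = source.splitlines(keepends=True)
    starts = []
    for node in tree.body:  # only top-level statements are considered
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # ast line numbers are 1-based; include any decorator lines.
            first = min([node.lineno] + [d.lineno for d in node.decorator_list])
            starts.append(first - 1)
    boundaries = sorted(set([0, len(lines)] + starts))
    chunks = []
    for begin, end in zip(boundaries, boundaries[1:]):
        chunk = "".join(lines[begin:end]).strip()
        if chunk:  # skip chunks that are only blank lines
            chunks.append(chunk)
    return chunks


code = """import math

def area(r):
    return math.pi * r ** 2

class Circle:
    def __init__(self, r):
        self.r = r
"""
for chunk in split_python_source(code):
    print("---")
    print(chunk)
```

Nested methods such as `__init__` stay inside their class chunk, because only top-level statements are used as boundaries.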
Pros:
- Preserves logical boundaries in text
- Adaptable for structured documents, code, or markdown
- Provides more meaningful chunks for embeddings and retrieval
Cons:
- Slightly more computationally intensive than a simple character splitter

tahirkorma/langchain-text-splitters
