
Examples and usage of LangChain text splitters, including CharacterTextSplitter and the widely used RecursiveCharacterTextSplitter for splitting text into meaningful chunks. Supports structured text, code, markdown, and semantic-aware splitting for LLM applications.


tahirkorma/langchain-text-splitters


LangChain Text Splitters

This repository provides examples and usage of LangChain text splitters, a fundamental tool for breaking large documents into smaller, manageable chunks that language models can process effectively.

Splitting text is critical when working with large language models (LLMs) because they have context-length limits. Good splitting keeps each chunk meaningful and coherent, which improves downstream tasks such as question answering, summarization, embedding generation, and retrieval-augmented generation (RAG).


📌 Supported Text Splitters

1. Character Text Splitter

  • Overview: A simple splitter that splits text on a single separator (by default "\n\n", a blank line) and merges the resulting pieces back into chunks of up to chunk_size characters.
  • Usage: Useful for quick prototyping or when structure does not matter.
  • Pros:
    • Very fast and straightforward
    • Works with arbitrary text without assumptions
  • Cons:
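    • May split mid-sentence or mid-word, depending on the separator (an empty separator cuts at arbitrary characters)
    • Can produce chunks longer than chunk_size when a single piece between separators exceeds the limit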
    • Does not preserve semantic meaning
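To make the split-then-merge behaviour concrete, here is a minimal pure-Python sketch. It is not LangChain's code: character_split and merge_pieces are hypothetical helpers that only mirror the idea behind CharacterTextSplitter (split on one separator, then greedily merge pieces up to chunk_size). They deliberately omit chunk_overlap, whitespace stripping, and custom length functions.

```python
from __future__ import annotations


def merge_pieces(pieces: list[str], separator: str, chunk_size: int) -> list[str]:
    """Greedily join consecutive pieces with `separator` while the joined text fits in chunk_size."""
    chunks: list[str] = []
    current: list[str] = []
    length = 0  # always equals len(separator.join(current))
    for piece in pieces:
        extra = len(piece) + (len(separator) if current else 0)
        if current and length + extra > chunk_size:
            # Adding this piece would overflow: close the current chunk and start a new one.
            chunks.append(separator.join(current))
            current, length = [], 0
            extra = len(piece)
        # A piece longer than chunk_size is still appended, producing an oversized chunk.
        current.append(piece)
        length += extra
    if current:
        chunks.append(separator.join(current))
    return chunks


def character_split(text: str, separator: str = "\n\n", chunk_size: int = 100) -> list[str]:
    """Split on a single separator, then merge the pieces back up to chunk_size characters."""
    # An empty separator means "split between every character".
    raw = text.split(separator) if separator else list(text)
    pieces = [p for p in raw if p]  # drop empty strings left by repeated separators
    return merge_pieces(pieces, separator, chunk_size)


text = "Text splitters break documents apart"
print(character_split(text, separator="", chunk_size=10))
# ['Text split', 'ters break', ' documents', ' apart']
print(character_split(text, separator=" ", chunk_size=10))
# ['Text', 'splitters', 'break', 'documents', 'apart']
```

With an empty separator the chunk boundaries fall on arbitrary characters, which is exactly how mid-word cuts happen; with a space separator every chunk ends at a word boundary. In LangChain itself you would construct CharacterTextSplitter(separator=..., chunk_size=..., chunk_overlap=...) and call its split_text(text) method.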

2. Recursive Character Text Splitter (Most Commonly Used)

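  • Overview: A more advanced splitter that tries a list of separators in order (by default "\n\n", "\n", " ", then "") and recursively re-splits any piece that is still larger than chunk_size using the next separator in the list.
  • Why it matters: LangChain recommends it for generic text, and it is the most widely used splitter because it keeps chunks within the size limit while breaking at the most natural boundary available.
  • Capabilities:
    • Text structure splitting (paragraphs first, then lines, then words)
    • Programming language splitting (language-specific separators such as class and function definitions, via RecursiveCharacterTextSplitter.from_language)
    • Markdown splitting (headings, fenced code blocks, horizontal rules)
    • Structure-aware splitting (related text tends to stay together because larger boundaries are tried first, although the splitter does not measure meaning directly)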
  • Pros:
    • Preserves logical boundaries in text
    • Adaptable for structured documents, code, or markdown
    • Provides more meaningful chunks for embeddings and retrieval
  • Cons:
    • Slightly more computationally intensive than a simple character splitter
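The recursive fallback can be sketched the same way. This is a simplified illustration, not LangChain's implementation: recursive_split and merge_pieces are hypothetical helpers that assume the default separator order ["\n\n", "\n", " ", ""], drop separators at chunk boundaries instead of keeping them, and ignore chunk_overlap.

```python
from __future__ import annotations

DEFAULT_SEPARATORS = ["\n\n", "\n", " ", ""]


def merge_pieces(pieces: list[str], separator: str, chunk_size: int) -> list[str]:
    """Greedily join consecutive pieces with `separator` while the joined text fits in chunk_size."""
    chunks: list[str] = []
    current: list[str] = []
    length = 0  # always equals len(separator.join(current))
    for piece in pieces:
        extra = len(piece) + (len(separator) if current else 0)
        if current and length + extra > chunk_size:
            chunks.append(separator.join(current))
            current, length = [], 0
            extra = len(piece)
        current.append(piece)
        length += extra
    if current:
        chunks.append(separator.join(current))
    return chunks


def recursive_split(
    text: str, separators: list[str] | None = None, chunk_size: int = 100
) -> list[str]:
    """Split on the coarsest separator present; re-split oversized pieces with finer ones."""
    if separators is None:
        separators = DEFAULT_SEPARATORS
    # Use the first separator that occurs in the text ("" always applies).
    separator, finer = separators[-1], []
    for i, sep in enumerate(separators):
        if sep == "" or sep in text:
            separator, finer = sep, separators[i + 1:]
            break
    raw = text.split(separator) if separator else list(text)
    pieces = [p for p in raw if p]

    chunks: list[str] = []
    fitting: list[str] = []  # run of consecutive pieces that already fit
    for piece in pieces:
        if len(piece) <= chunk_size:
            fitting.append(piece)
            continue
        # Flush the run collected so far, then handle the oversized piece.
        if fitting:
            chunks.extend(merge_pieces(fitting, separator, chunk_size))
            fitting = []
        if finer:
            chunks.extend(recursive_split(piece, finer, chunk_size))
        else:
            chunks.append(piece)  # no finer separator left: keep it oversized
    if fitting:
        chunks.extend(merge_pieces(fitting, separator, chunk_size))
    return chunks


doc = "First paragraph here.\n\nSecond paragraph is a bit longer than the first one."
print(recursive_split(doc, chunk_size=30))
# ['First paragraph here.', 'Second paragraph is a bit', 'longer than the first one.']
```

The short first paragraph stays whole, and only the oversized second paragraph falls back to word-level splitting, so no chunk is cut mid-word. Passing a different separators list (for example, one that starts with class or function definitions for source code, or heading markers for markdown) is the idea behind LangChain's RecursiveCharacterTextSplitter.from_language, which supplies language-specific separator lists. The real class is used as RecursiveCharacterTextSplitter(chunk_size=..., chunk_overlap=...).split_text(text).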
