Retaining Code snippet in markdown content using UnstructuredMarkdownLoader #20142
Replies: 1 comment 2 replies
Hey there @kavinkumarrajendran2000! 🎉 I'm Dosu, a friendly bot here to assist you with bugs, answer your questions, and guide you on your journey to becoming a contributor. While we wait for a human maintainer to join us, I'm here to help you out. Let's tackle this issue together!

To modify the [...]

Given the context provided, there isn't a direct snippet of the [...]

First, ensure you have the necessary library installed:

    pip install markdown  # or mistune, depending on what LangChain uses

Then, you can extend the UnstructuredMarkdownLoader:

    from langchain_community.document_loaders.markdown import UnstructuredMarkdownLoader
    import markdown


    class CustomMarkdownLoader(UnstructuredMarkdownLoader):
        def load_content(self, content):
            """
            Parse Markdown content in a way that retains code snippets.

            Note: load_content is a helper defined in this example. Whether
            the base loader has a method of this name is an assumption.
            """
            # Assuming the original loader converts Markdown to plain text
            # Modify this to retain code snippets
            md = markdown.Markdown(extensions=['fenced_code'])
            html_content = md.convert(content)
            # Further processing to extract or retain code snippets from HTML
            # This part depends on how you want to handle the HTML content
            # For simplicity, this example does not implement HTML parsing
            return html_content


    # Use your custom loader
    # The base loader's constructor expects a file path; "example.md" is a placeholder
    loader = CustomMarkdownLoader("example.md")
    content = loader.load_content(your_markdown_content)

This example assumes that the original [...]

Please note, this example might need adjustments based on the actual implementation details of [...]

Unfortunately, I wasn't able to find the exact implementation of the [...]
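
The example above stops at the HTML string and leaves the parsing step open. As a minimal sketch of that step, assuming the HTML wraps each code block in `<pre><code>` (the names `CodeBlockExtractor` and `extract_code_blocks` are hypothetical and not part of LangChain), the standard-library `html.parser` module can pull the code text back out:

```python
from html.parser import HTMLParser


class CodeBlockExtractor(HTMLParser):
    """Collect the text inside every <pre><code>...</code></pre> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []       # finished code blocks, in document order
        self._in_pre = False   # currently inside <pre>
        self._in_code = False  # currently inside <pre><code>
        self._buffer = []      # text chunks of the block being read

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._in_pre = True
        elif tag == "code" and self._in_pre:
            self._in_code = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "code" and self._in_code:
            self.blocks.append("".join(self._buffer))
            self._in_code = False
        elif tag == "pre":
            self._in_pre = False

    def handle_data(self, data):
        # Only keep text that belongs to a code block
        if self._in_code:
            self._buffer.append(data)


def extract_code_blocks(html_content):
    """Return the code blocks found in html_content as a list of strings."""
    parser = CodeBlockExtractor()
    parser.feed(html_content)
    parser.close()
    return parser.blocks


# Hypothetical HTML, shaped like the output of a Markdown-to-HTML conversion
sample_html = (
    "<p>Area</p>\n"
    "<pre><code>public class ChartData\n{\n}\n</code></pre>"
)
print(extract_code_blocks(sample_html))
```

Inline `<code>` spans outside `<pre>` are ignored on purpose, so only block-level snippets are collected.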
Description
Hi,
I am using UnstructuredMarkdownLoader to load Markdown content, but the ``` markers used for code snippets are removed after loading, so each snippet is treated as plain text instead of as a code snippet. Sometimes the content of the code snippet is removed as well.
**Example: MD content before loading:**
Area
After loading with UnstructuredMarkdownLoader (``` is removed, and often the code snippet content is removed too):
Area
@using Charts
public class ChartData
{
public string X { get; set; }
public double Y { get; set; }
}
}
How can I retain the actual code snippet content and the ``` markers around it?
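
One possible workaround, sketched here under assumptions, is to bypass the loader for this step and split the raw Markdown text yourself, so that fenced blocks stay verbatim, markers included. The names `FENCE_RE` and `split_markdown` are hypothetical, and the sketch assumes each fence is exactly three backticks at the start of a line:

```python
import re

# Matches one fenced block: an opening ``` line (with optional language tag),
# the code body, and a closing ``` line.
FENCE_RE = re.compile(r"^```[^\n]*\n.*?^```[ \t]*$", re.DOTALL | re.MULTILINE)


def split_markdown(text):
    """Return (kind, content) pairs where kind is "text" or "code".

    Code segments keep their ``` markers and body exactly as written.
    """
    segments = []
    position = 0
    for match in FENCE_RE.finditer(text):
        prose = text[position:match.start()].strip()
        if prose:
            segments.append(("text", prose))
        segments.append(("code", match.group(0)))
        position = match.end()
    tail = text[position:].strip()
    if tail:
        segments.append(("text", tail))
    return segments


# Hypothetical input modeled on the example in this post
sample = "Area\n\n```csharp\npublic class ChartData\n{\n}\n```\n\nThanks.\n"
for kind, content in split_markdown(sample):
    print(kind, repr(content))
```

Each `("code", ...)` segment could then be stored as its own document, or joined back with the prose, without passing through a parser that strips the fences.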
Thanks.
System Info
Python 3.9.7
langchain latest version