Author: Xingbo Wang
Contact: wangxbzb@gmail.com
Chat with Your Data is an interactive Streamlit app that lets you upload your own tabular data (CSV, Excel, etc.) and chat with it using natural language. Powered by OpenAI and LangChain, it builds a SQL database for your data, answers questions by generating SQL queries, and visualizes results automatically.
Note: This project was built in 2023. Some functions may break due to upstream changes. Please open an issue if you encounter any problems!
- Conversational Data Analysis: Ask questions about your data in plain English.
- Automatic SQL Generation: Builds a SQL database from your data and converts your questions into SQL queries.
- Data Visualization: Automatically generates relevant charts (bar, line, scatter, etc.) for query results.
- Supports Multiple File Formats: CSV, Excel, JSON, Parquet, and more.
- Session History: Download and upload your chat history for later use.
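The load, query, and chart steps listed above can be illustrated with a minimal standard-library sketch. This is not the app's actual code, which relies on OpenAI and LangChain to produce the SQL. The sample CSV, the table name `data`, and the helpers `coerce`, `load_csv_into_sqlite`, `run_query`, and `bar_chart_spec` are all hypothetical names chosen for illustration, and the SQL string is hand-written rather than LLM-generated.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample rows standing in for an uploaded CSV file.
CSV_TEXT = """city,sales
Oslo,120
Lima,80
Oslo,30
"""


def coerce(value):
    """Turn numeric-looking strings into numbers; leave other text unchanged."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def load_csv_into_sqlite(csv_text, table):
    """Create an in-memory SQLite table from CSV text (assumes at least one row)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys())
    conn = sqlite3.connect(":memory:")
    col_defs = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(coerce(r[c]) for c in columns) for r in rows],
    )
    return conn


def run_query(conn, sql):
    """Run a SQL query and return the result as a list of dicts."""
    cur = conn.execute(sql)
    names = [d[0] for d in cur.description]
    return [dict(zip(names, row)) for row in cur.fetchall()]


def bar_chart_spec(records, x, y):
    """Build a Vega-Lite bar chart specification for the query result."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": records},
        "mark": "bar",
        "encoding": {
            "x": {"field": x, "type": "nominal"},
            "y": {"field": y, "type": "quantitative"},
        },
    }


conn = load_csv_into_sqlite(CSV_TEXT, "data")
# Hand-written SQL; in the app this string would come from the language model.
records = run_query(
    conn,
    "SELECT city, SUM(sales) AS total FROM data GROUP BY city ORDER BY city",
)
spec = bar_chart_spec(records, "city", "total")
print(json.dumps(spec, indent=2))
```

The printed JSON is a Vega-Lite specification that any Vega-Lite renderer can display. How the app itself chooses chart types is not shown here.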
git clone https://github.com/yourusername/chat_with_data_llm.git
cd chat_with_data_llm
It's recommended to use a virtual environment:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
- Get your API key from OpenAI.
- Enter your API key in the app sidebar when running the app.
streamlit run chat_data.py
- Upload your data: Use the sidebar to upload a CSV, Excel, or other supported file.
- Or use the example: Try the built-in Titanic dataset.
- Ask questions: Type questions such as:
  - "How many passengers survived?"
  - "Show me the average fare by passenger class."
  - "Plot the age distribution."
- View results: The app will show answers, SQL queries, and visualizations.
- Download/Upload history: Save your chat for later, or reload a previous session.
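To give a feel for what the example questions above might translate to, here is a small sketch that pairs each one with a plausible SQL query and runs it against SQLite. Everything in it is an assumption made for illustration: the table name `titanic`, the column names (`Survived`, `Pclass`, `Age`, `Fare`), and the four made-up rows, which are not taken from the real dataset. The queries the app actually generates may look different.

```python
import sqlite3

# Made-up rows for illustration only; NOT the real Titanic dataset.
ROWS = [
    # (Survived, Pclass, Age, Fare)
    (1, 1, 38.0, 70.0),
    (0, 3, 22.0, 8.0),
    (1, 3, 26.0, 8.0),
    (0, 1, 54.0, 50.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE titanic (Survived INTEGER, Pclass INTEGER, Age REAL, Fare REAL)"
)
conn.executemany("INSERT INTO titanic VALUES (?, ?, ?, ?)", ROWS)

# Hand-written SQL that a model could plausibly produce for each question.
EXAMPLES = {
    "How many passengers survived?":
        "SELECT COUNT(*) AS survivors FROM titanic WHERE Survived = 1",
    "Show me the average fare by passenger class.":
        "SELECT Pclass, AVG(Fare) AS avg_fare FROM titanic "
        "GROUP BY Pclass ORDER BY Pclass",
    # A distribution plot needs binned counts; here ages fall into 10-year bins.
    "Plot the age distribution.":
        "SELECT CAST(Age / 10 AS INTEGER) * 10 AS age_bin, COUNT(*) AS n "
        "FROM titanic GROUP BY age_bin ORDER BY age_bin",
}

results = {question: conn.execute(sql).fetchall() for question, sql in EXAMPLES.items()}
for question, rows in results.items():
    print(question, "->", rows)
```

The first two queries return values that can be shown directly as an answer or a table, while the third returns binned counts that a chart would then be drawn from.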
Contributions are welcome! To contribute:
- Fork the repo and create your branch.
- Add your feature or bugfix with clear code and docstrings.
- Add or update tests if relevant.
- Open a pull request with a clear description.
Frontend UI: Streamlit
Data Visualization: Vega-Lite & NL4DV
Some of the code for generating visualizations was borrowed from my other project, QRecNLI.