Commit b97b420

update readme
1 parent 9f33e0c commit b97b420

1 file changed, 228 additions and 20 deletions

README.md

@@ -1,24 +1,232 @@
-The model is trained and the weights are coming from langchain or huggingface. Deploy the model on server.
-
-# TODO
-- make code readable
-- tests
-- pre-commit hooks
-- production ready code
-- kubernetes + helm
-- CI/CD
-- logs
-- sentry integration
-- validation (max file size, no response, edge cases, etc.)
-- make UI?
-- OOP?
-- save logs to a file?
-- cover missing lines in tests
-- grafana, datadog?
# Document Summarization API

A FastAPI-based web service that summarizes documents using OpenAI's language models via LangChain. It accepts PDF and plain text documents and is containerized for deployment with Docker and Kubernetes.

## Features
-- supports different languages

-## notes:
-install minikube using:
- 📄 **Multi-format Support**: Process PDF files and plain text documents
- 🤖 **AI-Powered Summarization**: Leverages OpenAI's GPT models through LangChain
- 🚀 **High Performance**: Built with FastAPI for async processing and high throughput
- 🐳 **Containerized**: Docker-ready with Kubernetes manifests for scalable deployment
- 🔧 **Production Ready**: Includes proper error handling, logging, and environment configuration
- 📊 **Interactive API**: Swagger UI documentation available at `/docs`
- **Fast**: Async processing with uvicorn ASGI server
- 🔒 **Secure**: Environment-based configuration for API keys

## Quick Start

### Prerequisites

- Python 3.13+
- Docker (optional, for containerized deployment)
- Minikube (optional, for Kubernetes deployment)
- OpenAI API key

### 1. Local Development

#### Clone and Setup
```bash
git clone https://github.com/seedlit/summarize.git
cd summarize
```

#### Install Dependencies
Using uv (recommended):
```bash
uv sync --all-groups
```

#### Environment Configuration
Create a `.env` file in the project root:
```bash
OPENAI_API_KEY=your_openai_api_key_here
```
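
How the application reads this key is not shown in this README. As a minimal sketch, assuming the variable ends up in the process environment (for example via `--env-file` or a dotenv loader), a fail-fast check could look like the following. `get_openai_api_key` is a hypothetical helper name, not part of the project.

```python
import os


def get_openai_api_key(env=None):
    """Return OPENAI_API_KEY, raising early if it is missing or blank."""
    # `env` defaults to the real process environment; a plain dict can be
    # passed instead, which makes the check easy to test.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env")
    return key
```

Calling a check like this at startup surfaces a missing key immediately, rather than on the first summarization request.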

#### Run the Application
```bash
# Using uv
uv run uvicorn src.app:app --host 0.0.0.0 --port 8000
```

The API will be available at:
- **API**: http://localhost:8000
- **Interactive Docs**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc

### 2. Docker Deployment

#### Build the Docker Image
```bash
docker build -t summarize-app:latest .
```

#### Run with Docker
```bash
docker run -it -p 8000:8000 --env-file .env summarize-app:latest
```

### 3. Kubernetes Deployment

#### Prerequisites
```bash
# Install minikube (macOS)
brew install minikube

# Start minikube
minikube start
```

#### Deploy to Kubernetes
```bash
# Create secret for environment variables
kubectl create secret generic summarize-env --from-env-file=.env

# Load Docker image into minikube
minikube image load summarize-app:latest

# Deploy the application
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml

# Check deployment status
kubectl get pods
kubectl get services

# Access the application
minikube service summarize-service --url
```

## API Usage

### Summarize Document

**Endpoint**: `POST /summarize`

**Description**: Upload a document (PDF or text file) and receive an AI-generated summary.

#### Example using curl:
```bash
curl -X POST "http://localhost:8000/summarize" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@your_document.pdf"
```

#### Example using Python:
```python
import requests

# Upload the file as multipart form data under the "file" field
with open("document.pdf", "rb") as f:
    response = requests.post(
        "http://localhost:8000/summarize",
        files={"file": f},
    )

# Fail loudly on 4XX/5XX instead of parsing an error body as a summary
response.raise_for_status()
summary = response.json()
print(summary["summary"])
```

#### Response Format:
```json
{
  "summary": "Generated summary text here..."
}
```
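
A client may want to guard against bodies that do not match this shape. The sketch below assumes a successful response is a JSON object with a string `summary` field, as in the example above. `extract_summary` is a hypothetical helper, not part of the project.

```python
import json


def extract_summary(body: str) -> str:
    """Parse a response body and return the text under the "summary" key."""
    payload = json.loads(body)
    # Assumption: a successful response is a JSON object whose "summary"
    # field is a string, as in the documented response format.
    if not isinstance(payload, dict) or not isinstance(payload.get("summary"), str):
        raise ValueError("response does not contain a string 'summary' field")
    return payload["summary"]


print(extract_summary('{"summary": "Generated summary text here..."}'))
```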

## Development

### Code Quality Tools

The project includes pre-commit hooks for code quality:

```bash
# Install pre-commit hooks
uv run pre-commit install

# Run all checks
uv run pre-commit run --all-files
```

### Running Tests
```bash
# Run tests with coverage
uv run pytest --cov=src tests/

# Run specific test file
uv run pytest tests/test_summarize_document.py -v
```

### Project Structure
```
summarize/
├── src/
│   ├── app.py                  # FastAPI application
│   ├── summarize_document.py   # Core summarization logic
│   ├── utils.py                # Utility functions
│   ├── constants.py            # Application constants
│   └── exceptions.py           # Custom exception classes
├── tests/                      # Tests
├── k8s/                        # Kubernetes manifests
│   ├── deployment.yaml         # Application deployment
│   └── service.yaml            # Service configuration
├── Dockerfile                  # Container definition
├── pyproject.toml              # Project configuration
└── README.md                   # This file
```

## Configuration

### Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| `OPENAI_API_KEY` | OpenAI API key for language model access | Yes |

## Scaling and Production

### Kubernetes Features
- **Horizontal scaling**: Runs 3 replicas by default
- **Load balancing**: Built-in Kubernetes service load balancing
- **Health checks**: Ready for liveness and readiness probes
- **Secret management**: Environment variables stored as Kubernetes secrets

### Performance Considerations
- Async request processing with FastAPI
- Containerized for horizontal scaling
- Stateless design for easy load balancing
- Efficient PDF processing with PyPDF
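
Whether the project offloads blocking work to threads is not stated here. To illustrate why async handling helps throughput, the sketch below moves a blocking call onto a worker thread with `asyncio.to_thread`, so several uploads are processed concurrently. `extract_text` and `handle_upload` are hypothetical stand-ins, not the project's functions.

```python
import asyncio
import time


def extract_text(data: bytes) -> str:
    """Hypothetical stand-in for blocking work such as PDF parsing."""
    time.sleep(0.05)  # simulate a slow, blocking call
    return data.decode("utf-8", errors="replace")


async def handle_upload(data: bytes) -> str:
    # Run the blocking function in a worker thread so the event loop
    # stays free to serve other requests in the meantime.
    return await asyncio.to_thread(extract_text, data)


async def main() -> list[str]:
    # Three uploads are processed concurrently rather than one by one.
    return await asyncio.gather(*(handle_upload(d) for d in (b"a", b"b", b"c")))


results = asyncio.run(main())
```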

## Error Handling

The API reports errors with standard HTTP status codes:
- **4XX**: Client errors (invalid file format, missing filename)
- **5XX**: Server errors (summarization failures, API issues)

All errors return structured JSON responses with descriptive messages.
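
The exact error schema is not shown in this README. As a rough sketch of the idea, the following maps an exception to a status code and a JSON body. `UnsupportedFileFormatError`, `error_response`, and the `detail` key are assumptions made for illustration.

```python
import json


class UnsupportedFileFormatError(Exception):
    """Hypothetical error for uploads that are neither PDF nor text."""


def error_response(exc: Exception) -> tuple[int, str]:
    """Map an exception to an HTTP status code and a JSON body."""
    # Client mistakes map to 4XX; anything unexpected maps to 5XX.
    status = 400 if isinstance(exc, UnsupportedFileFormatError) else 500
    return status, json.dumps({"detail": str(exc)})


status, body = error_response(UnsupportedFileFormatError("only PDF and .txt are supported"))
print(status, body)
```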

## Supported File Formats

- **PDF**: Binary PDF files with text content
- **Text Files**: Plain text files (.txt)
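
How the service tells these formats apart is not described here. A minimal sketch, assuming the check is based on the file extension alone, might look like this. `SUPPORTED_EXTENSIONS` and `detect_format` are hypothetical names.

```python
from pathlib import Path

# Assumption: formats are told apart by file extension alone.
SUPPORTED_EXTENSIONS = {".pdf": "pdf", ".txt": "text"}


def detect_format(filename: str) -> str:
    """Return "pdf" or "text", or raise ValueError for anything else."""
    if not filename:
        raise ValueError("missing filename")
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file format: {suffix or 'no extension'}")
    return SUPPORTED_EXTENSIONS[suffix]
```

An extension check is cheap but easy to fool; inspecting the file's leading bytes would be a stricter alternative.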

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Run tests and pre-commit hooks
5. Commit your changes (`git commit -m 'Add amazing feature'`)
6. Push to the branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request

## Roadmap

- [ ] Support for additional file formats (DOCX, RTF)
- [ ] Batch processing capabilities
- [ ] Caching layer for improved performance
- [ ] Monitoring and metrics (Prometheus, Grafana)
- [ ] Enhanced logging and error tracking (Sentry)
- [ ] Web UI for document upload
- [ ] Multi-language support
- [ ] Custom summarization parameters
