
Commit a48fb2a: Update README.md (1 parent: 559abfc)


README.md

Lines changed: 183 additions & 65 deletions
@@ -22,11 +22,10 @@ ModelCache
2222
- [Architecture](#architecture)
2323
- [Quick start](#quick-start)
2424
- [Dependencies](#dependencies)
25-
- [Start service](#start-service)
26-
- [Start demo](#start-demo)
27-
- [Service Startup With Docker-compose](#service-startup-with-docker-compose)
28-
- [Start normal service](#start-normal-service)
29-
- [Visit the service](#visit-the-service)
25+
- [Running the service](#running-the-service)
26+
- [Demo service](#demo-service)
27+
- [Standard service](#standard-service)
28+
- [Using the service](#using-the-service)
3029
- [Write cache](#write-cache)
3130
- [Query cache](#query-cache)
3231
- [Clear cache](#clear-cache)
@@ -43,7 +42,7 @@ ModelCache
4342
- [Contributing](#contributing)
4443

4544
## News
46-
45+
- 🔥🔥[2025.06.28] Added a WebSocket-based API, an in-memory cache, multiprocessing-based embedding with a configurable number of workers, bulk-insert support in the backend, Python 3.12 support, and significant performance improvements.
4746
- 🔥🔥[2024.10.22] Added tasks for 1024 developer day.
4847
- 🔥🔥[2024.04.09] Added Redis Search to store and retrieve embeddings in multi-tenant. This can reduce the interaction time between Cache and vector databases to 10ms.
4948
- 🔥🔥[2023.12.10] Integrated LLM embedding frameworks such as 'llmEmb', 'ONNX', 'PaddleNLP', 'FastText', and the image embedding framework 'timm' to bolster embedding functionality.
@@ -52,18 +51,18 @@ ModelCache
5251

5352
### Introduction
5453

55-
Codefuse-ModelCache is a semantic cache for large language models (LLMs). By caching pre-generated model results, it reduces response time for similar requests and improves user experience. <br />This project aims to optimize services by introducing a caching mechanism. It helps businesses and research institutions reduce the cost of inference deployment, improve model performance and efficiency, and provide scalable services for large models. Through open-source, we aim to share and exchange technologies related to large model semantic cache.
54+
Codefuse-ModelCache is a standalone semantic cache for large language models (LLMs).\
55+
By caching pre-generated model results, it reduces response time for similar requests and improves the user experience. <br />This project aims to optimize services by introducing a caching mechanism. It helps businesses and research institutions reduce the cost of inference deployment, improve model performance and efficiency, and provide scalable services for large models. Through open source, we aim to share and exchange technologies related to semantic caching for large models.
5656

5757
## Architecture
5858

5959
![modelcache modules](docs/modelcache_modules_20240409.png)
6060

61-
## Quick start
62-
63-
You can find the start script in `flask4modelcache.py` and `flask4modelcache_demo.py`.
61+
## Quick start
6462

65-
- `flask4modelcache_demo.py`: A quick test service that embeds SQLite and FAISS. No database configuration required.
66-
- `flask4modelcache.py`: The standard service that requires MySQL and Milvus configuration.
63+
You can find the start scripts at the root of the repository.\
64+
There are standard services, which require MySQL and Milvus configuration, and quick-test services, which use SQLite and FAISS (no database configuration required).\
65+
The quick-test services have `_demo` at the end of their file names.
6766

6867
### Dependencies
6968

@@ -74,74 +73,107 @@ You can find the start script in `flask4modelcache.py` and `flask4modelcache_dem
7473
pip install -r requirements.txt
7574
```
7675

77-
### Start service
76+
## Running the service
7877

79-
#### Start demo
78+
### Demo service
79+
Navigate to the root of the repository and run one of the following:
80+
- `python flask4modelcache_demo.py`
81+
- `python fastapi4modelcache_demo.py`
82+
- `python websocket4modelcache_demo.py`
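Once one of these demo scripts is running, you can confirm it responds with a small write request. The snippet below is a minimal sketch, assuming the Flask demo exposes the same `http://127.0.0.1:5000/modelcache` endpoint used by the examples later in this README; adjust the URL and port for the FastAPI or WebSocket variants.
```python
import json

import requests

# Assumption: the Flask demo listens on the same endpoint as the standard
# Flask service shown in the examples below; change the URL if it differs.
url = "http://127.0.0.1:5000/modelcache"
headers = {"Content-Type": "application/json"}

data = {
    "type": "insert",
    "scope": {"model": "CODEGPT-1008"},
    "chat_info": [
        {
            "query": [{"role": "user", "content": "Who are you?"}],
            "answer": "Hello, I am an intelligent assistant. How can I assist you?",
        }
    ],
}

# The README's own examples send a JSON string via the `json=` argument,
# so the same convention is kept here.
res = requests.post(url, headers=headers, json=json.dumps(data))
print(res.status_code, res.text)
```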
8083

81-
1. Download the embedding model bin file from [Hugging Face](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main). Place it in the `model/text2vec-base-chinese` folder.
82-
2. Start the backend service:
84+
### Standard service
8385

84-
```shell
85-
cd CodeFuse-ModelCache
86-
```
86+
You can either run the databases via docker-compose or install them manually on your machine.
8787

88-
```shell
89-
python flask4modelcache_demo.py
90-
```
91-
#### Service Startup With Docker-compose
92-
1. Download the embedding model bin file from [Hugging Face](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main). Place it in the `model/text2vec-base-chinese` folder.
93-
2. Configure docker network, only need to execute once
94-
```shell
95-
cd CodeFuse-ModelCache
96-
```
88+
#### Starting databases using docker-compose
89+
Navigate to the root of the repository and run:
9790
```shell
98-
docker network create modelcache
91+
docker-compose up -d
9992
```
100-
3. Execute the docker-compose command
101-
```shell
102-
# When the modelcache image does not exist locally for the first time, or when the Dockerfile is changed
103-
docker-compose up --build
104-
105-
# This is not the first run and the Dockerfile has not changed
106-
docker-compose up
107-
```
108-
#### Start normal service
109-
110-
Before you start standard service, do these steps:
111-
93+
#### Manual database install
11294
1. Install MySQL and import the SQL file from `reference_doc/create_table.sql`.
11395
2. Install vector database Milvus.
11496
3. Configure database access in:
115-
- `modelcache/config/milvus_config.ini`
116-
- `modelcache/config/mysql_config.ini`
117-
4. Download the embedding model bin file from [Hugging Face](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main). Put it in `model/text2vec-base-chinese`.
118-
5. Start the backend service:
119-
120-
```bash
121-
python flask4modelcache.py
122-
```
123-
124-
## Visit the service
125-
126-
The service provides three core RESTful API functionalities: Cache-Writing, Cache-Querying, and Cache-Clearing.
127-
128-
### Write cache
129-
97+
- `modelcache/config/milvus_config.ini`
98+
- `modelcache/config/mysql_config.ini`
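As an optional sanity check, the sketch below (standard library only, not part of the project) prints whatever sections and keys these files contain, so you can confirm the hosts and credentials were filled in; it makes no assumptions about the actual key names.
```python
import configparser

# Lists the sections and keys found in the database config files so you can
# confirm they were filled in as expected.
for path in (
    "modelcache/config/mysql_config.ini",
    "modelcache/config/milvus_config.ini",
):
    cfg = configparser.ConfigParser()
    cfg.read(path)
    print(path)
    for section in cfg.sections():
        print("  ", section, dict(cfg[section]))
```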
99+
100+
\-\-\-\-\-\-\-\-\-\-\-\-
101+
102+
After installing and running the databases, start a backend service of your choice:
103+
- `python flask4modelcache.py`
104+
- `python fastapi4modelcache.py`
105+
- `python websocket4modelcache.py`
106+
107+
## Using the service
108+
109+
The service provides three core functionalities: Cache-Writing, Cache-Querying, and Cache-Clearing.\
110+
The service supports both a RESTful API and a WebSocket API.
111+
112+
RESTful API - `flask4modelcache.py` and `fastapi4modelcache.py`\
113+
WebSocket API - `websocket4modelcache.py`
114+
115+
### RESTful API
116+
#### Write cache
117+
118+
```json
119+
{
120+
"type": "insert",
121+
"scope": {
122+
"model": "CODEGPT-1008"
123+
},
124+
"chat_info": [
125+
{
126+
"query": [
127+
{
128+
"role": "user",
129+
"content": "Who are you?"
130+
},
131+
{
132+
"role": "system",
133+
"content": "You are an AI code assistant and you must provide neutral and harmless answers to help users solve code-related problems."
134+
}
135+
],
136+
"answer": "Hello, I am an intelligent assistant. How can I assist you?"
137+
}
138+
]
139+
}
140+
```
141+
Code example
130142
```python
131143
import json
132144
import requests
133145
url = 'http://127.0.0.1:5000/modelcache'
134146
type = 'insert'
135147
scope = {"model": "CODEGPT-1008"}
136-
chat_info = [{"query": [{"role": "system", "content": "You are an AI code assistant and you must provide neutral and harmless answers to help users solve code-related problems."}, {"role": "user", "content": "你是谁?"}],
137-
"answer": "Hello, I am an intelligent assistant. How can I assist you?"}]
148+
chat_info = [{"query": [{"role": "system", "content": "You are an AI code assistant and you must provide neutral and harmless answers to help users solve code-related problems."}, {"role": "user", "content": "Who are you?"}],"answer": "Hello, I am an intelligent assistant. How can I assist you?"}]
138149
data = {'type': type, 'scope': scope, 'chat_info': chat_info}
150+
139151
headers = {"Content-Type": "application/json"}
140152
res = requests.post(url, headers=headers, json=json.dumps(data))
141153
```
142154

143-
### Query cache
144-
155+
\-\-\-\-\-\-\-\-\-\-\-\-
156+
157+
#### Query cache
158+
```json
159+
{
160+
"type": "query",
161+
"scope": {
162+
"model": "CODEGPT-1008"
163+
},
164+
"query": [
165+
{
166+
"role": "user",
167+
"content": "Who are you?"
168+
},
169+
{
170+
"role": "system",
171+
"content": "You are an AI code assistant and you must provide neutral and harmless answers to help users solve code-related problems."
172+
}
173+
]
174+
}
175+
```
176+
Code example
145177
```python
146178
import json
147179
import requests
@@ -155,8 +187,19 @@ headers = {"Content-Type": "application/json"}
155187
res = requests.post(url, headers=headers, json=json.dumps(data))
156188
```
157189

158-
### Clear cache
159-
190+
\-\-\-\-\-\-\-\-\-\-\-\-
191+
192+
#### Clear cache
193+
```json
194+
{
195+
"type": "remove",
196+
"scope": {
197+
"model": "CODEGPT-1008"
198+
},
199+
"remove_type": "truncate_by_model"
200+
}
201+
```
202+
Code example
160203
```python
161204
import json
162205
import requests
@@ -170,6 +213,76 @@ headers = {"Content-Type": "application/json"}
170213
res = requests.post(url, headers=headers, json=json.dumps(data))
171214
```
172215

216+
### WebSocket API
217+
218+
The WebSocket API is inherently asynchronous, so each request is wrapped with a request ID so that it can be tracked.\
219+
The service returns a response carrying the same request ID that was supplied with the request.
220+
221+
#### Write cache
222+
```json
223+
{
224+
"requestId": "943e9450-3467-4d73-9b32-68a337691f6d",
225+
"payload": {
226+
"type": "insert",
227+
"scope": {
228+
"model": "CODEGPT-1008"
229+
},
230+
"chat_info": [
231+
{
232+
"query": [
233+
{
234+
"role": "user",
235+
"content": "Who are you?"
236+
},
237+
{
238+
"role": "system",
239+
"content": "You are an AI code assistant and you must provide neutral and harmless answers to help users solve code-related problems."
240+
}
241+
],
242+
"answer": "Hello, I am an intelligent assistant. How can I assist you?"
243+
}
244+
]
245+
}
246+
}
247+
```
248+
249+
#### Query cache
250+
```json
251+
{
252+
"requestId": "51f00484-acc9-406f-807d-29fba672473e",
253+
"payload": {
254+
"type": "query",
255+
"scope": {
256+
"model": "CODEGPT-1008"
257+
},
258+
"query": [
259+
{
260+
"role": "user",
261+
"content": "Who are you?"
262+
},
263+
{
264+
"role": "system",
265+
"content": "You are an AI code assistant and you must provide neutral and harmless answers to help users solve code-related problems."
266+
}
267+
]
268+
}
269+
}
270+
```
271+
272+
#### Clear cache
273+
```json
274+
{
275+
"requestId": "f96bbc87-5ef9-4161-9e96-3076ca97b4b9",
276+
"payload": {
277+
"type": "remove",
278+
"scope": {
279+
"model": "CODEGPT-1008"
280+
},
281+
"remove_type": "truncate_by_model"
282+
}
283+
}
284+
```
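Because replies are matched to requests only by `requestId`, a client generates an ID, sends the wrapped payload, and waits for the message that echoes that ID back. The sketch below uses the third-party `websockets` package and a simplified query payload; the `ws://127.0.0.1:9000` address is an assumption, so check which host and port `websocket4modelcache.py` actually binds to.
```python
import asyncio
import json
import uuid

import websockets  # third-party: pip install websockets

# Assumption: adjust the address to wherever websocket4modelcache.py listens.
URI = "ws://127.0.0.1:9000"


async def query_cache() -> dict:
    request_id = str(uuid.uuid4())
    message = {
        "requestId": request_id,
        "payload": {
            "type": "query",
            "scope": {"model": "CODEGPT-1008"},
            "query": [{"role": "user", "content": "Who are you?"}],
        },
    }
    async with websockets.connect(URI) as ws:
        await ws.send(json.dumps(message))
        # Responses arrive asynchronously; keep reading until the reply
        # carrying our requestId shows up.
        while True:
            response = json.loads(await ws.recv())
            if response.get("requestId") == request_id:
                return response


if __name__ == "__main__":
    print(asyncio.run(query_cache()))
```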
285+
173286
## Function comparison
174287

175288
We've implemented several key updates to our repository. We've resolved network issues with Hugging Face and improved inference speed by introducing local embedding capabilities. Due to limitations in SQLAlchemy, we've redesigned our relational database interaction module for more flexible operations. We've added multi-tenancy support to ModelCache, recognizing the need for multiple users and models in LLM products. Lastly, we've made initial adjustments for better compatibility with system commands and multi-turn dialogues.
@@ -297,7 +410,8 @@ We've implemented several key updates to our repository. We've resolved network
297410

298411
## Features
299412

300-
In ModelCache, we incorporated the core principles of GPTCache. ModelCache has four modules: adapter, embedding, similarity, and data_manager.
413+
In ModelCache, we incorporated the core principles of GPTCache.\
414+
ModelCache has four modules: adapter, embedding, similarity, and data_manager.
301415

302416
- The adapter module orchestrates the business logic for various tasks, integrating the embedding, similarity, and data_manager modules.
303417
- The embedding module converts text into semantic vector representations, and transforms user queries into vectors.
@@ -310,6 +424,10 @@ To make ModelCache more suitable for industrial use, we made several improvement
310424
- Embedded into LLM products using a Redis-like caching mode
311425
- Provided semantic caching without interfering with LLM calls, security audits, and other functions
312426
- Compatible with all LLM services
427+
- [x] Multiprocessing-based embedding:
428+
- True parallel embedding, serving multiple requests at once
429+
- Highly scalable, supports configuring the number of embedding workers.
430+
- Enables efficient use of available computing resources
313431
- [x] Multiple model loading:
314432
- Supported local embedding model loading, and resolved Hugging Face network connectivity issues
315433
- Supported loading embedding layers from various pre-trained models
@@ -329,7 +447,7 @@ To make ModelCache more suitable for industrial use, we made several improvement
329447

330448
### Adapter
331449

332-
- [ ] Register adapter for Milvus:Based on the "model" parameter in the scope, initialize the corresponding Collection and perform the load operation.
450+
- [x] Register adapter for Milvus: based on the "model" parameter in the scope, initialize the corresponding Collection and perform the load operation.
333451

334452
### Embedding model&inference
335453

@@ -351,7 +469,7 @@ To make ModelCache more suitable for industrial use, we made several improvement
351469

352470
### Service
353471

354-
- [ ] Supports FastAPI.
472+
- [x] Supports FastAPI.
355473
- [ ] Add visual interface to offer a more direct user experience.
356474

357475
## Acknowledgements
