- [Running the service](#running-the-service)
  - [Demo service](#demo-service)
  - [Standard service](#standard-service)
- [Using the service](#using-the-service)
  - [Write cache](#write-cache)
  - [Query cache](#query-cache)
  - [Clear cache](#clear-cache)
- [Contributing](#contributing)

## News

- 🔥🔥[2025.06.28] Added a Websocket-based API, an in-memory cache, multiprocessing-based embedding with a configurable number of workers, bulk-insert support in the backend, Python 3.12 support, and massive performance improvements.
- 🔥🔥[2024.10.22] Added tasks for 1024 developer day.
- 🔥🔥[2024.04.09] Added Redis Search to store and retrieve embeddings in multi-tenant scenarios. This can reduce the interaction time between the cache and vector databases to 10 ms.
- 🔥🔥[2023.12.10] Integrated LLM embedding frameworks such as 'llmEmb', 'ONNX', 'PaddleNLP', and 'FastText', along with the image embedding framework 'timm', to bolster embedding functionality.

### Introduction

Codefuse-ModelCache is a standalone semantic cache for large language models (LLMs).\
By caching pre-generated model results, it reduces response time for similar requests and improves the user experience. <br />This project aims to optimize services by introducing a caching mechanism. It helps businesses and research institutions reduce the cost of inference deployment, improve model performance and efficiency, and provide scalable services for large models. Through open source, we aim to share and exchange technologies related to large model semantic caching.

# Quick start

You can find the start scripts at the root of the repository.\
There are standard services that require MySQL and Milvus configuration, and quick test services that use SQLite and FAISS (no database configuration required).\
The quick test services have `_demo` at the end of their file names.
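
For example, based only on the script names referenced in this README, the pairing looks like this when listing the repository root:

```shell
ls *4modelcache*.py
# Standard services (MySQL + Milvus):   flask4modelcache.py  fastapi4modelcache.py  websocket4modelcache.py
# Quick test services (SQLite + FAISS): flask4modelcache_demo.py  fastapi4modelcache_demo.py  websocket4modelcache_demo.py
```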
### Dependencies

```shell
pip install -r requirements.txt
```

## Running the service

### Demo service

Navigate to the root of the repository and run one of the following:

- `python flask4modelcache_demo.py`
- `python fastapi4modelcache_demo.py`
- `python websocket4modelcache_demo.py`

### Standard service

You can run the databases either via docker-compose or by installing them manually on your machine.

#### Starting databases using docker-compose

Navigate to the root of the repository and run:

```shell
docker-compose up -d
```
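
Once the containers are up, the usual Docker Compose commands can be used to check on or stop them, for example:

```shell
docker-compose ps       # list the containers started from this compose file and their status
docker-compose logs -f  # follow the database containers' logs
docker-compose down     # stop and remove the containers when you are done
```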

#### Manual database install

1. Install MySQL and import the SQL file from `reference_doc/create_table.sql`.
2. Install the vector database Milvus.
3. Configure database access in the following files (an illustrative sketch of their contents follows this list):
   - `modelcache/config/milvus_config.ini`
   - `modelcache/config/mysql_config.ini`
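
The sketch below is purely illustrative; the section and key names are assumptions, so treat the actual files under `modelcache/config/` as the source of truth:

```ini
; Hypothetical mysql_config.ini - the real keys are defined by the repository's own config file
[mysql]
host = 127.0.0.1
port = 3306
username = modelcache
password = <your-password>
database = modelcache

; Hypothetical milvus_config.ini - the real keys are defined by the repository's own config file
[milvus]
host = 127.0.0.1
port = 19530
```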

\-\-\-\-\-\-\-\-\-\-\-\-

After installing and running the databases, start a backend service of your choice:

- `python flask4modelcache.py`
- `python fastapi4modelcache.py`
- `python websocket4modelcache.py`

## Using the service

The service provides three core functionalities: Cache-Writing, Cache-Querying, and Cache-Clearing.\
The service supports both a RESTful API and a Websocket API:

RESTful API - `flask4modelcache.py` and `fastapi4modelcache.py`\
Websocket API - `websocket4modelcache.py`
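
A minimal Websocket client sketch is shown below; the URL, the port, and the assumption that the socket accepts the same JSON payloads as the RESTful API are not documented here, so check `websocket4modelcache.py` for the actual protocol:

```python
# Sketch only: endpoint, port, and message format are assumptions, not the documented protocol.
import asyncio
import json

import websockets


async def query_cache():
    # Assumed endpoint for the locally running Websocket service
    async with websockets.connect("ws://127.0.0.1:5000") as ws:
        payload = {
            "type": "query",
            "scope": {"model": "CODEGPT-1008"},
            "query": [{"role": "user", "content": "Who are you?"}],
        }
        await ws.send(json.dumps(payload))  # send the same JSON shape used by the RESTful API
        print(await ws.recv())              # print the cache response

asyncio.run(query_cache())
```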

### RESTful API

#### Write cache

```json
{
  "type": "insert",
  "scope": {
    "model": "CODEGPT-1008"
  },
  "chat_info": [
    {
      "query": [
        {
          "role": "user",
          "content": "Who are you?"
        },
        {
          "role": "system",
          "content": "You are an AI code assistant and you must provide neutral and harmless answers to help users solve code-related problems."
        }
      ],
      "answer": "Hello, I am an intelligent assistant. How can I assist you?"
    }
  ]
}
```

Code example:

```python
import json
import requests

# Address of the locally running RESTful service
url = 'http://127.0.0.1:5000/modelcache'
type = 'insert'
scope = {"model": "CODEGPT-1008"}
chat_info = [{"query": [{"role": "system", "content": "You are an AI code assistant and you must provide neutral and harmless answers to help users solve code-related problems."}, {"role": "user", "content": "Who are you?"}], "answer": "Hello, I am an intelligent assistant. How can I assist you?"}]
data = {'type': type, 'scope': scope, 'chat_info': chat_info}
headers = {"Content-Type": "application/json"}
res = requests.post(url, headers=headers, json=json.dumps(data))
```

\-\-\-\-\-\-\-\-\-\-\-\-

#### Query cache

```json
{
  "type": "query",
  "scope": {
    "model": "CODEGPT-1008"
  },
  "query": [
    {
      "role": "user",
      "content": "Who are you?"
    },
    {
      "role": "system",
      "content": "You are an AI code assistant and you must provide neutral and harmless answers to help users solve code-related problems."
    }
  ]
}
```

We've implemented several key updates to our repository. We've resolved network issues with Hugging Face and improved inference speed by introducing local embedding capabilities. Due to limitations in SQLAlchemy, we've redesigned our relational database interaction module for more flexible operations. We've added multi-tenancy support to ModelCache, recognizing the need for multiple users and models in LLM products. Lastly, we've made initial adjustments for better compatibility with system commands and multi-turn dialogues.

## Features

In ModelCache, we incorporated the core principles of GPTCache.\
ModelCache has four modules: adapter, embedding, similarity, and data_manager.

- The adapter module orchestrates the business logic for various tasks, integrating the embedding, similarity, and data_manager modules.
- The embedding module converts text into semantic vector representations and transforms user queries into vectors.

- Embedded into LLM products using a Redis-like caching mode
- Provided semantic caching without interfering with LLM calls, security audits, and other functions
- Compatible with all LLM services
- [x] Multiprocessing-based embedding:
  - True parallel embedding, serving multiple requests at once
  - Highly scalable, supports configuring the number of embedding workers
  - Enables efficient use of available computing resources
- [x] Multiple model loading:
  - Supported local embedding model loading, and resolved Hugging Face network connectivity issues
  - Supported loading embedding layers from various pre-trained models

### Adapter

- [x] Register adapter for Milvus: based on the "model" parameter in the scope, initialize the corresponding Collection and perform the load operation.

### Embedding model & inference

### Service

- [x] Supports FastAPI.
- [ ] Add a visual interface to offer a more direct user experience.