
Commit 2a0a00f

add more content
1 parent 2556eb6 commit 2a0a00f


6 files changed: +351 −37 lines changed


README.md

Lines changed: 51 additions & 37 deletions
@@ -23,14 +23,14 @@ ModelCache
 - [Quick start](#quick-start)
 - [Dependencies](#dependencies)
 - [Start service](#start-service)
-- [Start Demo](#start-demo)
+- [Start demo](#start-demo)
 - [Start normal service](#start-normal-service)
-- [Access the service](#access-the-service)
+- [Visit the service](#visit-the-service)
 - [Write cache](#write-cache)
 - [Query cache](#query-cache)
 - [Clear cache](#clear-cache)
 - [Function comparison](#function-comparison)
-- [Core-Features](#core-features)
+- [Features](#features)
 - [Todo List](#todo-list)
 - [Adapter](#adapter)
 - [Embedding model\&inference](#embedding-modelinference)
@@ -60,12 +60,12 @@ Codefuse-ModelCache is a semantic cache for large language models (LLMs). By cac
 
 You can find the start script in `flask4modelcache.py` and `flask4modelcache_demo.py`.
 
-- `flask4modelcache_demo.py` is a quick test service that embeds sqlite and faiss. You do not need to be concerned about database-related matters.
-- `flask4modelcache.py` is the normal service that requires configuration of MySQL and Milvus.
+- `flask4modelcache_demo.py`: A quick test service that embeds SQLite and FAISS. No database configuration required.
+- `flask4modelcache.py`: The standard service that requires MySQL and Milvus configuration.
 
 ### Dependencies
 
-- Python: V3.8 and above
+- Python: V3.8 or above
 - Package installation
 
 ```shell
@@ -74,10 +74,10 @@ You can find the start script in `flask4modelcache.py` and `flask4modelcache_dem
 
 ### Start service
 
-#### Start Demo
+#### Start demo
 
-1. Download the embedding model bin file on [Hugging Face](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main). Place the downloaded bin file in the model/text2vec-base-chinese folder.
-2. Start the backend service by using `flask4modelcache_dome.py`.
+1. Download the embedding model bin file from [Hugging Face](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main). Place it in the `model/text2vec-base-chinese` folder.
+2. Start the backend service:
 
 ```shell
 cd CodeFuse-ModelCache
@@ -89,19 +89,23 @@ You can find the start script in `flask4modelcache.py` and `flask4modelcache_dem
 
 #### Start normal service
 
-Before you start normal service, make sure that you have completed these steps:
+Before you start the standard service, complete these steps:
 
-1. Install the relational database MySQL and import the SQL file to create the data tables. You can find the SQL file in `reference_doc/create_table.sql`.
+1. Install MySQL and import the SQL file from `reference_doc/create_table.sql`.
 2. Install vector database Milvus.
-3. Add the database access information to the configuration files:
-   1. `modelcache/config/milvus_config.ini`
-   2. `modelcache/config/mysql_config.ini`
-4. Download the embedding model bin file from [Hugging Face](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main). Put the bin file in the `model/text2vec-base-chinese` directory.
-5. Start the backend service by using the `flask4modelcache.py` script.
+3. Configure database access in:
+   - `modelcache/config/milvus_config.ini`
+   - `modelcache/config/mysql_config.ini`
+4. Download the embedding model bin file from [Hugging Face](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main). Put it in `model/text2vec-base-chinese`.
+5. Start the backend service:
 
-## Access the service
+   ```bash
+   python flask4modelcache.py
+   ```
 
-The current service provides three core functionalities through RESTful API.: Cache-Writing, Cache-Querying, and Cache-Clearing. Demos:
+## Visit the service
+
+The service provides three core RESTful API functionalities: Cache-Writing, Cache-Querying, and Cache-Clearing.
 
 ### Write cache
 
@@ -273,25 +277,35 @@ We've implemented several key updates to our repository. We've resolved network
 </tr>
 </table>
 
-## Core-Features
-
-In ModelCache, we adopted the main idea of GPTCache, includes core modules: adapter, embedding, similarity, and data_manager. The adapter module is responsible for handling the business logic of various tasks and can connect the embedding, similarity, and data_manager modules. The embedding module is mainly responsible for converting text into semantic vector representations, it transforms user queries into vector form.The rank module is used for sorting and evaluating the similarity of the recalled vectors. The data_manager module is primarily used for managing the database. In order to better facilitate industrial applications, we have made architectural and functional upgrades as follows:
-
-- [x] We have modified it similar to Redis and embedded it into the LLMs product, providing semantic caching capabilities. This ensures that it does not interfere with LLM calls, security audits, and other functionalities, achieving compatibility with all large-scale model services.
-- [x] Multiple Model Loading Schemes:
-  - Support loading local embedding models to address Hugging Face network connectivity issues.
-  - Support loading various pretrained model embedding layers.
-- [x] Data Isolation Capability
-  - Environment Isolation: Can pull different database configurations based on the environment to achieve environment isolation (dev, prepub, prod).
-  - Multi-tenant Data Isolation: Dynamically create collections based on the model for data isolation, addressing data isolation issues in multi-model/services scenarios in LLMs products.
-- [x] Support for System Commands: Adopting a concatenation approach to address the issue of system commands in the prompt format.
-- [x] Differentiation of Long and Short Texts: Long texts pose more challenges for similarity evaluation. To address this, we have added differentiation between long and short texts, allowing for separate configuration of threshold values for determining similarity.
-- [x] Milvus Performance Optimization: The consistency_level of Milvus has been adjusted to "Session" level, which can result in better performance.
-- [x] Data Management Capability:
-  - Ability to clear the cache, used for data management after model upgrades.
-  - Hitquery recall for subsequent data analysis and model iteration reference.
-  - Asynchronous log write-back capability for data analysis and statistics.
-  - Added model field and data statistics field for feature expansion.
+## Features
+
+In ModelCache, we incorporated the core principles of GPTCache. ModelCache has four modules: adapter, embedding, similarity, and data_manager.
+
+- The adapter module orchestrates the business logic for various tasks, integrating the embedding, similarity, and data_manager modules.
+- The embedding module converts text into semantic vector representations and transforms user queries into vectors.
+- The rank module ranks and evaluates the similarity of recalled vectors.
+- The data_manager module manages the databases.
+
+To make ModelCache more suitable for industrial use, we made several improvements to its architecture and functionality:
+
+- [x] Architectural adjustment (lightweight integration):
+  - Embedded into LLM products using a Redis-like caching mode.
+  - Provided semantic caching without interfering with LLM calls, security audits, and other functions.
+  - Compatible with all LLM services.
+- [x] Multiple model loading:
+  - Supported local embedding model loading, and resolved Hugging Face network connectivity issues.
+  - Supported loading embedding layers from various pre-trained models.
+- [x] Data isolation:
+  - Environment isolation: Read different database configurations based on the environment. Isolate development, staging, and production environments.
+  - Multi-tenant data isolation: Dynamically create collections based on models for data isolation, addressing data separation issues in multi-model/service scenarios within large language model products.
+- [x] Supported system instruction: Adopted a concatenation approach to resolve issues with system instructions in the prompt paradigm.
+- [x] Long and short text differentiation: Long texts bring more challenges for similarity assessment. Added differentiation between long and short texts, allowing for separate threshold configurations.
+- [x] Milvus performance optimization: Adjusted the Milvus consistency level to "Session" for better performance.
+- [x] Data management:
+  - One-click cache clearing to enable easy data management after model upgrades.
+  - Recall of hit queries for subsequent data analysis and model iteration reference.
+  - Asynchronous log write-back for data analysis and statistics.
+  - Added model field and data statistics field to enhance features.
 
 ## Todo List
 
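The Milvus performance item in the features list above refers to the collection consistency level. As a rough, hypothetical sketch only — the collection name, fields, and vector dimension below are illustrative and not taken from this commit — session-level consistency can be requested when a pymilvus collection is created:

```python
# Illustrative sketch only: collection name, field names, and dimension are
# hypothetical and not part of this commit. Assumes pymilvus 2.x and a local Milvus.
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

connections.connect(host="127.0.0.1", port="19530")

fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=768),
]
schema = CollectionSchema(fields, description="example semantic-cache collection")

# "Session" consistency trades strict read-after-write guarantees across clients
# for lower query latency, which is the optimization the feature list describes.
collection = Collection(
    name="modelcache_example",
    schema=schema,
    consistency_level="Session",
)
```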

docs/1.what-is-model-cache.md

Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@
+# What is ModelCache
+
+In ModelCache, we adopted the main ideas of GPTCache. It includes four core modules: adapter, embedding, similarity, and data_manager. The adapter module handles the business logic of various tasks and connects the embedding, similarity, and data_manager modules. The embedding module converts text into semantic vector representations; it transforms user queries into vector form. The rank module sorts and evaluates the similarity of the recalled vectors. The data_manager module manages the database. To better support industrial applications, we have also made architectural and functional upgrades.
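To make the module relationship above concrete, here is a minimal, hypothetical sketch of an adapter-style cache lookup. None of these class or function names come from the ModelCache codebase; they only illustrate the embed → recall → rank → data-manager flow described above.

```python
# Hypothetical sketch of the adapter / embedding / rank / data_manager flow.
# The names below are illustrative and are NOT the ModelCache API.
from typing import Callable, List, Optional, Tuple

class CacheAdapter:
    def __init__(
        self,
        embed: Callable[[str], List[float]],                        # embedding module
        search: Callable[[List[float]], List[Tuple[str, float]]],   # vector-store recall
        rank: Callable[[str, str], float],                          # similarity / rank module
        load_answer: Callable[[str], str],                          # data_manager lookup
        threshold: float = 0.9,
    ):
        self.embed, self.search, self.rank = embed, search, rank
        self.load_answer, self.threshold = load_answer, threshold

    def query(self, user_query: str) -> Optional[str]:
        vector = self.embed(user_query)                # text -> semantic vector
        candidates = self.search(vector)               # recall similar cached queries
        for cached_query, _distance in candidates:
            if self.rank(user_query, cached_query) >= self.threshold:
                return self.load_answer(cached_query)  # cache hit
        return None                                    # cache miss -> call the LLM
```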
+
+## Architecture
+
+![modelcache modules](modelcache_modules_20240409.png)
+
+## Function comparison
+
+We've implemented several key updates to our repository. We've resolved network issues with Hugging Face and improved inference speed by introducing local embedding capabilities. Due to limitations in SQLAlchemy, we've redesigned our relational database interaction module for more flexible operations. We've added multi-tenancy support to ModelCache, recognizing the need for multiple users and models in LLM products. Lastly, we've made initial adjustments for better compatibility with system commands and multi-turn dialogues.
+
+<table>
+<tr>
+<th rowspan="2">Module</th>
+<th rowspan="2">Function</th>
+
+</tr>
+<tr>
+<th>ModelCache</th>
+<th>GPTCache</th>
+</tr>
+<tr>
+<td rowspan="2">Basic Interface</td>
+<td>Data query interface</td>
+<td class="checkmark">&#9745; </td>
+<td class="checkmark">&#9745; </td>
+</tr>
+<tr>
+<td>Data writing interface</td>
+<td class="checkmark">&#9745; </td>
+<td class="checkmark">&#9745; </td>
+</tr>
+<tr>
+<td rowspan="3">Embedding</td>
+<td>Embedding model configuration</td>
+<td class="checkmark">&#9745; </td>
+<td class="checkmark">&#9745; </td>
+</tr>
+<tr>
+<td>Large model embedding layer</td>
+<td class="checkmark">&#9745; </td>
+<td></td>
+</tr>
+<tr>
+<td>BERT model long text processing</td>
+<td class="checkmark">&#9745; </td>
+<td></td>
+</tr>
+<tr>
+<td rowspan="2">Large model invocation</td>
+<td>Decoupling from large models</td>
+<td class="checkmark">&#9745; </td>
+<td></td>
+</tr>
+<tr>
+<td>Local loading of embedding model</td>
+<td class="checkmark">&#9745; </td>
+<td></td>
+</tr>
+<tr>
+<td rowspan="2">Data isolation</td>
+<td>Model data isolation</td>
+<td class="checkmark">&#9745; </td>
+<td class="checkmark">&#9745; </td>
+</tr>
+<tr>
+<td>Hyperparameter isolation</td>
+<td></td>
+<td></td>
+</tr>
+<tr>
+<td rowspan="3">Databases</td>
+<td>MySQL</td>
+<td class="checkmark">&#9745; </td>
+<td class="checkmark">&#9745; </td>
+</tr>
+<tr>
+<td>Milvus</td>
+<td class="checkmark">&#9745; </td>
+<td class="checkmark">&#9745; </td>
+</tr>
+<tr>
+<td>OceanBase</td>
+<td class="checkmark">&#9745; </td>
+<td></td>
+</tr>
+<tr>
+<td rowspan="3">Session management</td>
+<td>Single-turn dialogue</td>
+<td class="checkmark">&#9745; </td>
+<td class="checkmark">&#9745; </td>
+</tr>
+<tr>
+<td>System commands</td>
+<td class="checkmark">&#9745; </td>
+<td></td>
+</tr>
+<tr>
+<td>Multi-turn dialogue</td>
+<td class="checkmark">&#9745; </td>
+<td></td>
+</tr>
+<tr>
+<td rowspan="2">Data management</td>
+<td>Data persistence</td>
+<td class="checkmark">&#9745; </td>
+<td class="checkmark">&#9745; </td>
+</tr>
+<tr>
+<td>One-click cache clearance</td>
+<td class="checkmark">&#9745; </td>
+<td></td>
+</tr>
+<tr>
+<td rowspan="2">Tenant management</td>
+<td>Support for multi-tenancy</td>
+<td class="checkmark">&#9745; </td>
+<td></td>
+</tr>
+<tr>
+<td>Milvus multi-collection capability</td>
+<td class="checkmark">&#9745; </td>
+<td></td>
+</tr>
+<tr>
+<td>Other</td>
+<td>Long-short dialogue distinction</td>
+<td class="checkmark">&#9745; </td>
+<td></td>
+</tr>
+</table>

docs/2.model-cache-features.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+# ModelCache features
+
+This topic describes ModelCache features. In ModelCache, we incorporated the core principles of GPTCache. ModelCache has four modules: adapter, embedding, similarity, and data_manager.
+
+- The adapter module orchestrates the business logic for various tasks, integrating the embedding, similarity, and data_manager modules.
+- The embedding module converts text into semantic vector representations and transforms user queries into vectors.
+- The rank module ranks and evaluates the similarity of recalled vectors.
+- The data_manager module manages the databases.
+
+To make ModelCache more suitable for industrial use, we made several improvements to its architecture and functionality:
+
+- [x] Architectural adjustment (lightweight integration):
+  - Embedded into LLM products using a Redis-like caching mode.
+  - Provided semantic caching without interfering with LLM calls, security audits, and other functions.
+  - Compatible with all LLM services.
+- [x] Multiple model loading:
+  - Supported local embedding model loading, and resolved Hugging Face network connectivity issues.
+  - Supported loading embedding layers from various pre-trained models.
+- [x] Data isolation:
+  - Environment isolation: Read different database configurations based on the environment. Isolate development, staging, and production environments.
+  - Multi-tenant data isolation: Dynamically create collections based on models for data isolation, addressing data separation issues in multi-model/service scenarios within large language model products.
+- [x] Supported system instruction: Adopted a concatenation approach to resolve issues with system instructions in the prompt paradigm.
+- [x] Long and short text differentiation: Long texts bring more challenges for similarity assessment. Added differentiation between long and short texts, allowing for separate threshold configurations.
+- [x] Milvus performance optimization: Adjusted the Milvus consistency level to "Session" for better performance.
+- [x] Data management:
+  - One-click cache clearing to enable easy data management after model upgrades.
+  - Recall of hit queries for subsequent data analysis and model iteration reference.
+  - Asynchronous log write-back for data analysis and statistics.
+  - Added model field and data statistics field to enhance features.
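The multi-tenant data isolation described above is driven by the `model` field that every cache request carries (see the request examples in the quick-start document added below). As a hedged illustration only — the endpoint and payload shape mirror those examples, but the model names and the minimal `query` payload here are made up — distinct models simply send distinct `scope` values and are cached separately:

```python
# Illustration of per-model isolation from the client side; model names are hypothetical.
import json
import requests

url = 'http://127.0.0.1:5000/modelcache'
headers = {"Content-Type": "application/json"}

for model_name in ["CODEGPT-1008", "CODEGPT-TEST"]:
    # Each model name maps to its own isolated cache collection on the server.
    data = {
        'type': 'query',
        'scope': {"model": model_name},
        'query': [{"role": "user", "content": "Who are you?"}],
    }
    res = requests.post(url, headers=headers, json=json.dumps(data))
```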

docs/3.model-cache-quick-start.md

Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
+# Quick start
+
+This topic describes how to set up and use ModelCache.
+
+You can find the start script in `flask4modelcache.py` and `flask4modelcache_demo.py`.
+
+- `flask4modelcache_demo.py`: A quick test service that embeds SQLite and FAISS. No database configuration required.
+- `flask4modelcache.py`: The standard service that requires MySQL and Milvus configuration.
+
+## Dependencies
+
+- Python: V3.8 or above
+- Package installation
+
+```shell
+pip install -r requirements.txt
+```
+
+## Start service
+
+### Start demo
+
+1. Download the embedding model bin file from [Hugging Face](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main). Place it in the `model/text2vec-base-chinese` folder.
+2. Start the backend service:
+
+   ```shell
+   cd CodeFuse-ModelCache
+   ```
+
+   ```shell
+   python flask4modelcache_demo.py
+   ```
+
+### Start standard service
+
+Before you start the standard service, complete these steps:
+
+1. Install MySQL and import the SQL file from `reference_doc/create_table.sql`.
+2. Install vector database Milvus.
+3. Configure database access in:
+   - `modelcache/config/milvus_config.ini`
+   - `modelcache/config/mysql_config.ini`
+4. Download the embedding model bin file from [Hugging Face](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main). Put it in `model/text2vec-base-chinese`.
+5. Start the backend service:
+
+   ```bash
+   python flask4modelcache.py
+   ```
+
+## Visit the service
+
+The service provides three core RESTful API functionalities: Cache-Writing, Cache-Querying, and Cache-Clearing.
+
+### Write cache
+
+```python
+import json
+import requests
+url = 'http://127.0.0.1:5000/modelcache'
+type = 'insert'
+scope = {"model": "CODEGPT-1008"}
+chat_info = [{"query": [{"role": "system", "content": "You are an AI code assistant and you must provide neutral and harmless answers to help users solve code-related problems."}, {"role": "user", "content": "Who are you?"}],
+              "answer": "Hello, I am an intelligent assistant. How can I assist you?"}]
+data = {'type': type, 'scope': scope, 'chat_info': chat_info}
+headers = {"Content-Type": "application/json"}
+res = requests.post(url, headers=headers, json=json.dumps(data))
+```
+
+### Query cache
+
+```python
+import json
+import requests
+url = 'http://127.0.0.1:5000/modelcache'
+type = 'query'
+scope = {"model": "CODEGPT-1008"}
+query = [{"role": "system", "content": "You are an AI code assistant and you must provide neutral and harmless answers to help users solve code-related problems."}, {"role": "user", "content": "Who are you?"}]
+data = {'type': type, 'scope': scope, 'query': query}
+
+headers = {"Content-Type": "application/json"}
+res = requests.post(url, headers=headers, json=json.dumps(data))
+```
+
+### Clear cache
+
+```python
+import json
+import requests
+url = 'http://127.0.0.1:5000/modelcache'
+type = 'remove'
+scope = {"model": "CODEGPT-1008"}
+remove_type = 'truncate_by_model'
+data = {'type': type, 'scope': scope, 'remove_type': remove_type}
+
+headers = {"Content-Type": "application/json"}
+res = requests.post(url, headers=headers, json=json.dumps(data))
+```
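The three snippets above store the server response in `res` without inspecting it. The response schema is not shown in this commit, so treat any field access as an assumption; checking the status code and raw body is a safe first step:

```python
# After any of the requests above; `res` is a requests.Response object.
print(res.status_code)   # 200 means the HTTP call itself succeeded
print(res.text)          # raw response body; its JSON structure is defined by the service
```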
