PaddlePaddle
diff --git a/docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v3.md b/docs/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v3.md
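@@ -312,40 +312,14 @@ All of the pre-trained model pipelines provided by PaddleX can be tried out quickly; you can
 ### 2.2 Local Experience
 Before using the Document Scene Information Extraction v3 pipeline locally, make sure you have installed the PaddleX wheel package by following the [PaddleX Local Installation Guide](../../../installation/installation.md).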
 
-First, obtain the configuration file for the `PP-ChatOCRv3-doc` pipeline with the following command:
-```bash
-paddlex --get_pipeline_config PP-ChatOCRv3-doc ./
-```
-
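-After running the command above, the configuration file is saved in the current directory. Open it and fill in the large language model's ak/sk (access_token), as shown below: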
-
-```yaml
-......
-SubModules:
-  LLM_Chat:
-    module_name: chat_bot
-    model_name: ernie-3.5
-    api_type: qianfan
-    ak: "" # Your LLM API key
-    sk: ""  # Your LLM secret key
-
-  LLM_Retriever:
-    module_name: retriever
-    model_name: ernie-3.5
-    api_type: qianfan
-    ak: "" # Your LLM API key
-    sk: ""  # Your LLM secret key
-......
-```
-
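-PP-ChatOCRv3-doc supports only the ERNIE large language models. You can obtain the required ak/sk (access_token) from the [Baidu Cloud Qianfan platform](https://console.bce.baidu.com/qianfan/ais/console/onlineService) or from [Xinghe Community AIStudio](https://aistudio.baidu.com/). If you use the Baidu Cloud Qianfan platform, see [AK and SK authentication API call flow](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/Hlwerugt8) to obtain the ak/sk; if you use Xinghe Community AIStudio, you can get an access_token from the [Xinghe Community AIStudio access token page](https://aistudio.baidu.com/account/accessToken).
+Before running model inference, you first need to prepare an api_key for the large language model. PP-ChatOCRv3 supports calling the large-model inference service provided by the [Baidu Cloud Qianfan platform](https://console.bce.baidu.com/qianfan/ais/console/onlineService); see [Authentication](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/Um2wxbaps) to obtain a Qianfan api_key.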
 
 After updating the configuration file, you can run quick inference with a few lines of Python code. You can use this [test file](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/vehicle_certificate-1.png) for testing:
 
 ```python
 from paddlex import create_pipeline
 
-pipeline = create_pipeline(pipeline="./PP-ChatOCRv3-doc.yaml")
+pipeline = create_pipeline(pipeline="PP-ChatOCRv3-doc",initial_predictor=False)
 
 visual_predict_res = pipeline.visual_predict(input="vehicle_certificate-1.png",
     use_doc_orientation_classify=False,
@@ -359,8 +333,32 @@ for res in visual_predict_res:
     visual_info_list.append(res["visual_info"])
     layout_parsing_result = res["layout_parsing_result"]
 
-vector_info = pipeline.build_vector(visual_info_list, flag_save_bytes_vector=True)
-chat_result = pipeline.chat(key_list=["驾驶室准乘人数"], visual_info=visual_info_list, vector_info=vector_info)
+vector_info = pipeline.build_vector(visual_info_list, flag_save_bytes_vector=True,retriever_config={
+    "module_name": "retriever",
+    "model_name": "embedding-v1",
+    "base_url": "https://qianfan.baidubce.com/v2",
+    "api_type": "qianfan",
+    "api_key": "api_key" # your api_key
+})
+chat_result = pipeline.chat(
+    key_list=["驾驶室准乘人数"],
+    visual_info_list=visual_info_list,
+    vector_info=vector_info,
+    chat_bot_config={
+      "module_name": "chat_bot",
+      "model_name": "ernie-3.5-8k",
+      "base_url": "https://qianfan.baidubce.com/v2",
+      "api_type": "openai",
+      "api_key": "api_key" # your api_key
+    },
+    retriever_config={
+        "module_name": "retriever",
+        "model_name": "embedding-v1",
+        "base_url": "https://qianfan.baidubce.com/v2",
+        "api_type": "qianfan",
+        "api_key": "api_key" # your api_key
+    }
+)
 print(chat_result)
 
 ```
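The updated example repeats the same `retriever_config` dictionary in two calls and hardcodes the `api_key` placeholder. A minimal sketch of one way to build both dictionaries once is shown below. The field values are copied from the diff above; the helper name `build_llm_configs` and the environment variable `QIANFAN_API_KEY` are assumptions made for illustration and are not part of PaddleX.

```python
import os
from typing import Any, Dict, Tuple


def build_llm_configs(api_key: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (chat_bot_config, retriever_config) using the values shown in the diff."""
    base_url = "https://qianfan.baidubce.com/v2"
    chat_bot_config = {
        "module_name": "chat_bot",
        "model_name": "ernie-3.5-8k",
        "base_url": base_url,
        "api_type": "openai",
        "api_key": api_key,
    }
    retriever_config = {
        "module_name": "retriever",
        "model_name": "embedding-v1",
        "base_url": base_url,
        "api_type": "qianfan",
        "api_key": api_key,
    }
    return chat_bot_config, retriever_config


# QIANFAN_API_KEY is a hypothetical variable name chosen for this sketch;
# an empty string is used when it is not set.
chat_bot_config, retriever_config = build_llm_configs(
    os.environ.get("QIANFAN_API_KEY", "")
)
```

The resulting dictionaries could then be passed as `retriever_config=` to `pipeline.build_vector(...)` and as `chat_bot_config=` and `retriever_config=` to `pipeline.chat(...)`, as in the example above.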
@@ -411,6 +409,12 @@ The prediction workflow, API description, and output description of PP-ChatOCRv3-doc are as follows:
 <td><code>bool</code></td>
 <td><code>False</code></td>
 </tr>
+<tr>
+<td><code>initial_predictor</code></td>
+<td>Whether to initialize the inference modules up front (if <code>False</code>, each inference module is initialized the first time it is used)</td>
+<td><code>bool</code></td>
+<td><code>True</code></td>
+</tr>
 </tbody>
 </table>
 </details>
@@ -821,9 +825,9 @@ for res in visual_predict_res:
         - `use_formula_recognition`: `(bool)` Controls whether the formula recognition sub-pipeline is enabled
 
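     - `parsing_res_list`: `(List[Dict])` List of parsing results. Each element is a dictionary, and the list follows the parsed reading order.
-        - `layout_bbox`: `(np.ndarray)` Bounding box of the layout region.
-        - `{label}`: `(str)` The key is the label of the layout region, e.g. `text`, `table`; the value is the content within the layout region.
-        - `layout`: `(str)` Layout type, e.g. `double`, `single`.
+        - `block_bbox`: `(np.ndarray)` Bounding box of the layout region.
+        - `block_label`: `(str)` Label of the layout region, e.g. `text`, `table`.
+        - `block_content`: `(str)` Content within the layout region.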
 
     - `overall_ocr_res`: `(Dict[str, Union[List[str], List[float], numpy.ndarray]])` Dictionary of the global OCR results
       -  `input_path`: `(Union[str, None])` Image path accepted by the image OCR sub-pipeline; saved as `None` when the input is a `numpy.ndarray`
@@ -845,12 +849,6 @@ for res in visual_predict_res:
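       - `rec_scores`: `(List[float])` List of text recognition confidence scores, already filtered by `text_rec_score_thresh`
       - `rec_polys`: `(List[numpy.ndarray])` List of text detection boxes after confidence filtering, in the same format as `dt_polys`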
 
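-    - `text_paragraphs_ocr_res`: `(Dict[str, Union[List[str], List[float], numpy.ndarray]])` Paragraph OCR results, i.e. OCR results for paragraphs whose layout type is not table, seal, or formula
-        - `rec_polys`: `(List[numpy.ndarray])` List of text detection boxes, in the same format as `dt_polys`
-        - `rec_texts`: `(List[str])` List of text recognition results
-        - `rec_scores`: `(List[float])` List of confidence scores for the text recognition results
-        - `rec_boxes`: `(numpy.ndarray)` Array of rectangular bounding boxes for the detection boxes, with shape (n, 4) and dtype int16. Each row represents a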
-
     - `formula_res_list`: `(List[Dict[str, Union[numpy.ndarray, List[float], str]]])` List of formula recognition results; each element is a dictionary
         - `rec_formula`: `(str)` Formula recognition result
         - `rec_polys`: `(numpy.ndarray)` Formula detection box, with shape (4, 2) and dtype int16
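Code that reads `parsing_res_list` entries by the old `layout_bbox` / `{label}` keys will need adjusting for the renamed `block_bbox` / `block_label` / `block_content` keys. The following is a minimal sketch of a compatibility shim, based only on the field descriptions in this diff. The helper name `to_new_block` is hypothetical, and the assumption that an old entry contains exactly one label key besides `layout_bbox` and `layout` is not confirmed by the source.

```python
from typing import Any, Dict

# Keys of the old per-block format that are not the label key (assumed from the diff).
_OLD_FIXED_KEYS = {"layout_bbox", "layout"}


def to_new_block(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Convert an old-style parsing_res_list entry to the new key names.

    Entries that already use the new keys are returned as a shallow copy.
    """
    if "block_label" in entry:
        return dict(entry)
    labels = [key for key in entry if key not in _OLD_FIXED_KEYS]
    if len(labels) != 1:
        raise ValueError(f"expected exactly one label key, found {labels}")
    label = labels[0]
    return {
        "block_bbox": entry.get("layout_bbox"),
        "block_label": label,
        "block_content": entry[label],
    }
```

Note that the old `layout` value (for example `single` or `double`) has no counterpart among the new keys listed in the diff, so this sketch drops it.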
@@ -938,6 +936,14 @@ for res in visual_predict_res:
 <td><code>3500</code></td>
 </tr>
 <tr>
+<td><code>block_size</code></td>
+<td>Chunk size used when building the vector store from long text</td>
+<td><code>int</code></td>
+<td>
+A positive integer; it can be chosen based on the token length supported by the large language model
+</td>
+<td><code>300</code></td>
+</tr>
 <tr>
 <td><code>flag_save_bytes_vector</code></td>
 <td>Whether to save the text as a binary file</td>
@@ -947,7 +953,16 @@ for res in visual_predict_res:
 </td>
 <td><code>False</code></td>
 </tr>
-</tr></table>
+<tr>
+<td><code>retriever_config</code></td>
+<td>Configuration parameters for the vector retrieval large model; see the “LLM_Retriever” field in the configuration file</td>
+<td><code>dict</code></td>
+<td>
+<code>None</code>
+</td>
+<td><code>None</code></td>
+</tr>
+</table>
 This method returns a dictionary containing visual text information, with the following contents:
 
 - `flag_save_bytes_vector`: `(bool)` Whether the result is saved as a binary file
@@ -1075,6 +1090,24 @@ for res in visual_predict_res:
 <td><code>None</code></td>
 <td><code>None</code></td>
 </tr>
+<tr>
+<td><code>chat_bot_config</code></td>
+<td>Configuration for the large language model; see the “LLM_Chat” field in the pipeline configuration file</td>
+<td><code>dict</code></td>
+<td>
+<code>None</code>
+</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>retriever_config</code></td>
+<td>Configuration parameters for the vector retrieval large model; see the “LLM_Retriever” field in the configuration file</td>
+<td><code>dict</code></td>
+<td>
+<code>None</code>
+</td>
+<td><code>None</code></td>
+</tr>
 </tbody>
 </table>