Commit 5c6188d

docs(style): clean up style for data/spec field types (#669)
1 parent 30e507b commit 5c6188d

3 files changed, with 70 additions and 70 deletions.

docs/docs/ops/functions.md

Lines changed: 20 additions & 20 deletions
@@ -11,10 +11,10 @@ description: CocoIndex Built-in Functions

 The spec takes the following fields:

-* `text` (type: `str`, required): The source text to parse.
-* `language` (type: `str`, optional): The language of the source text. Only `json` is supported now. Default to `json`.
+* `text` (`str`): The source text to parse.
+* `language` (`str`, optional): The language of the source text. Only `json` is supported now. Default to `json`.

-Return type: `Json`
+Return: *Json*

 ## SplitRecursively

@@ -64,7 +64,7 @@ Input data:

 :::

-Return type: [*KTable*](/docs/core/data_types#ktable), each row represents a chunk, with the following sub fields:
+Return: [*KTable*](/docs/core/data_types#ktable), each row represents a chunk, with the following sub fields:

 * `location` (*Range*): The location of the chunk.
 * `text` (*Str*): The text of the chunk.
@@ -79,22 +79,22 @@ Return type: [*KTable*](/docs/core/data_types#ktable), each row represents a chu

 The spec takes the following fields:

-* `model` (type: `str`, required): The name of the SentenceTransformer model to use.
-* `args` (type: `dict[str, Any]`, optional): Additional arguments to pass to the SentenceTransformer constructor. e.g. `{"trust_remote_code": True}`
+* `model` (`str`): The name of the SentenceTransformer model to use.
+* `args` (`dict[str, Any]`, optional): Additional arguments to pass to the SentenceTransformer constructor. e.g. `{"trust_remote_code": True}`

 Input data:

-* `text` (type: `str`, required): The text to embed.
+* `text` (*Str*): The text to embed.

-Return type: `vector[float32; N]`, where `N` is determined by the model
+Return: *Vector[Float32, N]*, where *N* is determined by the model

 ## ExtractByLlm

 `ExtractByLlm` extracts structured information from a text using specified LLM. The spec takes the following fields:

-* `llm_spec` (type: `cocoindex.LlmSpec`, required): The specification of the LLM to use. See [LLM Spec](/docs/ai/llm#llm-spec) for more details.
-* `output_type` (type: `type`, required): The type of the output. e.g. a dataclass type name. See [Data Types](/docs/core/data_types) for all supported data types. The LLM will output values that match the schema of the type.
-* `instruction` (type: `str`, optional): Additional instruction for the LLM.
+* `llm_spec` (`cocoindex.LlmSpec`): The specification of the LLM to use. See [LLM Spec](/docs/ai/llm#llm-spec) for more details.
+* `output_type` (`type`): The type of the output. e.g. a dataclass type name. See [Data Types](/docs/core/data_types) for all supported data types. The LLM will output values that match the schema of the type.
+* `instruction` (`str`, optional): Additional instruction for the LLM.

 :::tip Clear type definitions

@@ -109,25 +109,25 @@ To improve the quality of the extracted information, giving clear definitions fo

 Input data:

-* `text` (type: `str`, required): The text to extract information from.
+* `text` (*Str*): The text to extract information from.

-Return type: As specified by the `output_type` field in the spec. The extracted information from the input text.
+Return: As specified by the `output_type` field in the spec. The extracted information from the input text.

 ## EmbedText

 `EmbedText` embeds a text into a vector space using various LLM APIs that support text embedding.

 The spec takes the following fields:

-* `api_type` (type: [`cocoindex.LlmApiType`](/docs/ai/llm#llm-api-types), required): The type of LLM API to use for embedding.
-* `model` (type: `str`, required): The name of the embedding model to use.
-* `address` (type: `str`, optional): The address of the LLM API. If not specified, uses the default address for the API type.
-* `output_dimension` (type: `int`, optional): The expected dimension of the output embedding vector. If not specified, use the default dimension of the model.
+* `api_type` ([`cocoindex.LlmApiType`](/docs/ai/llm#llm-api-types)): The type of LLM API to use for embedding.
+* `model` (`str`): The name of the embedding model to use.
+* `address` (`str`, optional): The address of the LLM API. If not specified, uses the default address for the API type.
+* `output_dimension` (`int`, optional): The expected dimension of the output embedding vector. If not specified, use the default dimension of the model.

 For most API types, the function internally keeps a registry for the default output dimension of known model.
 You need to explicitly specify the `output_dimension` if you want to use a new model that is not in the registry yet.

-* `task_type` (type: `str`, optional): The task type for embedding, used by some embedding models to optimize the embedding for specific use cases.
+* `task_type` (`str`, optional): The task type for embedding, used by some embedding models to optimize the embedding for specific use cases.

 :::note Supported APIs for Text Embedding

@@ -137,6 +137,6 @@ Not all LLM APIs support text embedding. See the [LLM API Types table](/docs/ai/

 Input data:

-* `text` (type: `str`, required): The text to embed.
+* `text` (*Str*, required): The text to embed.

-Return type: `vector[float32; N]`, where `N` is the dimension of the embedding vector determined by the model.
+Return: *Vector[Float32, N]*, where *N* is the dimension of the embedding vector determined by the model.

docs/docs/ops/sources.md

Lines changed: 25 additions & 25 deletions
@@ -13,11 +13,11 @@ The `LocalFile` source imports files from a local file system.
 ### Spec

 The spec takes the following fields:
-* `path` (type: `str`, required): full path of the root directory to import files from
-* `binary` (type: `bool`, optional): whether reading files as binary (instead of text)
-* `included_patterns` (type: `list[str]`, optional): a list of glob patterns to include files, e.g. `["*.txt", "docs/**/*.md"]`.
+* `path` (`str`): full path of the root directory to import files from
+* `binary` (`bool`, optional): whether reading files as binary (instead of text)
+* `included_patterns` (`list[str]`, optional): a list of glob patterns to include files, e.g. `["*.txt", "docs/**/*.md"]`.
 If not specified, all files will be included.
-* `excluded_patterns` (type: `list[str]`, optional): a list of glob patterns to exclude files, e.g. `["tmp", "**/node_modules"]`.
+* `excluded_patterns` (`list[str]`, optional): a list of glob patterns to exclude files, e.g. `["tmp", "**/node_modules"]`.
 Any file or directory matching these patterns will be excluded even if they match `included_patterns`.
 If not specified, no files will be excluded.

@@ -29,9 +29,9 @@ The spec takes the following fields:

 ### Schema

-The output is a [KTable](/docs/core/data_types#ktable) with the following sub fields:
-* `filename` (key, type: `str`): the filename of the file, including the path, relative to the root directory, e.g. `"dir1/file1.md"`
-* `content` (type: `str` if `binary` is `False`, otherwise `bytes`): the content of the file
+The output is a [*KTable*](/docs/core/data_types#ktable) with the following sub fields:
+* `filename` (*Str*, key): the filename of the file, including the path, relative to the root directory, e.g. `"dir1/file1.md"`
+* `content` (*Str* if `binary` is `False`, *Bytes* otherwise): the content of the file

 ## AmazonS3

@@ -121,12 +121,12 @@ AWS's [Guide of Configuring a Bucket for Notifications](https://docs.aws.amazon.
 ### Spec

 The spec takes the following fields:
-* `bucket_name` (type: `str`, required): Amazon S3 bucket name.
-* `prefix` (type: `str`, optional): if provided, only files with path starting with this prefix will be imported.
-* `binary` (type: `bool`, optional): whether reading files as binary (instead of text).
-* `included_patterns` (type: `list[str]`, optional): a list of glob patterns to include files, e.g. `["*.txt", "docs/**/*.md"]`.
+* `bucket_name` (`str`): Amazon S3 bucket name.
+* `prefix` (`str`, optional): if provided, only files with path starting with this prefix will be imported.
+* `binary` (`bool`, optional): whether reading files as binary (instead of text).
+* `included_patterns` (`list[str]`, optional): a list of glob patterns to include files, e.g. `["*.txt", "docs/**/*.md"]`.
 If not specified, all files will be included.
-* `excluded_patterns` (type: `list[str]`, optional): a list of glob patterns to exclude files, e.g. `["*.tmp", "**/*.log"]`.
+* `excluded_patterns` (`list[str]`, optional): a list of glob patterns to exclude files, e.g. `["*.tmp", "**/*.log"]`.
 Any file or directory matching these patterns will be excluded even if they match `included_patterns`.
 If not specified, no files will be excluded.

@@ -136,7 +136,7 @@ The spec takes the following fields:

 :::

-* `sqs_queue_url` (type: `str`, optional): if provided, the source will receive change event notifications from Amazon S3 via this SQS queue.
+* `sqs_queue_url` (`str`, optional): if provided, the source will receive change event notifications from Amazon S3 via this SQS queue.

 :::info

@@ -147,9 +147,9 @@ The spec takes the following fields:

 ### Schema

-The output is a [KTable](/docs/core/data_types#ktable) with the following sub fields:
-* `filename` (key, type: `str`): the filename of the file, including the path, relative to the root directory, e.g. `"dir1/file1.md"`.
-* `content` (type: `str` if `binary` is `False`, otherwise `bytes`): the content of the file.
+The output is a [*KTable*](/docs/core/data_types#ktable) with the following sub fields:
+* `filename` (*Str*, key): the filename of the file, including the path, relative to the root directory, e.g. `"dir1/file1.md"`.
+* `content` (*Str* if `binary` is `False`, otherwise *Bytes*): the content of the file.


 ## GoogleDrive
@@ -176,10 +176,10 @@ To access files in Google Drive, the `GoogleDrive` source will need to authentic

 The spec takes the following fields:

-* `service_account_credential_path` (type: `str`, required): full path to the service account credential file in JSON format.
-* `root_folder_ids` (type: `list[str]`, required): a list of Google Drive folder IDs to import files from.
-* `binary` (type: `bool`, optional): whether reading files as binary (instead of text).
-* `recent_changes_poll_interval` (type: `datetime.timedelta`, optional): when set, this source provides a change capture mechanism by polling Google Drive for recent modified files periodically.
+* `service_account_credential_path` (`str`): full path to the service account credential file in JSON format.
+* `root_folder_ids` (`list[str]`): a list of Google Drive folder IDs to import files from.
+* `binary` (`bool`, optional): whether reading files as binary (instead of text).
+* `recent_changes_poll_interval` (`datetime.timedelta`, optional): when set, this source provides a change capture mechanism by polling Google Drive for recent modified files periodically.

 :::info

@@ -198,9 +198,9 @@ The spec takes the following fields:

 ### Schema

-The output is a [KTable](/docs/core/data_types#ktable) with the following sub fields:
+The output is a [*KTable*](/docs/core/data_types#ktable) with the following sub fields:

-* `file_id` (key, type: `str`): the ID of the file in Google Drive.
-* `filename` (type: `str`): the filename of the file, without the path, e.g. `"file1.md"`
-* `mime_type` (type: `str`): the MIME type of the file.
-* `content` (type: `str` if `binary` is `False`, otherwise `bytes`): the content of the file.
+* `file_id` (*Str*, key): the ID of the file in Google Drive.
+* `filename` (*Str*): the filename of the file, without the path, e.g. `"file1.md"`
+* `mime_type` (*Str*): the MIME type of the file.
+* `content` (*Str* if `binary` is `False`, otherwise *Bytes*): the content of the file.
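
As a reading aid, the spec-field part of this style change can be sketched as a small Python rewrite rule. This is a hypothetical illustration and not a script from the commit: the helper names `restyle_spec_field` and `restyle_return` are assumptions. The sketch covers only the `(type: ..., required|optional)` annotation and the `Return type:` prefix. It does not cover the switch to italic engine type names (for example `str` to *Str*) seen in the input data and schema fields.

```python
import re

# Old style: (type: `str`, required) or (type: `str`, optional).
# The non-greedy group captures the whole type expression, including
# commas inside it, as in `dict[str, Any]`.
_OLD_SPEC_FIELD = re.compile(
    r"\(type: (?P<type>.+?), (?P<flag>required|optional)\)"
)


def restyle_spec_field(line: str) -> str:
    """Rewrite one spec-field bullet from the old style to the new one."""

    def _replace(match: re.Match) -> str:
        type_expr = match.group("type")
        # "required" is implied in the new style; only "optional" is kept.
        if match.group("flag") == "optional":
            return f"({type_expr}, optional)"
        return f"({type_expr})"

    return _OLD_SPEC_FIELD.sub(_replace, line, count=1)


def restyle_return(line: str) -> str:
    """Rewrite a leading 'Return type:' prefix to 'Return:'."""
    return re.sub(r"^Return type:", "Return:", line)
```

Applied to the removed `model` bullet of the SentenceTransformer function, `restyle_spec_field` yields the added line with `(`str`)`. Lines without the old annotation pass through unchanged.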
