docs(style): clean up style for data/spec field types (#669)

badmonster0 · web-flow · commit 5c6188dc1cfe · 2025-06-27T22:58:14.000-07:00
diff --git a/docs/docs/ops/functions.md b/docs/docs/ops/functions.md
@@ -11,10 +11,10 @@ description: CocoIndex Built-in Functions
 
 The spec takes the following fields:
 
-*   `text` (type: `str`, required): The source text to parse.
-*   `language` (type: `str`, optional): The language of the source text.  Only `json` is supported now.  Default to `json`.
+*   `text` (`str`): The source text to parse.
+*   `language` (`str`, optional): The language of the source text. Only `json` is supported now. Defaults to `json`.
 
-Return type: `Json`
+Return: *Json*
 
 ## SplitRecursively
 
@@ -64,7 +64,7 @@ Input data:
 
     :::
 
-Return type: [*KTable*](/docs/core/data_types#ktable), each row represents a chunk, with the following sub fields:
+Return: [*KTable*](/docs/core/data_types#ktable), each row represents a chunk, with the following sub fields:
 
 *   `location` (*Range*): The location of the chunk.
 *   `text` (*Str*): The text of the chunk.
@@ -79,22 +79,22 @@ Return type: [*KTable*](/docs/core/data_types#ktable), each row represents a chu
 
 The spec takes the following fields:
 
-*   `model` (type: `str`, required): The name of the SentenceTransformer model to use.
-*   `args` (type: `dict[str, Any]`, optional): Additional arguments to pass to the SentenceTransformer constructor. e.g. `{"trust_remote_code": True}`
+*   `model` (`str`): The name of the SentenceTransformer model to use.
+*   `args` (`dict[str, Any]`, optional): Additional arguments to pass to the SentenceTransformer constructor. e.g. `{"trust_remote_code": True}`
 
 Input data:
 
-*   `text` (type: `str`, required): The text to embed.
+*   `text` (*Str*): The text to embed.
 
-Return type: `vector[float32; N]`, where `N` is determined by the model
+Return: *Vector[Float32, N]*, where *N* is determined by the model
 
 ## ExtractByLlm
 
 `ExtractByLlm` extracts structured information from a text using specified LLM. The spec takes the following fields:
 
-*   `llm_spec` (type: `cocoindex.LlmSpec`, required): The specification of the LLM to use. See [LLM Spec](/docs/ai/llm#llm-spec) for more details.
-*   `output_type` (type: `type`, required): The type of the output. e.g. a dataclass type name. See [Data Types](/docs/core/data_types) for all supported data types. The LLM will output values that match the schema of the type.
-*   `instruction` (type: `str`, optional): Additional instruction for the LLM.
+*   `llm_spec` (`cocoindex.LlmSpec`): The specification of the LLM to use. See [LLM Spec](/docs/ai/llm#llm-spec) for more details.
+*   `output_type` (`type`): The type of the output. e.g. a dataclass type name. See [Data Types](/docs/core/data_types) for all supported data types. The LLM will output values that match the schema of the type.
+*   `instruction` (`str`, optional): Additional instruction for the LLM.
 
 :::tip Clear type definitions
 
@@ -109,25 +109,25 @@ To improve the quality of the extracted information, giving clear definitions fo
 
 Input data:
 
-*   `text` (type: `str`, required): The text to extract information from.
+*   `text` (*Str*): The text to extract information from.
 
-Return type: As specified by the `output_type` field in the spec. The extracted information from the input text.
+Return: As specified by the `output_type` field in the spec. The extracted information from the input text.
 
 ## EmbedText
 
 `EmbedText` embeds a text into a vector space using various LLM APIs that support text embedding.
 
 The spec takes the following fields:
 
-*   `api_type` (type: [`cocoindex.LlmApiType`](/docs/ai/llm#llm-api-types), required): The type of LLM API to use for embedding.
-*   `model` (type: `str`, required): The name of the embedding model to use.
-*   `address` (type: `str`, optional): The address of the LLM API. If not specified, uses the default address for the API type.
-*   `output_dimension` (type: `int`, optional): The expected dimension of the output embedding vector. If not specified, use the default dimension of the model.
+*   `api_type` ([`cocoindex.LlmApiType`](/docs/ai/llm#llm-api-types)): The type of LLM API to use for embedding.
+*   `model` (`str`): The name of the embedding model to use.
+*   `address` (`str`, optional): The address of the LLM API. If not specified, uses the default address for the API type.
+*   `output_dimension` (`int`, optional): The expected dimension of the output embedding vector. If not specified, uses the default dimension of the model.
 
     For most API types, the function internally keeps a registry for the default output dimension of known model.
     You need to explicitly specify the `output_dimension` if you want to use a new model that is not in the registry yet.
 
-*   `task_type` (type: `str`, optional): The task type for embedding, used by some embedding models to optimize the embedding for specific use cases.
+*   `task_type` (`str`, optional): The task type for embedding, used by some embedding models to optimize the embedding for specific use cases.
 
 :::note Supported APIs for Text Embedding
 
@@ -137,6 +137,6 @@ Not all LLM APIs support text embedding. See the [LLM API Types table](/docs/ai/
 
 Input data:
 
-*   `text` (type: `str`, required): The text to embed.
+*   `text` (*Str*): The text to embed.
 
-Return type: `vector[float32; N]`, where `N` is the dimension of the embedding vector determined by the model.
+Return: *Vector[Float32, N]*, where *N* is the dimension of the embedding vector determined by the model.
diff --git a/docs/docs/ops/sources.md b/docs/docs/ops/sources.md
@@ -13,11 +13,11 @@ The `LocalFile` source imports files from a local file system.
 ### Spec
 
 The spec takes the following fields:
-*   `path` (type: `str`, required): full path of the root directory to import files from
-*   `binary` (type: `bool`, optional): whether reading files as binary (instead of text)
-*   `included_patterns` (type: `list[str]`, optional): a list of glob patterns to include files, e.g. `["*.txt", "docs/**/*.md"]`.
+*   `path` (`str`): full path of the root directory to import files from
+*   `binary` (`bool`, optional): whether reading files as binary (instead of text)
+*   `included_patterns` (`list[str]`, optional): a list of glob patterns to include files, e.g. `["*.txt", "docs/**/*.md"]`.
     If not specified, all files will be included.
-*   `excluded_patterns` (type: `list[str]`, optional): a list of glob patterns to exclude files, e.g. `["tmp", "**/node_modules"]`.
+*   `excluded_patterns` (`list[str]`, optional): a list of glob patterns to exclude files, e.g. `["tmp", "**/node_modules"]`.
     Any file or directory matching these patterns will be excluded even if they match `included_patterns`.
     If not specified, no files will be excluded.
 
@@ -29,9 +29,9 @@ The spec takes the following fields:
 
 ### Schema
 
-The output is a [KTable](/docs/core/data_types#ktable) with the following sub fields:
-*   `filename` (key, type: `str`): the filename of the file, including the path, relative to the root directory, e.g. `"dir1/file1.md"`
-*   `content` (type: `str` if `binary` is `False`, otherwise `bytes`): the content of the file
+The output is a [*KTable*](/docs/core/data_types#ktable) with the following sub fields:
+*   `filename` (*Str*, key): the filename of the file, including the path, relative to the root directory, e.g. `"dir1/file1.md"`
+*   `content` (*Str* if `binary` is `False`, otherwise *Bytes*): the content of the file
 
 ## AmazonS3
 
@@ -121,12 +121,12 @@ AWS's [Guide of Configuring a Bucket for Notifications](https://docs.aws.amazon.
 ### Spec
 
 The spec takes the following fields:
-*   `bucket_name` (type: `str`, required): Amazon S3 bucket name.
-*   `prefix` (type: `str`, optional): if provided, only files with path starting with this prefix will be imported.
-*   `binary` (type: `bool`, optional): whether reading files as binary (instead of text).
-*   `included_patterns` (type: `list[str]`, optional): a list of glob patterns to include files, e.g. `["*.txt", "docs/**/*.md"]`.
+*   `bucket_name` (`str`): Amazon S3 bucket name.
+*   `prefix` (`str`, optional): if provided, only files with path starting with this prefix will be imported.
+*   `binary` (`bool`, optional): whether reading files as binary (instead of text).
+*   `included_patterns` (`list[str]`, optional): a list of glob patterns to include files, e.g. `["*.txt", "docs/**/*.md"]`.
     If not specified, all files will be included.
-*   `excluded_patterns` (type: `list[str]`, optional): a list of glob patterns to exclude files, e.g. `["*.tmp", "**/*.log"]`.
+*   `excluded_patterns` (`list[str]`, optional): a list of glob patterns to exclude files, e.g. `["*.tmp", "**/*.log"]`.
     Any file or directory matching these patterns will be excluded even if they match `included_patterns`.
     If not specified, no files will be excluded.
 
@@ -136,7 +136,7 @@ The spec takes the following fields:
 
     :::
 
-*   `sqs_queue_url` (type: `str`, optional): if provided, the source will receive change event notifications from Amazon S3 via this SQS queue.
+*   `sqs_queue_url` (`str`, optional): if provided, the source will receive change event notifications from Amazon S3 via this SQS queue.
 
     :::info
 
@@ -147,9 +147,9 @@ The spec takes the following fields:
 
 ### Schema
 
-The output is a [KTable](/docs/core/data_types#ktable) with the following sub fields:
-*   `filename` (key, type: `str`): the filename of the file, including the path, relative to the root directory, e.g. `"dir1/file1.md"`.
-*   `content` (type: `str` if `binary` is `False`, otherwise `bytes`): the content of the file.
+The output is a [*KTable*](/docs/core/data_types#ktable) with the following sub fields:
+*   `filename` (*Str*, key): the filename of the file, including the path, relative to the root directory, e.g. `"dir1/file1.md"`.
+*   `content` (*Str* if `binary` is `False`, otherwise *Bytes*): the content of the file.
 
 
 ## GoogleDrive
@@ -176,10 +176,10 @@ To access files in Google Drive, the `GoogleDrive` source will need to authentic
 
 The spec takes the following fields:
 
-*   `service_account_credential_path` (type: `str`, required): full path to the service account credential file in JSON format.
-*   `root_folder_ids` (type: `list[str]`, required): a list of Google Drive folder IDs to import files from.
-*   `binary` (type: `bool`, optional): whether reading files as binary (instead of text).
-*   `recent_changes_poll_interval` (type: `datetime.timedelta`, optional): when set, this source provides a change capture mechanism by polling Google Drive for recent modified files periodically.
+*   `service_account_credential_path` (`str`): full path to the service account credential file in JSON format.
+*   `root_folder_ids` (`list[str]`): a list of Google Drive folder IDs to import files from.
+*   `binary` (`bool`, optional): whether reading files as binary (instead of text).
+*   `recent_changes_poll_interval` (`datetime.timedelta`, optional): when set, this source provides a change capture mechanism by polling Google Drive for recent modified files periodically.
 
     :::info
 
@@ -198,9 +198,9 @@ The spec takes the following fields:
 
 ### Schema
 
-The output is a [KTable](/docs/core/data_types#ktable) with the following sub fields:
+The output is a [*KTable*](/docs/core/data_types#ktable) with the following sub fields:
 
-*   `file_id` (key, type: `str`): the ID of the file in Google Drive.
-*   `filename` (type: `str`): the filename of the file, without the path, e.g. `"file1.md"`
-*   `mime_type` (type: `str`): the MIME type of the file.
-*   `content` (type: `str` if `binary` is `False`, otherwise `bytes`): the content of the file.
+*   `file_id` (*Str*, key): the ID of the file in Google Drive.
+*   `filename` (*Str*): the filename of the file, without the path, e.g. `"file1.md"`
+*   `mime_type` (*Str*): the MIME type of the file.
+*   `content` (*Str* if `binary` is `False`, otherwise *Bytes*): the content of the file.
diff --git a/docs/docs/ops/targets.md b/docs/docs/ops/targets.md