Make sentence_transformers an optional dependency #674

Merged
2 changes: 1 addition & 1 deletion docs/docs/getting_started/quickstart.md
@@ -21,7 +21,7 @@ We'll need to install a bunch of dependencies for this project.
1. Install CocoIndex:

```bash
-pip install -U cocoindex
+pip install -U 'cocoindex[embeddings]'
```

2. You can skip this step if you already have a Postgres database with pgvector extension installed.

9 changes: 9 additions & 0 deletions docs/docs/ops/functions.md
@@ -77,6 +77,15 @@ Return: [*KTable*](/docs/core/data_types#ktable), each row represents a chunk, w

`SentenceTransformerEmbed` embeds a text into a vector space using the [SentenceTransformer](https://huggingface.co/sentence-transformers) library.

:::note Optional Dependency Required

This function requires the 'sentence-transformers' library, which is an optional dependency. Install CocoIndex with:

```bash
pip install 'cocoindex[embeddings]'
```
:::

The spec takes the following fields:

* `model` (`str`): The name of the SentenceTransformer model to use.
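
Because `SentenceTransformerEmbed` now relies on an optional package, a script that builds a flow may want to confirm the package is present before going further. The following is a minimal sketch using only the standard library; `has_optional_module` is a hypothetical helper name and is not part of CocoIndex.

```python
from importlib.util import find_spec


def has_optional_module(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    # For a top-level name, find_spec returns None when the module is not installed.
    return find_spec(module_name) is not None


if __name__ == "__main__":
    if has_optional_module("sentence_transformers"):
        print("sentence_transformers found")
    else:
        print("Missing optional dependency; try: pip install 'cocoindex[embeddings]'")
```

Checking with `find_spec` avoids paying the import cost of a large library just to learn whether it is installed.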

2 changes: 1 addition & 1 deletion examples/amazon_s3_embedding/pyproject.toml
@@ -3,7 +3,7 @@ name = "amazon-s3-text-embedding"
version = "0.1.0"
description = "Simple example for cocoindex: build embedding index based on Amazon S3 files."
requires-python = ">=3.11"
-dependencies = ["cocoindex>=0.1.52", "python-dotenv>=1.0.1"]
+dependencies = ["cocoindex[embeddings]>=0.1.52", "python-dotenv>=1.0.1"]

[tool.setuptools]
packages = []
2 changes: 1 addition & 1 deletion examples/code_embedding/pyproject.toml
@@ -3,7 +3,7 @@ name = "code-embedding"
version = "0.1.0"
description = "Simple example for cocoindex: build embedding index based on source code."
requires-python = ">=3.11"
-dependencies = ["cocoindex>=0.1.56", "python-dotenv>=1.0.1"]
+dependencies = ["cocoindex[embeddings]>=0.1.56", "python-dotenv>=1.0.1"]

[tool.setuptools]
packages = []
2 changes: 1 addition & 1 deletion examples/fastapi_server_docker/requirements.txt
@@ -1,4 +1,4 @@
-cocoindex>=0.1.52
+cocoindex[embeddings]>=0.1.52
python-dotenv>=1.0.1
fastapi==0.115.12
fastapi-cli==0.0.7

2 changes: 1 addition & 1 deletion examples/gdrive_text_embedding/pyproject.toml
@@ -3,7 +3,7 @@ name = "gdrive-text-embedding"
version = "0.1.0"
description = "Simple example for cocoindex: build embedding index based on Google Drive files."
requires-python = ">=3.11"
-dependencies = ["cocoindex>=0.1.52", "python-dotenv>=1.0.1"]
+dependencies = ["cocoindex[embeddings]>=0.1.52", "python-dotenv>=1.0.1"]

[tool.setuptools]
packages = []
2 changes: 1 addition & 1 deletion examples/pdf_embedding/pyproject.toml
@@ -4,7 +4,7 @@ version = "0.1.0"
description = "Simple example for cocoindex: build embedding index based on local PDF files."
requires-python = ">=3.11"
dependencies = [
-"cocoindex>=0.1.52",
+"cocoindex[embeddings]>=0.1.52",
"python-dotenv>=1.0.1",
"marker-pdf>=1.5.2",
]

2 changes: 1 addition & 1 deletion examples/text_embedding/pyproject.toml
@@ -4,7 +4,7 @@ version = "0.1.0"
description = "Simple example for cocoindex: build embedding index based on local text files."
requires-python = ">=3.11"
dependencies = [
-"cocoindex>=0.1.52",
+"cocoindex[embeddings]>=0.1.52",
"python-dotenv>=1.0.1",
"pgvector>=0.4.1",
"psycopg[binary,pool]",

1 change: 1 addition & 0 deletions examples/text_embedding_qdrant/README.md
@@ -37,6 +37,7 @@ We use Qdrant client to query the index, and reuse the embedding operation in th
pip install -e .
```


- Setup:

```bash

2 changes: 1 addition & 1 deletion examples/text_embedding_qdrant/pyproject.toml
@@ -4,7 +4,7 @@ version = "0.1.0"
description = "Simple example for cocoindex: build embedding index based on local text files."
requires-python = ">=3.11"
dependencies = [
-"cocoindex>=0.1.52",
+"cocoindex[embeddings]>=0.1.52",
"python-dotenv>=1.0.1",
"qdrant-client>=1.6.0",
]

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -10,7 +10,6 @@ authors = [{ name = "CocoIndex", email = "cocoindex.io@gmail.com" }]
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
-"sentence-transformers>=3.3.1",
"click>=8.1.8",
"rich>=14.0.0",
"python-dotenv>=1.1.0",
@@ -31,6 +30,7 @@ features = ["pyo3/extension-module"]
[project.optional-dependencies]
test = ["pytest"]
dev = ["ruff", "pre-commit"]
+embeddings = ["sentence-transformers>=3.3.1"]

[tool.mypy]
python_version = "3.11"
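
With `sentence-transformers` moved under the `embeddings` extra, a plain `pip install cocoindex` no longer pulls it in. One way to confirm what a given environment ended up with is to query the installed distribution metadata. This is a sketch under that assumption; `installed_version` is a hypothetical helper name, not something this PR adds.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints a version string if the extra was installed, otherwise None.
    print(installed_version("sentence-transformers"))
```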

20 changes: 20 additions & 0 deletions python/cocoindex/functions.py
@@ -12,6 +12,14 @@
 if TYPE_CHECKING:
     import sentence_transformers

+# Check if sentence_transformers is available
+try:
+    import sentence_transformers
+
+    _SENTENCE_TRANSFORMERS_AVAILABLE = True
+except ImportError:
+    _SENTENCE_TRANSFORMERS_AVAILABLE = False
+

 class ParseJson(op.FunctionSpec):
     """Parse a text into a JSON object."""
@@ -58,6 +66,10 @@ class SentenceTransformerEmbed(op.FunctionSpec):

     model: The name of the SentenceTransformer model to use.
     args: Additional arguments to pass to the SentenceTransformer constructor. e.g. {"trust_remote_code": True}
+
+    Note:
+        This function requires the optional sentence-transformers dependency.
+        Install it with: pip install 'cocoindex[embeddings]'
     """

model: str
@@ -72,6 +84,14 @@ class SentenceTransformerEmbedExecutor:
     _model: "sentence_transformers.SentenceTransformer"

     def analyze(self, text: Any) -> type:
+        if not _SENTENCE_TRANSFORMERS_AVAILABLE:
+            raise ImportError(
+                "sentence_transformers is required for SentenceTransformerEmbed function. "
+                "Install it with one of these commands:\n"
+                " pip install 'cocoindex[embeddings]'\n"
+                " pip install sentence-transformers"
+            )
+
         import sentence_transformers  # pylint: disable=import-outside-toplevel

         args = self.spec.args or {}
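
The guard in `analyze` follows a common pattern for optional dependencies: defer the failure to the point of use and raise an `ImportError` that names the install command. Below is a generic sketch of the same idea, not the code in this PR; `require_optional` and its parameters are hypothetical names chosen for illustration.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str, package: str = "cocoindex") -> ModuleType:
    """Import an optional module, or raise ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with a message that tells the user which extra to install.
        raise ImportError(
            f"{module_name} is required for this feature. "
            f"Install it with: pip install '{package}[{extra}]'"
        ) from exc
```

Chaining with `from exc` keeps the original import failure visible in the traceback, which helps when the package is installed but fails to import for another reason.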