Skip to content

Make sentence_transformers an optional dependency #674

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

theparthgupta
Copy link
Contributor

Summary

This PR makes sentence_transformers an optional dependency as requested in #636. The dependency is heavy and not needed for all use cases, so users can now choose whether to install it.

Changes Made

  • pyproject.toml: Moved sentence_transformers from required dependencies to optional dependencies under the sentence-transformers extra
  • python/cocoindex/features.py: Added conditional import logic to only define SentenceTransformerEmbed when the dependency is available
  • python/cocoindex/init.py: Updated to conditionally export SentenceTransformerEmbed only when available
  • examples/: Added installation instructions to all example files that use SentenceTransformerEmbed

Installation Options

Users can now install with:

# With sentence-transformers support
pip install 'cocoindex[sentence-transformers]'

# Or install the dependency separately
pip install sentence-transformers

Fixes #636

- Move sentence_transformers to optional dependencies in pyproject.toml
- Add conditional import in features.py
- Update __init__.py to only export SentenceTransformerEmbed when available
- Add installation instructions to examples

Fixes #636
Copy link
Member

@badmonster0 badmonster0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also help to update the documentation for the SentenceTransformerEmbed function?

@lemorage
Copy link
Contributor

Hey, personally, I feel that for the extra name, sentence-transformers is a bit long and does not quite make sense. Following PEP 621, Extras collect specific subsets of optional dependencies based on features. Maybe we can change it to some descriptive names like ml or embeddings. This way, it is also quite extensible for future changes.

- Add sentence-transformers to pyproject.toml dependencies in all 7 examples
- Update README files with installation instructions for SentenceTransformerEmbed
- Add clear dependency notes to examples that use sentence transformers

Part of #636
@theparthgupta
Copy link
Contributor Author

I've updated all 7 examples that use SentenceTransformerEmbed with the new optional dependency approach.

I noticed that the main README.md and several docs files also reference SentenceTransformerEmbed. Should I update those files with the new installation instructions as well, or would you prefer to handle the documentation updates yourself?

Files that might need updates:

  • README.md (main example uses SentenceTransformerEmbed)
  • docs/quickstart.md
  • docs/basics.md
  • docs/functions.md
  • docs/custom_function.mdx

@badmonster0
Copy link
Member

Hey, personally, I feel that for the extra name, sentence-transformers is a bit long and does not quite make sense. Following PEP 621, Extras collect specific subsets of optional dependencies based on features. Maybe we can change it to some descriptive names like ml or embeddings. This way, it is also quite extensible for future changes.

Agree, sentence-transformers is indeed too long and probably too specific. +1 for using embedding as the feature name.

@badmonster0
Copy link
Member

I've updated all 7 examples that use SentenceTransformerEmbed with the new optional dependency approach.

I noticed that the main README.md and several docs files also reference SentenceTransformerEmbed. Should I update those files with the new installation instructions as well, or would you prefer to handle the documentation updates yourself?

Files that might need updates:

  • README.md (main example uses SentenceTransformerEmbed)
  • docs/quickstart.md
  • docs/basics.md
  • docs/functions.md
  • docs/custom_function.mdx

It'll be great if you can help to update these docs in the current PR:

  • docs/quickstart.md (in the instruction for pip install cocoindex, we need to add this new optional feature)
  • docs/functions.md (we need to tell users this SentenceTransformerEmbed function needs this new optional feature)

Other docs may be just mentioning SentenceTransformerEmbed. Seems they don't need to be updated.

README.md usually doesn't need to be updated either - after the dependency is added into pyproject.toml, the dependency will be automatically installed in pip install.

Thanks!

…mers dependency

- Update quickstart.md: change pip install command to include [sentence-transformers] extra
- Update functions.md: add optional dependency note for SentenceTransformerEmbed function
- Remove manual 'pip install sentence-transformers' instructions from all example README.md and main.py files
@theparthgupta
Copy link
Contributor Author

I've addressed all the feedback regarding documentation updates:
Changes made:

  • docs/quickstart.md: Updated pip install command to pip install -U 'cocoindex[sentence-transformers]'
  • docs/functions.md: Added optional dependency note for SentenceTransformerEmbed function with clear installation instructions
  • All 7 examples: Removed manual installation instructions from README.md and main.py files as requested

@badmonster0
Copy link
Member

Hey, personally, I feel that for the extra name, sentence-transformers is a bit long and does not quite make sense. Following PEP 621, Extras collect specific subsets of optional dependencies based on features. Maybe we can change it to some descriptive names like ml or embeddings. This way, it is also quite extensible for future changes.

Agree, sentence-transformers is indeed too long and probably too specific. +1 for using embedding as the feature name.

@theparthgupta what do you think about this? Would you also change the feature name to embedding? Thanks!

@theparthgupta
Copy link
Contributor Author

Hey, personally, I feel that for the extra name, sentence-transformers is a bit long and does not quite make sense. Following PEP 621, Extras collect specific subsets of optional dependencies based on features. Maybe we can change it to some descriptive names like ml or embeddings. This way, it is also quite extensible for future changes.

Agree, sentence-transformers is indeed too long and probably too specific. +1 for using embedding as the feature name.

@theparthgupta what do you think about this? Would you also change the feature name to embedding? Thanks!

Agreed on changing it! I'd lean toward embeddings (plural) since it typically handles multiple embeddings, but either embedding or embeddings works well. Much better than the current long name. If you agree i can work on this too

@badmonster0
Copy link
Member

embeddings

embeddings looks good. Thanks!

@theparthgupta
Copy link
Contributor Author

embeddings

embeddings looks good. Thanks!

Sounds good! I'll get started on the implementation.

- Updated pyproject.toml to use [embeddings] instead of [sentence-transformers] for the optional dependency group
- Replaced all install instructions using 'cocoindex[sentence-transformers]' with 'cocoindex[embeddings]' across docs and examples
-
@theparthgupta
Copy link
Contributor Author

I've updated the optional dependency group name from sentence-transformers to embeddings as discussed. Let me know if you'd like any other naming adjustments or additions

@badmonster0
Copy link
Member

Thanks for this change!

@badmonster0 badmonster0 merged commit 1f29ba3 into cocoindex-io:main Jun 30, 2025
8 checks passed
@theparthgupta
Copy link
Contributor Author

Thanks for this change!

thanks for merging the PR!

I noticed CI is failing on main after my PR was merged — looks like mypy can't find sentence_transformers or numpy. I think it's due to the optional import. Should I open a quick fix to update mypy.ini or CI deps?

@lemorage
Copy link
Contributor

@theparthgupta The CI failure is largely due to the missing numpy dep, after making sentence_transformers optional. I've made the fix, and feel free to check out #676.

@theparthgupta
Copy link
Contributor Author

@theparthgupta The CI failure is largely due to the missing numpy dep, after making sentence_transformers optional. I've made the fix, and feel free to check out #676.

thanks for doing it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants