mayya-sharipova
Contributor

Add GPUPlugin that builds vector indices on GPU.
Currently available for the "hnsw" and "int8_hnsw" types.

It uses the NVIDIA cuVS library, which should be available on the node.
GPUPlugin is behind a feature flag for the 9.2 release.

mayya-sharipova and others added 30 commits June 17, 2025 14:41
- Created GPUVectorsFormat that writes and reads flat vectors
- Added a new index_options: gpu for dense_vector field that is under
the feature flag
./gradlew ":x-pack:plugin:gpu:yamlRestTest" --tests "org.elasticsearch.xpack.gpu.GPUYamlTestSuiteIT.test {p0=gpu/10_basic/*}"
First save the CAGRA index to hnswlib format on disk.
Then read this file from disk and serialize it to Lucene HNSW format.
Plugins can provide a VectorsFormatProvider that supplies
a new KnnVectorsFormat for different VectorIndexTypes.
If plugins provide formats, they are used
instead of the standard ones.
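
The override rule in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `FormatProvider`, `selectFormat`, and the string format names are hypothetical stand-ins, and formats are modelled as plain strings rather than `KnnVectorsFormat` instances.

```java
import java.util.List;

public class FormatSelectionSketch {
    /** Hypothetical stand-in for a plugin-supplied provider; returns null when it has no format for the index type. */
    public interface FormatProvider {
        String formatFor(String indexType);
    }

    /** Returns the first plugin-provided format for the index type, or the standard format if no plugin supplies one. */
    public static String selectFormat(String indexType, List<FormatProvider> providers, String standardFormat) {
        for (FormatProvider provider : providers) {
            String format = provider.formatFor(indexType);
            if (format != null) {
                return format; // a plugin format takes precedence over the standard one
            }
        }
        return standardFormat;
    }

    public static void main(String[] args) {
        // Assumed example provider that only handles the "hnsw" index type.
        FormatProvider gpu = type -> type.equals("hnsw") ? "gpu-hnsw" : null;
        System.out.println(selectFormat("hnsw", List.of(gpu), "standard-hnsw"));
        System.out.println(selectFormat("flat", List.of(gpu), "standard-flat"));
    }
}
```

With no providers registered, `selectFormat` always falls back to the standard format, which matches the behaviour when the plugin is absent.
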
index.vectors.indexing.use_gpu has 3 options:
- auto (null) default: use gpu indexing when available
- false: don't use gpu indexing
- true: use gpu indexing and if not available, throw an error
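
The three options above amount to a small tri-state decision. The sketch below shows one way to express it, under stated assumptions: `shouldIndexOnGpu` and the `gpuAvailable` parameter are hypothetical names, and the exception type is a guess rather than what the plugin actually throws.

```java
public class UseGpuSettingSketch {
    /**
     * Resolves the tri-state setting: null models "auto", Boolean.FALSE disables GPU indexing,
     * Boolean.TRUE requires it. gpuAvailable is an assumed input standing in for the plugin's own support check.
     */
    public static boolean shouldIndexOnGpu(Boolean useGpu, boolean gpuAvailable) {
        if (useGpu == null) {
            return gpuAvailable; // auto: use the GPU only when it is available
        }
        if (!useGpu) {
            return false; // false: never use the GPU
        }
        if (!gpuAvailable) {
            // true: GPU indexing was requested explicitly, so a missing GPU is an error
            throw new IllegalArgumentException("use_gpu is true but GPU indexing is not available");
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(shouldIndexOnGpu(null, true));
        System.out.println(shouldIndexOnGpu(null, false));
        System.out.println(shouldIndexOnGpu(false, true));
    }
}
```
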
…ix (#132832)

This PR updates cuvs-java dependency to 25.10 (I left 25.08 and updated its verification metadata to the final version for convenience in case we want to go back).

It uses CuVSMatrix as a way to transfer data efficiently from GPU memory to the Java heap directly (and then to a Lucene file). I tried to keep changes to a minimum, but some restructuring was necessary (e.g. resource management needs to be done at an upper level: we need to hold on to the resource until we have finished reading the CuVSMatrix).
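
The resource-lifetime point above can be illustrated with a small sketch. It does not use the cuvs-java API: `Resource`, `Matrix`, `buildGraph`, and `buildAndCopy` are hypothetical stand-ins that only model the constraint that the matrix is readable while its owning resource is still open.

```java
import java.util.ArrayList;
import java.util.List;

public class ResourceLifetimeSketch {
    /** Hypothetical stand-in for a GPU resource handle. */
    public static final class Resource implements AutoCloseable {
        private boolean open = true;
        public boolean isOpen() { return open; }
        @Override public void close() { open = false; }
    }

    /** Hypothetical stand-in for a matrix whose rows are only readable while the owning resource is open. */
    public static final class Matrix {
        private final Resource owner;
        private final int[][] rows;
        public Matrix(Resource owner, int[][] rows) {
            this.owner = owner;
            this.rows = rows;
        }
        public int size() { return rows.length; }
        public int[] row(int i) {
            if (!owner.isOpen()) {
                throw new IllegalStateException("resource closed before the matrix was read");
            }
            return rows[i].clone();
        }
    }

    /** Lower level: builds the matrix but neither owns nor closes the resource. */
    public static Matrix buildGraph(Resource resource) {
        return new Matrix(resource, new int[][] { { 1, 2 }, { 0, 2 }, { 0, 1 } });
    }

    /** Upper level: owns the resource and keeps it open until every row has been copied out. */
    public static List<int[]> buildAndCopy() {
        try (Resource resource = new Resource()) {
            Matrix graph = buildGraph(resource);
            List<int[]> copied = new ArrayList<>();
            for (int i = 0; i < graph.size(); i++) {
                copied.add(graph.row(i));
            }
            return copied; // the resource is closed only after the copy is complete
        }
    }

    public static void main(String[] args) {
        System.out.println(buildAndCopy().size());
    }
}
```

If `buildGraph` closed the resource itself, every later `row` call would fail, which is why ownership sits with the caller in this sketch.
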
@ChrisHegarty
Contributor

I can't tell, but when the test-gpu label is added, do we get all the usual CI tests plus the testing on GPU? Or does it replace the existing tests?


@BeforeClass
public static void checkGPUSupport() {
assumeTrue("cuvs not supported", GPUSupport.isSupported(false));
Contributor

Do the tests need to check the feature flag too? I'm just curious what these do when tested against test-release.

Member

I agree, we should have a couple of tests that assert that you cannot create GPU things with the feature flag off.

Contributor Author

Added more tests in 543fafe

Contributor Author

I've added the "test-release" label to the PR. So I guess it will test with the FF disabled?

Contributor Author

Strangely, when I added "test-release", elasticsearch-ci would not run and reported "Pipeline upload rejected: You can only change the pipeline of a running build". So I eventually removed the "test-release" label.

Contributor

For the record, we have such tests in GPUPluginInitializationIT.

Member

@benwtrent benwtrent left a comment

I think this looks good!

I didn't review the native integrations, just format stuff.

I would like tests that verify that you cannot utilize the format if es.vectors_indexing_use_gpu_feature_flag_enable isn't set :).

@mayya-sharipova mayya-sharipova added the test-release Trigger CI checks against release build label Sep 29, 2025
@mayya-sharipova
Contributor Author

@elasticsearchmachine test this please

@mayya-sharipova
Contributor Author

@elasticsearchmachine test this please

@mayya-sharipova mayya-sharipova removed the test-release Trigger CI checks against release build label Sep 30, 2025
* HNSW graph is built on GPU, while scalar quantization and search is performed on CPU.
*/
public class ES92GpuHnswSQVectorsFormat extends KnnVectorsFormat {
public static final String NAME = "Lucene99HnswVectorsFormat";
Contributor

Should the name be ES814ScalarQuantizedVectorsFormat? I'm surprised that the tests do not fail with this. Maybe I'm wrong, no?

Contributor Author

@mayya-sharipova mayya-sharipova Sep 30, 2025

ES814ScalarQuantizedVectorsFormat is the flat format used inside it, but the format itself is still Lucene99HnswVectorsFormat.

Lucene99HnswVectorsFormat doesn't concern itself with how the flat format stores data.

Contributor

Oh, ok then. I honestly confuse myself about these names all the time! Sorry for the noise.

@ChrisHegarty ChrisHegarty added test-release Trigger CI checks against release build and removed test-gpu Run tests using a GPU labels Sep 30, 2025
@ChrisHegarty
Contributor

@elasticsearchmachine test this please

@ChrisHegarty
Contributor

I temporarily removed the test-gpu label and replaced it with test-release. I just want to get a CI run with release testing; then we can remove test-release and restore test-gpu.

@mayya-sharipova mayya-sharipova added the test-gpu Run tests using a GPU label Sep 30, 2025
@mayya-sharipova mayya-sharipova merged commit 9faa068 into main Sep 30, 2025
36 checks passed
@mayya-sharipova mayya-sharipova deleted the es-gpu branch September 30, 2025 13:09
Labels
>feature
:Search Relevance/Vectors (Vector search)
Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch)
test-gpu (Run tests using a GPU)
test-release (Trigger CI checks against release build)
v9.2.0
6 participants