-
Notifications
You must be signed in to change notification settings - Fork 48
AI update based on proto changes #641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ebef663
to
067cc83
Compare
067cc83
to
5e4ef0a
Compare
|
||
/** | ||
* CreateIndex starts a custom index build | ||
* | ||
* @example | ||
* | ||
* ```ts | ||
* await dataClient.createIndex( | ||
* '123abc45-1234-5678-90ab-cdef12345678', | ||
* IndexableCollection.HOT_STORE, | ||
* [new TextEncoder().encode(JSON.stringify({ field: 1 }))] | ||
* ); | ||
* ``` | ||
* | ||
* @param organizationId The ID of the organization | ||
* @param collectionType The type of collection to create the index on | ||
* @param indexSpec The MongoDB index specification in JSON format, as a | ||
* Uint8Array | ||
* @param pipelineName Optional name of the pipeline if collectionType is | ||
* PIPELINE_SINK | ||
*/ | ||
async createIndex( | ||
organizationId: string, | ||
collectionType: IndexableCollection, | ||
indexSpec: Uint8Array[], | ||
pipelineName?: string | ||
) { | ||
await this.dataClient.createIndex({ | ||
organizationId, | ||
collectionType, | ||
indexSpec, | ||
pipelineName, | ||
}); | ||
} | ||
|
||
/** | ||
* ListIndexes returns all the indexes for a given collection | ||
* | ||
* @example | ||
* | ||
* ```ts | ||
* const indexes = await dataClient.listIndexes( | ||
* '123abc45-1234-5678-90ab-cdef12345678', | ||
* IndexableCollection.HOT_STORE | ||
* ); | ||
* ``` | ||
* | ||
* @param organizationId The ID of the organization | ||
* @param collectionType The type of collection to list indexes for | ||
* @param pipelineName Optional name of the pipeline if collectionType is | ||
* PIPELINE_SINK | ||
* @returns An array of indexes | ||
*/ | ||
async listIndexes( | ||
organizationId: string, | ||
collectionType: IndexableCollection, | ||
pipelineName?: string | ||
): Promise<Index[]> { | ||
const resp = await this.dataClient.listIndexes({ | ||
organizationId, | ||
collectionType, | ||
pipelineName, | ||
}); | ||
return resp.indexes; | ||
} | ||
|
||
/** | ||
* DeleteIndex drops the specified custom index from a collection | ||
* | ||
* @example | ||
* | ||
* ```ts | ||
* await dataClient.deleteIndex( | ||
* '123abc45-1234-5678-90ab-cdef12345678', | ||
* IndexableCollection.HOT_STORE, | ||
* 'my_index' | ||
* ); | ||
* ``` | ||
* | ||
* @param organizationId The ID of the organization | ||
* @param collectionType The type of collection to delete the index from | ||
* @param indexName The name of the index to delete | ||
* @param pipelineName Optional name of the pipeline if collectionType is | ||
* PIPELINE_SINK | ||
*/ | ||
async deleteIndex( | ||
organizationId: string, | ||
collectionType: IndexableCollection, | ||
indexName: string, | ||
pipelineName?: string | ||
) { | ||
await this.dataClient.deleteIndex({ | ||
organizationId, | ||
collectionType, | ||
indexName, | ||
pipelineName, | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RobertXu @gloriacai01 can y'all check to see if this is correct? I saw y'all were the ones who added it to the protos.
Specifically the indexSpec
in createIndex
(since this is a mongo index, is there a more user-friendly way of accepting this data natively? we already include the bson library, so if the user can just pass in something else and then we can convert it to bytes internally, that would be better ux), and also the documentation overall
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Naveed,
Thanks for bringing this up. The choice to set indexSpec
as a byte array was following a pattern of other projects on the Data team. I can share the pattern and we can talk about whether it'd be useful to update it!
We have several Data APIs that deal with queries. When hitting those APIs through the cli, we ask users to provide a path to a JSON file that contains the MQL query. We then convert this to the BSON byte array for safety and to preserve field ordering. The indexSpec
field is following that pattern established for queries.
Right now the indexSpec
will only be used with the cli, though SDK support may come in the future, which would require converting from the index spec file into the byte array. I could see a world where we add this feature to the UI, and in that case we could also perform said conversion.
I agree that we want to make things user friendly. Are there ways of hitting the API where we end users would be forced to generate the byte array? A direct grpc is one example.
I'm definitely open to making a change. Just wanted to share how we got this point. I think some alternatives such as using a JSON string as long as possible and being careful around deserialization (to preserve ordering) could also work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this a little more, and I think I may be ok with passing around a JSON string and then converting to BSON right before it's used.
Let me chat with my team and get back to you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RobertXu Sorry I think I wasn't clear!
I don't mean we change the proto -- that can remain as a bytearray. I mean at the typescript CALL SITE. I think the user should be able to do something like
dataClient.createIndex("MY_ORG_ID", "PIPELINE_SINK", {"index_key": 1})
rather than what it currently is:
const myIndex = {"index_key": 1}
const indexBytes = bson.serialize(myIndex)
dataClient.createIndex("MY_ORG_ID", "PIPELINE_SINK", indexBytes)
In the former, the SDK will do the serialization, allowing the user to pass in a plain old object. In the latter case, the user will have to serialize themselves.
I have never used mongo indexes before, so I'm not sure what's possible, what the format should be, the type, etc.
But what I would like is for the callsite to be super ergonomic for the user, even as the proto remains using bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying! I'm still new to learning how our protos and SDKs interact.
It looks like this is an AI generated PR. Should I treat this like auto-generated code with a "DO NOT MODIFY" label? That's why I thought that modifying the protos was the way to go.
If we can make modifications here, then I agree that letting end users pass in a JS Object should work. The index_spec needs to have key ordering preserved. That should work with JS Objects (the one exception to preserving key ordering is with numeric keys, but they aren't really an issue for this project).
I agree that the SDK should do the serializaton for the user behind the scenes. What would making such a change involve? Is it as simple as changing the param list and applying the serialization logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RobertXu I can take care of making that change! Should the indexSpec
be an array of objects, or just a single object?
Also, definitely don't treat this as DO NOT MODIFY -- this is AI code and is bound to be incorrect in some places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, could you modify this to a single object? That'd be great, thanks! (Technically the index keys will be a bson.D at the end, but up front we ask for normal Object to mirror how people normally generate indexes through the nodeJS driver for MongoDB)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
One super small nit
For the test case 'creates an index'
, could you change the collectionType to IndexableCollection.PIPELINE_SINK
? It doesn't impact the test, but it's nice to be consistent with the collectionType and pipeline name arg. (If someone passes in a HOT_STORE collection type with a pipeline name, we will throw an error).
This is an AI-generated PR to update the SDK based on proto changes. The AI may make mistakes so review carefully.
Summary of Changes:
This update introduces index management capabilities to the DataClient, allowing users to create, list, and delete custom indexes on data collections. It also enhances the
tabularDataByMQL
method to support querying using a savedqueryPrefixName
. These changes involve updates to the DataClient implementation, its tests, and the underlying protobuf definitions.