-
Notifications
You must be signed in to change notification settings - Fork 76
[#714] feat(Java): Chunk Info Reader #714
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If m
lines of data is split into n
files:
- First
n-1
files havechunkSize
(get fromVertexInfo/EdgeInfo getChunkSize()
) entries each - Last file has the remainder:
m - (n-1)*chunkSize
So we don’t need to read file record count.
return false; | ||
} | ||
|
||
public String getPropertyGroupChunkPath(PropertyGroup propertyGroup, long chunkIndex) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why move from VertexInfo.java
to here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems to be more useful in this class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting related functions in the same file helps organize the code better, increases cohesion, reduces coupling, and makes future maintenance easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is already refactored
// TODO check equality test for type | ||
String type = propertyGroup.getFileType().toString(); | ||
numberOfParts = vertexInfo.getChunkSize() | ||
chunkBasePath = vertexInfo.getPropertyGroupPrefix() + "/part"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is "part" or "chunk"? Please confirm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see "chunk" for vertex logical table here https://graphar.apache.org/docs/specification/format/#physical-table-of-vertices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies — the docs are a bit outdated. I’ll update them as soon as possible.
In the meantime, you can check the latest data format in this repo: https://github.com/apache/incubator-graphar-testing
maven-projects/info/src/main/java/org/apache/graphar/info/FileReader.java
Outdated
Show resolved
Hide resolved
How to know about "m"? the count of the data file associated with the vertex? |
It can be read from vertex_count. |
@yangxk1 pls provide review and merge |
@yangxk1 pls allow CI to re-trigger automatically when a new code is commited by me so that I don't have to wait for you to do it manually. Thanks. |
To ensure security, workflow execution for first-time contributors requires approval from a project committer. You can change the |
@yangxk1 thanks for merging the PR on the version parsing, pls also do the same for this PR to let us merge this. PS: Allow edits by maintainers is enabled. |
This PR does not use protobuf, so it is not that urgent. I will still help you as soon as possible. |
@yangxk1 pls approve and merge this or let me know the changes required to merge before the next release so it can be included in the next release. |
Hi @unical1988 ,I think you should open an issue to discuss the necessity of this pr. |
@yangxk1 i already describe here whats the pr intended for, couldnt be clearer |
You need to think about these:
Open an Issue focus on discussion rather than description. This is also the specification of the CONTRIBUTING document. |
@yangxk1 i can and will add tests anything else has been discussed here, pls note that this is implemented in C++ |
|
This PR introduces a set of utility methods designed to efficiently read information pertaining to graph vertex chunks. These utilities will streamline the process of accessing and interpreting how vertices are grouped and stored, which is crucial for optimizing graph processing and analysis tasks.
Specifically, this PR provides:
Methods to query and retrieve metadata about vertex chunks.
Functions to read the boundaries and contents of individual vertex chunks.
Utilities to facilitate the navigation and processing of vertex data in a chunked manner.
This enhancement will benefit features requiring granular access to graph vertex data, such as distributed graph algorithms, incremental graph updates, and optimized data loading.