[#714] feat(Java): Chunk Info Reader #714

sapienza88 · 2025-06-28T18:32:19Z

This PR introduces a set of utility methods designed to efficiently read information pertaining to graph vertex chunks. These utilities will streamline the process of accessing and interpreting how vertices are grouped and stored, which is crucial for optimizing graph processing and analysis tasks.

Specifically, this PR provides:

Methods to query and retrieve metadata about vertex chunks.

Functions to read the boundaries and contents of individual vertex chunks.

Utilities to facilitate the navigation and processing of vertex data in a chunked manner.

This enhancement will benefit features requiring granular access to graph vertex data, such as distributed graph algorithms, incremental graph updates, and optimized data loading.

yangxk1

If m lines of data are split into n files:

The first n-1 files each contain chunkSize entries (obtained from getChunkSize() on VertexInfo/EdgeInfo).
The last file contains the remainder: m - (n-1)*chunkSize.

So we don't need to read the record count of each file.
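
The arithmetic in this comment can be sketched as follows. This is only an illustration under stated assumptions: the class `ChunkMath` and its method names are hypothetical and are not part of the GraphAr API; `totalCount` stands for m and `chunkSize` for the value returned by getChunkSize().

```java
// Hypothetical helper, not part of GraphAr: derives per-chunk record counts
// from the total record count (m) and the configured chunk size alone.
final class ChunkMath {
    private ChunkMath() {}

    /** Number of chunk files (n) needed for totalCount records, by ceiling division. */
    static long chunkCount(long totalCount, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        if (totalCount < 0) {
            throw new IllegalArgumentException("totalCount must be non-negative");
        }
        return (totalCount + chunkSize - 1) / chunkSize;
    }

    /** Records stored in the chunk at chunkIndex (0-based). */
    static long recordsInChunk(long totalCount, long chunkSize, long chunkIndex) {
        long n = chunkCount(totalCount, chunkSize);
        if (chunkIndex < 0 || chunkIndex >= n) {
            throw new IndexOutOfBoundsException("chunkIndex out of range: " + chunkIndex);
        }
        // The first n-1 chunks are full; the last one holds the remainder.
        return chunkIndex < n - 1 ? chunkSize : totalCount - (n - 1) * chunkSize;
    }
}
```

With these definitions no data file has to be opened to learn how many records it holds; only m and chunkSize are required.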

yangxk1 · 2025-06-30T05:57:34Z

maven-projects/info/src/main/java/org/apache/graphar/info/ChunkInfoReader.java

+        return false;
+    }
+
+    public String getPropertyGroupChunkPath(PropertyGroup propertyGroup, long chunkIndex) {


Why move this from VertexInfo.java to here?

It seems to be more useful in this class.

Putting related functions in the same file helps organize the code better, increases cohesion, reduces coupling, and makes future maintenance easier.

Yes, it is already refactored.

yangxk1 · 2025-06-30T06:10:18Z

maven-projects/info/src/main/java/org/apache/graphar/info/FileReader.java

+        // TODO check equality test for type
+        String type = propertyGroup.getFileType().toString();
+        numberOfParts = vertexInfo.getChunkSize()
+        chunkBasePath = vertexInfo.getPropertyGroupPrefix() + "/part";


Is it "part" or "chunk"? Please confirm.

I don't see "chunk" for the vertex logical table here: https://graphar.apache.org/docs/specification/format/#physical-table-of-vertices

Apologies, the docs are a bit outdated. I'll update them as soon as possible.
In the meantime, you can check the latest data format in this repo: https://github.com/apache/incubator-graphar-testing
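
Since the file-name prefix is the open question in this thread, one way to keep the path logic independent of the answer is to pass the prefix in as a parameter. The sketch below is an assumption for illustration only: `ChunkPaths`, `chunkPath`, and the example directory are hypothetical, and the prefix actually used should be confirmed against the testing repository linked above.

```java
// Hypothetical helper, not part of GraphAr: builds a chunk file path from a
// property-group directory, a file-name prefix, and a 0-based chunk index.
final class ChunkPaths {
    private ChunkPaths() {}

    static String chunkPath(String propertyGroupPrefix, String fileNamePrefix, long chunkIndex) {
        if (chunkIndex < 0) {
            throw new IllegalArgumentException("chunkIndex must be non-negative");
        }
        // Avoid a double slash when the directory already ends with one.
        String base = propertyGroupPrefix.endsWith("/")
                ? propertyGroupPrefix
                : propertyGroupPrefix + "/";
        return base + fileNamePrefix + chunkIndex;
    }
}
```

For example, `ChunkPaths.chunkPath("vertex/person/id", "chunk", 0)` yields `vertex/person/id/chunk0`, while passing `"part"` instead yields `vertex/person/id/part0`.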


sapienza88 · 2025-06-30T16:28:53Z

> If m lines of data are split into n files:
>
> The first n-1 files each contain chunkSize entries (obtained from getChunkSize() on VertexInfo/EdgeInfo).
>
> The last file contains the remainder: m - (n-1)*chunkSize.
>
> So we don't need to read the record count of each file.

How do we know "m"? Is it the record count of the data file associated with the vertex?

yangxk1 · 2025-07-01T02:26:28Z

> > If m lines of data are split into n files:
> >
> > The first n-1 files each contain chunkSize entries (obtained from getChunkSize() on VertexInfo/EdgeInfo).
> >
> > The last file contains the remainder: m - (n-1)*chunkSize.
> >
> > So we don't need to read the record count of each file.
>
> How do we know "m"? Is it the record count of the data file associated with the vertex?

It can be read from vertex_count.

sapienza88 · 2025-07-24T02:29:16Z

@yangxk1 pls provide review and merge

sapienza88 · 2025-07-31T07:41:58Z

@yangxk1 pls allow CI to re-trigger automatically when I commit new code so that I don't have to wait for you to do it manually. Thanks.

yangxk1 · 2025-07-31T11:57:30Z

> @yangxk1 pls allow CI to re-trigger automatically when I commit new code so that I don't have to wait for you to do it manually. Thanks.

To ensure security, workflow execution for first-time contributors requires approval from a project committer.

You can change the branches setting in java-info.yml to trigger the workflow in your own fork, or run the script commands from java-info.yml directly in your local environment.
Just remember to revert any changes to the .yml file before submitting your final PR.

sapienza88 · 2025-08-11T12:02:58Z

@yangxk1 thanks for merging the PR on the version parsing, pls also do the same for this PR to let us merge this. PS: Allow edits by maintainers is enabled.

yangxk1 · 2025-08-11T12:25:37Z

> @yangxk1 thanks for updating the PR on the version parsing, pls also do the same for this PR to let us merge this. PS: Allow edits by maintainers is enabled.

This PR does not use protobuf, so it is not that urgent. I will still help you as soon as possible.

sapienza88 · 2025-08-16T16:58:45Z

@yangxk1 pls approve and merge this or let me know the changes required to merge before the next release so it can be included in the next release.

yangxk1 · 2025-08-18T02:20:09Z

Hi @unical1988, I think you should open an issue to discuss the necessity of this PR.

sapienza88 · 2025-08-18T09:34:46Z

@yangxk1 I already described here what the PR is intended for; it couldn't be clearer.

yangxk1 · 2025-08-18T09:42:17Z

You need to think about these points:

> If m lines of data are split into n files:
>
> The first n-1 files each contain chunkSize entries (obtained from getChunkSize() on VertexInfo/EdgeInfo).
>
> The last file contains the remainder: m - (n-1)*chunkSize.
>
> So we don't need to read the record count of each file.

An issue should focus on discussion rather than description. This is also required by the CONTRIBUTING document.

sapienza88 · 2025-08-18T10:28:29Z

@yangxk1 I can and will add tests. Everything else has been discussed here. Pls note that this is implemented in C++.

sapienza88 · 2025-09-19T22:45:57Z

> @yangxk1 I can and will add tests. Everything else has been discussed here. Pls note that this is implemented in C++.

@yangxk1 do you agree that this function doesn't require opening an issue? Why do you think it is not correct?

yangxk1 · 2025-09-22T02:06:48Z

We have not discussed whether this function is necessary.
We also don't know whether it should be implemented in java-info, java-io, or some other module.

Chunk Info Reader (Java) + refactoring for VertexInfo

f35e435

sapienza88 changed the title ~~Chunk Info Reader (Java) +~~ feat(Java): Chunk Info Reader Jun 28, 2025

sapienza88 changed the title ~~feat(Java): Chunk Info Reader~~ [#714] feat(Java): Chunk Info Reader Jun 28, 2025

yangxk1 requested changes Jun 30, 2025


switch for file types instead of if for file type names

65a2121

getChunk and checKChunkExists for ChunkInfoReader

fbec999

Selim Soufargi added 2 commits July 29, 2025 07:27

fixed issues

c7a0988

fixed issues related to maven compiler version

245a822

Selim Soufargi added 4 commits August 1, 2025 05:27

fix tests for CI to pass

ea75e87

fixing CI spotless error

3452847

fixing CI error on formatting

48898e0

fixing CI error

4443d3d
