Unclear handling of attributes in BP4/BP5 #4471

Open
jorgensd opened this issue Feb 27, 2025 · 4 comments

@jorgensd

The following minimal example, using the ADIOS2 Python interface (2.10.2), shows a difference in how attributes are handled in the BP4 and BP5 formats.

import adios2.bindings as adios2
from mpi4py import MPI


def read_attr(engine):
    filename = "test_" + engine + ".bp"

    adios = adios2.ADIOS(MPI.COMM_WORLD)


    io = adios.DeclareIO("reader" + engine)
    io.SetEngine(engine)
    file = io.Open(str(filename), adios2.Mode.Read)
    print(engine,  io.AvailableAttributes().keys())
    for step in range(file.Steps()):
        file.BeginStep()
        print(engine, step, io.AvailableAttributes().keys())
        file.EndStep()
    file.Close()
    adios.RemoveIO("reader"+engine)


def write_attr(engine):
    filename = "test_" + engine + ".bp"

    # Write two attributes to file
    adios = adios2.ADIOS(MPI.COMM_WORLD)
    io = adios.DeclareIO("writer" + engine)
    io.SetEngine(engine)
    adios_file = io.Open(str(filename), adios2.Mode.Write, MPI.COMM_WORLD)
    adios_file.BeginStep()
    io.DefineAttribute("a", "first")
    adios_file.PerformPuts()
    adios_file.EndStep()

    adios_file.BeginStep()
    io.DefineAttribute("b", "last")
    adios_file.PerformPuts()
    adios_file.EndStep()

    adios_file.Close()
    adios.RemoveIO("writer" + engine)

if __name__ == "__main__":
    write_attr("BP4")
    read_attr("BP4")
    write_attr("BP5")
    read_attr("BP5")

This yields:

BP4 dict_keys(['a'])
BP4 0 dict_keys(['a'])
BP5 dict_keys([])
BP5 0 dict_keys(['a'])
BP5 1 dict_keys(['a', 'b'])

This makes handling both formats from Python very hard to maintain.
Was this change made on purpose, or is it a bug?

@eisenhauer
Member

Some things about this example surprise me and some do not. There is a fundamental difference between BP4 and BP5 surrounding the more explicit separation between "streaming" and "random access" read modes. In BP4 these were somewhat blurred. BP4 loads all file metadata immediately upon Open(), regardless of access mode. However, in the default Adios.Mode.Read, BP5 loads each timestep's metadata only upon BeginStep. Therefore no attributes are available before BeginStep, and attributes are added cumulatively as you read additional steps. So the BP5 output above looks reasonable for those semantics. (You should get different semantics if you specify Mode.ReadRandomAccess.) I'm a little fuzzier on why you're not seeing "b" in the BP4 output. @pnorbert ?
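As a rough illustration of the two read semantics described above, here is a toy model in plain Python. It does not call ADIOS2 at all; `STEPS`, `visible_streaming`, and `visible_random_access` are hypothetical names invented for this sketch, and the data simply mirrors the writer in the original example ("a" defined in step 0, "b" in step 1).

```python
from typing import Dict, Iterator, List

# Toy data (assumption): attributes *defined during* each write step,
# mirroring the writer above. No ADIOS2 calls are made anywhere here.
STEPS: List[Dict[str, str]] = [{"a": "first"}, {"b": "last"}]


def visible_random_access(steps: List[Dict[str, str]]) -> Dict[str, str]:
    """Model of random access: all metadata is loaded at open time,
    so every attribute from every step is visible immediately."""
    merged: Dict[str, str] = {}
    for step_attrs in steps:
        merged.update(step_attrs)
    return merged


def visible_streaming(steps: List[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Model of streaming reads: metadata arrives one step at a time.
    Yields the attributes visible after each simulated BeginStep."""
    seen: Dict[str, str] = {}
    for step_attrs in steps:
        seen.update(step_attrs)  # attributes accumulate and are never dropped
        yield dict(seen)


if __name__ == "__main__":
    for step, attrs in enumerate(visible_streaming(STEPS)):
        print("streaming step", step, sorted(attrs))
    print("random access", sorted(visible_random_access(STEPS)))
```

In this model nothing is visible before the first simulated step, and the set of visible attributes only grows, which is the cumulative behaviour described for BP5 in Mode.Read.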

@jorgensd
Author

jorgensd commented Mar 3, 2025

However, in the default Adios.Mode.Read, BP5 loads each timestep's metadata only upon BeginStep.

This makes a big difference for my applications, as I've assumed that Attributes were time-independent.
I based this on the documentation, which states:

Attribute: Attributes add extra information to the overall variables dataset defined in the IO class. They can be single or array values.

Having to define these for each write step is not something I can do for my applications, as write steps are not associated with time steps and the data evolves over time (one might first write a mesh, then some function data, then some markers, then function data for a different time step).
With BP5, I now have to loop over all steps to find the right step in the ADIOS2 file.

Suddenly the divide between Attribute and Variable is no longer clear to me.
A global single-value Variable and an Attribute now seem like the same kind of object to me.
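The step scan mentioned above (looping over all steps to find where an attribute becomes available) could be sketched as a small helper that is independent of ADIOS2. The function name `first_step_with` is hypothetical, and the sketch assumes the caller supplies the attribute names seen at each step, for example collected from `io.AvailableAttributes().keys()` after each `BeginStep()` as in the reader above.

```python
from typing import Iterable, Optional


def first_step_with(name: str, keys_per_step: Iterable[Iterable[str]]) -> Optional[int]:
    """Return the index of the first step whose attribute names include
    `name`, or None if no step contains it.

    `keys_per_step` is any iterable yielding, per step, the attribute
    names visible at that step (an assumption of this sketch).
    """
    for step, keys in enumerate(keys_per_step):
        if name in set(keys):
            return step
    return None


if __name__ == "__main__":
    # Names visible per step, taken from the BP5 output in the first post.
    observed = [["a"], ["a", "b"]]
    print(first_step_with("b", observed))
```

Because the helper stops at the first match, a lazy iterable (such as a generator that advances the reader one step at a time) would not need to visit the remaining steps.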

@eisenhauer
Member

Mode.Read is set up to match the semantics that ADIOS can provide in a streaming situation, that is, one in which the writer and reader run simultaneously and data flows directly from one to the other over the network. In this circumstance, time-independence is impossible. Possibly you want Mode.ReadRandomAccess? You can't use BeginStep/EndStep with that in BP5, but if you structure your code to use SetStepSelection for the variables you read instead, it will work with both BP4 and BP5, and all attributes will be available immediately upon Open(). The downside of that approach is a higher Open() cost and memory utilization, because all file metadata is read immediately. (Higher as compared to BP5 Mode.Read; BP4 always has those higher costs and memory utilization.)

(The BP4 engine and prior versions of ADIOS in general didn't have a strict differentiation between Read and ReadRandomAccess and provided inconsistent semantics in streaming vs. non-streaming situations. BP5 tries to enforce a stronger line between access methods with clearer semantics.)

The most obvious difference between Attribute and Variable is that Attributes are persistent on the reader side, whereas Variables are not. That is, once set, an Attribute is always available to query regardless of timestep. They used to be immutable as well, but ADIOS has now introduced mutable attributes to accommodate user requests.

@pnorbert
Contributor

pnorbert commented Mar 4, 2025

As a note on the BP4 test: I don't understand why, but it does not work as expected, even though this high-level Python API example (which uses the same bindings as the example above) works as expected.

import adios2
from mpi4py import MPI

# MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

def read_attr(engine):
    filename = "testHL_" + engine + ".bp"
    with adios2.FileReader(filename, comm) as fh:
        attrs = fh.available_attributes()
        print(engine,  attrs.keys())
        for aname in attrs.keys():
            a = fh.inquire_attribute(aname)
            print(f"{aname} = {a.data()}")

    with adios2.Stream(filename, "r", comm) as fh:
        for _ in fh.steps():
            print(f"----- step {fh.current_step()}")
            attrs = fh.available_attributes()
            print(engine,  attrs.keys())
            for aname in attrs.keys():
                a = fh.inquire_attribute(aname)
                print(f"{aname} = {a.data()}")



def write_attr(engine):
    filename = "testHL_" + engine + ".bp"
    with adios2.Stream(filename, "w", comm) as fh:
        for _ in fh.steps(4):
            currentStep = fh.current_step()
            fh.write_attribute("attr"+str(currentStep), currentStep)

if __name__ == "__main__":
    write_attr("BP4")
    read_attr("BP4")
    write_attr("BP5")
    read_attr("BP5")

$ python3 ./testHL.py
BP4 dict_keys(['attr0', 'attr1', 'attr2', 'attr3'])
attr0 = 0
attr1 = 1
attr2 = 2
attr3 = 3
----- step 0
BP4 dict_keys(['attr0'])
attr0 = 0
----- step 1
BP4 dict_keys(['attr0', 'attr1'])
attr0 = 0
attr1 = 1
----- step 2
BP4 dict_keys(['attr0', 'attr1', 'attr2'])
attr0 = 0
attr1 = 1
attr2 = 2
----- step 3
BP4 dict_keys(['attr0', 'attr1', 'attr2', 'attr3'])
attr0 = 0
attr1 = 1
attr2 = 2
attr3 = 3
BP5 dict_keys(['attr0', 'attr1', 'attr2', 'attr3'])
attr0 = 0
attr1 = 1
attr2 = 2
attr3 = 3
----- step 0
BP5 dict_keys(['attr0'])
attr0 = 0
----- step 1
BP5 dict_keys(['attr0', 'attr1'])
attr0 = 0
attr1 = 1
----- step 2
BP5 dict_keys(['attr0', 'attr1', 'attr2'])
attr0 = 0
attr1 = 1
attr2 = 2
----- step 3
BP5 dict_keys(['attr0', 'attr1', 'attr2', 'attr3'])
attr0 = 0
attr1 = 1
attr2 = 2
attr3 = 3
