Skip to content

Refactor rizin graph in rz_util #4788

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: dev
Choose a base branch
from

Conversation

Heersin
Copy link
Member

@Heersin Heersin commented Dec 22, 2024

Your checklist for this pull request

  • I've read the guidelines for contributing to this repository
  • I made sure to follow the project's coding style
  • I've documented or updated the documentation of every function and struct this PR changes. If not so I've explained why.
  • I've added tests that prove my fix is effective or that my feature works (if possible)
  • I've updated the rizin book with the relevant information (if needed)

Detailed description

working on implement functions
Add more graph representation for rizin, see #4231

  1. support identifier for nodes
  2. matrix based graph

Test plan

  1. unit test
  2. replace old graph api and test

Closing issues

...

#include <rz_list.h>

/**
* @brief Represents a node in a graph, node should be hashable.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rizin uses \brief syntax, here and everywhere please use this one.

Copy link
Member

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks already way way better then before! Thanks a lot.

Some questions you might have already considered.

  1. Should the graph's API mostly work on Node pointers or the node ids? Having both would be nice. If the node content doesn't matter to the user. But working on pointers is fine for me as well for now.
  2. Runtime complexity for getting neighbors. Please let's not clone the nodes every time the neighbor list is requested. This doesn't scale well. Instead a read only RzIterator over the lists/matrix is a better fit I think.
  3. Could the elements in the List based graph implementation contain edges? It would maybe reduce internal complexity getting the neighbors. Especially for directed graphs.

Comment on lines +34 to +39
bool (*add_edge)(RzGraphNew *graph, const RzGraphNodeNew *from, const RzGraphNodeNew *to);
bool (*remove_edge)(RzGraphNew *graph, const RzGraphNodeNew *from, const RzGraphNodeNew *to);
RzList *(*get_in_neighbours)(RzGraphNew *graph, const RzGraphNodeNew *node);
RzList *(*get_out_neighbours)(RzGraphNew *graph, const RzGraphNodeNew *node);
bool (*is_directed_connect)(RzGraphNew *graph, const RzGraphNodeNew *from, const RzGraphNodeNew *to);
bool (*del_node)(RzGraphNew *graph, RzGraphNodeNew *node);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it is useful to have the same callbacks, but taking the node ids, instead of the node pointers.
This way users can decide they don't want to mess with the pointers at all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the design, these pointers should all be internal to the graph API, with external access only through a unified set of functions.

Comment on lines +49 to +50
unsigned int n_nodes;
unsigned int n_edges;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unsigned int n_nodes;
unsigned int n_edges;
ut64 n_nodes;
ut64 n_edges;

Just in case :)

unsigned int n_nodes;
unsigned int n_edges;

HtUP *nodes; /* <hash_id, RzGraphNode> */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can add a typedef RzGraphHashId;


typedef struct rz_graph_matrix_edge_impl_t {
st64 **adj_matrix;
unsigned int n_nodes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unsigned int n_nodes;
ut64 n_nodes;

Comment on lines +339 to +342
if (!graph || !from || !to) {
rz_return_val_if_fail(graph && from && to, false);
return false;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not a simple rz_return_val_if`?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess just auto-complete and choose one of them

return false;
}

impl->adj_matrix[from->hash_id][to->hash_id] = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the magic values like 1 and 0 into macros. EDGE_SET, EDGE_UNSET or similar.
It might be even useful to add some small inline functions to test this property.

E.g.:

static inline bool edge_set(impl, from, to) { return impl->adj_matrix[from->hash_id][to->hash_id] & 1; }

This way we can extend the graph easier with special edges. By using the upper bits.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, I would fix this dirty and quick implementation

return NULL;
}

for (unsigned int i = 0; i < impl->n_nodes; i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (unsigned int i = 0; i < impl->n_nodes; i++) {
for (ut64 i = 0; i < impl->n_nodes; i++) {

Comment on lines +797 to +807
for (unsigned int i = 0; i < impl->n_nodes; i++) {
if (impl->adj_matrix[i][node->hash_id] == 1) {
RzGraphNodeNew *new_node = ht_up_find(graph->nodes, i, NULL);
if (!new_node) {
RZ_LOG_WARN("Failed to clone the node")
rz_list_free(neighbours);
return NULL;
}
rz_list_append(neighbours, new_node);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. Let's not clone all the nodes just because the neighbors are requested. The pointers to the neighbors are enough.

It could instead return a RzIterator for example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would largely prevent the graph to be accessed by multiple threads. Because the pointers returned by the iterator might live longer then the nodes in the graph.

But we would need to add another implementation for this use case anyways.

* @param impl The implementation of the graph. IMPLEMENTATION_LIST or IMPLEMENTATION_MATRIX etc.
* @return The new graph, NULL if the graph is not created successfully.
*/
RZ_API RzGraphNew *rz_graph_new_new(RzGraphIdentifierHash user_hash, RzGraphUserdataFree user_free, RzGraphImpl impl) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
RZ_API RzGraphNew *rz_graph_new_new(RzGraphIdentifierHash user_hash, RzGraphUserdataFree user_free, RzGraphImpl impl) {
RZ_API RZ_OWN RzGraphNew *rz_graph_new_new(RzGraphIdentifierHash user_hash, RzGraphUserdataFree user_free, RzGraphImpl impl) {

* @return RzList contains all out neighbour nodes (borrowed instance)
*/
RZ_API RzList *rz_graph_get_out_nodes_new(RzGraphNew *g, RzGraphNodeNew *node) {
return g->edge_set->ops->get_out_neighbours(g, node);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These callbacks might are a performance problem (see: #4769). We would need to test this and maybe replace them with a switch.

@Heersin
Copy link
Member Author

Heersin commented Jan 9, 2025

Looks already way way better then before! Thanks a lot.

Some questions you might have already considered.

1. Should the graph's API mostly work on Node pointers or the node ids? Having both would be nice. If the node content doesn't matter to the user. But working on pointers is fine for me as well for now.

2. Runtime complexity for getting neighbors. Please let's not clone the nodes every time the neighbor list is requested. This doesn't scale well. Instead a read only `RzIterator` over the lists/matrix is a better fit I think.

3. Could the elements in the List based graph implementation contain edges? It would maybe reduce internal complexity getting the neighbors. Especially for directed graphs.
  1. yes, this is also something I have been contemplating. don't know if it would be confused to provide both way to access. so I defined them mostly work on Node pointer, and they could turn id into pointer through hash_find
  2. exactly, it did bring crazy workload for runtime. I'm looking for a better way to return neighbors, do u know which file provide a best practice sample for returning RzIterator?
  3. no they don't contain explicit edges, initially I implemented it as edge and nodes, but later remove edges to abstract the RzGraph and implement PoC. You r right It's good to have edge in impl for list-based graph, and to reduce the complexity, thanks for point out that

@Rot127
Copy link
Member

Rot127 commented Jan 9, 2025

do u know which file provide a best practice sample for returning RzIterator

You can checkout ht_inc.c. The functions Ht_(as_iter_keys), Ht_(as_iter_mut) and the corresponding (iter_next etc.) return an iterator over a hash map. Your usecase is even simpler I think.

no they don't contain explicit edges, initially I implemented it as edge and nodes, but later remove edges to abstract the RzGraph and implement PoC. You r right It's good to have edge in impl for list-based graph, and to reduce the complexity, thanks for point out that

You make a good point there. Considering code complexity (meaning: many lines of code) vs. low runtime complexity.

If you don't want to add an edge list for now (because too much work/ too many things to consider), it is fine.
We can still extend it.

cc @wargio

@wargio
Copy link
Member

wargio commented Jan 9, 2025

agreed with @Rot127

I think it is ok to have first an implementation that works, then we can improve the execution time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants