Refactor rizin graph in rz_util #4788

Heersin · 2024-12-22T16:41:33Z

Your checklist for this pull request

I've read the guidelines for contributing to this repository
I made sure to follow the project's coding style
I've documented or updated the documentation of every function and struct this PR changes. If not so I've explained why.
I've added tests that prove my fix is effective or that my feature works (if possible)
I've updated the rizin book with the relevant information (if needed)

Detailed description

working on implement functions
Add more graph representation for rizin, see #4231

support identifier for nodes
matrix based graph

Test plan

unit test
replace old graph api and test

Closing issues

...

XVilka · 2024-12-22T16:45:49Z

librz/include/rz_util/rz_graph_new.h

+#include <rz_list.h>
+
+/**
+ * @brief Represents a node in a graph, node should be hashable.


Rizin uses \brief syntax, here and everywhere please use this one.

Rot127

Looks already way way better then before! Thanks a lot.

Some questions you might have already considered.

Should the graph's API mostly work on Node pointers or the node ids? Having both would be nice. If the node content doesn't matter to the user. But working on pointers is fine for me as well for now.
Runtime complexity for getting neighbors. Please let's not clone the nodes every time the neighbor list is requested. This doesn't scale well. Instead a read only RzIterator over the lists/matrix is a better fit I think.
Could the elements in the List based graph implementation contain edges? It would maybe reduce internal complexity getting the neighbors. Especially for directed graphs.

Rot127 · 2024-12-29T16:07:39Z

librz/include/rz_util/rz_graph_new.h

+	bool (*add_edge)(RzGraphNew *graph, const RzGraphNodeNew *from, const RzGraphNodeNew *to);
+	bool (*remove_edge)(RzGraphNew *graph, const RzGraphNodeNew *from, const RzGraphNodeNew *to);
+	RzList *(*get_in_neighbours)(RzGraphNew *graph, const RzGraphNodeNew *node);
+	RzList *(*get_out_neighbours)(RzGraphNew *graph, const RzGraphNodeNew *node);
+	bool (*is_directed_connect)(RzGraphNew *graph, const RzGraphNodeNew *from, const RzGraphNodeNew *to);
+	bool (*del_node)(RzGraphNew *graph, RzGraphNodeNew *node);


I wonder if it is useful to have the same callbacks, but taking the node ids, instead of the node pointers.
This way users can decide they don't want to mess with the pointers at all.

in the design, these pointers should all be internal to the graph API, with external access only through a unified set of functions.

Rot127 · 2024-12-29T16:08:36Z

librz/include/rz_util/rz_graph_new.h

+	unsigned int n_nodes;
+	unsigned int n_edges;


Suggested change

unsigned int n_nodes;

unsigned int n_edges;

ut64 n_nodes;

ut64 n_edges;

Just in case :)

Rot127 · 2024-12-29T16:10:40Z

librz/include/rz_util/rz_graph_new.h

+	unsigned int n_nodes;
+	unsigned int n_edges;
+
+	HtUP *nodes; /* <hash_id, RzGraphNode> */


I think you can add a typedef RzGraphHashId;

Rot127 · 2024-12-29T16:14:24Z

librz/util/graph.c

+
+typedef struct rz_graph_matrix_edge_impl_t {
+	st64 **adj_matrix;
+	unsigned int n_nodes;


Suggested change

unsigned int n_nodes;

ut64 n_nodes;

Rot127 · 2024-12-29T16:15:09Z

librz/util/graph.c

+	if (!graph || !from || !to) {
+		rz_return_val_if_fail(graph && from && to, false);
+		return false;
+	}


Why not a simple rz_return_val_if`?

i guess just auto-complete and choose one of them

Rot127 · 2024-12-29T16:45:02Z

librz/util/graph.c

+		return false;
+	}
+
+	impl->adj_matrix[from->hash_id][to->hash_id] = 1;


Move the magic values like 1 and 0 into macros. EDGE_SET, EDGE_UNSET or similar.
It might be even useful to add some small inline functions to test this property.

E.g.:

static inline bool edge_set(impl, from, to) { return impl->adj_matrix[from->hash_id][to->hash_id] & 1; }

This way we can extend the graph easier with special edges. By using the upper bits.

nice, I would fix this dirty and quick implementation

Rot127 · 2024-12-29T16:49:36Z

librz/util/graph.c

+		return NULL;
+	}
+
+	for (unsigned int i = 0; i < impl->n_nodes; i++) {


Suggested change

for (unsigned int i = 0; i < impl->n_nodes; i++) {

for (ut64 i = 0; i < impl->n_nodes; i++) {

Rot127 · 2024-12-29T16:52:33Z

librz/util/graph.c

+	for (unsigned int i = 0; i < impl->n_nodes; i++) {
+		if (impl->adj_matrix[i][node->hash_id] == 1) {
+			RzGraphNodeNew *new_node = ht_up_find(graph->nodes, i, NULL);
+			if (!new_node) {
+				RZ_LOG_WARN("Failed to clone the node")
+				rz_list_free(neighbours);
+				return NULL;
+			}
+			rz_list_append(neighbours, new_node);
+		}
+	}


Same as above. Let's not clone all the nodes just because the neighbors are requested. The pointers to the neighbors are enough.

It could instead return a RzIterator for example.

This would largely prevent the graph to be accessed by multiple threads. Because the pointers returned by the iterator might live longer then the nodes in the graph.

But we would need to add another implementation for this use case anyways.

Rot127 · 2024-12-29T16:55:11Z

librz/util/graph.c

+ * @param impl The implementation of the graph. IMPLEMENTATION_LIST or IMPLEMENTATION_MATRIX etc.
+ * @return The new graph, NULL if the graph is not created successfully.
+ */
+RZ_API RzGraphNew *rz_graph_new_new(RzGraphIdentifierHash user_hash, RzGraphUserdataFree user_free, RzGraphImpl impl) {


Suggested change

RZ_API RzGraphNew *rz_graph_new_new(RzGraphIdentifierHash user_hash, RzGraphUserdataFree user_free, RzGraphImpl impl) {

RZ_API RZ_OWN RzGraphNew *rz_graph_new_new(RzGraphIdentifierHash user_hash, RzGraphUserdataFree user_free, RzGraphImpl impl) {

Rot127 · 2024-12-29T16:59:09Z

librz/util/graph.c

+ * @return RzList contains all out neighbour nodes (borrowed instance)
+ */
+RZ_API RzList *rz_graph_get_out_nodes_new(RzGraphNew *g, RzGraphNodeNew *node) {
+	return g->edge_set->ops->get_out_neighbours(g, node);


These callbacks might are a performance problem (see: #4769). We would need to test this and maybe replace them with a switch.

Heersin · 2025-01-09T11:51:22Z

Looks already way way better then before! Thanks a lot.

Some questions you might have already considered.

1. Should the graph's API mostly work on Node pointers or the node ids? Having both would be nice. If the node content doesn't matter to the user. But working on pointers is fine for me as well for now.

2. Runtime complexity for getting neighbors. Please let's not clone the nodes every time the neighbor list is requested. This doesn't scale well. Instead a read only `RzIterator` over the lists/matrix is a better fit I think.

3. Could the elements in the List based graph implementation contain edges? It would maybe reduce internal complexity getting the neighbors. Especially for directed graphs.

yes, this is also something I have been contemplating. don't know if it would be confused to provide both way to access. so I defined them mostly work on Node pointer, and they could turn id into pointer through hash_find
exactly, it did bring crazy workload for runtime. I'm looking for a better way to return neighbors, do u know which file provide a best practice sample for returning RzIterator?
no they don't contain explicit edges, initially I implemented it as edge and nodes, but later remove edges to abstract the RzGraph and implement PoC. You r right It's good to have edge in impl for list-based graph, and to reduce the complexity, thanks for point out that

Rot127 · 2025-01-09T13:11:55Z

do u know which file provide a best practice sample for returning RzIterator

You can checkout ht_inc.c. The functions Ht_(as_iter_keys), Ht_(as_iter_mut) and the corresponding (iter_next etc.) return an iterator over a hash map. Your usecase is even simpler I think.

no they don't contain explicit edges, initially I implemented it as edge and nodes, but later remove edges to abstract the RzGraph and implement PoC. You r right It's good to have edge in impl for list-based graph, and to reduce the complexity, thanks for point out that

You make a good point there. Considering code complexity (meaning: many lines of code) vs. low runtime complexity.

If you don't want to add an edge list for now (because too much work/ too many things to consider), it is fine.
We can still extend it.

cc @wargio

wargio · 2025-01-09T13:42:23Z

agreed with @Rot127

I think it is ok to have first an implementation that works, then we can improve the execution time.

Heersin added 2 commits December 23, 2024 00:37

Basic implement of list based and matrix based graph

95d5c28

Add comments

e636b1c

github-actions bot added API RzUtil labels Dec 22, 2024

XVilka requested changes Dec 22, 2024

View reviewed changes

Rot127 reviewed Dec 29, 2024

View reviewed changes

	for (unsigned int i = 0; i < impl->n_nodes; i++) {
	for (ut64 i = 0; i < impl->n_nodes; i++) {

	RZ_API RzGraphNew *rz_graph_new_new(RzGraphIdentifierHash user_hash, RzGraphUserdataFree user_free, RzGraphImpl impl) {
	RZ_API RZ_OWN RzGraphNew *rz_graph_new_new(RzGraphIdentifierHash user_hash, RzGraphUserdataFree user_free, RzGraphImpl impl) {

Uh oh!

Refactor rizin graph in rz_util #4788

Are you sure you want to change the base?

Refactor rizin graph in rz_util #4788

Uh oh!

Conversation

Heersin commented Dec 22, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Rot127 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Heersin commented Jan 9, 2025

Uh oh!

Rot127 commented Jan 9, 2025

Uh oh!

wargio commented Jan 9, 2025

Uh oh!

Uh oh!