Awesome
Directed Louvain algorithm
The algorithm used in this package is based on the Louvain algorithm developed by V. Blondel, J.-L. Guillaume, R. Lambiotte, E. Lefebvre and was downloaded on the Louvain algorithm webpage ([1]). The algorithm was then adjusted to handle directed graphs and to optimize directed modularity of Arenas et al. ([2]). These modifications were mostly made by Anthony perez and by Nicolas Dugué.
The directed modularity is proved to be more efficient in the case of directed graphs as shown in Direction matters in complex networks: A theoretical and applied study for greedy modularity optimization and Directed Louvain : maximizing modularity in directed networks ([3,4]). For any citation of this work please use the following:
@article{DP22,
author = {Nicolas Dugué and Anthony Perez},
title = {Direction matters in complex networks: A theoretical and applied study for greedy modularity optimization},
journal = {Physica A: Statistical Mechanics and its Applications},
volume = {603},
pages = {127798},
year = {2022},
issn = {0378-4371},
doi = {https://doi.org/10.1016/j.physa.2022.127798},
}
How to use
The algorithm works in two steps, namely computing communities and then displaying hierarchical tree. Note that the algorithme computes a hierarchical community structure, with several levels. More information are given below regarding the meaning and use of such levels.
Computing communities
The graph must be in edgelist format, that is one edge per line as follows (the weight
being optional):
src dest [weight]
Weighted graphs are automatically recognized by the program and there is no need to specify it in the command line.
Moreover, it is mandatory that vertices of the input graph are numbered from 0
to n-1
.
To ensure a proper computation of the communities, the default computation encompasses a renumbering of the input graph.
The option -n
indicates that the graph is already numbered from 0
to n-1
and hence avoids renumbering.
Important: communities are written using the original label nodes.
The standard command is:
./bin/community -f examples/graph.txt -l -1 -v > graph.tree
Several options are available, among which:
-f
path to the input graph (edgelist or binary format (.bin
))-g
value of the resolution parameter: the algorithm favors larger communities if less than 1 and smaller otherwise-n
to indicate that the input graph is correctly numbered (from0
ton-1
)-r
for reproducibility purposes: the renumbered graph is stored on hard drive.-s
for disabling randomness: the algorithm will consider vertices from0
ton-1
More options and information are provided using ./bin/community
Another possibility is to pass a binary file containing all information regarding the graph.
We strongly recommand to generate the binary file using our program in a first place.
Using any binary file compatible with our format would of course work (see the load
function in the src/graph.cpp
file, more information to come).
Graph representation: CSR format
Graphs are stored under the Compressed Sparse Row (CSR) format.
Several structures are containing the whole graph information:
- two arrays of cumulative degrees (out- and in-degrees): for out-degrees, each index i contains the sum of all out-degrees for nodes from 0 to i.
- two arrays of outcoming and incoming arcs: the d(0) (out- or in-degree of node 0) first values contain neighbors of node 0 and so on.
To find the first neighbor of a given node i one simply needs to consider the difference between cumulative degrees of i and i-1. - two array of outcoming and incoming weights: similar to the previous ones but store weights instead of node identifiers.
Example of CSR format for a directed graph. The displayed arrays contain information regarding out-neighbors and weighted out--degrees only.
Examples
Using examples/graph.txt
one obtains:
./bin/community -f examples/graph.txt -l -1 -v -r > examples/graph.tree
to compute hierarchical community structure (using original label nodes) by first renumbering the graph, and then writing files for reproducibility. The next runs would thus be:
./bin/community -f examples/graph.txt -l -1 > examples/graph.tree
Another useful value for -l
is -2
, which computes only the last level of the hierarchical structure.
Finally, using an already renumbered graph one gets:
./bin/community -f examples/graph_renum.txt -l -1 -v -n > examples/graph.tree
The program can also start with any given partition using -p option
./bin/community -f examples/graph.txt -p examples/graph.part -v
Improvements
To ensure a faster computation (with a loss of quality), one can use the -q option to specify that the program must stop if the increase of modularity is below epsilon for a given iteration or pass:
./bin/community examples/graph.txt -l -1 -q 0.0001 > examples/graph.tree
Display communities information
The following command displays information on the tree structure (number of hierarchical
levels and nodes per level) and is equivalent to using option -l -1
:
./bin/hierarchy graph.tree
The following command displays the belonging of nodes to communities for a given level of the tree:
./bin/hierarchy graph.tree -l 2 > graph_node2comm_level2
Using -l -2
displays the last level of the hierarchy.
References
- [1] Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008, vol. 2008, no 10, p. P10008.
- [2] Alexandre Arenas, Jordi Duch, Alberto Fern'andez, Sergio G'omez. Size reduction of complex networks preserving modularity. New Journal of Physics 9, 176, 2007.
- [3] Nicolas Dugué, Anthony Perez. Direction matters in complex networks: A theoretical and applied study for greedy modularity optimization. Physica A: Statistical Mechanics and its Applications, 603, 2022.
- [4] Nicolas Dugué, Anthony Perez. Directed Louvain: maximizing modularity in directed networks. Research Report, Université d'Orléans. 2015.