To view this page ensure that Adobe Flash Player version 11.1.0 or greater is installed.
In the Flash applet built with
flare above and described in
Community detection and visualization of networks with the map
equation framework, we explore the duality between finding community structure in
networks and minimizing the description length of a random walker's
movements on a network. For a given network partition
M
, the
map equation
specifies the theoretical limit L(M)
of how concisely
we can describe the trajectory of this random walk.
The underlying code structure of the map equation is designed such that the description can be compressed if the network has regions in which the random walker tends to stay for a long time. Therefore, with a random walk as a proxy for real flow, minimizing the map equation over all possible network partitions reveals important aspects of network structure with respect to the dynamics on the network.
To take advantage of the regional structure of the network, one
index codebook and m
module codebooks, one for each
module in the network, are used to describe the random walker's
movements. The module codebooks have codewords for nodes within each
module (and exit codes to leave the module), which are derived from
the node visit/exit frequencies of the random walker. The index
codebook has codewords for the modules, which are derived from the
module switch rates of the random walker. Therefore, the average
length of the code describing a step of the random walker is the
average length of codewords from the index codebook and the module
codebooks weighted by their rates of use (mouseover the map equation
for more information):
M
. For module partition M
of
n
nodes into m
modules, the lower
bound of the average length of the code describing a step of the
random walker.
m
modules. The total height of the blocks under
"Index codebook" in the visualization above corresponds to this
rate.m
module codebooks. This
corresponds to the total height of the blocks under "Module
codebooks" in the visualization above. For module
i
, this is given by the fraction of time the random
walker spends in module i
, which is given by the
total probability that any node in the module is visited, plus
the probability that it exits the module and the exit message is
used.i
. The entropy of the relative
rates at which the random walker exits module i
and
visits each node in module i
measures the smallest
average codeword length that is theoretically possible. The
heights of individual blocks under "Module codebooks" in the
visualization above correspond to the relative rates and the
codeword lengths approximately correspond to the negative
logarithm of the rates in base 2.
In the Flash applet, release a random walker and see how, with multiple Huffman codebooks, we can represent its trajectory on the network with a compressed binary message. Change the assignment of nodes into modules by changing their node colors and see how the description length changes. The code structure corresponding to the current partition is shown by the stacked boxes on the right. With multiple module codebooks, each of which re-uses a similar set of codewords, we must specify which module codebook should be used. This is implemented by using the index codebook shown to the left and exit codes in each module (marked by arrows). The coding procedure is as follows. The index codebook specifies a module codebook, and the module codebook specifies a succession of nodes within that module. When the random walker leaves the module, and we need to return to the index codebook, we use the exit code. The codeword lengths in the index codebook are derived from the relative rates at which a random walker enters each module. The codeword lengths for each module codebook are derived from the relative rates at which a random walker visits each node in the module or exits the module. Mouseover objects in the visualization for more information.