Infomap options guide

Infomap has many options. This guide explains what they are for and shows the ones you change most often, grouped by what you want to achieve. For a first end-to-end run, start with the quickstart.

Infomap finds communities by minimizing the map equation, the description length of a random walk on the network (Rosvall and Bergstrom, 2008; Rosvall, Axelsson, and Bergstrom, 2009). Most options below change how that random walk is defined, how hard the search works, or what gets written. Each section names the paper that introduced the option, and the References section at the end collects them.

Every option is the same idea across interfaces. A command-line flag like --two-level is the keyword two_level=True in Python and the matching argument in R. In Python you can set options three ways:

  • keyword arguments: Infomap(two_level=True, num_trials=20)

  • a raw flag string: Infomap(args="--two-level --num-trials 20")

  • a reusable InfomapOptions object passed to Infomap.from_options(...) or Infomap.run_with_options(...)

This guide uses keyword arguments. The last section lists every option with its flag, default, and description, generated from the installed package so it always matches your version.

import infomap
from infomap import Infomap
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

Reproducible runs

Set a seed and a meaningful num_trials so results are reproducible and stable. This guide fixes seed=123 and num_trials=20, and runs silent=True to keep the notebook output short.

print("Infomap version:", infomap.__version__)

SEED = 123
NUM_TRIALS = 20
Infomap version: 2.13.0

Which option should I change?

Start from your goal and reach for the matching option. Each option has its own section below, and the full reference is at the end.

Your goal

Options to reach for

Make results reproducible

seed

Get a better, more stable partition

num_trials (higher), converge

Get fewer, larger modules

markov_time (higher), preferred_number_of_modules, two_level

Get more, finer modules

markov_time (lower)

Control the hierarchy depth

preferred_number_of_levels, two_level

Respect link direction

directed, flow_model

Run faster on large networks

two_level, num_trials (lower), converge, fast_hierarchical_solution, parallel_trials, num_threads

Handle sparse, noisy, or incomplete data

regularized, entropy_corrected

Cluster multilayer or temporal networks

multilayer_relax_rate, multilayer_relax_to_self, multilayer_relax_limit

Bias the partition with node metadata

meta_data, meta_data_rate

Start from, or only score, a partition

cluster_data, no_infomap

Choose what gets written

tree, ftree, clu, output, out_name, no_file_output

For run time and memory planning, see benchmark-performance.

Accuracy: reproducibility and solution quality

Infomap is a stochastic search. A single trial can land in a worse local minimum, so the result depends on the seed. Two options control this.

  • seed fixes the random number generator so a run is reproducible.

  • num_trials runs that many independent searches and keeps the best (lowest codelength) one. More trials give a better and more stable result at a higher run time.

Calatayud et al. (2019) found that stochastic searches produce many near-degenerate solutions, so exploring more of the landscape gives a more reliable partition.

def run_jazz(**options):
    im = Infomap(silent=True, **options)
    im.read_file("data/jazz.net")
    im.run()
    return im


# A single trial varies with the seed: different seeds find different codelengths.
single_trial = pd.DataFrame(
    {"seed": s, "codelength": round(run_jazz(seed=s, num_trials=1).codelength, 4)}
    for s in range(1, 6)
)
single_trial
seed codelength
0 1 6.9099
1 2 6.8647
2 3 6.9101
3 4 6.8656
4 5 6.8650
# Keeping the best of many trials gives a lower, stable codelength.
pd.DataFrame(
    {
        "num_trials": nt,
        "codelength": round(run_jazz(seed=SEED, num_trials=nt).codelength, 4),
    }
    for nt in (1, 5, 20, 50)
)
num_trials codelength
0 1 6.9099
1 5 6.8647
2 20 6.8615
3 50 6.8612

converge treats the trial count as a cap. It runs trials until the best codelength stops improving, so you can ask for many trials without always paying for all of them.

im = run_jazz(seed=SEED, num_trials=50, converge=True)
round(im.codelength, 4)
6.8615

Algorithm: resolution and hierarchy

These options change what partition Infomap looks for, not how hard it searches.

We use two small example networks: networkx’s built-in ring of cliques, and a synthetic nested hierarchy built inline (networkx has no deep-hierarchy generator). The inline helper is notebook-local and is not a public Infomap API.

def hierarchical_graph():
    # Nested structure at three scales: 2 top groups, each with 2 subgroups,
    # each with 2 cliques of 5 nodes. Edge weights shrink with each scale, so the
    # best map is a deep, four-level hierarchy.
    G = nx.Graph()
    nid = 0
    top_groups = []
    for _ in range(2):
        subgroups = []
        for _ in range(2):
            cliques = []
            for _ in range(2):
                members = list(range(nid, nid + 5))
                nid += 5
                for i in range(len(members)):
                    for j in range(i + 1, len(members)):
                        G.add_edge(members[i], members[j], weight=100.0)
                cliques.append(members)
            for a in range(len(cliques)):
                for b in range(a + 1, len(cliques)):
                    G.add_edge(
                        cliques[a][0], cliques[b][0], weight=10.0
                    )  # within subgroup
            subgroups.append(cliques)
        for a in range(len(subgroups)):
            for b in range(a + 1, len(subgroups)):
                G.add_edge(
                    subgroups[a][0][0], subgroups[b][0][0], weight=1.0
                )  # within top group
        top_groups.append(subgroups)
    for a in range(len(top_groups)):
        for b in range(a + 1, len(top_groups)):
            G.add_edge(
                top_groups[a][0][0][0], top_groups[b][0][0][0], weight=0.1
            )  # across top groups
    return G


def run_graph(G, **options):
    im = Infomap(silent=True, seed=SEED, num_trials=NUM_TRIALS, **options)
    im.add_networkx_graph(G)
    im.run()
    return im


G_hier = hierarchical_graph()
G_ring = nx.ring_of_cliques(16, 5)  # networkx built-in: 16 cliques of 5 nodes

markov_time sets the resolution

markov_time scales how costly it is to move between modules. Higher values make module boundaries more expensive to cross, so modules merge and you get fewer, larger ones. Lower values give more, finer modules. It is the main resolution knob. Kheirkhahzadeh et al. (2016) made sweeping Markov time across scales efficient. variable_markov_time instead sets a local time per node, so one run handles sparse and dense regions at once (Edler et al., 2022).

markov_sweep = pd.DataFrame(
    {
        "markov_time": mt,
        "top_modules": run_graph(
            G_ring, two_level=True, markov_time=mt
        ).num_top_modules,
    }
    for mt in (1, 2, 4, 6, 8, 10)
)
markov_sweep
markov_time top_modules
0 1 16
1 2 16
2 4 8
3 6 6
4 8 5
5 10 4
ax = markov_sweep.plot(x="markov_time", y="top_modules", marker="o", legend=False)
ax.set(
    xlabel="Markov time",
    ylabel="number of top modules",
    title="Higher Markov time merges modules",
)
plt.show()
../_images/57320e588fc0f87edc6c6ba7d317f5f7206ec14ca322ead6c53042247a660f90.png

Steering the module and level counts

When you have a target structure in mind, two soft preferences nudge the search without hard-coding the answer.

  • preferred_number_of_modules penalizes solutions whose module count is far from the value you give.

  • preferred_number_of_levels biases the hierarchy depth. Steering to a shallower hierarchy is reliable; deeper is best-effort. Use preferred_number_of_levels_strength to scale how hard it steers.

pd.DataFrame(
    {
        "preferred_number_of_modules": p,
        "top_modules": run_graph(
            G_ring, two_level=True, preferred_number_of_modules=p
        ).num_top_modules,
    }
    for p in (2, 4, 8, 16)
)
preferred_number_of_modules top_modules
0 2 2
1 4 4
2 8 8
3 16 16
rows = []
for levels in (None, 3):
    options = {} if levels is None else {"preferred_number_of_levels": levels}
    im = run_graph(G_hier, **options)
    rows.append(
        {
            "preferred_number_of_levels": levels or "default",
            "levels": im.num_levels,
            "top_modules": im.num_top_modules,
        }
    )
pd.DataFrame(rows)
preferred_number_of_levels levels top_modules
0 default 4 2
1 3 3 4

Regularization for sparse, noisy, or incomplete data

Sparse or under-sampled networks can be over-partitioned: Infomap splits on sampling noise instead of real structure. Two options make the partition more conservative.

  • regularized adds a fully connected Bayesian prior network, which reduces overfitting to missing links and merges weakly supported modules. Scale it with regularization_strength; larger values merge more.

  • entropy_corrected corrects the negative entropy bias in small samples, which matters most for solutions with many modules.

Smiljanić, Edler, and Rosvall (2020) derived both from the Bayesian map equation, which Smiljanić et al. (2021) extended to weighted and directed data.

On the jazz network, regularization steadily merges the least supported modules. When the network has missing links, see Networks with incomplete data for the full treatment.

rows = []
for label, options in [
    ("baseline", {}),
    ("regularized", {"regularized": True}),
    ("regularized, strength 2", {"regularized": True, "regularization_strength": 2.0}),
]:
    im = run_jazz(seed=SEED, num_trials=NUM_TRIALS, **options)
    rows.append(
        {
            "setting": label,
            "top_modules": im.num_top_modules,
            "codelength": round(im.codelength, 4),
        }
    )
pd.DataFrame(rows)
setting top_modules codelength
0 baseline 6 6.8615
1 regularized 5 7.2532
2 regularized, strength 2 4 7.4569

Multilayer and temporal networks

For multilayer or temporal data, a state node can relax to neighboring layers while the walker moves. These options control that coupling.

  • multilayer_relax_rate is the probability of relaxing to other layers instead of staying in the current one.

  • multilayer_relax_limit, ..._up, and ..._down restrict relaxation to nearby layers, which is useful for ordered (temporal) layers.

  • multilayer_relax_to_self couples a state node to its own physical node in the target layer, building a smaller state network with the same flow.

De Domenico et al. (2015) introduced the relax rate in the multiplex map equation. Edler, Bohlin, and Rosvall (2017) cast memory and multilayer networks as one state-node formulation, and multilayer_relax_by_jsd couples layers by flow similarity for intermittent communities (Aslak et al., 2018).

Build multilayer input with add_multilayer_intra_link (within a layer) and add_multilayer_inter_link (between layers), then set the relax options as usual.

ml = Infomap(silent=True, seed=SEED, num_trials=NUM_TRIALS, multilayer_relax_rate=0.25)
intra = [(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4)]  # two triangles per layer
for layer in (1, 2):
    for source, target in intra:
        ml.add_multilayer_intra_link(layer, source, target, 1.0)
ml.run()
ml.num_top_modules
2

The dedicated Multilayer Networks notebook goes further.

Metadata

meta_data reads a node attribute from a clu-format file and encodes it alongside the network, biasing modules toward the metadata groups. meta_data_rate sets how strongly the metadata is weighted. Emmons and Mucha (2019) introduced this tunable attribute weight, and Bassolas et al. (2022) extended it to nonlocal metadata relationships. See Networks with Metadata for the theory.

karate = nx.karate_club_graph()

# Write a clu-format metadata file: node id (1-based) and a group id.
Path("output").mkdir(exist_ok=True)
with open("output/karate-club.meta", "w") as f:
    f.write("# node_id metadata\n")
    for node in karate.nodes():
        group = 0 if karate.nodes[node]["club"] == "Mr. Hi" else 1
        f.write(f"{node + 1} {group}\n")

rows = []
for label, options in [
    ("no metadata", {}),
    ("with metadata", {"meta_data": "output/karate-club.meta"}),
]:
    im = run_graph(karate, **options)
    rows.append(
        {
            "setting": label,
            "top_modules": im.num_top_modules,
            "codelength": round(im.codelength, 4),
        }
    )
pd.DataFrame(rows)
setting top_modules codelength
0 no metadata 3 4.0874
1 with metadata 6 4.6413

Shaping the input

A few options change which network Infomap sees, or reuse an existing partition.

  • cluster_data reads an initial partition (a .clu file) or a hierarchy (a .tree/.ftree file) to start from.

  • no_infomap skips the search entirely. Combined with cluster_data, it scores an existing partition without searching, which lets you compare partitions on the same codelength scale.

  • weight_threshold ignores links below a weight, and node_limit reads only up to a node id. Both trim the input before clustering.

Here we cluster a network, save the partition, then score that same partition without re-running the search.

im = run_jazz(seed=SEED, num_trials=NUM_TRIALS)
im.write_clu("output/jazz.clu")
optimized = round(im.codelength, 4)

scorer = Infomap(silent=True, cluster_data="output/jazz.clu", no_infomap=True)
scorer.read_file("data/jazz.net")
scorer.run()

pd.DataFrame(
    [
        {"run": "optimized", "codelength": optimized},
        {"run": "scored only (no_infomap)", "codelength": round(scorer.codelength, 4)},
    ]
)
run codelength
0 optimized 6.8615
1 scored only (no_infomap) 6.8615

Choosing the output

By default Infomap writes a .tree file. You can pick formats explicitly.

  • tree, ftree, and clu enable individual formats. ftree adds aggregated links between modules and is used by the Network Navigator.

  • output selects several formats at once, for example output=["clu", "tree", "ftree"].

  • clu_level chooses the depth for .clu output, out_name sets the base file name, and no_file_output=True keeps everything in memory.

  • silent and verbosity_level control console output.

In a notebook you read the result from the Infomap object and its to_dataframe(...), and write files only when another tool needs them.

im = run_graph(karate)

# Read results in memory, no files needed.
im.to_dataframe(columns=["node_id", "module_id", "flow"]).head()
node_id module_id flow
0 0 1 0.090909
1 1 1 0.062771
2 2 1 0.071429
3 3 1 0.038961
4 7 1 0.028139
# Or write explicit formats for downstream tools.
im.write_clu("output/karate.clu")
im.write_tree("output/karate.tree")
im.write_flow_tree("output/karate.ftree")
sorted(p.name for p in Path("output").glob("karate.*"))
['karate.clu', 'karate.ftree', 'karate.tree']

Full options reference

This guide covers the options you reach for most often. For the complete list, with every flag, default, and description, use any of these:

from infomap import InfomapOptions
help(InfomapOptions)

From the command line, infomap --help lists the common options, infomap -hh adds the advanced ones, and infomap --print-json-parameters prints the full machine-readable catalog. The API options reference renders the same InfomapOptions fields as a searchable web page.

References

The options in this guide come from the map equation literature. The survey (Smiljanić et al., 2026) ties them together; the papers below introduced each one.

Foundations

Hierarchy and resolution (two_level, preferred_number_of_levels, markov_time, variable_markov_time)

Flow model and teleportation (flow_model, directed, teleportation_probability, recorded_teleportation)

Reliability and regularization (num_trials, converge, regularized, entropy_corrected)

Higher-order, multilayer, metadata, and bipartite (multilayer_relax_*, meta_data, bipartite options)

How to cite

If you use Infomap in published work, mapequation.org recommends citing two things: the map equation paper (Rosvall and Bergstrom, 2008, https://doi.org/10.1073/pnas.0706851105) for the method, and the MapEquation software package (Edler, Holmgren, and Rosvall) for the implementation and version. See How to cite for BibTeX.

Where to go next