Compare Infomap and Leiden in a Scanpy workflow

This tutorial shows how to run Infomap on an AnnData neighbor graph and compare the result with Scanpy’s Leiden workflow. It uses a small synthetic dataset to avoid network downloads and to keep the example reproducible.

infomap.tl.infomap() reads the sparse observation graph from adata.obsp and writes categorical labels to adata.obs, following the conventions of Scanpy's sc.tl functions. Leiden, run through sc.tl.leiden(), is the standard Scanpy baseline used for comparison here.

from importlib.metadata import version
import warnings

import infomap
import pandas as pd

warnings.filterwarnings("ignore", message="IProgress not found.*")
import scanpy as sc
from sklearn.datasets import make_blobs

print("infomap:", infomap.__version__)
print("scanpy:", version("scanpy"))
print("pandas:", pd.__version__)

Create a small AnnData object

The dataset is synthetic and local. Scanpy builds a nearest-neighbor graph in adata.obsp["connectivities"], which is the same graph used by Scanpy clustering tools and the default graph read by infomap.tl.infomap().

X, truth = make_blobs(
    n_samples=120,
    centers=4,
    n_features=12,
    cluster_std=1.8,
    random_state=123,
)
adata = sc.AnnData(X)
adata.obs["truth"] = pd.Categorical([str(label) for label in truth])

sc.pp.neighbors(adata, n_neighbors=12, random_state=123)
sc.tl.umap(adata, random_state=123)

adata
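To make the shape of such a neighbor graph concrete without requiring Scanpy, here is a minimal sketch that builds a plain k-nearest-neighbor graph with scikit-learn on the same synthetic blobs. This is a stand-in, not what sc.pp.neighbors() computes: Scanpy's default connectivities carry UMAP-style weights, so the values will differ. The names knn and sym are illustrative only.

```python
from sklearn.datasets import make_blobs
from sklearn.neighbors import kneighbors_graph

# Same synthetic data as in the tutorial cell above.
X, _ = make_blobs(
    n_samples=120,
    centers=4,
    n_features=12,
    cluster_std=1.8,
    random_state=123,
)

# Directed kNN graph as a sparse matrix: row i has a 1 in the column of
# each of its 12 nearest neighbors (the point itself is excluded).
knn = kneighbors_graph(X, n_neighbors=12, mode="connectivity", include_self=False)

# Symmetrize: keep an edge if either endpoint lists the other as a neighbor.
sym = knn.maximum(knn.T)

print("shape:", knn.shape)
print("directed edges:", knn.nnz)
print("undirected graph is symmetric:", (sym != sym.T).nnz == 0)
```

The result is an n_obs by n_obs sparse matrix, which is the same layout AnnData expects for entries of adata.obsp.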

Run Infomap and Leiden

infomap.tl.infomap() stores labels in adata.obs[key_added] and run metadata in adata.uns[key_added]. Scanpy’s Leiden function is called with the same neighbor graph.

infomap.tl.infomap(
    adata,
    key_added="infomap",
    seed=123,
    num_trials=20,
)

sc.tl.leiden(
    adata,
    key_added="leiden",
    random_state=123,
    flavor="igraph",
    n_iterations=2,
    directed=False,
)

print("Infomap communities:", adata.obs["infomap"].nunique())
print("Leiden communities:", adata.obs["leiden"].nunique())

adata.uns["infomap"]

Compare assignments

The labels are categorical strings because that is the common Scanpy representation for cluster assignments.

comparison = adata.obs[["truth", "infomap", "leiden"]].copy()
comparison.head(10)

from sklearn.metrics import adjusted_mutual_info_score, normalized_mutual_info_score

metrics = pd.DataFrame(
    [
        {
            "method": "infomap",
            "AMI vs truth": adjusted_mutual_info_score(comparison["truth"], comparison["infomap"]),
            "NMI vs truth": normalized_mutual_info_score(comparison["truth"], comparison["infomap"]),
        },
        {
            "method": "leiden",
            "AMI vs truth": adjusted_mutual_info_score(comparison["truth"], comparison["leiden"]),
            "NMI vs truth": normalized_mutual_info_score(comparison["truth"], comparison["leiden"]),
        },
    ]
)
metrics
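For readers who want to see what the NMI column measures, the following is a minimal standard-library sketch of normalized mutual information between two labelings. It assumes normalization by the arithmetic mean of the two entropies, which should correspond to scikit-learn's default average_method="arithmetic". The toy label lists are hypothetical and are not output from the tutorial.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * log((c * n) / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """Mutual information normalized by the mean of the two entropies."""
    denom = (entropy(a) + entropy(b)) / 2
    # Two single-cluster labelings agree trivially; treat that case as 1.0.
    return mutual_information(a, b) / denom if denom > 0 else 1.0


# Hypothetical toy labelings for illustration.
truth = ["0", "0", "1", "1", "2", "2"]
relabeled = ["b", "b", "c", "c", "a", "a"]  # same partition, different names
merged = ["a", "a", "a", "a", "b", "b"]     # two true clusters merged into one

print("relabeled:", nmi(truth, relabeled))
print("merged:", nmi(truth, merged))
```

The score depends only on how items are grouped, not on the label names, which is why Infomap and Leiden labels can be compared to the truth column directly. AMI additionally subtracts the agreement expected by chance, which this sketch does not implement.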

Visualize the clusters

Color the same UMAP layout by the known synthetic labels and the detected communities.

sc.pl.umap(adata, color=["truth", "infomap", "leiden"], wspace=0.35)

Notes on graph choices

By default, Infomap uses adata.obsp["connectivities"]. Pass neighbors_key when using a named Scanpy neighbor graph, or obsp/adjacency when selecting a graph directly. Use directed=True for directed observation graphs and use_weights=False for unweighted treatment of nonzero sparse entries.
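The two graph properties mentioned above can be checked on any sparse matrix before clustering. The sketch below uses SciPy on a hypothetical four-node graph to test whether an adjacency matrix is symmetric (undirected) and to show what treating nonzero entries as unweighted means. It illustrates the concepts only and makes no claim about how Infomap handles these options internally.

```python
import numpy as np
from scipy import sparse

# Hypothetical 4-node weighted graph; entry (i, j) is the weight of edge i -> j.
weighted = sparse.csr_matrix(
    np.array(
        [
            [0.0, 0.9, 0.0, 0.0],
            [0.9, 0.0, 0.4, 0.0],
            [0.0, 0.0, 0.0, 0.7],
            [0.0, 0.0, 0.7, 0.0],
        ]
    )
)

# A graph is undirected when its adjacency matrix equals its transpose.
# Here edge 1 -> 2 has no reverse edge, so the matrix is not symmetric.
is_symmetric = (weighted != weighted.T).nnz == 0

# Unweighted treatment: keep the sparsity pattern, set every stored weight to 1.
unweighted = weighted.copy()
unweighted.data[:] = 1.0

print("symmetric:", is_symmetric)
print("edges:", weighted.nnz)
print("unweighted values:", sorted(set(unweighted.data.tolist())))
```

A non-symmetric matrix like this one is a case where a directed treatment may be appropriate, while the binarized copy keeps the same edges and discards the weights.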

If you use Infomap in published work, cite the Infomap software and the map equation literature. See the repository and Infomap user guide for the current recommended references.