Run Infomap on GraphRAG-style tables

GraphRAG pipelines build a community hierarchy over an entity graph, usually with Leiden. Infomap is a drop-in alternative that finds communities by compressing a description of random-walk flow on the graph rather than by maximizing modularity. This notebook reads the GraphRAG entities.parquet and relationships.parquet tables, runs Infomap, writes GraphRAG-style community outputs, and compares the result with Leiden.

For a first Infomap run, see the quickstart; for the options used here, see the options guide.

from pathlib import Path
import tempfile

import pandas as pd

from infomap.graphrag import read_graphrag, run_graphrag_communities

try:
    import igraph as ig
except ImportError:
    ig = None

Build a small entity graph

GraphRAG stores entities and their relationships as two Parquet tables. Here you build a tiny example: two triangles, Alpha-Beta-Gamma and Delta-Epsilon-Zeta, joined by one weak link (Gamma-Delta, weight 1.0), so the community structure is easy to check.

work_dir = Path(tempfile.mkdtemp(prefix="infomap-graphrag-"))
input_dir = work_dir / "input"
output_dir = work_dir / "infomap"
input_dir.mkdir()

entities = pd.DataFrame(
    {
        "id": ["a", "b", "c", "d", "e", "f"],
        "title": ["Alpha", "Beta", "Gamma", "Delta", "Epsilon", "Zeta"],
    }
)
relationships = pd.DataFrame(
    {
        "id": ["ab", "bc", "ca", "de", "ef", "fd", "cd"],
        "source": ["Alpha", "Beta", "Gamma", "Delta", "Epsilon", "Zeta", "Gamma"],
        "target": ["Beta", "Gamma", "Alpha", "Epsilon", "Zeta", "Delta", "Delta"],
        "weight": [2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 1.0],
    }
)

entities.to_parquet(input_dir / "entities.parquet")
relationships.to_parquet(input_dir / "relationships.parquet")

Read the tables into Infomap

read_graphrag maps entity titles to node ids and loads the weighted edges. The table below shows each relationship with the node ids Infomap will use.

graph = read_graphrag(
    input_dir / "entities.parquet", input_dir / "relationships.parquet"
)

relationships.assign(source_node=graph.sources, target_node=graph.targets)
   id   source   target  weight  source_node  target_node
0  ab    Alpha     Beta     2.0            1            2
1  bc     Beta    Gamma     2.0            2            3
2  ca    Gamma    Alpha     2.0            3            1
3  de    Delta  Epsilon     3.0            4            5
4  ef  Epsilon     Zeta     3.0            5            6
5  fd     Zeta    Delta     3.0            6            4
6  cd    Gamma    Delta     1.0            3            4
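The mapping in the table can be reproduced by hand. The sketch below is not the library's implementation: it assumes node ids are 1-based and follow the row order of the entity table, which is consistent with the output above. The helper name map_titles_to_node_ids is hypothetical.

```python
# Hypothetical helper illustrating the title -> node id mapping seen above.
# Assumption: node ids are 1-based and follow entity-table row order.
def map_titles_to_node_ids(titles, sources, targets):
    node_id = {title: index for index, title in enumerate(titles, start=1)}
    # Fail early if a relationship names an entity that is not in the table.
    missing = sorted({t for t in (*sources, *targets) if t not in node_id})
    if missing:
        raise KeyError(f"relationships reference unknown entities: {missing}")
    return [node_id[s] for s in sources], [node_id[t] for t in targets]


titles = ["Alpha", "Beta", "Gamma", "Delta", "Epsilon", "Zeta"]
sources = ["Alpha", "Beta", "Gamma", "Delta", "Epsilon", "Zeta", "Gamma"]
targets = ["Beta", "Gamma", "Alpha", "Epsilon", "Zeta", "Delta", "Delta"]

source_nodes, target_nodes = map_titles_to_node_ids(titles, sources, targets)
print(source_nodes)  # [1, 2, 3, 4, 5, 6, 3]
print(target_nodes)  # [2, 3, 1, 5, 6, 4, 4]
```

Raising on unknown titles is a design choice for this sketch; it surfaces relationships that point at entities missing from the entity table instead of silently dropping them.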

Run Infomap

run_graphrag_communities clusters the graph and writes GraphRAG-style outputs to output_dir. As with any Infomap run, set seed to make the result reproducible and num_trials to keep the best of several optimization attempts.

result = run_graphrag_communities(
    input_dir=input_dir,
    output_dir=output_dir,
    silent=True,
    seed=123,
    num_trials=5,
)

result.infomap.codelength
1.9831517459081185
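The codelength is the value of the map equation, in bits per random-walk step, for the partition Infomap found. As a sanity check, the sketch below evaluates the two-level map equation by hand for the two-triangle partition, assuming the links are treated as undirected so that each node's flow is its weighted degree divided by twice the total link weight. The function names are illustrative and not part of the Infomap API.

```python
from collections import defaultdict
from math import log2

# Same toy graph as above, keyed by entity id.
edges = [
    ("a", "b", 2.0), ("b", "c", 2.0), ("c", "a", 2.0),
    ("d", "e", 3.0), ("e", "f", 3.0), ("f", "d", 3.0),
    ("c", "d", 1.0),
]
two_triangles = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_module = {node: 1 for node in two_triangles}


def plogp(x):
    """x * log2(x), with the usual convention that 0 * log2(0) = 0."""
    return x * log2(x) if x > 0 else 0.0


def two_level_codelength(edges, modules):
    # Assumption: undirected links, so flow per direction is w / (2 * total weight).
    total = 2.0 * sum(weight for _, _, weight in edges)
    node_flow = defaultdict(float)
    exit_flow = defaultdict(float)
    for u, v, weight in edges:
        node_flow[u] += weight / total
        node_flow[v] += weight / total
        if modules[u] != modules[v]:
            exit_flow[modules[u]] += weight / total
            exit_flow[modules[v]] += weight / total
    module_flow = defaultdict(float)
    for node, flow in node_flow.items():
        module_flow[modules[node]] += flow
    total_exit = sum(exit_flow.values())
    return (
        plogp(total_exit)
        - 2.0 * sum(plogp(q) for q in exit_flow.values())
        - sum(plogp(p) for p in node_flow.values())
        + sum(plogp(exit_flow[m] + module_flow[m]) for m in module_flow)
    )


codelength = two_level_codelength(edges, two_triangles)
baseline = two_level_codelength(edges, one_module)
print(round(codelength, 6))  # 1.983152
print(round(baseline, 6))    # 2.553729
```

The hand-computed value agrees with result.infomap.codelength, and it is lower than the one-module baseline, which is why Infomap prefers the two-triangle split.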

Inspect the communities

The result exposes the per-node assignments as result.nodes and a GraphRAG-style community table as result.communities. The community table has one row per community, with its level in the hierarchy and its size.

infomap_nodes = result.nodes
infomap_nodes
   node_id  entity_id  entity_title  module_id  module_path  level     flow
0        1          a         Alpha          1          [1]      1  0.12500
1        2          b          Beta          1          [1]      1  0.12500
2        3          c         Gamma          1          [1]      1  0.15625
3        4          d         Delta          2          [2]      1  0.21875
4        5          e       Epsilon          2          [2]      1  0.18750
5        6          f          Zeta          2          [2]      1  0.18750

infomap_communities = result.communities
infomap_communities
          id  human_readable_id  community  parent  children  level                title  entity_ids  relationship_ids  text_unit_ids  period  size
0  infomap-1                  1          1      -1        []      0  Infomap community 1   [a, b, c]      [ab, bc, ca]             []    None     3
1  infomap-2                  2          2      -1        []      0  Infomap community 2   [d, e, f]      [de, ef, fd]             []    None     3
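The flow column in the node table can be checked directly. For an undirected weighted graph, the stationary visit rate of a random walker at a node is the node's weighted degree divided by the sum of all weighted degrees. The sketch below recomputes it from the relationship weights, assuming Infomap treated the links as undirected, and the values match the table.

```python
from collections import defaultdict

# Relationships from the example, as (source title, target title, weight).
edges = [
    ("Alpha", "Beta", 2.0), ("Beta", "Gamma", 2.0), ("Gamma", "Alpha", 2.0),
    ("Delta", "Epsilon", 3.0), ("Epsilon", "Zeta", 3.0), ("Zeta", "Delta", 3.0),
    ("Gamma", "Delta", 1.0),
]
module_of = {
    "Alpha": 1, "Beta": 1, "Gamma": 1,
    "Delta": 2, "Epsilon": 2, "Zeta": 2,
}

# Weighted degree (strength) of every node.
strength = defaultdict(float)
for source, target, weight in edges:
    strength[source] += weight
    strength[target] += weight

# Assumption: undirected links, so flow is strength normalized to sum to 1.
total_strength = sum(strength.values())
flow = {title: strength[title] / total_strength for title in module_of}

# Flow per community is the sum of its members' flow.
module_flow = defaultdict(float)
for title, value in flow.items():
    module_flow[module_of[title]] += value

for title, value in flow.items():
    print(f"{title:8s} {value:.5f}")
print(dict(module_flow))  # {1: 0.40625, 2: 0.59375}
```

Delta carries the most flow because it has the heaviest links plus the bridge to Gamma, and the second triangle as a whole holds more flow than the first.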

Compare with Leiden

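Leiden is GraphRAG’s default community detector. On this graph both methods recover the two triangles. Infomap reports its result as a flow-based hierarchy, which has a single level here, while the igraph Leiden call below returns a flat modularity partition. In the comparison table, the Infomap groups list entity ids and the Leiden groups list entity titles, so a and Alpha refer to the same entity.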

def _top_level_groups(communities):
    return [
        sorted(entity_ids)
        for entity_ids in communities.loc[communities["level"] == 0, "entity_ids"]
    ]


comparison_rows = [
    {
        "method": "Infomap",
        "number of communities": len(infomap_communities),
        "levels": int(infomap_communities["level"].nunique()),
        "largest community size": int(infomap_communities["size"].max()),
        "entity groups at top level": _top_level_groups(infomap_communities),
    }
]

if ig is None:
    print("Install python-igraph to run the Leiden comparison.")
else:
    leiden_graph = ig.Graph.TupleList(
        relationships[["source", "target", "weight"]].itertuples(
            index=False, name=None
        ),
        directed=False,
        edge_attrs=["weight"],
    )
    leiden_partition = leiden_graph.community_leiden(
        weights="weight",
        objective_function="modularity",
    )
    leiden_groups = [
        sorted(leiden_graph.vs[vertex]["name"] for vertex in community)
        for community in leiden_partition
    ]
    comparison_rows.append(
        {
            "method": "Leiden",
            "number of communities": len(leiden_partition),
            "levels": 1,
            "largest community size": max(len(group) for group in leiden_groups),
            "entity groups at top level": leiden_groups,
        }
    )

pd.DataFrame(comparison_rows)
    method  number of communities  levels  largest community size                      entity groups at top level
0  Infomap                      2       1                       3                          [[a, b, c], [d, e, f]]
1   Leiden                      2       1                       3  [[Alpha, Beta, Gamma], [Delta, Epsilon, Zeta]]
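Because the two rows label entities differently, it helps to check programmatically that the partitions are identical. The sketch below, which uses only the standard library and hypothetical helper names, maps the Leiden titles back to entity ids, compares the partitions as sets of sets, and computes the weighted modularity of the shared partition so the Leiden objective can be read next to the Infomap codelength.

```python
from collections import defaultdict

title_to_id = {
    "Alpha": "a", "Beta": "b", "Gamma": "c",
    "Delta": "d", "Epsilon": "e", "Zeta": "f",
}
# Groups copied from the comparison table above.
infomap_groups = [["a", "b", "c"], ["d", "e", "f"]]
leiden_groups = [["Alpha", "Beta", "Gamma"], ["Delta", "Epsilon", "Zeta"]]
edges = [
    ("a", "b", 2.0), ("b", "c", 2.0), ("c", "a", 2.0),
    ("d", "e", 3.0), ("e", "f", 3.0), ("f", "d", 3.0),
    ("c", "d", 1.0),
]


def as_partition(groups, rename=None):
    """Order-independent view of a grouping: a set of frozensets."""
    rename = rename or {}
    return {frozenset(rename.get(member, member) for member in group) for group in groups}


def modularity(edges, membership):
    """Weighted Newman modularity of a partition of an undirected graph."""
    total_weight = sum(weight for _, _, weight in edges)
    internal = defaultdict(float)
    degree = defaultdict(float)
    for u, v, weight in edges:
        degree[membership[u]] += weight
        degree[membership[v]] += weight
        if membership[u] == membership[v]:
            internal[membership[u]] += weight
    return sum(
        internal[c] / total_weight - (degree[c] / (2.0 * total_weight)) ** 2
        for c in degree
    )


same_partition = as_partition(infomap_groups) == as_partition(leiden_groups, title_to_id)
membership = {
    member: index for index, group in enumerate(infomap_groups) for member in group
}
q = modularity(edges, membership)

print(same_partition)  # True
print(q)               # 0.419921875
```

Comparing sets of frozensets ignores both the order of communities and the order of members, so the check does not depend on how either library numbers its communities.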

Where to go next