Skip to content

chow_liu

chow_liu ¤

ChowLiuTree(data, input_type, root=None, chunk_size=None, num_categories=None, num_bins=None, as_region_graph=True) ¤

Learns a Chow-Liu Tree and returns it either as a list of predecessors (Bayesian net) or as region graph (HCLT).

See
  • Approximating discrete probability distributions with dependence trees 🔗 CKCN Chow and Cong Liu. In IEEE transactions on Information Theory, 14(3):462–467, 1968b.

  • What is the Relationship between Tensor Factorizations and Circuits (and How Can We Exploit it)? 🔗 Lorenzo Loconte and Antonio Mari and Gennaro Gala and Robert Peharz and Cassio de Campos and Erik Quaeghebeur and Gennaro Vessio and Antonio Vergari In Transactions on Machine Learning Research, 2025.

Parameters:

Name Type Description Default
data Tensor

The input data over which running the CLT algorithm, it must be in tabular form (i.e. a matrix).

required
input_type str | list

The type of the input data, e.g. 'categorical', 'gaussian'. If a list is provided, then each feature is treated differently according to its type.

required
root int | None

The index of the variable desired as root.

None
chunk_size int | None

Chunked computation, useful in case of large input data.

None
num_categories int | None

Specifies the number of categories in case of categorical data.

None
num_bins int | None

In case of categorical input, it is used to rescale categories in bins for ordinal features, e.g. [0, 255] -> [0, 7], which is useful for images.

None
as_region_graph Optional[bool]

True to returns a region graph, False to return a list of predecessors. Defaults to True.

True

Returns:

Type Description
ndarray | RegionGraph

A Chow-Liu Tree, either a list of predecessors or as a region graph.

Raises:

Type Description
ValueError

If the number of categories has not been specified but the number of bins has.

NotImplementedError

If the input type is neither 'categorical' nor 'gaussian'.

Source code in cirkit/templates/region_graph/algorithms/chow_liu.py
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
def ChowLiuTree(
    data: Tensor,
    input_type: str | list[str],
    root: int | None = None,
    chunk_size: int | None = None,
    num_categories: int | None = None,
    num_bins: int | None = None,
    as_region_graph: bool = True,
) -> np.ndarray | RegionGraph:
    """Learns a Chow-Liu Tree and returns it either as a
    list of predecessors (Bayesian net) or as region graph (HCLT).

    See:
        - *Approximating discrete probability distributions with dependence trees* [🔗](https://ieeexplore.ieee.org/abstract/document/1054142)
          CKCN Chow and Cong Liu.
          In IEEE transactions on Information Theory, 14(3):462–467, 1968b.

        - *What is the Relationship between Tensor Factorizations and Circuits (and How Can We Exploit it)?* [🔗](https://openreview.net/forum?id=Y7dRmpGiHj)
          Lorenzo Loconte and Antonio Mari and Gennaro Gala and Robert Peharz and Cassio de Campos and Erik Quaeghebeur and Gennaro Vessio and Antonio Vergari
          In Transactions on Machine Learning Research, 2025.

    Args:
        data (Tensor): The input data over which running the CLT algorithm,
            it must be in tabular form (i.e. a matrix).
        input_type (str | list): The type of the input data, e.g. 'categorical', 'gaussian'.
            If a list is provided, then each feature is treated differently according to its type.
        root (int | None): The index of the variable desired as root.
        chunk_size (int | None): Chunked computation, useful in case of large input data.
        num_categories (int | None): Specifies the number of categories in case of
            categorical data.
        num_bins (int | None): In case of categorical input, it is used to rescale
            categories in bins for ordinal features, e.g. [0, 255] -> [0, 7],
            which is useful for images.
        as_region_graph (Optional[bool]): True to returns a region graph,
            False to return a list of predecessors. Defaults to True.

    Returns:
        A Chow-Liu Tree, either a list of predecessors or as a region graph.

    Raises:
        ValueError: If the number of categories has not been specified but the number of bins has.
        NotImplementedError: If the input type is neither 'categorical' nor 'gaussian'.
    """
    assert data.ndim == 2
    assert root is None or -1 < root < data.size(-1)
    if isinstance(input_type, list):
        mutual_info = _heterogeneous_mutual_info(
            data, is_categorical_mask=[name == "categorical" for name in input_type]
        )
    elif input_type == "categorical":
        if num_bins is not None:
            if num_categories is None:
                raise ValueError("Number of categories must be known if rescaling in bins")
            data = torch.div(data, num_categories // num_bins, rounding_mode="floor")
        mutual_info = _categorical_mutual_info(
            data.long(), num_categories=num_categories, chunk_size=chunk_size
        )
    elif input_type == "gaussian":
        # todo: implement chunked computation
        mutual_info = -0.5 * torch.log(1 - torch.corrcoef(data.t()) ** 2)
    else:
        raise NotImplementedError(f"MI computation not implemented for {input_type} input units")

    _, tree = _maximum_spanning_tree(adj_matrix=mutual_info, root=root)
    if as_region_graph:
        return tree2rg(tree)
    return tree