Large Causal Models on Time Series


  • Foundation for Research and Technology, Hellas (FORTH)
  • Huawei Ireland Research Center - AIOps Team
  • Huawei Dongguan R&D Campus

Large Causal Model

A Large Causal Model (LCM) for causal discovery is a model that unveils the cause-and-effect relationships between variables in complex systems. These models are particularly useful for understanding how changes in one factor lead to changes in another, rather than just identifying correlations. Additionally, they help identify which variables influence others, especially in complex or high-dimensional datasets where traditional methods might miss key connections.

Approach                               Discovers causal connections   Fast inference   Scales to large data quantities
Traditional Causal Discovery Methods   Yes                            No               No
Large Causal Models                    Yes                            Yes              Yes

How does it work?

The LCM takes a dataset of time series as input and automatically predicts the full-time graph, which represents the causal relationships between the time series over a time window. This graph is then condensed into a simpler representation, the summary graph, in which each time series is a node and each discovered cause-effect relationship is a directed link.
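
The condensation step can be illustrated with a minimal sketch. It assumes the lagged graph is stored as a score tensor of shape (D, D, L) and that an edge is kept in the summary graph when any of its lags exceeds a threshold. The function name `summary_graph`, the axis convention, and the threshold rule are assumptions made for illustration, not the project's confirmed implementation.

```python
import numpy as np

def summary_graph(lagged_adj: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Collapse a (D, D, L) lagged score tensor into a (D, D) boolean summary graph.

    Assumed convention (not confirmed by the source): lagged_adj[i, j, k] is the
    score of an edge from series j to series i at the k-th lag.
    """
    if lagged_adj.ndim != 3 or lagged_adj.shape[0] != lagged_adj.shape[1]:
        raise ValueError("expected a tensor of shape (D, D, L)")
    # An edge is kept in the summary graph if its score at any lag exceeds the threshold.
    return lagged_adj.max(axis=2) > threshold

# Toy example: 3 series, 2 lags.
scores = np.zeros((3, 3, 2))
scores[1, 0, 0] = 0.9  # series 0 -> series 1 at the first lag
scores[2, 1, 1] = 0.7  # series 1 -> series 2 at the second lag
scores[0, 2, 0] = 0.2  # below the threshold, so it is dropped
summary = summary_graph(scores)
```

Taking the maximum over the lag axis is only one possible reduction; it discards the lag at which the link was found, which is why the summary graph is simpler than the full-time graph.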

Prediction Pipeline

Try it yourself!

0. Setup

The setup phase is divided into three steps:

Currently, two pretrained LCM models of different sizes are available:

1. Import the Required Libraries

First, import the modules needed for data generation, model prediction, and result visualization:


        from pathlib import Path
        from utils.model_wrapper import Architecture_PL
        from utils.cp_utils import set_seed, create_example_data, run_cp_and_parse_res
        from utils.plotting_utils import plot_summary_from_pred
        

2. Load the Data

The LCM takes as input a temporal dataset of shape (N, D), where N is the sample size and D is the feature size (the number of time series). In this example, we generate synthetic data with 1000 time samples, where each column represents a different time series. The data are min-max normalized, and the random seed is set to 42 for reproducibility.


        set_seed(42)
        
        df = create_example_data(n=1000)
        variable_names = list(df.columns)
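
When working with your own data rather than the synthetic example, a column-wise min-max scaling could look like the sketch below. The helper `min_max_normalize` is hypothetical and not part of the project's utilities; the text above does not say whether the library normalizes inputs internally, so treat this as an illustration of the transformation only.

```python
import numpy as np

def min_max_normalize(data: np.ndarray) -> np.ndarray:
    """Scale each column of an (N, D) array to the range [0, 1] independently."""
    col_min = data.min(axis=0)
    col_range = data.max(axis=0) - col_min
    # Constant columns have zero range; map them to 0 instead of dividing by zero.
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return (data - col_min) / safe_range

rng = np.random.default_rng(42)
raw = rng.normal(size=(1000, 4))  # N = 1000 samples, D = 4 time series
normalized = min_max_normalize(raw)
```

Each time series is scaled on its own, so series with very different magnitudes end up on a comparable scale.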
        

3. Load the Pretrained Model

Load the pretrained model checkpoint (.ckpt) for causal prediction:


        models_path = 'res'
        model_name = 'lcm_CI_RH_12_3_merged_290k'
        
        model = Architecture_PL.load_from_checkpoint(Path(models_path) / f"{model_name}.ckpt")
        

4. Perform Causal Discovery

Call run_cp_and_parse_res to perform causal discovery on the data. The max_lag parameter sets the maximum lag, i.e. the size of the time window over which causal relationships are analyzed:


        # Run causal discovery with a maximum lag of 2
        pred = run_cp_and_parse_res(model_name, model=model, df=df, max_lag=2, seed=42)
        

The result is a lagged adjacency tensor of shape (D, D, max_lag), with D the number of time series, where:

5. Visualize the Results

The predicted causal relationships can be visualized using plot_summary_from_pred. The plt_thr parameter controls the density of the graph: higher values result in fewer edges being displayed.


        plot_summary_from_pred(pred, variable_names, plt_thr=0.5)
        

In the resulting graph, an edge from time series A to time series B labeled t-1 means that A at time t-1 causes B at time t.

Output plot of the summary graph.
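
To read the same information without a plot, the lagged edges can be listed directly from a score tensor. The sketch below is illustrative only: `list_lagged_edges` is a hypothetical helper, and it assumes that the tensor is indexed as [target, source, lag index] and that lag index k corresponds to time t-(k+1). Neither convention is confirmed by the text above, so check them against the actual output before relying on this.

```python
import numpy as np

def list_lagged_edges(pred, names, thr=0.5):
    """Return (source, target, lag label, score) for every score above thr.

    Assumed layout (hypothetical): pred[target, source, k] scores the edge
    from `source` at time t-(k+1) to `target` at time t.
    """
    edges = []
    n_vars, _, n_lags = pred.shape
    for target in range(n_vars):
        for source in range(n_vars):
            for k in range(n_lags):
                score = float(pred[target, source, k])
                if score > thr:
                    edges.append((names[source], names[target], f"t-{k + 1}", score))
    return edges

# Toy tensor: two series, two lags, one strong edge A(t-1) -> B(t).
pred_demo = np.zeros((2, 2, 2))
pred_demo[1, 0, 0] = 0.8
edges = list_lagged_edges(pred_demo, ["A", "B"], thr=0.5)
```

As with plt_thr in the plotting function, raising `thr` keeps fewer edges.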

Paper

Coming soon

Assumptions and Limitations of the Current Model

The latest version of the LCMs works under the following assumptions: