Algorithms Reference
====================

This page provides detailed documentation for all algorithms available in CausalBoundingEngine.

.. note::
   All algorithms are implementations of published research. For complete citations, references, and proper attribution, see the :doc:`references` page. Please cite the original papers when using these algorithms in your research.

Algorithm Overview
------------------

.. list-table:: Algorithm Comparison
   :header-rows: 1
   :widths: 20 15 15 15 15 20

   * - Algorithm
     - ATE
     - PNS
     - Scenarios
     - Dependencies
     - Notes
   * - Manski
     - ✓
     - ✗
     - BinaryConf
     - Core
     - Most conservative
   * - TianPearl
     - ✗
     - ✓
     - BinaryConf
     - Core
     - PNS only
   * - Autobound
     - ✓
     - ✓
     - BinaryConf, BinaryIV
     - Core
     - Optimization-based
   * - EntropyBounds
     - ✓
     - ✓
     - BinaryConf
     - Core
     - Requires theta parameter
   * - Causaloptim
     - ✓
     - ✓
     - BinaryConf, BinaryIV
     - R
     - Symbolic derivation
   * - Zaffalonbounds
     - ✓
     - ✓
     - BinaryConf, BinaryIV
     - Java
     - Credal networks
   * - ZhangBareinboim
     - ✓
     - ✗
     - ContIV
     - Core
     - Continuous IV

Core Algorithms
---------------

Manski Bounds
~~~~~~~~~~~~~

**Reference**: Manski, C. F. (1990). Nonparametric Bounds on Treatment Effects. *The American Economic Review*, 80(2), 319-323. See :doc:`references` for complete citation.

**Description**: Provides the most conservative bounds on the ATE, using no assumptions beyond the observed data. The bounds are derived by considering the worst-case values of the unobserved potential outcomes.

**Mathematical Foundation**:

For binary outcomes, the ATE bounds are:

.. math::

   ATE_{lower} = P(Y=1 \mid X=1) \cdot P(X=1) - P(Y=1 \mid X=0) \cdot P(X=0) - P(X=1)

   ATE_{upper} = P(Y=1 \mid X=1) \cdot P(X=1) + P(X=0) - P(Y=1 \mid X=0) \cdot P(X=0)

**Usage**:

.. code-block:: python

   from causalboundingengine.scenarios import BinaryConf
   import numpy as np

   X = np.array([0, 1, 1, 0, 1])
   Y = np.array([1, 0, 1, 0, 1])
   scenario = BinaryConf(X, Y)

   bounds = scenario.ATE.manski()
   print(f"Manski bounds: {bounds}")

**Properties**:

- No additional assumptions required
- Always provides valid bounds
- Most conservative (widest intervals)
- Fast computation
- Only available for ATE

**When to use**:

- As a baseline for comparison
- When no additional assumptions can be justified
- For quick exploratory analysis

Tian-Pearl Bounds
~~~~~~~~~~~~~~~~~

**Reference**: Tian, J., & Pearl, J. (2000). Probabilities of Causation: Bounds and Identification. *Annals of Mathematics and Artificial Intelligence*, 28(1-4), 287-313. See :doc:`references` for complete citation.

**Description**: Nonparametric bounds that use the joint distribution of treatment and outcome to bound the PNS (Probability of Necessity and Sufficiency).

**Mathematical Foundation**:

For PNS:

.. math::

   PNS_{lower} = 0

   PNS_{upper} = P(Y=1, X=1) + P(Y=0, X=0)

**Usage**:

.. code-block:: python

   from causalboundingengine.scenarios import BinaryConf
   import numpy as np

   X = np.array([0, 1, 1, 0, 1])
   Y = np.array([1, 0, 1, 0, 1])
   scenario = BinaryConf(X, Y)

   pns_bounds = scenario.PNS.tianpearl()
   print(f"Tian-Pearl PNS: {pns_bounds}")

**Properties**:

- Only available for PNS
- Fast computation
- No additional parameters
- Nonparametric approach

**When to use**:

- For PNS estimation
- When you need nonparametric PNS bounds

Autobound
~~~~~~~~~

**Reference**: Duarte, G., Finkelstein, N., Knox, D., Mummolo, J., & Shpitser, I. (2023). An Automated Approach to Causal Inference in Discrete Settings. *Journal of the American Statistical Association*, 1-12. See :doc:`references` for complete citation.

**Description**: A general-purpose algorithm that formulates causal bounding as a linear programming problem. It can handle complex causal graphs as well as both confounded and IV settings.
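To make the linear-programming view concrete, the sketch below works through the smallest case: binary treatment and outcome with unobserved confounding. It is an illustration only, not Autobound's implementation. The function name ``lp_ate_bounds`` and the example probabilities are hypothetical, and because the feasible region of this particular program is a product of simplices, plain vertex enumeration stands in for an LP solver.

```python
from itertools import product

# Response types of Y written as (Y(0), Y(1)):
# never-responder, complier, defier, always-responder.
RESPONSE_TYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def lp_ate_bounds(p):
    """Bound the ATE from joint probabilities p[(x, y)] = P(X=x, Y=y).

    Hypothetical helper for illustration. The decision variables are
    q[x, t] = P(X=x, response type t). Each observed cell (x, y) fixes the
    total mass of the types t with t[x] == y. A linear objective over this
    feasible set is extremised at a vertex, where the whole mass of each
    cell sits on a single compatible type.
    """
    cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
    # For each observed cell, list the response types consistent with it.
    compatible = [[t for t in RESPONSE_TYPES if t[x] == y] for x, y in cells]
    values = []
    for choice in product(*compatible):
        # ATE = sum over cells of mass * (Y(1) - Y(0)) for the chosen type.
        values.append(sum(p[c] * (t[1] - t[0]) for c, t in zip(cells, choice)))
    return min(values), max(values)


# Made-up joint distribution P(X=x, Y=y) used only for this example.
p = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
lower, upper = lp_ate_bounds(p)
print(f"ATE bounds: [{lower:.2f}, {upper:.2f}]")
```

With no constraints beyond the observed distribution, this small program gives an interval of width 1, which is what the Manski formulas give for the same probabilities. Richer graphs add more decision variables and constraints, and for those a general LP solver is needed.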
**Mathematical Foundation**:

Autobound represents the causal problem using:

- Decision variables for each potential outcome type
- Constraints matching observed distributions
- Linear programming optimization

**Usage**:

.. code-block:: python

   from causalboundingengine.scenarios import BinaryConf, BinaryIV
   import numpy as np

   # Confounded setting
   X = np.array([0, 1, 1, 0, 1])
   Y = np.array([1, 0, 1, 0, 1])
   scenario = BinaryConf(X, Y)
   bounds = scenario.ATE.autobound()

   # IV setting
   Z = np.array([0, 1, 1, 0, 1])
   scenario_iv = BinaryIV(X, Y, Z)
   bounds_iv = scenario_iv.ATE.autobound()

**Properties**:

- Works with both confounded and IV settings
- Available for both ATE and PNS
- Principled optimization approach
- Moderate computation time

**When to use**:

- When you need a general-purpose algorithm
- For IV settings where other algorithms aren't available
- When you want theoretically grounded bounds

EntropyBounds
~~~~~~~~~~~~~

**Reference**: Jiang, Z., Wei, L., & Kocaoglu, M. (2023). Approximate Causal Effect Identification under Weak Confounding. *Proceedings of the 40th International Conference on Machine Learning*, PMLR 202:15125-15143. See :doc:`references` for complete citation.

**Description**: Uses information-theoretic constraints to bound causal effects under the assumption of "weak confounding", meaning that the entropy of the unobserved confounder is at most some threshold θ.

**Mathematical Foundation**:

The algorithm assumes an upper bound on the entropy of the unobserved confounder :math:`U`, which in turn limits the mutual information that :math:`U` can share with the treatment and outcome:

.. math::

   H(U) \leq \theta

where θ is a user-specified parameter controlling the strength of confounding (i.e. its entropy).

**Usage**:

.. code-block:: python

   from causalboundingengine.scenarios import BinaryConf
   import numpy as np

   X = np.array([0, 1, 1, 0, 1])
   Y = np.array([1, 0, 1, 0, 1])
   scenario = BinaryConf(X, Y)

   # Different theta values give different bounds
   strict_bounds = scenario.ATE.entropybounds(theta=0.1)  # Strong assumption
   loose_bounds = scenario.ATE.entropybounds(theta=0.9)   # Weak assumption

   print(f"Strict bounds (θ=0.1): {strict_bounds}")
   print(f"Loose bounds (θ=0.9): {loose_bounds}")

**Parameters**:

- **theta** (float): Information constraint level in [0, 1]. Lower values give tighter bounds but require stronger assumptions.

**Properties**:

- Requires theta parameter (no default)
- Available for both ATE and PNS
- Uses convex optimization
- Sensitive to theta choice

**When to use**:

- When you can justify weak confounding assumptions
- For sensitivity analysis across different theta values
- When domain knowledge suggests limited confounding

External Engine Algorithms
--------------------------

Causaloptim
~~~~~~~~~~~

**Dependencies**: R, rpy2, causaloptim R package

**Reference**: Sachs, M. C., Sjölander, A., & Gabriel, E. E. (2022). A General Method for Deriving Tight Symbolic Bounds on Causal Effects. *Journal of Computational and Graphical Statistics*, 31(2), 496-510. See :doc:`references` for complete citation.

**Description**: Uses symbolic computation to derive analytic bounds on causal effects. Integrates with the R package ``causaloptim`` for graph specification and optimization.

**Usage**:

.. code-block:: python

   from causalboundingengine.scenarios import BinaryConf, BinaryIV
   import numpy as np

   # Confounded setting
   X = np.array([0, 1, 1, 0, 1])
   Y = np.array([1, 0, 1, 0, 1])
   scenario = BinaryConf(X, Y)

   try:
       bounds = scenario.ATE.causaloptim()
       print(f"Causaloptim bounds: {bounds}")
   except ImportError:
       print("R support not available")

   # IV setting
   Z = np.array([0, 1, 1, 0, 1])
   scenario_iv = BinaryIV(X, Y, Z)
   bounds_iv = scenario_iv.ATE.causaloptim()

**Parameters**:

- **r_path** (str, optional): Custom path to R executable

**Properties**:

- Symbolic derivation of bounds
- Works with both confounded and IV settings
- Available for both ATE and PNS
- Requires R installation

**Installation**:

.. code-block:: bash

   # Install R support
   pip install causalboundingengine[r]

**When to use**:

- When you want symbolically derived bounds
- For complex causal graphs
- When R environment is available

Zaffalonbounds
~~~~~~~~~~~~~~

**Dependencies**: Java, jpype1, CREMA/CREDICI libraries

**Reference**: Zaffalon, M., Antonucci, A., Cabañas, R., Huber, D., & Azzimonti, D. (2022). Bounding Counterfactuals under Selection Bias. *Proceedings of The 11th International Conference on Probabilistic Graphical Models*, 289-300. Uses CREMA and CREDICI libraries. See :doc:`references` for complete citation.

**Description**: Uses credal networks and EM-based learning to compute bounds. Based on the CREMA and CREDICI Java libraries developed at IDSIA.

**Usage**:

.. code-block:: python

   from causalboundingengine.scenarios import BinaryConf, BinaryIV
   import numpy as np

   # Confounded setting
   X = np.array([0, 1, 1, 0, 1])
   Y = np.array([1, 0, 1, 0, 1])
   scenario = BinaryConf(X, Y)

   try:
       bounds = scenario.ATE.zaffalonbounds()
       print(f"Zaffalonbounds: {bounds}")
   except ImportError:
       print("Java support not available")

**Properties**:

- Uses credal network inference
- EM-based parameter learning
- Works with both confounded and IV settings
- Available for both ATE and PNS
- Requires Java installation

**Installation**:

.. code-block:: bash

   # Install Java support
   pip install causalboundingengine[java]

**When to use**:

- When you want Bayesian-style bounds
- For complex probabilistic reasoning
- When Java environment is available

Specialized Algorithms
----------------------

ZhangBareinboim
~~~~~~~~~~~~~~~

**Reference**: Zhang, J., & Bareinboim, E. (2021). Bounding Causal Effects on Continuous Outcome. *Proceedings of the AAAI Conference on Artificial Intelligence*, 35(13), 12207-12215. See :doc:`references` for complete citation.

**Description**: Designed specifically for continuous instrumental variable settings. Uses linear programming to handle compliance types in IV analysis.

**Usage**:

.. code-block:: python

   from causalboundingengine.scenarios import ContIV
   import numpy as np

   # Continuous data (will be discretized internally)
   Z = np.random.normal(0, 1, 100)      # Instrument
   X = Z + np.random.normal(0, 0.5, 100)  # Treatment
   Y = X + np.random.normal(0, 0.5, 100)  # Outcome

   scenario = ContIV(X, Y, Z)
   bounds = scenario.ATE.zhangbareinboim()

**Properties**:

- Specifically for continuous IV settings
- Handles compliance types automatically
- Only available for ATE
- Uses linear programming

**When to use**:

- With continuous instrumental variables
- When compliance patterns are complex
- For rigorous IV analysis

Algorithm Implementation Details
--------------------------------

Error Handling
~~~~~~~~~~~~~~

All algorithms implement consistent error handling:

.. code-block:: python

   import logging

   import numpy as np
   from causalboundingengine.scenarios import BinaryConf

   logging.basicConfig(level=logging.WARNING)

   X = np.array([0, 1, 1, 0, 1])
   Y = np.array([1, 0, 1, 0, 1])
   scenario = BinaryConf(X, Y)

   # Failed algorithms return trivial bounds
   # (autobound is used here as an example; any algorithm behaves the same way)
   ate_bounds = scenario.ATE.autobound()
   pns_bounds = scenario.PNS.autobound()

   # Check for trivial bounds
   if ate_bounds == (-1.0, 1.0):  # ATE trivial bounds
       print("ATE algorithm failed, returned trivial bounds")
   if pns_bounds == (0.0, 1.0):  # PNS trivial bounds
       print("PNS algorithm failed, returned trivial bounds")

Performance Characteristics
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table:: Typical Performance
   :header-rows: 1
   :widths: 30 20 50

   * - Algorithm
     - Speed
     - Notes
   * - Manski
     - Very Fast
     - Simple calculations
   * - TianPearl
     - Very Fast
     - Simple calculations (PNS only)
   * - Autobound
     - Moderate
     - Linear programming
   * - EntropyBounds
     - Moderate
     - Convex optimization
   * - Causaloptim
     - Slow
     - R interface overhead
   * - Zaffalonbounds
     - Very Slow
     - Java interface + EM algorithm
   * - ZhangBareinboim
     - Moderate
     - Linear programming

Memory Usage
~~~~~~~~~~~~

Most algorithms have modest memory requirements, with a few exceptions to keep in mind:

- **Zaffalonbounds**: May need increased JVM heap size for large datasets
- **Autobound**: Linear programming may use significant memory
- **EntropyBounds**: Convex optimization scales with data size

.. code-block:: python

   # For large datasets with Java algorithms
   import jpype

   # JVM options only take effect if they are set before the JVM starts
   if not jpype.isJVMStarted():
       jpype.startJVM("-Xmx4g")  # 4GB heap size

Choosing the Right Algorithm
----------------------------

Decision Tree
~~~~~~~~~~~~~

1. **What type of data do you have?**

   - Binary treatment/outcome → Continue to step 2
   - Continuous variables → Use ZhangBareinboim (if IV available)

2. **Do you have an instrument?**

   - Yes → Use Autobound, Causaloptim, or Zaffalonbounds
   - No → Continue to step 3

3. **What are your computational constraints?**

   - Need fast results → Use Manski (ATE) or TianPearl (PNS)
   - Have more time → Consider Autobound, Causaloptim, or Zaffalonbounds

4. **What (further) assumptions can you make?**

   - Weak confounding → Use EntropyBounds with appropriate theta

5. **What external dependencies do you have?**

   - Core Python only → Use Manski, Autobound, or EntropyBounds (ATE); TianPearl (PNS)
   - R available → Consider Causaloptim
   - Java available → Consider Zaffalonbounds

Robustness Strategy
~~~~~~~~~~~~~~~~~~~

For important analyses, consider using multiple algorithms:

.. code-block:: python

   import numpy as np
   from causalboundingengine.scenarios import BinaryConf, BinaryIV

   def robust_analysis(X, Y, Z=None):
       """Run multiple algorithms for robustness."""
       if Z is None:
           scenario = BinaryConf(X, Y)
           algorithms = ['manski', 'autobound']
       else:
           scenario = BinaryIV(X, Y, Z)
           algorithms = ['autobound', 'causaloptim', 'zaffalonbounds']

       results = {}
       for alg in algorithms:
           try:
               results[alg] = getattr(scenario.ATE, alg)()
           except Exception as e:
               print(f"Failed {alg}: {e}")

       return results

   # Compare results
   X = np.array([0, 1, 1, 0, 1])
   Y = np.array([1, 0, 1, 0, 1])
   bounds_dict = robust_analysis(X, Y)
   for alg, bounds in bounds_dict.items():
       print(f"{alg}: {bounds}")

This approach helps identify:

- Consensus across methods
- Algorithms that may be failing
- Sensitivity to different assumptions
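One possible way to act on these three points is to post-process the collected bounds, as sketched below. The helper ``summarize_bounds`` is hypothetical and not part of CausalBoundingEngine. It assumes each result is a ``(lower, upper)`` pair and that an interval equal to the trivial ATE bounds signals a likely failure. The numbers in ``example`` are made up for illustration and are not outputs of the library.

```python
def summarize_bounds(results, trivial=(-1.0, 1.0), tol=1e-9):
    """Summarize a dict of {algorithm name: (lower, upper)} ATE bounds.

    Hypothetical helper: flags results equal to the trivial interval and
    intersects the remaining intervals to check whether they agree.
    """
    # Results equal to the trivial interval may indicate a failed algorithm.
    suspect = sorted(
        name for name, (lo, hi) in results.items()
        if abs(lo - trivial[0]) <= tol and abs(hi - trivial[1]) <= tol
    )
    informative = {k: v for k, v in results.items() if k not in suspect}
    if not informative:
        return {"suspect": suspect, "overlap": None, "consistent": False}

    # Intersection of intervals: largest lower bound, smallest upper bound.
    lower = max(lo for lo, _ in informative.values())
    upper = min(hi for _, hi in informative.values())
    consistent = lower <= upper + tol
    return {
        "suspect": suspect,
        "overlap": (lower, upper) if consistent else None,
        "consistent": consistent,
    }


# Illustrative values only (not produced by the library).
example = {
    "manski": (-0.3, 0.7),
    "autobound": (-0.3, 0.7),
    "entropybounds": (-0.1, 0.5),
    "causaloptim": (-1.0, 1.0),  # equals the trivial bounds, so it is flagged
}
summary = summarize_bounds(example)
print(summary)
```

An empty overlap suggests that at least one method's assumptions conflict with the others. A non-empty overlap is only a valid bound if the assumptions of every contributing method hold, so it is best read as a diagnostic rather than as a tighter estimate.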