Coverage
What OpenAlgo supports today
We are honest about what works and what doesn't. This page shows the maturity of each template family we use to generate code from QSAR and molecular property prediction papers. If a domain isn't listed here, we don't support it yet — and we'd rather say so upfront than produce unreliable output.
Template families
Fingerprint / descriptor classification
Status: Stable
Binary endpoint prediction using molecular fingerprints (Morgan, MACCS) and classical descriptors with Random Forest, SVM, and XGBoost classifiers. Covers the most common QSAR classification workflow in published literature.
Limitations:
- Multi-class targets require manual configuration after generation.
- Custom descriptor calculators beyond RDKit built-ins are not yet supported.
Example papers:
- “Predicting hERG channel blockers using Morgan fingerprints and random forest ensembles”
- “MACCS-key SVM models for Ames mutagenicity prediction”
Fingerprint / descriptor regression
Status: Stable
Continuous target prediction (solubility, logP, pIC50) using the same featurization pipeline as classification, with regressors in place of classifiers. Supports standard error metrics and applicability domain estimation.
Limitations:
- Multi-task regression (simultaneous prediction of multiple endpoints) is not yet supported.
- Uncertainty quantification is limited to ensemble variance; conformal prediction is on the roadmap.
Example papers:
- “Aqueous solubility prediction with extended-connectivity fingerprints and gradient-boosted trees”
- “Random forest regression models for lipophilicity using 2D molecular descriptors”
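To make "ensemble variance" concrete, here is a minimal sketch of the general idea, not OpenAlgo's actual implementation. It assumes each ensemble member (for example, each tree or each model trained with a different seed) has already produced one prediction per molecule. The function name `ensemble_uncertainty` and the sample values are hypothetical, and the choice of population variance over sample variance is an assumption.

```python
from statistics import mean, pvariance


def ensemble_uncertainty(member_predictions):
    """Summarize an ensemble's predictions per molecule.

    member_predictions: list of prediction lists, one list per ensemble
    member, all of the same length (one value per molecule).
    Returns a list of (mean, variance) tuples, one per molecule.
    """
    if not member_predictions:
        raise ValueError("need at least one ensemble member")
    n_molecules = len(member_predictions[0])
    if any(len(p) != n_molecules for p in member_predictions):
        raise ValueError("all members must predict the same molecules")

    summary = []
    # zip(*...) regroups the values so each tuple holds every member's
    # prediction for a single molecule.
    for per_molecule in zip(*member_predictions):
        # Population variance is an assumption; sample variance
        # (statistics.variance) is an equally common choice.
        summary.append((mean(per_molecule), pvariance(per_molecule)))
    return summary


# Hypothetical log-solubility predictions from three ensemble members
# for two molecules.
preds = [
    [-2.0, -4.0],
    [-2.0, -3.0],
    [-2.0, -5.0],
]
summary = ensemble_uncertainty(preds)
for mol_index, (mu, var) in enumerate(summary):
    print(f"molecule {mol_index}: mean={mu:.2f} variance={var:.3f}")
```

A variance of zero means the members agree exactly. Larger values flag molecules where the ensemble disagrees, which is a useful but uncalibrated signal, and calibration is the gap that conformal prediction is meant to close.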
Graph neural network classification
Status: Beta
Message-passing neural networks (MPNN) and graph convolutional networks (GCN) for binary classification directly on molecular graphs. Handles atom and bond features from standard featurization.
Limitations:
- Does not yet support edge features in message passing.
- Attention-based pooling variants (GAT, GATv2) are experimental.
- Training hyperparameters are set to sensible defaults; full hyperparameter search scaffolding is planned.
Example papers:
- “Graph convolutional networks for toxicity prediction on ToxCast endpoints”
- “MPNN-based virtual screening for kinase inhibitors”
Graph neural network regression
Status: Beta
Uses the same graph architectures as GNN classification, configured for continuous targets. Supports standard regression losses and evaluation metrics for molecular property prediction.
Limitations:
- Edge features and attention-based pooling share the same limitations as GNN classification.
- Transfer learning from pre-trained graph models is not yet integrated.
- Large-scale datasets (>500k molecules) may require manual batch-size tuning.
Example papers:
- “Predicting aqueous solubility with message-passing neural networks”
- “GCN regression models for binding affinity on PDBbind”
Evaluation / split wrapper
Status: Stable
Reusable scaffolding for dataset splitting and evaluation. Includes scaffold split, temporal split, and stratified k-fold with proper leakage prevention. Designed to wrap any of the above template families.
Limitations:
- Custom split functions must follow a specific callable signature; documentation for this is being expanded.
- Temporal splits require an explicit date column in the source data.
Example papers:
- “Impact of scaffold splitting on predictive performance in molecular property models”
- “Avoiding data leakage in QSAR: a benchmark of splitting strategies”
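As an illustration of how a scaffold split prevents leakage, the sketch below groups molecules by scaffold and assigns whole groups to one side of the split. It is a simplified, assumption-laden example rather than the wrapper's real code or its required callable signature. The function name `scaffold_split`, the `test_fraction` parameter, and the largest-group-first ordering are all hypothetical choices. Scaffold keys are assumed to be precomputed strings, for instance Bemis-Murcko scaffold SMILES from a cheminformatics toolkit.

```python
from collections import defaultdict


def scaffold_split(scaffolds, test_fraction=0.2):
    """Split molecule indices so no scaffold appears in both train and test.

    scaffolds: one scaffold key per molecule (any hashable, e.g. a SMILES
    string). Returns (train_indices, test_indices), both sorted.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")

    # Collect the indices of all molecules that share a scaffold.
    groups = defaultdict(list)
    for index, key in enumerate(scaffolds):
        groups[key].append(index)

    # Largest groups first (ties broken by first index, for determinism),
    # so common scaffolds tend to land in training and rare ones in test.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    train_target = len(scaffolds) - round(len(scaffolds) * test_fraction)
    train, test = [], []
    for group in ordered:
        # A group is never divided: it goes to train if it fits, else to test.
        if len(train) + len(group) <= train_target:
            train.extend(group)
        else:
            test.extend(group)
    return sorted(train), sorted(test)


# Hypothetical scaffold keys for ten molecules.
scaffolds = [
    "c1ccccc1", "c1ccccc1", "c1ccccc1", "c1ccncc1", "c1ccncc1",
    "C1CCCCC1", "c1ccoc1", "c1ccccc1", "C1CCCCC1", "c1ccsc1",
]
train_idx, test_idx = scaffold_split(scaffolds, test_fraction=0.2)

# Leakage check: the two sides must not share any scaffold.
shared = {scaffolds[i] for i in train_idx} & {scaffolds[i] for i in test_idx}
print("train:", train_idx)
print("test:", test_idx)
print("shared scaffolds:", shared)
```

Because groups are never divided, the realized test fraction can drift from the requested one when a few scaffolds dominate the dataset. A production splitter would typically report the achieved sizes.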
What's not supported (yet)
We intentionally exclude the following domains. In each case we either cannot produce reliable output or the workflow requires tooling that sits outside our current architecture.
Have a paper that doesn't fit any of these templates?
Let us know