API Reference

maldet.types

Core dataclasses shared across maldet layers.

MetricReport dataclass

Return value of Evaluator.evaluate().

to_json_dict()

Serialize into the wire shape of metrics.json (schema_version=1).

Sample dataclass

A single binary sample.

label is None during predict. metadata is a per-instance mutable dict — frozen=True forbids reassigning the field, not mutating the contained dict.
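The distinction between reassigning a frozen field and mutating the dict it holds can be seen in a minimal sketch. The class name `SampleSketch` and the `path` field are illustrative assumptions; only `label` and `metadata` are named in the description above.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from pathlib import Path
from typing import Any, Optional


@dataclass(frozen=True)
class SampleSketch:
    # Hypothetical stand-in for maldet's Sample; the real field set may differ.
    path: Path
    label: Optional[int] = None  # stays None during predict
    # default_factory gives every instance its own dict
    metadata: dict[str, Any] = field(default_factory=dict)


sample = SampleSketch(path=Path("samples/ab/abcd"))
sample.metadata["source"] = "demo"  # allowed: mutates the contained dict

try:
    sample.metadata = {}  # rejected: reassigns a field on a frozen dataclass
    reassigned = True
except FrozenInstanceError:
    reassigned = False
```

Because `default_factory=dict` is used, two instances never share the same `metadata` object.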

TrainResult dataclass

Return value of Trainer.fit().

model is the trained estimator or LightningModule. best_checkpoint is set by Lightning when a ModelCheckpoint callback ran. extras is a free-form dict that round-trips into events.

maldet.protocols

Runtime-checkable Protocols for maldet's six layers + EventLogger.

Protocols use structural typing — implementations do not need to inherit. @runtime_checkable enables isinstance(obj, Trainer) for pipeline-assembly validation.
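A small sketch of how structural typing and `@runtime_checkable` interact. `TrainerSketch` and its `fit` signature are assumptions for illustration; maldet's actual `Trainer` protocol may declare different methods.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class TrainerSketch(Protocol):
    # Hypothetical signature, not maldet's real Trainer protocol.
    def fit(self, features: Any, labels: Any) -> Any: ...


class ConstantTrainer:
    """Does not inherit from TrainerSketch, but provides a matching fit()."""

    def fit(self, features: Any, labels: Any) -> Any:
        return "trained"


class NotATrainer:
    def train(self) -> None:
        return None


ok = isinstance(ConstantTrainer(), TrainerSketch)  # has fit()
bad = isinstance(NotATrainer(), TrainerSketch)     # lacks fit()
```

Note that `isinstance` against a runtime-checkable Protocol only checks that the named methods exist; it does not verify their signatures.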

maldet.events

CompositeEventLogger — fans out to N loggers, isolating failures.

CompositeEventLogger

Forwards every call to every wrapped logger.

An exception raised by a delegate is caught and logged at WARNING; the other delegates still run. This keeps a broken MLflow or filesystem logger from killing the detector run.
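The fan-out-with-isolation behaviour can be sketched as follows. `CompositeSketch`, `Broken`, and `Recording` are hypothetical names, and only a single `log_event` method is shown; the real class forwards every EventLogger call.

```python
import logging
from typing import Any

log = logging.getLogger("maldet.sketch")


class CompositeSketch:
    """Illustrative fan-out logger, not maldet's implementation."""

    def __init__(self, *delegates: Any) -> None:
        self._delegates = delegates

    def log_event(self, kind: str, **payload: Any) -> None:
        for delegate in self._delegates:
            try:
                delegate.log_event(kind, **payload)
            except Exception:
                # Isolate the failure so the remaining delegates still run.
                log.warning("event logger %r failed", delegate, exc_info=True)


class Broken:
    def log_event(self, kind: str, **payload: Any) -> None:
        raise OSError("disk full")


class Recording:
    def __init__(self) -> None:
        self.events: list = []

    def log_event(self, kind: str, **payload: Any) -> None:
        self.events.append((kind, payload))


recorder = Recording()
composite = CompositeSketch(Broken(), recorder)
composite.log_event("epoch_end", epoch=1)  # Broken raises, recorder still sees it
```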

JSONL event logger — append-only, fsync per event.

JsonlEventLogger

Writes one NDJSON line per event to path.

Each write is followed by os.fsync so a pod kill does not lose events in the page cache. Parent directory is created if missing.
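A minimal sketch of an append-only writer with an fsync per event. The class name `JsonlSketch` and the record shape (a `kind` key merged with the payload) are assumptions, not the documented wire format.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


class JsonlSketch:
    """Illustrative append-only NDJSON writer."""

    def __init__(self, path: Path) -> None:
        self._path = Path(path)
        # Create the parent directory if it is missing.
        self._path.parent.mkdir(parents=True, exist_ok=True)

    def log_event(self, kind: str, **payload: Any) -> None:
        line = json.dumps({"kind": kind, **payload})
        with open(self._path, "a", encoding="utf-8") as fh:
            fh.write(line + "\n")
            fh.flush()             # Python buffer -> OS page cache
            os.fsync(fh.fileno())  # OS page cache -> storage device


events_path = Path(tempfile.mkdtemp()) / "run" / "events.jsonl"
logger = JsonlSketch(events_path)
logger.log_event("stage_begin", stage="train")
logger.log_event("stage_end", stage="train")
lines = events_path.read_text(encoding="utf-8").splitlines()
```

The `flush()` call matters: `os.fsync` only acts on data the operating system has already received, so Python's own buffer has to be emptied first.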

Stdout event logger — prefixed JSON line per event.

StdoutEventLogger

Writes maldet.event: {json}\n lines to stdout.
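A prefixed line format lets a consumer pick events out of otherwise free-form stdout. The sketch below writes and parses such lines; the helper names and the JSON shape (a `kind` key plus payload fields) are assumptions.

```python
import io
import json
import sys
from typing import Any, Optional, TextIO

PREFIX = "maldet.event: "


def write_event(kind: str, *, stream: Optional[TextIO] = None, **payload: Any) -> None:
    # Hypothetical writer: one prefixed JSON line per event.
    out = sys.stdout if stream is None else stream
    out.write(PREFIX + json.dumps({"kind": kind, **payload}) + "\n")
    out.flush()


def parse_line(line: str) -> Optional[dict]:
    # Return the decoded event, or None for lines that are not events.
    if not line.startswith(PREFIX):
        return None
    return json.loads(line[len(PREFIX):])


buffer = io.StringIO()
write_event("epoch_end", stream=buffer, epoch=3)
parsed = parse_line(buffer.getvalue())
```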

MLflow-backed event logger.

Keeps the mlflow import optional — if the caller passes mlflow=None and mlflow is not importable, log_* methods silently no-op. This makes MLflow a soft dependency (install maldet[mlflow] to enable).
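One common way to implement a soft dependency of this kind is to attempt the import and fall back to no-ops when it fails. The sketch below is an assumption about the pattern, not maldet's code; `try_import`, `SoftDependencyLogger`, and `FakeBackend` are hypothetical names. The call forwarded to the backend mirrors `mlflow.log_metric(key, value, step=...)`.

```python
import importlib
from typing import Any, Optional


def try_import(name: str) -> Optional[Any]:
    # Return the module, or None when it is not installed.
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


class SoftDependencyLogger:
    def __init__(self, backend: Optional[Any] = None, module_name: str = "mlflow") -> None:
        # Use the injected backend if given, else try to import it.
        self._backend = backend if backend is not None else try_import(module_name)

    @property
    def enabled(self) -> bool:
        return self._backend is not None

    def log_metric(self, name: str, value: float, step: int = 0) -> None:
        if self._backend is None:
            return  # silent no-op when the backend is unavailable
        self._backend.log_metric(name, value, step=step)


class FakeBackend:
    def __init__(self) -> None:
        self.calls: list = []

    def log_metric(self, key: str, value: float, step: Optional[int] = None) -> None:
        self.calls.append((key, value, step))


disabled = SoftDependencyLogger(module_name="maldet_sketch_missing_module")
disabled.log_metric("loss", 0.5)  # does nothing, raises nothing

fake = FakeBackend()
SoftDependencyLogger(backend=fake).log_metric("loss", 0.5, step=1)
```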

MlflowEventLogger

log_event(kind, **payload)

Flatten event payload into tags named maldet.<kind>.<field>.
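The tag naming scheme can be illustrated with a small helper. `flatten_event` is a hypothetical name, and converting values with `str()` is an assumption; how nested payload values are handled is not specified here.

```python
from typing import Any


def flatten_event(kind: str, **payload: Any) -> dict[str, str]:
    # Build one tag per payload field, named maldet.<kind>.<field>.
    return {f"maldet.{kind}.{field}": str(value) for field, value in payload.items()}


tags = flatten_event("stage_end", stage="train", status="ok")
```

A flat string-to-string dict like this is the shape that `mlflow.set_tags` accepts; whether maldet uses that exact call is not stated in this reference.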

Metric events are NOT forwarded here — the caller calls log_metric for those.

maldet.builtins

Built-in sample readers.

SampleCsvReader

Reads a sample_csv contract CSV: columns file_name[,label].

Resolves each sample path under samples_root/<sha[:2]>/<sha>.

When strict=False (default), missing sample files are skipped without raising an error. lolday frequently produces CSVs that reference samples not yet present: the platform guarantees that the SHA is valid, not that the sample's bytes are available.
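The path layout and the non-strict skip can be sketched as below. The helper names are hypothetical, the sketch assumes `file_name` holds the sample's SHA, and raising `FileNotFoundError` under `strict=True` is an assumption about what strict mode does.

```python
import csv
import io
import tempfile
from pathlib import Path
from typing import Iterator, Optional


def sample_path(samples_root: Path, sha: str) -> Path:
    # samples_root/<sha[:2]>/<sha>
    return Path(samples_root) / sha[:2] / sha


def iter_rows(
    csv_text: str, samples_root: Path, strict: bool = False
) -> Iterator[tuple[Path, Optional[str]]]:
    for row in csv.DictReader(io.StringIO(csv_text)):
        path = sample_path(samples_root, row["file_name"])
        if not path.is_file():
            if strict:
                raise FileNotFoundError(path)  # assumed strict behaviour
            continue  # non-strict: skip silently
        yield path, row.get("label")  # label is None when the column is absent


root = Path(tempfile.mkdtemp())
(root / "ab").mkdir()
(root / "ab" / "abcdef").write_bytes(b"MZ")

csv_text = "file_name,label\nabcdef,1\n123456,0\n"  # second sample is missing
found = list(iter_rows(csv_text, root))
```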

Built-in predictor: batch prediction over a SampleReader.

BatchPredictor

Iterate samples, extract features, call model.predict in one batch.

Writes a CSV with the required columns file_name, pred_label, pred_score. Extra columns are added as pred_prob_<class> when predict_proba is available.
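A sketch of the output layout, using only the standard library. `write_predictions` and `ThresholdModel` are hypothetical, and two details are assumptions rather than documented behaviour: `pred_score` is taken to be the highest class probability, and it is left blank when `predict_proba` is unavailable.

```python
import csv
import io
from typing import Any, Sequence, TextIO


def write_predictions(
    model: Any, file_names: Sequence[str], features: Sequence[Any], out: TextIO
) -> None:
    labels = model.predict(features)  # a single batch call
    probs = model.predict_proba(features) if hasattr(model, "predict_proba") else None

    header = ["file_name", "pred_label", "pred_score"]
    if probs is not None:
        header += [f"pred_prob_{cls}" for cls in model.classes_]

    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for i, name in enumerate(file_names):
        if probs is not None:
            row_probs = list(probs[i])
            score: Any = max(row_probs)  # assumption: score = top probability
        else:
            row_probs, score = [], ""    # assumption: blank score without proba
        writer.writerow([name, labels[i], score, *row_probs])


class ThresholdModel:
    # Toy model exposing the sklearn-style predict / predict_proba / classes_.
    classes_ = [0, 1]

    def predict(self, X: Sequence[Sequence[float]]) -> list:
        return [int(x[0] > 0.5) for x in X]

    def predict_proba(self, X: Sequence[Sequence[float]]) -> list:
        return [[1 - x[0], x[0]] for x in X]


out = io.StringIO()
write_predictions(ThresholdModel(), ["a", "b"], [[0.25], [0.75]], out)
rows = list(csv.reader(io.StringIO(out.getvalue())))
```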

maldet.evaluators

Binary-classification evaluator.

BinaryClassification

Binary classification metrics using sklearn.metrics.

Runs model.predict once over the whole reader. When the model provides predict_proba, it is also called to compute ROC-AUC.

maldet.trainers

SklearnTrainer — thin wrapper around sklearn estimator.fit/predict/predict_proba.

SklearnTrainer

Trainer for scikit-learn-compatible estimators (fit + predict).

Lightning-based Trainer for deep-learning detectors.

Reads the platform-injected env vars MALDET_GPU_COUNT and MALDET_DISTRIBUTED_STRATEGY to pick accelerator, devices, and strategy for lightning.Trainer.
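One way such environment variables could be translated into `lightning.Trainer` keyword arguments is sketched below. The function name, the default values, and the CPU fallback are assumptions; only the two variable names and the three target arguments come from the description above.

```python
import os
from typing import Any, Mapping


def trainer_kwargs(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    # Assumed defaults: no GPUs and Lightning's "auto" strategy.
    gpu_count = int(env.get("MALDET_GPU_COUNT", "0"))
    strategy = env.get("MALDET_DISTRIBUTED_STRATEGY", "auto")

    if gpu_count <= 0:
        # Assumed fallback: single-process CPU training.
        return {"accelerator": "cpu", "devices": 1, "strategy": "auto"}
    return {"accelerator": "gpu", "devices": gpu_count, "strategy": strategy}


cpu = trainer_kwargs({})
multi_gpu = trainer_kwargs(
    {"MALDET_GPU_COUNT": "2", "MALDET_DISTRIBUTED_STRATEGY": "ddp"}
)
```

The resulting dict would be passed as `lightning.Trainer(**kwargs)`. Accepting the environment as a parameter keeps the mapping testable without touching `os.environ`.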

LightningTrainer

PyTorch Lightning-based Trainer.

MaldetLightningLogger

Bases: Logger

Adapter from Lightning's Logger API onto maldet EventLogger.

MaldetProgressCallback

Bases: Callback

Emits epoch_begin / epoch_end events through the EventLogger.

maldet.manifest

Detector manifest — Pydantic model for maldet.toml and helpers.

DetectorManifest

Bases: _Frozen

The full manifest (maldet.toml root).

ManifestNotFoundError

Bases: FileNotFoundError

Raised by search_manifest when no maldet.toml is discoverable.

search_manifest()

Return the first manifest path found in:

1. $MALDET_MANIFEST env var (absolute path)
2. $PWD/maldet.toml
3. /app/maldet.toml (the scaffold Docker WORKDIR)
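The lookup order above can be sketched as a list of candidates tried in sequence. The function and class names are hypothetical, the sketch treats `$PWD` as the current working directory, and falling through to the next candidate when `$MALDET_MANIFEST` points at a missing file is an assumption.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


class ManifestNotFoundSketch(FileNotFoundError):
    """Illustrative stand-in for ManifestNotFoundError."""


def candidate_paths(env: Mapping[str, str], cwd: Path) -> list[Path]:
    candidates = []
    override = env.get("MALDET_MANIFEST")
    if override:
        candidates.append(Path(override))         # 1. explicit env override
    candidates.append(Path(cwd) / "maldet.toml")  # 2. working directory
    candidates.append(Path("/app/maldet.toml"))   # 3. scaffold Docker WORKDIR
    return candidates


def search_manifest_sketch(
    env: Mapping[str, str] = os.environ, cwd: Optional[Path] = None
) -> Path:
    base = Path.cwd() if cwd is None else Path(cwd)
    candidates = candidate_paths(env, base)
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    raise ManifestNotFoundSketch(f"no maldet.toml in: {candidates}")


workdir = Path(tempfile.mkdtemp())
(workdir / "maldet.toml").write_text("", encoding="utf-8")
override = Path(tempfile.mkdtemp()) / "custom.toml"
override.write_text("", encoding="utf-8")

from_cwd = search_manifest_sketch(env={}, cwd=workdir)
from_env = search_manifest_sketch(env={"MALDET_MANIFEST": str(override)}, cwd=workdir)
```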

maldet.runner

StageRunner — loads manifest, composes layers via manifest symbols, drives stage.

StageRunner

Orchestrates a single stage (train / evaluate / predict).