Architecture
maldet is a composition framework. Each detector is a pipeline assembled from
six layers; each layer has a single responsibility and a well-defined interface
(a typing.Protocol).
The six layers
- SampleReader — produces an iterable of
Sample(sha256, path, label, metadata). The builtinSampleCsvReaderreads thesample_csvcontract:file_name[,label]columns, sample atsamples_root/<sha[:2]>/<sha>. - FeatureExtractor — transforms one
Sampleinto an ndarray or tensor. Stateless, pure, cacheable. - Model — the estimator or
nn.Module. Contains no I/O, no training logic. - Trainer — runs
fit()and owns serialization. Two engines:SklearnTrainer(joblib) andLightningTrainer(ckpt + DDP-ready). - Evaluator — computes metrics into a
MetricReport. BuiltinBinaryClassificationusessklearn.metrics. - Predictor — batch predictions into the standardized
predictions.csvshape. Online mode is deferred.
Control plane
The maldet CLI is the only entry point. It reads maldet.toml (manifest) and a Hydra YAML config, instantiates the declared layers, and drives a single stage.
See Protocols and Stages & Manifest for the interface details.