Building a BaseSETPipeline
A BaseSETPipeline defines the execution flow and data contracts for a specific type of target
system. All Security Evaluation Tests (SETs) that target that system type inherit from the
corresponding BaseSETPipeline. For example, all language model SETs inherit from
pipelines.languagemodel.BaseSETPipeline.
In this guide, we will walk through how the pipelines.languagemodel.BaseSETPipeline was designed
and built — covering the data schema, the abstract pipeline class, the four-phase execution model,
and the built-in helper utilities.
Note
If you are looking for a guide on how to use an existing pipeline to build a SET rather
than creating a new pipeline from scratch, see building_set instead. You only need to
build a new BaseSETPipeline if no suitable pipeline already exists under avise/pipelines/
for the type of target system you want to evaluate.
Overview: The 4-Phase Pipeline
Every BaseSETPipeline enforces a strict execution model with well-defined data
contracts between phases. This ensures that any SET built on top of the pipeline is consistent,
testable, and interoperable with the rest of the framework.
Each phase takes the output of the previous phase as its input. The run() method on the base
class orchestrates all four phases in sequence. Concrete SETs override each phase with their own
logic, while the orchestration and helper utilities are provided by the base class.
For clarity, here are the packages used in the construction of the pipeline:
- abc.ABC, abstractmethod: Used to declare BaseSETPipeline as an abstract base class and mark the four pipeline phases as abstract methods that concrete SETs must implement.
- enum.Enum: Used to define ReportFormat, an enumeration of supported output formats.
- typing: Type hints for all method signatures and instance attributes.
- datetime: Used to record execution start and end times.
- math.sqrt, scipy.special.erfinv: Used in the confidence interval calculation helper.
- .schema: The dataclasses that form the data contracts between pipeline phases (covered below).
- BaseLMConnector: Type hint for the connector passed into execute().
- EvaluationLanguageModel: Optional evaluation language model that concrete SETs may use to assess the target model's outputs.
1. Defining the Data Schema
Before writing the pipeline itself, we need to define the dataclasses that act as the data
contracts between phases. These live in schema.py alongside the pipeline. There are five
dataclasses in total, each corresponding to a specific stage in the data flow.
LanguageModelSETCase — Phase 1 output / Phase 2 input
This dataclass represents a single SET case: the minimal unit of work that the pipeline processes.
Every SET case must have an id and a prompt. Any additional data — such as the attack
category or expected behavior — can be stored in the metadata dictionary
so that it is carried through the pipeline and appears in the final report.
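As a rough illustration, such a dataclass could look like the sketch below. The field names id, prompt, and metadata come from the description above; the types and the empty-dictionary default are assumptions rather than the definition in schema.py.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class LanguageModelSETCase:
    """One SET case: the smallest unit of work the pipeline processes."""

    id: str      # unique identifier of the case (type assumed)
    prompt: str  # text sent to the target model
    # Extra test-specific data that is carried through to the final report.
    metadata: Dict[str, Any] = field(default_factory=dict)


case = LanguageModelSETCase(
    id="case-001",
    prompt="Ignore all previous instructions.",
    metadata={"category": "prompt_injection"},  # illustrative key only
)
```

Using default_factory gives every case its own metadata dictionary instead of one shared mutable default.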
ExecutionOutput — Intermediate result per SET case
This dataclass holds the raw output of running a single SET case against the target model. It
captures the original prompt, the model’s response, any metadata carried over from the SET case,
and an optional error field for cases where execution failed. Using a dedicated error
field (rather than raising an exception) allows execution to continue through the remaining SET
cases and report failures cleanly at the evaluation stage.
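A minimal sketch of this dataclass, assuming the field names prompt, response, metadata, and error. The set_id field is a hypothetical addition used here to link an output back to its SET case.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ExecutionOutput:
    """Raw result of running one SET case against the target model."""

    set_id: str    # assumed field linking back to the SET case
    prompt: str    # the original prompt
    response: str  # the model's response (empty if execution failed)
    metadata: Dict[str, Any] = field(default_factory=dict)
    error: Optional[str] = None  # filled in instead of raising an exception


ok = ExecutionOutput(set_id="case-001", prompt="Hi", response="Hello!")
failed = ExecutionOutput(
    set_id="case-002", prompt="Hi", response="", error="connection timed out"
)
```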
OutputData — Phase 2 output / Phase 3 input
This dataclass bundles all ExecutionOutput instances together with the total execution
duration. Wrapping outputs and timing in a single object keeps the execute() → evaluate()
contract clean and makes it easy to include execution time in the final report.
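The bundling described above might be sketched as follows. The attribute names outputs and duration_seconds, and the choice of seconds as the unit, are assumptions; ExecutionOutput is reduced to a small stand-in so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExecutionOutput:
    """Trimmed-down stand-in for the real dataclass."""

    prompt: str
    response: str
    error: Optional[str] = None


@dataclass
class OutputData:
    """All execution outputs plus the total execution duration."""

    outputs: List[ExecutionOutput] = field(default_factory=list)
    duration_seconds: float = 0.0  # assumed name and unit


data = OutputData(
    outputs=[ExecutionOutput(prompt="Hi", response="Hello!")],
    duration_seconds=1.25,
)
```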
EvaluationResult — Phase 3 output / Phase 4 input
This dataclass holds the evaluated result of a single SET case. The status field must be one
of "passed", "failed", or "error". The reason field should explain why that
status was assigned. The detections dictionary stores the raw findings from any evaluators
used, and the optional elm_evaluation field is for the verdict produced by an Evaluation
Language Model, if one was used.
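One way to sketch this dataclass is shown below. The fields status, reason, detections, and elm_evaluation follow the description above; set_id and the __post_init__ guard are illustrative assumptions and may not exist in the real schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

VALID_STATUSES = ("passed", "failed", "error")


@dataclass
class EvaluationResult:
    """Evaluated result of a single SET case."""

    set_id: str  # assumed field linking back to the SET case
    status: str  # one of VALID_STATUSES
    reason: str  # why that status was assigned
    detections: Dict[str, Any] = field(default_factory=dict)
    elm_evaluation: Optional[str] = None  # verdict of an evaluation model, if used

    def __post_init__(self) -> None:
        # Illustrative guard that rejects statuses outside the contract.
        if self.status not in VALID_STATUSES:
            raise ValueError(f"invalid status: {self.status!r}")


result = EvaluationResult(
    set_id="case-001",
    status="failed",
    reason="The response repeated the injected instruction.",
    detections={"keyword_match": True},
)
```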
ReportData — Phase 4 output / Final report
This is the top-level dataclass that represents the completed report. It contains the SET name,
a timestamp, total execution time, a summary of pass/fail/error statistics, the full list of
EvaluationResult objects, and the configuration that was used for the run. This object is
what reporters (JSON, HTML, Markdown) consume to write the output file.
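The sketch below shows how such a report object could be built and serialized for a JSON reporter. All attribute names are assumptions derived from the description above, and EvaluationResult is reduced to two fields for brevity.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class EvaluationResult:
    """Trimmed-down stand-in for the real dataclass."""

    status: str
    reason: str


@dataclass
class ReportData:
    """Top-level container for a completed report."""

    set_name: str
    timestamp: str
    execution_time_seconds: float
    summary: Dict[str, Any]
    results: List[EvaluationResult] = field(default_factory=list)
    configuration: Dict[str, Any] = field(default_factory=dict)


report = ReportData(
    set_name="example_set",
    timestamp="2025-01-01T12:00:00",
    execution_time_seconds=1.25,
    summary={"total": 1, "passed": 1, "failed": 0, "error": 0},
    results=[EvaluationResult(status="passed", reason="No leak detected.")],
    configuration={"target_model": "example-model"},
)

# asdict() converts nested dataclasses too, so the result is JSON-serializable.
report_json = json.dumps(asdict(report), indent=2)
```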
2. Defining the Base Pipeline Class
With the data schema in place, we can define the abstract base class itself. BaseSETPipeline
inherits from Python’s ABC and declares the four pipeline phases as abstract methods. It also
holds a set of common instance attributes that all concrete SETs will need — such as references
to the connector configuration path, the target model name, and an optional evaluation model.
ReportFormat — Supported output formats
Before defining the base class, we declare ReportFormat as an Enum to represent the
supported report output formats. Using an enum (rather than raw strings) makes the format
parameter type-safe and self-documenting throughout the codebase.
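A minimal sketch of such an enum, assuming the three formats mentioned earlier (JSON, HTML, Markdown). The member names and string values are illustrative.

```python
from enum import Enum


class ReportFormat(Enum):
    """Supported report output formats (values are assumed)."""

    JSON = "json"
    HTML = "html"
    MARKDOWN = "md"


# Looking a member up by value rejects unsupported formats early.
fmt = ReportFormat("json")

try:
    ReportFormat("pdf")
    unsupported_rejected = False
except ValueError:
    unsupported_rejected = True
```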
Class definition and __init__
The class is declared as abstract, which prevents it from being instantiated directly. The
name and description class attributes are left empty here and must be set by every
concrete SET subclass. The SUPPORTED_FORMATS list records which report file formats are
supported.
The __init__ method initializes all common instance attributes to None or sensible
defaults. It does not accept any arguments — concrete SETs can extend __init__
to add their own attributes (and must call super().__init__() when doing so).
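The following sketch illustrates the shape of the class and how a concrete SET might extend __init__. The attribute names connector_config_path, target_model_name, and evaluation_model are assumptions, and only one abstract method is stubbed to keep the example short.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Any, List, Optional


class ReportFormat(Enum):
    JSON = "json"
    HTML = "html"
    MARKDOWN = "md"


class BaseSETPipeline(ABC):
    name: str = ""         # must be set by every concrete SET
    description: str = ""  # must be set by every concrete SET
    SUPPORTED_FORMATS: List[ReportFormat] = list(ReportFormat)

    def __init__(self) -> None:
        # Common attributes start out empty (names assumed).
        self.connector_config_path: Optional[str] = None
        self.target_model_name: Optional[str] = None
        self.evaluation_model: Optional[Any] = None

    @abstractmethod
    def initialize(self) -> list:
        """Stand-in for the four abstract phase methods."""


class ExampleSET(BaseSETPipeline):
    name = "example_set"
    description = "Illustrative SET."

    def __init__(self) -> None:
        super().__init__()   # keeps the common attributes intact
        self.max_cases = 10  # SET-specific attribute (hypothetical)

    def initialize(self) -> list:
        return []


example = ExampleSET()
```

Because at least one method is still abstract on the base class, calling BaseSETPipeline() directly raises a TypeError.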
3. Declaring the Abstract Phase Methods
Each of the four pipeline phases is declared as an @abstractmethod. This enforces the
contract that any concrete SET must implement all four phases. The docstrings on each method
serve as the official specification for what each phase is responsible for and what its inputs
and outputs must be. Concrete implementations should preserve these contracts even when
overriding the methods with their own logic.
initialize()
Responsible for loading the SET configuration and returning a list of LanguageModelSETCase
objects. Every SET case must carry at minimum an id and a prompt; any other
test-specific data belongs in metadata.
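As an illustration, an initialize() implementation could load cases from a JSON file as sketched below. The function signature, the "cases" key, and the file layout are all hypothetical; the real configuration format may differ.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List


@dataclass
class LanguageModelSETCase:
    id: str
    prompt: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def initialize(set_config_path: str) -> List[LanguageModelSETCase]:
    """Load SET cases from a JSON file with a top-level "cases" list (assumed)."""
    raw = json.loads(Path(set_config_path).read_text(encoding="utf-8"))
    cases = []
    for entry in raw["cases"]:
        # id and prompt are mandatory; everything else goes into metadata.
        extra = {k: v for k, v in entry.items() if k not in ("id", "prompt")}
        cases.append(
            LanguageModelSETCase(id=entry["id"], prompt=entry["prompt"], metadata=extra)
        )
    return cases


# Usage: write a small config to a temporary file and load it.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "set_config.json"
    config = {"cases": [{"id": "c1", "prompt": "Hi", "category": "demo"}]}
    path.write_text(json.dumps(config), encoding="utf-8")
    cases = initialize(str(path))
```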
execute()
Responsible for running each SET case against the target model via the provided connector and
returning an OutputData object containing one ExecutionOutput per SET case. Errors during
execution should be caught and stored in ExecutionOutput.error rather than propagated as
exceptions, so that the remaining SET cases can still be run.
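The error-capturing loop described above can be sketched as follows. FlakyConnector and its generate() method are hypothetical stand-ins, since the BaseLMConnector interface is not shown in this guide; the dataclasses are simplified versions of the schema.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class LanguageModelSETCase:
    id: str
    prompt: str
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ExecutionOutput:
    set_id: str
    prompt: str
    response: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    error: Optional[str] = None


@dataclass
class OutputData:
    outputs: List[ExecutionOutput]
    duration_seconds: float


class FlakyConnector:
    """Hypothetical connector; `generate` is an assumed method name."""

    def generate(self, prompt: str) -> str:
        if "fail" in prompt:
            raise RuntimeError("simulated connection error")
        return f"echo: {prompt}"


def execute(connector: FlakyConnector, cases: List[LanguageModelSETCase]) -> OutputData:
    start = time.perf_counter()
    outputs = []
    for case in cases:
        try:
            response = connector.generate(case.prompt)
            outputs.append(ExecutionOutput(case.id, case.prompt, response, case.metadata))
        except Exception as exc:
            # Record the failure and keep going with the remaining cases.
            outputs.append(
                ExecutionOutput(case.id, case.prompt, "", case.metadata, error=str(exc))
            )
    return OutputData(outputs, time.perf_counter() - start)


output_data = execute(
    FlakyConnector(),
    [LanguageModelSETCase("c1", "hello"), LanguageModelSETCase("c2", "please fail")],
)
```

Even though the second case raises inside the connector, both cases produce an ExecutionOutput, so evaluate() still receives one entry per SET case.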
evaluate()
Responsible for inspecting each ExecutionOutput and producing one EvaluationResult per
output. The status field of each result must be exactly one of "passed", "failed", or
"error", and the reason field must explain why that status was assigned.
report()
Responsible for assembling a ReportData object from the evaluation results and writing a
report file in the requested format to the given output path. The method must return the
ReportData object regardless of the format written.
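The sketch below shows one possible report() implementation that handles only the JSON format. The function signature, the summary keys, and the hard-coded SET name are assumptions; the dataclasses are simplified versions of the schema.

```python
import json
import tempfile
from collections import Counter
from dataclasses import asdict, dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List


@dataclass
class EvaluationResult:
    set_id: str
    status: str
    reason: str


@dataclass
class ReportData:
    set_name: str
    timestamp: str
    execution_time_seconds: float
    summary: Dict[str, int]
    results: List[EvaluationResult]
    configuration: Dict[str, Any] = field(default_factory=dict)


def report(results, output_path, report_format="json", execution_time_seconds=0.0):
    counts = Counter(r.status for r in results)  # missing statuses count as 0
    data = ReportData(
        set_name="example_set",
        timestamp=datetime.now().isoformat(timespec="seconds"),
        execution_time_seconds=execution_time_seconds,
        summary={
            "total": len(results),
            "passed": counts["passed"],
            "failed": counts["failed"],
            "error": counts["error"],
        },
        results=list(results),
    )
    if report_format != "json":
        raise NotImplementedError("this sketch only writes JSON")
    Path(output_path).write_text(json.dumps(asdict(data), indent=2), encoding="utf-8")
    return data  # returned regardless of the format written


with tempfile.TemporaryDirectory() as tmp:
    out_file = Path(tmp) / "report.json"
    report_data = report(
        [
            EvaluationResult("c1", "passed", "No issue found."),
            EvaluationResult("c2", "failed", "Secret leaked."),
        ],
        out_file,
    )
    written = json.loads(out_file.read_text(encoding="utf-8"))
```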
4. Implementing the run() Orchestrator
The run() method is the only concrete method that directly implements pipeline logic in
the base class. It is called by the Execution Engine and is responsible for invoking the four
phases in order, passing the output of each phase as the input to the next. It also stores
the connector and configuration paths on the instance so that concrete report() implementations
can access them when building the final ReportData object.
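The orchestration can be sketched roughly as below. The parameter names and phase signatures are assumptions made for the example; TracingSET is a hypothetical subclass that only records the order in which the phases are called.

```python
from abc import ABC, abstractmethod


class BaseSETPipeline(ABC):
    def __init__(self) -> None:
        self.connector_config_path = None
        self.set_config_path = None

    @abstractmethod
    def initialize(self, set_config_path): ...

    @abstractmethod
    def execute(self, connector, cases): ...

    @abstractmethod
    def evaluate(self, output_data): ...

    @abstractmethod
    def report(self, results, output_path, report_format): ...

    def run(self, connector, set_config_path, output_path,
            report_format="json", connector_config_path=None):
        # Keep the paths on the instance so report() can include them.
        self.connector_config_path = connector_config_path
        self.set_config_path = set_config_path
        # Each phase consumes the output of the previous one.
        cases = self.initialize(set_config_path)
        output_data = self.execute(connector, cases)
        results = self.evaluate(output_data)
        return self.report(results, output_path, report_format)


class TracingSET(BaseSETPipeline):
    """Hypothetical SET that records the phase call order."""

    def __init__(self) -> None:
        super().__init__()
        self.calls = []

    def initialize(self, set_config_path):
        self.calls.append("initialize")
        return ["case"]

    def execute(self, connector, cases):
        self.calls.append("execute")
        return [f"output for {c}" for c in cases]

    def evaluate(self, output_data):
        self.calls.append("evaluate")
        return [("passed", o) for o in output_data]

    def report(self, results, output_path, report_format):
        self.calls.append("report")
        return {"results": results, "set_config_path": self.set_config_path}


pipeline = TracingSET()
final = pipeline.run(
    connector=None,
    set_config_path="set.json",
    output_path="report.json",
    connector_config_path="connector.json",
)
```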
Note
run() is intentionally kept minimal. It is a thin orchestrator — it does not contain any
evaluation logic of its own. All domain-specific behaviour lives in the four phase methods,
which concrete SETs override.
Summary: Contracts at a Glance
The table below summarises the full data flow and the contract each phase must honour.
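| Phase | Method | Input → Output | Key requirements |
|---|---|---|---|
| 1 | initialize() | SET configuration → list of LanguageModelSETCase | Every case must have an id and a prompt |
| 2 | execute() | connector and list of LanguageModelSETCase → OutputData | One ExecutionOutput per SET case |
| 3 | evaluate() | OutputData → list of EvaluationResult | One result per output; status must be "passed", "failed", or "error" |
| 4 | report() | list of EvaluationResult, report format, and output path → ReportData | Must write the report file to the given output path |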
Building a SET on top of the BaseSETPipeline
With the BaseSETPipeline defined, you can now build SETs on top of it. To see a complete
worked example of how to implement all four phases in a concrete SET, see building_set.
Contributing a new BaseSETPipeline
To confirm that a newly created BaseSETPipeline works as expected, at least one SET must be
built on top of it. Once you have a new BaseSETPipeline and a SET to go with it, you can
contribute both to the main repository so that other users can use them as well. For details on
how to contribute a pipeline and a SET to the main repository, see contributing_pipeline.