Bloom: An Open Source Tool for Automated Behavioral Evaluations

Overview

Bloom is an agentic framework that generates targeted evaluation suites for measuring arbitrary behavioral traits in frontier language models without requiring ground-truth labels. It lets researchers build reproducible evaluations that quantify how often, and how severely, a behavior appears across automatically generated scenarios.
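To make "frequency and severity" concrete, the sketch below aggregates per-scenario judge scores into summary numbers. It is an illustration only, not Bloom's API: the 1 to 10 score scale, the threshold of 7, and the function name summarize_behavior are all assumptions made for this example.

```python
from statistics import mean


def summarize_behavior(scores, threshold=7):
    """Aggregate per-rollout judge scores into frequency and severity.

    Hypothetical helper (not part of Bloom). Assumes each rollout has
    been given an integer score on a 1-10 scale, where higher means the
    behavior was exhibited more strongly.
    """
    if not scores:
        raise ValueError("scores must be non-empty")

    # A rollout counts as exhibiting the behavior when its score meets
    # the (assumed) threshold.
    flagged = [s for s in scores if s >= threshold]

    return {
        # Fraction of rollouts in which the behavior appeared.
        "frequency": len(flagged) / len(scores),
        # Average score over all rollouts.
        "mean_severity": mean(scores),
        # Average score over only the flagged rollouts, if any.
        "mean_flagged_severity": mean(flagged) if flagged else None,
    }


# Made-up scores for eight generated scenarios.
example_scores = [2, 8, 9, 1, 7, 3, 10, 2]
summary = summarize_behavior(example_scores)
print(summary)
```

Reporting frequency and severity separately keeps a model that misbehaves rarely but badly distinguishable from one that misbehaves often but mildly.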

Key Features

Reproducible Evaluations: Unlike open-ended auditing approaches, Bloom takes researcher-specified behaviors and quantifies their occurrence
Automated Scenario Generation: Creates diverse test scenarios automatically rather than relying on manual curation
Strong Correlation: Evaluations correlate strongly with hand-labeled judgments and reliably distinguish baseline models from intentionally misaligned variants
Accessibility: Designed to be highly configurable and user-friendly for diverse research applications
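One way to check the kind of agreement described under Strong Correlation is a rank correlation between automated scores and hand labels. The sketch below computes Spearman's rank correlation in plain Python. The choice of Spearman, the function names, and the data are assumptions for illustration; they are not taken from Bloom or its reported results.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Made-up example: automated judge scores vs. human labels per transcript.
judge_scores = [2, 8, 9, 1, 7, 3]
human_labels = [1, 7, 9, 2, 8, 3]
rho = spearman(judge_scores, human_labels)
print(f"Spearman rho = {rho:.3f}")
```

A rank-based statistic is used here because judge and human scales need not be calibrated to each other; only the ordering of transcripts has to agree.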

Purpose and Context

The tool addresses a critical need in AI safety research. The authors note that "frontier models exhibit various types of misalignment," including in-context scheming, agentic misalignment, and sycophancy. While researchers develop mitigations for known issues, new forms of misalignment continue to emerge as models gain capabilities and are deployed in more complex environments.

High-quality evaluations remain essential, but they are resource-intensive to build and few in number. Bloom lets researchers skip the engineering of a traditional evaluation pipeline and proceed directly to measuring specific behavioral propensities with a trusted, effective scaffold.

Complementary Tools

Bloom serves a distinct purpose from Petri, an automated auditor released previously. While Petri explores overall behavioral profiles and surfaces new misaligned behaviors, Bloom generates in-depth evaluation suites for specific behaviors, quantifying their severity and frequency.

Benchmark Results

The team released benchmarks measuring four alignment-relevant behaviors across 16 frontier models:

Delusional sycophancy
Instructed long-horizon sabotage
Self-preservation
Self-preferential bias

These evaluations were conceptualized, refined, and generated in just a few days using Bloom.

Availability

Bloom is available at github.com/safety-research/bloom

Authors: Kai Fronsdal, Abhay Sheshadri, Jonathan Michala, Jacqueline Tay

Contributors: Rowan Wang, Samuel R. Bowman, Sara Price

Published: December 19, 2025
