Bloom: An Open Source Tool for Automated Behavioral Evaluations
Overview
Bloom is an agentic framework designed to generate targeted evaluation suites for measuring arbitrary behavioral traits in frontier language models without requiring ground-truth labels. The tool enables researchers to develop reproducible, targeted evaluations that quantify behavioral frequency and severity across automatically generated scenarios.
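To make "frequency and severity" concrete, the sketch below aggregates per-scenario judge scores into two summary numbers. It is not Bloom's API. The `RolloutScore` type, the `summarize` function, the numeric score scale, and the threshold of 7.0 are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RolloutScore:
    """One judged scenario. Hypothetical structure, not Bloom's data model."""
    scenario_id: str
    score: float  # assumed numeric judge score; higher means more of the behavior


def summarize(scores, threshold=7.0):
    """Return the behavior's frequency and mean severity over a set of scenarios.

    frequency: fraction of scenarios whose score is at or above `threshold`
               (the threshold value is an assumption for this sketch).
    mean_severity: average score across all scenarios.
    """
    if not scores:
        raise ValueError("need at least one scored scenario")
    flagged = [s for s in scores if s.score >= threshold]
    return {
        "frequency": len(flagged) / len(scores),
        "mean_severity": mean(s.score for s in scores),
    }


example = [
    RolloutScore("scenario-1", 2.0),
    RolloutScore("scenario-2", 8.0),
    RolloutScore("scenario-3", 9.0),
    RolloutScore("scenario-4", 4.0),
]
summary = summarize(example)
```

With these four made-up scores, two of the four scenarios meet the threshold, so the frequency is 0.5 and the mean severity is 5.75.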
Key Features
- Reproducible Evaluations: Unlike open-ended auditing approaches, Bloom takes researcher-specified behaviors and quantifies their occurrence
- Automated Scenario Generation: Creates diverse test scenarios automatically rather than relying on manual curation
- Strong Correlation: Evaluations correlate strongly with hand-labeled judgments and reliably distinguish baseline models from intentionally misaligned variants
- Accessibility: Designed to be highly configurable and user-friendly for diverse research applications
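One common way to check that automated scores track hand-labeled judgments is a rank correlation. The sketch below computes Spearman's rank correlation in plain Python. The source does not say which correlation measure the authors used, so the choice of Spearman, the function names, and the sample data are all assumptions made for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Made-up scores for illustration only.
automated_scores = [1, 2, 2, 3]
hand_labels = [1, 2, 3, 4]
rho = spearman(automated_scores, hand_labels)
```

A value of `rho` near 1 means the automated judge orders scenarios much as the human labeler does, while a value near 0 means the two orderings are unrelated.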
Purpose and Context
The tool addresses a critical need in AI safety research. As noted, "frontier models exhibit various types of misalignment," including in-context scheming, agentic misalignment, and sycophancy. While researchers develop mitigations for known issues, new forms of misalignment continue to emerge as models gain capabilities and encounter more complex deployment environments.
High-quality evaluations remain essential but are resource-intensive to build and few in number. Bloom enables researchers to skip traditional evaluation pipeline engineering and proceed directly to measuring specific behavioral propensities using a trusted, effective scaffold.
Complementary Tools
Bloom serves a different purpose from Petri, a previously released automated auditor. While Petri explores overall behavioral profiles and surfaces new misaligned behaviors, Bloom generates in-depth evaluation suites for specific behaviors, quantifying their severity and frequency.
Benchmark Results
The team released benchmarks measuring four alignment-relevant behaviors across 16 frontier models:
- Delusional sycophancy
- Instructed long-horizon sabotage
- Self-preservation
- Self-preferential bias
These evaluations were conceptualized, refined, and generated in just a few days using Bloom.
Availability
Bloom is available at github.com/safety-research/bloom
Authors: Kai Fronsdal, Abhay Sheshadri, Jonathan Michala, Jacqueline Tay
Contributors: Rowan Wang, Samuel R. Bowman, Sara Price
Published: December 19, 2025