AI evaluation and governance for systems that act in the real world

I help organisations understand, evaluate and govern generative and agentic AI systems across prompts, tools, workflows, human judgement and institutional accountability.
PhD in AI Evaluation & Ethics | Former Google Research, Ethical AI | Published AI & Ethics author | Lead Guest Editor, AI Agents: Ethics, Safety and Governance
What I do
I work at the intersection of AI evaluation, responsible AI, governance and public communication. My work helps organisations move beyond generic AI principles toward practical systems for testing, accountability, oversight and assurance. Most AI evaluation still measures outputs. I focus on how systems behave in deployment.

AI evaluation
I design methods to assess how AI systems behave in real-world conditions, not just benchmark tasks or isolated outputs. This includes testing prompt sensitivity, workflow behaviour, and how systems perform across changing contexts.
AI governance
I map how responsibility, judgement and accountability shift when AI enters organisational workflows. This includes treating prompts, tools, retrieval, memory, interface design and evaluation metrics as governance layers.
Agentic AI risk
I help organisations prepare for systems that retrieve, route, act, remember and trigger downstream consequences. This includes evaluating behaviour over time, not just outputs, and designing oversight, logging and intervention points.
Core Ideas
Most AI governance still focuses on models and outputs. In practice, risk and responsibility emerge through how systems are configured, used and evaluated. These core ideas provide a working framework for understanding AI as a dynamic, sociotechnical system rather than a static tool.

These ideas shift AI governance from static outputs to dynamic systems. The focus moves to configuration, trajectory and evaluation as active governance layers, where behaviour is shaped, responsibility is distributed, and accountability must be designed rather than assumed.
Evaluation and governance of AI
Effective AI governance depends on how systems are evaluated. What organisations choose to measure shapes what gets optimised, surfaced and treated as acceptable behaviour.
In real deployments, the model itself is only one part of the system. Risk and accountability sit across the full configuration, including prompts, tools, retrieval, memory, interfaces, workflows and human oversight. The same underlying model can produce very different outcomes depending on how it is implemented.
AI does not eliminate human judgement. It redistributes it across system design choices such as prompts, defaults, metrics and operational processes. Governance therefore requires making those decision points visible and accountable.
Finally, outputs alone are not sufficient evidence. They are the end result of path-dependent processes. Understanding how a system arrives at an outcome, including its trajectory through prompts, tools and context, is critical for robust evaluation and oversight.

Thesis
My PhD thesis, Measuring the Machine: Evaluating Generative AI as Pluralist Sociotechnical Systems, is now available on arXiv.
The thesis argues that AI evaluation is not neutral measurement. It shapes what models appear to be, what organisations optimise, and whose values become visible in practice.
Read the thesis: https://arxiv.org/abs/2604.20545
About Rebecca
Dr Rebecca L. Johnson is an AI evaluation and governance expert whose work examines how generative and agentic AI systems behave in real-world sociotechnical contexts. She holds a PhD in AI Evaluation and Ethics from the University of Sydney and was formerly a researcher in Google Research’s Ethical AI team. Her work spans AI evaluation, pluralist benchmarking, cultural value drift, agentic AI governance, public policy and responsible AI practice.
Click here for media coverage and talks
See the Researcher page for a CV snapshot, media bio, social media links, and contact details. My blog offers easily digestible takes on AI ethics.
Advisory, speaking and workshops
I provide expert briefings, workshops and advisory support for organisations navigating generative AI governance, evaluation and accountability.
- AI governance briefings
- Agentic AI risk and accountability workshops
- AI evaluation and assurance design
- Responsible AI strategy
- Public talks, panels and media commentary
For speaking, advisory or workshop enquiries, contact me here
Continue exploring: Research | Consulting | Media | Publications | Connect
