Home

I acknowledge and pay respect to the traditional owners of the land on which I work, the Gadigal people of the Eora Nation and the Darkinjung people of the Central Coast. This land has always been a learning space and is the site of the oldest continuing culture and knowledge system in the world.


PhD in AI Ethics • University educator • Formerly Google Research Ethical AI Team • 2020 “100 Brilliant Women in AI Ethics” (LH3) • AI Ethics consultant

My name is Rebecca Johnson. I’m a researcher in the Ethics and Philosophy of AI at the University of Sydney, Australia. My work bridges philosophy of AI, measurement theory, and applied ethics. See the Researcher page for a CV snapshot, media bio, social media links, and contact details. Other pages on this site are: my blog, a list of links to my media appearances and publications, my consultancy services, and a list of academic publications.


Overview of research

Generative AI systems don’t think or understand as humans do, but their outputs can appear to. What they actually reveal are the values, assumptions, and cultural patterns embedded in their training data and interaction loops. My research investigates how those patterns are produced, reflected, and amplified through model evaluation itself.

Using frameworks from measurement theory, enactivism, and moral value pluralism, I study how human, social, and technical factors co-construct what AI systems seem to “know.” This work develops new descriptive and pluralist methods (such as MaSH Loops and the World Values Benchmark) to reveal whose values are being measured and how they drift across contexts.

My research asks: how can evaluation become a more transparent and accountable part of AI governance?

Quick links for this page

  1. Enactivism as a Way of Knowing AI
  2. MaSH Loops (Machine-Society-Human)
  3. The Model is Not the Territory
  4. Descriptive Evaluation Calibrated to Human Data
  5. Participatory Realism and Quantum Measurement
  6. Evaluation as Governance

1 • Enactivism as a Way of Knowing AI

Reframing how we understand what AI systems reflect and enact in participation with us
Enactivism offers a more human way of understanding AI. It sees knowledge and meaning not as things stored inside a system, but as something created through interaction. What we learn from AI depends on how we engage with it.

Enactivism: meaning through participation
This illustration expresses the enactivist view that knowledge and meaning are not passively received or stored, but actively brought forth through embodied engagement with the world. Mind and environment continually shape one another through dynamic loops of perception and action.

This approach builds on earlier ways of thinking—like functionalism, which looks at how systems work, and constructivism, which sees knowledge as socially shaped—but adds the dimension of lived participation. It reminds us that meaning arises not just from what machines do or represent, but from how people and systems act together. From this view, evaluation is not only about what a model contains, but about what it brings into being through its interaction with us.

“Functionalism privileges efficiency and performance; constructivism uncovers context and bias; enactivism asks how systems participate in meaning.”
PhD thesis: Ch. 1 — Epistemological Rumbles in Responsible AI


2 • MaSH Loops (Machine – Society – Human)

Mapping feedback and co-construction across sociotechnical systems.
MaSH Loops is a way of studying how machines, societies, and people continually shape one another. It treats generative AI not as a stand-alone tool or predictor, but as part of a living system where design choices, data, user practices, and institutional norms feed back into one another. These loops make visible where values enter—and how they are enacted in return.

MaSH Loops – Machine, Society, Human in the loop.
Meaning and value arise in the spaces where humans, machines, and societies interact and co-create.

The framework builds on the spirit of cybernetics and constructivism, while extending both through enactivism’s focus on participation. Functionalism reminds us that systems have structure; constructivism shows that structure is socially shaped; MaSH Loops brings them together through interaction, mapping how meaning circulates through the recursive ties of design, deployment, and interpretation.

“MaSH Loops—Machine, Society, Human—trace how models, people, and institutions recursively co-construct meaning and values.”
PhD thesis: Ch. 2 — The Ghost in the Machine Has an American Accent


3 • The Model is Not the Territory

Pedagogies for seeing how models make worlds.
Teaching Responsible AI is about more than technical skill or ethical checklists. It’s about helping people understand that every model simplifies and frames the world in its own way. Whether it’s a neural network or a policy diagram, each model highlights some things and leaves others out—choices that shape what we notice, value, and act on.

Side-by-side map projections highlighting how representation choices shape perception.
All models are simplifications
Like maps, AI models highlight some features and omit others, shaping how we see and understand the world.

My approach uses sociotechnical mapping, a method I developed to visualise the feedback loops between people, data, institutions, and machines. By mapping these relationships, learners can spot where assumptions enter, whose perspectives are missing, and how those choices affect outcomes. The process doubles as a validity check, asking whether our models truly capture what matters or simply mirror existing biases.

Sociotechnical System Framework for AI Evaluation
Adapted from sociotechnical systems theory, this framework illustrates how the evaluation of language models emerges from interactions between technical components and social contexts. Benchmark schemas, prompts, datasets, and metrics are shaped by underlying values and assumptions within broader social systems.

Through real-world case studies, this approach turns complex theory into lived insight. Students learn to see modelling as an interpretive act and to use mapping as both an analytical and ethical practice for designing and evaluating AI systems.

“The map is not the territory—but our maps decide which parts of the territory matter.”
PhD thesis: Ch. 3 — The Model is Not the Market


4 • Descriptive Evaluation Calibrated to Human Data

Developing new methods for measuring what AI enacts
Traditional AI benchmarks test how well models perform against fixed targets such as accuracy, bias, or toxicity scores. These measures can be useful but often reveal more about the designers’ assumptions than about what a system actually reflects.

My research develops a new approach called descriptive evaluation. Instead of scoring AI on preset standards, it compares model responses with human survey data to see which cultural or national value patterns they most resemble. This helps us understand what values AI systems reflect, rather than assuming what they should.

Flow linking World Values Survey data to model outputs to produce value profiles.
The World Values Benchmark – design overview
The WVB links human data from the World Values Survey with AI model outputs to show how systems reflect global value patterns and how evaluation choices shape what AI appears to value.

The method builds on principles from measurement theory. It uses carefully designed prompt sets to reduce noise, balanced anchors to prevent framing bias, and Bayesian corrections to adjust for model preferences. The results produce distributional cultural profiles that are stable, interpretable, and comparable across contexts.

This work comes together in the World Values Benchmark (WVB), which links AI outputs with data from the World Values Survey. The framework shifts evaluation from prescriptive scoring toward descriptive comparison—making model behaviour more transparent, plural, and open to contestation.
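
As a rough illustration of that descriptive comparison step, the sketch below uses hypothetical numbers only (not the actual WVB prompt sets, anchors, or Bayesian corrections). It compares an aggregated model response distribution on a single survey-style item with human distributions drawn from World Values Survey-style data, and ranks how closely the model resembles each:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Hypothetical response shares over a 4-point survey-style answer scale.
# The full WVB pipeline applies designed prompt sets, balanced anchors,
# and Bayesian corrections before this point; only the comparison step
# is sketched here.
model_dist = np.array([0.55, 0.30, 0.10, 0.05])       # aggregated model answers

human_dists = {                                        # illustrative survey data
    "Country A": np.array([0.60, 0.25, 0.10, 0.05]),
    "Country B": np.array([0.20, 0.30, 0.30, 0.20]),
    "Country C": np.array([0.40, 0.35, 0.15, 0.10]),
}

# Jensen-Shannon distance (base 2): 0 = identical distributions, 1 = maximally different.
profile = {
    country: float(jensenshannon(model_dist, survey, base=2))
    for country, survey in human_dists.items()
}

for country, distance in sorted(profile.items(), key=lambda kv: kv[1]):
    print(f"{country}: JS distance = {distance:.3f}")
```

The smallest distance marks the human value distribution the model’s outputs most resemble on that item, giving a descriptive, comparative reading rather than a pass/fail score.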

“Evaluation should be descriptive, pluralist, and enactivist—it should reveal assumptions rather than conceal them.”
PhD thesis: Ch. 4 — The World Values Benchmark


5 • Participatory Realism and Quantum Measurement

Understanding how observation and evaluation co-create meaning.
This area extends my work from systems and methods to the question of observation itself. Participatory realism builds on enactivism’s insight that knowing happens through interaction and adds a further idea: measurement is participatory. In both quantum physics and generative AI, observation does not simply reveal a pre-existing state—it helps bring one into being.

Observation and prompting as participatory acts.
Just as observing a photon changes its pattern in the quantum experiment, prompting an AI helps shape the patterns it produces. In both cases, outcomes are not simply discovered—they are brought into being through interaction.

Generative models can be thought of as vast fields of potential meaning. A prompt acts like a measurement, turning possibility into a specific result. Each output reflects not only the model’s design and data but also the human questions, cultural assumptions, and interpretive context that shape the exchange.

Seen this way, evaluation becomes a kind of measurement: a meeting point between human intention and machine probability. Just as physics shows that the observer cannot stand outside the system, Responsible AI must recognise that our evaluations help shape what AI becomes.

Two paths—hidden-variables vs participatory outcome—illustrating measurement.
No Hidden Variables in Prompting
Adapted from Bell’s theorem, this diagram contrasts two views of meaning in generative AI. The top path assumes fixed values that can be retrieved. The lower path reflects the enactivist view: meaning arises only through interaction, as each prompt collapses a field of possibilities into a single outcome.

Just as quantum measurement resolves potential into actuality, evaluation in generative AI selects from a range of possible meanings. There are no hidden variables determining an outcome in advance; each prompt is an experiment that helps define the system it probes. In this sense, responsible evaluation is less about revealing what a model is than about observing what emerges when human intention and machine probability meet.

“Evaluation, like observation in quantum mechanics, is a participatory act that helps bring outcomes into being.”
PhD thesis: Ch. 5 — Semantic Auroras

6 • Evaluation as Governance

Designing measures that shape accountability

The Coda of my thesis argues that evaluation is not a peripheral task but a central mechanism through which AI systems are shaped and governed. Every benchmark, dataset, and metric carries normative assumptions that influence what becomes visible, comparable, and optimisable. Evaluation, therefore, functions as an instrument of governance; it determines how capability, alignment, and responsibility are defined in practice.

Four-order governance loop connecting evaluation, actors, and reflexivity.
The Cybernetics of Participatory Realism in AI sociotechnical systems
This diagram shows how evaluation operates as a governance system. Each stage—from designing benchmarks to reflecting on whose values are measured—forms a feedback loop linking machines, societies, and humans. The four orders of cybernetics capture escalating layers of accountability: behaviour, thinking, shared perception, and self-observation.

My ongoing research extends this insight toward the design of evaluative infrastructures: frameworks that integrate descriptive, pluralist, and enactivist approaches into policy and institutional processes. By treating measurement as part of governance design, we can make explicit whose values are being reinforced, where accountability resides, and how evaluation criteria evolve alongside the systems they assess.

“What we choose to measure determines what AI becomes in practice.”
PhD thesis: Coda — Measuring What We Enact

“Evaluation is not a side activity—it is how we come to know ourselves in relation to the machines we make.”
PhD thesis: Coda — Measuring What We Enact


Collaboration across disciplines

The ethics of generative AI cannot be built in isolation. Progress depends on dialogue between computer scientists, philosophers, social scientists, and practitioners who study how technology shapes society. Respecting the depth of existing philosophical and humanities work, and combining it with technical insight, gives us the best chance of steering these systems responsibly.

Building ethical AI is not just a technical challenge but a collective practice of reflection, design, and responsibility. Collaboration across disciplines ensures that the systems we build remain accountable to the diverse human contexts in which they operate.

Continue to:

Blog
Media
Consulting
Publications
Researcher
1-Page CV
Thesis highlights


Connect