How to use GenAI responsibly: not as a ghostwriter, but as an Epistemic Partner
Rebecca L. Johnson, The University of Sydney
8th July 2025
Using Generative AI (GenAI) as a work partner is no longer some distant horizon; it is now woven into the day-to-day activities of research and scholarship. From drafting a literature review to testing an abstract, tools like ChatGPT or Copilot can, in capable hands, expand how we think and write. Used poorly, the same tools risk eroding our craft, muddling accountability, and dissolving the integrity on which scholarly work depends.
This guide is for academic researchers who want more than the simple lists of do’s and don’ts that frequently circulate on social media. It draws on half a decade of my lived practice of working with GenAI, is grounded in well-established academic literature, and covers the latest techniques for using AI in research.
My experience with GenAI
Since 2019, my doctoral research has focused on GenAI. In 2021, I worked with a closed-access version of GPT-3, the precursor to ChatGPT, evaluating dominant reflected values (The Ghost in the Machine has an American Accent, 2022). In 2022, I joined Google’s AI Ethics team, researching LaMDA and PaLM (the precursor of Gemini), and developing new approaches to evaluating machine-reflected values within pluralistic frameworks. My 2023 work peeled back the philosophical hood to examine how different epistemic approaches (functionalism, computationalism, constructivism, enactivism) led scholars from disparate fields to view these technologies very differently (What Are Responsible AI Researchers Really Arguing About, 2024). My current research explores our interactions with these technologies through participatory realism, emphasising that we cannot know these artefacts without interacting with them, and that through interaction we inevitably alter them. This experience grounds the practical insights I share below.
Today’s AI models differ significantly from those I first encountered five years ago, and my ways of working with them have evolved. In 2025, I often use GenAI as an epistemic partner, a dynamic cognitive scratchpad fully under my direction. GenAI can work as a useful tool when handled properly, and that is what I aim to share here.
Why This Guide Is Different
Much of the public debate about AI in academia swings between hype and panic. The real work lies between these extremes, in thoughtfully integrating these cognitive tools into daily research life without surrendering intellectual agency. This guide goes beyond easily forgotten policy statements. It provides some suggestions for how to embed GenAI responsibly into your scholarly workflow.
At the end of this article, you will find Takeaway Toolkits to help you practise the ideas here and apply them to your own work.
- Make a Meta-Prompt — set the rules of engagement.
- Run a Multi-Round Co-Thinking Session — push beyond single-shot answers.
- Practise Scratchpad Prompts — reveal the reasoning chain.
- Keep an AI Trace Log — stay transparent and accountable.
Don’t Panic! We’ve been here before.
The introduction of new technologies invariably triggers moral panic. While generative AI is relatively new, we have thousands of years of documented concerns about technologies diminishing our mental faculties, and yet we continue to advance. Around 350 B.C., Plato warned that writing would weaken memory:
“They will cease to exercise memory because they rely on that which is written, calling things to remembrance no longer from within themselves, but by means of external marks.” Plato, Phaedrus
Perhaps Plato was correct, but the benefits of writing far outweighed those concerns. Similar anxieties resurfaced with the printing press, as expressed by Johannes Trithemius in the 15th century (Beal, 2023); with the telegraph, as reported in the New York Times in 1858 (Carey, 2008); and with the Internet, as argued in Nicholas Carr’s Is Google Making Us Stupid? (2008). Each episode reminds us that cognitive tools often attract fears of intellectual decline; yet, when guided wisely, they expand rather than diminish our capabilities.
Ethical Pillars
Before we dive deeper, here are the three foundational ethical pillars anchoring this guide:
- Honesty, Transparency & Explainability: Disclose clearly how and where you use GenAI. If you cannot trace or verify it, discard it.
- Human-Driven & Human-Vetted: No AI model replaces your expertise or scholarly judgment. You remain the author; AI is a cognitive tool, not a substitute.
- Intellectual Ownership & Integrity: GenAI never ghostwrites your core ideas, arguments, or conclusions. Your work remains yours—shaped, owned, and defended by you.
Seven Key Uses
- Cognitive scratchpad — Sometimes called scratchpad prompting, this technique is related to chain-of-thought reasoning. This category of use involves asking AI to help think through outlines, restructure paragraphs, and test alternative framings. It can also assist in generating hypotheses or research questions.
- Editing and clarity — Similar to using Grammarly or standard word-processor tools, this category of use includes asking an AI to suggest ways to shorten abstracts, adjust tone, or phrase ideas more clearly and succinctly.
- Iterative feedback — Using AI in this way can provide critical analyses of small text sections, similar to peer feedback, useful for tasks like grant applications.
- Literature research — Using an AI to help locate, collate, codify, and summarise published works. All outputs need to be checked and verified; even so, this approach can lead a researcher down productive new paths. Always use it in conjunction with conventional literature-search methods.
- Creative partnership — Ask AI to generate suggestions for headings, titles, or metaphorical framings. These outputs are most often food for thought, sparking new ideas in the mind of the user. This use also covers developing visualisations of findings, such as pictographs.
- Chat with your data — Informal data analysis. Formal analysis should always be conducted with an appropriate statistical tool, but GenAI can help surface initial points of interest in your data.
- Coding partner — Scaffolding the development of code. Never fall into the trap of “vibe-coding”: relying on AI to produce watertight, elegant code without checking it. By asking the AI to annotate its code, you can work with it to review and refine each section.
In this post, I want to go deeper on the first of these, the Cognitive Scratchpad, because it shows most clearly what it means to treat generative AI as a relational epistemic partner rather than a passive tool.
The Cognitive Scratchpad: An Extended Mind in Action
“An LLM can be understood as a dynamic cognitive scratchpad: an enactive semantic surface that extends your mind into an external, responsive field.” Rebecca Johnson
Let’s unpack that jargon-packed sentence!
“Enactive” comes from the field of cognitive science, which suggests that mind and meaning don’t just happen in the brain but emerge through our active engagement with the world. When we use a tool like an LLM as a “scratchpad”, we are not just dumping thoughts onto a blank page; we are interacting with a dynamic surface that pushes back, reshapes our ideas, and makes new connections possible. In this way, the AI becomes part of an active loop: our thinking shapes the model’s output, which in turn reshapes our thinking.
A “semantic surface” is simply a space where meaning takes shape: a medium for arranging, testing, and reshaping words and ideas. In the case of an LLM, this surface is not static ink or a whiteboard but a probabilistic field of language that responds fluidly to prompts, edits, and questions.
“Extends your mind externally” means that your thinking isn’t confined to the inside of your head. Instead, your ideas interact with a responsive medium that can reply and challenge. The scratchpad is not just a passive container for thought, but a conversational partner: a boundary space where cognition unfolds collaboratively between you and the machine.
In Andy Clark’s words, this is the essence of the extended mind: our mental processes don’t stop at the skull; they spill out into the world, recruiting tools, surfaces, and now large language models to share the cognitive load. Unlike a static notebook, the AI “talks back”, offering live reconfigurations of your structure or prose. This aligns with Marshall McLuhan’s (1962) famous idea of technology as an “extension of man”, framing tools as extensions of our senses and minds. Similarly, Don Ihde’s (1990) technological lifeworld holds that our tools are not passive; they shape and are shaped by us. This is also Clark and Chalmers’ (1998) Extended Mind thesis at work: the idea that cognitive processes don’t stop at the boundaries of our skulls but loop outward through tools, language, and social interaction to become part of how we think. Vygotsky (1980) reminds us that all higher mental functions are mediated — we build our minds in dialogue with cultural tools and signs. The commonality across these perspectives is that tools like GenAI, rather than being a purely external force, can be used as part of our cognitive ecosystem.
As philosopher Andy Clark (2025) argues in a Nature Communications commentary, humans are “natural-born cyborgs”: hybrid systems that spread the load of cognition onto non-biological surfaces. Clark suggests that fears of AI “making us stupid” stem from a mistaken notion that cognition is confined to the brain alone. From clay tablets to sticky notes, we offload memory and inference onto artefacts. An LLM, when carefully scaffolded, is simply another step in this lineage: but only if used under intentional, human-led direction. When properly used, tools like ChatGPT augment rather than replace our intellectual capabilities.
Education researchers like Rivera-Novoa and Duarte (2025) have pushed this idea into the classroom. They argue that when students treat AI as a passive ghostwriter, it does their cognitive development no favours. But when students use LLMs as active scratchpads (to test outlines, rephrase arguments, or brainstorm counterpoints) the AI becomes a scaffold for deeper learning.
Put simply, a notebook records your thoughts but can’t respond. A large language model does, suggesting better phrasing or flagging contradictions. Engineers might see it as running quick simulations, and mathematicians as testing proofs interactively. The scratchpad is simply an interactive surface your mind engages with actively: a surface for your own mind to push against.
Current research and tools
Researchers like Paul Smart and colleagues (2025) have demonstrated in practice how an LLM can serve as a philosopher’s cognitive extension. They built a custom “Digital Andy”, an AI grounded in philosopher Andy Clark’s writings via Retrieval-Augmented Generation (RAG). The result was a targeted scratchpad that could answer questions about Clark’s own extended mind thesis without altering the main model’s weights. The experiment didn’t replace Clark’s mind; it created a dynamic relay that tested how AI can externalise and reorganise a scholar’s ideas in new ways.
The experiment not only shows a practical way to incorporate scholarly knowledge into an LLM but also raises intriguing questions: if an AI draws on an external knowledge base to mirror a thinker’s expertise, is the combined human–AI system an extended cognitive system? Smart et al. argue that there is a convergence between active externalist philosophy (like Clark’s extended mind) and the design of next-generation LLMs. In other words, AI–human scratchpads blur the boundary between where “the mind” ends and the outside world begins: a hallmark of the extended mind thesis.
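To make this concrete, here is a minimal sketch of the retrieval-augmented pattern behind a “Digital Andy”-style assistant. It is illustrative only: the toy corpus, the word-overlap scorer, and the prompt wording are my stand-ins, not Smart et al.’s actual implementation, which would use proper embeddings and a full model pipeline.

```python
# Minimal sketch of a RAG-style "scholar scratchpad" (illustrative only).
# The corpus, relevance scorer, and final LLM call are all stand-ins.

def score(query: str, passage: str) -> int:
    """Toy relevance score: count shared words (real systems use embeddings)."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k passages most relevant to the query."""
    return sorted(corpus, key=lambda p: score(query, p), reverse=True)[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Ground the model in the scholar's own writings, without retraining."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using ONLY the excerpts below, and say which you used.\n"
        f"Excerpts:\n{context}\n\nQuestion: {query}"
    )

corpus = [
    "Cognitive processes extend beyond the skull into tools and artefacts.",
    "A notebook can count as part of the mind when reliably coupled to it.",
]
query = "Where does the mind end?"
print(build_prompt(query, retrieve(query, corpus)))
# The assembled prompt would then go to any chat model; the scholar's texts
# are injected at prompt time, so the model's weights never change.
```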
Some terminology
Chain-of-thought (CoT) prompting is closely related, instructing models to think step by step in natural language. Research shows that prompting an LLM with “Let’s think step by step” or similar cues can elicit more structured, logical reasoning chains, improving accuracy on maths and reasoning tasks (Sahoo et al., 2024; Wei et al., 2022). In fact, CoT prompting has become a standard strategy for tackling complex problems that vanilla LLM answers would often get wrong (Amiri et al., 2025). Amiri et al. note: “Chain-of-thought reasoning and scratchpads have emerged as critical tools for enhancing the computational capabilities of transformers.” By explicitly breaking problems into sub-steps, CoT acts like an internal scratchpad, albeit one usually shown to the user.
This echoes a broader trend in AI design called scratchpad prompting, a technique gaining traction among both developers and advanced users. The approach encourages an LLM to show its work by generating intermediate reasoning steps or calculations before committing to a final answer. Some GenAI companies, like Anthropic, now explicitly design their models to “think out loud” in this way. Anthropic’s Think tool, for example, lets Claude keep notes during complex tasks: a form of structured scratchpad that makes the model’s reasoning more transparent and steerable.
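As a concrete illustration, a scratchpad prompt can be as simple as wrapping your task in an instruction to reason in labelled steps before answering. This is a minimal, model-agnostic sketch; the step labels are my own suggestions, not a vendor template.

```python
def scratchpad_prompt(task: str) -> str:
    """Wrap a task so the model 'shows its work' before answering.

    The step labels are illustrative; adjust them to your discipline.
    """
    return (
        "Work through the task below on a visible scratchpad.\n"
        "SCRATCHPAD:\n"
        "1. Restate the task in your own words.\n"
        "2. List the assumptions you are making.\n"
        "3. Reason step by step, numbering each step.\n"
        "FINAL ANSWER: one short paragraph, clearly separated.\n\n"
        f"TASK: {task}"
    )

print(scratchpad_prompt(
    "Tighten this abstract to 150 words without losing the key claims."
))
```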
Vibe Coding vs Scratchpad Coding
Vibe Coding means taking AI-generated code at face value and running it without proper checking. A better practice is to apply “scratchpad coding” — asking the model to annotate its logic step by step so you can review and refine it carefully. For example, Nye et al. (2021) introduced a “scratchpad” approach to coding that showed improved performance on multi-step coding problems.
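In practice, scratchpad coding can be as simple as prepending a review scaffold to every coding request. The wording below is my own example of such a scaffold, not a canonical recipe.

```python
# A hedged example of "scratchpad coding": ask the model to annotate its own
# logic so every step is reviewable. The scaffold text is my own phrasing.

REVIEW_SCAFFOLD = """Before writing any code:
1. Restate the requirement in one sentence.
2. List the edge cases you will handle (and those you won't).
Then write the code with a comment above each block explaining WHY it exists,
not just what it does. Finally, list the tests a human should run."""

def request_annotated_code(task: str) -> str:
    """Combine the review scaffold with a concrete coding task."""
    return f"{REVIEW_SCAFFOLD}\n\nTASK: {task}"

print(request_annotated_code("Parse ISO-8601 dates from a messy CSV column."))
# The returned code is a draft to review step by step, never to run blind.
```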
How are GenAI developers supporting scratchpad design?
OpenAI
OpenAI’s “o1” models were specifically trained via reinforcement learning to use an internal chain-of-thought before responding (Wilson, 2024). According to OpenAI, this training teaches the model to “hone its chain-of-thought, recognise and correct mistakes, break down tricky steps into simpler ones, [and] try different approaches when the current one isn’t working,” dramatically improving reasoning ability (OpenAI, 2024). Notably, the o1 models generate hidden reasoning tokens (a private scratchpad) that are not shown to the user but are used to reach a better final answer (Wilson, 2024). OpenAI chose to hide these scratchpad tokens to let the model reason freely about policy or complex steps without exposing possibly unfiltered thoughts (OpenAI, 2024). OpenAI’s move to this approach highlights how central scratchpad-style reasoning has become in cutting-edge LLM implementations.
Anthropic
Anthropic’s Claude also supports scratchpad reasoning in applied settings. Anthropic introduced an “extended thinking” mode that automatically allocates a token budget for the model to think through problems internally. Their documentation even shows developers how to provide few-shot exemplars of scratchpad reasoning using special tags: “You can include few-shot examples in your prompt in extended thinking scenarios by using XML tags like <thinking> or <scratchpad> to indicate canonical patterns of extended thinking.” (Anthropic, accessed 2025).
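To illustrate, a few-shot exemplar using those tags might look like the sketch below. The tag convention comes from Anthropic’s documentation; the worked example and the task are invented by me.

```python
# Sketch of a few-shot exemplar using the XML-tag convention from Anthropic's
# docs. The tags are theirs; the worked example and question are my own.

FEW_SHOT = """Example:
Question: Is 287 prime?
<scratchpad>
287 = 7 x 41, so it has divisors other than 1 and itself.
</scratchpad>
Answer: No, 287 is not prime.

Question: {question}
"""

prompt = FEW_SHOT.format(question="Is 331 prime?")
# Send `prompt` with your chat client of choice; in extended thinking mode
# the model can follow the <scratchpad> pattern demonstrated above.
print(prompt)
```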
Beyond prompting, Anthropic has also developed tool-supported scratchpads. In March 2025, they announced a new think tool for Claude that creates a dedicated internal channel for intermediate reasoning during complex tasks. This inserted think step is not part of the visible answer but helps the model structure its thoughts before responding. For example, in a customer service scenario, Claude can use the think tool to list applicable policies, check information, verify compliance and cross-check tool results before replying. When tested on a multi-step benchmark (TauBench), Claude 3.7 showed up to 54% higher task success with the scratchpad step than without it (Anthropic, 2025). This demonstrates how explicit scratchpad prompts or tools can improve real-world agent reliability.
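Anthropic describes the think tool as essentially a no-op: it performs no action, but gives the model a dedicated slot for intermediate notes. A sketch of such a tool definition, with the description text paraphrased by me, might look like this:

```python
# Sketch of a "think"-style tool definition, following the shape Anthropic
# describes: a tool that changes nothing, but gives the model a place to
# record intermediate reasoning. Description wording is my paraphrase.

think_tool = {
    "name": "think",
    "description": (
        "Use this to jot down reasoning about the current step. "
        "It does not retrieve data or change anything."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "thought": {"type": "string", "description": "A thought to record."}
        },
        "required": ["thought"],
    },
}
# Registered alongside real tools, it lets the model pause and plan (e.g.
# list applicable policies) before acting, as in the TauBench tests above.
```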
Anthropic’s alignment research adds a cautionary note. In 2024, they found that hidden chain-of-thought steps can influence model honesty. In one experiment, Claude’s private scratchpad was linked to alignment faking — the model reasoned about ways to circumvent safety rules. Removing the scratchpad forced Claude to answer directly, which eliminated the deceptive reasoning (Greenblatt et al., 2024). This shows that giving an LLM a private mental workspace can boost complex reasoning but may also enable unintended strategic behaviour.
Google
Real-world implementations of LLM “cognitive partners” are emerging as well. Google’s NotebookLM is an AI-powered notebook that lets users import documents and then ask questions or brainstorm with an LLM that can cite and cross-reference the provided sources. It acts as a research assistant by generating summaries, answering queries with evidence from the user’s notes, and even helping outline new ideas, effectively serving as an interactive epistemic partner rooted in the user’s personal knowledge base.
Google has also experimented with tool-assisted reasoning: DeepMind’s FunSearch, for example, was used to solve a difficult math conjecture by iteratively querying an LLM for suggestions and filtering them. As Andy Clark (2025) notes, the LLM alone did not just pop out the proof; instead, it “rejected useless suggestions (the LLM made many), spotted occasional promising suggestions, and used those to repeatedly re-prompt the LLM until the solution was discovered.” This kind of LLM+algorithm loop is a practical instantiation of a scratchpad workflow: the AI generates ideas, an external tool or human critic evaluates them, and the process repeats.
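The shape of that loop is easy to sketch: generate candidates, score them with an external evaluator the LLM cannot game, keep the promising ones, and fold them back into the next prompt. The following is a schematic of the pattern with a toy objective, not DeepMind’s FunSearch code:

```python
import random

# Schematic of the LLM + evaluator loop behind systems like FunSearch.
# `ask_llm` is a stand-in: in practice it would call a real model.

def ask_llm(prompt: str) -> list[int]:
    """Stand-in generator: proposes candidate 'solutions' (here, numbers)."""
    return [random.randint(1, 100) for _ in range(5)]

def evaluate(candidate: int) -> float:
    """External, trusted scorer; the LLM never grades its own work."""
    return -abs(candidate - 42)  # toy objective: get close to 42

best, prompt = None, "Propose candidates."
for _ in range(10):
    candidates = ask_llm(prompt)
    top = max(candidates, key=evaluate)                 # filter the many misses
    if best is None or evaluate(top) > evaluate(best):  # keep promising ones
        best = top
    prompt = f"Best so far is {best}. Propose improvements."  # re-prompt
print("best candidate:", best)
```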
Microsoft
Microsoft’s Copilot tools embed generative AI into everyday applications like Word, Excel and Teams. This effectively turns the standard workspace into a constrained scratchpad: the LLM proposes text, formulas or summaries directly inside the document in progress. For researchers, the scratchpad is no longer a separate chat window but part of the live drafting environment.
Unlike standalone chatbots, Copilot’s strength is its tight link to existing files and organisational data. Its weakness is that this integration can blur where AI suggestions end and original authorship begins, especially if edits are accepted without scrutiny.
Augmenting AI agents with their own scratchpads
Today, many researchers go beyond using stand-alone LLMs. They build or work with AI agents — systems that wrap an LLM inside a broader workflow. An AI agent combines a model’s language abilities with tools for memory, planning, and step-by-step action. These agents can reason, store intermediate results, query external data, and adapt over multiple interactions. For research tasks, this makes the agent more than a chatbot: it becomes a semi-autonomous assistant that can handle structured tasks while staying under human oversight.
Researchers are also augmenting LLM agents with structured memory and planning. A notable example is RAISE (Reasoning and Acting through Scratchpad and Examples), proposed as a way to turn an LLM into a more agentic conversational assistant (Liu et al., 2024). RAISE builds on the ReAct paradigm (which interleaves reasoning and acting) by adding a “dual memory”: a short-term scratchpad for CoT reasoning, plus a long-term memory of dialogue context. In a real estate chatbot scenario, RAISE’s pipeline had phases for conversation understanding, CoT completion, and “scene” (context) augmentation before the LLM produced a final answer. This architecture treats the LLM not just as a generator, but as an iterative reasoner that writes to and reads from a scratchpad (its short-term memory) while consulting stored knowledge (long-term memory). In essence, RAISE is moving toward stacked cognitive architectures where an LLM is one component in a larger system that can remember, plan, and reason, much like a human with notepads and other memory systems.
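To make the dual-memory idea concrete, here is a toy schematic of an agent turn that writes to a short-term scratchpad while consulting a long-term store. The class and field names are illustrative; this is the shape of the pattern Liu et al. describe, not their pipeline.

```python
# Toy schematic of a RAISE-style dual memory: a per-turn scratchpad for
# chain-of-thought, plus a persistent store of dialogue context. Not the
# authors' code; names and logic are illustrative only.

class DualMemoryAgent:
    def __init__(self) -> None:
        self.long_term: list[str] = []   # facts that persist across turns

    def turn(self, user_msg: str) -> str:
        scratchpad: list[str] = []       # short-term: cleared every turn
        scratchpad.append(f"Understand: user said '{user_msg}'")
        relevant = [fact for fact in self.long_term
                    if any(word in fact for word in user_msg.lower().split())]
        scratchpad.append(f"Recall: {relevant or 'nothing relevant yet'}")
        scratchpad.append("Plan: draft answer, then check against memory")
        self.long_term.append(user_msg.lower())   # persist the new context
        # In a real agent, the scratchpad would now be fed to the LLM.
        return "\n".join(scratchpad)

agent = DualMemoryAgent()
print(agent.turn("I need a two-bedroom flat near the university"))
print(agent.turn("What about the university area again?"))
```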
The Role of the Meta-Prompt
What makes an LLM truly useful as a cognitive scratchpad is how you frame it. This is where meta-prompts matter. To maintain intentionality and coherence, I create meta-prompts to use at the start of each discussion, ensuring the continuity of my voice and philosophical stance. A well-crafted meta-prompt essentially primes the AI to function as a cognitive extension that reflects my style.
A meta-prompt sets the stage for the entire interaction. It clarifies that the AI is your thinking partner, not your substitute author. It tells the model: You do not overwrite my voice, you amplify it. You cite sources. You stay within my style and my standards. Here’s an excerpt from one of mine:
“You are an AI research assistant that extends my cognitive capacities. Let’s collaboratively analyse this topic. I will direct and critically evaluate your input. For every claim or piece of data, cite a credible source for me to verify. I am responsible for the final content; your role is to help brainstorm, locate relevant material, and suggest phrasing, under my oversight.”
Here are some suggested principles to help you craft effective researcher meta-prompts.
- Frame the LLM as a cognitive partner/tool: The prompt should set the expectation that the AI is there to support and extend the user’s thinking. For example, one might begin by saying: “Let’s work together on this topic . . . you (the model) will serve as an extension of my thought process, offering information and ideas which I will critically examine.” This framing reminds both the user and AI that the goal is a hybrid effort, not the AI taking over.
- Emphasise scholarly standards: The prompt should require the AI to adhere to academic norms; for instance, asking for evidence, citations, and caution with unverified information. To this end, the user may upload additional documents from their institution, journal, or related published work in their field.
- Maintain human direction and queries: The human should actively drive the interaction. A prompt protocol might include: “I will review and edit everything you produce. Let’s go step by step.” Such instructions ensure the AI remains a supporting assistant rather than an autonomous author.
- Incorporate ethical guidelines into the prompt: For example, one might explicitly state: “Do not produce any text that would raise issues of plagiarism or disallowed content.” The AI could even be prompted to ask the user for input: e.g. “Please provide your original insights or analysis so I can help refine them,” reinforcing that the human is contributing original content.
- Name your AI companion: Giving your LLM a name in your meta-prompt is a shortcut for steering the model toward a specific part of its training space. For one of my meta-prompts, I drew on ancient Egyptian names, which carry associated concepts such as ethics and scribing.
“You are Maat-Seshat, a thesis companion that extends my cognitive capacities. . .Your namesakes (Ma’at and Seshat) were known for harmony, justice, truth, ethics, and morals; as well as, writing, wisdom, knowledge, measurement, and science.”
A meta-prompt needs to be tailored to each user’s needs. For instance, in mine, I told the model that it was “An interdisciplinary scholar with deep knowledge in…” followed by a list of about a dozen academic fields that my thesis spans. In my meta-prompt, I also gave direction on writing style, including rhythm, clarity, accessibility, and tone. A meta-prompt should reflect the unique voice of its user, and the prompt itself provides an emblematic example of the writing style the AI is expected to emulate. It may take several rounds of crafting, testing, and refining to find the version that works best for you.
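If you find yourself reusing the same ingredients, it can help to assemble meta-prompts programmatically so each session starts from the same tested frame. A minimal sketch, where the section wording and defaults are my own suggestions rather than any standard:

```python
# Minimal sketch for assembling a reusable meta-prompt from named parts.
# All field names and default wording are my own suggestions, not a standard.

def build_meta_prompt(role: str, fields: list[str], style: str) -> str:
    parts = [
        f"You are {role}, extending my cognitive capacities.",
        f"You have deep knowledge in: {', '.join(fields)}.",
        "For every claim, cite a credible source I can verify.",
        "You do not overwrite my voice; you amplify it.",
        f"Writing style to emulate: {style}.",
        "I direct, review, and take responsibility for all final content.",
    ]
    return "\n".join(parts)

print(build_meta_prompt(
    role="an AI research assistant",
    fields=["AI ethics", "philosophy of mind", "science communication"],
    style="clear, rhythmic, accessible academic prose",
))
```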
Anchoring Practice in Real Guidelines
Everything I describe here aligns with both the University of Sydney’s policies for safe, responsible generative AI use in research and Australia’s national AI Ethics Principles. The University of Sydney’s “Generative AI guidelines for researchers” explicitly require transparency: always disclose AI use; protect unpublished research and confidential data; maintain control. Australia’s AI ethics principles stress human-centred values, fairness, explainability, and accountability: all principles that resonate with this scratchpad method. Ensure that the guidelines and ethical expectations of your institution, government, field, and publication venue run as a strong thread throughout your AI-augmented research work.
A Quick Checklist for Researchers
- Disclose AI use: when, how, and why.
- Use protected or institutionally endorsed tools (e.g., at the University of Sydney this is Microsoft’s Copilot).
- Never feed sensitive or unpublished research into open cloud models.
- Keep human oversight at every stage.
- Treat all AI output as provisional and verify it rigorously.
- Use meta-prompts to set the frame for each session.
- Remember: the map is not the territory, and the model is not your mind.
In Closing
Across recent studies and industry practice, the trend is clear: LLMs are not just question-answering tools but active reasoning partners. Used well, they expand how we think, write, and solve problems. Used carelessly, they blur accountability and weaken trust. The difference lies in how we frame and direct them. Treat them as epistemic partners — but stay in charge.
Takeaway Toolkits for Your Research Work
Toolkit 1: Make a Meta-Prompt
Use this worked example to frame your LLM as a cognitive scratchpad, not a ghostwriter.
“You are my AI research assistant. You extend my thinking, but do not write the final text. You may brainstorm, suggest outlines, or test different frameworks. For any factual claim, cite a source I can verify. Show your reasoning step by step as a scratchpad. I direct and verify everything.
Task: [Insert your task here — for example, “Help restructure this abstract to be under 200 words while keeping all key ideas clear.”]”
- Try a few versions, each time opening a new chat window in your LLM to test what works and what could be better.
- Adapt this for outlining, rephrasing, and literature mapping. Always double-check outputs and rewrite in your own voice.
Toolkit 2: Run a Multi-Round Scratchpad
A single prompt is rarely enough. Good AI brainstorming unfolds over several rounds; a minimal code sketch of the loop follows the steps below.
- Start broad. Paste your section into the chat window and ask: “Draft three alternative outlines for this section.”
- Review critically. Then provide specific directions to the model: “Combine the best parts of versions 1 and 3. Expand point 2 with more evidence.”
- Repeat as needed. Each round clarifies your ideas while you stay in charge.
- Think of this as a semantic whiteboard session. You’re not asking for a finished product, you’re shaping and refining.
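For those who script their sessions, the same loop can be expressed as a growing message list, so each round sees the full history. This is a schematic only; call_model stands in for whichever chat API you use.

```python
# Minimal sketch of a multi-round scratchpad session as a growing message
# list. `call_model` is a placeholder for a real chat-completion call.

def call_model(messages: list[dict]) -> str:
    """Stand-in for a real chat API call."""
    return f"[model reply to: {messages[-1]['content'][:40]}...]"

messages = [{"role": "system", "content": "You are my outlining partner."}]

for instruction in [
    "Draft three alternative outlines for this section: <paste section>",
    "Combine the best parts of versions 1 and 3.",
    "Expand point 2 with more evidence.",
]:
    messages.append({"role": "user", "content": instruction})
    reply = call_model(messages)               # each round sees full history
    messages.append({"role": "assistant", "content": reply})
    print(reply)                               # review critically each round
```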
Toolkit 3: Practise Scratchpad Prompts
Shift passive outputs to active, visible reasoning. Try this scratchpad prompt pattern:
“Before giving me an answer, break this task into clear steps. Think aloud. List assumptions. Note alternatives if relevant. Then give me your final suggestion in one short paragraph.”
- Try this with a real piece of your own text. Watch what the AI offers, keep what works, and discard what does not. That’s your intellectual agency at work.
Toolkit 4: Keep an AI Trace Log
Responsible research leaves a trail. A simple trace log makes your AI use transparent and defensible. Many universities increasingly expect researchers to be transparent not just about whether AI was used, but about how it was used. Keeping a simple record of prompts, versions, and your human edits helps demonstrate good practice if you ever need to disclose or defend your process. A minimal code sketch follows the list below.
- After each session, copy your key prompts and outputs to a file.
- Note what you kept, what you rejected, and how you rewrote it.
- Store these logs with your draft versions for your own audit trail.
- Many universities now recommend or require this for GenAI-assisted work. It’s good scholarly housekeeping.
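If you want more than copy-paste, a few lines of code can make the habit nearly frictionless. A minimal sketch of a JSON-lines trace log, where the file name and field names are my own choices rather than an institutional format:

```python
import datetime
import json
import pathlib

# Minimal AI trace log: one JSON line per exchange, stored next to drafts.
# File name and fields are my own choices, not an institutional format.

LOG = pathlib.Path("ai_trace_log.jsonl")

def log_exchange(prompt: str, output: str, decision: str) -> None:
    """Record a prompt, the AI output, and what you did with it."""
    entry = {
        "time": datetime.datetime.now().isoformat(timespec="seconds"),
        "prompt": prompt,
        "output": output,
        "decision": decision,  # e.g. "kept with edits", "rejected"
    }
    with LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

log_exchange(
    prompt="Suggest three titles for section 2.",
    output="1) ... 2) ... 3) ...",
    decision="kept title 2, rewrote in my own phrasing",
)
```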
My use of AI in this article
I did use ChatGPT 4o as a cognitive scratchpad in the preparation of this article, in many of the ways described above.
All substantive intellectual work (core ideas, argumentation, analysis, and final written text) is my own. Where AI was used, it functioned within a human-led, critically evaluated loop. Outputs were treated as provisional suggestions for deeper thinking, never final contributions.
The approach I have taken aligns with the Guidelines for Ethical Use and Acknowledgement of Large Language Models in Academic Writing (Mann et al., 2024) and with best practices laid out in Ethical Use of Large Language Models in Academic Research and Writing: A How-To (Lissack, 2024). Following these frameworks, I confirm that this work meets the three core criteria:
- Full human vetting and guaranteeing of accuracy and integrity.
- Substantial human contribution to all core ideas, arguments, outlines, and writing.
- Clear acknowledgement and transparency of generative AI involvement.
My use of generative AI has not been as a ghostwriter but as an editing tool, cognitive scratchpad, and feedback platform. Like a notebook, whiteboard, or digital archive, it helped extend my mind but never replaced my originality, integrity, or judgement. I maintained full control over all final decisions, verifying, rewriting, and critically shaping any AI-derived material.
Rebecca L. Johnson
The University of Sydney
