These are public-facing areas, where the users are citizens. The report covers fewer use cases in areas where the users are civil servants: public financial management (PFM) and civil service reform have 14 and 11 respectively, and policy evaluation has the smallest number (only 5).
This distinction reflects thinking around scale as a determinant of impact. If the number of users is seen as a proxy for value generated, it is likely that citizen-facing services will be prioritised. While this can be a reasonable heuristic, it may miss other drivers of impact: reducing bottlenecks in public finance, procurement or audit processes, for example, could have a cascade of positive outcomes for public service delivery, even though these are civil servant-centric.
2. ML, GenAI and Agentic AI
The report notes that rules-based or decision-support engines are still the preferred AI-powered tools in government. Such rules-based applications lend themselves more easily to machine learning (ML) than to generative AI (GenAI); unsurprisingly, then, many use cases involve ML-based tools that streamline and automate processes or detect anomalous patterns in data.
This is not to say that GenAI tools are not in use. As Peixoto (2025) notes, they tend to be deployed in narrowly defined use cases. Large language models (LLMs) power multilingual chatbots, widening access to services for speakers of different languages. Retrieval-augmented generation (RAG), a technique that retrieves relevant passages from a specified collection of documents and supplies them to the language model as context, powers tools that generate summaries and drafts from the voluminous body of laws, regulations, administrative notifications, judgments and reports.
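To make the RAG idea more concrete, here is a minimal sketch of the retrieval step only. It assumes a tiny, hypothetical in-memory corpus and uses a naive word-overlap score in place of the vector search a production system would typically use; the document ids, texts and function names are illustrative inventions, and the call to the language model itself is left out.

```python
import re
from collections import Counter

# Hypothetical mini-corpus standing in for a database of regulations and notices.
DOCUMENTS = {
    "reg-001": "Procurement above the threshold requires an open tender.",
    "reg-002": "Travel expenses must be claimed within thirty days.",
    "note-003": "Open tender notices are published on the procurement portal.",
}

def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; a stand-in for a real tokenizer or embedding model."""
    return re.findall(r"[a-z]+", text.lower())

def score(query: str, doc: str) -> int:
    """Count words shared by query and document (a naive relevance score)."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum((q & d).values())

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the ids of the k highest-scoring documents."""
    ranked = sorted(DOCUMENTS, key=lambda doc_id: score(query, DOCUMENTS[doc_id]), reverse=True)
    return ranked[:k]

def build_prompt(query: str, k: int = 2) -> str:
    """Assemble the grounded prompt that would be sent to a language model."""
    context = "\n".join(f"[{doc_id}] {DOCUMENTS[doc_id]}" for doc_id in retrieve(query, k))
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"{context}\n\nQuestion: {query}"
    )

print(build_prompt("When is an open tender required for procurement?"))
```

The point of the design is that the model is asked to answer from the retrieved passages rather than from whatever it absorbed in training, which is also what lets a civil servant check the cited sources when reviewing the output.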
In all cases, though, a human user (which in government means a civil servant) is expected to review, verify and perhaps build on this content. While the private sector is increasingly interested in AI Agents – AI systems that can pursue goals and perform complex tasks autonomously, using other software tools – there is limited precedent for the use of autonomous software in government (and the most prominent example is more cautionary tale than success story).
There are at least two reasons for caution. First, the public sector typically has to demonstrate not just certain outcomes – which Agentic AI might be able to improve – but also compliance with procedure and process. Such step-by-step accountability is possible with ML, but remains elusive for GenAI (barring some use cases in forecasting, discussed later in this blog). With AI Agents, one would have to start by developing techniques and benchmarks to assess explainability, and the fact that AI models learn (and hence modify their methods) means model evaluation has to evolve rapidly as well.
Second, and more importantly from a public finance perspective, if a non-AI tool can deliver similar value, why should a government invest (probably much larger) amounts in the AI version?
3. Non-adoption comes with risks as well
Governments should invest, the report says, because those who are not proactive in understanding and working with AI may find themselves passive takers of technology, rather than its shapers. These are the risks of non-adoption, and a risk-informed approach to bringing AI into government will take these into account as well.
The warning is timely. Governments are increasingly concerned with ‘digital sovereignty’ and ‘data sovereignty’; however, those ideas can be used to justify a wide swathe of policies, ranging from very limited engagement to insisting on domestic ownership (or even in-house development) of any technology used by government. There is often no obviously correct choice in these cases, as the interaction between the technologies, providers and business models chosen shapes sovereignty or dependency outcomes.
To retain digital sovereignty – to remain ‘shapers’ and not ‘takers’ – governments will have to build the knowledge and expertise to assess how AI can be used, as well as how it should be regulated. There are few ways to do this other than learning-by-doing. The report can thus be read as an argument for ‘strategic tinkering’, where governments are prepared to experiment with AI, while maintaining high thresholds for taking those experiments to scale.
It is an open question whether countries outside the OECD, with more limited financial and human resources, can take this path as well. The World Bank’s latest Digital Progress and Trends report, for instance, explores the idea of ‘small AI’ suitable for low-income and middle-income countries.
4. Governments are tinkering. Whether it’s strategic remains to be seen.
It’s not yet clear if OECD governments have a broader roadmap either. The report finds “…a high presence of early-stage initiatives, such as experiments and pilots”, which is consistent with learning-by-doing; however, scaling up remains rare and somewhat challenging.
This is not unique to government. Deloitte’s 2024 “State of AI in the Enterprise” survey, cited in the report, found that most organisations (across the public and private sectors) are pursuing fewer than 20 AI experiments, and 70% said they did not expect to scale a majority of these experimental tools within three to six months.
In the public sector, factors contributing to this ‘scaling gap’ include difficulty accessing or sharing high-quality data, difficulty in demonstrating return on investment, risk aversion and skills gaps. No surprise, then, that teams within government are experimenting where they can, for instance where data or risk issues are manageable, or where cost-effectiveness can be relatively easily shown.
What seems to be missing is a way to pull together the lessons from these experiments, whether to build institutional memory and capacity, or to draw out broader trends and evidence. The report quotes the UK Parliament’s Public Accounts Committee, which found ‘no systematic mechanism for bringing together learning from pilots, and few successful examples of at-scale adoption across government’. This is what an AI strategy for the public sector could do; per the report, however, institutional mechanisms for working with AI remain largely absent, with a lack of actionable frameworks and guidance.
5. AI in PFM has some unique advantages when it comes to explainability
In the area of PFM, the report finds two common archetypes: ‘AI as assistant’ and ‘AI as advisor’. In both cases, the emphasis is on making large volumes of data more usable by translating them into insights relevant to specific policy or enforcement decisions.
AI assistants are being used to generate categories from unstructured or semi-structured data, for instance to classify expenditures according to the Classification of the Functions of Government (COFOG). AI advisors are being used to forecast (and ‘nowcast’) key macroeconomic variables. Because projections of this kind can be adjusted and re-run to explore the impact of specific variables, their results are more interpretable (‘explainable’) than is typical of AI outputs.
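Why re-runnable projections are easier to interrogate can be shown with a toy example. This sketch assumes a hand-specified linear nowcast of GDP growth from two hypothetical indicators; the coefficients and indicator names are invented for illustration and are not taken from the report. Real ML nowcasting models are usually more complex, but the same logic applies: change one input, re-run, and see how much the projection moves.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Indicators:
    """Hypothetical monthly indicators feeding the nowcast (percent changes)."""
    retail_sales: float
    industrial_output: float

# Invented coefficients; a real model would estimate these from historical data.
INTERCEPT = 0.2
COEFFS = {"retail_sales": 0.5, "industrial_output": 0.3}

def nowcast(x: Indicators) -> float:
    """Linear nowcast of GDP growth: intercept plus weighted indicators."""
    return INTERCEPT + sum(coef * getattr(x, name) for name, coef in COEFFS.items())

def contributions(x: Indicators) -> dict[str, float]:
    """Per-indicator contributions, so each driver of the nowcast is visible."""
    return {name: coef * getattr(x, name) for name, coef in COEFFS.items()}

baseline = Indicators(retail_sales=1.0, industrial_output=2.0)
# What-if scenario: re-run with industrial output stalling, all else unchanged.
scenario = replace(baseline, industrial_output=0.0)

print(round(nowcast(baseline), 2), contributions(baseline))
print(round(nowcast(scenario), 2), "change:", round(nowcast(scenario) - nowcast(baseline), 2))
```

Because each scenario differs from the baseline in one stated input, the resulting change can be attributed to that variable and explained to a reviewer, which is the kind of step-by-step accountability PFM processes tend to require.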
Those gains in explainability and evaluability are important. PFM processes often require compliance with specific rules and procedures, so the emphasis has to be on areas where AI-powered tools can add value without requiring a parallel reworking of everything the AI does to meet those procedural requirements.
One such area could be the use of AI by Supreme Audit Institutions (SAIs). The report highlights audit as an area where AI’s anomaly detection capabilities could be usefully deployed, and there could even be a role for GenAI in drafting standardised audit reports, especially as SAIs are increasingly asked to do more with limited resources (Global SAI Stocktaking Report 2023, pp. 22–25).
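As a rough illustration of what anomaly detection means in an audit setting, here is a deliberately simple sketch. It assumes a hypothetical list of payments to a single supplier and uses a robust z-score rule (median and median absolute deviation) rather than a trained ML model; the payment ids and amounts are invented, and 3.5 is simply a commonly used cut-off for this statistic. Flagged items would go to an auditor for review, not be treated as findings.

```python
from statistics import median

def robust_z_scores(amounts: list[float]) -> list[float]:
    """Robust z-scores based on the median and median absolute deviation (MAD)."""
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        # If most amounts are identical the MAD is zero; this sketch then flags nothing.
        return [0.0 for _ in amounts]
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores under normality.
    return [0.6745 * (a - med) / mad for a in amounts]

def flag_anomalies(payments: list[tuple[str, float]], threshold: float = 3.5) -> list[str]:
    """Return ids of payments whose robust z-score exceeds the threshold in absolute value."""
    scores = robust_z_scores([amount for _, amount in payments])
    return [pid for (pid, _), z in zip(payments, scores) if abs(z) > threshold]

# Hypothetical payments to one supplier: (payment id, amount).
payments = [
    ("P1", 1200.0), ("P2", 1150.0), ("P3", 1300.0),
    ("P4", 1250.0), ("P5", 9800.0), ("P6", 1180.0),
]
print(flag_anomalies(payments))  # -> ['P5']
```

The median-based rule is chosen here because a single very large payment would inflate an ordinary mean and standard deviation and could mask itself; the output is also easy to explain, since each flag traces back to one number and one threshold.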
Conclusion: from use cases to products and outcomes
Governing with AI provides a useful snapshot of how OECD governments are currently using AI. In line with previous observations from the OECD, it shows that AI adoption remains gradual and incremental, ‘evolutionary, not revolutionary’.
This is also in line with thinking about ‘AI as normal technology’. In this view, the large-scale adoption of any technology occurs only when there is a product that makes that technology useful and affordable for the average person. As more such products are developed, the technology can become ubiquitous, but this can take decades. For instance, the internet (the general-purpose technology directly preceding AI) was invented in the early 1970s, with the first web browser created in the 1990s; Wikipedia, consistently among the ten most-visited websites, was created in 2001, and Instagram in 2010.
In other words, AI adoption will pick up when there are AI-powered products that users in government can depend on without having to be particularly proficient in AI technologies per se. Even if today’s chatbots offer an easy-to-use interface, they will not be reliable enough for government use cases until issues of explainability and process adherence are resolved. The preponderance of pilots and experiments is thus likely to persist for some time.
This is not to detract from the report’s warning that governments need to be more proactive in engaging with AI. This is crucial for building up the skilled workforce that can evaluate potential applications of AI, and when and how these can be scaled up in government. This is particularly important for Ministries of Finance, whose traditional challenge role will increasingly mean interrogating the costs and benefits of government departments’ AI plans.

