Team Insight

Trends and Future Outlook for Generative AI: Access, Inclusion, and Advancing the SDGs

May 13, 2025

Overview

Advances in AI have been moving at a breathtaking pace. While we believe there is value in encouraging the industry to be even more open, we also feel lucky that many models are being released with open weights, allowing developers to experiment with customizing them and applying them to ever more diverse use cases. This trends and outlook analysis focuses on generative AI, since that’s where much of the excitement has been in the last few years. However, we want to underscore that AI is much broader than these models alone.

On the frontier technology team, we work with AI in three ways. First, we partner with country offices – largely through our Venture Fund – to test and deploy AI applications for UNICEF programming. Second, we build our own prototypes to explore technologies that we think are promising for UNICEF across multiple country settings. Finally, we support efforts to leverage AI to advance UNICEF programming more broadly – including helping the organization design its AI strategy, and working with colleagues across the organization to articulate how AI can be applied in specific sectors including WASH, learning, health, and emergency response.

What’s been going on

ChatGPT was released in late 2022, and 2023 was characterized by its meteoric rise in popularity as well as a lively discussion of its capabilities and weaknesses. In 2024, we saw growing competition and sophistication among the leading generative AI models, but also advances that made it easier to put generative AI models into practice and extract value for concrete use cases. For example:

We got better at leveraging large language models (LLMs) out of the box.

To deploy generative AI models in specific contexts, the initial wisdom was that it would be necessary to fine-tune models for the task at hand. However, 2024 brought many promising applications that extract more value from models “out of the box”, thanks in part to better, more instructable models; lower costs relative to performance; and longer context windows, which allow users to supply more instructions and information in their prompts. These benefits were complemented and amplified by the growing sophistication and proliferation of retrieval-augmented generation (RAG) systems, which supply models with customized, question-specific information retrieved from a database to inform their answers.
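To illustrate the retrieval step a RAG system adds, here is a minimal Python sketch. It assumes a tiny in-memory list of documents and uses simple bag-of-words cosine similarity in place of the embedding models and vector databases that production systems typically use. The documents and the function names (`retrieve`, `build_prompt`) are hypothetical, and the final call to an LLM is deliberately left out.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base; a real RAG system would store many documents,
# usually split into chunks, in a database or vector index.
DOCUMENTS = [
    "Handwashing with soap at critical times reduces the spread of diarrhoeal disease.",
    "Chlorination is a low-cost way to treat household drinking water.",
    "Latrines should be sited away from water sources to avoid contamination.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True
    )
    return ranked[:k]


def build_prompt(question: str, documents: list[str], k: int = 2) -> str:
    """Combine the retrieved passages and the user's question into one prompt."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    # In a real system, this prompt would be sent to an LLM of your choice.
    print(build_prompt("How can households treat drinking water?", DOCUMENTS))
```

The model itself is never modified. Only the prompt changes from question to question, which is why this approach avoids the cost and expertise that fine-tuning requires.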

This is great news for UNICEF, as these strategies can make building high-quality generative AI systems more accessible by circumventing the expense and expertise needed to fine-tune. We’ve seen RAG systems powering tailored applications like Baobab Tech’s WASH AI, a chatbot targeted at water and sanitation professionals that draws on a knowledge base of sector-specific documents, and UNIBOT, UNICEF’s own internal AI assistant.

We saw increasing promise from smaller models.

Large language model releases increasingly come in different variants, including large versions with many parameters (these perform best but have higher computing requirements) and smaller versions that are more computationally efficient. In 2024, advances in the state of the art meant that small models got quite good – for example, Meta released an 8-billion-parameter instruction-tuned version of its Llama 3 model that outperformed its 70-billion-parameter Llama 2 counterpart.

This has significant implications for UNICEF, since it means that using these models is getting cheaper and less environmentally costly. It also means that it is increasingly possible to run large language models on premises – whether on partners’ infrastructure or even on personal devices like a laptop or phone. This can help bridge the digital divide by enabling access for people without a stable internet connection; it also unlocks possibilities for more secure, privacy-preserving applications that store and use data locally and don’t exchange information with external service providers. We’ve already seen the promise of edge computing through our Venture Fund investee Bookbot, which supports children learning to read with Bahasa Indonesia speech-to-text models that run locally on mobile phones.

Models are becoming (more) multilingual.

While there are over 7,000 languages in the world, their distribution on the internet and in written material is heavily skewed, and this bias is reflected in generative AI training datasets. For example, the training dataset for GPT-2 was filtered to include only English text, and English represented 93% of GPT-3’s training data.

In 2024, we saw increasing attention to this disparity, whether via training models on more diverse datasets or fine-tuning them for specific languages. Cohere released Aya, a multilingual LLM trained on 101 languages. Jacaranda Health, having previously fine-tuned a model for Swahili to support maternal and child health applications, repeated this for several Nigerian and South African languages.

Google Translate used LLMs to expand its support to over 100 new languages. Meanwhile, we tested a freely available machine translation model (which currently supports 200 languages) for humanitarian responders in the Democratic Republic of the Congo, and decided to invest in expanding language support for a child and gender-based violence case reporting system. In 2025, we plan to dig deeper into generative AI applications that support natural chat interactions in non-English languages.

These developments aren’t limited to written text. Multimodal large language models can now natively accept speech inputs, and we are also starting to see systems that aim to make LLMs even more accessible, such as Viamo’s “voice-first” generative AI system, which allows users to call in and ask questions via mobile phone. These developments are happening alongside dramatic advances in text-to-speech and speech-to-text technology; for example, open speech synthesis models are now available for 7,000 languages.

Where we’re headed

There is no sign that things will slow down in 2025. We’re watching a number of emerging trends, including big wins for open-source and reasoning models, the rise of AI agents, and a growing focus on AI evaluation.

Open models at the top of the AI leaderboards.

In late 2024, the Open Source Initiative released a formal definition of an open-source AI system, which requires releasing the model parameters, the model code, and detailed information about the model’s training data, if not the data itself. Then, in early 2025, DeepSeek sent the industry into turmoil by releasing a chatbot that quickly overtook ChatGPT in the US iOS App Store rankings. The chatbot was based on DeepSeek-R1, which (while not fully open source according to the formal definition) has freely available weights. The company also took a strong stand on model openness, with founder Liang Wenfeng saying that “in the face of disruptive technologies, moats created by closed source are temporary...having others follow your innovation gives a great sense of accomplishment. In fact, open source is more of a cultural behavior than a commercial one, and contributing to it earns us respect.” By offering an open-weights model that could compete with OpenAI’s offerings, DeepSeek overturned the conventional wisdom about the comparative advantage of proprietary models.

DeepSeek-R1 also upended the conventional wisdom that “bigger” is better in generative AI. Facing constraints on China’s ability to procure the hardware needed to train AI models, DeepSeek relied on techniques that increased training efficiency, and the model also benefits from being a reasoning model. Because reasoning models are designed to work through an explicit “thought process” before producing answers, they often perform better on complex tasks than a traditional LLM of comparable size.

As noted above, efficient, open and high-performing models are great news for UNICEF because they mean that LLM technology is increasingly within the reach of our programmes. Such technology can help power a wide range of SDG-relevant applications, ranging from helping to triage pediatric fever cases in remote rural communities to analyzing crowdsourced reports of environmental and climate-related concerns.

AI agents on the rise.

As generative AI applications have become more sophisticated, there is clear value in breaking up user workflows into distinct components and delegating each task to the models or systems that perform it best. Enter AI agents, or “model[s] capable of reasoning, planning, and interacting with...[their] environment”.

A typical example of an AI agent might be an application that helps users search for and book flights for a vacation, but we’re optimistic that agents could also enable flexible interaction with different datasets, documents, and systems to advance UNICEF goals – such as by providing tailored support to teachers, parents, or community health workers that draws on official materials, best practices, and analysis of situation-specific data to iteratively make recommendations.

We’re already seeing promise in agentic systems for converting content into more accessible multimedia formats; for example, we’re testing agentic document extraction to convert PDFs into structured, accessible HTML so that blind readers can more easily navigate these documents and listen to their contents via screen reader. A goal for the first half of 2025 is to create additional examples of agents applied to UNICEF problems to demonstrate this new technology to colleagues and explore its potential.

The year of evaluation.

As models proliferate, evaluating their performance becomes increasingly important. Leaderboards built using benchmark datasets can help compare available options and rank models on different tasks. We’re seeing developments in general-purpose benchmarks – such as Humanity’s Last Exam, a dataset designed with subject-matter experts to pose challenging questions across different disciplines – as well as in specific fields, such as AI for Education’s efforts to develop a quality assurance framework that includes benchmarks to evaluate LLMs’ knowledge of pedagogical principles (including for children with special needs) and LLMs’ ability to solve visual math problems (with specific efforts to solicit examples from low- and middle-income countries).

We are fortunate to have a choice of models for many of our use cases, but as a result it is becoming increasingly important for UNICEF to determine whether a model fits a given task and how much it will cost relative to the benefit it brings. We’re working to get better evaluation tools into the hands of UNICEF staff, whether by supporting the evaluation of translation models for specific language pairs and subject-matter themes; building workflows for evaluating LLMs that extract, categorize, group, simplify, and/or summarize text and images for our accessible digital textbooks pipeline; or, ideally, releasing benchmark datasets for evaluating how these models perform on topics of interest to UNICEF.
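To make the comparison of task fit against cost more concrete, here is a minimal Python sketch of a benchmark leaderboard. It assumes hypothetical model names, predictions, and costs, and it uses simple exact-match scoring after normalization. Real benchmarks often need more nuanced grading, such as human or model-based judging for open-ended answers, so this illustrates the general idea rather than any particular benchmark’s method.

```python
from dataclasses import dataclass


@dataclass
class ModelRun:
    """One model's answers on a benchmark, plus the (hypothetical) cost of producing them."""
    name: str
    predictions: list[str]
    cost_usd: float


def normalize(answer: str) -> str:
    """Lowercase and trim so trivial formatting differences don't count as errors."""
    return answer.strip().lower()


def count_correct(predictions: list[str], references: list[str]) -> int:
    """Number of predictions that exactly match the reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    return sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))


def leaderboard(runs: list[ModelRun], references: list[str]) -> list[tuple[str, float, float]]:
    """Rank models by accuracy, breaking ties by cost per correct answer (lower is better)."""
    rows = []
    for run in runs:
        n_correct = count_correct(run.predictions, references)
        accuracy = n_correct / len(references) if references else 0.0
        cost_per_correct = run.cost_usd / n_correct if n_correct else float("inf")
        rows.append((run.name, accuracy, cost_per_correct))
    return sorted(rows, key=lambda row: (-row[1], row[2]))


if __name__ == "__main__":
    # Hypothetical benchmark answers and model outputs, for illustration only.
    references = ["yes", "no", "yes", "no"]
    runs = [
        ModelRun("model-a", ["yes", "no", "no", "no"], cost_usd=0.40),
        ModelRun("model-b", ["Yes", "no", "yes", "no "], cost_usd=2.00),
        ModelRun("model-c", ["no", "yes", "no", "yes"], cost_usd=0.10),
    ]
    for name, acc, cost in leaderboard(runs, references):
        print(f"{name}: accuracy={acc:.2f}, cost per correct answer=${cost:.2f}")
```

Ranking first by accuracy and then by cost per correct answer is only one design choice. A team with a tight budget might instead set a minimum acceptable accuracy and pick the cheapest model that meets it.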

Transformation and change

With the adoption of the Global Digital Compact, digital transformation and AI-powered change are at the forefront of discussions across the UN. These forces are also transforming UNICEF: we have finalized our own internal AI strategy, embarked on an effort to update our policy guidance on AI for children to account for the changes wrought by generative AI, and begun to think about how AI can help address the challenges posed by increasing resource constraints for humanitarian and development work in 2025.

It feels as though we are at a defining moment. On the one hand, generative AI promises ever-greater value for professionals who use it for everything from drafting documents to generating code, and the performance of generative AI models in high-resource languages and contexts is impressive (though these models are still flawed in basic ways).

On the other hand, there is a risk that as certain workers or economies benefit from AI, others will be left behind, further widening the digital divide. This may include those whose jobs are more easily automated; whose use cases differ from those AI models have been trained on; and who lack the internet access, devices, language proficiency, or literacy to use AI models and services. There are ongoing debates about the opacity and provenance of AI training data, including questions about the ethics of creating models that can imitate individual artists’ work or whose training data might include explicit images of children. More fundamentally, there are questions about how AI reflects, reinforces, and amplifies existing power structures at the cost of individual dignity, autonomy, and human rights. There is also a real need to go beyond articulating ethical AI principles and provide practical, concrete guidance for implementers and decision-makers, which we hope to see more of in the coming year.

At UNICEF, we believe that we have a responsibility to document and publicize the risks of AI as they apply to all children, but also to explore the potential of AI that is built and implemented responsibly, finding ways to extract benefits for those we serve.

Follow us as we continue on this journey in 2025.