Building AI That Speaks Wolof: Reclaiming the Digital Future for West Africa

Soynade Research Data Science+AI Senegal

Jun 17, 2026

Company Details

Amount invested: $40,050 USD
Funding status: Active, early period
Founded: 2024, by Dioula Doucouré & Yaya Sy

Rebuilding the Digital World in West African Languages

Imagine you need health advice, want to take a school course, or need to talk to a customer service agent, but the system only speaks languages you did not grow up with. Millions of people in West Africa face this every day. The internet and most AI tools were not built for them.

Soynade is changing that. We are an AI startup based in Dakar, Senegal, and our name comes from the Fula word soynade, meaning "to see far." We build open-source AI tools that can understand and speak West African languages, starting with Wolof.

Think of it this way: when you use Siri, Google Assistant, or ChatGPT, those products were built on decades of data and research, almost exclusively in English, French, Mandarin, and a few other major languages. Wolof, Bambara, Fula, and hundreds of other African languages were left out. We are building the missing infrastructure: speech recognition that understands Wolof, text-to-speech that can narrate in Wolof, translation tools, and AI that can read and write in these languages with cultural accuracy. All of them are free and open for anyone to use.

Children learn best in their own language. Our tech helps bring local-language education tools to classrooms. UNICEF's support lets us move faster, turning research into free tools that millions can actually use in their own language.

Soynade

The Invisible Barrier: Language as Digital Exclusion

Language is how we access everything: education, healthcare, financial services, government, news, and culture. When AI systems don't understand your language, you are effectively excluded from the digital future, not because you lack intelligence or opportunity, but because the technology was not designed with you in mind.

For an estimated 50 million speakers of Wolof, Bambara, and similar West African languages, this exclusion is worsened by another challenge: these languages are primarily oral. Speakers learn to speak before they write and, in some cases, have never needed to read or write at all. A technology that requires typing in a foreign language creates a double barrier. Voice-first, native-language AI is not just a convenience; it is a matter of basic access.

Children in these communities are particularly affected. Educational tools, digital learning platforms, and even basic information services assume literacy in a foreign language. A child who grows up speaking Wolof at home and French only at school is at a disadvantage from day one when accessing digital tools.

The Turning Point

For most of history, building language tools for an underrepresented language required enormous resources: thousands of hours from professional linguists, massive budgets, and years of work. Only the biggest technology companies with the biggest markets could afford it.

The revolution in AI, specifically in large language models and speech processing, has changed the economics. It is now possible for a small, focused team to build and train powerful language models using modern AI techniques at a fraction of the previous cost. The same tools that powered GPT, Whisper, and state-of-the-art translation models can now be adapted for African languages like Wolof if someone puts in the work.

We are doing that work. And crucially, we are doing it in the open, so that every research lab, every university, every civil society organisation in West Africa, and every parent-developer who wants to build for their community can benefit from what we build.

Built for the Communities First

Most language technology companies either ignore African languages entirely or treat them as a side project. The few tools that exist for Wolof or Bambara are proprietary, locked behind APIs and not built with community input. They were built about these languages, not with the communities that speak them.

Soynade is different in three ways:

First, we go deep rather than wide. We focus intensively on West African languages rather than superficially supporting 100 languages. Our Wolof speech recognition model, Wolof-HuBERT-CTC, outperforms models produced by Meta and Orange, two companies with resources thousands of times larger than ours.

Second, we build for orality. Our tools are designed to be voice-first. Our Oolel-Voices text-to-speech model can generate natural, expressive Wolof speech with control over tone and emotion, enabling everything from voice assistants to audiobooks to accessible education tools for children who cannot yet read.

Third, we go multimodal and cross-lingual. Our Oolel-Embed model can take a spoken Wolof question and find the right answer in French documents, without needing to transcribe the speech first.
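For readers curious about the mechanics, here is a minimal sketch of the retrieval step, assuming (as is standard for embedding-based search) that the spoken query and the French documents are mapped into one shared vector space and compared by cosine similarity. It does not use Oolel-Embed itself: the vectors are made-up placeholders for what such an encoder would output, and `cosine` and `rank_documents` are hypothetical helper names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: list[float], doc_vecs: list[list[float]]) -> list[int]:
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Hypothetical placeholder vectors. In a real system, query_vec would come from
# encoding Wolof audio and doc_vecs from encoding French text with the same model.
query_vec = [0.9, 0.1, 0.0, 0.2]
doc_vecs = [
    [0.0, 1.0, 0.0, 0.0],  # French document on an unrelated topic
    [0.8, 0.2, 0.1, 0.3],  # French document that answers the question
    [0.1, 0.0, 1.0, 0.0],  # another unrelated French document
]

print(rank_documents(query_vec, doc_vecs))  # [1, 0, 2]
```

In a typical deployment of this pattern, the document vectors would be computed once and stored in an index, so only the spoken query needs to be encoded at search time.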

Built in the Open, Built for Everyone

Open source means the code, the data, the models, and the training recipes are all publicly available for anyone to use, study, modify, and build on for free.

This matters for three reasons. First, trust. In communities where technology has historically extracted value rather than served people, transparency is not optional. It is the foundation of any genuine partnership. People can look at exactly how our models were trained and on what data.

Second, speed. A researcher in Dakar, a developer in Bamako, or a university student in Ouagadougou can take our models and build something we never imagined, without asking for our permission or paying a license fee. The community multiplies our impact far beyond what two people could achieve alone.

Third, accountability. Open-source models can be evaluated, criticized, and improved by the community. This makes them more accurate, more culturally appropriate, and more reliable over time.

We publish everything on Hugging Face, the world's leading AI model repository, where our work is freely available to the global research community. To date, our datasets and models have been downloaded thousands of times by researchers and developers around the world.

It Started with a Tweet

Yaya had quietly built something remarkable: a Fula translation bot on X (then Twitter) called Firtanam. At a time when almost no one was building AI tools for Fula, a language spoken by over 30 million people across West Africa and the Sahel, Firtanam was just sitting there, working, being used by people who had never seen their language handled by a machine before. Dioula had found it, followed Yaya, and was watching from a distance.

Then one day, a message appeared in Dioula's inbox.

Yaya had signed up for a Masakhane challenge. Masakhane is a pan-African NLP research community that runs collaborative projects to build language tools for underrepresented languages. This particular challenge involved scraping Fula-language news and building a text classifier. Yaya had looked at the volunteer list for Fula and found exactly one name on it: his own. So he did what any good researcher does when the problem is bigger than one person: he went looking for collaborators.

He reached out to Dioula.

What followed was neither a formal co-founding meeting nor a pitch deck. It was two people who had never met, both spending their evenings and weekends on the same problem from different angles, suddenly realizing they had found each other. They worked on the Masakhane project together, and then they kept going.

They founded Cawoylel, a non-profit dedicated entirely to the Fula language. From Cawoylel, they released translation tools, language models, and community resources for Fula speakers, all free and open source. These tools are still in use today.

But they had both begun to feel the limits of what Fula alone could do. The model was working. The community was responding. And the same problem that affected Fula affected Wolof, Bambara, Soninke, Diakhanke, and dozens more languages whose speakers were equally invisible to the digital world.

So they made a decision: go bigger. Build infrastructure, not just tools. Build something that could scale beyond any single language. Build it as a company, so it could become sustainable and reach the scale the problem deserves.

Soynade

The People Behind the Models

Soynade Research was founded by two NLP engineers who are also native West African language speakers, a combination that matters more than most people realise.

Dioula Doucouré is an NLP and Data Engineer who leads overall strategy, business development, and external relations, while ensuring that what we build stays grounded in real market needs. As a native speaker of Fula, Wolof, Soninke, and Bambara, Dioula brings both linguistic intuition and the engineering rigour that African-language AI demands.

Yaya Sy (PhD) leads technical execution, model development, and research implementation. As the architect of Firtanam and a long-standing contributor to African NLP research communities, Yaya's expertise lies in making frontier AI methods work for languages the field routinely overlooks. He speaks Fula and Wolof, among other languages.

Our organisational structure reflects the nature of the work: a clear CEO/CTO division for decision-making and accountability, combined with deep joint involvement in the R&D process itself. You cannot delegate the pioneering work of building AI for African languages to a team that doesn't understand them. Both founders remain active researchers, not just managers.

Between us, we speak Fula, Wolof, Bambara, and Soninke, languages that together represent tens of millions of speakers across Senegal, Mali, Guinea, and beyond. That linguistic reach is the foundation for what we build.

Lived Experience Builds Better AI

Diversity is not just a value for us; it is a technical requirement. You cannot build accurate, culturally respectful language AI without the people who speak those languages being in the room where the decisions are made.

When Dioula corrects a training example and says, "No, this is how we actually say it in Dakar," that is not a small editorial note; it shapes the model. When Yaya draws on his experience navigating both the francophone West African academic world and the anglophone global AI research scene, that dual perspective enables us to publish competitive research while staying grounded in the communities we serve.

A diverse founding team that is genuinely embedded in the culture they are building for produces better AI. That is the lesson the industry is still learning, and it is the principle we started from.

When both founders are native speakers of the languages their models are trained on, the feedback loop between research and lived experience is immediate. Diversity here is not a policy; it is a methodology.

How We Scale

The UNICEF Venture Fund support is helping us on three fronts.

On the research side, the funding provides a runway to release more models, more datasets, and more evaluation tools, all open source. Every release raises the floor for what the entire African AI ecosystem can build on.

On the brand and go-to-market side, it helps us sharpen our product identity, understand our target customers, and build the commercial layers. The Soynade API enables developers and companies in telecom, healthcare, fintech, and education to integrate our speech and language tools. Oolel Studio is a user-friendly app that lets anyone, even without technical knowledge, synthesise speech, dub videos, add subtitles, translate content, and more. This is how the research becomes sustainable: enterprises pay for the API, keeping the research free for everyone else.

On the network side, being part of the UNICEF Venture Fund's Data and Trust cohort connects us with a global community of mission-driven funders, partners, and potential customers who care about digital inclusion. We are using this visibility to lay the groundwork for follow-on investment, enabling us to expand into more languages, more modalities, and more of the continent.

UNICEF Venture Fund's support validates our vision, helps us build partnerships and show the world there's a market for AI that speaks Africa's languages.

Soynade

What We’ve Shipped

In six months, we have shipped more open-source African language AI than most organizations on the continent produce in years:

Wolof-HuBERT: our Wolof automatic speech recognition model, which outperforms models from Meta and Orange, paired with the Wolof-ASR-Data training dataset (86,000+ audio samples). Academic paper: "Speech Language Models for Under-Represented Languages: Insights from Wolof".
Oolel-Voices: our Wolof text-to-speech model with voice cloning and expressive control over tone and emotion, with a public interactive demo.
Bambara-Speech-Translation-Data: a 221,000-sample Bambara–English parallel corpus (54 GB), expanding our reach from Wolof to Bambara speakers across West Africa.
Oolel-Embed: a bilingual, cross-lingual embedding model that enables direct retrieval of French documents from spoken Wolof queries, with no transcription step needed. Academic paper: "Cross-lingual Matryoshka Representation Learning across Speech and Text" (arXiv, February 2026).
Oolel-Corrector: a 2-billion-parameter Wolof text generation and correction model.
Wolof-Agri-Captions and Wolof-Non-Standard-Orthography datasets, expanding coverage into agriculture and everyday informal language.
Brand identity and website launched at soynade.ai.
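The Oolel-Embed paper title refers to Matryoshka representation learning, a general technique that trains a model so the first few dimensions of an embedding also work as a smaller embedding on their own. The sketch below illustrates only that truncation idea on invented 8-dimensional vectors; the sizes, values, and helper names (`truncate`, `cosine_unit`, `best_match`) are hypothetical and say nothing about Oolel-Embed's actual configuration.

```python
import math


def truncate(vec: list[float], k: int) -> list[float]:
    """Keep the first k dimensions and rescale them to unit length.

    Matryoshka-style training aims to make this prefix a useful embedding
    on its own; for an ordinary embedding model that is not guaranteed.
    """
    prefix = vec[:k]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]


def cosine_unit(a: list[float], b: list[float]) -> float:
    """Dot product of two unit-length vectors, which equals their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def best_match(query: list[float], docs: dict[str, list[float]], k: int) -> str:
    """Return the name of the document closest to the query using only k dimensions."""
    q = truncate(query, k)
    return max(docs, key=lambda name: cosine_unit(q, truncate(docs[name], k)))


# Invented 8-dimensional vectors; real embedding models use many more dimensions.
query = [0.6, 0.5, 0.1, 0.3, 0.05, -0.2, 0.1, 0.0]
docs = {
    "relevant": [0.5, 0.6, 0.0, 0.2, 0.1, -0.1, 0.0, 0.1],
    "unrelated": [-0.4, 0.1, 0.7, -0.3, 0.2, 0.3, -0.2, 0.1],
}

for k in (8, 4, 2):
    print(k, best_match(query, docs, k))
```

The appeal of this design is a cost trade-off: shorter vectors take less storage and are cheaper to compare. With these toy numbers the ranking happens to stay the same at every size; that stability is what Matryoshka-style training aims for, not something truncation guarantees in general.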

Our models and datasets have been downloaded thousands of times by researchers, developers, and organizations across the world.

Build This Future With Us

If you are a researcher, developer, or organisation working in or for West African communities, our models and datasets are free and waiting for you. You can download, use, and build on everything we publish.

If you are a company in telecom, healthcare, fintech, or education looking to reach Wolof-, Bambara-, or Fula-speaking customers in their own language, you can request early access to the Soynade API.

If you are a researcher or engineer who wants to contribute, our repositories are open at github.com/soynade-research, and our models are published on Hugging Face.

Follow our journey and subscribe to updates on our Karma page. Stay connected on LinkedIn, X (Twitter), and GitHub, all under @soynade. For partnerships or press: [email protected].

The moon shines for all. So should technology.