KARACHI: Fahad Maqsood Qazi was working on a seemingly straightforward task last year, developing an automated artificial intelligence (AI) dubbing system for his software firm in Pakistan’s southern city of Hyderabad, when he hit a wall: the fundamental text-to-speech (TTS) and speech-to-text (STT) models simply didn’t exist for his native Sindhi language.
This unexpected hurdle at Flis Technologies spurred the 23-year-old IT professional to bridge the AI gap for a language spoken by around 40 million people globally, including a significant diaspora whose children risk losing their linguistic heritage. He soon began working on his own Sindhi-language TTS and STT systems.
In August last year, he began manually transcribing hours of Sindhi audio content from YouTube, stories, audiobooks, vlogs and news reports to form a training dataset. Qazi breathed a sigh of relief when he discovered that a Google employee, Asad Memon, had recently added Sindhi to Mozilla’s Common Voice project, a global effort to crowdsource voice data for underrepresented languages.
Qazi merged the Common Voice data with his own and began training the AI models. By January this year, he had built functioning Sindhi TTS and STT systems. Sindhi also lacked a tokenizer, a crucial component for processing text in AI models, so Qazi built his own. Months of rigorous work training and refining various models led the 23-year-old to a significant breakthrough that could help future generations of his community connect with their roots in Sindhi, an Indo-Aryan language whose history spans approximately 2,400 years, with origins dating back to the 3rd century BCE.
“Since Sindhi isn’t formally taught in most diaspora communities, many young Sindhis grow up without the ability to read or write the language,” said Qazi, a computer science graduate, explaining that a lack of exposure to Sindhi could lead to a gradual loss of identity.
“My tools aim to change that. By allowing people to communicate in Sindhi through speech and text, my tools would help them stay connected to their roots.”
In March, Qazi publicly shared these models on LinkedIn and uploaded them to HuggingFace, an open-source platform for machine learning models, making them freely available to developers and researchers worldwide. The release marked a pivotal moment for Sindhi in the digital age.
Recalling the days when he started working on these tools, Qazi said he realized that Sindhi was missing from the AI revolution and that, without publicly available speech datasets, tokenizers or linguistic tools, the language had virtually been excluded from the digital future.
“This was shocking for us,” he told Arab News. “Imagine, 40 million Sindhis in the world, yet no one had built these essential AI systems for their language.”
Qazi says his work will have a “profound impact,” particularly on Sindhi-speaking children growing up in countries such as Saudi Arabia, the United Arab Emirates, the United States and the United Kingdom.
For diaspora communities where the language isn’t formally taught, these Sindhi AI tools offer a vital link to their cultural identity, according to the IT professional.
These models can be integrated with mobile keyboards for Sindhi voice-to-text (VTT) messaging, while the TTS model can be used to listen to written Sindhi content, according to Qazi.
They have the potential to empower uneducated adults and the elderly within the Sindhi community, both at home and abroad.
“This means everyday conversations with family and friends, even over messaging apps, can happen in Sindhi. That kind of natural, daily use can help preserve the language and keep it alive across generations,” he said.
“A parent who doesn’t know how to read Sindhi will be able to read stories out loud to their children through my text to speech model. Elderly people who never learned to read or write Sindhi can now speak to search for information and listen to responses.”
Qazi hopes his AI tools will play a significant role in the long-term growth and integration of the Sindhi language on global digital platforms.
“This technology can play a key role in ensuring that Sindhi doesn’t just survive, it thrives in the digital age,” he said.
“By giving Sindhi a presence in AI systems like TTS and STT, I am ensuring it to be part of global platforms such as voice assistants, educational apps, audiobooks, and translation tools. That kind of integration was impossible before.”
Young Pakistani introduces smart tools to bridge AI gap for millions of Sindhis worldwide
https://arab.news/8jq7q
- Sindhi, an Indo-Aryan language with a history that spans approximately 2,400 years, is spoken in Pakistan and India, and by diaspora in several regions
- Fahad Maqsood Qazi has developed previously unavailable Sindhi text-to-speech and speech-to-text AI models and shared them on an open-source platform
Pakistan says it seized 32 square kilometers inside Afghanistan as border clashes escalate
- Security official describes ‘limited tactical action’ in Gudwana after Afghan assaults
- Islamabad accuses Kabul of sheltering militants as UN, China and Russia urge restraint
ISLAMABAD: Pakistan has seized a 32-square-kilometer area inside Afghanistan following overnight fighting, a security official said on Saturday, as cross-border clashes between the two countries escalated sharply.
The official, speaking on condition of anonymity, said troops carried out a “limited tactical action” in the Gudwana area opposite the Zhob sector along the frontier, capturing Afghan territory after responding to attacks on Pakistani positions.
“On the night of Feb. 26/27, posts opposite the Zhob sector launched anticipated physical attacks on multiple Pakistani positions,” the official said, referring to fighters linked to Afghanistan’s Taliban authorities, whom Islamabad identifies as Tehreek-e-Taliban Afghanistan (TTA).
“In response to aggressive unprovoked fire and physical attacks, Pakistan security forces launched a limited tactical action on the night of Feb. 27/28 in the general area of Gudwana with a view to capture TTA Tahir Post,” he continued, adding that 32 square kilometers of Afghan territory were seized.
The official said special combat teams crossed the border after preparatory bombardment, supported by intelligence, surveillance and reconnaissance assets providing “real-time battlefield awareness.”
He said 24 Afghan Taliban fighters were killed and 37 wounded, with no Pakistani casualties reported.
The claims could not be independently verified, and there was no immediate confirmation from Taliban authorities in Kabul of any territorial loss in the Gudwana area.
The latest clashes erupted after Pakistani airstrikes targeted what Islamabad described as militant hideouts inside Afghanistan over the weekend, triggering retaliatory fire along the frontier and sharply escalating long-running tensions. Islamabad accuses Kabul of sheltering Pakistani Taliban militants responsible for attacks inside Pakistan, an allegation that Afghanistan denies.
Pakistan’s Information Minister Attaullah Tarar said on Saturday evening that 352 Afghan Taliban fighters had been killed and more than 535 wounded since the latest phase of hostilities began.
Tarar said Pakistani strikes had destroyed 130 check posts, 171 tanks and armored vehicles and targeted 41 locations across Afghanistan by air. Those figures could not be independently verified.
The United Nations, China and Russia have called for restraint.
The United States said Pakistan has the right to defend itself against cross-border militancy.










