Why I’m committed to breaking the bias in large language models
As a consultant in orthopaedic surgery at Khoo Teck Puat Hospital, Singapore, I’ve seen first-hand how cultural differences can be overlooked by large language models (LLMs).
Back in 2005, Singapore’s Health Promotion Board introduced categories of body mass index (BMI) tailored specifically for the local population. It highlighted a crucial issue — Asian people face a higher risk of diabetes and cardiovascular diseases at lower BMI scores compared with European and North American populations. Under the board’s guidelines, a BMI of 23 to 27.4 would be classified as ‘overweight’, a lower range than the global standard of 25 to 29.9 set by the World Health Organization (WHO).
I was reviewing recommendations for a person’s health plan generated by an artificial intelligence (AI) system when I realized that it had categorized the person’s BMI of 24 as falling within the conventional ‘healthy’ range, disregarding the guidelines we follow in Singapore. It was a stark reminder of how important it is for AI systems to account for diversity.
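To see the discrepancy concretely, here is a minimal sketch in Python of how the same BMI value lands in different categories under the two sets of cut-offs. Only the ‘overweight’ ranges quoted above come from the guidelines I’ve described; the remaining boundaries and labels are illustrative assumptions, not the full guidance of either body.

```python
# Minimal sketch: classifying one BMI value under two guideline sets.
# Only the 'overweight' ranges quoted in the text (23-27.4 locally,
# 25-29.9 for the WHO) are taken from the guidelines; the other
# boundaries and labels here are illustrative assumptions.

def classify_bmi(bmi: float, guideline: str = "WHO") -> str:
    if guideline == "WHO":
        if bmi < 25.0:
            return "within conventional limits"
        if bmi <= 29.9:
            return "overweight"
        return "obese"
    if guideline == "Singapore HPB":
        if bmi < 23.0:
            return "lower risk"
        if bmi <= 27.4:
            return "overweight (higher risk)"
        return "high risk"
    raise ValueError(f"unknown guideline: {guideline}")

bmi = 24.0
print(classify_bmi(bmi, "WHO"))            # within conventional limits
print(classify_bmi(bmi, "Singapore HPB"))  # overweight (higher risk)
```

The same number produces two different messages for the same person, which is exactly the gap the AI-generated health plan fell into.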
This is one example of many. Having lived and worked in Malaysia, Singapore, the United Kingdom and the United States, I’ve gained an understanding of how cultural differences can affect the effectiveness of AI-driven systems. Medical terms and other practices that are well understood in one society can be misinterpreted by an AI system that hasn’t been sufficiently exposed to that culture. Fixing these biases is not just a technical task but a moral responsibility, because it’s essential to develop AI systems that accurately represent the different realities of people around the world.
Identifying blind spots
As the saying goes, you are what you eat: generative AI programs process vast amounts of data and amplify the patterns present in that information. Language bias occurs because AI models are often trained on data sets dominated by English-language information. This often means that a model will perform better on an English-language task than on the same task in other languages, inadvertently sidelining people whose first language is not English.
Imagine a library filled predominantly with English-language books: a reader seeking information in another language would struggle to find the right material, and so, too, do LLMs. In a 2023 preprint, researchers showed that a popular LLM performed better with English prompts than with prompts in 37 other languages, in which it struggled with accuracy and semantics1.
Gender biases are another particularly pervasive issue in the landscape of LLMs, often reinforcing stereotypes embedded in the underlying data. This can be seen in word embeddings, a technique in which words are represented as numerical vectors so that semantically similar words sit close together. In a 2016 preprint, Tolga Bolukbasi, a computer scientist then at Boston University in Massachusetts, and his colleagues showed how various word embeddings associated the word ‘man’ with ‘computer programmer’ and ‘woman’ with ‘homemaker’, amplifying gender stereotypes through their outputs2,3.
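The kind of association that Bolukbasi and his colleagues described can be probed with simple vector arithmetic over any set of pretrained word embeddings. The sketch below is my own illustration, not the authors’ code: it assumes you have already loaded a word-to-vector dictionary (for example, from word2vec or GloVe) and completes the analogy ‘man is to computer programmer as woman is to ___’ by nearest-neighbour search.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def complete_analogy(a: str, b: str, c: str,
                     vectors: dict[str, np.ndarray]) -> str:
    """Return the word whose vector is closest to b - a + c (a : b :: c : ?)."""
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# 'vectors' would come from pretrained embeddings loaded elsewhere;
# the loading step is outside this sketch.
# answer = complete_analogy("man", "computer_programmer", "woman", vectors)
```

If the returned word is something like ‘homemaker’ rather than another technical occupation, the embedding has absorbed, and will reproduce, the stereotype in its training data.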
In a 2023 study, researchers prompted four LLMs with a sentence that included a pronoun and two stereotypically gendered occupations. The LLMs were 6.8 times more likely to pick a stereotypically female job when presented with a female pronoun, and 3.4 times more likely to pick a stereotypically male job with a male pronoun4.
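A probe in the spirit of that study can be written as a short loop over templated prompts. In the sketch below, query_llm is a placeholder for whichever model interface you use, and the sentence template and occupation pairs are my own simplifications, not the paper’s exact protocol.

```python
from collections import Counter

def query_llm(prompt: str) -> str:
    # Placeholder for a real model call (an API or a local model);
    # it should return the single occupation the model picks.
    raise NotImplementedError("plug in your model interface here")

# Illustrative pairs of stereotypically gendered occupations.
PAIRS = [("nurse", "mechanic"), ("secretary", "engineer")]

def probe(pronoun: str, trials: int = 20) -> Counter:
    """Count which occupation the model links to a given pronoun."""
    counts: Counter = Counter()
    for female_job, male_job in PAIRS:
        prompt = (
            f"The {female_job} and the {male_job} were talking. "
            f"{pronoun.capitalize()} mentioned an upcoming deadline. "
            f"Who does '{pronoun}' refer to: the {female_job} or the {male_job}? "
            "Answer with one word."
        )
        for _ in range(trials):
            counts[query_llm(prompt)] += 1
    return counts

# Comparing probe("she") with probe("he") gives a rough sense of how often
# the model defaults to the stereotyped occupation for each pronoun.
```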
Navigating the bias
To ensure that bias doesn’t creep into my work when using LLMs, I adopt several strategies. First, I treat AI outputs as a starting point rather than as the final product. Whenever I use generative AI to assist with research or writing, I always cross-check its results with trusted sources from various perspectives.
In a project from this February that focused on developing AI-generated educational content for the prevention of diabetic neuropathy — a condition in which prolonged high blood-sugar levels cause nerve damage — I consulted peers from various backgrounds to ensure that the material was culturally sensitive and relevant to the diverse population groups in Singapore, including Malay, Chinese and Indian people.
After the AI created an initial draft of the prevention strategies, I shared the content with colleagues from each of these cultural backgrounds. My Malay colleague pointed out that the AI’s recommendations heavily emphasized dietary adjustments common in Western cultures, such as reducing carbohydrate intake, without considering the significance of rice in Malay cuisine. She suggested including alternatives such as reducing portion sizes or incorporating low-glycaemic-index rice varieties that align with Malay dietary practices. Meanwhile, a Chinese colleague noted that the AI failed to address the traditional use of herbal medicine and the importance of food therapy in Chinese culture. An Indian colleague highlighted the need to consider vegetarian options and the use of spices such as turmeric, which is commonly thought, in Indian culture, to have anti-inflammatory properties that are beneficial for managing diabetes.
In addition to peer review, I ran a controlled comparison by writing my own set of prevention strategies without AI assistance. This allowed me to directly compare the AI-generated content with my findings to assess whether the AI had accurately captured the cultural intricacies of dietary practices among these groups. The comparison revealed that, although the AI provided general dietary advice, it lacked depth in accommodating cultural preferences from diverse population groups.
By integrating this culturally informed feedback and comparison, I was able to make the AI-generated strategies more inclusive and culturally sensitive. The final result provided practical, culturally relevant advice tailored to the dietary practices of each group, ensuring that the educational material was rigorous, credible and free from the biases that the AI might have introduced.
Despite these challenges, I think that it’s crucial to keep pushing forward. AI, in many ways, mirrors our society — its strengths, biases and limitations. As we develop this technology, society needs to be mindful of its technical capabilities and its impact on people and cultures. Looking ahead, I hope the conversation around AI and bias will continue to grow, incorporating more diverse perspectives and ideas. This is an ongoing journey, full of challenges and opportunities. It requires us to stay committed to making AI more inclusive and representative of the diverse world we live in.
Competing Interests
The author declares no competing interests.