
OpenAI's AI reasoning model sometimes “thinks” in Chinese, but no one really knows why


    Shortly after OpenAI released o1, its first “reasoning” AI model, people began to notice a strange phenomenon: the model would sometimes start “thinking” in Chinese, Farsi, or other languages, even when the question was asked in English.

    Given a problem to solve—for example, “How many R's are there in the word 'strawberry'?”—o1 will begin its “thinking” process by performing a series of reasoning steps to arrive at the answer. If the question is written in English, o1's final response will be in English. But the model performs some steps in another language before reaching a conclusion.

    “[o1] suddenly started thinking in Chinese midway through,” said one user on Reddit.

    “Why did [o1] suddenly start thinking in Chinese?” another user asked in a post on X. “No part of the conversation (5+ messages) is in Chinese.”

    OpenAI has yet to explain o1's strange behavior, or even acknowledge it. So what might be going on?

    Well, AI experts aren't sure. But they have some theories.

    Several people on X, including Hugging Face CEO Clément Delangue, pointed out that reasoning models like o1 are trained on datasets containing large amounts of Chinese characters. Google DeepMind researcher Ted Xiao claimed that companies including OpenAI use third-party Chinese data annotation services, and that o1's switching to Chinese is an example of “Chinese linguistic influence on reasoning.”

    “[Labs like] OpenAI and Anthropic leverage [third-party] data labeling services to obtain PhD-level reasoning data for science, math, and coding,” Xiao wrote in a post on X. “[F]or expert labor availability and cost reasons, many of these data providers are based in China.”

    Labels, also called tags or annotations, help the model understand and interpret data during training. For example, labels for training an image recognition model might take the form of markers around objects, or captions referring to each person, place, or object depicted in an image.
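    For a concrete picture, here is a hypothetical sketch of what a single labeled training record might look like. The field names and values are invented for illustration and don't correspond to any particular annotation service's schema.

```python
# Hypothetical example of a labeled image-recognition training record.
# Field names and values are made up purely for illustration.
annotation = {
    "image": "street_scene_001.jpg",
    "labels": [
        {"object": "person",  "bbox": [34, 120, 88, 260]},   # x, y, width, height
        {"object": "bicycle", "bbox": [140, 180, 210, 140]},
    ],
    "caption": "A person walking a bicycle down a city street.",
}

# During training, the model sees the image's pixels alongside these
# human-written labels and learns to associate the two.
for label in annotation["labels"]:
    print(label["object"], "at", label["bbox"])
```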

    Research shows that biased labels produce biased models. For example, annotators are on average more likely to label phrases in African American Vernacular English (AAVE), the informal grammar used by some Black Americans, as toxic, leading AI toxicity detectors trained on those labels to rate AAVE as disproportionately toxic.

    However, other experts don't buy the o1 Chinese data labeling hypothesis. They note that o1 is just as likely to switch to Hindi, Thai, or a language other than Chinese while working out a solution.

    Rather, these experts say, o1 and other reasoning models may simply be using the languages they find most efficient for achieving their goals (or hallucinating).

    “The model doesn't know what language is, or that languages are different,” Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, told TechCrunch. “It's all just text to it.”

    In fact, models don't process words directly. They use tokens instead. Tokens can be whole words, such as “fantastic.” Or they can be syllables, like “fan,” “tas,” and “tic.” Or they can even be individual characters in a word, such as “f,” “a,” “n,” “t,” “a,” “s,” “t,” “i,” “c.”
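    To make that concrete, here is a minimal sketch using the open-source tiktoken tokenizer, chosen here purely for illustration; the article doesn't tie o1 to any specific tokenizer, and the exact token boundaries depend on the encoding used.

```python
# Minimal sketch: how a tokenizer splits text into tokens.
# Uses tiktoken as an example; splits are not claimed to match o1's tokenizer.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["fantastic", "How many R's are in 'strawberry'?", "你好，世界"]:
    token_ids = enc.encode(text)
    # Decode each token id back to its text piece to see where the boundaries fall.
    pieces = [
        enc.decode_single_token_bytes(t).decode("utf-8", errors="replace")
        for t in token_ids
    ]
    print(f"{text!r} -> {len(token_ids)} tokens: {pieces}")
```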

    Like labels, tokens can also introduce bias. For example, many word-to-token converters assume that a space in a sentence marks a new word, even though not all languages use spaces to separate words.
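    A toy example of that assumption, and how it breaks down: a splitter that treats spaces as word boundaries handles an English sentence reasonably, but returns an entire Chinese sentence as a single chunk, because written Chinese does not put spaces between words. The function below is invented for illustration.

```python
# Toy illustration of the space-splitting assumption described above.
def naive_whitespace_tokenize(text: str) -> list[str]:
    """Split text on whitespace, assuming spaces separate words."""
    return text.split()

print(naive_whitespace_tokenize("How many R's are in strawberry?"))
# ['How', 'many', "R's", 'are', 'in', 'strawberry?']

print(naive_whitespace_tokenize("草莓这个词里有几个R？"))
# ['草莓这个词里有几个R？']  -- the whole sentence comes back as one "word"
```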

    Tiezhen Wang, a software engineer at AI startup Hugging Face, agrees with Guzdial that reasoning models' language inconsistencies may be explained by the associations the models form during training.

    “For example, I prefer doing math in Chinese because each digit is just one syllable, which makes calculations clear and efficient,” Wang wrote in a post on X. “But when it comes to topics like unconscious bias, I automatically switch to English, mostly because that's where I first learned and absorbed those ideas.”

    Wang's theory makes sense. Models are, after all, probabilistic machines. Trained on many examples, they learn patterns and use them to make predictions, such as how “to whom” in an email typically precedes “it may concern.”
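    A toy sketch of that idea: count which word follows “whom” in a handful of example sentences and predict the most frequent continuation. The tiny corpus is invented for illustration; real models learn far richer statistics over tokens, but the principle is the same.

```python
# Toy next-word predictor: estimate P(next word | "whom") by counting.
from collections import Counter

corpus = [
    "to whom it may concern",
    "to whom it may concern",
    "to whom do I address this letter",
]

next_words = Counter()
for sentence in corpus:
    words = sentence.split()
    for i, word in enumerate(words[:-1]):
        if word == "whom":
            next_words[words[i + 1]] += 1

total = sum(next_words.values())
for word, count in next_words.most_common():
    print(f"P({word!r} | 'whom') = {count / total:.2f}")
# The most frequent continuation ("it", as in "it may concern") wins.
```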

    But Luca Soldaini, a research scientist at the nonprofit Allen Institute for Artificial Intelligence, cautions that we can't know for sure. “This kind of observation of a deployed AI system is impossible to back up because these models are so opaque,” he told TechCrunch. “It's one of many examples of why transparency in how AI systems are built is critical.”

    Absent an answer from OpenAI, we're left to wonder why o1 thinks of songs in French but of synthetic biology in Mandarin.
