Multilinguals cannot help but code-switch!

Linguists have been saying this forever, but now we have empirical evidence from cognitive scientists to support it: if you speak two languages and have ever found it difficult to choose the “right” tongue for the context you’re in, it’s because both languages are always “on” in the bilingual brain.

In fact, “[exposure to multiple languages] is not confusing them [bilingual babies] or messing them up developmentally; the opposite is true,” says Judith F. Kroll, Professor Emeritus, University of Pennsylvania.


Functions of code-mixing

Rafiya Begum, Microsoft Research India

So far on this blog, we have seen many examples of code-mixing, which occurs frequently in bilingual and multilingual communities. A very interesting question is why people mix two languages (code-mix) or switch between two languages (code-switch).

I have come across school kids whose mother tongue is different from the language (a second language) they use with friends at school. Since they spend so much time with friends, they mix their mother tongue with the language used among friends even when they are back home. This continues as they grow up, since they learnt to mix or switch between languages at an early age. It sometimes has a hilarious effect when they insert words or phrases from another language into their native language, even when equivalent expressions exist in the native language. See the examples of Hyderabadi Urdu-English sentences below:

ten baje hai (It is ten o’clock)

tum log double meaning dialog bolke sata rai (You are irritating me by saying double meaning dialogs)

In the above examples, the word ten and the phrase double meaning dialog are from English, and the rest is in Urdu.

People change their speech to fit in with the person they are talking to. They code-switch to talk about a particular topic, to change the context, or to convey their own identity. People also code-switch to show formality or their attitude towards the listener, and when a language lacks certain words, they take those words from another language.

Here is an example of code-switching between Hyderabadi Urdu and Telangana Telugu.

[Urdu] arey suno miyaa… [Telugu] naaku ii pani iiyaradey?

(Hey, listen Mister… Can’t you give this work to me?)

In the above example, the speaker switches from Hyderabadi Urdu to Telangana Telugu within the same conversation. The speaker uses Urdu to address the listener and grab their attention, and then switches to Telugu to express the actual matter. The location where the switch between the two languages happens is called the switch point, and it carries a lot of significance. In other words, the switch point carries the information about the purpose behind the switch.

Switch points represent various code-switching categories. Looking at code-switched Hindi-English tweets on Twitter, we observed the following categories, which fall into two types: Pragmatic and Structural.


A Fact-to-Opinion switch is where speakers switch languages when they move from expressing facts to expressing opinions. They also switch to another language to reinforce a positive or negative sentiment or opinion already expressed in the first language. In Sarcasm, a simple opinion about a topic is expressed in one language, followed by a switch to another language to express a sarcastic opinion about the same topic. Quotations, which are often employed to express opinions, are stated in the original language, while the context or fact might be stated in another language. A Cause-Effect switch expresses the reason or cause in one language and the effect in another. In Translation, a fact or opinion expressed in one language is translated into the other language, perhaps for reinforcement or for wider reach of the tweet.


In Reporting-Speech, we observed that Hindi is often used to quote real conversations that took place in Hindi, while the reporting part is in English. The conversations may be in quotes, and the reporting may contain specific English cue words such as ‘say’, ‘ask’, ‘think’, ‘tell’, etc. Other examples of code-switching include the use of wishes, greetings and forms of address in one language (usually English), followed by a switch to another.
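For readers who like to see ideas in code, the notion of a switch point described earlier can be sketched in a few lines of Python. This is only a minimal illustration, not the annotation procedure from our work: it assumes each word has already been labelled with a language tag by hand (here "ur" for Urdu and "te" for Telugu), and the function name find_switch_points is a hypothetical helper made up for this example.

```python
from typing import List, Tuple

# A token is a (word, language tag) pair. The tags are assumed to be
# hand-assigned for illustration; "ur" = Urdu, "te" = Telugu.
Token = Tuple[str, str]


def find_switch_points(tokens: List[Token]) -> List[int]:
    """Return every index i where tokens[i] is tagged with a different
    language from tokens[i - 1], i.e. where a switch happens."""
    return [
        i
        for i in range(1, len(tokens))
        if tokens[i][1] != tokens[i - 1][1]
    ]


# The Hyderabadi Urdu / Telangana Telugu example from this post.
utterance: List[Token] = [
    ("arey", "ur"), ("suno", "ur"), ("miyaa", "ur"),
    ("naaku", "te"), ("ii", "te"), ("pani", "te"), ("iiyaradey", "te"),
]

for i in find_switch_points(utterance):
    prev_word, prev_lang = utterance[i - 1]
    word, lang = utterance[i]
    print(f"switch at token {i}: {prev_word} ({prev_lang}) -> {word} ({lang})")
# prints: switch at token 3: miyaa (ur) -> naaku (te)
```

Locating the switch point is the easy part. Deciding which function it serves (sarcasm, cause-effect, reporting speech and so on) needs the kind of human annotation described above.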

If you want to know more about the functions of code-switching, you can refer to the following paper:

Begum, R., Bali, K., Choudhury, M., Rudra, K., Ganguly, N. (2016). Functions of Code-Switching in Tweets: An Annotation Scheme and Some Initial Experiments. In Proc. LREC.

Borrowing Ya Mixing? (Part 1)

Kalika Bali, Microsoft Research India

An English speaker might go to a café and order an egg sandwich made with egg, mustard and mayonnaise. If she stops to think, she might realize that she has the French language to thank for the words café and mayonnaise. However, unless she is a linguistics major with a specific interest in English etymology, she might be surprised that the word mustard, that quintessential ingredient of English cooking, is also of French origin.

A villager from the heart of Hindi-speaking rural India also might not realize that when he goes to the station and buys a ticket for the bus, he is using English vocabulary.

The historical linguist Hans Hock says that “languages do not exist in vacuum”. Languages and dialects that are in contact or co-exist continuously influence each other. The extent and type of influence vary with many socio-political, cultural and linguistic factors, and can range from the borrowing of sounds and words to, sometimes, entire syntactic structures.

So, when an English-French bilingual says, “Je vais à Nice pour le week-end”, is he code-mixing or is “week-end” a borrowing from English into French?

Even linguists cannot agree on “other language embeddings”. Is it true code-mixing? What is nonce-word borrowing? Do these differ from loanwords that are integrated into the native vocabulary and grammatical structure?

Many linguists believe that loanwords start out as code-mixing or nonce borrowing but, through repeated use and diffusion across the language, gradually become native vocabulary and acquire the characteristics of the “borrowing” language. In spoken forms, this would be the adaptation of the loanword to the sound system and grammar of the native language, that is, phonological and morpho-syntactic convergence.

The problem with this criterion is that in many cases a native accent might be mistaken for phonological convergence, and a morpho-syntactic marking might not be readily visible.

For example, most Hindi speakers of English would pronounce an English alveolar /d/ as a retroflex because the alveolar plosive is not part of Hindi phonology. However, this does not imply that the English word in question has become part of the native vocabulary.

Similarly, consider the two sentences:

“sab artists ko bulayaa hai” (all artists have been called)

“sab artist kal aayenge” (all artists will come tomorrow)

In the first sentence, the English inflection -s on the word artist marks it as plural, but in the second, the plural is marked on the Hindi verb.

Does this imply that the first case is code-mixing and the second is borrowing, given that both forms and structures are equally acceptable and common in Hindi?

These categories are not easy to decide, especially for single words, both because doing so requires diachronic data and because the distinction itself is inherently fuzzy. In general, it is believed that a continuum exists between code-mixing and loan vocabulary: the edges may be clearly distinguishable, but the vast majority of cases in the middle are difficult to disambiguate, especially for single words.

In a future post, we will look at what this continuum might look like and one possible way we can try to distinguish true code-mixing from loanwords.

In the meantime, you can look at some earlier studies on borrowing, mixing, and what lies in between.

  1. Frederic Field. 2002. Linguistic borrowing in bilingual contexts. Amsterdam: Benjamins.
  2. Carol Myers-Scotton. 2002. Contact linguistics: Bilingual encounters and grammatical outcomes. Oxford University Press.
  3. Pieter Muysken. 2000. Bilingual speech: A typology of code-mixing. Cambridge University Press.
  4. Shana Poplack, D. Sankoff, and C. Miller. 1988. The social correlates and linguistic processes of lexical borrowing and assimilation. Linguistics 26:47-104.
  5. Shana Poplack and Nathalie Dion. 2012. Myths and facts about loanword development. Language Variation and Change 24(3).
  6. David Sankoff, Shana Poplack, and Swathi Vanniarajan. 1990. The case of the nonce loan in Tamil. Language Variation and Change 2:71-101.