Project Mélange @ Interspeech 2017

Interspeech is the annual conference of the International Speech Communication Association (ISCA). This year, Interspeech was held in beautiful Stockholm, Sweden in August 2017 and saw a record participation of over 2000 attendees!

Project Mélange members Sunayana Sitaram and Kalika Bali co-organized a special session on code-switching with Prof. Alan Black from CMU, Prof. Julia Hirschberg from Columbia University, Prof. Thamar Solorio from the University of Houston and Prof. Mona Diab from George Washington University. The special session ran in the pre- and post-lunch slots on the first day of the conference, and we were very excited to see the room full of people for most of the session!

Topics in the special session ranged from data collection and resource building to applications such as Automatic Speech Recognition and Speech Synthesis. The session covered various language pairs – French-Algerian Arabic, Dutch-Frisian, Hindi-English, isiZulu-English and Spanish-English. At the end of the 9 oral presentations, we had an interesting and engaging panel discussion on techniques, applications, data and resources for code-switching in speech and language.

Take a look at the special session website for the full list of papers and more details!



Code switching and code-mixing in 19th century Bengal

Aniruddha Baul, Jadavpur University

Code switching and code mixing played an important part in the conversational patterns of 19th century Bengal. I would like to highlight the variations of code switching and code mixing in Bangla by looking at the prahasan (skit) and at Rupchand Pokkhir gaan (the songs of Rupchand Pokkhi, a renowned poet and musician). Before discussing the examples and variations, I would like to explain why these two literary traditions were chosen as evidence of code switching and code mixing in conversation.

The dialogues of prahasans were mainly based on real conversations. The main aim of the prahasans was to show a realistic picture of contemporary society, including its language. In most prahasans, the dialogues made no attempt to stay loyal to the standard language. So, we can take the dialogue of the prahasans as authentic language data. I chose a prahasan named “Ekei ki bole sovvota” by Michael Madhusudan Dutt for initial analysis. The prahasan is primarily based on the lifestyle of youth in 19th century Kolkata who have just learned English. Here are some dialogues –

Shibu: ja bol bhai, kintu ora dujon lekhanpora besh jane

(Say whatever you like, but the two of them are very learned)

Bolai: between ourselves, emon ki jane ?

(among us, what do they know?)

Mahesh: hya,hya, sokoleri biddye jana ache! se din je nobo ek khana chithi likhechilo ta toh dekheicho, tate Lindley Morer je durdosa ta toh mone ache ?

(I know about their knowledge! I think you can remember the letter written by Nobo where the English grammar was poor)

Naba :kintu gentle man, ekhon e desh amader pokkhe jeno mosto jelkhana;ei griho kebol amader liberty hall orthath amader swadhinotar dalan;ekhane jar je khusi se ti koro|gentleman, in the name of freedom let us enjoy ourselves

(But gentleman, now this country is like a prison to us, this room is our liberty hall, the passage of our freedom ,you can do everything whatever you wish to. Gentleman, in the name of freedom let us enjoy ourselves)

Although the speakers share the same language background, we notice a lot of code mixing and code switching in this dialogue. We see that vocatives or address terms of Bangla are sometimes replaced by English words like “gentleman”. Concepts like “liberty hall”, which came from Western thought, remained English in the dialogue. Prepositions are often replaced, but the syntax follows the structures of the matrix language. Knowledge was measured by skill in the English language, and English was the status symbol for native Bengalis during the colonial period of the 19th century. These kinds of code mixing and code switching established the speakers’ identity as “educated” people.

Sibu : That’s a lie

Naba: What! Tumi amake lair bolo? Tumi jano na ami tomake akhoni shoot korbo?

(What! You call me liar! Don’t you know that I am going to shoot you? )

Chaitan : Ha! jete dao, jete dao, ekta trifling kotha niye miche jhogra keno ?

(Ignore the thing! Why are you fighting meaninglessly for a trifling word?)

Naba: Trifling! – o amake liar bolle –abar trifling? O amake bangala bolle na keno? O amake mitthebaadi bolle na keno? Tate kon shala ragto? Kintu –liar –e ki bordasto hoye?

( trifling! he calls me liar? why did not he tell it in Bangla? If he calls me “mittyebadi” {Bengali word for “liar”} instead of liar I don’t mind anything).

From the examples above, we can see that in sentences like “ami tomake akhoni shoot korbo” (I am going to shoot you), the English verb “shoot” behaves like a noun, and the Bangla auxiliary verb “korbo” is added after it to form the compound verb “shoot korbo”. If we focus on the content, we can see the hierarchy of prestige languages: “liar” and “mitthyebadi” have the same meaning, but the word “liar” was considered superior to its Bangla equivalent “mitthyebadi”. These dialogues show the language hierarchy during the colonial period.

When the British communicated with each other they chose to do so in English, but when they had to communicate with native people they had to use mixed language for the sake of negotiation. It is interesting to observe, however, that when upper class and middle class native people communicated with each other they used mixed languages too. In this case, there was no question of negotiation. If we follow the songs of Rupchand Pokkhi, who became famous in 19th century Bengal for singing in mixed languages, we can understand the possible reasons why native people chose to use mixed language. There is a story that one day a high-ranking British officer was invited as the chief guest at a party organized by Rupchand’s patron. The British officer wanted to hear Rupchand’s song, but it was impossible for him to understand the lyrics. Rupchand saved the day by singing:

                                                   Let me go Ohe dwari I visit to bongsidhari

                                                  ( Oh gatekeeper)           (flute player, Krishna)

                                                    Esechi brojo hote ami brojer kulonaari

                                  ( I am coming from Braj, I am a respected woman from Braj)

                                         Beg you doorkeeper let me get, I want to see blockhead,

                                                Far whom our radhe dead ,ami search kori

                                                                      (I am searching for)

                               Srimoti radhar kena servant ,ei dakh ache daskhoth agreement,

                            (Servant owned by Srimati Radhe, see we do have signed agreement)

                                                   Ekhoni korbo present ,brojopure lob dhori,

                                       (I shall be presenting now, shall be hijacking to Brajapur)

                                                   Moral character suno or,butterthief nonichor

                                   (Please find the moral character of him, the butter thief)

                                                 Blaggard rakhal poor, chor mothurar dondodhari

                                        ( the poor shepherd, the thief is the authority of Mathura)

                                               Kohe R.C.D Bird king , black nonsense ver cunning


                                                     Flute te kore sing, mojayeche Raikishori

                                          (By singing using a flute, convinced the Raikishori)

(Friend of Radha wanted to meet Krishna and tell him the condition of Radha. But, the doorkeeper of the palace did not allow her inside so she was rebuking Krishna for cheating on Radha )

We can see the nature of code switching very clearly in the song, where the matrix language of some sentences is English and that of others is Bangla. Most of the address terms are from Bangla. We also notice compound verbs like “search kori”, where one item of the verb is in English and one is in Bangla. Apart from this, we can see that many lexical codes are mixed in the song. Both lexical mixing and structural mixing happened in the songs of Rupchand. Here is another song:

                                        Amare fraud kore kaliya damn tui kotha geli

                                 (To me)          (by doing)             (Where did you go?)

                                       I am for you very sorry, golden body holo kaali

                                                                                                 (Became pale)

                                        Ho my dear dearest , modhupure tui geli kesto

                                                                ( Krishna you went to madhupur)

                                      Oh my dear,how to rest,here dear bonomali

                                                       Soon re shyam tore boli

                                    (Oh Shyam, please listen, what I am saying)

                                      Poor creature milk gerel(girl),tader breast’e marli shel

                                                                         (their)               (by targeting arrow)

                                       Nonsense tor naiko akkel,breach of contract korli

                                                   (You have no sense)                   (did)

                                                        Femalegone fail korli

                                                     (You have failed the female)

                                       Lompot sother fortune khullo,mathura’te king holo

                                (The clever became fortunate, he became the king at Mathura)

                                       Uncle’er pran nashilo,kubujar kuj pele dali

                                          (Killed his uncle……rest not understood)

                                                         Nile dashi re mohishi boli

                                                 (took your maid servant as queen)

                                       Sri nandar boy young lad ,croocked mind hard

                                    (of Sri Nanda)

                                      Kohe R.C.D Bird e pelacard krishnokeli


                                                  Half English half Bangali

(Radha was dumped by Krishna so she was rebuking him and expressing her grief)

In this song, there is not only lexical mixing but also structural mixing. For example, in the word “femalegon”, “gon” is the Bangla plural marker, added to the English word “female”. Bangla case markers like “e” or “r” are added to English nouns to make words like “breast-e” and “uncle-r”. We also notice syntactic changes to English sentences that are inspired by Bangla syntax. Objects follow the subject just as in Bangla sentence structure, so here we see “I am for you very sorry” instead of “I am very sorry for you”.
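The structural mixing described above can be pictured as an English stem with a Bangla suffix attached. Here is a minimal sketch in Python, assuming a tiny hand-made stem list (hypothetical, for illustration only) and just the three markers named in the analysis; it is not a real morphological analyzer.

```python
# Hypothetical toy stem list; the suffixes are the markers named in the text.
ENGLISH_STEMS = {"female", "breast", "uncle"}
BANGLA_SUFFIXES = {"gon": "plural marker", "e": "case marker", "r": "case marker"}


def split_mixed_word(word):
    """Split a mixed word into (English stem, Bangla suffix, suffix role).

    Returns None when the word is not a known stem plus a known suffix.
    """
    # Drop the hyphens and apostrophes used when writing the mixed forms.
    cleaned = word.lower().replace("-", "").replace("'", "").replace("’", "")
    for stem in ENGLISH_STEMS:
        if cleaned.startswith(stem):
            suffix = cleaned[len(stem):]
            if suffix in BANGLA_SUFFIXES:
                return stem, suffix, BANGLA_SUFFIXES[suffix]
    return None


for word in ["femalegon", "breast-e", "uncle-r"]:
    print(word, "->", split_mixed_word(word))
```

A plain English word such as “female” returns None here, because no Bangla marker has been attached to it.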

Considering the sociolinguistic aspects of code mixing, we can ask: what are the reasons behind this kind of code switching and code mixing in Rupchand’s songs? First, we should know about the audience of his songs. If there were a few British people in the audience attending the performance, then this kind of code switching and code mixing was natural. But one could not become famous among the natives by following this policy alone. So there had to be wide acceptance of, and demand for, this kind of song. We can assume that Rupchand created his songs for the British audience, but that the songs gradually became famous among the English-loving natives, who could relate the language of the songs to their own language of conversation, in which code switching and code mixing played a great part. So code mixing and code switching can be related to the identity of the natives in 19th century Bengal.


Bandhopadhyay, Asit Kumar. 1973 (first edition). Bangla Sahittyer Itibrittyo (History of Bengali Literature), Vol. 4. Modern Book Agency Pvt. Ltd., Kolkata.

Chakraborty, Ramakanta (ed.). Bismrito Darpan (Forgotten Mirror). Sanskrito Pustok Bhandar, Kolkata.

Khetrogupto (ed.). 1965 (first edition). Madhusudon Rachanabali (Collected Works of Michael Madhusudan Dutt). Sahittya Samsad, Kolkata.

Lahiri, Durgadas (ed.). 1905 (first edition). Bangalir Gaan (Songs of the Bengalis). Bangabasi Electric Press, Kolkata.

Myers Scotton, C. 1982. ‘The possibility of code switching: Motivation for maintaining multilingualism’ in Anthropological Linguistics, Vol. 24, No. 4, pp. 432-444

How code mixing can be used for education

Dr Dripta Piplai, Jadavpur University

Author, “Nijer bhashaye galpo” (Stories in one’s own tongue)

A close observation of the everyday language use of children in India reveals many instances of code mixing. Children can mix and switch between two or more languages. They acquire more than one set of codes based on the different situations in their surroundings. Acquisition of multiple sets of codes is observed in both rural and urban children in India. In reality, no child in this country is an ideal monolingual. Children regularly get access to multiple codes through school, the market, television and playgrounds. In fact, it can be argued that every child is bilingual or multilingual by default. It can be stated that children use one set of grammar and borrow linguistic items from other known languages. It is also possible to claim that, instead of simply borrowing from a language, children utilize the structures and lexical items of two or more languages to produce mixed codes. As Tom Roeper (1999) has pointed out, there is a ‘Mini Grammar’ inside every child’s head. Thus, every child is bi/multilingual.

There is a need to understand the nature of this bi/multilingual grammar of children. We can assume that there is a multilingual grammar inside every child’s head. An obvious question follows from this assumption: how are the different codes arranged inside the head (the way different emotions were arranged inside Riley’s head in the Disney movie ‘Inside Out’)? There are different possibilities. We can argue that there are different slots for different languages in our mental grammar (Universal Grammar, to put it in a Chomskyan way). As children modify the building blocks of languages (or features), different sets of codes are obtained, and the codes are often mixed.

If one observes children’s playground talk, it becomes clear that children use a lot of mixed codes during play. In reality, code mixing is a strategy for negotiation during play. A detailed understanding of code mixing in child language can be obtained through playground talk.

Why do children negotiate on the playground? How does the negotiation process use code mixing? One important answer, perhaps, is that children mix codes to assert certain identities and deny others while interacting with other children.

Code mixing has a direct relationship with language variation. Children use codes that are variants of certain linguistic items. For example, a rural child uses variants from his/her home language and the regional standard (the so-called ‘prestige language’). The same child also uses a variant from the link language (or the language of the village marketplace). There is continuous switching and mixing between these three sets of codes, or three variants of the same linguistic item.

The following sentence was uttered by a Rajbanshi-speaking child from the northern part of Bengal, in India:

  1. EkTa           haS     khacche                  murgiTa           dekhtese

‘One duck is eating and  a hen is watching that’

The sentence above has two verbs. The first verb ‘khacche’ (eats) uses the Bangla verb inflection –cche. The second verb ‘dekhtese’ (watches) uses the inflection –ese, which comes neither from the child’s home language nor from the regional language. The child is thus mixing two sets of codes in a single sentence.


  2. Ek hate noukaTi nise ar arek hate ghuRiTa niye dekhche

‘(He/she) has taken the boat in one hand and a kite in the other hand’

The first verb ‘nise’ (has taken) is a so-called non-prestigious verbal form. The second verb ‘dekhche’ (watching), on the contrary, comes from the regional standard.

Negotiation and assertion of identities through playground talk represent instances from a larger domain. It can be assumed that different sets of codes represent different identities. Thus, when rural children want to identify themselves with a teacher from a city, they tend to use codes from so-called prestigious varieties. When children want to play among close-knit group members, their language use tends to focus on the home language.

The teachers in rural schools (also in urban schools, but I am focusing on rural schools for the present purpose) are often not aware of this default multilingual nature of the children’s mental grammar. The teacher mostly assumes that children primarily use the regional standard and their home variety (which is a less prestigious form and thus cannot be used in schools). The fact that children naturally mix codes very often in day-to-day conversation is not considered by many teachers. So, teachers do not utilize the multilingual codes for classroom tasks.

Apart from that, many teachers hold that children should always use one language in the classroom. There is a misconception that mixing codes or utilizing multilingual codes can be cognitively ‘bad’ for children. According to Perez and Nordlande (2004): ‘when children switch between or mix their two languages, it may seem that the children do not have good skills in their either language’. But Cummins (2008) has mentioned that multilingual children are cognitively more demanding. It has been found that children naturally tap linguistic resources, using rules and vocabulary from both languages (Genesee, Paradis and Crago, 2004). Ironically, using multilingual codes or children’s mixed code utterances is not considered a doable task for the regular classroom.

Children’s code-mixed utterances can be used as a classroom resource. Recorded peer talk narratives comprising different codes can be used to design activities based on various skills, e.g. listening to the text and answering or discussing. Teachers can design tasks around spontaneous storytelling and retelling, describing an event, and pretend play. Theatre activities using code mixing can also be done by allowing children to create dialogues using code mixed grammar.

The use of children’s default code mixed constructions in the classroom has benefits. As the actual utterances of children are the target texts for various classroom uses, there is no fear of an ‘ideal’ text being imposed in these situations. In other words, using the code mixed grammar or default grammar of children in the classroom can also lead to a joyful learning experience for the children.

How Do We Characterize Code-mixing?

Gayatri Bhat, Microsoft Research India

If you are a frequent reader of this blog, you have a fair idea of what code-mixing is. In case you aren’t, it is the practice of going back and forth between two languages in the course of just one ek hi conversation, as jaise I’m doing right now abhi.

Here’s a curious thing about code mixing. Most people seem to agree that you cannot arbitrarily alternate between languages while uttering a sentence. For instance, if you speak both Hindi and English with a co-worker, you might tell him,

Office aane ke raste main I fell into a basket of machhli.

(On the way to office, I fell into a basket of fish.)

But you definitely will not say –

Office aane ke on the way I giri into a basket of fish.

It just sounds odd.

So, we might say that there are rules for code-mixing. In that case, what are they? Must code-mixers know all the rules? People who code-mix usually do so easily, without speaking slowly so that they can decide when to switch languages and definitely without trying to check whether they’re sticking to the rules. It turns out that unlike, say, writing sonnets, code-mixing is one of those things you can accomplish without consciously knowing the rules you’re using to do it.

There are people, though, who are still trying to figure out the rules for code-switching: some because they’re just curious, others because they’re trying to teach computers how to participate in a code-mixed conversation. (Machines don’t seem to think code-mixing is any easier than writing sonnets. Tougher, perhaps.) The frustrating bit is that nobody seems to be coming up with the correct rules. For every rule that’s made, there’s a perfectly good code-mixed sentence that violates it.

One major dispute is regarding the roles of the two (or more, but for now, let’s take two) languages being mixed. Some say that one language is in charge and only lets the other peek in here and there, while others maintain that the two languages are equal partners. This is an important debate, because it determines what sort of rules we’re looking for.

Consider the first alternative – Every sentence is originally in a single language (the superhero, or the matrix language). While code-mixing, we essentially pull out clumps of one or more words from this sentence and plug in fragments from the other language (the sidekick, or the embedded language). A fragment might have fewer or more words than the clump it replaces, and might be ordered differently, but always conveys the same information as the original clump. One may not, of course, pull out bits of these sidekick-clumps and replace them with hero-clumps. The catch, though, is that one cannot do this exercise with any group of words one fancies. Take, for instance, the sentence –

Mere kurte pe maine doodh gira diya.

(I spilt milk on my kurta.)

English-Hindi code-mixers might swap ‘mere kurte pe’ out in favour of its English counterpart –

On my kurta maine doodh gira diya.

However, one will not do this with ‘pe’ to say –

Mere kurta on maine doodh gira diya.

In this paradigm, the matrix-embedding model, the ‘rules’ for code-switching would indicate what sorts of word-groups one can swap out. The example above illustrates a couple of rules suggested in this paper, which say that it is alright to ‘swap’ or ‘embed’ a noun phrase (‘mere kurte pe’), but not a lone postposition (‘pe’). We should note here that not being able to swap postpositions does not mean that you will never encounter a Hindi postposition in an English-hero sentence. It only means that any Hindi postposition in the sentence was swapped in as part of a particular sort of group, perhaps a noun phrase.
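To make the swapping idea concrete, here is a minimal sketch in Python. It assumes the sentence has already been split by hand into clumps, each labelled with a kind; the `Constituent` class, the `embed` function and the `"other"` label are hypothetical names invented for this illustration, and the only rules encoded are the two mentioned above (a noun phrase may be embedded, a lone postposition may not).

```python
from dataclasses import dataclass

# Kinds of clump that may be swapped for their embedded-language counterpart.
# Only the noun-phrase rule comes from the discussion above; a fuller rule set
# would list more kinds here.
EMBEDDABLE = {"noun_phrase"}


@dataclass
class Constituent:
    matrix: str    # wording in the matrix (hero) language
    embedded: str  # equivalent wording in the embedded (sidekick) language
    kind: str      # hand-assigned label, e.g. "noun_phrase" or "postposition"


def embed(sentence, index):
    """Swap the clump at `index` into the embedded language, if allowed."""
    target = sentence[index]
    if target.kind not in EMBEDDABLE:
        raise ValueError(f"cannot embed a lone {target.kind}: {target.matrix!r}")
    return " ".join(
        c.embedded if i == index else c.matrix for i, c in enumerate(sentence)
    )


# Coarse split: the whole of 'mere kurte pe' is one swappable clump.
coarse = [
    Constituent("Mere kurte pe", "On my kurta", "noun_phrase"),
    Constituent("maine doodh gira diya", "I spilt milk", "other"),
]

# Fine split: 'pe' stands alone as a postposition.
fine = [
    Constituent("Mere kurte", "My kurta", "other"),
    Constituent("pe", "on", "postposition"),
    Constituent("maine doodh gira diya", "I spilt milk", "other"),
]

print(embed(coarse, 0))  # On my kurta maine doodh gira diya

try:
    embed(fine, 1)
except ValueError as err:
    print("rejected:", err)
```

The hard part, of course, is hidden in the hand labelling: deciding which clumps exist and what kind each one is happens before this code ever runs.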

The other idea, which is based on both languages being equal partners, goes like this – To start off with, you have two copies of the same sentence, one in each of two languages. In order to code-mix, you start off with a slice of one of these sentences. Now place a slice of the other sentence next to it. Now another of the first. And so on, until you’ve got a code-mixed sentence that says the same thing as either of the initial single-language sentences.

A simple example in Hindi and English again. You’ve got these two –

Agar main kahoon, mujhe tumse mohobat hai, meri bas yehi chaahat hai, toh kya kahogi?

If I say I am in love with you, that this is my only wish, then what will you say?

We slice and layer to come up with –

Agar main kahoon, I am in love with you, meri bas yehi chaahat hai, toh what will you say?
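The slice-and-layer step can be sketched in a few lines of Python. This assumes the two sentences have already been cut into aligned fragment pairs by hand (real alignments are rarely this tidy), and `slice_and_layer` is a hypothetical name for this illustration. It only picks one side of each pair; it makes no attempt to check that the result sounds grammatical.

```python
def slice_and_layer(aligned, choices):
    """Build a code-mixed sentence from aligned fragment pairs.

    aligned: list of (hindi_fragment, english_fragment) pairs, in sentence order.
    choices: one index per pair (0 = take the Hindi side, 1 = take the English side).
    """
    if len(choices) != len(aligned):
        raise ValueError("need exactly one language choice per fragment")
    # Taking exactly one side of each pair, in order, means no fragment can
    # appear twice and neighbouring fragments stay next to each other.
    return " ".join(pair[lang] for pair, lang in zip(aligned, choices))


# Hand-made alignment of the two example sentences.
aligned = [
    ("Agar main kahoon,", "If I say"),
    ("mujhe tumse mohobat hai,", "I am in love with you,"),
    ("meri bas yehi chaahat hai,", "that this is my only wish,"),
    ("toh", "then"),
    ("kya kahogi?", "what will you say?"),
]

mixed = slice_and_layer(aligned, [0, 1, 0, 0, 1])
print(mixed)
# Agar main kahoon, I am in love with you, meri bas yehi chaahat hai, toh what will you say?
```

Choosing `[0, 0, 0, 0, 0]` or `[1, 1, 1, 1, 1]` gives back the two single-language sentences, which is a quick sanity check on the alignment.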

This model proves a lot trickier to use than the first one. (Check it out here) The ‘rules’ here must ensure that the code-mixed sentence doesn’t include the same fragment twice, once in each language. They also mustn’t allow words that were next to each other in the original sentence to be at opposite ends of the new one, just because we sliced the sentence right between these two words. We need rules to check whether every part of the code-mixed sentence sounds grammatical according to at least one of the two languages, and whether… oh, all sorts of things, far too many things.

Definitely not something one could work out in one’s head while talking at normal speed, right? 😉

Pronunciation Modeling for Code Mixing

Sunayana Sitaram, Microsoft Research India

Have you ever wanted to have your texts and WhatsApp messages read out to you? Have you ever used a foreign word while using a system like Cortana, only to find that it does not recognize words that are not in the language it is expecting to hear? Speech Recognition and Synthesis of code-mixed utterances is a very challenging problem. Most speech processing systems are designed to be used with a single language. Moreover, people may pronounce words differently when they are speaking multiple languages at the same time, which may confuse such systems.

Let us look at the problem of reading out a recipe on Nishamadhulika, a popular Hindi recipe website. Here’s the link to the recipe, if you want to take a look: http://nishamadhulika.com/1064-creamy-mushroom-soup-recipe.html

Now as you can see, most of the text in the recipe description is in Hindi, written in the native script (Devanagari). This should be fairly easy for a Hindi Text to Speech system to read out to the user. However, we see some English words in the title, and also numbers in the Roman script to denote quantities.  If you scroll down to the comments, you see that many of the comments are in Hindi, but are not written in native script. Let us look at a couple of comments.

“bahut yammi recipe thi nisha ji ye soup mere baby ne jo ki 15 month ka hai bahut shok se piya hai”

“Nisha ji musroom soup bht acha bna h.mje cooking bhi bht achi lgti h.bus ye btao is e without cream healthi kaise bnaya ja skta h ans jrur dena”

We find that there are many English words in these sentences (“soup”, “yammi”, which is “yummy”, “15 month”, “baby”, “cooking” etc.). We also find that users don’t always follow a standard way of transliterating Hindi into Romanized script. For example, in the first sentence, the word “बहुत” is written as “bahut”, while in the second one, it is shortened to “bht”. Similarly, the word “है” is written as “hai” in the first comment, and only as “h” in the second one!

Now imagine if you are a Text to Speech system and you need to read out such text! You need to identify what languages the words are in, rectify spelling mistakes, expand contractions and then figure out how you are actually going to pronounce the word. This is made even harder by the fact that the training data for most Text to Speech systems today only consists of single language, clean, well-written data.
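As a rough illustration of those first steps, here is a minimal sketch in Python that tags each word with a language, fixes a few spellings and expands two contractions. The word lists are tiny, hand-made and hypothetical, `normalize` is a name invented for this example, and anything not in the English list is simply assumed to be romanized Hindi; a real system would need far richer resources and would still have to work out the pronunciation afterwards.

```python
import re

# Toy, hand-made lexicons covering only the words in the comments above.
ENGLISH_WORDS = {
    "soup", "baby", "month", "cooking", "recipe", "without", "cream",
    "yummy", "mushroom", "healthy",
}
SPELLING_FIXES = {"yammi": "yummy", "musroom": "mushroom", "healthi": "healthy"}
HINDI_EXPANSIONS = {"bht": "bahut", "h": "hai"}


def normalize(text):
    """Return a list of (token, tag) pairs, where tag is 'en', 'hi' or 'num'."""
    tagged = []
    # Split on anything that is not a letter or a digit, so 'h.mje' -> 'h', 'mje'.
    for token in re.findall(r"[A-Za-z]+|\d+", text.lower()):
        if token.isdigit():
            tagged.append((token, "num"))
            continue
        token = SPELLING_FIXES.get(token, token)
        if token in ENGLISH_WORDS:
            tagged.append((token, "en"))
        else:
            # Assumption: everything else is romanized Hindi.
            tagged.append((HINDI_EXPANSIONS.get(token, token), "hi"))
    return tagged


for pair in normalize("Nisha ji musroom soup bht acha bna h"):
    print(pair)
```

Even this toy version shows why the problem is hard: names like “Nisha” get tagged as Hindi by default, and every new shortened spelling needs its own entry.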

In a future post, we will talk more about how we make Text to Speech systems capable of synthesizing mixed language text. Meanwhile, you can read this paper:

‘Speech Synthesis of Code Mixed Text’, Sunayana Sitaram and Alan W Black, in Proceedings of LREC 2016, Portoroz, Slovenia

Word appropriation: To be, or not to be… formalized?

Andrew Cross, Microsoft Research India

English-adapted words, especially around technology use, are increasingly common in other languages. For instance, to tweet in Spanish is often called “tuitear”, taking the original English word and adding a Spanish grammatical ending. Similarly, “le hardware” or “le software” are used in French to describe their rather obvious English counterparts (for other interesting Franglais phrases, check out an amusing list here). Some words, like “computer”, “bus”, or “phone/mobile” are almost universally understood around the world.

While widespread adoption of these words gives a certain uniformity and intelligibility to global conversations, there are those who lament this trend and think it undermines the original language and therefore the culture. Language institutions like the Académie française or the Real Academia Española regularly wrestle with which words to embrace from other languages, versus promoting more local renderings of the same idea (one example the director of the Real Academia Española gives is his preference for “auto-photo” over “selfie”). One clear goal of defining a unified dictionary of a language as geographically dispersed as Spanish, a majority language in over 20 countries, is not only to protect the language from being infiltrated by outside influence, but also to build an identity and cultural unity for speakers and countries that use the language.

And so emerges a funny paradox that is by no means limited to the human interpretation of “language” – on the one hand you have an organic blend and evolution of language through increasing global travel, business, and media. On the other, you have a need or desire to canonize certain aspects of language both for utility (one needs to be understood), and for preserving a certain culture associated with a language. At one extreme, wholesale adoption of outside languages could lead to the ultimate demise of a language. But at the other extreme, the outright rejection of any word deemed “foreign” undermines the very nature of language dynamics.

Which brings the conversation back to technology. The world is much more connected, which presents more opportunities for languages to interact and evolve. With the near-immediate interchange available through the internet, one can expect many of these new blends and linguistic evolutions to brew locally, but make their international debut online. How will this debate play out as words like “selfie” or “friend request” or “email” become increasingly common in online forums? Perhaps more importantly for bodies governing the words that are officially part of a language, can (or should) such standardizing efforts keep up with the rapid spread of foreign words in the new era of the internet?