How Do We Characterize Code-mixing?

Gayatri Bhat, Microsoft Research India

If you are a frequent reader of this blog, you have a fair idea of what code-mixing is. In case you aren’t, it is the practice of going back and forth between two languages in the course of just one ek hi conversation, as jaise I’m doing right now abhi.

Here’s a curious thing about code mixing. Most people seem to agree that you cannot arbitrarily alternate between languages while uttering a sentence. For instance, if you speak both Hindi and English with a co-worker, you might tell him,

Office aane ke raste main I fell into a basket of machhli.

(On the way to office, I fell into a basket of fish.)

But you definitely will not say –

Office aane ke on the way I giri into a basket of fish.

It just sounds odd.

So, we might say that there are rules for code-mixing. In that case, what are they? Must code-mixers know all the rules? People who code-mix usually do so easily, without speaking slowly so that they can decide when to switch languages and definitely without trying to check whether they’re sticking to the rules. It turns out that unlike, say, writing sonnets, code-mixing is one of those things you can accomplish without consciously knowing the rules you’re using to do it.

There are people though, who are still trying to figure out the rules for code-switching, some because they’re just curious, others because they’re trying to teach computers how to participate in a code-mixed conversation (Machines don’t seem to think code-mixing is any easier than writing sonnets. Tougher, perhaps.) The frustrating bit is that nobody seems to be coming up with the correct rules. For every rule that’s made, there’s a perfectly good code-mixed sentence that violates it.

One major dispute is regarding the roles of the two (or more, but for now, let’s take two) languages being mixed. Some say that one language is in charge and only lets the other peek in here and there, while others maintain that the two languages are equal partners. This is an important debate, because it determines what sort of rules we’re looking for.

Consider the first alternative – Every sentence is originally in a single language (the superhero, or the matrix language). While code-mixing, we essentially pull out clumps of one or more words from this sentence and plug in fragments from the other language (the sidekick, or the embedded language). A fragment might have fewer or more words than the clump it replaces, and might be ordered differently, but always conveys the same information as the original clump. One may not, of course, pull out bits of these sidekick-clumps and replace them with hero-clumps. The catch, though, is that one cannot do this exercise with any group of words one fancies. Take, for instance, the sentence –

Mere kurte pe maine doodh gira diya.

(I spilt milk on my kurta.)

English-Hindi code-mixers might swap ‘mere kurte pe’ out in favour of its English counterpart –

On my kurta maine doodh gira diya.

However, one will not do this with ‘pe‘ to say –

Mere kurta on maine doodh gira diya.

In this paradigm, the matrix-embedding model, the ‘rules’ for code-switching would indicate what sorts of word-groups one can swap out. The example above illustrates a couple of rules suggested in this paper, which say that it is alright to ‘swap’ or ’embed’ a noun phrase (‘mere kurte pe’), but not a lone postposition (‘pe‘). We should note here that not being able to swap postpositions does not mean that you will never encounter a Hindi postposition in an English-hero sentence. It only means that any Hindi postposition in the sentence was swapped in as part of a particular sort of group, perhaps a noun phrase.

The other idea, which is based on both languages being equal partners, goes like this – To start off with, you have two copies of the same sentence, one in each of two languages. In order to code-mix, you start off with a slice of one of these sentences. Now place a slice of the other sentence next to it. Now another of the first. And so on, until you’ve got a code-mixed sentence that says the same thing as either of the initial single-language sentences.

A simple example in Hindi and English again. You’ve got these two –

Agar main kahoon, mujhe tumse mohobat hai, meri bas yehi chaahat hai, toh kya kahogi?

If I say I am in love with you, that this is my only wish, then what will you say?

We slice and layer to come up with –

Agar main kahoon, I am in love with you, meri bas yehi chaahat hai, toh what will you say?

This model proves a lot trickier to use than the first one. (Check it out here) The ‘rules’ here must ensure that the code-mixed sentence doesn’t include the same fragment twice, once in each language. They also mustn’t allow words that were next to each other in the original sentence to be at opposite ends of the new one, just because we sliced the sentence right between these two words. We need rules to check whether every part of the code-mixed sentence sounds grammatical according to at least one of the two languages, and whether… oh, all sorts of things, far too many things.

Definitely not something one could work out in one’s head while talking at normal speed, right? 😉


One comment

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s