Every language has one. The kind of hot thing that rolls off a native tongue all sweet, but presses into your own ear jagged, curling your hair and making your skin itch. Some kind of clitic that ends a party, a string of morphemes that’s chin music. An optative or an elative come out and you wish you could get the hell out. Even meek language learners feel a savageness when the strangeness comes around.

    How much is lost in translation when we try to process only in English? Perhaps 90% of academic and commercial Natural Language Processing has focused only on English. If you are trying to find broad topics this might not matter, but if you are trying to identify all the subtle (or not so subtle) metaphors, sentiment and emotion, translating into English will often strip away the very phenomena you are most interested in.

    Translation | Literal meaning | Machine translation
    L’étoffe dont sont faits les rêves. | “The material of which are made the dreams” | “The stuff that dreams are made of”
    Dolgok, amikről álmodunk. | “things, about which we dream” | “Things I dream of.”
    Rüyalarin yapildigi maddeden. | “dreams’ were-made material-of” | “Material dreams are made.”
    Translations for “The stuff that dreams are made of” in French, Hungarian and Turkish

    In the examples above, how much of the full impact of “the stuff that dreams are made of” is lost in translation? Only the French machine translation turns it back into the correct English, but we suspect that this is because it knows the famous quote. Imagine the full range of expressions in English that would lose their punch when translated: “bare your heart”, “give up”, “beside yourself”. Then realize that every single one of the world’s languages has an equally rich set of expressions and idioms that cannot be adequately translated, by humans or machines.

    This is why we need intelligent Natural Language Processing that works within each language, not just with translations: it is often the most emotionally charged expressions that cannot be translated.

    For this post we’ll break down this one example, taken from the most famous line from The Maltese Falcon. We choose this among all possible idioms or expressions for reasons close to our hearts: a month or so ago we moved into our new offices five floors above where the author Dashiell Hammett worked as a private eye.

    The Maltese Falcon

    A police detective picks up the Maltese Falcon statue and notes how heavy it is. “What is it?” he asks Sam Spade.

    The, uh, stuff that dreams are made of.

    Let’s take the lid off and see the works. We’re going to use the translations that actually appear in subtitles, courtesy of OpenSubtitles via Jörg Tiedemann’s OPUS corpus. (None of them choose to translate the uh, which is a bit sad since it’s one of the stronger stylistic markers). The machine translations are from a well-known search engine.


    We’ll start with a pretty easy one. French is a broadly spoken language and since it is related to other widely spoken languages like Spanish and Portuguese, odds are that it won’t be all that foreign to you.

    L’étoffe dont sont faits les rêves.

    This is something like “The material of which are made the dreams”.

    The Maltese Falcon in French (Le Faucon Maltais)

    The word étoffe in French means ‘material’. It’s a feminine noun, which you might guess from the final –e (though that’s not really a sure-fire indication). Normally, you could tell based on the article, but since the word starts with a vowel, la is shortened to l’. (The idiom il manque d’étoffe means ‘he lacks personality’, btw.)

    Gender systems are pretty common around the world, not just in Indo-European languages. For example, Bantu languages across Africa have lots of genders, often between 7 and 10. What gender means for language learners and computational linguists is that we have to pay attention to a noun’s classification in order to know how to do stuff with it (like pluralize it) and how to handle agreement with other words like adjectives and verbs. In general, the more genders a language has, the more word forms there are that correspond to what we might want to call “the same” word.

    Let’s press on. The dont is a ‘relative pronoun’ that indicates possession, so it could be translated as ‘of which’ or ‘from which’.

    The verbal ‘are made’ meaning is found in sont faits. The first of those words is the third-person plural present tense of ‘to be’. Faits is from the verb faire, ‘to do’ or ‘to make’. They agree in plurality: if we were talking about the stuff that a dream was made of, we’d have est fait. In language after language, the verbs ‘to be’ and ‘to do’ are painfully irregular. Well, painful for the language learner. If you’re a native English speaker, when was the last time you said I am’ed or he do’ed? Frequency helps you learn (and it helps the form escape the grinding power of regularization).

    Finally, les rêves are ‘the dreams’ (the singular is le rêve). That’s pretty straightforward, so I won’t say anything more about it.


    In Hungarian, the line is a bit more like “things, about which we dream”.

    Dolgok, amikről álmodunk.

    The Maltese Falcon in Hungarian (A máltai sólyom)

    The word for ‘thing’ in Hungarian is dolog. But if you want to pluralize it, you don’t get to just add a letter at the end. Instead, you have to flip some stuff around: dolgok. There are some fun linguistic processes at work here, so let me know in the comments if you’re interested.

    Ami is the way you say ‘which’ and the k in the middle is like the k at the end of dolgok, a marker of plurality. Now, about the ending: Hungarian has a nearly-limitless supply of affixes. You add ről to indicate ‘off’ or ‘about’. Check out this link to go have your mind boggled by the major noun cases: http://www.hungarianreference.com/Nouns/. (A “case” is basically one way that a language might keep track of which words are related to which other words in what kinds of ways—for example, a nominative case marker roughly means something is the subject of the sentence and an accusative roughly means something is the object of a sentence.)

    Most of the case suffixes have two forms. That’s because Hungarian has what’s called “vowel harmony”. Vowel harmony is the phonetic equivalent of “don’t wear stripes with leopard prints”. It means that you need to make the vowel in a suffix match with the noun’s last vowel. But by “match”, I don’t mean “be identical to”. Open up your mouth and say a bunch of vowels a few times—you’ll notice that some of them happen in the front of your mouth and some of them in the back. That’s what matters in Hungarian. Other languages harmonize other things, sometimes at quite some distance (meaning that there are other consonants and vowels that may intervene in between the two things that depend upon each other).
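
    To make the front/back idea concrete, here is a minimal sketch that picks between the two forms of the ‘about’ suffix (–ról and –ről) by looking at a word’s last vowel, which is how harmony is described above. The function names and the vowel sets are assumptions made for illustration. Real Hungarian harmony has exceptions that a last-vowel rule will get wrong (some words with a final i or é still take the back form), so treat this as a toy and not as a Hungarian morphology library.

```python
# Simplified front/back classification of Hungarian vowels (an assumption
# for illustration; it ignores the special behaviour of "neutral" vowels).
BACK_VOWELS = set("aáoóuú")
FRONT_VOWELS = set("eéiíöőüű")


def last_vowel(word):
    """Return the last vowel in the word, or None if it has no vowel."""
    for ch in reversed(word.lower()):
        if ch in BACK_VOWELS or ch in FRONT_VOWELS:
            return ch
    return None


def add_about_suffix(word):
    """Attach -ról after a back vowel and -ről after a front vowel."""
    vowel = last_vowel(word)
    if vowel is None:
        raise ValueError(f"no vowel found in {word!r}")
    suffix = "ról" if vowel in BACK_VOWELS else "ről"
    return word + suffix


print(add_about_suffix("amik"))   # amikről
print(add_about_suffix("dolog"))  # dologról
```

    The first call reproduces the amikről from the subtitle: the last vowel of amik is i, a front vowel, so the front form –ről is chosen.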

    The verb álmodik is ‘to have a dream’, but you have to conjugate it. The form álmodunk is for ‘we dream’…except that Hungarians like to mess with your mind so there are actually two different ways to say ‘we dream’. The –unk ending indicates that there’s no definite object that the verb is about. Otherwise, if you wanted to say we dreamed some particular dream, then you’d need to use the –juk ending.


    In Turkish, the line is something like “dreams’ were-made material-of”.

    Rüyalarin yapildigi maddeden.

    The Maltese Falcon (if you know of a Turkish movie poster let me know)

    For this, I’ll break it down word by word:

    • Rüyalarin
      • Rüya is ‘dream’
      • –lar is the plural
      • –in means that the dream owns something (‘defined genitive case’)
    • yapildigi
      • yap is a root of the base verb (yapmak), ‘to make, to do’
      • –il is the passive
      • –di is the past tense
      • –gi…oh, gi. I’m going to talk about gi in a moment.
    • maddeden
      • madde is ‘material, substance’
      • –den is ‘of’ (though it is also sometimes ‘to move away from, by, via’)
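
    The breakdown above can be sketched as a tiny suffix stripper. It uses only the roots and suffixes listed in the bullets, spelled without Turkish characters exactly as they appear in this post, and the glosses are the ones given above. The names ROOTS, SUFFIXES, strip_suffixes and segment are hypothetical. A real Turkish analyzer would also need vowel harmony, consonant changes and a far larger lexicon, and other analyses of these suffixes are possible.

```python
# Roots and suffixes taken from the word-by-word breakdown in this post.
ROOTS = {"rüya": "dream", "yap": "make/do", "madde": "material"}
SUFFIXES = {
    "lar": "plural",
    "in": "genitive",
    "il": "passive",
    "di": "past",
    "gi": "verb-to-adjective",
    "den": "of/from",
}


def strip_suffixes(rest):
    """Split the remainder of a word into known suffixes, left to right.

    Returns a list of (suffix, gloss) pairs, or None if some part of the
    remainder does not match any known suffix.
    """
    parts = []
    while rest:
        for suffix, gloss in SUFFIXES.items():
            if rest.startswith(suffix):
                parts.append((suffix, gloss))
                rest = rest[len(suffix):]
                break
        else:
            return None
    return parts


def segment(word):
    """Return [(morpheme, gloss), ...] for a word, or None if unparseable."""
    word = word.lower()
    for root, gloss in ROOTS.items():
        if word.startswith(root):
            suffixes = strip_suffixes(word[len(root):])
            if suffixes is not None:
                return [(root, gloss)] + suffixes
    return None


for word in "Rüyalarin yapildigi maddeden".split():
    print(word, "->", segment(word))
```

    Each of the three words comes back as a root followed by its suffixes in the same order as the bullets, and a word outside the tiny lexicon returns None.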

    Okay, you know how you hear people using impact as a verb (it used to be just a noun)? Languages have all sorts of ways to change parts of speech. Sometimes you just take a word and leave it as-is (like impact), but other processes work, too (noun-ify is a verb from a noun, noun-y is an adjective from a noun, nouniness is a noun from an adjective from a noun).

    In Turkish, the –gi turns a verb into an adjective. In this case, that lets it get tied to a noun. You can’t just use –gi willy-nilly, though. You can only use it with some conjugations. (Fwiw, if you drop the noun that the adjectivized verb is modifying, then you can use it as a noun instead and keep on appending affixes.)

    Turkish is also a great example of why ‘keyword’ based Natural Language Processing is not sufficient in many languages, as most of the action is happening within the words, but we’ll leave more about suffixes and prefixes for another post.
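
    A small illustration of the keyword problem, using the subtitle line from this post: looking for the dictionary form rüya as a whole token finds nothing, because the word only ever shows up with suffixes attached. The helper names are made up for this sketch, and prefix matching is only a crude stand-in for real morphological analysis (it would also match unrelated words that happen to start with the same letters).

```python
def exact_keyword_hit(keyword, text):
    """True if the keyword appears as a whole whitespace-separated token."""
    return keyword in text.lower().split()


def stem_hit(stem, text):
    """True if any token starts with the stem (a crude stand-in for stemming)."""
    return any(token.startswith(stem) for token in text.lower().split())


line = "Rüyalarin yapildigi maddeden"

print(exact_keyword_hit("rüya", line))  # False
print(stem_hit("rüya", line))           # True
```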

    One of the reasons this Turkish translation is good is that it evokes the standard Turkish translation of Shakespeare. Part of what you might hear in Sam Spade’s line is from The Tempest: “Leave not a rack behind. We are such stuff / As dreams are made on; and our little life / Is rounded with a sleep”. In Turkish, the middle part is “ruyalarin yapildigi maddeden yapilmayiz biz…”, so the subtitle gets to evoke it for Turkish speakers, too.

    Now that you’ve been vexed on your tongue and troubled in your brain, we’ll sign off. Go still your beating mind.

    – Tyler Schnoebelen (@TSchnoebelen)

    ps–Thanks very much to Bence Farkas and Ali Alpay for their help!

    pps–The line we’ve worked on here is probably one of the most famous in film noir…but it actually doesn’t appear in Dashiell Hammett’s story.

      Tyler Schnoebelen

      Tyler finds the patterns in data that make it meaningful. He has ten years of experience in UX design/research in Silicon Valley and a PhD from Stanford. His work there included experimental psycholinguistics, fieldwork on endangered languages, and a dissertation on emotion (he got his BA at Yale studying playwriting and poetry). His insights on social media have been featured in The New York Times Magazine, The Boston Globe, The Atlantic, and NPR. He is incorrigible.

      15 thoughts on “The Multilingual Falcon”

      1. “il manqué d’étoffe” is incorrect. You mean “il manquait d’étoffe”, or “il manque d’étoffe”, don’t you?

      2. As a Turkish speaker, what I remember from Turkish grammar classes is that we consider -digi as a separate suffix that noun-ifies the verb before it. As far as I can remember, you cannot use -gi with any other conjugation. Hence, the suffix cannot be broken further than -digi.
        Proof of Concept:
        are all meaningless in Turkish, at the very least in daily life. I may be incorrect, but I just wanted to share my opinion as accurately as I can recall.

      3. Charles Wells says:

        Shouldn’t the “i”s in “yapildigi” be dotless? I spent 18 months in Turkey 55 years ago so I am not an expert.

      4. Thank you for the translations and explanations.

        About “étoffe” in “il manque d’étoffe” ‘he lacks personality’: it is difficult to gloss or translate “étoffe” literally in a metaphorical context. The verb “étoffer” might give additional clues as it means something like “to fill out” or “stretch out”. For instance, a text consisting of one paragraph (like an abstract) would need to be “étoffé” with argumentation and examples in order to make it into an article. So a person lacking “étoffe” is probably not capable of filling a role more demanding than the one he is in now. Another phrase with this word is “il a (or “il n’a pas”) l’étoffe d’un (président, général, etc)” ‘he has (or “doesn’t have”) the stuff (presidents/generals etc) are made of’, the required breadth and depth of a (presidential, etc) personality.

        Another note: I have always felt that the French translation of the title, “Le faucon maltais”, is not right. I would have used “Le faucon de Malte”, especially since the jewelled bird is supposed to originate with “Les chevaliers de Malte”, the Knights of Malta (a phrase which is a calque of an Italian or French original). “Le faucon maltais” seems to refer to an actual bird species typical of Malta.

      5. That’s correct. But then again, the correct form would’ve been “Rüyaların yapıldığı maddeden.” if they had used Turkish characters.

      6. In the German version, “Ein Stoff, aus dem man Träume macht.”
        the literal translation:
        “a material [from/out of] which one dreams makes.”

      7. Stumbled upon this via reddit, thank you, really lovely.

        I thought that there was a little problem with the Turkish translation here. That is, if the dialogue was like «What is this? / The, uh, stuff that dreams are made of.» the correct and exactly corresponding translation would be «Bu nedir? / Eee, rüyaların yapıldığı madde».

        Also, this is the correct way to break down the sentence:

        * rüya -> noun of Arabic origin
        + -ler -> affix of plurality, becomes -lar because of vowel harmony
        + -in -> affix of possession in a noun phrase

        * yap- -> verb stem, from «yapmak»
        + -ıl -> affix of passive voice, the action was carried on the subject
        + -dık/ğ -> sort of a «past participle» [1]
        + -i -> affix of complement noun in a noun phrase [2]

        * madde -> noun of Arabic origin in nominative form. Nominative implies definiteness in Turkish.
        + -dir -> from auxiliary verb -imek, allows using nouns as verbs: Maddedir -> It is the stuff.

        [1] like «bildik (known)», «söylenmedik laf kalmadı (we are out of words)». «k» morphs into «ğ (voiced velar fricative)» before an affix beginning with a vowel

        [2] -i becomes -ı because of vowel harmony.
