Saturday, September 14, 2013

Sisters & Cousins & Aunts: Language Families for Fantasy Writers

Last year, I posted Fantasy Languages for Dummies here, where I outlined some of the basic issues to think about when inventing words and names for an imaginary language.  Judging from the number of hits, it seemed to be something that interested a lot of people, and I thought I'd try something a bit more advanced, for those who want a little more out of their fantasy languages.

Most fantasy writers who create a secondary world invent imaginary languages to some extent, even if it's only a handful of names.  Some, of course, stick strictly to real-world languages, but most have something invented.  In many cases, these don't offer any consistent sense of phonology or morphology, but there's usually a hint of it, even if it's only the Burroughs Universal Constant (that, wherever you go in the universe, female names always end in a).

Writers who take a genuine interest in language and naming, though, might put a lot more thought into the matter, creating names that follow similar linguistic forms when they come from the same culture and distinct forms when they don't.  Well and good; but few fantasy writers (unless they happen to be linguistically orientated professors of English from Oxford) seem to consider how their various languages relate to one another.

Languages don't exist in isolation.  Well, OK, some do, like Basque or Burushaski, but they're exceptions.  We'll come to that later.  Most languages, though, are grouped into a hierarchy of families, super-families, super-super-families etc. in structures so similar to biological taxonomy that the same terms are often used, though not consistently.  English, for instance, is a Low German language, belonging to the West Germanic division of the Germanic family, part of the great Indo-European phylum.

What exactly does that mean, though?  Everyone knows that English is essentially a rag-bag language that contains French, Latin, Celtic and Greek, as well as words from almost every part of the world the British Empire ever came into contact with.

There's an easy experiment that can show what the classifications mean.  Well, easy to imagine and explain: not quite so easy to do.  Take an average passage of English (not too scholarly or technical, not too monosyllabic) and list every word in it.  Then look up the origins of those words (a good dictionary would give that) and count how many words derive from each source.

What you're likely to find is that the great majority are either Germanic, whether from Anglo-Saxon or Scandinavian, or Latin, whether directly or via French.  There'll be a smattering of Celtic and Greek, together with odd words from further afield.  The balance of Latin and Germanic will probably be roughly even, maybe with more Latin words. 

So why isn't English counted as a Latin language?

Now repeat the experiment, but counting each word each time it occurs, and you'll find a dramatic change, with the vast majority derived directly from Anglo-Saxon.  This is because the Anglo-Saxon vocabulary of English includes the most common words, repeated over and over: the, and, to, for and so on.
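
For anyone who'd rather see the two counts side by side, here's a minimal Python sketch of the experiment.  The passage is made up for the purpose and the little table of origins is a hand-built assumption standing in for a proper dictionary, so treat it as an illustration, not a real survey.

```python
from collections import Counter

# Toy etymology table: an assumption for illustration only. A real run
# would look each word up in a dictionary that gives word origins.
ORIGINS = {
    "the": "Germanic", "and": "Germanic", "of": "Germanic",
    "have": "Germanic", "a": "Germanic", "in": "Germanic",
    "on": "Germanic", "house": "Germanic",
    "judge": "Latin", "people": "Latin", "village": "Latin",
    "very": "Latin", "simple": "Latin", "justice": "Latin",
    "river": "Latin",
    "system": "Greek",
}

# Made-up sample passage, not taken from any real text.
PASSAGE = (
    "the judge and the people of the village have a very simple "
    "system of justice in the house on the river"
)

def count_origins(passage, origins):
    """Return (counts per distinct word, counts per occurrence) by origin."""
    words = passage.lower().split()
    by_type = Counter(origins[w] for w in set(words))   # each word once
    by_token = Counter(origins[w] for w in words)       # every occurrence
    return by_type, by_token

by_type, by_token = count_origins(PASSAGE, ORIGINS)
print("distinct words:", dict(by_type))
print("every occurrence:", dict(by_token))
```

With this toy passage, the distinct words split 8 Germanic, 7 Latin and 1 Greek, which is roughly even.  Counting every occurrence gives 13 Germanic against 7 Latin and 1 Greek, because the and of keep coming back.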

This is what linguists mean when they classify a language as belonging to a family, and it's easy enough to see why.  Vocabulary changes all the time in a language.  It's not hard to imagine foreign words for, say, house or table becoming fashionable and eventually replacing the original words, but why would anyone create a different word for the?  These words do change, of course, but very slowly and usually only in ways that can be easily followed.  That makes them a breadcrumb trail back to where the language has come from.

Languages drift apart when two groups of speakers don't often interact.  They borrow different foreign words and coin different words for new concepts and, on top of that, their pronunciation often diverges quite radically.  Pronunciation is changing all the time.  It's true what the old people say: youngsters don't speak the way we did at their age.  Of course not; and, over the centuries, it can become completely different, till a linguist comes along and explains how it's really all related.

There's a saying I remember hearing long ago: when comparing languages, vowels count for nothing, and consonants for very little.  That's not quite true, but you have to understand how the different sounds are formed to understand why one can change into another.  One of the most famous sets of changes is known as Grimm's Law (yes, those Grimms; they were primarily linguists), which explains how the Germanic languages differ from other Indo-European groups.  For instance, the English word foot is actually the same word as the Latin equivalent, whose stem is ped (as in pedal).  Under Grimm's Law, p mutates into the related sound f, d into the related sound t, and the vowel just mutates.
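
A regular sound change of this kind behaves rather like a lookup table, which a few lines of Python can sketch.  The table below is a simplified assumption: it adds the other stop-consonant shifts usually listed under Grimm's Law to the p and d mentioned above, treats one letter as one sound, lets Latin spellings stand in for the older consonants, and ignores the conditions and exceptions that real sound laws have.

```python
# Simplified consonant correspondences in the spirit of Grimm's Law.
# Assumption: one letter equals one sound, and vowels are left alone.
GRIMM_SHIFT = {
    "p": "f", "t": "th", "k": "h",   # voiceless stops become fricatives
    "b": "p", "d": "t", "g": "k",    # voiced stops become voiceless
}

def apply_shift(word, shift):
    """Replace each letter that has an entry in `shift`; keep the rest."""
    # Every letter is looked up in the original word, so a d that becomes
    # t is not then shifted again to th.
    return "".join(shift.get(ch, ch) for ch in word)

for stem in ["ped", "pater", "tres"]:
    print(stem, "->", apply_shift(stem, GRIMM_SHIFT))
```

Run on the Latin stems ped, pater and tres, this gives fet, father and thres, which sit recognisably close to foot, father and three once the vowels are allowed to wander.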

It goes further.  During the 1st millennium AD, the Western Germanic dialects divided into High German and Low German (that's not like High Elvish, by the way, just a matter of whether they were spoken on the upper or lower Rhine).  Foot in Low German (which includes English and Dutch) became Fuß in High German, the basis of modern German; the vowel's pronounced much the same.  Similarly, in many cases d again turned into t: door/Tür, deer/Tier etc.

These might seem small changes but, with large numbers of them going on over thousands of years, together with word replacement, languages that started almost the same can change beyond recognition.  In a context without a strong, centralised political or cultural structure, this will take a form where the languages spoken in village A and village B diverge a little, although not too much for the villagers to understand one another.  Similarly, the villagers in B won't have too much trouble understanding the speech in village C, but those in A might struggle a bit.  By the time you reach village Z, there appears to be very little similarity at all.

If, on the other hand, there's a strong state that needs to issue laws and proclamations that will be understood, or if authors and poets are writing works that are understood to be expressions of the entire culture, standard forms will gradually emerge that become national languages.  These often differ little from the language next door, but their speakers like to think of them as different for political or religious reasons: Dutch/Flemish, Serbian/Croatian, Hindi/Urdu, Malay/Indonesian etc.  Curiously, with English and American, which are almost as distinct as some of those pairs, the political separation seems to have fostered a sense of being a common language, instead.

Many of the world's languages are grouped together into widespread super-families — examples include Indo-European, Afro-Asiatic, Altaic, Sino-Tibetan, Austronesian and Niger-Congo, along with many smaller families.  In the New World, the old picture of a dozen or so families has been brought down (though with disagreement from many linguists) to three, with the Amerind family covering all of South and Central America, much of the contiguous US and eastern and central Canada.

The temptation is always to try to link up language families into fewer and larger super-families, but that's not always easy.  Indo-European is a relatively straightforward case, and even that produces controversy and disagreement. 

It's a special case for two reasons.  One is that it's a relatively young family.  Although there are different models for its origin, it's probable that it descends from a group of mutually comprehensible dialects spoken between four and five thousand years ago.  The other is that it includes languages (such as Greek, Latin and Sanskrit) that were extensively written more than half that time ago, and some, such as Hittite, that have left written records from when the family was fairly young.  This makes it quite easy to establish what it was that all these languages are descended from and so identify other members of the family.

On the other hand, many language families are significantly older, and their languages have sometimes only been written down for the first time within the last few centuries.  This makes identifying their relationships far more difficult, especially when they're isolated remnants of families that have largely been replaced by a later wave, a process that's still going on, especially in areas of the world that were colonised by Europeans.  Isolated languages like Basque in the Pyrenees, Burushaski in Kashmir, or the so-called Paleo-Siberian languages may have had no contact for ten thousand years or more with any "relatives" that might survive.

Some languages are more conservative than others (Lithuanian, for instance, is often taken as the closest we can get to how the original Indo-European language might have been) but ultimately they all change, mutating their sounds and replacing their vocabulary with foreign imports.  There's a point beyond which current techniques just aren't good enough to detect whether two languages are related or not.

Besides, associations can be as important as "descent", as we saw with the huge quantity of imported vocabulary in English.  Sometimes these influences are greater, affecting even the structure and the stable words used by linguists.  For a long time, there was a controversy over whether Vietnamese was a Kadai language, like its neighbour Lao, or an Austroasiatic language, like its neighbour Khmer.  Most linguists have plumped for the latter, but the language is very much a hybrid.

Japanese is an even stranger case.  The jury's still out here over whether it's an Altaic language that arrived via Korea or an Austronesian language that arrived via the Philippines.  It displays elements of both.

So it's unlikely that we'll ever know for sure whether all human languages ultimately derive from the same source, or if they arose independently in many parts of the world.  The human brain appears to be hard-wired for language, so the multiple invention theory is quite plausible.  It's likely to come down to when our ancestors started talking.  Until quite recently (the idea appears in Pullman's His Dark Materials trilogy) it was believed that language started with a quantum leap in human intellect as recently as 33,000 years ago, at a time when the species was already spread all over the planet, and that would have favoured the multiple origin of language.

Recently, though, both circumstantial evidence of earlier symbolic thought and physical evidence of a speech-enhancing mutation of the larynx have suggested that humans were using language a great deal earlier, probably at a time when they were confined to a relatively small area of Africa.  Although this doesn't prove the single-origin theory, it makes it much more likely.

Then again, maybe they were all taught by Quenya-speaking elves.  It's possible.

So what bearing does any of this have on writing fantasy?  Unless you're going to actually create a whole raft of languages, and then treat your readers to a lecture about them, does it really matter what families they belong to?

Well, yes and no.  It's very much a background aspect, which the reader's unlikely to notice (though it may be far more noticeable if it's ignored or poorly done) but it can contribute to that seamless gloss of reality that the best fantasy worlds somehow achieve.  World-making always reminds me of a swan: seeing it gliding gracefully and effortlessly across the water gives no idea at all of how its legs are going nineteen to the dozen underwater to achieve that impression.

The sounds and elements that make up names can give away subtle relationships between their countries' languages.  Suppose, for instance, several towns in your main country have names ending in -ket — perhaps this is the equivalent of the English element -ton.  Another country (three or four across, perhaps) has a town whose name ends in -gad.  This could suggest the same kind of sound-shift as those described by Grimm's Law, indicating the two countries have related languages, but not closely related.

It can also affect the difficulty characters encounter when learning to speak foreign languages.  It tends to be easier to learn a language if we can latch onto familiar elements and harder if nothing's alike.  I have a scene in an unpublished novel where a character who's good with languages is trying to learn to speak to the people he's staying among.  He comments that it's easier to learn than many, observing that "Some of the words seem a bit like Kimdyran.  Like, they say duvin for a cow, and we say tovien."  On the other hand, he tries and totally fails later to learn another, very strange language.  Besides giving an element of his character, it helps to define the relationships between languages, and therefore cultures, within his world.
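
If you want a rough way to check that your invented names hint at a relationship, the old saying about vowels and consonants can be turned into a small Python sketch.  It throws the vowels away and lumps together consonants made in roughly the same part of the mouth.  The sound classes and the function names are my own simplifying assumptions, not a linguist's method, and a match only means "could plausibly be related".

```python
# Hypothetical sound classes: consonants made in roughly the same part of
# the mouth share a label. The grouping is a simplifying assumption.
SOUND_CLASS = {
    "p": "P", "b": "P", "f": "P", "v": "P",   # lips
    "t": "T", "d": "T",                       # tongue tip
    "k": "K", "g": "K",                       # back of the mouth
}
VOWELS = set("aeiou")

def skeleton(word):
    """Drop vowels and reduce each remaining consonant to its sound class."""
    return "".join(
        SOUND_CLASS.get(ch, ch.upper())   # unlisted consonants stay as they are
        for ch in word.lower()
        if ch not in VOWELS
    )

def could_be_cognate(a, b):
    """True when two words share the same consonant skeleton."""
    return skeleton(a) == skeleton(b)

print(skeleton("ket"), skeleton("gad"))        # place-name endings
print(could_be_cognate("duvin", "tovien"))     # the two words for cow
print(could_be_cognate("ket", "ton"))          # unrelated endings
```

Here -ket and -gad both reduce to KT, and duvin and tovien both reduce to TPN, so each pair passes, while -ket and -ton don't.  A test this loose will also pass plenty of coincidences, so it's best used as a nudge for your own naming, not as proof of anything.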

On the principle that "messy worlds rule", this doesn't have to be too predictable.  In our world, even before languages like English, Spanish and French went global, some language families were extremely far flung: Austronesian, for instance, is spoken all the way from Madagascar to Easter Island, and is mixed up in places with other families.  On the other hand, bordering countries may be linguistically unconnected, as with Hungary, which forms a Uralic island in a sea of Indo-European languages, most of them Slavonic.

You can perfectly well ignore all this.  Most authors do, I suspect, and often still manage to produce worlds with believable names and cultures.  It can give an extra layer of reality, though, to think about how your languages relate to one another.  And, if you're like me, it's fun.  Which is the main thing, of course.


  1. Cool article. Still fine tuning the feel for language in my own world, so the lesson is helpful.

    "the Burroughs Universal Constant (that, wherever you go in the universe, female names always end in a)" Hadn't heard this term for it before, but it's always been something that makes me roll my eyes when I encounter fantasy or SF worlds where this is true. It's not like all female names end with a, even in English. Molly, Elizabeth, Allison/Alice, Constance/Connie, etc. It's easy to fall into, though. I have to do a search and destroy of feminine names with an "a" at the end in my own, even. At least my protagonist has a different sounding name.

    1. Of course, there's nothing wrong with some names (male or female) ending in a - just not all of them. I have some, but I try to vary it.

      The idea seems to be based on the first declension endings in Greek and Latin, but that's only the most common feminine form, and a lot of Roman male names follow the same pattern (Nerva, Cinna, Agrippa etc). Elsewhere, an a ending is just as likely to be male as female.

  2. Great post, Nyki, thank you!
    Re Japanese, I taught myself a little while my son was learning it for a couple of years. His teacher was Finnish, and there are some identical words in Finnish and Japanese, with not identical but similar meanings.
    Now that would have been some language route!

    1. Thanks. Actually, that's not so far out as it might seem. Finnish is a Uralic language, like Hungarian, and there is a theory (though not considered proven) that Uralic is related to the Altaic languages (Turkic, Mongolian et al) and that's one of the two families Japanese is sometimes considered to belong to. At minimum, both Japanese and Uralic seem to have been influenced by Altaic, so maybe words have migrated between them.