The Voynich Manuscript is one of the most famous mysteries in the world. It’s a book from the 15th century, but no one has been able to identify what language it’s written in, or even what alphabet it uses. So many crazy theories have been proposed that one writer invented the Voynich Bullshit Index to score them. Of course, I haven’t solved the mystery, but I’ve spent a few weeks thinking about it over the last couple of years.
After weighing the evidence, it seems extremely likely that the Voynich is simply written in an unknown natural language, rather than a cipher, a code, or more exotic options listed by Wikipedia. The first major reason is that Voynich writing passes most known statistical tests for natural languages, such as Zipf’s Law. Since Zipf’s Law wasn’t discovered until the 20th century, it would have been impossible to deliberately fake. The second reason is prior probabilities: the number of manuscripts written in languages that now can’t be read (such as Etruscan or Linear A) is pretty large, while the number of manuscripts written entirely in ciphers is very small. The third major reason is one of information asymmetry. In 2015, cryptography is vitally important to the world economy; hence, we know far more about cryptography (and associated disciplines like steganography) than the ancients did. On the other hand, since lost languages are unimportant economically, very little is known about many of them; what information exists is usually locked up in obscure manuscripts, not available online; while a native speaker would obviously know their language well. The Voynich is very mysterious to us, but probably not mysterious to its writer, who (from handwriting analysis) is known to have written it quickly and fluidly; hence, the information asymmetry matches a natural language and does not match a code. This paper summarizes some of this evidence, and concludes from machine learning analysis that the Voynich is most likely an abjad, an alphabet without vowels (like Arabic or Hebrew).
My own best guess is that the Voynich is written in the Cuman language. To the best of my and Nick Pelling’s knowledge, no one has ever proposed this theory, which seems shocking considering the sheer extent of Voynich hypotheses (this long page just lists some of the more popular ones). Cuman is, by the standards of lost languages, quite well-understood; it has a Wikipedia page, substantial surviving literature, and it’s very clearly related to modern languages like Kazakh. It seems like a strong indicator that this class of theories is under-explored. If nobody’s thought of Cuman before, there are surely many other less-known languages that haven’t been looked at either.
Evidence in favor of Cuman:
– Cuman, unlike almost every language spoken in Europe, is non-Indo-European. (It’s related to Turkish, Mongolian, and Kazakh.) This would explain the Voynich’s lack of typical Indo-European language features.
– Cuman (like Turkish and Manchu, another proposed language) employs vowel harmony, which several have observed in the Voynich glyphs.
– The Voynich’s first known owner was Emperor Rudolf II, who was also king of Hungary. Cuman was spoken widely in Hungary in the early 1400s (per Wikipedia, the last known Cuman speakers died in Hungary in the 18th century). The Golden Horde used Cuman extensively, and it was spoken widely in the area they conquered (eastern Europe through central Asia) during the 14th century. Hence, it makes geographical sense in the relevant time period.
– Like many central Asian languages of the time, Cuman appears to have lacked a written script. We know the 14th century Church wrote dictionaries for translating it into Latin, to help convert the Cumans to Catholicism. Hence, it makes sense that a script would be invented for it.
– Computer analysis of the letter frequency distribution of the Voynich shows the best match to Voynichese is Moldavian (Moldavia, next to Hungary, was also home to Cumans), followed by two other languages of the former Golden Horde area (eastern Europe and what is now Kazakhstan).