One of my favorite movie franchises is Jurassic Park. I love the movies (both Jurassic Park and Jurassic World). The first movie introduces how the scientists were able to recreate dinosaur DNA. In the beloved explainer video, Mr. DNA explains that DNA was recovered from prehistoric mosquitoes and then sequenced to find similarities. However, the sequence was incomplete, and the gaps had to be filled with sequences from frogs. I won’t go into the errors in all of this, but suffice it to say that gene sequencing is a very real analysis that is done all the time.
This type of analysis is gaining traction in historical linguistics, where some linguists recognize that words, grammar, phonemes, and so on hold information just like a DNA sequence. For example, the words “tea” and “eat” contain the same phonemes, but when they are arranged differently the information changes completely. The same is true of morphemes: “walk” and “walked” are two similar concepts, but when the morpheme “ed” is added to “walk” a different concept is conveyed.
By using sequencing algorithms we can start to reconstruct past languages. Sequence analysis could be applied to the table below, which compares the Romance-language words for “goat.” The analysis shows that there are more sound shifts in French than in Italian, for example. In French the consonant ‘c’ shifted to ‘ch’ and the vowel ‘a’ shifted to ‘e’.
The Indo-European languages are well studied, and their sound shifts are well documented. What happens when a language is not as well documented? Sequencing can still be valuable because it can still identify patterns. Where data is not known, it simply inserts a gap character to indicate that an unknown change has occurred at a position that otherwise fits the pattern. This tells us that further study is needed on that particular word.
Italian | Spanish | Portuguese | French | Latin |
Capra | Cabra | Cabra | Chèvre | Capra |
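To make the idea of sequencing words more concrete, here is a minimal sketch of a global alignment in the style of the Needleman-Wunsch algorithm, a standard technique borrowed from bioinformatics. It is an illustration under stated assumptions, not a definitive implementation: the function name `align`, the scoring values (match, mismatch, gap), and the choice to treat ‘ch’ as a single unit are all assumptions made for the example, and spelling stands in for real phonemes. The truncated form “cbra” is made up purely to show a gap character and is not an attested word.

```python
def align(a, b, match=1, mismatch=-1, gap=-2, gap_char="-"):
    """Globally align two sequences of sound units (Needleman-Wunsch style).

    `a` and `b` are lists of tokens. The scoring values are illustrative
    assumptions, not values taken from any particular study.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for pairing a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two units
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(gap_char)
            i -= 1
        else:
            out_a.append(gap_char)
            out_b.append(b[j - 1])
            j -= 1
    return out_a[::-1], out_b[::-1]


# Latin vs. French from the table, with 'ch' treated as one unit (an assumption).
latin = ["c", "a", "p", "r", "a"]
french = ["ch", "e", "v", "r", "e"]
for x, y in zip(*align(latin, french)):
    print(x, y, "same" if x == y else "shift")

# A made-up truncated form, used only to show where a gap character appears.
print(align(list("cabra"), list("cbra")))
```

Each printed pair lines up one unit of the Latin word with one unit of the French word, so the positions marked as shifts are the candidates for sound changes. In the second call, the gap character marks the position where the shorter form has nothing to pair with.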
Some would think that this is the end and that a manual process is needed to find more answers. However, with the advent of artificial intelligence, it doesn’t have to be. My studies involve using neural networks to fill in the gaps that the sequencing algorithm has produced. This involves a detailed analysis of how phonemes work in that particular language, and applying advanced algorithms and pattern matching to fill in the gaps.