The podcast conversation with John Hilton is engaging and, at times, genuinely thought-provoking. It brings computational language—“word prints,” statistical patterns, intertextual links—into a discussion that is usually framed in theological or historical terms. That alone gives it a certain intellectual appeal. But once you look closely at the methods being described, it becomes clear that the analysis sits in an in-between space: it borrows the vocabulary of quantitative disciplines without fully adopting their methodology. The result is a hybrid approach—part exploratory data analysis, part literary interpretation—that works well for generating interesting observations but struggles when it comes to supporting stronger claims.

What the Conversation Gets Right

At a basic level, the discussion highlights something uncontroversial but important: texts often exhibit internal variation. Different speakers, narrators, or sections can show distinct preferences in vocabulary, phrasing, and emphasis. That idea aligns with the core intuition behind stylometry, where analysts look for consistent linguistic signals that differentiate one author from another. The use of tools like WordCruncher to isolate speakers and count word frequencies is a reasonable starting point. For example, noticing that one speaker disproportionately uses a phrase like “my soul delighteth,” or that another favors “God” over “Lord,” is the kind of pattern that might prompt further investigation. On its own, that doesn’t prove anything about authorship, but it does surface structure that casual reading would miss.
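The counting step itself is easy to sketch. Here is a minimal, hypothetical example of building per-speaker relative word frequencies with Python's standard library; the passages and speaker labels are invented for illustration and are not taken from WordCruncher or from the actual text divisions the podcast describes.

```python
from collections import Counter

# Toy corpus: invented speaker-labeled passages (illustrative only).
passages = {
    "speaker_a": "my soul delighteth in the words my soul delighteth in truth",
    "speaker_b": "the lord is mighty and the lord is just",
}

def word_frequencies(text):
    """Return relative word frequencies for one passage."""
    words = text.lower().split()
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

profiles = {who: word_frequencies(text) for who, text in passages.items()}

# Compare a marker word's relative frequency across speakers.
for who, profile in sorted(profiles.items()):
    print(who, round(profile.get("lord", 0.0), 3))
```

A real analysis would work from much longer passages and normalize for things like passage length and genre, but the basic profile-per-speaker structure is the same.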

The intertextuality discussion is arguably the strongest part of the conversation. Examples like Alma reworking Abinadi’s phrasing, or Samuel the Lamanite shifting prophetic statements from future to present tense, are genuinely interesting from a literary perspective. They suggest a text that is internally referential, where later passages echo or reinterpret earlier ones. Another example mentioned is Moroni’s so-called “curtain call,” where multiple earlier voices appear to be woven together at the end of the text. Observations like these are not trivial—they point to patterns of reuse and transformation that, in other contexts, would be studied systematically. In fact, this is exactly the kind of structure that could be formalized using network science, where relationships between textual units are modeled as a graph.

Where the Methodology Falls Short

The problem is not that these observations are wrong; it’s that the analytical framework stops short of rigor. In strict stylometry, you don’t just identify patterns—you test them. That means defining features in advance, comparing them across controlled corpora, and evaluating whether the observed differences are statistically significant relative to some baseline. For example, if one speaker uses “Lord” more than “God,” you would want to know: how large is that difference, how stable is it across the text, and how often do similar differences appear in texts known to have a single author? Without that context, the observation remains descriptive rather than explanatory.
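One standard way to supply that missing context is a permutation test. The sketch below uses invented counts, not figures measured from the text, and asks how often a “Lord” versus “God” preference as strong as the observed one would arise if both speakers were drawing from the same underlying distribution.

```python
import random

# Hypothetical counts (illustrative numbers, not measured from the text):
# how often each speaker uses "Lord" vs "God".
speaker_a = {"lord": 30, "god": 10}
speaker_b = {"lord": 15, "god": 25}

def lord_share(counts):
    return counts["lord"] / (counts["lord"] + counts["god"])

observed = lord_share(speaker_a) - lord_share(speaker_b)

# Permutation test: pool all tokens, reshuffle them into two groups of the
# original sizes, and count how often a difference this large arises by chance.
pool = (["lord"] * (speaker_a["lord"] + speaker_b["lord"])
        + ["god"] * (speaker_a["god"] + speaker_b["god"]))
n_a = sum(speaker_a.values())

random.seed(0)
trials = 2000
extreme = 0
for _ in range(trials):
    random.shuffle(pool)
    share_a = pool[:n_a].count("lord") / n_a
    share_b = pool[n_a:].count("lord") / (len(pool) - n_a)
    if abs(share_a - share_b) >= abs(observed):
        extreme += 1

p_value = extreme / trials
print(round(p_value, 3))
```

A small p-value here would say only that the difference is unlikely under a single shared distribution; deciding what that implies about authorship still requires comparison to single-author texts with multiple voices.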

The same issue appears in the intertextual analysis. The examples are compelling, but they are selected after the fact. There is no clear rule for what counts as a meaningful overlap, no measurement of how rare a given phrase is, and no comparison to what you would expect by chance in a text with repetitive religious language. Phrases like “and it came to pass” or “repent ye” are so common that overlaps are inevitable. Even more distinctive phrases need to be evaluated against a baseline: how often do similar-length phrases recur in unrelated sections? Without that, it’s difficult to separate signal from noise.
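That chance baseline can be estimated directly. The sketch below uses toy sentences built from formulaic phrasing (invented, not quoted from the text): it counts shared four-word phrases between two sections, then compares that count to what word-shuffled versions of one section produce, which preserves vocabulary while destroying phrase structure.

```python
import random

def ngrams(words, n):
    """All contiguous n-word phrases in a token list, as a set."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_ngrams(a, b, n):
    return ngrams(a, n) & ngrams(b, n)

# Toy sections with repetitive formulaic language (illustrative only).
section_1 = "and it came to pass that the people did repent".split()
section_2 = "and it came to pass that they did not repent".split()

observed = len(shared_ngrams(section_1, section_2, 4))

# Chance baseline: shuffle one section's words and re-measure the overlap.
random.seed(1)
baseline = []
for _ in range(1000):
    shuffled = section_2[:]
    random.shuffle(shuffled)
    baseline.append(len(shared_ngrams(section_1, shuffled, 4)))

expected = sum(baseline) / len(baseline)
print(observed, round(expected, 2))
```

On a real corpus the baseline would need to respect formulaic phrases like “and it came to pass” rather than shuffling every word independently, since those phrases inflate chance overlap; this sketch only shows the shape of the comparison.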

There’s also a recurring logical leap. The conversation often moves from “this pattern exists” to “this would be very difficult to produce intentionally.” That’s a strong claim, and it requires evidence that isn’t really provided. To support it, you would need to show that comparable patterns do not arise in texts produced under simpler conditions—for example, in works by a single author who adopts multiple voices. Studies of authors like Charles Dickens have already shown that significant variation in character voice can emerge within a single work. The question isn’t whether variation exists—it’s how much variation, of what kind, and how it compares to known baselines.

What a Proper Network Analysis Would Look Like

This is where the discussion could have gone much further. Intertextuality, as described in the podcast, is essentially a network problem waiting to be formalized. Imagine constructing a graph where each node represents a speaker or passage, and edges represent shared phrases or thematic overlaps. Those edges could be weighted based on factors like phrase rarity or frequency. Once you have that structure, you can start asking quantitative questions.
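A minimal version of that graph can be built with nothing but the standard library. The speakers, texts, and three-word phrase length below are all invented for illustration; the point is the structure: nodes are speakers, and edge weights sum the rarity of each shared phrase, so a phrase used by fewer speakers counts for more.

```python
from collections import defaultdict
from itertools import combinations

# Toy speaker-labeled texts (illustrative, not actual passage divisions).
speakers = {
    "abinadi": "ye shall be smitten for your iniquities saith the lord",
    "alma":    "ye shall be smitten if ye repent not saith the lord",
    "zeniff":  "we did till the ground and plant seed of every kind",
}

def phrases(text, n=3):
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

# Rarity index: which speakers use each phrase.
phrase_owners = defaultdict(set)
for who, text in speakers.items():
    for p in phrases(text):
        phrase_owners[p].add(who)

# Edge weight = sum over shared phrases of 1 / (number of speakers using it).
graph = defaultdict(float)
for a, b in combinations(speakers, 2):
    shared = phrases(speakers[a]) & phrases(speakers[b])
    for p in shared:
        graph[(a, b)] += 1.0 / len(phrase_owners[p])

for edge, weight in sorted(graph.items()):
    print(edge, round(weight, 2))
```

With real data the rarity weight would come from corpus-wide frequencies rather than a three-speaker toy set, but the resulting weighted graph is exactly the object the quantitative questions below operate on.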

For instance, you could measure clustering: do certain speakers preferentially reuse each other’s language? You could look at centrality: are there specific passages that act as hubs, heavily referenced across the text? You could analyze motif recurrence: do certain patterns of reuse appear more often than expected under random conditions? Most importantly, you could compare the observed network to a null model—a randomized version of the text that preserves basic properties like word frequency but removes intentional structure. If the real network shows significantly stronger clustering or more coherent pathways than the randomized one, then you have evidence that something non-random is happening.
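The null-model comparison can be sketched concretely. Below, total shared-bigram overlap between invented speaker texts is measured, then re-measured after shuffling each text's words, which preserves word frequencies but removes phrase structure. If the observed overlap sits well above the shuffled average, the reuse is not explained by shared vocabulary alone.

```python
import random

# Toy speaker texts (illustrative only).
texts = {
    "a": "behold the lord shall come behold the lord shall judge".split(),
    "b": "behold the lord shall come in power and great glory".split(),
    "c": "my people did labor and did till the earth".split(),
}

def bigrams(words):
    return {tuple(words[i:i + 2]) for i in range(len(words) - 1)}

def total_overlap(corpus):
    """Total shared-bigram count summed over all speaker pairs."""
    names = sorted(corpus)
    total = 0
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            total += len(bigrams(corpus[a]) & bigrams(corpus[b]))
    return total

observed = total_overlap(texts)

# Null model: shuffle each text independently, keeping its word frequencies.
random.seed(2)
null_scores = []
for _ in range(500):
    shuffled = {}
    for name, words in texts.items():
        w = words[:]
        random.shuffle(w)
        shuffled[name] = w
    null_scores.append(total_overlap(shuffled))

null_mean = sum(null_scores) / len(null_scores)
print(observed, round(null_mean, 2))
```

Richer null models exist, for example ones that preserve formulaic phrases or sentence lengths, and the choice of null model matters a great deal for what "non-random" ends up meaning.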

Concrete examples make this clearer. Take the case of Alma and Abinadi. Instead of highlighting a handful of overlapping phrases, you would extract all shared n-grams above a certain length, weight them by rarity, and compute an overall similarity score. Then you would compare that score to similarity scores between unrelated pairs of speakers. If Alma–Abinadi similarity is unusually high relative to the baseline, that’s meaningful. The same approach could be applied to Samuel the Lamanite and earlier prophetic voices, or to Moroni’s concluding passages. In each case, the goal is to move from anecdotal examples to systematic measurement.
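The ranking step can be made concrete with a simple, replaceable similarity score. In the sketch below, the texts are invented stand-ins (Alma is written to deliberately echo Abinadi), and the score is Jaccard similarity over three-word phrases; a real study would use rarity-weighted n-grams and far longer passages, but the comparison of a target pair against all other pairs is the same.

```python
from itertools import combinations

def ngram_set(text, n=3):
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(text_a, text_b, n=3):
    """Jaccard similarity over n-gram sets (a simple, replaceable score)."""
    a, b = ngram_set(text_a, n), ngram_set(text_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Toy speaker texts (illustrative only): "alma" deliberately echoes "abinadi".
texts = {
    "abinadi": "except ye repent ye shall in nowise be saved",
    "alma":    "i say unto you except ye repent ye shall perish",
    "zeniff":  "we pitched our tents and began to till the ground",
    "samuel":  "behold i give unto you a sign of his coming",
}

scores = {pair: similarity(texts[pair[0]], texts[pair[1]])
          for pair in combinations(sorted(texts), 2)}

# Is the Alma-Abinadi score unusually high relative to the other pairs?
target = scores[("abinadi", "alma")]
others = [s for pair, s in scores.items() if pair != ("abinadi", "alma")]
print(round(target, 3), max(others))
```

With enough speaker pairs, the baseline becomes a distribution, and "unusually high" can be stated as a percentile or z-score rather than a visual judgment.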

Another important step would be cross-text comparison. If similar network structures appear in other long, internally consistent texts—religious or otherwise—then the patterns observed here may not be unique. Without that comparison, it’s hard to know whether the Book of Mormon stands out or simply behaves like many complex narratives.

What This Work Is, at Its Best

Even with these limitations, the approach has real value. As a form of computationally assisted reading, it encourages attention to detail and highlights connections that might otherwise go unnoticed. It invites readers to think about voice, repetition, and structure in a more deliberate way. The intertextual examples, in particular, function well as literary insights. They show how a text can build meaning through internal reference, regardless of how that structure came to be.

The key is framing. When presented as a set of intriguing patterns that deepen engagement with the text, the method works. When presented as strong evidence for specific conclusions—especially about authorship or historical origin—it goes beyond what the methodology can support.

This approach occupies a middle ground between quantitative analysis and literary interpretation. Its strengths lie in surfacing patterns, generating hypotheses, and enriching close reading. Its weaknesses stem from the lack of formal testing, the absence of baseline comparisons, and the reliance on selectively chosen examples. The analysis is interesting and sometimes insightful, but it is not, in its current form, conclusive. To reach that level, it would need to adopt the tools it gestures toward—rigorous stylometric methods and fully developed network analysis—and apply them systematically. Until then, it remains best understood not as proof of anything, but as a starting point for more careful and disciplined investigation.


Stylometry and Authorship Analysis: A Review of John Hilton III’s Appearance on Keystone Podcast
