Friday, March 27, 2015

Incunable leaf sizes

Confirmed: The earliest printed books look very much like books. Specifically, the ratio of leaf height to leaf width and the height-width ratio of the type space are precisely what you would expect.

That sounds completely uninteresting, but before making that statement in an article I'm working on, I wanted some actual data. That's the tricky part, however, as most incunable catalogs, and all of the incunable databases that I'm aware of, only record the format - as opposed to manuscript catalogs, which usually record the page dimensions, but not the format. Thanks to a tip from Oliver Duntze, I checked the British Museum incunable catalog. For 23 Mainz codex editions to 1470 recorded in BMC, the average leaf size ratio (width to height) is 1:1.44, while the type space ratio (from 25 editions) is a bit narrower, 1:1.51. There is some variation, but most of these early printed books fall quite close to the mean, as the plot below shows. Leaf size is in red, while type space dimensions are in blue, with linear trend lines added to each.
Fig. 1: Leaf height and width (red) and type space height and width (blue) in Mainz codex editions to 1470.

To compare incunable leaf sizes rather than ratios, the BMC records for Mainz printing to 1470 might not be the best source, as many of those volumes are deluxe folio editions on vellum. Instead I referred to the Bodleian Library incunable catalog, which also provides leaf sizes. The graph below shows the leaf height for 15 folio editions, 26 quarto editions, and 2 octavo editions. More editions would of course be preferable, but since I don't have electronic records to work with, the data have to be entered manually. You can in any case already see the distinct formats: octavo leaf heights appear in red, quartos in gray, and folios in blue.
Fig. 2: Leaf heights (mm) of a selection of folio (blue), quarto (gray), and octavo (red) incunables from the Bodleian Library.

Two things stand out: First, the folios clearly comprise different paper sizes, one with an average height around 290 mm, and another with an average height around 410 mm. Second, small quartos overlap with octavos. It would be interesting to look at more leaf sizes of these smaller formats.

Friday, March 13, 2015

The history of the late medieval book in one boxplot

One of the basic ways to describe the types used for fifteenth-century printed books is the method refined by Konrad Haebler that involves, among other things, measuring the height of twenty lines of type. The height of a typeface affected its legibility, or how far a reader could be from a text and still be able to read it. The height of the type was also significant in relation to the other types used in a book, as a type taller than the one used for the main text often identified titles and other structural paratexts, while a shorter type was typically used for marginal commentary. So what at first glance might sound like a number only interesting to antiquarians turns out to have some interesting implications for the history of reading as a cultural practice.

With the availability of the Typenrepertorium der Wiegendrucke as an electronic resource linked to the Gesamtkatalog der Wiegendrucke, it's now possible to survey type heights systematically, so it should be possible to look at how type heights develop during the fifty years following the invention of printing. What may not be obvious is that we can do something similar for German manuscripts as well, as the Handschriftencensus records the height of the writing space and the number of lines for manuscripts where this can be determined - so we can divide the writing space height by the number of lines, multiply by twenty, and arrive at the "Haebler height" for each manuscript.
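The manuscript calculation is simple enough to sketch in a few lines of Python. This is only an illustration of the arithmetic described above, not the code behind the plots: the function name `haebler_height` and its parameter names are my own, and the sample measurements are made up rather than taken from the Handschriftencensus.

```python
def haebler_height(writing_space_height_mm: float, line_count: int,
                   reference_lines: int = 20) -> float:
    """Return the height in mm that `reference_lines` lines would occupy.

    Divides the writing space height by the number of lines on the page
    and multiplies by twenty (Haebler's reference number of lines).
    """
    if line_count <= 0:
        raise ValueError("line_count must be a positive integer")
    return writing_space_height_mm / line_count * reference_lines


# Hypothetical manuscript: a 200 mm writing space with 40 lines per page
# gives 5 mm per line, or 100 mm for twenty lines.
print(haebler_height(200, 40))
```

Manuscripts for which the writing space or the line count cannot be determined simply have to be left out of a calculation like this.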

The boxplot below summarizes the means and 25th/75th-percentile limits for German vernacular manuscripts between 1351 and 1450 (with approximate dates coerced to a single year), and printed books separated by decade (with the 1450s and 1460s combined due to the small number of editions from the 1450s). While we're measuring what I think are comparable things, they're not precisely the same: the left column is looking at the line height per manuscript, while the other four columns look at the line height per occurrence of a given type - so a type used in five editions over fifteen years will be counted five times over two different decades. This is, I think, the best way to determine what a typical book might look like, with a frequently-used type counted more times than a type that was only used once. (To be completely consistent, we would also need to look only at books printed in Germany rather than all incunables.)
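The per-occurrence counting for the printed books can be sketched roughly as follows. The names `decade_bin` and `heights_by_decade` are hypothetical, the sample years and the 90 mm height are invented for illustration, and putting editions dated 1500 into the 1490s column is my assumption here, not something the data sources dictate.

```python
from collections import defaultdict


def decade_bin(year: int) -> str:
    """Map an edition year to a boxplot column label."""
    if not 1450 <= year <= 1500:
        raise ValueError("year outside the incunable period")
    if year < 1470:
        return "1450s-1460s"  # combined: few editions from the 1450s
    if year >= 1490:
        return "1490s"        # assumption: 1500 is grouped with the 1490s
    return f"{year // 10 * 10}s"


def heights_by_decade(occurrences):
    """Group 20-line heights by decade, one entry per edition.

    `occurrences` is an iterable of (year, height_mm) pairs, one pair for
    every edition in which a type appears, so a type used in five editions
    contributes five values.
    """
    bins = defaultdict(list)
    for year, height_mm in occurrences:
        bins[decade_bin(year)].append(height_mm)
    return dict(bins)


# Hypothetical type measuring 90 mm per 20 lines, used in five editions.
sample = [(1471, 90.0), (1474, 90.0), (1478, 90.0), (1482, 90.0), (1485, 90.0)]
print(heights_by_decade(sample))
```

In this made-up example the one type is counted three times in the 1470s and twice in the 1480s, which is the weighting toward frequently used types described above.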
The next boxplot limits the y axis to make the picture clearer.
What we see here is that the earliest printed books used types that were very similar in height, on average, to the line heights found in manuscripts over the preceding century, while the 1470s form a period of transition between the earliest printed books and the 1480s and 1490s, when noticeably smaller types were preferred. The use of smaller types allowed for the production of less expensive books, with more printed text per unit of paper, but it took a few decades before the technical possibilities of the printing press could reshape readers' expectations for what their books should look like.

Friday, March 6, 2015

One Republic of Learning: Counting Stares

Recently in the New York Times editorial pages - the most prominent platform for short-form opinion writing in the United States and equal to any in the English-speaking world - Armand Marie Leroi, a professor of developmental biology, argued that the humanities, if they are to have a future, must make the transition to a mathematically-based science.

There is much in Leroi's argument that I agree or sympathize with. The digitization projects of the last decade or more truly have changed what is possible in the humanities. We have easy access to a breadth of sources that was entirely unknown just a few decades ago. It is also true that scholars in the humanities sometimes make overly broad statements based on slim evidence, and that we sometimes make assertions with statistical implications without bothering to gather data or test the likelihood of those assertions. As Leroi states, "Digitization breeds numbers; numbers demand statistics." I've beaten this drum a few times myself. At the conceptual if not the computational level, statistical and computational methods are not out of our reach. With less work than it takes to learn Latin, we in the humanities can make these methods our own.

And yet several times while reading the essay, I found myself staring at the text and wondering what Leroi could possibly be thinking. Now that we have a decade of experience with digitization, we can recognize both its promise and its limits. Digitization does not turn "caterpillars into butterflies"; we have seen media change before, and we know that there are both gains and losses. The easy access to facsimile images obscures the difficulty of determining what other pamphlets were bound together as a single volume, for example, an important fact that would have once been obvious to anyone visiting an archive in person. And it is not only scientists who "know that impressions lie"; humanists have been studying representation and memory for a long time.

A telling episode in Leroi's essay involves a hypothetical graduate student who reacts to the argument of a traditional scholar based on textual evidence by downloading texts, running algorithms, applying statistical analysis, and visualizing the results in order to disprove the traditional scholar's point. Now, digital texts are marvelous things, but you have to understand what it is they represent. Is it an autograph? The first edition? The last authorized edition? A critical edition? A transcription of an early, fragmentary manuscript, or a late, complete one? You can postpone some of these questions, but you can't avoid them forever. Textual editing, electronic or not, is difficult, painstaking, and often thankless work. The point is that these downloadable texts don't simply exist; they have been created by people with particular outlooks and specific places and histories, and serious work in the humanities has to be aware of those aspects. And what algorithms should the graduate student run? The coin of the realm in the textual humanities remains close reading, with careful attention to context and levels of meaning. At the moment, the algorithms at our disposal enable only distant reading. It's certainly true that the graduate student and the scholar may end up talking past each other, but that won't do anyone any good (especially the graduate student, who will be fishing for recommendation letters when he or she hits the job market in a few years).

If we do in fact reach the point where the digital humanities expresses its results "not in words, but equations," where the "analog scholar won't even know how to read the results," then the digital humanities will fail. The humanities as academic disciplines have a particular set of guiding questions, and if a would-be contribution to the field does not address any of those questions, or does so in a way that is incomprehensible to practitioners, then it will be ignored. Leroi, a biologist, thinks that the new humanities disciplines will resemble evolutionary biology, with contributions from "biologists, economists, and physicists." While all of these disciplines have useful insights and methods for the humanities, what they do not have is a grasp on the questions that are of primary importance to humanists, or the language humanists use to express their findings. It is furthermore not at all clear that the tools of evolutionary biology, where reproduction is the first imperative of the most basic building blocks of life, should apply to culture, where it is not.

It is not as if we have not been down this path before. There is a history of mathematical approaches to the humanities, and it is a history littered with dead ends. While lurking in the stacks as a graduate student at the University of Illinois, I would regularly come across books published in the 1970s and 1980s, precursors of a sort to the Mittelhochdeutsche Begriffsdatenbank, that attempted to semantically encode various medieval texts so that one could search for not just textual but rather significant semantic collocations. I know of no useful scholarship that ever came out of these efforts. On a happier note, corpus linguistics has been a well-established discipline for several decades - but it has supplemented, not supplanted, other approaches to syntax and morphology. Leroi holds up the "unforgiving terms and journals that scientists read," and yet STEM peer review has not proved to be an effective arbiter of quality in the humanities. Instead, leading STEM journals have regularly published headline-grabbing articles that apply computational and statistical methods to historical linguistics, for example, and failed to recognize in the nonsensical results a mirror image of the Sokal affair. If the basic assumptions of one field (for example, that rates of gene mutation are predictable) simply don't apply to another (linguists really and truly reject glottochronology), then the methods will not be transferable, not because analog scholars are hidebound, but because they have a grounding in their disciplines that their neighbors across the quad simply do not have.