Abstract: Bibliographic databases and early printing: Questions we can’t answer and questions we can’t ask.
In 1932, Ernst Consentius first proposed addressing the question of how many incunable editions have been lost by graphing the number of editions preserved in a few copies or just one and projecting an estimate from that graph. The problem Consentius posed is in fact a variation of a problem found in many academic fields, known in English since 1943 as the “unseen species problem,” although it was not recognized as such until very recently. Using the well-established statistical methods and tools for approaching the unseen species problem, Frank McIntyre and I have recently updated the estimate that we first published in 2010. Depending on the assumptions used, our new estimate is that between 33% (of all editions) and 49% (of codex editions, plus an indeterminate but large number of broadsides) have been lost.
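The intuition behind such estimates can be illustrated with the simplest standard estimator for the unseen species problem, the Chao1 lower bound, which extrapolates from the editions known in only one or two copies. This is only a sketch for illustration; the function name and input format are my own, and the models McIntyre and I actually use are more elaborate.

```python
def chao1_lost_fraction(editions_per_copy_count):
    """Estimate the fraction of unseen (lost) editions from copy counts.

    `editions_per_copy_count` maps k -> the number of editions that
    survive in exactly k copies. The Chao1 lower-bound estimator is
        S_est = S_obs + f1**2 / (2 * f2)
    where f1 and f2 are the numbers of editions known in exactly one
    and exactly two copies.
    """
    s_obs = sum(editions_per_copy_count.values())  # editions we can see
    f1 = editions_per_copy_count.get(1, 0)
    f2 = editions_per_copy_count.get(2, 0)
    if f2 == 0:
        raise ValueError("Chao1 needs at least one edition known in two copies")
    s_est = s_obs + f1 * f1 / (2 * f2)
    return (s_est - s_obs) / s_est  # estimated share of editions lost

# Illustrative (invented) data: 100 editions in one copy, 50 in two,
# 200 in three or more.
print(chao1_lost_fraction({1: 100, 2: 50, 3: 200}))  # ~0.22
```

The estimator is driven almost entirely by the singletons and doubletons, which is why accurate copy counts in the databases matter so much for this kind of work.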
The problem of estimating lost editions exemplifies how data-driven approaches can support book history, but it also illustrates how databases of early printing impose limits on research in the way they structure their records and in the user interfaces by which they make data available. Of the current database projects relevant for early printing in Germany (GW, ISTC, VD16, VD17, and USTC), each has particular advantages in the kinds of data it offers, but also particular disadvantages in how it permits that data to be searched and accessed. Clarity and consistency of formatting would help in some cases. All of the databases could profit from adding information that only one or none of them currently provides, such as leaf counts. User interfaces should reveal more of each database’s fields, rather than making them visible only implicitly through search results. Monolithic imprint lines, particularly those that make use of arcane or archaic terminology, must be replaced by explicit differentiation of printers and publishers.
Of the current databases, the technological advantages of VD16 are often overlooked. Its links from editions to a separate listing of shelf marks make it possible to count copies more simply and accurately than in any other database, and its links from authors’ names to an external authority file of biographical data provide the basis for characterizing the development of printing in the sixteenth century. Most importantly, VD16 provides open access to its MARC-formatted records, allowing an unequaled ease and accuracy when analyzing records of sixteenth-century printing. Many VD16 records lack such basic information as the language of the work, however.
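Open MARC records make gaps like the missing languages mechanically detectable: in MARC 21 bibliographic records, the 008 control field encodes the language as a three-letter code at character positions 35–37. The following is a minimal sketch of such a check; the function names are my own, and real records would of course be read with a MARC parsing library rather than raw strings.

```python
def language_code(field_008):
    """Return the language code from a MARC 21 bibliographic 008 field,
    or None if the positions are blank or fill characters."""
    if len(field_008) < 38:
        return None
    code = field_008[35:38]  # positions 35-37 hold the language code
    return None if code.strip() in ("", "|||") else code

def count_missing_language(fields_008):
    """Count records whose 008 field carries no usable language code."""
    return sum(1 for f in fields_008 if language_code(f) is None)

# Invented sample fields: one with a language code ("ger"), one blank.
with_lang = " " * 35 + "ger" + " " * 2
without_lang = " " * 40
print(count_missing_language([with_lang, without_lang]))  # 1
```

A scholar with access to the full record dump could run exactly this kind of census, which is the practical payoff of VD16’s openness.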
The missing language fields in VD16 provide an example of the challenges faced in attempting to compare bibliographic data across borders or centuries. One approach to this problem, taken by the USTC as a database of databases, is to offer relatively sparse data to users. I suggest a different approach: databases should open their data to contributions from and analysis by scholars all over the world by making their records freely available. Doing so will allow scholars of book history to pursue data-driven approaches to questions in our field.
Friday, September 5, 2014
For the upcoming meeting of the Wolfenbütteler Arbeitskreis für Bibliotheks-, Buch- und Mediengeschichte on the topic of book history and digital humanities, the paper I'm giving will briefly summarize the paper I gave at the "Lost Books" conference in June, then look specifically at the bibliographic databases of early printing that we have today. These databases are invaluable, but their current design makes some kinds of research easy and other kinds quite difficult. Along with charts and graphs, my presentation will look at some specific examples of what can and can't be done at the moment, and offer some suggestions of what might be done in the future.