Abstract: Bibliographic databases and early printing: Questions we can’t answer and questions we can’t ask.
In 1932, Ernst Consentius first proposed addressing the question of how many incunable editions have been lost by graphing the number of editions preserved in a few copies or just one and projecting an estimate based on that graph. The problem Consentius posed is in fact only a variation of a problem that can be found in many academic fields, known in English since 1943 as the “unseen species problem,” although it was not recognized as such until very recently. Using the well-established statistical methods and tools for approaching the unseen species problem, Frank McIntyre and I have recently updated the estimate that we first published in 2010. Depending on the assumptions used, our new estimate is that between 33% (of all editions) and 49% (of codex editions, plus an indeterminate but large number of broadsides) have been lost.
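To make the idea concrete, here is a minimal sketch of one standard textbook tool for the unseen species problem, the Chao1 lower-bound estimator. It is offered only as an illustration: it is not the model behind the estimates reported here, the function names are hypothetical, and the copy counts are invented toy data rather than figures from any database.

```python
from collections import Counter


def chao1_unseen(copies_per_edition):
    """Estimate the number of unseen (lost) editions with Chao1.

    copies_per_edition: one integer per surviving edition, giving the
    number of known copies of that edition.
    """
    freq = Counter(copies_per_edition)
    f1 = freq.get(1, 0)  # editions known from a single copy
    f2 = freq.get(2, 0)  # editions known from exactly two copies
    if f2 > 0:
        return f1 * f1 / (2 * f2)
    # Bias-corrected form, used when there are no two-copy editions.
    return f1 * (f1 - 1) / 2


def lost_share(copies_per_edition):
    """Estimated share of all editions (surviving plus lost) that are lost."""
    observed = len(copies_per_edition)
    unseen = chao1_unseen(copies_per_edition)
    return unseen / (observed + unseen)


# Invented toy data: 10 surviving editions, 4 singletons, 2 doubletons.
toy_counts = [1, 1, 1, 1, 2, 2, 3, 5, 8, 13]
estimated_lost = chao1_unseen(toy_counts)  # 4**2 / (2 * 2) = 4.0
estimated_share = lost_share(toy_counts)   # 4 / (10 + 4)
```

The estimator relies only on the editions known from one or two copies, which is the same part of the distribution that Consentius proposed to extrapolate from.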
The problem of estimating lost editions exemplifies how data-driven approaches can support book history, but it also illustrates how databases of early printing impose limits on research in the way they structure their records and in the user interfaces by which they make data available. Of the current database projects relevant for early printing in Germany (GW, ISTC, VD16, VD17, and USTC), each has particular advantages in the kinds of data it offers, but also particular disadvantages in how it permits that data to be searched and accessed. Clarity and consistency of formatting would help in some cases. All of the databases could profit by adding information that only one or none of them currently provides, such as leaf counts. User interfaces should reveal more of each database’s fields, rather than making them only implicitly visible through search results. Monolithic imprint lines, particularly those that make use of arcane or archaic terminology, must be replaced by explicit differentiation of printers and publishers.
Of the current databases, the technological advantages of VD16 are often overlooked. Its links from editions to a separate listing of shelf marks make it possible to count copies more simply and accurately than in any other database, and its links from authors’ names to an external authority file of biographical data provide the basis for characterizing the development of printing in the sixteenth century. Most importantly, VD16 provides open access to its MARC-formatted records, allowing an unequaled ease and accuracy when analyzing records of sixteenth-century printing. Many VD16 records, however, lack such basic information as the language of the edition.
The missing language fields in VD16 provide an example of the challenges faced in attempting to compare bibliographic data across borders or centuries. One approach to this problem, taken by the USTC as a database of databases, is to offer relatively sparse data to users. I suggest a different approach: Databases should open their data to contributions from and analysis by scholars all over the world by making their records freely available. Doing so will allow scholars of book history to pursue data-driven approaches to questions in our field.
Jonathan Green's research notes on early printing and the language, literature, and culture of medieval and early modern Germany
Showing posts with label USTC. Show all posts
Friday, September 5, 2014
Abstract: "Bibliographic databases and early printing: Questions we can’t answer and questions we can’t ask"
For the upcoming meeting of the Wolfenbütteler Arbeitskreis für Bibliotheks-, Buch- und Mediengeschichte on the topic of book history and digital humanities, my paper will briefly summarize the one I gave at the "Lost Books" conference in June, then look specifically at the bibliographical databases of early printing that we have today. These databases are invaluable, but their current design makes some kinds of research easy and other kinds quite difficult. Along with charts and graphs, my presentation will look at some specific examples of what can and can't be done at the moment, and offer some suggestions of what might be done in the future.
Wednesday, July 23, 2014
All sheets are not created equal
At the recent Lost Books conference in St Andrews, a topic that came up during discussion was "survival of the fattest": Books with more leaves tend to survive in greater numbers of copies than thinner books. The USTC apparently has plans to include the number of sheets used in the production of each edition.
The number of sheets is useful, but not quite the key information that one would hope it would be. As Frank McIntyre and I were preparing our paper, we originally considered sheet counts as a way to enable comparison between formats. Whether you fold a sheet in half for a folio, in four for a quarto, or in eight for an octavo should be of no concern: a sheet is a sheet is a sheet.
Alas, it is not so. When it comes to book survival, how that sheet gets folded matters, as format is still the single most important variable in book survival.
For example, consider books of 8-16 sheets, including folios of 17-32 leaves, quartos of 33-64 leaves, and octavos of 65-128 leaves. Those are thin folios, handy quartos, and thick little octavos, but all composed of the same number of sheets. (Ideally, we would also factor in paper sizes, but that will have to wait for another generation of bibliographic databases.)
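The arithmetic behind those leaf ranges can be sketched as follows. This is a simplified illustration with hypothetical function names: it assumes every sheet is folded whole in a single format (no half-sheets or mixed gatherings), and it reads the leaf ranges above as "more than 8 and at most 16 sheets".

```python
# Leaves produced by folding one sheet in each format.
LEAVES_PER_SHEET = {"folio": 2, "quarto": 4, "octavo": 8}


def sheets(leaves, fmt):
    """Number of sheets needed for an edition of the given leaf count."""
    return leaves / LEAVES_PER_SHEET[fmt]


def in_sheet_band(leaves, fmt, low=8, high=16):
    """True if the edition used more than `low` and at most `high` sheets."""
    return low < sheets(leaves, fmt) <= high


# A 32-leaf folio, a 64-leaf quarto, and a 128-leaf octavo
# all come from the same 16 sheets.
same_paper = [sheets(32, "folio"), sheets(64, "quarto"), sheets(128, "octavo")]
```

Under these assumptions, a folio of 17 leaves and an octavo of 128 leaves fall into the same band, even though one is a slim pamphlet and the other a thick little book.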
If we graph the percentage of incunable editions that survive in a given number of copies, this is what we find:
Despite all having 8-16 sheets, 12% of these folios survive in a single copy, while 20% of the quartos and nearly 30% of the octavos are known by just one copy. Book format informed choices about production, use, and survival more than leaf count did, and in a way that can't be reduced to a simple matter of bulk.
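The percentages above are simple proportions within each format. A minimal sketch of that tabulation follows, assuming each edition is reduced to a (format, number of known copies) pair. The function name and the records are invented for illustration and do not come from any of the databases discussed here.

```python
from collections import defaultdict


def singleton_share(editions):
    """Share of editions in each format that survive in a single copy.

    editions: iterable of (format, known_copies) pairs.
    """
    totals = defaultdict(int)
    singles = defaultdict(int)
    for fmt, copies in editions:
        totals[fmt] += 1
        if copies == 1:
            singles[fmt] += 1
    return {fmt: singles[fmt] / totals[fmt] for fmt in totals}


# Invented toy records, not real survival data.
toy_editions = [
    ("folio", 1), ("folio", 4), ("folio", 9), ("folio", 2),
    ("quarto", 1), ("quarto", 1), ("quarto", 3), ("quarto", 6),
    ("octavo", 1), ("octavo", 2),
]
shares = singleton_share(toy_editions)
```

Restricting the input to editions within one band of sheet counts before calling the function gives a comparison in which the amount of paper is held roughly constant and only the format varies.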
