Research Fragments: GW


Friday, March 13, 2015

The history of the late medieval book in one boxplot

One of the basic ways to describe the types used for fifteenth-century printed books is the method refined by Konrad Haebler that involves, among other things, measuring the height of twenty lines of type. The height of a typeface affected its legibility, or how far a reader could be from a text and still be able to read it. The height of the type was also significant in relation to the other types used in a book, as a type taller than the one used for the main text often identified titles and other structural paratexts, while a shorter type was typically used for marginal commentary. So what at first glance might sound like a number of interest only to antiquarians turns out to have real implications for the history of reading as a cultural practice.

With the availability of the Typenrepertorium der Wiegendrucke as an electronic resource linked to the Gesamtkatalog der Wiegendrucke, it's now possible to survey type heights systematically and trace how they developed during the fifty years following the invention of printing. What may not be obvious is that we can do something similar for German manuscripts as well, as the Handschriftencensus records the height of the writing space and the number of lines for manuscripts where these can be determined. We can divide the writing space height by the number of lines, multiply by twenty, and arrive at the "Haebler height" for each manuscript.
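The conversion for manuscripts is simple arithmetic. Here is a minimal Perl sketch, assuming the writing-space height is recorded in millimetres (the unit used for type measurements); the manuscript in the example is hypothetical:

```perl
use strict;
use warnings;

# Convert a manuscript's writing-space height and line count into a
# "Haebler height": the height of twenty lines, comparable to type measurements.
# Assumes the writing-space height is given in millimetres.
sub haebler_height {
    my ($writing_space_mm, $lines) = @_;
    die "Line count must be positive\n" unless $lines > 0;
    return $writing_space_mm / $lines * 20;
}

# Hypothetical manuscript: 200 mm writing space, 32 lines per page
printf "%.1f mm\n", haebler_height(200, 32);    # 200 / 32 * 20 = 125.0 mm
```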

The boxplot below summarizes the means and 25th/75th-percentile limits for German vernacular manuscripts between 1351 and 1450 (with approximate dates coerced to a single year), and for printed books separated by decade (with the 1450s and 1460s combined due to the small number of editions from the 1450s). While we're measuring what I think are comparable things, they're not precisely the same: the left column looks at the line height per manuscript, while the other four columns look at the line height per occurrence of a given type, so a type used in five editions over fifteen years will be counted five times across two different decades. This is, I think, the best way to determine what a typical book might look like, with a frequently used type counted more times than a type that was only used once. (To be completely consistent, we would also need to look only at books printed in Germany rather than all incunables.)
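The grouping and summary statistics described above could be sketched in Perl roughly as follows. This is not the script behind the boxplot: the record layout, the sample values, and the percentile convention (linear interpolation between ranks; plotting packages differ) are all assumptions for illustration. Each edition in which a type appears contributes one observation, so frequently used types weigh more heavily.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Map a year to the bins used above, folding the 1450s into the 1460s.
sub decade_bin {
    my ($year) = @_;
    my $decade = int($year / 10) * 10;
    return $decade <= 1460 ? '1450-1469' : "$decade-" . ($decade + 9);
}

# Percentile by linear interpolation between ranks (one convention among several).
sub percentile {
    my ($p, @values) = @_;
    my @sorted = sort { $a <=> $b } @values;
    my $pos  = $#sorted * $p;
    my $low  = int($pos);
    my $high = $low < $#sorted ? $low + 1 : $low;
    return $sorted[$low] + ($pos - $low) * ($sorted[$high] - $sorted[$low]);
}

# Invented observations: [year, 20-line height in mm], one per edition in
# which a type occurs, so a frequently used type is counted repeatedly.
my @occurrences = ([1457, 130], [1466, 118], [1472, 110], [1478, 96],
                   [1484, 84], [1488, 80], [1493, 78], [1497, 74]);

my %by_bin;
push @{ $by_bin{ decade_bin($_->[0]) } }, $_->[1] for @occurrences;

for my $bin (sort keys %by_bin) {
    my @h = @{ $by_bin{$bin} };
    printf "%s: mean %.1f, 25th %.1f, 75th %.1f\n", $bin,
        sum(@h) / @h, percentile(0.25, @h), percentile(0.75, @h);
}
```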

The next boxplot limits the y axis to make the picture clearer.

What we see here is that the earliest printed books used types that were very similar in height, on average, to the line heights found in manuscripts over the preceding century, while the 1470s form a period of transition between the earliest printed books and the 1480s and 1490s, when noticeably smaller types were preferred. The use of smaller types allowed for the production of less expensive books, with more printed text per unit of paper, but it took a few decades before the technical possibilities of the printing press could reshape readers' expectations of what their books should look like.

Friday, February 20, 2015

How atypical are the editions in Eric White's census of print runs?

In the introduction to his census of known fifteenth-century print runs, Eric White cautions against taking his results as representative for all incunables:

As the census will make immediately apparent, a large percentage of editions for which we know the print runs were produced to fulfill institutional functions.... Remunerative and relatively risk-free for printers, the original commissions for projects such as these tended to end up in surviving archives, and they tended to afford very large editions. It should be noted, therefore, that the print runs known from such institutional commissions do not represent a normative cross-section of fifteenth-century press production, but rather a selection of large scale projects carried out with institutional funding and pressure to produce. As a group they almost certainly reflect higher-than-average print runs.... Moreover, the majority of the recorded print runs reflect the output not of the ‘average’ printing shop, but rather that of a few exceptionally successful publishers who received commissions from well-funded institutions. It is worth remembering that a documented print run may not be a representative print run.

White's characterization of his sample is correct. Compared to all recorded incunables in the ISTC, folios are much more prevalent in the print run census, while quartos are underrepresented, and broadsides do not appear at all.

Comparison of format distribution

For each format, the books are also substantially longer, with the average number of leaves 60-100% higher than for the ISTC as a whole. (NB: Averages can be a misleading way to describe the distribution of leaf counts, but they give a correct impression in this case.)

Comparison of average leaf count by format

White's suggestion that the sample of known print runs enjoyed a better survival rate than other incunables is also correct, with an average number of surviving copies 20-65% higher than what one finds for the ISTC as a whole. (NB: Averages can be even more misleading for describing survival rates.)

Comparison of average surviving copies by format
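Comparisons like these reduce to per-format means and their ratio. A sketch, assuming each edition is a hash with hypothetical `format`, `leaves`, and `copies` fields (neither the ISTC nor White's census exports data in this form); the records in the example are invented:

```perl
use strict;
use warnings;

# Mean of one numeric field for each format in a list of edition records.
sub mean_by_format {
    my ($field, @editions) = @_;
    my (%total, %count);
    for my $ed (@editions) {
        $total{ $ed->{format} } += $ed->{$field};
        $count{ $ed->{format} }++;
    }
    my %mean = map { $_ => $total{$_} / $count{$_} } keys %total;
    return \%mean;
}

# How much higher, in percent, a sample mean is than a reference mean.
sub percent_higher {
    my ($sample_mean, $reference_mean) = @_;
    return ($sample_mean / $reference_mean - 1) * 100;
}

# Invented records standing in for the print-run census and the ISTC.
my @census = ({ format => 'folio', leaves => 300 }, { format => 'folio', leaves => 200 });
my @istc   = ({ format => 'folio', leaves => 150 }, { format => 'folio', leaves => 100 });

my $census_mean = mean_by_format('leaves', @census);
my $istc_mean   = mean_by_format('leaves', @istc);
printf "Folio leaf count: %.0f%% higher\n",
    percent_higher($census_mean->{folio}, $istc_mean->{folio});    # 250 vs 125: 100% higher
```

As the NB above warns, a mean can mislead for skewed counts like these; swapping in a median per format would only require changing `mean_by_format`.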

This doesn't mean that we should ignore White's census of print runs as an atypical sample, however. Rather, we can say that its sample differs from the body of known incunables in various ways, some of which have well-understood effects. For example, the size and format of editions in White's sample are larger on average than for the ISTC as a whole, and the included editions likely benefited from association with an institutional sponsor, all of which are associated with higher survival rates than other fifteenth-century printed books, so that we would expect the survival rate for White's sample to be higher than for the ISTC as a whole.

Friday, November 21, 2014

Really early, very small, printed German literature (in the narrow sense)

If you want to look at the literary works that would have been accessible to the broadest range of people in the fifteenth century, then one place to start is with works printed in the vernacular and in smaller formats. In the vernacular, education is less of a barrier, and in the smaller formats (initially defined as broadsides, octavos, and quartos of less than 48 leaves), the economic challenge of acquiring literature is as low as it gets at the time. To look at the market for these works before printing reorganized both the market for texts and the medium of the book, it makes sense to look no later than 1480.

While I'm actually in favor of an expansive definition of literature and an inclusive approach to the objects of literary study, a narrow definition of literature is sometimes pragmatically necessary. We'll eliminate for now saints' lives and other devotional works, and pragmatic and educational texts (including history, current events, and the natural world).
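The selection criteria from the last two paragraphs can be written as a filter. This sketch assumes hypothetical record fields, and that the genre label comes from a manual classification, since none of the databases records "literature in the narrow sense." It also reads the 48-leaf limit as applying to quartos only; if it is meant to cover octavos as well, the octavo test needs the same condition.

```perl
use strict;
use warnings;

# Hypothetical genre labels from a manual classification.
my %excluded_genre = map { $_ => 1 } qw(devotional pragmatic educational);
my %vernacular     = map { $_ => 1 } ('German', 'Low German');

# True if an edition fits the criteria above: vernacular, printed by 1480,
# not excluded by genre, and a broadside, an octavo, or a quarto under 48 leaves.
# Assumption: the 48-leaf limit applies to quartos only.
sub is_small_vernacular_literature {
    my ($ed) = @_;
    return 0 unless $vernacular{ $ed->{language} };
    return 0 if $ed->{year} > 1480;
    return 0 if $excluded_genre{ $ed->{genre} };
    return 1 if $ed->{format} eq 'broadside' || $ed->{format} eq 'octavo';
    return 1 if $ed->{format} eq 'quarto' && $ed->{leaves} < 48;
    return 0;
}

# Invented record for illustration
my $example = { language => 'German', year => 1478, genre => 'narrative', format => 'quarto', leaves => 30 };
print is_small_vernacular_literature($example) ? "included\n" : "excluded\n";    # included
```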

Given those criteria, the resulting bibliography is quite short. It can be succinctly categorized like this:

Narrative works and literary classics
The first two clearly belong together. The Ackermann is an established part of the literary canon, but it's more similar in some ways to the humanist works below. On the other hand, the Ackermann and Pfaffe Amis have a considerable manuscript tradition, while the Pfarrer von Kahlenberg is only known in print.

Der Stricker, Pfaffe Amis (ca. 1478, GW M4411)
Philipp Frankfurter, Der Pfarrer von Kahlenberg (ca. 1480, GW 10287)
Johannes von Tepl, Der Ackermann von Böhmen (1463-1477, GW 193-198)

Humanist translations
These end up being the works of just two translators: Heinrich Steinhöwel and Nikolaus von Wyle.

Heinrich Steinhöwel/Francesco Petrarca, Griseldis (1470-1480, GW M31576-78, M31580-81, M31583, M3158410, M31597)
Heinrich Steinhöwel, Apollonius of Tyre (1471, GW 2273)
Leonardus Aretinus/Nikolaus von Wyle, Guiscardus et Sigismunda (1476, GW 5643, 564210N)
Aeneas Sylvius Piccolomini/Nikolaus von Wyle, Euryalus et Lucretia (1478, GW M33548)
Lucian/Nikolaus von Wyle, Der goldene Esel (1477-1480, GW M18985, M18988)

Hans Folz
For shorter literary works to 1480, Folz only makes it in by two years, but even in that short time he has too many titles to list.

Sixteen titles (in seventeen editions) from 1479-80

Border cases
These are works that might be excluded as devotional or educational works under a narrow definition of literature. As I prefer a broad definition, I'll include them here.

Die wunderbare Meerfahrt des hl. Brandan (1476, GW 5004)
Sibyllen Weissagung (1452, 1475; GW M41981, M41983)
Visio Fulberti (1473, GW 10422)
Wie Arent Bosman ein Geist erschien (1479, GW 4944)

Friday, October 31, 2014

Word, Zotero, Excel, Perl and back: online facsimiles and the digital research process in the humanities (with source code)

At a recent conference on book history and digital humanities in Wolfenbüttel, I was struck by the very different places the presenters were in with respect to adopting digital tools. Some presenters did little more than write their papers using Word, some made use of bleeding-edge visualization tools, and others were everywhere in between. One is not necessarily better than the others, as long as the appropriate tools are being used for the task - some projects just don't require complex visualization strategies. While watching the presentations, it occurred to me that selecting digital tools for my own research is like peeling back the layers of an onion.

For the simplest research projects, Word is enough for taking notes and then turning those notes into a written text. When I write book reviews, for example, I rarely need anything more than Word.

For more complex projects, I use Zotero to manage notes and bibliography, and for creating footnotes. For my forthcoming article on Lienhard Jost, I took several pages of notes in Word. Then I created a subcollection in Zotero that held a few dozen bibliographical sources, some with child notes. When I started writing, I used Zotero to create all the footnotes.

If I have a significant amount of tabular data, however, or if I need to do any calculation, then Excel becomes an essential tool for analysis and information management. If I'm working with more than five or ten editions, I'll create a spreadsheet to keep track of them. For a joint article in progress on "Dietrich von Zengg," I have notes from Word, a Zotero subcollection to manage bibliography and additional notes, and an Excel spreadsheet of 20+ relevant editions (including among other things the date and place of printing, VD16 number, and whether I have a facsimile or not). For establishing the textual history, I start with Word, but I need Excel to keep track of all the variants. I use Zotero again for footnotes as I write the article in Word.

Once I have more than several dozen primary sources, however, neither Excel nor Zotero is sufficient, especially if I need to look at subsets of the primary sources. For Printing and Prophecy, I made a systematic search of GW/ISTC/VD16 for relevant printed editions before 1550 (extended to 1620 for an article I published in AGB), then entered all the information into an Access database. It took months of daily data entry, and I still update the database whenever I come across something that sounds relevant. For my work now, it's a huge time-saver. If I want to know what prophetic texts were printed in Nuremberg between 1530 and 1540, it takes me about ten seconds to find out, and I can also create much more complex queries. Or if I'm planning to visit Wolfenbüttel and want to find out which printed editions in the Herzog August Bibliothek I've never seen in person or in facsimile, a database search will send me in the right direction. I'll also import tabular data into Access and export tables from Access into Excel based on particular queries, such as the table of all editions attributed to Wilhelm Friess. The Strange and Terrible Visions was an Excel project, but Printing and Prophecy was driven by Access.
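The Nuremberg query above runs in Access, but the same selection can be sketched in Perl over a tab-delimited export of the table. The column layout (id, title, place, year) and the sample rows are hypothetical:

```perl
use strict;
use warnings;

# List the ids of editions printed in $place between $from and $to (inclusive),
# reading a tab-delimited export with a header row and the hypothetical
# columns: id, title, place, year.
sub editions_in {
    my ($fh, $place, $from, $to) = @_;
    my @hits;
    my $header = <$fh>;    # skip the column names
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;    # strip the line ending, including Windows CRLF
        my ($id, $title, $printplace, $year) = split /\t/, $line;
        next unless defined $year && $year =~ /^\d{4}$/;    # skip undated rows
        push @hits, $id if $printplace eq $place && $year >= $from && $year <= $to;
    }
    return @hits;
}

# Invented rows, read from an in-memory string instead of a file
my $export = "id\ttitle\tplace\tyear\n"
           . "ID-1\tTitle A\tNuremberg\t1532\n"
           . "ID-2\tTitle B\tAugsburg\t1535\n"
           . "ID-3\tTitle C\tNuremberg\t1545\n";
open my $fh, '<', \$export or die "Cannot open export: $!\n";
print join(', ', editions_in($fh, 'Nuremberg', 1530, 1540)), "\n";    # ID-1
close $fh;
```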

For projects that require very specific or complex analysis, or that involve online interaction, Access is not the right tool for me. To create the Access database for "The Shape of Incunable Survival and Statistical Estimation of Lost Editions," I spent several hours writing Perl scripts to query the ISTC and analyze the results. Every so often I'll need to write a new Perl script to extend one of the Access databases that I work with. (One doesn't have to use Perl, of course. Oliver Duntze's presentation involved some very interesting work on typographic history using Python. R and other statistical packages probably belong here, too.)

* * *

Problem: As I'm located a few thousand miles from most of my primary sources, digital facsimiles have become especially important to my work. I check several digitization projects daily for new facsimiles and update my database accordingly. Systematic searching for facsimiles led me to the discovery of Lienhard Jost's lost visions, so I've already seen real professional benefit from keeping track of new digital editions. But what if I miss a day or skip over an important title? I might be missing out on something important.

Solution: Create an Access query to check my database and find all the titles in VD16 for which no facsimile is known. Time: 10 seconds. Result: 1,565 editions.
Export the results to Excel, and save them as a text file. Time: 10 more seconds.
Search VD16 for all 1,565 editions by hand and check for new facsimiles? Time: 13 mind-numbing hours?

No.

Instead, write a short Perl script (see below) to read the text file, query the VD16 database, and spit out the VD16 numbers when it finds a link to a digital edition. Time: 1 hour (30 minutes programming, 30 minutes run time). Result: 280 new digital facsimiles that I had overlooked.

To inspect all of those facsimiles and see if they're hiding anything exciting will take a few weeks, but most of the time I spend on it will involve core scholarly competencies of reading and evaluating primary sources, and it will make my knowledge of the sources more comprehensive and complete. In cases like this, digital tools let me get to the heart of my scholarship more efficiently and spend less time on repetitive tasks.

Does every humanist need to know a programming language, as some conference participants suggested? I don't know. We need to constantly acquire or become conversant in new skills, both within and outside our discipline, but sometimes it makes more sense to rely on the help of experts. I don't think it's implausible, however, that before long it will be as common for scholars in the humanities to use programming languages as it is for us to use Excel today.

* * *

After all, if you can learn Latin, Perl isn't difficult.

### This script reads through a list of VD16 numbers (assumed to be named 'vd16.txt' and found in the same directory where the
### script is run, with one entry on each line in the form 'VD16 S 843'). The script loads the corresponding VD16 entry and checks
### to see if a digital facsimile is available. If it is, it prints the VD16 number.

### This script relies on the following Perl modules: LWP

use strict;
use warnings;
use LWP::Simple;    ### For grabbing web pages

my $baseurl = 'http://gateway-bayern.de/';

### Open the list of VD16 numbers (the file name matches the one described above)
open my $fh, '<', 'vd16.txt' or die "Cannot open vd16.txt: $!\n";

while (my $VD16number = <$fh>) {
    ### Remove the newline, plus any carriage return left by a text file saved from Excel on Windows
    chomp $VD16number;
    $VD16number =~ s/\r$//;
    ### Skip blank lines
    next unless $VD16number =~ /\S/;
    ### Replace spaces with + signs for the durable URL (seems not to be strictly necessary, will accept spaces OK)
    (my $formatVD16number = $VD16number) =~ s/ /+/g;
    ### Set up the URL we want to retrieve
    my $url = $baseurl . $formatVD16number;
    ### Now load that page; get() returns undef if the request fails
    my $VD16page = get($url);
    unless (defined $VD16page) {
        warn "Could not retrieve $url\n";
        next;
    }
    ### Now look for a facsimile link (dots escaped so they match literally)
    if ($VD16page =~ /nbn-resolving\.de/) {
        ### If we find one, print the VD16 number
        print "$VD16number\n";
    }
}

close $fh;

Friday, September 5, 2014

Abstract: "Bibliographic databases and early printing: Questions we can’t answer and questions we can’t ask"

For the upcoming meeting of the Wolfenbütteler Arbeitskreis für Bibliotheks-, Buch- und Mediengeschichte on the topic of book history and digital humanities, the paper I'm giving will briefly summarize the paper I gave at the "Lost Books" conference in June, then look specifically at the bibliographical databases of early printing that we have today. These databases are invaluable, but their current design makes some kinds of research easy and other kinds quite difficult. Along with charts and graphs, my presentation will look at some specific examples of what can and can't be done at the moment, and offer some suggestions of what might be done in the future.

Abstract: Bibliographic databases and early printing: Questions we can’t answer and questions we can’t ask.
In 1932, Ernst Consentius first proposed addressing the question of how many incunable editions have been lost by graphing the number of editions preserved in a few copies or just one and projecting an estimate based on that graph. The problem Consentius posed is in fact only a variation of a problem that can be found in many academic fields, known in English since 1943 as the “unseen species problem,” although it was not recognized as such until very recently. Using the well-established statistical methods and tools for approaching the unseen species problem, Frank McIntyre and I have recently updated the estimate that we first published in 2010. Depending on the assumptions used, our new estimate is that between 33% (of all editions) and 49% (of codex editions, plus an indeterminate but large number of broadsides) have been lost.
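For readers unfamiliar with the unseen species problem, one widely used estimator is Chao1, which projects the number of unobserved "species" (here, lost editions) from how many are seen exactly once (f1) and exactly twice (f2). This is a swapped-in illustration of the general approach, not necessarily the method behind the estimate above, and the copy counts are invented:

```perl
use strict;
use warnings;

# Chao1 estimate of the total number of editions, given the number of
# surviving copies of each known edition. f1 = editions known from one copy,
# f2 = editions known from exactly two copies.
sub chao1 {
    my @copies   = @_;
    my $observed = @copies;
    my $f1 = grep { $_ == 1 } @copies;
    my $f2 = grep { $_ == 2 } @copies;
    # Classic form f1^2 / (2 f2); when f2 is 0, the bias-corrected f1(f1 - 1) / 2
    my $unseen = $f2 > 0 ? $f1 ** 2 / (2 * $f2) : $f1 * ($f1 - 1) / 2;
    return $observed + $unseen;
}

# Invented copy counts for ten editions
my @copies = (1, 1, 1, 1, 2, 2, 3, 5, 8, 12);
my $estimate = chao1(@copies);
printf "Estimated total %.1f; share lost %.1f%%\n",
    $estimate, 100 * ($estimate - @copies) / $estimate;    # 14.0; 28.6%
```

Chao1 gives a lower-bound style estimate driven entirely by the rarest editions, which is why the one- and two-copy counts that Consentius graphed matter so much.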

The problem of estimating lost editions exemplifies how data-driven approaches can support book history, but it also illustrates how databases of early printing impose limits on research in the way they structure their records and in the user interfaces by which they make data available. Of the current database projects relevant for early printing in Germany (GW, ISTC, VD16, VD17, and USTC), each has particular advantages in the kinds of data it offers, but also particular disadvantages in how it permits that data to be searched and accessed. Clarity and consistency of formatting would help in some cases. All of the databases could profit by adding information that only one or none of the databases currently provide, such as leaf counts. User interfaces should reveal more of each database’s fields, rather than making them only implicitly visible through search results. Monolithic imprint lines, particularly those that make use of arcane or archaic terminology, must be replaced by explicit differentiation of printers and publishers.

Of the current databases, the technological advantages of VD16 are often overlooked. Its links from editions to a separate listing of shelf marks make it possible to count copies more simply and accurately than in any other database, and its links from authors’ names to an external authority file of biographical data provide the basis for characterizing the development of printing in the sixteenth century. Most importantly, VD16 provides open access to its MARC-formatted records, allowing unequaled ease and accuracy when analyzing records of sixteenth-century printing. Many VD16 records lack such basic information as their language, however.

The missing language fields in VD16 provide an example of the challenges faced in attempting to compare bibliographic data across borders or centuries. One approach to this problem, taken by the USTC as a database of databases, is to offer relatively sparse data to users. I suggest as an alternative to this a different approach: Databases should open their data to contributions from and analysis by scholars all over the world by making their records freely available. Doing so will allow scholars of book history to pursue data-driven approaches to questions in our field.

Wednesday, July 23, 2014

All sheets are not created equal

At the recent Lost Books conference in St Andrews, a topic that came up during discussion was "survival of the fattest": Books with more leaves tend to survive in greater numbers of copies than thinner books. The USTC apparently has plans to include the number of sheets used in the production of each edition.

The number of sheets is useful, but not quite the key information that one would hope it would be. As Frank McIntyre and I were preparing our paper, we originally considered sheet counts as a way to enable comparison between formats. Whether you fold a sheet in half for a folio, in four for a quarto, or in eight for an octavo should be of no concern: a sheet is a sheet is a sheet.

Alas, it is not so. When it comes to book survival, how that sheet gets folded matters, as format is still the single most important variable in book survival.

For example, consider books of 8-16 sheets, including folios of 17-32 leaves, quartos of 33-64 leaves, and octavos of 65-128 leaves. Those are thin folios, handy quartos, and thick little octavos, but all composed of the same number of sheets. (Ideally, we would also factor in paper sizes, but that will have to wait for another generation of bibliographic databases.)
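The sheet equivalence behind these ranges is a matter of dividing the leaf count by the number of leaves each sheet yields in that format. A sketch that, like the ranges above, ignores paper size, cancels, and other irregularities; fractional results stand for partial sheets:

```perl
use strict;
use warnings;

# Leaves yielded by one sheet: folded once (folio), twice (quarto), three times (octavo).
my %leaves_per_sheet = (folio => 2, quarto => 4, octavo => 8);

# Sheets consumed per copy; fractional results stand for half- or quarter-sheets.
sub sheet_count {
    my ($format, $leaves) = @_;
    my $per_sheet = $leaves_per_sheet{$format} or die "Unknown format: $format\n";
    return $leaves / $per_sheet;
}

# The upper ends of the three leaf ranges above all come to 16 sheets.
printf "%-6s %3d leaves = %g sheets\n", $_->[0], $_->[1], sheet_count(@$_)
    for ['folio', 32], ['quarto', 64], ['octavo', 128];
```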

If we graph the percentage of incunable editions that survive in a given number of copies, this is what we find:

Despite all having 8-16 sheets, 12% of these folios survive in a single copy, while 20% of the quartos and nearly 30% of the octavos are known by just one copy. Book format informed choices about production, use, and survival more than leaf count did, and in a way that can't be reduced to a simple matter of bulk.
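Percentages like these come from counting, within each format's leaf range, the editions known from a single copy. A sketch with hypothetical record fields (`format`, `leaves`, `copies`) and invented sample records:

```perl
use strict;
use warnings;

# Leaf ranges equivalent to 8-16 sheets in each format, as listed above.
my %band = (folio => [17, 32], quarto => [33, 64], octavo => [65, 128]);

# Percentage of in-band editions known from a single copy, per format.
sub single_copy_share {
    my @editions = @_;
    my (%total, %unique);
    for my $ed (@editions) {
        my $range = $band{ $ed->{format} } or next;    # skip other formats
        next if $ed->{leaves} < $range->[0] || $ed->{leaves} > $range->[1];
        $total{ $ed->{format} }++;
        $unique{ $ed->{format} }++ if $ed->{copies} == 1;
    }
    my %share = map { $_ => 100 * ($unique{$_} // 0) / $total{$_} } keys %total;
    return \%share;
}

# Invented records
my $share = single_copy_share(
    { format => 'folio',  leaves => 24,  copies => 1 },
    { format => 'folio',  leaves => 30,  copies => 7 },
    { format => 'quarto', leaves => 40,  copies => 1 },
    { format => 'octavo', leaves => 200, copies => 1 },    # outside the band
);
printf "%s: %.0f%% in one copy\n", $_, $share->{$_} for sort keys %$share;
```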