Digging into Data

What to do with a Million Images


For more than 40 years, humanities scholars have used computational analysis to help resolve issues of authorship. Through stylistic and linguistic analysis, researchers have puzzled out answers to questions that range from who wrote The Federalist Papers to who collaborated with Shakespeare on Henry VIII and Pericles. While determining a writer’s “genetic fingerprint” is a difficult task, the wealth of scholarship and algorithms that have developed around printed textual analysis promises to help solve a number of vexing authorship issues as well as expand our knowledge of the written arts. However, in the area of visual arts, computational analysis of authorship has not made the same inroads. To do authorship studies of visual works, scholars must often do painstaking point-by-point analysis of small sets of 2D images of the objects. This work becomes all the more difficult when dealing with cultural artifacts such as quilts, maps and medieval manuscripts that often have corporate and anonymous authors working in community groups, guilds, artisan shops, and scriptoriums. Beyond the difficulties of authorship attribution, larger important humanities questions about the influence and migration of artistic elements and patterns become all but impossible to assess when large datasets require individual scholarly inspection of each image. To this end, we propose to address authorship questions through image analyses that support computationally scalable and accurate data-driven discoveries in image repositories.

This effort will utilize three datasets of visual works: 15th-century manuscripts, 17th- and 18th-century maps, and 19th- and 20th-century quilts. Overarching humanities research questions emerge from these groups of works, such as how visual and production styles reflect regional tastes or historical moments, how traumatic historical events manifest in cultural production, and how artifacts reflect and influence relationships between cultural groups. Together these works present a range of complex authorial relationships and a strong base for advancing knowledge on the research problems inherent to scalable, automated image analyses. Open research problems are divided below into artistic, scientific and technological questions based on the specific datasets that elicit those questions. We expect these questions will be useful across the work of all three groups.

For the 15th-century manuscripts, Froissart’s Chronicles, the artistic questions include: Where and by whom were these manuscripts created? How does a manuscript reflect the tastes of the particular region and historical moment to which it belongs? What does the codicological evidence—scribal hands, catchwords, page layouts, artistic styles in the miniatures and marginal decoration—suggest about book production in this period? The scientific questions for Froissart’s Chronicles ask: Since these manuscripts were made during the Hundred Years’ War, what was the impact of war on culture as measured by the various aspects of these manuscripts, e.g., evidence of patronage? How do they reflect contacts between the cultures of France and England? How do they reflect the ideology of chivalry or the concept of history? The questions for these medieval manuscripts are related to: (a) studying the composition and structure (codicology) of the manuscripts as cultural artifacts of the book trade in later medieval Paris; and (b) identifying the characteristic stylistic, orthographic and iconographic ‘signatures’ of particular scribes and artists and their collaborators who contributed to the illustration and decoration of the volumes, through the use of image recognition and data mining techniques. A further potential output from identifying scribal hands using image analysis techniques is a process that can transcribe the text from the images, a task that is currently done manually by skilled scholars. Thus not only would the content be subjected to analysis but it might also be possible to process it to allow scholars to perform further text-based mining (although not as part of this proposal) on the previously untouchable textual corpus that is locked away as pixels in an image.

The 17th- and 18th-century maps come from atlases by Joan Blaeu and Herman Moll (original atlases and digital scans held at the University of Illinois Library). The artistic questions for these maps include: What characteristics distinguish individual and corporate groups of artists and engravers? Criteria such as color palette, graphic representations of ships, shading of coastlines, and fonts can be considered as distinctive traits that identify a) particular artists and engravers, b) the corporate styles developed by the Blaeu family in 17th-century Amsterdam (Joan was the son of Willem Blaeu, who founded the largest mapmaking engraving and publishing house in the world) and by Moll and his collaborators, who adapted Dutch conventions of mapmaking for English audiences in the early 18th century, and c) national styles of depicting specific geographic and manmade features (cities, fortifications, trade centers, etc.). The scientific and technological questions are: Do specific maps show a more detailed geographical and/or climatological knowledge in representations of coastlines and harbors? Or navigable rivers? Or shoals and sand bars that pose dangers for ships? Or mountain passes that indicate potential routes for exploration and trade? The scientific and technological questions both influence and are influenced by the artistic questions. In particular, engravers developed specific artistic techniques for representations that were essential for ships’ captains, navigators, and merchants who used published maps to sail often unfamiliar and dangerous waters in South America, Asia, and the Pacific (see Appendix D). Maps therefore negotiate among art, science, trade, and politics, and determining the principles that allow researchers to distinguish among different maps and mapmakers will aid scholars working in the history of science and cartography, art, literary studies, colonial history, and economic history.

For 19th- and 20th-century quilts, artistic questions include: What are the distinct characteristics of an individual quiltmaker’s or relevant quiltmaking group’s pattern selection, fabric and color choices, execution of measurement, layout, needlework and craftsmanship of the pattern design, and, most interestingly, original deviations from traditional patterns? Published quilt patterns became much more common starting in the late 1800s, when certain pattern designers mass-produced their patterns and disseminated them through ladies’ magazines, and later in syndicated newspaper columns. Geographically dispersed quiltmakers who were exposed to these media began acquiring new patterns and pattern ideas. Thus, in a large test bed of documented historic quilts, the societal rise and influence of mass media should be seen through the proliferation of quilts that execute patterns disseminated through syndicated columns. The scientific questions include: Can the quilts created by quiltmakers from a cloistered family, community, ethnic, or religious group at a particular time period be differentiated from those of other communities, especially those more exposed to mass media? If so, can changes in the community’s participation in mass culture be found through changes in quiltmaking styles? Can a resurgence of interest in a particular historic cultural community’s quiltmaking styles be found in quiltmaking a century later? To what extent are quilts made by one Amish family in the 19th century similar or dissimilar to those made by urban quilters in the same time period? Does this change over time? Or, from an even more fine-grained perspective, do we find more or less divergence in quilts from the North and from the South? To what extent are quilt patterns regional and to what extent national? Does this change over time? A major theme in American cultural history is the eclipse of regional cultural differences during the 20th century. Can we test that hypothesis by looking at quilts? Can we use the Quilt Index dataset to measure the impact of traumatic historical events (say, 9/11 or Pearl Harbor) on American culture? Do we see a measurable change in imagery, colors, or composition after such events?

While identifying distinct characteristics of artists is time-consuming, computer-assisted techniques can help humanists discover salient characteristics and increase the reliability of those findings over a large-volume corpus of digitized images. Computer-assisted techniques can provide an initial bridge from low-level image units, such as pixel color, to higher-level semantic concepts such as brush strokes, compositions or quilt patterns. The technological questions are related to the design of algorithms that can extract evidence from the low-level image units, evidence that can then be aggregated into higher-level semantic concepts and support humanists in image understanding and authorship assignment. Further technological questions concern the statistical confidence of authorship hypotheses obtained by processing volumes of images that could not have been visually inspected with the current human resources within a reasonable time frame. How to extract knowledge about authorship and how to increase our confidence in the characteristics of authorship are the key technological questions.
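
As an illustration only, the step from low-level image units to a comparable descriptor can be sketched in a few lines of Python. This is not the method the project proposes; it is a minimal sketch that assumes an image is already available as a flat list of RGB pixel values. The function names (`color_histogram`, `histogram_intersection`) and the toy pixel lists are hypothetical and stand in for real scans.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each in 0-255
Histogram = Dict[Tuple[int, int, int], float]


def color_histogram(pixels: Iterable[Pixel], bins_per_channel: int = 4) -> Histogram:
    """Quantize each pixel's color and return normalized bin frequencies."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot build a histogram from an empty image")
    return {bin_id: n / total for bin_id, n in counts.items()}


def histogram_intersection(h1: Histogram, h2: Histogram) -> float:
    """Similarity in [0, 1]; 1.0 means identical color distributions."""
    return sum(min(freq, h2.get(bin_id, 0.0)) for bin_id, freq in h1.items())


# Hypothetical toy "images": flat lists of RGB pixels, not real scans.
red_quilt_a = [(200, 30, 30)] * 90 + [(240, 240, 240)] * 10
red_quilt_b = [(210, 20, 40)] * 80 + [(250, 250, 250)] * 20
blue_map = [(20, 40, 200)] * 70 + [(240, 240, 240)] * 30

hist_a = color_histogram(red_quilt_a)
hist_b = color_histogram(red_quilt_b)
hist_map = color_histogram(blue_map)

# The two predominantly red items should score as more alike than red vs. blue.
sim_quilts = histogram_intersection(hist_a, hist_b)
sim_quilt_map = histogram_intersection(hist_a, hist_map)
print(f"quilt A vs quilt B: {sim_quilts:.2f}")
print(f"quilt A vs map:     {sim_quilt_map:.2f}")
```

A color histogram captures only palette, one of the several traits named above. Higher-level concepts such as brush strokes or quilt patterns, and any statement of statistical confidence about an authorship hypothesis, would require richer features and a proper statistical test over many images.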

Digging into Image Data to Answer Authorship Related Questions (Dean Rehberger and Wayne Dyksen, Michigan State University, NEH; Peter Bajcsy, University of Illinois at Urbana-Champaign, NSF; Peter Ainsworth, University of Sheffield, JISC). This project will take three specific resources (manuscripts, maps and quilts) and develop tools to analyse and identify authorship of visual images.