The Art of Extraction:
Toward a Cultural History and Aesthetics
of XML and Database-Driven Web Sites

Alan Liu, English Dept., UC Santa Barbara


<TALK> [note 1]


In a book titled The Laws of Cool: The Culture of Information (in press at Stanford UP), I explore the cultural history and aesthetics of information "cool," the "ordinary" aesthetics–as I conceive it–not just of the mass consumer in the media age but, on the production side, of the cubicle worker of the new media age. "Why is the ordinary 'knowledge worker' today so desperate to be cool," I ask, "and how does cool–in its design, aesthetics, and politics–work?"

[Image: anthology of poetry with patchwork quilt on cover]

My next book project, just started and tentatively titled Patchworks: New Arts and New Humanities in the Information Age, explores by contrast extraordinary information arts that "go outside the box" or (more exactly) "cubicle." Of course, in exploring avant-garde information arts, I am joined by many in this room. My niche, as I currently plan it, is to focus in particular on the relation between the new media arts and experimental methods in the contemporary humanities (including the so-called "new literary history," a mode of information retrieval, circulation, and presentation that is conceptually homologous with recent information technology even though its preferred trope is the very old tech of "patchwork quilting"–as in the many new literary anthologies showing quilts on their covers).

My paper today is early work toward a chapter of Patchworks. The chapter explores an idea that has emerged from the practical digital work I have been involved with here at UCSB in recent years–in particular, various database-to-Web projects and, more recently, XML projects. Among these are the following, in chronological order:





<ARGUMENT Title="The Blind Spot on the Page">

What this practical digital work has taught me is the importance of a leading emphasis or hot spot in contemporary information technology that is very advanced in the business world but that most university humanists and artists either do not see at all or do not yet recognize at the strategic level for what it really is: the contemporary pressure point of a much longer-standing, broader social phenomenon soliciting the most serious humanistic and artistic interpretation. The IT emphasis I refer to appears most concretely in what might be called (in Lyotard's term for the contemporary sublime) the "unpresentable" spot on a Web page where content pours through a so-called "data island" in the interface code from transcendental sources in the background–whether databases or XML documents. (Databases and XML, indeed, are increasingly convertible with each other as transcendental or deus ex machina data sources. The latest versions of Microsoft's Access or SQL Server, for example, export to and import from XML seamlessly; and XML itself is developing in the direction of "XML-native databases" and an "XML Query" language that allow it to act much in the manner of "structured query language" [i.e., SQL] databases.)

        <DEMO>Here are some examples of data islands (or their conceptual equivalents):
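One conceptual equivalent can be sketched in the Internet Explorer idiom of the period, where an inline `<xml>` element holds the island and the surrounding HTML merely binds to it. The sketch below is illustrative only; the element names (`artists`, `LastName`, etc.) are invented for the example:

```xml
<!-- A hypothetical HTML page with an inline XML data island (IE 5+ syntax). -->
<!-- The <xml> element holds the transcendental content; the table below does
     not author its rows but merely binds to the island. -->
<html>
<body>
  <xml id="artists">
    <catalog>
      <artist><LastName>Arp</LastName><FirstName>Jean</FirstName></artist>
      <artist><LastName>Ernst</LastName><FirstName>Max</FirstName></artist>
    </catalog>
  </xml>
  <!-- Data binding: one row template, repeated once per record poured in. -->
  <table datasrc="#artists">
    <tr>
      <td><span datafld="LastName"></span></td>
      <td><span datafld="FirstName"></span></td>
    </tr>
  </table>
</body>
</html>
```

The page author controls only the binding points, never the content that arrives through them.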


In short, the "interface" as we know it increasingly surrenders its soul to data islands that throw transcendental information onto the manifest page, but in a manner that is also manifestly different from the "thrownness" (in Heidegger's phrase) or, more simply, rendered "thereness" of the rest of the page–a complex phenomenology of dissonance that appears visibly to the user, for example, in the tell-tale way that data-island pages normally eschew "cool" Web design (about which more below) in favor of regular, minimalist, or ultimately Modernist page layouts with simple geometries. This is what Lev Manovich calls the "Bauhaus filter" that is surprisingly prevalent in contemporary information aesthetics. [note 4] It is only a simpler list or table structure, for instance, that can easily accommodate the serial repetition of a variable number of structurally-similar items–i.e., the kind of items thrown forth from databases or XML documents that are like volcanoes able to hurl forth only identically-shaped rocks. [note 5]
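The "identically-shaped rocks" can be sketched in XSLT, the style-sheet language of XML data pours: however many records the source throws forth, each lands in the same fixed row shape. A minimal illustration, with hypothetical element names:

```xml
<!-- Sketch of an XSLT 1.0 stylesheet (element names invented). The for-each
     loop accommodates a variable number of structurally-similar items, but
     only by repeating one and the same serial shape. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/catalog">
    <table>
      <xsl:for-each select="artist">
        <tr>
          <td><xsl:value-of select="LastName"/></td>
          <td><xsl:value-of select="FirstName"/></td>
        </tr>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>
```

Hence the affinity for simple lists and tables: the template can vary the number of rows, but not the geometry of any one of them.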



Now we can name what I called the emphasis or hot spot in current IT development that data islands emblematize. This is the emphasis on the technology of "extracting" content from presentation. Or, rather, the term "technology"–along with its whole complement of undecidably objective/social correlates ("technique," "procedure," "protocol," "routine," "practice," etc.)–is too narrow. What we are really talking about is an ideology of strict division between content and presentation: the very religion, as it were, of both text-encoding and databases. According to the religion, true content abides in a transcendental noumenon so completely structured and described that it is in and of itself inutterable. Content may be revealed only through a logos of presentation that is purely interfacial rather than, as it were, sacramental–i.e., not consubstantial with the noumenal. Unless content is "hacked," therefore (which is how our most extreme protesting reformers of information technology today attempt to transcend the interfacial altogether to experience direct revelation), it is to be rendered only through GUIs that are defined as ipso facto superficial rather than (in the original, Orthodox rather than Apple or Microsoft sense) iconic. Unlike an Orthodox icon that embodies inextricably in its beaten gold the very particles of transcendence, in other words, our interfaces today are ever more transparently just "skins" or, put technically, "templates," "schemas," "style sheets," etc., designed to be extricable.

Behold, then: there is now a great blind spot on the Web page that authors, artists, and designers of the interface no longer control but can only parameterize. In an earlier time, this spot through which data floods from transcendental sources would have been called the sublime. Even earlier in the history of transcendence, it was God (which is the license for my religious analogy above). But now we pray in SQL or XML. Not "Our father which art in heaven, . . . Give us this day our daily bread," but instead the "select" statement that is the soul of data islands–e.g., "SELECT * FROM Artists ORDER BY LastName, FirstName, Dates, Nation" or "<xsl:value-of select="LEXIA_TITLE"/>." Not "give us," in other words, but "select from"; not the Lord's Prayer, but our great contemporary prayer, the "query."



<ARGUMENT Title="The Building Bricks of Data Extraction">

This paper points the way toward an examination of "the blind spot on the page" from two perspectives. The first is the cultural history of data extraction. Much of my earlier research, beginning with my book on Wordsworth: The Sense of History (1989) and extending to my theoretical essays in the 1990s, was devoted to the methodology of cultural history; and such history remains a focus in my work on information culture. My essential question in the present regard is this: what is the cultural history of the separation of data content from presentation? Building on a line of thought I first tried out in a talk at the 2001 ACH-ALLC conference (supplemented after that event by correspondence with Wendell Piez, a humanities-trained scholar now working in the private sector as a consultant and developer of electronic text systems, with a special interest in XML), I would like to suggest the following prospectus for a thesis.

The thesis is that the contemporary logic of data extraction dates back to early industrialism in the mold of John Hall and Frederick Winslow Taylor. As Piez has taught me, Hall's "interchangeable part" manufacturing process of the 1820s and '30s (in his Harpers Ferry Rifle Works) was the predecessor to the logic of separating content from presentation that ultimately triggered not so much databases and XML as the exact social and economic need for databases and XML. "At his Rifle Works," Piez notes,

Hall developed a system by which guns could be made without the hand-crafting traditionally required of them. . . . Instead, the parts were all made to more-than-humanly possible close tolerances by machine, and then assembled not by piece, but by type. That is, any barrel could fit on any stock, with any receiver, any lock, etc. This required a rigid adherence to standards, enforced by the use of machine tools fitted with jigs, and by a careful regimen of testing with gauges. [note 6]

Indeed, these latter "gauges" designed to test finished guns were the paradigm of the new system. As Piez argues, the real proof of quality in Hall's manufacture of a gun was not that the gun fired but that its parts–tested separately in disassembled form–fit against the gauges, which thus became the "Platonic form" of the gun. In the language of XML rather than of Plato, that is, the gauges were the equivalent of a DTD (Document Type Definition) or, better, Schema used to "validate" the particular "instance" of an XML document against strict standards of complete, consistent, and lawful data structure. "Shades of text-encoding, anyone?" Piez asks.
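Piez's analogy can be sketched in XML itself. Below, a hypothetical DTD plays the part of the gauge: any particular gun-document, with any barrel and any stock, "fits" only if it validates against the declared form (the part names are invented for the illustration):

```xml
<!-- A hypothetical DTD: the "gauge" against which any instance must fit. -->
<!DOCTYPE gun [
  <!ELEMENT gun (barrel, stock, receiver, lock)>
  <!ELEMENT barrel   (#PCDATA)>
  <!ELEMENT stock    (#PCDATA)>
  <!ELEMENT receiver (#PCDATA)>
  <!ELEMENT lock     (#PCDATA)>
]>
<!-- One "instance": its parts are interchangeable precisely because each is
     tested against the declaration above, not against any particular gun. -->
<gun>
  <barrel>rifled, .52 caliber</barrel>
  <stock>walnut</stock>
  <receiver>breech-loading</receiver>
  <lock>percussion</lock>
</gun>
```

A validating parser, like Hall's inspector with his gauges, never fires the gun; it only checks each part, in disassembled form, against the Platonic declaration.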

My own addition to this argument extends the schema, as it were, from Hall to Taylor, but at a level of specificity that may supply new insights to our now standard accounts of Taylorism. From the retrospect of the information age, after all, we can see that Taylor's "scientific management" added to Hall's production model precisely the management model necessary to found our own so-called postindustrial "knowledge work"–the kind of work, in other words, that requires databases and XML. In particular, it was Taylorism (and its white-collar adjunct, the "scientific office management" of William Henry Leffingwell) that created what might be hypothesized to be the first economically and socially significant form of "programming." [note 7] I refer specifically to the system of distributed "functional management," mediated through "instruction cards," that Taylor described as early as his Shop Management (1903). A good exemplum–one of Taylor's own favorite illustrations–is bricklaying. [note 8] Once, we know, workers built a wall by deciding ad hoc or by custom how many bricks to cart over, how close to place the pile, how many bricks to lift at one time, how to tamp the bricks down, etc. But post-Taylor, such decisions were extracted from the embodied work of the laborer and described on instruction cards as procedures that could be optimized, reprogrammed, distributed, and otherwise mediated. Work thereafter became the structured, modular, and algorithmically manageable process by which, as it were, each individual <BRICK> was nested within <WALL>. That is, each "node" or "field" in the work process (in XML-speak and database-speak, respectively) became part of a programmatic description of wall-building that allowed the "content" (e.g., actual bricks) to be separated from the "presentation" of the actual wall.
Previously consigned to the "craft" or "habit" of the individual worker in his/her social habitus, presentation now became the instantiation of a Platonic schema of "wall" programmed by ever more remote, multiple, and distributed "functional managers." (It is symptomatic, we may thus say, that the software client through which one today administers a Microsoft SQL Server database is named "Enterprise Manager." Databases and XML are now our ultimate "functional managers." They are the automatically distributed mediators of the bricks of contemporary knowledge-work.)
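In the paper's own idiom of incomplete XML tags, the Taylorist instruction card can be sketched as a document type, of which any actual wall is one validated instance (the attribute names are invented for the illustration):

```xml
<!-- Illustrative only: Taylor's instruction card rewritten as a DTD. The
     declaration is the "Platonic schema" of wall; the instance below is one
     actual wall, its every brick a parameterized node in the schema. -->
<!DOCTYPE WALL [
  <!ELEMENT WALL (COURSE+)>
  <!ELEMENT COURSE (BRICK+)>
  <!ELEMENT BRICK EMPTY>
  <!ATTLIST BRICK lift CDATA #IMPLIED
                  tamp CDATA #IMPLIED>
]>
<WALL>
  <COURSE>
    <BRICK lift="one-hand" tamp="two-taps"/>
    <BRICK lift="one-hand" tamp="two-taps"/>
  </COURSE>
</WALL>
```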

The upshot of such a cultural history of databases and XML is that the normal criteria by which we now legitimate data extraction–e.g., the mantra of standardization, functionality, interoperability, and, of course, B2B "services" (XML, apparently, is the final language of man and the first tongue of angels, aka servers)–are relatively shallow. Such legitimations rest upon the older, deeper principle of managed knowledge and managed work. Let us be clear, in other words: the separation of content from presentation now being mandated by business-oriented information technology is a euphemism. From a historical perspective, "knowledge" (the great "content" of postindustrial business) is being extracted from what "presentation" really means: labor. What Marx called "surplus" labor value is in the post-Marxist, postindustrial world nothing other than the programmability of work–a programmability that can be "functionally managed," extracted, optimized, and distributed (e.g., licensed to other companies or the end-user) for what, in a classically Marxist view, is excess gain. Any theory of the extraction of content from presentation, that is, must at some point take account of the very theory of extraction that Marx suggested in his 1844 manuscripts: alienated labor. Of course, Marx on alienation (especially in those sketchy, early manuscripts) cannot be taken as our final word. There is also the entire, contemporary infrastructure and superstructure of "networking" to consider, according to which the processes of extraction, circulation, and distribution morph the concept of alienation (and the early-industrial context of commodification that Marx addressed) into what Manuel Castells calls "networked society" with its multiple and disputed modes of extraction, circulation, and distribution extending classical commodification into uncharted territory (Napster being the poster child of such morphed commodification). [note 9]

In sum, a historically deep inquiry into contemporary information technology would ask not just what IT is for (speed, flexibility, standardization, etc.) but, as it were, the purpose of that purpose: the foundations of the extractability and programmability that today facilitate networked production and consumption.



<ARGUMENT Title="The Aesthetics of Data Extraction">
        (Note: Most of this portion of the talk was delivered extemporaneously.)

The other perspective on the "blind spot on the page" that I would like to offer, though in even more prospective form here, is that of the aesthetics of data extraction, which in the end is what my cultural-historical approach to information technology takes as its goal. Here, then, is a slender prospectus, which I will make depend on a specious chart–a chart of aesthetics that is wrong insofar as it suggests, in the mode of all such charts, that there can actually be a clear, consistent, and stable chart. This overview is merely a way to begin thinking about the contemporary situation of aesthetics in the information age:

        <DEMO>[Chart of Aesthetics in the Information Age]


Let me introduce as an afterword a brief review of past and recent aesthetics of transcendental data–i.e., of any data that cannot be directly interfaced or "imaged" (in Kant's use of the term "imagination" in his discussion of the sublime in the Critique of Judgement) but that can only be known through the grace of a transcendental data pour:

In the early years of the Web, when data transcendence occurred in piecemeal ways that could still be manually supervised and activated (e.g., hypertext links invoking cgi scripts) or in automated ways that were still relatively crude (e.g., "push" processes), "cool" was the dominant aesthetic of the interface. "Cool" as rendered by HTML, we may say, was the barbarian in the church of the separation of content from presentation. It was the secret adherent of non-standard, proprietary, hand-coded, and other clearly infidel practices of embodying content inextricably in presentation (e.g., pages with "layers" that work only in a particular browser, pages with fixed-width tables sized exactly to match a particular graphic image, etc.). But in today's world of massive, automatic data-pours through untouchable "data islands" embedded within retro-Modern rather than cool formalisms (i.e., the regularism of the lists and tables I mentioned earlier), what can still be cool? Or was cool and its ostensible avant-gardism ever an adequate response to the cultural history of extraction?

<DESIGN_SPECULATIONS>Cool things that might be done with data pours:





1 As an experiment in allowing code to leak through into manifest presentation, I use an incomplete, minimal set of XML tags to mark out the sections of this paper. This experiment sympathizes with the notion of artistic "codework" as Rita Raley develops it. See her "Interferences: Elements of Style at the Interface," talk delivered at The Digital Cultures Project/Microcosms "Interfacing Knowledges" conference, Univ. of California, Santa Barbara, 9 Mar. 2002.

2 Eric Feay, an undergraduate in the UCSB English Department, created the graphic design for the site.

3 The general purpose of my Tracker project is to explore how the West Coast digital humanities and arts academic community–many of whose members (e.g., in the University of California's The Digital Cultures Project and DARNet [Digital Arts Research Network]) have been working with databases–might join up with the major East Coast SGML (and increasingly also XML) text-encoding humanities initiatives (e.g., at the University of Virginia's Institute for Advanced Technology in the Humanities or Brown University's Women Writers Project). XML seems fated to be the glue, splice, or–to appropriate a term from relational databases–"join" between the text-encoding and database worlds, academic and otherwise.

4 Lev Manovich, "From Cultural Interfaces to Info-Aesthetics (Or: from Myst to OS X)," talk delivered at The Digital Cultures Project/Microcosms "Interfacing Knowledges" conference, Univ. of California, Santa Barbara, 10 Mar. 2002.

5 In developing this line of argument about the phenomenality of transcendental data sources, I am indebted to Jennifer Jones for an incisive question at an early point in my writing of this talk: how do end users, as opposed to the originating designers or programmers who work with the underlying source code, perceive such transcendence as differentiated from any other content that is rendered on a Web page? The presence of a "data island," after all, is not explicit unless one looks at the source code; and in the case of database-driven Web pages even the source or underlying HTML code that a user might see by using the "View Source" command in a browser is screened from view. (The apparent HTML code that produces the Web page a user sees is generated on-the-fly by the real code in the background, which includes scripting or other algorithmic processes designed to write content into HTML.)

6 Wendell Piez, e-mail to the author, 20 June 2001. Subsequent citations of Piez reference this letter.

7 One could, of course, go back to Charles Babbage's Analytical Engine in the earlier nineteenth century to locate the origin of programmable computing. But I am here concerned with the origin of the need for such programmability, which clearly was not in evidence in Babbage's time except in the limited form of Jacquard looms. For Leffingwell's adaptation of Taylorism to office work, see for example, William Henry Leffingwell and Edwin Marshall Robinson, Scientific Office Management (Chicago: A. W. Shaw, 1917).

8 Taylor defines the use of "instruction cards" by "functional managers" as follows in Shop Management, in Scientific Management, Comprising "Shop Management," "The Principles of Scientific Management," "Testimony Before the Special House Committee" (1947; rpt. Westport, Conn.: Greenwood, 1972), pp. 102-103: "The 'instruction card,' as its name indicates, is the chief means employed by the planning department for instructing both the executive bosses and the men in all the details of their work. It tells them briefly the general and detail drawing to refer to, the piece number and the cost order number to charge the work to, the special jigs, fixtures, or tools to use, where to start each cut, the exact depth of each cut, and how many cuts to take, the speed and feed to be used for each cut, and the time within which each operation must be finished. It also informs them as to the piece rate, the differential rate, or the premium to be paid for completing the task within the specified time (according to the system employed); and further, when necessary, refers them by name to the man who will give them especial instructions. This instruction card is filled in by one or more members of the planning department, according to the nature and complication of the instructions, and bears the same relation to the planning room that the drawing does to the drafting room."

9 Manuel Castells, The Information Age: Economy, Society and Culture, 3 vols. (Malden, Mass.: Blackwell, 1996-97).

10 Jennifer Jones, "Virtual Sublime: Romantic Transcendence and the Real," dissertation in progress, Univ. of California, Santa Barbara; Steven Johnson, Interface Culture: How New Technology Transforms the Way We Create and Communicate (San Francisco/New York: HarperCollins, 1997).

11 William Gibson, Neuromancer (New York: Ace Books, 1984).

12 Jean-François Lyotard, The Postmodern Condition: A Report on Knowledge, trans. Geoff Bennington and Brian Massumi (Minneapolis: Univ. of Minnesota Press, 1984).