intangitexts: Towards an Electronic Visualization of Mathematics (2004)

One of the major obstacles in understanding any complex system is a lack of understanding of overall system structure. This is easily witnessed in any student of mathematics who fails to understand large portions of a theory because he or she does not understand one crucial definition, theorem, or portion of a proof that fuels the theory^¹. This paper will propose an interactive computer-based technology, based on currently existing software, that will help overcome these roadblocks by presenting “big picture” views of a subject while retaining the ability to focus on individual portions without losing the greater idea. We will focus on applying this methodology to higher mathematics throughout, and suggest the applicability of this software to other fields of study.

Not every mathematician I have spoken to has agreed that electronic pedagogical aids would be a good or useful thing. The majority have, however, primarily citing the vast quantity of information as daunting and its organization in most textbooks as somewhat unclear. Hence, if we are to construct a pedagogical tool, we want it to convey the necessary information in smaller, more easily absorbed chunks, and to provide more information, better organized, than any individual textbook could. Hypertext is the most obvious first choice of structure, and it has been adopted by many thriving online sources (which will be reviewed below). Jerome McGann supports the use of hypertext:

... electronic tools in literary studies [or any other field] don't simply provide a new point of view on the materials, they lift one's general level of attention to a higher order. The difference between the codex and the electronic Oxford English Dictionary provides a simple but eloquent illustration of this. The electronic OED is a meta-book, i.e., it has consumed everything that the codex OED provides and reorganized it at a higher level. It is a research tool with greater powers of consciousness. As a result, the electronic OED can be read as a book or it can be used electronically. In the latter case it will generate readerly views of its information that cannot be had in the codex OED without unacceptable expenditures of time and labor.^²

We want a tool that can do everything hypertext does, and more. We wish to look “above” the material at hand and, like the Square lifted above Flatland by the Sphere^³, see mathematics resting below as a web of interconnected concepts, and discover from there.

Mathematics is primarily about structure – discovering structural information about abstract spaces^⁴ through logical statements (and some cleverness) about the properties of each given space. Hence, a good strategy for teaching mathematics is to display the specific structures relevant to a particular curriculum as often, and in as many varied ways, as possible. As in any field, rote memorization of facts (in this case, equations, units of measurement, important theorems, etc.) is not enough by itself to convey the methods of the “expert”.

All too often, higher mathematics is taught in a rigid definition/theorem/proof format that, while conveying all the necessary information, rarely elucidates the key structural concepts behind the reason one is studying the topic in the first place. For example (as stated in a previous note), when continuity is discussed in a calculus course, the more general concept of continuity (defined in terms of abstract open sets rather than open intervals of the real line) is never touched. Hence, those moving from calculus to topology will struggle with the generalizations, even though there are elementary examples that could quickly clear up the confusion.^⁵
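To make the contrast concrete, here are the two standard definitions side by side (textbook statements, not quoted from the original notes). For f : ℝ → ℝ with the usual topology, the open-set condition holds exactly when f is continuous at every point in the ε-δ sense, since the open sets of ℝ are precisely the unions of open intervals.

```latex
% Calculus course: epsilon-delta continuity of f : \mathbb{R} \to \mathbb{R} at a point a
\forall \varepsilon > 0 \;\; \exists \delta > 0 \;\; \forall x \in \mathbb{R} :\quad
  |x - a| < \delta \implies |f(x) - f(a)| < \varepsilon

% Topology course: continuity of f : X \to Y between topological spaces
\forall V \subseteq Y \text{ open} :\quad f^{-1}(V) \text{ is open in } X
```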

Returning to the notion of the “expert” mathematician: according to How People Learn,

Mathematics experts are also able to quickly recognize patterns of information, such as particular problem types that involve specific classes of mathematical solutions (Hinsley et al., 1977; Robinson and Hayes, 1978). ... The expert knowledge that underlies the ability to recognize problem types has been characterized as involving the development of organized conceptual structures, or schemas, that guide how problems are represented and understood (e.g., Glaser and Chi, 1988).

We should therefore attempt to build software that lets a novice see how experts organize their knowledge internally and maintain a view of the subject as a whole, while still gaining a deep understanding of key concepts. Support for a change in current curricula is given later in the chapter:

The Third International Mathematics and Science Survey (TIMSS) (Schmidt et al., 1997) criticized curricula that were "a mile wide and an inch deep" and argued that this is much more of a problem in America than in most other countries. Research on expertise suggests that a superficial coverage of many topics in the domain may be a poor way to help students develop the competencies that will prepare them for future learning and work. The idea of helping students organize their knowledge also suggests that novices might benefit from models of how experts approach problem solving--especially if they then receive coaching in using similar strategies....^⁶

To begin developing strategies for doing just this with computer-based technology, we will project the fundamental mathematical notions of theorem and proof onto a structure that media theorists and practitioners are fond of manipulating: space and time.

David Crystal's rather rigid “differences between speech and writing” distinguish speech as a “time-bound, dynamic” medium from writing as a “space-bound, static, permanent”^⁷ medium. This dichotomy maps all too well onto the dichotomy of “theorem” and “proof” in mathematics: a theorem is a statement (static), whereas a proof of the theorem is a logical process, a means of concluding the result of our statement via definitions (also static entities), previous theorems, and logic. Thinking of the theorem/proof dichotomy in this way, we may treat any way of reconciling “speech” and “writing” as communications media as also a way of reconciling how we think about theorems and proofs.^⁸

Theorems and their proofs are indivisible; to understand mathematics it is crucial that one can handle both sides of this “divide” simultaneously. Therefore, any technologies we develop to aid mathematics education must not be merely text- and symbol-based, as most currently available technologies are. As Douglas Kellner states:

Within multimedia computerized culture, visual literacy takes on increased importance. On the whole, computer screens are more graphic, multi-sensory, and interactive than conventional print fields that disconcerted many of us when first confronted with the new environments. Icons, windows, mouses, and the various clicking, linking, and interaction involved in computer-mediated hypertext dictate new competencies and a dramatic expansion of literacy. Visuality is obviously crucial, compelling users to perceptively scrutinize visual fields, perceive and interact with icons and graphics, and use technical devices like a mouse to access the desired material and field. But tactility is also important, as individuals must learn navigational skills of how to proceed from one field and screen to another, how to negotiate hypertexts and links, and how to move from one program to another if one operates, as most now do, in a window-based computer environment.^⁹

Thus, like current children's math aids (which often take the form of cute calculation-drill games), such tools must be interactive, and more visual than mathematical symbols alone.

How can we reconcile the visual with the temporal in conveying math to students? We may borrow (and possibly mangle, in some minds) two more important notions from media theory: narrative and montage.

A proof, as written on a page, is a collection of words, numbers, and other symbols: a space-filling, timeless entity. However, understanding it requires not only the physical time to read it, but also the “logical” time to mentally digest it. This notion of “double time” is reflected in Seymour Chatman's exposition on narrative:

A salient property of narrative is double time structuring. That is, all narratives, in whatever medium, combine the time sequence of plot events, the time of the histoire (“story-time”) with the time of the presentation of those events in the text, which we call “discourse-time.” What is fundamental to narrative, regardless of medium, is that these two time orders are independent. ... This independence of discourse-time is precisely and only possible because of the subsumed story-time. Now of course all texts pass through time: it takes x number of hours to read an essay, a legal brief, or a sermon. But the internal structures of these non-narrative texts are not temporal but logical, so that their discourse-time is irrelevant, just as the viewing time of a painting is irrelevant. We may spend half an hour in front of a Titian, but the aesthetic effect is as if we were taking in the whole painting at a glance. In narratives, on the other hand, the dual time orders function independently. This is true of any medium... in theory, at least, any narrative can be actualized by any medium which can communicate the two time orders.^¹⁰

Chatman denies that a mathematical proof has narrative, considering it instead to have a solely logical structure, and, as with a Titian, declares its discourse-time (or viewing time) irrelevant. But discourse-time is precisely where human understanding occurs: it is the “mental time location” (if you will pardon the mixing of psycho-temporal-spatial metaphors) of all the peripheral thoughts out of which, when the “Eureka!” finally hits, learning arises. To discount discourse-time in favor of “story-time” alone is to miss the point of large proofs, which borrow from multiple fields of mathematics. We will stick to our mathematical guns and claim that logic may be considered a “time” apparatus, and so apply temporal properties to logical ones for the sake of expressing proof as narrative.

Depending on the amount of material feeding into a proof, its presentation may vary greatly from teacher to teacher (and student to student). Also, just as a story may be translated from one medium to another in more than one way, there may be multiple ways to prove a certain theorem (the infinitude of the primes is a good example of the breadth of proofs available for certain results). Hence, it may be prudent to develop a structure that allows the student to tease apart these two “time” structures (the “story-time”, or logical structure, of the proof, and the “discourse-time”, the time it actually takes to digest the proof) by allowing the student to see the proof in a non-linear format (i.e. off the written page) and traverse outside of its logical structure.

Manovich refers to a dichotomy between narrative and database. The fact that there may be multiple ways to prove a theorem plays directly into his theory of hypernarrative:

The “user” of a narrative is traversing a database, following links between its records as established by the database's creator. An interactive narrative (which can be also called a hypernarrative in an analogy with hypertext) can then be understood as the sum of multiple trajectories throughout a database. A traditional linear narrative is one among many other possible trajectories.^¹¹

We may therefore consider a proof of a theorem to be a linear logical narrative. If we then consider a typical higher mathematics course, with its emphasis on building the “mental toolkit” necessary to prove a few important theorems, as a series of narratives (from definitions to supporting theorems to main theorems), then the mathematics curriculum as a whole is a collection of hypernarratives, integrating different fields of math by displaying the linkages via theorems and proof techniques. An expert mathematician is hence a master of mathematical hypernarrative, and we therefore desire interactive tools to aid in building a student's mastery of these hypernarratives.^¹²
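As a rough illustration of this picture, a curriculum can be stored as a directed acyclic graph, with each linear narrative recovered as one path through it. The Python sketch below uses a hypothetical toy fragment (the node names and the `trajectories` helper are ours, not part of any existing system) and lists every trajectory from a definition to a main theorem: Manovich's “sum of multiple trajectories” in miniature.

```python
from typing import Dict, List

# Hypothetical toy fragment of a curriculum: an edge A -> B means "A is used to establish B".
CURRICULUM: Dict[str, List[str]] = {
    "def: prime": ["lemma: every n > 1 has a prime factor"],
    "def: divisibility": [
        "lemma: every n > 1 has a prime factor",
        "thm: infinitely many primes",
    ],
    "lemma: every n > 1 has a prime factor": ["thm: infinitely many primes"],
    "thm: infinitely many primes": [],
}


def trajectories(graph: Dict[str, List[str]], start: str, goal: str) -> List[List[str]]:
    """Return every directed path from start to goal.

    Each path is one linear narrative; the list of all of them is the
    hypernarrative. Assumes the graph has no cycles.
    """
    if start == goal:
        return [[goal]]
    paths: List[List[str]] = []
    for nxt in graph.get(start, []):
        for tail in trajectories(graph, nxt, goal):
            paths.append([start] + tail)
    return paths


for path in trajectories(CURRICULUM, "def: divisibility", "thm: infinitely many primes"):
    print(" -> ".join(path))
```

Here two narratives lead from the definition of divisibility to Euclid's theorem, one through the lemma and one directly. The recursion assumes no cycles; a graph with circular dependencies would need a visited set.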

Math as Computer Game, “Mathspace” as Playing Field; Montage of Concepts

automatically privileges certain types of narrative (myths, fairy tales, detective stories, classical Hollywood cinema).... Games structured around first-person navigation through space further challenge this narration-description opposition.^¹³

He proceeds to speak of games in terms of “narrative actions and exploration”: the player (student) acts and explores, directing his or her own track. As in the first-person games to which Manovich is referring, one should not underestimate the difference between “search” and “browse”; discovery all too often occurs when randomly selecting where to go next. If this structure is in place in a data-controlled pedagogical environment, the exploration aspect may help students think about mathematics as a game: the mathematical concepts are the “space”, and the logical connections between concepts are what make the space navigable. In this sense, the individual tasks at hand would be individual proof assignments; the goal of the math-game is to complete the course curriculum, i.e. to understand the primary theorems and be able to engage in mathematical research in the field of study.^¹⁴

Earlier I referred to a math curriculum as a collection of hypernarratives. To examine a portion of the curriculum in our proposed system, one must see it, with one's eyes. There must be a visual element involved. When thinking about a single linear narrative, one may visualize a line with arrows going in one direction, with explanation boxes along it. (This should look like an elementary flowchart with only one track.) Following the boxes from left to right, top to bottom, or whatever the chosen orientation, we discover the ideas in that narrative, from basic definitions through the theorem's hypotheses to the conclusion. What if we switch to a hypernarrative? We are now in the realm of multiple curves that intersect in different places but all share the same beginning and ending. With a collection of hypernarratives, we have a weblike structure^¹⁵, with many intersections and many beginning and ending points. We have vertices and (directed) edges: a graph.
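The passage from single narratives to a web can be sketched in a few lines of Python, assuming each narrative is given as an ordered list of concept names (the helper names `merge_narratives` and `endpoints` are hypothetical). Merging the lists yields a directed graph; nodes with no incoming edges are its beginning points, and nodes with no outgoing edges are its ending points.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def merge_narratives(narratives: Iterable[List[str]]) -> Tuple[Set[str], Graph]:
    """Union several linear narratives (ordered lists of concepts) into one directed graph."""
    nodes: Set[str] = set()
    graph: Graph = defaultdict(set)
    for narrative in narratives:
        nodes.update(narrative)
        # Consecutive concepts in a narrative become a directed edge.
        for a, b in zip(narrative, narrative[1:]):
            graph[a].add(b)
    return nodes, dict(graph)


def endpoints(nodes: Set[str], graph: Graph) -> Tuple[Set[str], Set[str]]:
    """Beginning points have no incoming edges; ending points have no outgoing edges."""
    has_incoming = {b for targets in graph.values() for b in targets}
    starts = {n for n in nodes if n not in has_incoming}
    ends = {n for n in nodes if not graph.get(n)}
    return starts, ends


# Hypothetical narratives over abstract concepts A-F.
nodes, graph = merge_narratives([["A", "B", "D"], ["A", "C", "D"], ["E", "C", "F"]])
print(endpoints(nodes, graph))
```

The first two narratives alone form a hypernarrative (two curves from A to D); adding the third turns the result into a web with two beginnings (A, E) and two endings (D, F).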

On this graph we see logical connections between key concepts: definitions, theorems, proof techniques. Taken as a picture, a bird's-eye view of this “mathspace” appears as a montage of terminology and symbol; when one traverses its nodes, it becomes a navigable space, and the student's navigator is logic. Hence, we do not draw an edge between two nodes unless there is a logical reason to do so: a definition is used in a theorem, a concept aids in building a proof technique, etc. These are mental connections (one might wish to draw a connection to neuroscience here, but I hesitate; I'll stay in the abstract) that a mathematician must make before presenting such material to students. It is precisely the ability to present these connections visually that we want from our code.

An expert mathematician has discovered ways to think about proofs so that (like the wave-particle duality of light, to extend our space-time metaphor even further) they may switch at will from logical process (“time”) to discrete units of mathematical understanding (“space”). The novice, however, still requires the full logical-temporal process to think about proofs. It is therefore the task of the math professor to develop this same instinct in students and make the “mathspace” navigable: teaching proof techniques, their applicability, and the overarching concepts of each field of study, and thereby building these nodes and connections in the student's mind's eye.

In summary, an expert mathematician has learned how to mentally “spatialize” proofs (has played a large portion of the game) and can maneuver through mathspace at will; the novice still sees proofs as primarily temporal. The proposed software will aid the student in traveling through mathspace and in developing proof intuitions via the available logical connections (and help create new nodes and connections) in the virtual visual environment of a graph.

Given the extraordinary amount of free, open source, and commercial software available on the Internet for visualizing, managing, and displaying data, there are a host of frameworks and existing knowledge bases to choose from. I will briefly describe the pros and cons of some of the big players in the mathematical community, and propose a way of combining some of their architectures, along with recent advances in browsing technology, to build the first iteration of the MathGraph.

To construct the MathGraph, we need code and data, and we need them to work together. It should be reasonably clear that the mathematics community strongly supports free, open source, publicly available programming languages, operating systems, and knowledge bases for mathematics study. The web sites we will examine for code and data primarily house four different types of material: articles, interactive simulations, pedagogical materials, and encyclopedias. I will briefly review some of the major sites, and describe why no currently existing site sufficiently meets the desired model.

There is a wide variety of centralized mathematics article archives online: Cornell University Library's arXiv.org e-Print archive and NEC Labs and Penn State's CiteSeer are two primary candidates when searching for papers. These sites offer hosting and indexing of articles from a wide variety of mathematics and computer science fields, but little else. CiteSeer, as its name implies, displays on each paper's page links to the papers it cites and links to the papers citing it. Beyond this and an abstract, nothing about a paper's contents is processed for pedagogical purposes. Therefore these sites will help to locate papers, but not to help students read them.

There are many sites that house interactive simulations of mathematical concepts (most run by programmers that coded these simulations as either explicit teaching tools for donation or as coding practice). ExploreLearning and International Education Software are two sites that offer applets for interactive learning experiences. These will not aid us in our current endeavor.

MIT's OpenCourseWare project is the prime representative of the many sites that offer pre-packaged course curricula, lecture notes, problem sets, and the like for pedagogical use. Ironically, this type of site offers nothing novel for our project: we are looking for software architectures, not notes. Notes are available aplenty; if we need to input data manually (which we will, though that is not the task at hand), we will have no shortage of material to read. Hence, we can ignore these sites for now. This leaves us with the last type of site: encyclopedias.

The encyclopedia is the reference type we want; we plan to build code on top of an encyclopedic structure in order to expose the graphlike qualities the data already contains. Two of the most important sites currently online devoted to mathematics are MathWorld and PlanetMath. Wikipedia, although not devoted solely to mathematics, also has a significant amount of well-written mathematical material, and should be noted.

No list of mathematics sites (especially a math student's bookmarks) would be complete without mentioning MathWorld. The work of Eric Weisstein, MathWorld is part of the Wolfram family of products (including the incredibly popular, and expensive, symbolic manipulation software suite Mathematica). MathWorld was started in 1995 by Weisstein, then a graduate student, as “Eric's Treasure Trove of Mathematics” (part of a larger collection of “Treasure Troves”, all of which are now hosted by Wolfram), from notes he had collected beginning in the late 1980s. The sad copyright battle with CRC Press over the printing of a 1998 snapshot of what had just become Wolfram's MathWorld (in printed form, the CRC Concise Encyclopedia of Mathematics) is recounted on Eric's Commentary page. This hard lesson in copyright and contract law ended with Wolfram and Weisstein settling out of court with CRC Press.^¹⁶

MathWorld is a wonderful reference resource, but is deficient for two main reasons: it is useful solely as a reference (no proofs, and no pedagogical structure beyond its hypertext linking and indexing), and it is encumbered by copyright restrictions (all entries are copyrighted, and no copying, mirroring, etc. is allowed). The official author of every entry is Weisstein; an example citation, for the entry “State Diagrams” contributed by Ed Pegg, Jr. (username Pegg, contributor of over 100 entries), is:

This clearly does not seem fair to Pegg, to have contributed a sizable amount of work and not get due credit for it (to be fair, his username is linked to an index of all his entries, but an official citation is not too much to ask for a major contributor). Due to these legal constraints on the data, MathWorld is unsuitable as a launching point for our project.

Part of the lawsuit process in 2000-2001 was an injunction against the MathWorld website, which forced it offline for over a year. The mathematics community at large was in an uproar, and Aaron Krowne, a student at Virginia Tech, decided to build a site that would replace MathWorld and could not suffer the same legal fate (without extra information from either side, it was assumed at this point that MathWorld was dead). Krowne proceeded to design and code the Noosphere architecture, and implemented PlanetMath with it.

PlanetMath improved on the MathWorld concept in three crucial ways. First, while MathWorld allows anyone to contribute entries, all entries pass through Weisstein; in PlanetMath, anyone is allowed to submit and maintain their own entries. Not only do users maintain the data, but they have also agreed to copyleft their entries under the GNU Free Documentation License, which allows anyone to copy, modify, and redistribute them under the license's terms. Second, PlanetMath supports LaTeX^¹⁷ in its entries, which makes a math encyclopedia much more readable, and automatic linking, which places the burden of building every internal link one would want on a page on the server instead of the entry's author. Third, PlanetMath is a community of mathematics students around the globe: among its communal activities are discussion boards where problems are discussed, code contributed by volunteers, and a free encyclopedia of PlanetMath's own, compiled from its entries (in PDF format). While the third quality is a testament to “great minds thinking alike”, the first two qualities (the supply of data and code, and its legal status) make PlanetMath very desirable as a starting point for our new software. PlanetMath still has one drawback: a shortage of proofs. Although it does contain many proofs, there are not yet enough of them to aid us in that particular endeavor. However, the fact that PlanetMath's entries are in LaTeX, and that regular snapshots of the entire site are available for download, somewhat makes up for this shortcoming (at least in the beginning, when building the database).
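To give a feel for what automatic linking involves, here is a deliberately naive Python sketch: it scans an entry's text for the titles of other entries and wraps each match in a wiki-style `[[...]]` link, preferring longer titles so that “prime number” is linked rather than just “prime”. This is not Noosphere's actual algorithm, which we do not reproduce here; the `autolink` function and the link syntax are assumptions for illustration.

```python
import re
from typing import Iterable


def autolink(text: str, titles: Iterable[str]) -> str:
    """Wrap each occurrence of a known entry title in [[...]] (a toy sketch, not Noosphere's method)."""
    # Try longer titles first, so "prime number" is preferred over "prime".
    ordered = sorted(set(titles), key=len, reverse=True)
    if not ordered:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    # Keep the original capitalization of the matched text.
    return pattern.sub(lambda m: f"[[{m.group(0)}]]", text)


print(autolink("Every prime number greater than two is odd.", ["prime", "prime number", "odd"]))
# Every [[prime number]] greater than two is [[odd]].
```

A real system must also cope with homonyms (the same word naming different concepts in different fields), which a plain title match cannot resolve.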

Wikipedia and its sister sites, Wikibooks and Wikiversity, are quickly building large amounts of data to rival MIT's OpenCourseWare, MathWorld, and PlanetMath. Because Wikipedia relies on the Wiki medium and ethic (a GPL codebase, all users may input and edit any entry, all entries are GFDL copylefted), and because of the sheer amount of data available for mining, it is worth keeping an eye on for possible future use.

It should be noted that in all of these sites, very little is done with the structure of the proof as a teaching aid. The Metamath Proof Explorer exists precisely to break down proofs; however, Metamath is a language for automated theorem proving, which makes the site more of a curiosity for most mathematicians; the proofs are, for pedagogical purposes, ridiculously unreadable (since they require examining every step in every axiom and proof at the basic logical level – mathematicians build symbolic machinery precisely to avoid this approach). So, extremes abound online – either very little is done with proof, or so much is done that it is barely human-readable.

The Semantic Web, an upcoming Internet data framework which will ideally allow greater machine understanding of basic semantic structures, relies on “semantic” links between objects using the Resource Description Framework (RDF). I call these links (like “is a member of” and “is a book by”, usually a set-inclusion semantic statement) “light” semantic links, since they relay only one piece of information per link between two objects, the data type of which is codified in the XML Document Type Definition (DTD). This is a framework for machine-based data processing that is (comparatively) easy on human eyes if viewed directly. I am interested in what I call “heavy” semantic links: if we imagine a link like “(Book) is on shelf (shelf number)” as a “light” link, a “heavy” link might be “(Book) was (author)'s first bestseller, which took him to #1 on the (New York Times Bestseller List)”. Edward Tufte would refer to this as a “higher resolution” of information, although for individual instances, sentences would make more sense. When many vertices and edges are joined by heavy links, we begin to see the power of the medium. Heavy links may not be machine-readable, but they are human-readable, and this is precisely what we are trying to achieve. Light links would not be discouraged (in fact, they will aid greatly in linking concepts that are not as easy to link “heavily”), but because our focus is on heavy links, we are not currently seeking a Semantic Web solution to our problem.
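One way to picture the distinction in code is a hypothetical `Link` record (our own illustration, not an RDF or TouchGraph data type) whose `relation` field is the light, machine-readable part and whose optional `explanation` is the heavy, human-readable part. Following the rule that edges are drawn only for a logical reason, the sketch rejects relations outside a small, assumed vocabulary.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed vocabulary of logical relations; a real system would need a richer one.
ALLOWED_RELATIONS = {"is used in", "implies", "is a special case of", "is a technique for"}


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    relation: str                      # light part: one machine-readable predicate
    explanation: Optional[str] = None  # heavy part: the human-readable "why"

    def __post_init__(self) -> None:
        # Only draw an edge when there is a recognized logical reason for it.
        if self.relation not in ALLOWED_RELATIONS:
            raise ValueError(f"not a recognized logical relation: {self.relation!r}")

    @property
    def is_heavy(self) -> bool:
        return bool(self.explanation)


light = Link("def: open set", "def: continuous map", "is used in")
heavy = Link(
    "lemma: every n > 1 has a prime factor",
    "thm: infinitely many primes",
    "is used in",
    "Given primes p1, ..., pk, apply the lemma to p1*...*pk + 1: "
    "none of the pi divides it, so its prime factor is a new prime.",
)
print(light.is_heavy, heavy.is_heavy)  # False True
```

Both links carry the same light relation; only the heavy one tells the student why the lemma matters to the theorem.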

We desire the ability to examine the linear structure of a logical argument, but retain the inherent non-linearity and multi-linearity of hypertext. The graph allows both of these things to appear at once. Randy Bass' “novice in the archive”^¹⁸ becomes a traveler on the MathGraph – students can traverse long webs of material they haven't seen, and visually – immediately – see connections in proofs.

The TouchGraph WikiBrowser is the proposed graphing solution to the problem of writing or finding open source code to mold into our architecture. One may examine a demo on TouchGraph's website; any more detail on the software itself would delve into implementation issues, which are not the topic of this paper. I will simply state that what is currently implemented is not enough to handle the notion of heavy links, and some additions will have to be made.

As more data is input into the MathGraph, professors (who don't know everything) may learn as well from the process. It may display some novel ways to build curricula – a different approach to proving a theorem that might aid pedagogy, or a different route altogether to achieve the proof of a fundamental theorem. The possibilities, as an optimist might say, are endless. These possibilities, however, rely on the material actually existing in the MathGraph's underlying database... an input task for the mathematical community at large.

It should be strongly emphasized that the MathGraph architecture is just that – an architecture, built upon other architectures. This software may be applied to any field of study (or, for that matter, any set of data at all) that may be laid out in such a fashion (keeping in mind the concerns of the previous section), as long as people are willing and able to convert existing, or manually input non-existing, data into the correct digital electronic format, and double-check the correctness of each piece of data. After that, rudimentary connections may be made by the Noosphere code, and the process of data massaging will continue from there. From here on we will refer to an instance of the MathGraph system as an ArchiveGraph.

On a personal note, I found myself almost drawing a chart, precisely like the one I am proposing an ArchiveGraph generate automatically, to present Jeff Pack's Growing Up Digerate^¹⁹. It would be a great aid to everyone who reads hypertext sites to be able to orient oneself in the uncharted web of links that is the Internet. The ArchiveGraph will generate a “map”, not a “site map”, which is really just a list of links (an index). Text uses an index; hypertext requires an index graph. Much of the data needed to generate these graphs is already collected (Google's spidering technology relies on links, linkbacks, and clickthroughs); we require software to make this data graphical, and so useful in a new way.^²⁰ An ArchiveGraph should be able to generate (allowing processing time, of course) a graph of a website's links.
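Generating such a map starts with extracting a site's links. The Python sketch below uses only the standard library's `html.parser` and works on pages already held in memory; fetching pages over the network, resolving relative URLs, and deduplicating them are left out, and the `link_graph` function is hypothetical. Alongside each page's outgoing links it records the backlinks (“referred from”) that a plain index never shows.

```python
from collections import defaultdict
from html.parser import HTMLParser
from typing import Dict, List, Set, Tuple


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def link_graph(pages: Dict[str, str]) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Map each page to its outgoing links and each linked page to its backlinks.

    `pages` maps a page name to its HTML; links to pages outside it are ignored.
    """
    outlinks: Dict[str, Set[str]] = {}
    backlinks: Dict[str, Set[str]] = defaultdict(set)
    for name, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        collector.close()
        targets = {h for h in collector.hrefs if h in pages and h != name}
        outlinks[name] = targets
        for target in targets:
            backlinks[target].add(name)
    return outlinks, dict(backlinks)


# Hypothetical three-page site held in memory.
site = {
    "index.html": '<a href="a.html">A</a> <a href="b.html">B</a>',
    "a.html": '<p>See <a href="b.html">B</a>.</p>',
    "b.html": "<p>No links here.</p>",
}
out, back = link_graph(site)
print(sorted(back["b.html"]))  # ['a.html', 'index.html']
```

The pair of dictionaries is already a directed graph; drawing it, rather than listing it, is the step a site map never takes.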

What this ArchiveGraph, or index graph, can do for teaching is display highlights of large portions of a subject all at once (in a graphical format, instead of a list); this allows the user to see loose connections while focusing on the linearity inherent in certain substructures of the entire map. Randy Bass' “novice in the archive” has a new browsing tool. With something to help look “above” a website, you can plan mini-narratives ahead of time (active learning); without this aid, you may simply get lost in the web of links, losing track of your train of thought.

If we wish to generalize the Theorem/Proof dichotomy we set up earlier, we may simply refer to the pairing as Statement/Support. In this way any name, event, etc. can be used as a label, and other data may be given to support that label by linking to other nodes and labeling those graph edges. Some quick examples are: Person/Biography (with links to other people, places, etc.), Place/Timeline of Events (with links to those events, other places, people, etc.), and Dish/Recipe (with links to other dishes, base ingredients, basic cooking concepts, etc.).

Six Degrees of Kevin Bacon (SDoKB) is a party game played in the minds of the players on a large, unseen graph of (light) semantic links between movie actors and the movies in which they have acted: link Kevin Bacon to any selected actor via six or fewer movie links (shorter chains are better). With the help of the Internet Movie Database (IMDb), this game can probably be considered “solved” by the University of Virginia Computer Science Department's Oracle of Bacon^²¹. Viewed in a larger context, this instance is a restriction to a very small subgraph of the graph of all human associations invoked by the concept of Six Degrees of Separation.

Wikipedia has its own version of Six Degrees, but with Wikipedia articles, called Six Degrees of Wikipedia. This, like SDoKB, is done on an unseen (unless the WikiBrowser is used...) graph of links (still light links), but unlike SDoKB, is done with directed edges (SDoKB links actors to each other through movies, but page A may link to page B which does not link back). The fun is in linking seemingly random concepts with as short a chain as possible.

In the MathGraph, proving theorems is the act of developing heavy links to join two concepts logically (the hypothesis and the conclusion, with direction as logical implication). When many links are present, it is a matter of the same type of graph-traversing game to make proofs shorter, or to find interesting commonalities between seemingly different concepts.
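The “make proofs shorter” game is, at bottom, a shortest-path search on a directed graph. Here is a minimal breadth-first search in Python over a small, hypothetical implication graph about real functions on a closed interval [a, b] (the node labels and the `shortest_chain` helper are ours). Because edges follow logical implication, a chain may exist in one direction and not the other, just as Wikipedia page A may link to B without B linking back.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical implication graph for functions f : [a, b] -> R (closed, bounded interval).
IMPLIES: Dict[str, List[str]] = {
    "differentiable": ["continuous"],
    "Lipschitz": ["uniformly continuous"],
    "uniformly continuous": ["continuous"],
    "continuous": ["Riemann integrable", "bounded"],
}


def shortest_chain(implies: Dict[str, List[str]], hypothesis: str, conclusion: str) -> Optional[List[str]]:
    """Breadth-first search for a shortest chain of implications, or None if there is none."""
    previous: Dict[str, Optional[str]] = {hypothesis: None}
    queue = deque([hypothesis])
    while queue:
        node = queue.popleft()
        if node == conclusion:
            # Walk back through the recorded predecessors to rebuild the chain.
            chain: List[str] = []
            current: Optional[str] = node
            while current is not None:
                chain.append(current)
                current = previous[current]
            return chain[::-1]
        for nxt in implies.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


print(shortest_chain(IMPLIES, "Lipschitz", "Riemann integrable"))
print(shortest_chain(IMPLIES, "continuous", "differentiable"))  # None: implication is one-way
```

Adding a direct edge from “Lipschitz” to “continuous” (also a true implication) would shorten the first chain to three nodes; spotting such shortcuts is the kind of discovery the game rewards.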

My friends from high school had a game we called “Neural Pinball”: the activity was simply to reference something one knew the others knew (a movie quote, a line from a book we'd all read, etc.), and watch everyone squirm as they tried to discover (aloud) the source of the material. The more obscure, the better; when a good reference was selected, the game required the utmost concentration to traverse the nodes and edges of our internal concept-association graphs. It was quite interesting to find out what links others used to connect one notion to another (their own “logical narratives”, to borrow my vocabulary abuse from the previous section), adding to each other's association matrices. It is this experience that I hope the MathGraph, and the ArchiveGraph in general, can bring to the larger educational community. Software exists to make light links, but with heavy links we can see much more of the “why” under each connection, and watch as new logical narratives unfold.

There are inherent constraints^²² to this technology, as there clearly are with any new communications and learning medium. First, some issues in each field may be difficult or impossible to translate into this format. Second, every field faces the concern that too much data will be displayed, paralyzing the user. Third, if the software in fact succeeds, students may come to rely on it too much, in effect not learning but finding a new way to “copy from the book”. We will briefly touch on each of these notions.

Some subjects (say, the hard sciences, linguistics, or anything else that lends itself well to a structuralist approach) may benefit from this tool. However, professors who do not tend to think in these ways may find the ArchiveGraph more trouble than it is worth. This is a matter for personal investigation and will not be addressed here. Still, a glance at the breadth and depth of the data and links in Wikipedia suggests that this may be a useful tool even if it is used merely to display “light” semantic links (such as “referred to in” and “referred from”). I restate: do not underestimate the power of “browse” in discovering interesting things to research.

There should clearly be concerns about too much information being displayed in one of these graphs. A complete graph on n vertices (one in which every vertex is joined to every other vertex by an edge) has 1+2+...+(n-1) = n(n-1)/2 edges^²³. For example, a complete graph on 5 vertices has 4*5/2 = 10 edges, and a complete graph on 10 vertices has 9*10/2 = 45 edges. Even if each node is attached to only about k < n-1 other nodes, the graph has roughly nk/2 edges, and if k grows in proportion to n, the amount of data grows quadratically. This would very quickly hinder learning and lead to confusion. Hence, there must be some sort of filtering mechanism in place to reduce the number of results.
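A short sketch of both halves of this argument, assuming the graph is stored as a Python dict of adjacency sets: `complete_graph_edges` checks the closed form against a brute-force count of vertex pairs, and `neighborhood` is one hypothetical filtering mechanism (not something the text specifies) that shows only the vertices within a chosen number of links of the one the user is focused on.

```python
from collections import deque
from itertools import combinations

def complete_graph_edges(n):
    """Edges in the complete graph on n vertices: 1 + 2 + ... + (n - 1) = n(n - 1)/2."""
    return n * (n - 1) // 2

def neighborhood(adjacency, focus, depth):
    """Hypothetical filter: the set of vertices at most `depth` links from `focus`."""
    distance = {focus: 0}
    queue = deque([focus])
    while queue:
        vertex = queue.popleft()
        if distance[vertex] == depth:
            continue  # do not expand past the display horizon
        for other in adjacency.get(vertex, ()):
            if other not in distance:
                distance[other] = distance[vertex] + 1
                queue.append(other)
    return set(distance)

# The closed form agrees with counting every unordered pair of vertices.
for n in (5, 10):
    pairs = len(list(combinations(range(n), 2)))
    print(n, complete_graph_edges(n), pairs)  # 5 10 10, then 10 45 45

# A path 0 - 1 - 2 - 3 - 4: focusing on vertex 0 with depth 2 hides vertices 3 and 4.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 5} for i in range(5)}
print(sorted(neighborhood(path, 0, 2)))  # [0, 1, 2]
```

A depth limit keeps what is on screen bounded by the local structure around the focus rather than by the size of the whole graph, which is one way to show the “big picture” in stages.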

From a pedagogical standpoint, a particular graph should not contain a very large amount of data in the first place. As students learn more about the field(s) in question, more data may be added, and a search tool would aid in finding starting and finishing points. Beyond that, it is traversing the web from start to finish that makes an ArchiveGraph lesson plan work.

If the architecture succeeds, it may succeed “too well”: students may rely on it so heavily to do their homework that they do not actually retain the material. From a programmer's perspective, worrying about this is rather optimistic; from the teacher's perspective, the prospect is dismal (unless, of course, the teacher happens to be using the ArchiveGraph as the only source of course notes, which would theoretically be possible).

Students must keep in mind that this system is just another tool, a supplement, and that it does not hold all the answers they need. There is a bit of counter-programming here: ideally one would build as robust a system as possible, and insisting that it is not a fix for all pedagogical problems may confuse some.

Roy Rosenzweig expresses concern about digital media's long-term efficacy given rapid hardware and software changes: “The life expectancy of digital media may be as little as ten years, but very few hardware platforms or software programs last that long.”^²⁴ One concern about the staying power of this system is that it relies on many different programming languages and architectures, which could complicate maintenance, since administrators might need knowledge of many different computer systems. One potential instance of a MathGraph could use all of the following (and possibly more):

Luckily for all involved, every one of these systems is open source and, with the exception of Java^²⁵, covered by the GNU General Public License (GPL) or a similar free, open-source license, with all documentation (including the data manually input by hundreds of volunteer contributors) covered by the GNU Free Documentation License (GFDL). This means that constructing and maintaining a MathGraph or similar ArchiveGraph need not be a one-person job (unless that person is, say, paid full-time to support the project and does it as a labor of love, which it should be).

The notion of an index graph is not new (see TouchGraph, ThinkMap, Visual Thesaurus, “Visualize the Wiki”, and “Graph Structure of Concepts” for some examples). In fact, while researching this paper I came across one site that comes close to what I propose for the MathGraph, namely Cambridge University's Maths Thesaurus: Connecting Mathematics. It could be considered close to (but not quite) a first stage of the MathGraph concept; however, as with Wikipedia and all the other sources addressed, the Thesaurus does not integrate proofs into its data structures, relying instead on “light” semantic links. While the amount of data and linkage involved is impressive, the key concept of the proof as part of the architecture is still absent (no proofs could be found in the Thesaurus at all; it is, after all, a thesaurus). There is room for improvement, not just for math but for information presentation in general. The implementation of such an improvement, though, is the focus of our next course.

¹ For example, treating the definition of “continuous function” as restricted to complete metric spaces (in the typical ε-δ fashion used in analysis) ignores the whole of topology (which independently builds up to the study of metric spaces; in analysis they are simply given).
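For reference, the two definitions this footnote contrasts can be stated side by side. These are the standard textbook forms, and the notation (d_X, d_Y, f^{-1}) is introduced here rather than taken from the text. On a metric space with its usual open sets the two conditions are equivalent, which is precisely the connection a student who knows only the first may miss.

```latex
% Metric (analysis) definition: f : (X, d_X) \to (Y, d_Y) is continuous when
\forall x \in X,\ \forall \varepsilon > 0,\ \exists \delta > 0,\ \forall y \in X :\quad
  d_X(x, y) < \delta \implies d_Y\bigl(f(x), f(y)\bigr) < \varepsilon .

% Topological definition: f : X \to Y is continuous when
\forall U \subseteq Y \text{ open} :\quad f^{-1}(U) \text{ is open in } X .
```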

⁴ In the Zermelo-Fraenkel set-theoretic framework (which, together with the Axiom of Choice, forms the ZFC axioms, the primary framework for modern mathematical research), the term “space” is used in analysis and topology to describe a set with a certain property or function (such as completeness, a metric, or an inner product) associated with it. In algebra, the corresponding term is simply “algebraic structure”.

⁵ To ease the confusion caused by this limiting approach, I suggest a method of building examples called “Intuition vs. Pathology”: when presenting a new definition or important theorem, present at least one simple example (the intuition) and one exceedingly difficult example (the pathology) that can be seen to fit within the basic definitions. But this is a pedagogical technique for another paper.

⁸ In my personal experience as a student and teacher of math, this way of examining logic as a “temporal” concept arises most strongly when one first encounters logical paradox. For example, try to remember the first time you heard the Barber Paradox (a popular rendering of Russell's Paradox in set theory): if a barber shaves exactly those men in town who do not shave themselves, does he shave himself? Trying to resolve the question first becomes an oscillation in one's mind (a temporal concept) between “yes” and “no”, before one decides it is an impossible task and labels the situation a logical paradox.
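The “oscillation” described here can be made mechanical. In this small illustrative sketch (the function name is invented for the example), the barber's rule applied to the barber himself demands that his answer equal its own negation; checking both truth values shows that neither works, and repeatedly applying the rule to a guess simply flips it forever.

```python
def barber_consistent(shaves_himself: bool) -> bool:
    """The barber shaves exactly those who do not shave themselves.
    Applied to the barber: he shaves himself if and only if he does not."""
    return shaves_himself == (not shaves_himself)

# Neither answer is consistent: the condition fails for both truth values.
print([barber_consistent(v) for v in (True, False)])  # [False, False]

# The "oscillation": applying the rule to a guess never settles.
guess = True
history = []
for _ in range(4):
    history.append(guess)
    guess = not guess  # if he shaves himself, the rule says he doesn't, and vice versa
print(history)  # [True, False, True, False]
```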

¹² So that we agree that a proof is in fact a “narrative” under this framework, we can check it against Mieke Bal's definition as given by Manovich. A proof contains actors (sets, elements, functions, whatever is necessary for the proof; generally, you can simply declare that they exist!) and a narrator (the mathematician “telling” the proof), along with a text (the actual words and symbols making up the proof), a story (the proof's logical structure), and a fabula (the specific logical steps taken to follow through the proof, i.e. what logically “happens to” or is “done by” the actors). This last one is a little sticky: since the story is the logical structure and we are considering logic as a “temporal” construct, “events” “occur” to the actors in a logical sense (in that the qualities of those actors unfold before the student's eyes, not that, say, a function “does” something). So, as long as we maintain the “temporality” of logic, the argument at hand holds.

¹⁴ To be fair, a student of mathematics may easily think of math as a game played with pencil and paper. This paper proposes creating a computer-based medium (one that may also serve as a navigable space for a computer game) to teach math. Dry at this point in the paper, perhaps (well, probably always dry to the reader, but that's beside the point), but the graphical ideas will be developed.

¹⁵ This is reminiscent of Scott McCloud's musings about time and “reading direction” in Understanding Comics, notably regarding comics with multiple threads (like a comic version of a Choose Your Own Adventure book), where many paths can be on one page at once. He states (without the aid of his drawings, or his constant boldface): “In comics, as in film, television and 'real life', it is always NOW. ... Wherever your eyes are focused [which panel in the comic], that's NOW. But at the same time your eyes take in the surrounding landscape of past and future! ... But eyes, like storms, can change direction! Yet we seldom do change direction, except to re-read or review passages. It's left-to-right, up-to-down, page after page. The idea that the reader might choose a direction is still considered exotic.” This is precisely what I am seeking: the ability of a math student to instantly choose a new direction (to follow a new logical thread) without breaking the general hypernarrative of a specific field. The key notion is that this is possible within the student's current visual field, without flipping paper pages, going to another book, or even clicking to a different web page.

¹⁶ Wolfram and Weisstein's settlement with CRC Press includes a recurring payment for what CRC considers lost sales: “In addition to its 'instant win,' CRC will be paid annually for books it doesn't sell, according to a formula that both sides have accepted – although we continue to believe that any past or future failure to achieve projected sales is far more plausibly attributed to CRC's abysmal marketing efforts than to any abuse of the website by people who want to have and hold snapshots of its contents. But in this life we do what we have to do--and what we are willing to do.”

¹⁷ LaTeX is a document preparation system written by Leslie Lamport on top of TeX, the typesetting language designed by Donald Knuth (often called the father of the analysis of algorithms), so that technical papers containing large amounts of mathematical notation can be written on a computer while the text is formatted nicely. It is often referred to as the “lingua franca of the worldwide mathematics community”, although it could easily be used for any kind of document (note: LaTeX itself is not a word processing program).

²⁰ Could this graphing structure be a step towards a “better” version of Claude Lévi-Strauss' index-card process from “The Structural Study of Myth”? Only the Structuralists know for sure.

²¹ The Oracle of Bacon is just one instance of UVA CS's larger program, Star Links, which finds the shortest path on the IMDb graph between any two given actors. One may refer to an actor's “Bacon number” as the number of links from him or her to Bacon (the minimum for anyone other than Bacon himself is 1, meaning the actor appeared in a movie with him; the current known maximum is 8). There is a variant of this game for mathematicians, the Erdős number, which links researchers to the renowned number theorist and world traveler Paul Erdős via co-authored research papers. Having a low Erdős number holds some cachet in the math community, although many think it a silly idea.

²² Manovich parenthetically states in the “What is Cinema?” chapter of The Language of New Media, “I am purposefully using this more technical term ['constraints'] instead of the more ideologically loaded 'limitations'”, a subtle notion that I, as a technophile (an ideologue, I suppose), hadn't thought about, even though using the word “limitations” could be construed by an ideologue as inherently technophobic. I have edited my original statement to reflect Manovich's sentiment.

²³ Tufte (The Cognitive Style of PowerPoint, p. 20) claims that “a table with n cells yields n(n-1)/2 pairwise comparisons of cell entries,” i.e. that a table (with one data column, he should mention) is just as good as a complete graph on n vertices. This may be true for numerical comparisons (where each number is labeled with a name; his “n cell” claim is misleading), but when we are examining semantic connections between words and arbitrary symbols, away from quantitative information, his claim breaks down.