Formalizing spider diagrams

Geared to complement UML and the specification of large software systems by non-mathematicians, spider diagrams are a visual language that generalizes the popular and intuitive Venn diagrams and Euler circles. The language design emphasizes scalability and expressiveness while retaining intuitiveness. In this paper, we describe spider diagrams from a mathematical standpoint and show how their formal semantics can be given in terms of logical expressions. We also claim that all spider diagrams are self-consistent.


Introduction
Circles or closed curves, which we will call contours, have been in use for the representation of classical syllogisms since at least the Middle Ages [11]. The Swiss mathematician Leonhard Euler (1707-1783) introduced the notation we now call Euler circles (or Euler diagrams) [1] to illustrate relations between sets. This notation uses the topological properties of enclosure, exclusion and intersection to represent the set-theoretic notions of subset, disjointness, and intersection, respectively.
The 19th century logician John Venn used contours to represent logical propositions [16]. In Venn diagrams all contours must intersect. Moreover, for each non-empty subset of the contours, there must be a connected region of the diagram, such that the contours in this subset intersect at exactly that region. Shading is then used to show that a particular region of the resulting map is empty.
* Work done in part while the author was at the IBM T. J. Watson Research Center.
† Partially funded by the UK EPSRC, grant numbers GR/K67304 and GR/M02606.
In 1896, the logician Charles Peirce augmented Venn diagrams by adding X-sequences as a means for denoting elements [14]. An X-sequence connecting a number of "minimal regions" of a Venn diagram indicates that their union is not empty. Peirce also gave a mechanism for writing disjunctive information, which we will not discuss here.
An indication of the popularity and intuitiveness of Venn-Peirce diagrams is the fact that they are used in elementary schools for teaching set theory as an introduction to mathematics. Still, only recently have full semantics and inference rules been developed for Venn-Peirce diagrams [15] and Euler circles [4].
As a means for writing constraints on sets and their relationships with other sets, Venn-Peirce diagrams are expressive, but complicated to draw because all possible intersections have to be drawn and then some regions shaded. Drawing the Venn diagram of four or more sets is quite challenging. As shown by More [12] in the late fifties, there is an algorithm for adding a new contour to a Venn diagram. Although it is possible to do so indefinitely, the contours quickly assume weird shapes, with exponential increase in their curvature. The resulting diagram is very complicated and difficult to follow. Indeed, it is rare to see Venn diagrams of four or more contours. On the other hand, Euler circles are intuitive and easier to draw, but are not as expressive as Venn diagrams because they lack provisions for shading and for X-sequences.
In view of their relative merits, it seems natural to combine the two notations, by relaxing the demand that all curves in Venn diagrams must intersect. Spider diagrams do this and more. They are named after the "spider shape" which generalizes X-sequences in that a "minimal region" may have more than one spider in it, e.g., to denote that a set has two or more elements. Spiders may be connected by strands or ties in a region, to indicate that elements denoted by the spiders may or must be the same in that region. Further, a region containing spiders may also be shaded to denote that there are no elements in that region other than those shown by the spiders. Thus, spider diagrams allow placing both an upper bound and a lower bound on the size of a set. Among other extensions to traditional notation we find in spider diagrams the notion of "projections", which are used to show the intersection of more than three curves in a clear and uncluttered manner.
Spider diagrams have emerged from a succession of attempts to provide the software designer with precise, yet intuitive tools to specify a system prior to actually coding it. Work along this line of research can be traced all the way back to Harel's seminal state-charts. It was only the sound mathematical foundation which they stand upon that enabled the development of "executing" CASE tools [6]. With the advent of the object-oriented paradigm, and the modernization of software design, came a series of notations, for example BON [17], OML [2], and the celebrated UML standard [13]. Most of these visual languages are quite restricted in their expressive power in lacking provisions for denoting first order predicate logic formulae, which are so essential for describing system invariants. Such formulae are written using either borrowed mathematical notations (as in BON), an auxiliary specialized textual language designed for that purpose, e.g., OCL [18], or worse, natural language annotations (as was the case with OML).
Spider diagrams have emerged from work on constraint diagrams [8], which were introduced as an attempt to remedy this situation. The constraint diagram in Fig. 2 expresses (amongst other constraints) an invariant on a model of a library system: for any library object, and any copy of that library which is on hold, that copy's publication must be the same as that associated with the reservation for which it is on hold. [...] contain them, and arrows as showing the range of relations when their domain is restricted to the set or element at their source. Spider diagrams are used to show relationships between all the sets and elements involved. In fact, spider diagrams can be thought of as constraint diagrams without the arrows. Constraint diagrams proved to be an intuitive and useful tool for design, and several extensions of these were proposed, including a variant designed for expressing pairs of pre- and post-conditions [9], a three-dimensional version for behavioral specification [3], and a version used for metaspecification, i.e., the specification of design patterns [10]. At the same time, constraint diagrams were employed in the software industry to produce informal descriptions of systems.
Spider diagrams, the most fundamental layer of constraint diagrams, are the subject of this treatise. Our efforts towards upgrading the language into a true "visual formalism" [5] include the discussion of the essential syntax, semantics and properties of spider diagrams, treating them both as topological and notational creatures. Following the work of Shin [15] and Hammer [4] we show how formal semantics in the form of logical formulae for spider diagrams can be given.
Beyond visual software modeling, spider diagrams have an impact on the field of diagrammatic reasoning, a relatively new field which looks at logic and logical reasoning from a wholly new perspective. Such extensions are discussed in [7].
Outline. The remainder of this paper is organized as follows. Sec. 2 defines the syntax, and provides an informal semantics which helps to motivate and explain the various syntactic elements. This section treats spider diagrams as topological creatures. A formal definition of spider diagrams as a notational device devoid of geometrical and topological undertones is given in Sec. 3. This definition is then used in Sec. 4 to provide formal semantics of the notation. Sec. 4 also quotes the main result in the study of spider diagrams, namely that all legal diagrams are consistent. Finally, Sec. 5 concludes the paper and describes future research. Due to space limitations proofs are omitted from this paper. Similarly, some definitions, which were overly technical, were abridged.

Contours
Contours are shapes used in a spider diagram to denote sets. Formally, a contour is a closed non-self-intersecting plane curve.
Although it is convenient to draw contours as ovals, this is not mandatory. Other shapes may be used for making a visual distinction between different kinds of contour. For example, in object-oriented modeling, rectangular contours are used to indicate that a set corresponds to a class of objects. Since convex contours tend to be more visually pleasing we usually try to draw them as such.
Different iconic representations, or line styles, may be used to distinguish between different kinds of contours. In state-chart diagrams for example, thick lines may be used to denote initial states. We will also use dashed lines to denote projections, which are a special kind of contour. All concepts described in this section are independent of the chosen shape or the iconic representation of a contour.
A diagram contains at least one contour, called the boundary contour. The boundary is a contour which is not contained in, nor does it intersect with, any other contour. We do not usually bother to draw the boundary contour: it is assumed to be the bounding box for the diagram, be it the edges of the drawing surface, the edges of a figure, etc.
Excepting the boundary, all contours in a Venn diagram must intersect. We do not require this property in spider diagrams. In spider diagrams, just as in Euler circles, two contours can stand in one of three relations (Fig. 3). An intersection between the contours means that the sets they denote may intersect. Thus, in the absence of any other information, no statement is being made about the relationship between these two sets. If the contours do not intersect, then they are either disjoint, with the implication that their denoted sets are disjoint, or one of them is contained in the other, with a corresponding implication on the relationship between the sets they denote.
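The three relations can be made concrete with a small sketch. It assumes, purely for illustration, that each district is approximated by a finite set of grid points; the function name `contour_relation` and the sample districts are hypothetical and not part of the notation.

```python
def contour_relation(district_a, district_b):
    """Classify two contours by comparing their districts (sets of points)."""
    a, b = frozenset(district_a), frozenset(district_b)
    if a.isdisjoint(b):
        # Disjoint contours: the denoted sets are asserted to be disjoint.
        return "disjoint"
    if a <= b or b <= a:
        # Enclosure: one denoted set is asserted to be a subset of the other.
        return "contained"
    # Overlap: no statement is made about the two sets.
    return "intersecting"


# Hypothetical districts on a one-dimensional grid of points.
left = set(range(0, 5))
right = set(range(3, 8))
inner = set(range(1, 3))
far = set(range(10, 12))
```

Under these assumptions, `left` and `right` are classified as intersecting, `inner` and `left` as contained, and `left` and `far` as disjoint.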
There is a much greater variety of relationships among more than two contours. Consider for example Fig. 4, in which one contour is contained in the "union" of two others. The intended semantics is of course that the set denoted by that contour is a subset of the union of the sets denoted by the other two. Part of the challenge in giving precise semantics to spider diagrams is in systematically dealing with the general case of relationships among any number of contours. A contour can be labeled. By convention, contour labels are initially capitalized.

Districts, Regions and Zones
The meaning of a diagram is obtained from the topology of the contours in it. The terms district, region, and zone will become pertinent in the study of this topology.
A district (sometimes called a basic region) is the set of points in the plane enclosed by, or lying on, a contour. The district of the boundary contour is called the domain since it consists of all points of the diagram. By definition, a district is a connected set. Also, since a district includes the points on the contour itself, topologically it is a closed set. Fig. 5a shows the districts of a clover: three mutually intersecting non-boundary contours.
The regions of a diagram are the non-empty sets which can be formed from its districts by means of set union, intersection and difference operations. More formally, a region has the following recursive definition: any district is a region, and in addition, for any two regions $r_1$ and $r_2$, the set $r_1 \cup r_2$ is a region, as are the sets $r_1 \cap r_2$ and $r_1 \setminus r_2$, provided they are not empty.
Regions are not necessarily closed, nor do they have to be connected. For example, the region corresponding to [...] in Fig. 4 is not connected, and it is neither a closed set nor an open one. We insist that the number of contours in a diagram is finite. It follows that there are regions which are minimal with respect to set containment. This special kind of region is called a zone. Formally, a zone (or a minimal region) is a region having no other region as a proper subset. Notice that a zone does not have to be connected. For example, in Fig. 4, the zone corresponding to [...] is not connected.
Fig. 6: An example of using zone labels.
Fig. 5 shows all but one of the zones of the clover. The zone not shown is that which is formed by subtracting the districts of all non-boundary contours from the domain.
It is not difficult to see that contours partition the domain into disjoint zones. Accordingly, we have that all the regions can be generated by taking the union of any non-empty collection of zones. The clover, for example, has $2^8 - 1 = 255$ regions in total, which is the number of non-empty ways its eight zones can be combined.
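The count for the clover can be checked mechanically. The sketch below assumes that each zone is named by the set of non-boundary contours containing it and that every such combination occurs, as it does in the clover; the labels A, B and C are hypothetical.

```python
from itertools import combinations


def nonempty_subsets(items):
    """Yield every non-empty subset of `items` as a frozenset."""
    items = list(items)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


contours = ["A", "B", "C"]  # hypothetical labels for the three clover contours

# A zone is identified by the set of non-boundary contours that contain it;
# the empty set names the zone lying outside all three contours.
zones = [frozenset()] + list(nonempty_subsets(contours))

# A region is the union of a non-empty collection of zones.
regions = list(nonempty_subsets(zones))

print(len(zones), len(regions))  # prints: 8 255
```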
A zone can also be a district, as is the case for two out of the three zones of Fig. 3b. However, it follows from the definition that a zone is either completely contained in a district, or is disjoint from it. This dichotomy suggests a canonical method for denoting zones.
Definition 1 Let $C$ be a set of contours, and let $\beta \in C$ be a special boundary contour. Then, the pair $\langle C_1, C_2 \rangle$ is called a contour division if $\beta \in C_1$, $C_1 \cup C_2 = C$ and $C_1 \cap C_2 = \emptyset$.
For a zone $z$, let $C^+(z)$ be the set of contours that contain it, and let $C^-(z)$ be the set of contours that don't. Then, $\langle C^+(z), C^-(z) \rangle$ is a contour division of the contours of the diagram which uniquely identifies $z$. The converse, namely that all contour divisions correspond to zones, is true in Venn diagrams, but may not hold in spider diagrams.
Topologically, a zone $z$ is the intersection of the districts of $C^+(z)$, minus the union of the districts of $C^-(z)$. For example, the zone highlighted in Fig. 5b is calculated by taking the intersection of the sets denoted by the contours highlighted with a solid border in Fig. 5a and subtracting the set denoted by the contour highlighted with a dashed border.
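The calculation of a zone from the contours that do and do not contain it can be sketched as follows. The sketch assumes that districts are given as finite point sets and that the domain stands in for the district of the boundary contour; `zone_points` and the sample layout are hypothetical names chosen for illustration.

```python
def zone_points(domain, districts, inside):
    """Points lying in every district named in `inside` and in no other one.

    `domain` plays the role of the boundary contour's district, which
    contains every zone.
    """
    points = set(domain)
    for label, district in districts.items():
        if label in inside:
            points &= set(district)
        else:
            points -= set(district)
    return points


domain = set(range(10))
# Hypothetical Euler-style layout: A lies strictly inside B.
districts = {"A": {2, 3}, "B": {1, 2, 3, 4}}

inside_both = zone_points(domain, districts, {"A", "B"})  # {2, 3}
only_b = zone_points(domain, districts, {"B"})            # {1, 4}
only_a = zone_points(domain, districts, {"A"})            # empty set
```

The last call returns the empty set: with A drawn inside B, the division that puts A inside and B outside does not correspond to any zone, which illustrates why the converse of the correspondence holds in Venn diagrams but may fail in spider diagrams.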
Zones can be optionally labeled. By convention a zone label is underlined, initially uppercase and placed inside its zone. Using this convention we can now read Fig. 6 and determine that its semantics is (among other things) [...]. The spider diagram in Fig. 7 summarizes (in a reflective fashion) some of the terms introduced above: regions are sets of points, districts and zones are regions, and there might be zones which are also districts.

Spiders
So far, our expressive power was limited to sets and their relationships. In order to be able to make statements about set members, we need a new shape: the spider, which is used to denote elements. Spiders are similar to X-sequences, or chains, in Peirce diagrams, except that unlike X-sequences, spiders must be distinct (unless they are joined by a tie or by a strand, see below).
The visual representation of a spider is as a tree with nodes, called feet. A foot is drawn as a little black circle or square, and the connecting edges, the legs of the spider, are drawn as straight lines. We chose the tree representation instead of the linear structure of X-sequences used in traditional Venn-Peirce diagrams since we found that the greater flexibility enables more visually pleasing diagrams.
We say that a spider touches a zone if it has a foot in that zone. No spider may touch the same zone twice. A spider inhabits the region formed by the union of all the zones that the spider touches. This region, called the habitat of the spider, is intuitively where the element denoted by the spider might reside.
Spiders can be labeled. By convention, spider labels are all lower case. As a simple reflective example, Fig. 8 shows that the domain is a district, but it may be in addition a zone (this happens if a diagram has no contours other than the boundary).
There is a slight semantic difference between spiders with circles as feet and spiders with squares as feet. A spider whose feet are circles corresponds to existential quantification. Thus, in Fig. 1, the semantics of the two leftmost spiders is that there are two distinct anonymous elements in [...]. A spider whose feet are squares represents a given element. The semantics of Fig. 8 is that the specific element called "domain" is a member of the set "Districts".
A more elaborate example is given in Fig. 9. The semantics is that there exist $a$, $b$, $c$ such that [...], where $U$ is the universal set denoted by the boundary contour. In addition, the semantics includes the condition that all elements are distinct, i.e., $a \neq b$, $a \neq c$, and $b \neq c$.

Strands and Ties
We introduce the notion of strands and ties to provide a means for denoting that spiders may (or must) be the same should they occur in a certain zone. The increased expressivity does not come at the cost of visual clarity. There is an intuitive and concise iconic representation for both strands and ties. Suppose that nodes of two spiders placed in the same zone are connected by a "strand", drawn as a wavy line, as shown in Fig. 10a. Then, this means that the elements that these spiders denote may be the same if they occur in this zone. In Fig. 10a the elements denoted by the two spiders are required to be distinct only if they are not members of the zone containing the strand.
Dually, if the same two nodes are connected by a "tie", drawn as a double straight line resembling an equal sign (see Fig. 10b for an example), then the two elements must be the same if they occur in the same zone. Thus, the semantics of Fig. 10b also includes the condition that if both elements are members of that zone, then they are equal.
The nest of spiders $s$ and $t$ is the union of those zones $z$ having the property that there is a sequence of spiders $s = s_0, s_1, \ldots, s_n = t$ such that, for $i = 0, \ldots, n-1$, $s_i$ and $s_{i+1}$ are connected by a tie in $z$. Clearly, if there is a tie between feet, then a strand between those feet is redundant. Similarly, multiple strands or ties between the same pairs of feet are redundant.
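The chain-of-ties reading of the nest can be sketched as a reachability computation carried out zone by zone. The sketch assumes that ties and strands are recorded as (zone, spider, spider) triples, and that the web is obtained by the same construction with strands also allowed as links; the function names and the sample diagram are hypothetical.

```python
def linked_zones(s, t, links):
    """Zones in which distinct spiders s and t are joined by a chain of links.

    `links` is an iterable of (zone, spider_a, spider_b) triples.
    """
    graphs = {}
    for zone, a, b in links:
        graph = graphs.setdefault(zone, {})
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    result = set()
    for zone, graph in graphs.items():
        # Depth-first search from s using only the links drawn in this zone.
        seen, stack = {s}, [s]
        while stack:
            current = stack.pop()
            for neighbour in graph.get(current, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        if t in seen:
            result.add(zone)
    return result


def nest(s, t, ties):
    """Zones where s and t are connected through ties only."""
    return linked_zones(s, t, ties)


def web(s, t, ties, strands):
    """Zones where s and t are connected through ties or strands."""
    return linked_zones(s, t, list(ties) + list(strands))


# Hypothetical diagram: in zone "X", s is tied to u and u is tied to t;
# in zone "Y", s and t are joined by a strand.
ties = [("X", "s", "u"), ("X", "u", "t")]
strands = [("Y", "s", "t")]
```

With this data the nest of `s` and `t` is the single zone X, while their web consists of X and Y, so the nest is contained in the web.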

Shading and Schrödinger Spiders
Venn-Peirce diagrams use shading to specify that a zone is empty. Hence, a shaded zone in Venn-Peirce diagrams may not contain an X-sequence. This condition is relaxed in spider diagrams, and a shaded zone may contain spiders. The semantics of a shaded zone is that the set it denotes may not contain elements other than those indicated by the spiders which touch that zone. A shaded zone which has no spiders is thus empty, in agreement with Venn-Peirce notation. As shown in Fig. 11, zones are either shaded or unshaded. Just like labels, shading is not technically part of the geometry of a diagram. It is rather a property of a set of its points. It is still useful to render shaded zones by actually shading them as in Fig. 11. Although visually appealing, shading a zone is difficult to draw freehand. An alternative visual indication of shading is placing a $\times$ symbol in the zone.
Spiders can be used to place a lower bound on the number of elements in a set. In Fig. 12, we have that [...]. Shading a zone which includes spiders has the effect of placing an upper bound on the cardinality of the set denoted by that zone. In Fig. 13a, the set [...] contains exactly 2 elements; the two spiders in the zone mean a lower bound of 2 on its size, and the shading of the zone ensures that this is also an upper bound. Also, [...] contains at most 3 elements; it may contain fewer, as the elements denoted by spiders [...], [...] and [...] may be selected from other zones. In the same fashion we have that the size of [...] is between 1 and 3.
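The way spiders and shading bound the size of a set can be sketched as below. This is a simplified reading under stated assumptions: there are no strands or ties (so distinct spiders denote distinct elements), Schrödinger spiders contribute nothing to lower bounds, and a region is bounded above only when all of its zones are shaded. The function name, the zone names P, Q, R and the spider names are hypothetical.

```python
import math


def cardinality_bounds(region, shaded, spiders):
    """Return (lower, upper) bounds on the size of the set denoted by `region`.

    region  : set of zone names
    shaded  : set of shaded zone names
    spiders : dict mapping spider name -> (habitat, is_schrodinger),
              where habitat is a set of zone names
    """
    region = set(region)
    # A non-Schrodinger spider living entirely inside the region guarantees
    # one element there.
    lower = sum(
        1
        for habitat, is_schrodinger in spiders.values()
        if not is_schrodinger and set(habitat) <= region
    )
    # Only a fully shaded region is bounded above, by the spiders touching it.
    if region <= set(shaded):
        upper = sum(1 for habitat, _ in spiders.values() if set(habitat) & region)
    else:
        upper = math.inf
    return lower, upper


# Hypothetical diagram: zone P is shaded and holds two single-foot spiders;
# zone Q is shaded and touched by three spiders that also touch unshaded R.
shaded = {"P", "Q"}
spiders = {
    "s1": ({"P"}, False),
    "s2": ({"P"}, False),
    "x": ({"Q", "R"}, False),
    "y": ({"Q", "R"}, False),
    "z": ({"Q", "R"}, False),
}
```

In this hypothetical diagram the set denoted by P has exactly 2 elements, the set denoted by Q has between 0 and 3, and the set denoted by the union of Q and R has at least 3 with no upper bound.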
In Fig. 13a, the sizes of [...] and [...] are related: the more elements in [...] the fewer in [...], and vice versa. To avoid this dependency, an exclusive element could be introduced into the universal set, i.e., an element which is not a member of any of the sets represented by the other contours. In Fig. 13b, the same restrictions on [...] and [...] are in force, but this time if [...], [...] and [...] denote elements in [...], this has no impact on [...], as the habitats of these spiders do not include [...]. The price is the introduction of the constraint that [...].
Besides being awkward to draw, and being less than immediately intuitive, the exclusive element of the universal set is undesirable since it forces an unnatural constraint on a diagram. We fix the situation by introducing a new symbol, the Schrödinger spider, which denotes a set whose size is either zero or one. Just as with Schrödinger's cat, one is not sure whether the element exists or not. Formally, a Schrödinger spider is nothing but a kind of spider whose semantics is specialized.
The feet of a Schrödinger spider are rendered differently from those of normal, non-Schrödinger spiders. An example is given in Fig. 14, in which zone [...] contains three Schrödinger spiders, labeled [...], [...], and [...], while region [...] contains two unlabeled Schrödinger spiders. In comparison with Fig. 13b, Fig. 14 is less cluttered, and does not force an element into zone [...].
There is a limit to the amount we can express in this notation about the cardinality of sets. For example, we are unable to say that [...] for disjoint sets [...] and [...]. In order to retain the intuitiveness of spider diagrams, constraints such as this should be placed in an auxiliary textual annotation, not burdening the visual notation.
Fig. 15: A projected set (a) and its semantics (b).

Projections
Sometimes it is necessary to show a set in a certain context. Intersection can be used for just this purpose: an intersection of two sets shows each of them in the context of the other. However, intersections also introduce regions which may not be of interest. Projections are equivalent to taking the intersection of sets, except that they introduce fewer regions, with the effect that regions which are not the focus of attention are not shown, resulting in less cluttered diagrams.
A projection is a contour which is used to denote an intersection of a set with a "context". By convention, we use a dashed iconic representation to make the distinction between projections and other contours.
A determining label must be associated with any projection $p$. This label is used to denote the set which is being projected. The convention is that determining labels are rendered within parentheses when drawn in a diagram. A projection can also have a contour label. Consider Fig. 15(a) for example. The dashed contour, labeled [...], denotes the set obtained by "projecting" the set [...] onto the context [...], i.e., [...].
The same semantics could have been obtained by using More's algorithm [12] to draw the Venn diagram of four contours, as in Fig. 15(b). The simplicity of Fig. 15(a) when compared to Fig. 15(b) is self-evident. As noted above, the semantics of a projection is determined by its context, defined by:
Definition 2 The context of a contour is the smallest region that strictly contains the district of that contour.
Strictness ensures that the district itself is not the region containing the contour. A projection $p$ denotes the set obtained by intersecting the set denoted by its determining label with the set denoted by the context of $p$. It must be possible to calculate the set denoted by the context of $p$ from the sets denoted by contours other than $p$ itself.
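For a single projection, the rule just stated amounts to one set intersection. The following is a minimal sketch only, since the full semantics of projections is not given here; the function name and the sample sets are hypothetical, and the set denoted by the context is assumed to have been computed already from the other contours.

```python
def projection_set(determining_set, context_set):
    """Set denoted by a projection: the determining set restricted to its context."""
    return set(determining_set) & set(context_set)


# Hypothetical sets: the determining label denotes `women`, while the
# context of the projection denotes `staff`.
women = {"ann", "eve", "sue"}
staff = {"ann", "bob", "sue"}
women_on_staff = projection_set(women, staff)
```

Here `women_on_staff` contains only the elements common to both sets, without any region being drawn for members of `women` outside the context.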
There are fascinating mathematical intricacies involving the intersection of several projections. Due to space constraints, we will not be able to discuss these here, nor shall we be able to give the full semantics or even the exact definition of projections.

Formal Syntax of Spider Diagrams
In this section we give spider diagrams a formal definition which is independent of any topological and visual representations thereof. For space reasons, the definition of projections is omitted from this paper.
A spider diagram is a tuple $\langle C, \beta, Z, Z^*, R, S, S^*, [\ldots] \rangle$ whose components are defined as follows: (i) $C$ is a finite set whose members are called contours.
The element $\beta$, which is not a member of $C$, is called the boundary contour. In addition, we use the value $\bot$, which is not a member of any of the sets we mention, to denote undefined values.
We use the dot notation to extract components of a tuple. Thus, $d.Z$ denotes the zones of a diagram $d$. A spider multi-diagram is a collection of spider diagrams.

Formal Semantics
[...] where $Q$ is a predicate involving no quantifiers defined by $Q = Q_1 \wedge Q_2 \wedge \cdots \wedge Q_6$, and the predicates $Q_i$, $1 \le i \le 6$, are expanded below.
The function $\Psi$ maps the contours of the diagram to subsets of $U$. [...]
By letting the set intersection operation range over the boundary contour, we make sure that even the zone that is external to all contours has a well-defined semantics.
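One way to write the zone semantics that this remark refers to is sketched below. It assumes that the formal definition mirrors the topological description of a zone (the intersection of the districts of the contours containing it, minus the union of the remaining districts) and that the boundary contour denotes $U$. Here $C^+(z)$ stands for the set of contours containing the zone $z$, boundary included, and $C^-(z)$ for the set of those that do not; whether the original display had exactly this form is an assumption.

```latex
\Psi(z) \;=\; \bigcap_{c \in C^{+}(z)} \Psi(c) \;\setminus\; \bigcup_{c \in C^{-}(z)} \Psi(c),
\qquad \Psi(\beta) = U .
```

Under this reading, the zone outside all non-boundary contours has $C^+(z) = \{\beta\}$, so the intersection is $U$ itself rather than an intersection over an empty family.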
(iii) Regions. The value of $\Psi$ for a region, or more generally for any collection of zones, is simply the union of the semantics of the zones in the collection: for any such $r$, let $\Psi(r) = \bigcup_{z \in \mathcal{Z}(r)} \Psi(z)$, where, for any region $r$, $\mathcal{Z}(r)$ is the set of zones contained in $r$.
Predicate $Q_1$ (the plane tiling condition) ensures that all elements fall within sets denoted by zones: $\bigcup_{z \in Z} \Psi(z) = U$, where $Z$ is the set of zones of the diagram. This predicate means that an intersection of contours that doesn't appear as a zone must be empty. From it follows our intuitive interpretation of disjoint contours and containing contours: let $c_1$ and $c_2$ be two distinct contours in a topological diagram. Then, it follows from the plane tiling condition that if $c_1$ is disjoint from $c_2$, then $\Psi(c_1) \cap \Psi(c_2) = \emptyset$. Similarly, if $c_1$ is contained in $c_2$ then it follows from that condition that $\Psi(c_1) \subseteq \Psi(c_2)$.
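The effect of the plane tiling condition on disjoint contours can be checked on a small example. The sketch assumes that a zone is named by the set of non-boundary contours containing it and that its semantics is the intersection of the sets of those contours minus the sets of all others; the function names, contour labels and models are hypothetical.

```python
def zone_semantics(zone, psi, universe):
    """Set denoted by a zone, named by the non-boundary contours containing it."""
    members = set(universe)
    for contour, denoted in psi.items():
        if contour in zone:
            members &= set(denoted)
        else:
            members -= set(denoted)
    return members


def plane_tiling_holds(zones, psi, universe):
    """True when the sets denoted by the zones together cover the universe."""
    covered = set()
    for zone in zones:
        covered |= zone_semantics(zone, psi, universe)
    return covered == set(universe)


universe = {1, 2, 3, 4}
# Two disjoint contours A and B: no zone lies inside both.
zones = [frozenset(), frozenset({"A"}), frozenset({"B"})]

disjoint_model = {"A": {1, 2}, "B": {3}}
overlapping_model = {"A": {1, 2}, "B": {2, 3}}
```

The first model satisfies the condition. The second does not, because element 2 belongs to both A and B but the diagram has no zone inside both contours, so that element falls within no zone.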
Predicates $Q_2$ (the spider condition) and $Q_3$ (the Schrödinger spider condition) ensure that an element denoted by a spider is in the set denoted by the habitat of the spider: [...] Here we adopt the standard convention that a union over an empty range results in the empty set. Together with the spider condition, $Q_6$ ensures that the only elements in a set denoted by a shaded zone are the elements represented by any spiders impinging on that zone. Specifically, the set denoted by a shaded zone not containing feet of any spiders is empty.

Conclusions and Further Work
This is the first of a two-part semantics for constraint diagrams. The second part will deal with the semantics of arrows. We have begun to explore rules for reasoning directly with the diagrams, building on recent work in reasoning with Venn diagrams and Euler circles. Early results look promising, see [7] for details. It is our eventual aim to develop tools to support conceptual modeling, including the modeling of software, based on these formal results. The formal work goes hand in hand with attempts to popularize and obtain feedback on the utility of the notation.

Fig. 1 is a simple spider diagram consisting of three spiders and three contours. The figure uses various visualization techniques to specify that [...].
[...] the set of regions, and let $R' = R \cup [\ldots]$.
(iv) $S$ is a set of spiders, while $S^* \subseteq S$ is a set of Schrödinger spiders.
(v) A function from $S$ to $R$ returns the habitat of a spider. A function from $S \times S$ to $R'$ returns the nest of any two spiders, while another function from $S \times S$ to $R'$ returns the web of any two spiders.
[...] Predicate $Q_4$ (the strangers condition) ensures that if the elements denoted by two distinct spiders are equal, then they must fall within the set denoted by their web. Writing $e(s)$ for the element denoted by a spider $s$ and $\mathrm{web}(s,t)$ for the web of $s$ and $t$:

$$\bigwedge_{s,t \in S,\ s \neq t} \bigl( e(s) = e(t) \Rightarrow e(s) \in \Psi(\mathrm{web}(s,t)) \bigr)$$

Predicate $Q_5$ (the mating condition) ensures that if the elements denoted by two distinct spiders fall within the set denoted by the same zone of their nest, written $\mathrm{nest}(s,t)$, then these elements must be equal:

$$\bigwedge_{s,t \in S}\ \bigwedge_{z \in \mathrm{nest}(s,t)} \bigl( e(s), e(t) \in \Psi(z) \Rightarrow e(s) = e(t) \bigr)$$

Finally, predicate $Q_6$ (the shading condition) maintains that the set denoted by a shaded zone contains no elements other than those denoted by spiders, where $Z^*$ is the set of shaded zones:

$$\bigwedge_{z \in Z^*} \Psi(z) \subseteq \bigcup_{s \in S} \{ e(s) \}$$

[...], $s_i$ and $s_{i+1}$ are connected by a tie in $z$. Two spiders which have a non-empty nest are referred to as mates. If both the elements denoted by spiders $s$ and $t$ are in the set denoted by the same zone in the nest of $s$ and $t$, then $s$ and $t$ denote the same element. A strand is a wavy line connecting two feet, from different spiders, placed in the same zone. The web of spiders $s$ and $t$ is the union of those zones $z$ having the property that there is a sequence of spiders $s = s_0, s_1, \ldots, s_n = t$ such that, for $i = 0, \ldots, n-1$, $s_i$ and $s_{i+1}$ are connected by a tie or by a strand in $z$. So the nest of spiders $s$ and $t$ is a subregion of the web of spiders $s$ and $t$. Two spiders $s$ and $t$ may (but not necessarily must) denote the same element if that element is in the set denoted by the web of $s$ and $t$.

Theorem 1 All spider diagrams are self-consistent.
The proof is based on the construction of the topological model of a spider diagram. The details are omitted.

The definition of a model and compliance have a straightforward extension to a spider multi-diagram. The semantics predicate $P(U)$ of a multi-diagram is simply the conjunction of the semantics predicates of the individual diagrams. Thm. 1 does not extend to multi-diagrams. The reason is that it is possible to simultaneously denote that a certain set is empty in one diagram and non-empty in another diagram.

Since non-given spiders are existentially quantified, we must map them into formal variables. A non-given spider is mapped into a formal variable which will existentially range over $U$.
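The remark that consistency does not carry over to multi-diagrams can be illustrated by a brute-force search over a tiny universe. The sketch is deliberately simplified: each diagram is reduced to a single constraint on the set denoted by one shared zone, which is an assumption made for illustration, and all function names are hypothetical.

```python
from itertools import combinations


def all_subsets(universe):
    """Yield every subset of `universe` as a set."""
    items = sorted(universe)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield set(combo)


def satisfiable(predicates, universe):
    """True if some interpretation of the zone satisfies every predicate."""
    return any(
        all(predicate(candidate) for predicate in predicates)
        for candidate in all_subsets(universe)
    )


def shaded_without_spiders(denoted):
    # A shaded zone with no spiders denotes the empty set.
    return len(denoted) == 0


def one_spider_inside(denoted):
    # A spider whose habitat is the zone forces at least one element.
    return len(denoted) >= 1


universe = {1, 2}
```

Each constraint is satisfiable on its own, but `satisfiable([shaded_without_spiders, one_spider_inside], universe)` is false: the conjunction taken over the two diagrams has no model.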