Author: Kurt Cagle
Over the years, I’ve had people ask me how a taxonomy differs from an ontology. The answer (or at least a reasonably simple answer) is that “a taxonomy is a tree-shaped ontology”.
It is worth digging a bit deeper to understand what that means, however:
Way back in the early 18th century, a Swedish biologist by the name of Carl Linnaeus began a fairly ambitious project. He wanted to build a way of indexing animals and plants by their phenotypes – the ways that they are alike. His original taxonomy was fairly basic, but over the course of decades, he eventually created a system with a few thousand living entities. The French word taxonomie, built from Greek roots meaning arrangement and law, was coined later for this kind of classification, and the Linnaean taxonomy would go on to spark a revolution in biology as generations of scientists fitted new species into it.
This is the kind of taxonomy that most people are immediately aware of, notable primarily because of its tree-like organization. More general terms were located towards the root of the tree, then at every branch (node) Linnaeus placed more detailed groupings:
Animals => Vertebrates => Mammals => Carnivores => Cat-Like Animals => House Cats.
The theory behind this organization was that animals were more specialized the more advanced they were (the farther from the trunk). Not surprisingly, it helped lay the groundwork for natural selection, though this model was fairly primitive. The tree also established a ranking system of specificity – with each level of the tree indicating more specificity until you hit the leaves, which contained the associated species.
This taxonomy is important because it contains both the terms of interest – Carnivores, for instance – and a general ruleset that indicates how the terms relate to one another. There are similar taxonomies, such as the Dewey Decimal System or the more recent Library of Congress Classification, which break down general branches of knowledge or understanding in a particular scheme or arrangement.
Classification schemas are of particular importance to both programming and artificial intelligence. Most objects, when you get right down to it, can be thought of as lists of atomic properties (such as length) and categorical properties (such as colors or textures) – or of composite structures that can similarly be broken down (such as the address of a house). A given categorical property can be thought of as a list of strings, though semantically, the items in those lists may be entities in their own right. In XML and JSON, such lists are usually treated as enumerations. (For instance, a list of US states or Canadian provinces is considered an enumeration or a set of categorical values, but each state or province may also have additional annotation information such as founding date or population at a given time.)
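To make that distinction concrete, here is a minimal Python sketch. The names `Region`, `House`, `REGIONS` and `region_of` are hypothetical, invented for this illustration, and the three regions shown are just a sample. The idea is that a categorical property can be stored as a plain string code on the object while the code itself points to an entity that carries its own annotations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A categorical value that is also an entity with its own annotations."""
    code: str
    name: str
    country: str
    year_joined: int  # year of statehood or of joining Confederation


# The "enumeration": a lookup of allowed codes (a small sample only).
REGIONS = {
    "WA": Region("WA", "Washington", "US", 1889),
    "OR": Region("OR", "Oregon", "US", 1859),
    "BC": Region("BC", "British Columbia", "CA", 1871),
}


@dataclass
class House:
    length_m: float   # atomic property
    color: str        # categorical property kept as a bare string
    region_code: str  # categorical property that references an entity


def region_of(house: House) -> Region:
    """Resolve the categorical code to the richer entity behind it."""
    return REGIONS[house.region_code]


home = House(length_m=12.5, color="blue", region_code="BC")
print(region_of(home).name)
```

Seen this way, the list of codes behaves like an enumeration, while each entry can still be treated as a node with properties of its own.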
Ontology as a general subject means the study of things (Greek logos + ontos). As a mathematical concept, an ontology is the collection of the terms or concepts within the scope of the ontology plus the model that describes the relationships and constraints of those concepts. Ontologies are usually expressed mathematically as graphs, with the understanding that a graph could (though doesn’t have to) contain a closed loop. These graphs are also directional – there is a preferred direction for traversing each edge. These are sometimes referred to as Directed Cyclic Graphs (DCGs).
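A taxonomy is, in general, a Directed Acyclic Graph (DAG), which means that no loops are possible in the graph. Trees are the most familiar kind of DAG (strictly speaking, every tree is a DAG, but a DAG may give a node more than one parent, which a tree does not allow). Moreover, both DAGs and DCGs can be labeled – each node and edge can have a specific identifier.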
Most taxonomies tend to be tree-like in nature, working from a very general root to very specialized leaves. They are consequently directed acyclic graphs, or DAGs. An ontology is more generalized (you can have loops within the graph of an ontology), and as such is a DCG. The important thing to realize here is that the set of DAGs is a subset of the set of DCGs, since a DCG permits loops but does not require them. This means two things: first, a taxonomy is an ontology; second, not all ontologies can be represented as taxonomies without introducing the concept of referencing.
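The difference between the two kinds of graph can be checked mechanically. The following is a minimal sketch, assuming the graph is stored as a dictionary mapping each node to the list of nodes it points to. The function name `has_cycle` and the extra edge from House Cats back to Animals are made up for the example. It uses a standard depth-first search that reports a loop when it runs into a node that is still being explored.

```python
def has_cycle(edges: dict[str, list[str]]) -> bool:
    """Return True if the directed graph contains at least one loop."""
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state: dict[str, int] = {}

    def visit(node: str) -> bool:
        state[node] = IN_PROGRESS
        for child in edges.get(node, []):
            child_state = state.get(child, UNVISITED)
            if child_state == IN_PROGRESS:
                return True  # reached a node we are still exploring: a loop
            if child_state == UNVISITED and visit(child):
                return True
        state[node] = DONE
        return False

    return any(
        state.get(node, UNVISITED) == UNVISITED and visit(node)
        for node in edges
    )


# The chain from the Linnaean example: general terms point to specific ones.
taxonomy = {
    "Animals": ["Vertebrates"],
    "Vertebrates": ["Mammals"],
    "Mammals": ["Carnivores"],
    "Carnivores": ["Cat-Like Animals"],
    "Cat-Like Animals": ["House Cats"],
}

# Hypothetical extra edge that closes a loop, for illustration only.
looped = {**taxonomy, "House Cats": ["Animals"]}

print(has_cycle(taxonomy))  # False
print(has_cycle(looped))    # True
```

The first graph passes as a DAG. The second is still a perfectly valid ontology graph, but it can no longer be drawn as a tree.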
In any modern taxonomy, each node in the graph should be identified by a unique address or GUID of some sort. In database parlance, this is what’s known as a primary key. If another node has a connection or edge that has the value of this key, that node is said to have a reference to the key (or to have a foreign key relationship). A table in a relational database should have (if designed properly) at least a primary key, but may have multiple foreign key references. If no chain of references leads back to where it started, then the database is said to be acyclic. Relational designs generally try to avoid circular foreign key references, so a well-designed relational database should, in theory, also be a DAG.
Ironically, an XML document, a JSON document, or an RDF set of triples can have relationships that form cyclic graphs. However, once this happens, they are no longer tree-like. In taxonomic terms, this would be like adding a “See also” relationship that would connect Cat with Dog. Most taxonomies that are strictly tree-like also use the same relationship throughout the tree – such as Cat “is more specialized than” Carnivore, Carnivore “is more specialized than” Mammal, and so forth. (Mathematicians would say that the graph has a transitive closure over the property “is more specialized than”.)
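Transitivity here simply means that if Cat is more specialized than Carnivore, and Carnivore is more specialized than Mammal, then Cat is more specialized than Mammal. As a small sketch (the `PARENT` mapping and both function names are hypothetical, and the code assumes the relationship really is acyclic), the closure for one term can be computed by walking up the tree:

```python
# Each term maps to the single, more general term directly above it.
PARENT = {
    "House Cats": "Cat-Like Animals",
    "Cat-Like Animals": "Carnivores",
    "Carnivores": "Mammals",
    "Mammals": "Vertebrates",
    "Vertebrates": "Animals",
}


def ancestors(term: str) -> list[str]:
    """List every term that `term` is more specialized than, nearest first.

    Assumes PARENT is acyclic; a loop would make this walk run forever.
    """
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def is_more_specialized_than(a: str, b: str) -> bool:
    """True if `b` is reachable from `a` by following parent links."""
    return b in ancestors(a)


print(ancestors("House Cats"))
print(is_more_specialized_than("House Cats", "Mammals"))  # True
print(is_more_specialized_than("Mammals", "House Cats"))  # False
```

Only the direct parent links are stored. Everything else, such as House Cats being more specialized than Animals, is derived.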
So, a good working definition is that a taxonomy is an ontology that is (mostly) a labeled directed acyclic graph with a single transitive closure that contains all of the nodes in that graph. Before the eyes of the person you’re talking to glaze over, you can put that more simply as “a taxonomy is a tree-shaped ontology”.
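For readers who prefer code to definitions, the "tree-shaped" test can be sketched roughly as follows. This is an illustration under my own assumptions, not a standard algorithm: the graph is given as (parent, child) pairs, `is_tree` is a made-up name, and "tree-shaped" is taken to mean a single root, at most one parent per node, and every node connected to that root.

```python
def is_tree(edges: list[tuple[str, str]]) -> bool:
    """Check whether (parent, child) edges form a single rooted tree."""
    parent_of: dict[str, str] = {}
    nodes: set[str] = set()
    for parent, child in edges:
        nodes.update((parent, child))
        if parent_of.get(child, parent) != parent:
            return False  # a node with two different parents is not a tree
        parent_of[child] = parent

    roots = nodes - set(parent_of)
    if len(roots) != 1:
        return False  # no root at all, or several disconnected trunks
    root = next(iter(roots))

    # Every node must climb to the root without revisiting a node.
    for node in nodes:
        seen = set()
        while node != root:
            if node in seen:
                return False  # stuck in a loop that never reaches the root
            seen.add(node)
            node = parent_of[node]
    return True


taxonomy = [
    ("Animals", "Vertebrates"),
    ("Vertebrates", "Mammals"),
    ("Mammals", "Carnivores"),
    ("Carnivores", "Cat-Like Animals"),
    ("Carnivores", "Dog-Like Animals"),
    ("Cat-Like Animals", "House Cats"),
]

# A hypothetical "see also" edge from cats to dogs breaks the tree shape.
with_see_also = taxonomy + [("House Cats", "Dog-Like Animals")]

print(is_tree(taxonomy))       # True
print(is_tree(with_see_also))  # False
```

The second graph fails because Dog-Like Animals now has two parents, which is the point at which a taxonomy turns into a more general ontology.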