Ontology Portal

FAQ

How do I get started using logic in general?

Having a basic familiarity with logic and logical languages is a prerequisite for working with SUMO. Past experience with object modeling in UML or schema creation in XML will be helpful, but not sufficient. One of the better books I've seen for getting up to speed on the practical issues of writing logical expressions is Schaum's Outline of Logic. Online, I suggest Waner and Costenoble's introduction or chapters 9 and 10 of an MIT open course. You might also look at the following: [1, 2]

How do I get started using SUMO?

The place to start is the introductory tutorial with audio, available on the ontology portal home page. If you are going to be creating your own ontology content, or even using some of the domain ontologies, you'll want to run the Sigma browser locally, so you should download and install a copy.

Why doesn't SUMO have the concept of X?

It probably does have it, at least at some level of generality. Try searching the WordNet mappings (enter a word in the English Word box). Most likely you searched SUMO term names directly, and the name you expected for the concept doesn't match the name that was used. A related point is that the name of a term is, in effect, just a comment: a term means what its axioms say it means, no more and no less. A particular term may not match the meaning you would expect from its name. That doesn't mean the term is wrong, only that you might have named it differently, and someone else might have named it differently still. There is no objective basis for choosing a name. If a term name doesn't seem evocative to you, it is better to treat it as an arbitrary symbol, such as GENSYM345432. If you find a term in SUMO or MILO that is what you are looking for but too general, also try looking in one of the domain ontologies.
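Searching by English word works because each WordNet sense is mapped to a SUMO term, so the term's name never has to match your word. A minimal sketch, assuming mapping lines in a simplified form of the SUMO-WordNet mapping files (pointer fields omitted) that end with a marker such as &%Canine+, where "=" is read as equivalent, "+" as subsumed by a more general SUMO term, and "@" as instance of. The sample lines and the build_index and lookup helpers are hypothetical, and the offsets and glosses may not match any particular WordNet release.

```python
import re
from collections import defaultdict

# Hypothetical, simplified lines in the style of the SUMO-WordNet mapping files:
# synset offset, lexicographer file number, part of speech, word count (hex),
# word/lex_id pairs, then "| gloss" and a SUMO term marker such as &%Canine+.
# Real files also carry pointer fields, which this sketch does not handle.
SAMPLE_LINES = [
    "02084071 05 n 02 dog 0 domestic_dog 0 | a member of the genus Canis &%Canine+",
    "02121620 05 n 02 cat 0 true_cat 0 | feline mammal usually having thick soft fur &%Feline+",
]

# Assumed reading of the relation markers at the end of each line.
RELATIONS = {"=": "equivalent", "+": "subsumed by", "@": "instance of"}

MAPPING = re.compile(r"&%(\w+)([=+@])\s*$")

def build_index(lines):
    """Map each English word to a list of (SUMO term, relation) pairs."""
    index = defaultdict(list)
    for line in lines:
        head, _, tail = line.partition("|")
        match = MAPPING.search(tail)
        if match is None:
            continue  # no SUMO mapping on this line
        term, marker = match.groups()
        fields = head.split()
        word_count = int(fields[3], 16)          # the word count is hexadecimal
        words = fields[4:4 + 2 * word_count:2]   # skip the lex_id after each word
        for word in words:
            index[word.lower()].append((term, RELATIONS[marker]))
    return index

def lookup(index, word):
    """Return the SUMO terms mapped to an English word (spaces become underscores)."""
    return index.get(word.lower().replace(" ", "_"), [])

index = build_index(SAMPLE_LINES)
print(lookup(index, "domestic dog"))  # [('Canine', 'subsumed by')]
```

The point of the design is that the English word is only a key into the mapping; the answer is a SUMO symbol whose meaning comes from its axioms.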

What was the methodology used to develop SUMO?

The development of SUMO has involved several approaches, in the rough sequence given below. In philosophical terms, it is closest to "Naturalism". It is both top-down and bottom-up (as well as middle-out). We took a pragmatic and empirical approach through domain ontology construction and WordNet mapping, and also a theoretical, philosophically informed approach by working top-down from theories and principles of philosophy. Any comprehensive ontology needs work both top-down and bottom-up. It must be cognizant of theory, yet focused on pragmatic practice and utility. To take one viewpoint over the other is to ignore an influence that can help create a better product. Some of the steps taken were to:

1. Collect all the available general-purpose ontologies that have been formalized in logic, including rules
2. Merge those axioms into a common product
3. Identify major areas of general knowledge that are not covered in existing research projects
4. Ontologize those areas
5. Map SUMO to WordNet
6. Augment SUMO with new ontological areas identified as missing while doing the WordNet mappings. WordNet word senses that do not have a "home" at a reasonably specific level of SUMO indicate a potential ontological gap. Follow an arbitrary cutoff of 1000 terms in SUMO: if a 1001st term is needed, put it in an appropriate domain ontology or in the Mid-Level Ontology (MILO).
7. Develop domain ontologies as needed for a variety of customers
8. Add concepts to SUMO for domain-specific content that isn't well supported at the upper level
9. Run a formal theorem prover on SUMO and the domain ontologies to find contradictions. Correct contradictions and add content as needed
10. Extend SUMO by identifying the more common synsets in WordNet (those occurring more than 3 times in SemCor, the portion of the Brown Corpus that has been sense-tagged with WordNet senses) and creating an equivalent concept for each one that doesn't already have one.
11. Repeat steps 7-9
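The SemCor-based extension step in the list above amounts to a filter over WordNet synsets. A minimal sketch, assuming two hypothetical inputs: a table of SemCor occurrence counts per synset, and the set of synsets that already map to an equivalent SUMO concept. It returns the synsets seen more than three times that still lack an equivalent term, most frequent first; the synset names and counts are illustrative only.

```python
from typing import Dict, List, Set

SEMCOR_THRESHOLD = 3  # "more than 3" occurrences in SemCor

def synsets_needing_terms(semcor_counts: Dict[str, int],
                          equivalent_mapped: Set[str]) -> List[str]:
    """Return synset ids that occur more than SEMCOR_THRESHOLD times in SemCor
    but have no equivalent SUMO concept yet, most frequent first."""
    candidates = [
        synset for synset, count in semcor_counts.items()
        if count > SEMCOR_THRESHOLD and synset not in equivalent_mapped
    ]
    # Sort by descending frequency, then by id so the order is deterministic.
    return sorted(candidates, key=lambda s: (-semcor_counts[s], s))

# Hypothetical counts and mappings, for illustration only.
counts = {"dog.n.01": 12, "kennel.n.01": 2, "leash.n.01": 5, "bark.n.04": 4}
already_equivalent = {"dog.n.01"}
print(synsets_needing_terms(counts, already_equivalent))
# ['leash.n.01', 'bark.n.04']
```

Whether each candidate then becomes a SUMO term or goes into MILO or a domain ontology is still the human judgement described in the list.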

Isn't first order logic too complicated to expect people to use it?

The simple answer is that logic solves an essential problem that other approaches do not. For one explanation of this issue, look at my article "Why Use OWL?". One version of the contrary argument amounts to "looking under the lamppost": choosing an approach because it is easy to apply rather than because it solves the problem. Taxonomies, object models, database schemata, controlled vocabularies and the like simply don't capture the meaning of concepts unambiguously, or in a way that computers can understand; they don't address the need to capture meaning at all. It's true that FOL is unfamiliar to many people. So was Java when it was first introduced; for that matter, so was assembly language. If a technology has value and the problem can't reasonably be handled by a simpler approach, people will learn it. An end user, however, should no more expect to see SUMO or KIF expressions than to see the Java code underlying an application.

Why do we need semantics? Isn't XML good enough?

Here is a look at the issue that somewhat parallels my explanation in the article "Why Use OWL?". In logic we might state

Isa(Y,X) ^ Isa(Z,Y)

"Y is a kind of X, and Z is a kind of Y"

Given the axiom that Isa is transitive, the mathematics of logic allows us to conclude

Isa(Z,X)

This works in just the same way as 2 + 2 = 4. "2 + 2" means "4". It's not just a way of stating a problem-solving procedure in arithmetic. Nor is it just a matter of syntax, because we could change the symbols as long as we defined them with a formal theory (like Russell & Whitehead's Principia Mathematica). "2 + 2" necessarily entails "4" whether or not I have some system to state it, prove it or calculate it.
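Spelled out, the Isa conclusion above rests on a single axiom, that Isa is transitive. Instantiating that axiom with a = Z, b = Y, c = X and applying modus ponens to the premise gives the result in four lines:

```latex
\begin{align*}
&\forall a\,\forall b\,\forall c\;\big((\mathrm{Isa}(a,b) \land \mathrm{Isa}(b,c)) \rightarrow \mathrm{Isa}(a,c)\big) && \text{(transitivity axiom)}\\
&(\mathrm{Isa}(Z,Y) \land \mathrm{Isa}(Y,X)) \rightarrow \mathrm{Isa}(Z,X) && \text{(instantiate } a=Z,\ b=Y,\ c=X)\\
&\mathrm{Isa}(Y,X) \land \mathrm{Isa}(Z,Y) && \text{(premise)}\\
&\mathrm{Isa}(Z,X) && \text{(modus ponens, conjuncts reordered)}
\end{align*}
```

No step depends on what the symbols are called; renaming Isa, X, Y or Z throughout leaves the derivation valid, which is the sense in which the meaning is carried by the logic rather than the syntax.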

XML syntax doesn't have that inherent property. The following expression

<and>
  <isa arg1="Y" arg2="X"/>
  <isa arg1="Z" arg2="Y"/>
</and>

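has no inherent semantics: nothing follows from it unless someone writes a program to interpret it. RDF Schema, by contrast, has a formal semantics in which rdfs:subClassOf is transitive, so the statements

<rdfs:Class rdf:about="http://a.b.c/my-schema#Y">
  <rdfs:subClassOf rdf:resource="http://a.b.c/my-schema#X"/>
</rdfs:Class>

<rdfs:Class rdf:about="http://a.b.c/my-schema#Z">
  <rdfs:subClassOf rdf:resource="http://a.b.c/my-schema#Y"/>
</rdfs:Class>

entail

<rdfs:Class rdf:about="http://a.b.c/my-schema#Z">
  <rdfs:subClassOf rdf:resource="http://a.b.c/my-schema#X"/>
</rdfs:Class>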
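A plain XML parser sees only elements and attributes in the RDFS statements above; the conclusion about Z and X comes from applying the RDFS rule that rdfs:subClassOf is transitive. A minimal sketch using Python's standard xml.etree.ElementTree: it wraps the first two statements in an rdf:RDF element with the standard RDF and RDFS namespace URIs, extracts the subclass pairs, and applies the transitivity rule until nothing new is derived. It handles only this one rule and is not a full RDFS reasoner.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# The first two statements from the text, wrapped in rdf:RDF with namespaces.
PREMISES = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}">
  <rdfs:Class rdf:about="http://a.b.c/my-schema#Y">
    <rdfs:subClassOf rdf:resource="http://a.b.c/my-schema#X"/>
  </rdfs:Class>
  <rdfs:Class rdf:about="http://a.b.c/my-schema#Z">
    <rdfs:subClassOf rdf:resource="http://a.b.c/my-schema#Y"/>
  </rdfs:Class>
</rdf:RDF>"""

def subclass_pairs(rdf_xml):
    """Extract (subclass, superclass) URI pairs stated with rdfs:subClassOf."""
    root = ET.fromstring(rdf_xml)
    pairs = set()
    for cls in root.iter(f"{{{RDFS}}}Class"):
        sub = cls.get(f"{{{RDF}}}about")
        for sup in cls.findall(f"{{{RDFS}}}subClassOf"):
            pairs.add((sub, sup.get(f"{{{RDF}}}resource")))
    return pairs

def transitive_closure(pairs):
    """Apply (A sub B) and (B sub C) => (A sub C) until no new pair appears."""
    closure = set(pairs)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived

entailed = transitive_closure(subclass_pairs(PREMISES))
print(("http://a.b.c/my-schema#Z", "http://a.b.c/my-schema#X") in entailed)  # True
```

Everything the parser produces is structure; the one line that adds meaning is the transitivity rule, which is exactly what the RDFS semantics supplies and bare XML does not.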

Isn't first order logic too slow to use?

It is important not to confuse representation with implementation. Doing representation in the implementation language risks choosing a language that makes it impossible (or at least very difficult or awkward) to capture certain kinds of information. For example, if your implementation language doesn't let you state if...then rules, you won't be able to capture that kind of information, yet such rules are almost certainly needed to define each term precisely. A better approach is to capture the information first and then decide how, and how much, of that knowledge can be expressed and used efficiently in your application. At the very least you'll have carefully documented what your concepts mean. Just because implementations can't reason directly with English doesn't mean we shouldn't have English definitions in our data dictionaries.
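One way to act on this advice is to keep the full axiom set as the specification and load into the runtime only the part it can use efficiently. A minimal sketch, assuming a hypothetical application that can only handle ground facts (atoms with no variables and no logical operators): it parses SUO-KIF-style s-expressions and separates those facts from the rules, which stay available as documentation or for a theorem prover. The parser is deliberately tiny (no strings or comments), and the sample axioms and term names are illustrative.

```python
from typing import List, Tuple, Union

Expr = Union[str, list]

OPERATORS = {"and", "or", "not", "=>", "<=>", "forall", "exists"}

def parse(text: str) -> List[Expr]:
    """Parse a string of s-expressions into nested Python lists of tokens."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    stack: List[list] = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    return stack[0]

def is_ground_fact(expr: Expr) -> bool:
    """True for a flat atom such as (subclass Dog Canine): no variables, no operators."""
    return (isinstance(expr, list) and bool(expr)
            and expr[0] not in OPERATORS
            and all(isinstance(a, str) and not a.startswith("?") for a in expr))

def partition(text: str) -> Tuple[List[Expr], List[Expr]]:
    """Split axioms into (facts the runtime can load, rules kept for reasoning or docs)."""
    facts, rules = [], []
    for expr in parse(text):
        (facts if is_ground_fact(expr) else rules).append(expr)
    return facts, rules

# Hypothetical axioms for illustration.
KB = """
(subclass Dog Canine)
(instance Fido Dog)
(=> (instance ?X Dog) (attribute ?X Domesticated))
"""
facts, rules = partition(KB)
print(len(facts), len(rules))  # 2 1
```

Nothing is thrown away: the rule the runtime cannot use is still recorded, so the meaning of Dog stays documented even though the application only looks up facts.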

X is/isn't an upper level concept. Why is/isn't it in SUMO?

This is the sort of question that lacks any objective basis for fruitful discussion. We set an arbitrary limit of 1000 terms in SUMO because many more would likely be too hard to learn in a reasonable amount of time, and many fewer would likely not cover a broad enough range of concepts to be useful by itself. As to whether something belongs in SUMO or MILO, it's a judgement call on which reasonable people can disagree.

How do you know SUMO is right?

How do you know your operating system is "right"? It's a combination of a priori formal methods to check internal consistency, empirical tests of utility and coverage, and a lot of human testing and inspection. There are no shortcuts here. The specific tasks have been:

Testing with a first order logic theorem prover to identify logical contradictions. While finding every contradiction is undecidable in general, we did incremental testing with longer and longer search times. First we tried to prove the negation of every statement with a 30-second search bound, and found a number of problems. With a limit of a few minutes we found a small number more. We then increased the limits until it took days to run through the entirety of SUMO, without finding further problems. It is certainly possible that undiscovered problems remain, but it is unlikely. More likely, some axioms are formulated in a way that keeps a conventional FOL prover from finding all the problems. A good example is the ListFn axioms, which need a procedural attachment in order to function properly.
Mapping to WordNet as an empirical test of coverage
Creation of domain ontologies as an empirical test of coverage
Release of every version of SUMO to the public, for peer review
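The incremental theorem-prover testing in the first item above can be sketched as a loop over growing time bounds. A minimal sketch: prove is a hypothetical callable standing in for a real first-order prover, returning True if it finds a proof of the goal from the axioms within the time limit. Since each axiom is itself in the knowledge base, proving its negation shows the knowledge base is inconsistent. The 30-second first bound comes from the text; the later bounds are assumptions, and axioms flagged in one round are not re-tested in later rounds.

```python
from typing import Callable, Dict, List, Sequence

# A prover takes (axioms, goal, timeout_seconds) and reports whether it found a proof.
Prover = Callable[[Sequence[str], str, float], bool]

def find_contradictions(axioms: Sequence[str], prove: Prover,
                        time_bounds: Sequence[float] = (30, 300, 3600)
                        ) -> Dict[float, List[str]]:
    """Try to prove the negation of every axiom, with a longer limit each round.

    Returns, for each time bound, the axioms whose negation was newly proved
    in that round, i.e. axioms implicated in a contradiction.
    """
    flagged: Dict[float, List[str]] = {}
    already_found = set()
    for bound in time_bounds:
        found = []
        for axiom in axioms:
            if axiom in already_found:
                continue  # reported in an earlier, shorter round
            if prove(axioms, f"(not {axiom})", bound):
                found.append(axiom)
                already_found.add(axiom)
        flagged[bound] = found
    return flagged

# Toy stand-in for a real prover, for demonstration only: it "proves" the
# negation of one hard-coded axiom, and only when given at least 300 seconds.
def toy_prove(axioms, goal, timeout):
    return goal == "(not (disjoint Dog Cat))" and timeout >= 300

kb = ["(subclass Dog Canine)", "(disjoint Dog Cat)"]
print(find_contradictions(kb, toy_prove))
# {30: [], 300: ['(disjoint Dog Cat)'], 3600: []}
```

An empty result for the longest bound is evidence of consistency, not proof, which matches the hedged conclusion in the text.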

How do you know SUMO is suitable for all domains or tasks?

We don't, exactly, but we don't have any concrete examples to the contrary either. It's a reasonable conjecture that SUMO might not be optimal for some new task or domain, but that remains only a conjecture until a specific, practical example is found and formalized. The range of domains to which SUMO has already been applied gives us some empirical evidence against it. There are many ongoing debates in metaphysics about different ways of carving up the world at a high level, but the very existence of such debates should show that none of the major positions has a critical flaw; otherwise the debates would have been settled. A typical debate concerns models of action and change.

Is SUMO biased toward English, or western culture in general?

SUMO is language independent. The original SUMO term names are in English, but they are often only coincidentally equivalent to English words. SUMO terms have been translated into a variety of languages, including languages that are primary in very different, non-western cultures. These languages include Hindi, Chinese and Czech. The ease with which these translations have been performed, and the extent to which SUMO is in regular use by non-English speakers, give us considerable confidence that there is no deep-seated linguistic or cultural bias in SUMO, any more than there is linguistic or cultural bias in the areas of mathematics discovered by the ancient Greeks or Chinese.
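Translation is possible because a term is a symbol and its names in each language are separate data. A minimal sketch, assuming statements of the form (termFormat EnglishLanguage Dog "dog") that attach a display name in a given language to a term; the ChineseLanguage and CzechLanguage constants and the sample labels are assumptions for illustration, and the regular expression only handles one statement per line.

```python
import re
from collections import defaultdict

# Matches (termFormat <Language> <Term> "<label>") on a single line.
TERM_FORMAT = re.compile(r'\(termFormat\s+(\w+)\s+(\w+)\s+"([^"]*)"\)')

def load_labels(kif_text):
    """Build {term: {language: label}} from termFormat statements."""
    labels = defaultdict(dict)
    for language, term, label in TERM_FORMAT.findall(kif_text):
        labels[term][language] = label
    return labels

def display_name(labels, term, language, fallback="EnglishLanguage"):
    """Label for a term in the requested language, else the fallback, else the bare symbol."""
    names = labels.get(term, {})
    return names.get(language) or names.get(fallback) or term

# Hypothetical statements; the non-English language constants are assumed names.
KIF = '''
(termFormat EnglishLanguage Dog "dog")
(termFormat ChineseLanguage Dog "狗")
(termFormat CzechLanguage Dog "pes")
'''
labels = load_labels(KIF)
print(display_name(labels, "Dog", "CzechLanguage"))  # pes
print(display_name(labels, "Dog", "HindiLanguage"))  # dog (falls back to English)
```

The axioms refer only to Dog; adding a language means adding labels, not changing any definition.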

Is SUMO done?

It would be better to say that SUMO is stable. The structure of SUMO has not changed appreciably in several years. However, there are still many things that could be improved and elaborated. From time to time we get reports of typos in rules, or of other problems that, although isolated, do need fixing. SUMO is likely to continue to evolve, especially as it gets wider use in reasoning applications. It is also likely that SUMO will change much less than the domain ontologies.

How are sorts defined in SUO-KIF and SUMO?

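In SUO-KIF, variables are not typed. In SUMO, however, all relations have defined argument types, declared with domain and domainSubclass (as well as range and rangeSubclass for the values of functions). Some logical languages use an explicit sorted syntax such as

(forall (?X:Object, ?Y:Process) ...)

In SUO-KIF, the same restriction can be written out with explicit instance tests:

(forall (?X ?Y)
  (=>
    (and
      (instance ?X Object)
      (instance ?Y Process))
    ...))

In SUMO, such tests are usually unnecessary, because the argument types of the relations in a formula already constrain its variables. Since the first argument of instrument must be a Process and the second an Object, the following restricts ?Y and ?X in the same way:

(forall (?X ?Y)
  (=>
    (instrument ?Y ?X)
    ...))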
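Because argument types are declared once per relation, a tool can recover each variable's sort from the relations it appears in. A minimal sketch, not Sigma's actual implementation: the DOMAINS table encodes instrument taking a Process as its first argument and an Object as its second, as the instrument example above implies, and formulas are given as already-parsed tuples. The explicit_guards helper is hypothetical and just renders the implied sorts as the instance tests they stand for.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (relation, argument position) -> required class, as declared with `domain`.
DOMAINS: Dict[Tuple[str, int], str] = {
    ("instrument", 1): "Process",
    ("instrument", 2): "Object",
}

def infer_sorts(literals: Iterable[Tuple[str, ...]]) -> Dict[str, Set[str]]:
    """Collect the classes each variable must belong to, given the relations it fills."""
    sorts: Dict[str, Set[str]] = defaultdict(set)
    for relation, *args in literals:
        for position, arg in enumerate(args, start=1):
            required = DOMAINS.get((relation, position))
            if arg.startswith("?") and required is not None:
                sorts[arg].add(required)
    return dict(sorts)

def explicit_guards(sorts: Dict[str, Set[str]]) -> str:
    """Render the implied sorts as the explicit (instance ...) tests they stand for."""
    tests = [f"(instance {var} {cls})" for var in sorted(sorts) for cls in sorted(sorts[var])]
    return "(and " + " ".join(tests) + ")"

sorts = infer_sorts([("instrument", "?Y", "?X")])
print(sorts)                   # {'?Y': {'Process'}, '?X': {'Object'}}
print(explicit_guards(sorts))  # (and (instance ?X Object) (instance ?Y Process))
```

The recovered guards are the same instance tests as in the explicitly typed version, which is why the shorter formula loses no information.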

How does SUMO employ higher order logic?

This is a hard issue: higher order logic is very difficult to reason with efficiently, but very hard to do without from a representational standpoint. Higher order expressions are used in SUMO when necessary. A practical reasoning system may also wish to define concepts that effectively incorporate modal or temporal parameters into domain-specific predicates. The tradeoff, though, is that such predicates will be less reusable. It is a delicate balance that must be maintained. Specifically, SUMO includes a number of predicates that take formulae as arguments, such as holdsDuring, believes, and KappaFn. In Sigma, we also perform a number of "tricks" that allow the user to state things that appear to be higher order, but which are in fact first order and have a simple syntactic transformation to standard first order form. We also support the TPTP THF language, so that we can do true higher order reasoning with LEO-II and other HOL provers.
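One transformation of the kind described, sketched here under assumptions rather than taken from Sigma's code, is instantiating a variable in predicate position: an axiom that quantifies over relations is replaced by one first-order copy per relation in the knowledge base. The template axiom, the ?REL variable and the relation list below are illustrative.

```python
import re
from typing import Iterable, List

def instantiate_predicate_variable(template: str, variable: str,
                                   relations: Iterable[str]) -> List[str]:
    """Return one copy of `template` per relation, with `variable` replaced.

    After substitution no variable remains in predicate position, so each
    copy is an ordinary first-order formula.
    """
    # Match the variable only as a whole token (so ?REL does not match ?RELATION).
    pattern = re.compile(re.escape(variable) + r"(?![\w-])")
    return [pattern.sub(name, template) for name in relations]

# Illustrative axiom: if a relation is symmetric, its arguments can be swapped.
TEMPLATE = ("(=> (instance ?REL SymmetricRelation) "
            "(forall (?X ?Y) (=> (?REL ?X ?Y) (?REL ?Y ?X))))")

formulas = instantiate_predicate_variable(TEMPLATE, "?REL",
                                          ["overlapsSpatially", "connected"])
for formula in formulas:
    print(formula)
```

The cost of this design is size: the number of generated axioms grows with the number of relations, which is one reason such expansion is applied selectively rather than as a general substitute for higher order reasoning.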
