
OpenCyc Calculator

In his book "Integration-Ready Architecture and Design", Jeff Zhuk states that today's software engineering practices suffer from one serious drawback: common algorithmic knowledge is not reused.

For instance, every new accounting application is written completely from scratch, even though:

  1. the underlying algorithms, i.e., the principles of double-entry accounting, have not changed for several hundred years, and
  2. these algorithms are already incorporated in dozens of other accounting applications.

Jeff Zhuk proposes to solve this problem by using knowledge technologies. As I understand it, the algorithms used by an application should be extracted and stored in a database. When an application needs those algorithms, it connects to that database and uses them. In this way, different applications written in different programming languages can reuse the same algorithms.

The idea seems promising to me, and I think it is worth trying on a simple application. In this article, I will present my thoughts about how algorithms can be extracted from the simplest application I can think of (apart from "Hello, world!"): a calculator. I will take an existing calculator application, JCalculator, "cut out" its algorithmic parts, and put them into an OpenCyc knowledge base.

Calculator application before change

Below is a screenshot of the window of JCalculator.

JCalculator Screenshot

The source code of the application consists of a single file, JCalc.java.

Migration strategy

Our strategy in "cutting out" algorithmic parts will be:

  1. Find all algorithmic parts in the original JCalculator source code.
  2. Refactor the source code of JCalculator so that all algorithmic parts are encapsulated in methods.
  3. Replace Java-based implementations of algorithmic parts with OpenCyc-based implementations.

1: Find all algorithmic parts in the original JCalculator source code

We will work faster if we first decide which algorithmic parts we are looking for. In our case, the algorithms are just arithmetic operations:

  1. addition (+)
  2. subtraction (-)
  3. multiplication (*)
  4. division (/)
  5. square (x^2)
  6. exponentiation (x^y)
  7. factorial (!)
  8. modulo (%)

2: Refactor the source code so that all algorithmic parts are encapsulated in methods

We move all arithmetic operations into the class AlgorithmicsSingleton and replace every direct use of an arithmetic operator in JCalc.java with a call to the corresponding method of that class.
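To illustrate what step 2 might produce before any OpenCyc code is involved, here is a minimal sketch of a purely Java-based AlgorithmicsSingleton. These are the Java implementations that step 3 then swaps for OpenCyc-based ones. Only the class name and the method name exp appear in the article; the other method names (add, subtract, multiply, divide, square, modulo, factorial) and the eager-singleton layout are assumptions made for illustration.

```java
// Hypothetical pure-Java version of AlgorithmicsSingleton (step 2).
// Every arithmetic operation used by the calculator UI lives here, so
// step 3 can swap the implementations without touching JCalc.java.
final class AlgorithmicsSingleton {
    // Eagerly created single instance.
    private static final AlgorithmicsSingleton INSTANCE = new AlgorithmicsSingleton();

    private AlgorithmicsSingleton() {}

    static AlgorithmicsSingleton getInstance() {
        return INSTANCE;
    }

    double add(double a, double b)      { return a + b; }
    double subtract(double a, double b) { return a - b; }
    double multiply(double a, double b) { return a * b; }
    double divide(double a, double b)   { return a / b; }
    double square(double x)             { return x * x; }
    double exp(double a, double b)      { return Math.pow(a, b); }

    // Java's % keeps the sign of the dividend, e.g. -7 % 3 == -1.
    double modulo(double a, double b)   { return a % b; }

    // Factorial of n; any n <= 1 yields 1, like the recursive SubL
    // definition used for the OpenCyc version.
    double factorial(int n) {
        double result = 1;
        for (int i = 2; i <= n; i++) {
            result *= i;
        }
        return result;
    }
}
```

Because JCalc.java only talks to these methods, the bodies can later be replaced by OpenCyc calls without any change to the user interface code.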

3: Replace Java-based implementations of algorithmic parts with OpenCyc-based implementations

This step is the most interesting. I want to show you several ways of doing things in OpenCyc. Therefore, the solutions presented here may be sub-optimal in real life. They are designed as examples for learning, not as production code. In the following sections, you will learn how to:

  1. Use built-in SubL functions
  2. Create new SubL functions and use them
  3. Execute CycL queries

If you prefer reading source code to reading natural language texts, you may look at the file AlgorithmicsSingleton.java.

Performing simple arithmetical calculations in OpenCyc

First, we need to know how to perform arithmetic calculations with OpenCyc. OpenCyc includes a language called SubL, which can be used to perform simple arithmetic operations.

Perhaps the best way to show you how to work with SubL is through practical examples. So, if you can, perform the steps described below on your machine. Note that while preparing this paper I used OpenCyc 0.7.0b for Windows.

  1. Launch OpenCyc by executing the file opencyc-0.7.0/scripts/win/run-cyc.bat.
  2. Wait until the CYC(1) prompt appears.
  3. Enter SubL expressions and evaluate them.

In the following table, the Example column shows the expressions to enter at the CYC prompt to execute each operation.

Operation        SubL expression   Example
Addition         (+ a b)           (+ 2 3)
Subtraction      (- a b)           (- 2 3)
Multiplication   (* a b)           (* 2 3)
Division         (/ a b)           (/ 6 3)
Modulo           (MOD a b)         (MOD 6 3)

To run such computations on OpenCyc from Java, we use the CycAccess class, which provides access to OpenCyc's functionality. The following code fragment (part of the AlgorithmicsSingleton.java file) performs an arithmetic operation using OpenCyc.

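// Evaluates a binary SubL expression such as (+ 2.0 3.0) on the OpenCyc
// server. op is the SubL operator symbol: +, -, *, / or MOD.
private double calculateWithCyc(String op, double number1, double number2)
{
    try
    {
        // Build the expression in prefix notation, e.g. "(+ 2.0 3.0)",
        // and let the server evaluate it.
        Number result = (Number) this.cycAccess.converseObject(
            "(" + op + " " + number1 + " " + number2 + ")");
        return result.doubleValue();
    }
    catch (CycApiException exception)
    {
        // The server could not evaluate the expression.
        this.logger.error("SubL evaluation failed", exception);
    }
    catch (UnknownHostException exception)
    {
        // The OpenCyc server host could not be resolved.
        this.logger.error("OpenCyc host not found", exception);
    }
    catch (IOException exception)
    {
        // Communication with the OpenCyc server failed.
        this.logger.error("Communication with OpenCyc failed", exception);
    }
    // Fallback value when the evaluation fails.
    return 0.;
}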

The really important line is the following:

result=(Number)this.cycAccess.converseObject("(" + op + " " + 
number1 + " " + number2 + ")");

For clarity, let's reformulate this line into:

result=(Number)this.cycAccess.converseObject(subLExpression);

subLExpression is the SubL expression, which follows the same syntax as our examples above. If, for instance, you want to calculate the sum of 2 and 3, use the following call:

result=(Number)this.cycAccess.converseObject("(+ 2 3)");

The converseObject method returns a java.lang.Object; for arithmetic expressions, that object is a java.lang.Number, hence the cast. Using SubL expressions, we can implement most of our "algorithms": addition, subtraction, multiplication, division, square (as (* x x)), and modulo.
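One way to keep this code testable without a running OpenCyc server is to hide converseObject behind a small interface. In the sketch below, SubLEvaluator and CycCalculator are hypothetical names, not part of the OpenCyc API; a real implementation of SubLEvaluator would forward to cycAccess.converseObject(...) and wrap its checked exceptions. Square is expressed as (* x x), since the table above has no dedicated squaring operator.

```java
// Hypothetical abstraction over CycAccess.converseObject(String):
// evaluates one SubL expression and returns its value.
interface SubLEvaluator {
    Object evaluate(String subLExpression);
}

// Hypothetical calculator that only builds SubL expressions
// and delegates their evaluation.
final class CycCalculator {
    private final SubLEvaluator evaluator;

    CycCalculator(SubLEvaluator evaluator) {
        this.evaluator = evaluator;
    }

    // Writes integral values without a trailing ".0", e.g. 2.0 -> "2".
    static String literal(double d) {
        if (d == Math.rint(d) && Math.abs(d) < 1e15) {
            return Long.toString((long) d);
        }
        return Double.toString(d);
    }

    // Builds a prefix-notation call such as "(+ 2 3)".
    static String binary(String op, double a, double b) {
        return "(" + op + " " + literal(a) + " " + literal(b) + ")";
    }

    double calculate(String op, double a, double b) {
        Object value = evaluator.evaluate(binary(op, a, b));
        return ((Number) value).doubleValue();
    }

    double add(double a, double b)    { return calculate("+", a, b); }
    double modulo(double a, double b) { return calculate("MOD", a, b); }

    // x^2 is sent as (* x x).
    double square(double x)           { return calculate("*", x, x); }
}
```

In a unit test, a lambda can stand in for the server, record the expressions it receives, and return a fixed number.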

Implementing factorial and exponentiation is a bit more complex, requiring programming in SubL and CycL. We discuss this in the following sections.

Programming with SubL

The SubL language is based on LISP and, judging by my first impression, enables the programmer to implement routines of any complexity. Let me give some brief background for readers who are not familiar with the declarative style of programming. In the late 1950s, at the dawn of the information age, two languages were born: FORTRAN and LISP. FORTRAN was the flagship of imperative programming. In imperative programming languages, the programmer tells the machine which instructions to execute and in what order. The order of instructions matters, and incorrect ordering is a frequent cause of errors in imperative programs. FORTRAN was followed by a large family of imperative languages, including C, C++, C#, and Java.

LISP was the flagship of declarative, or functional, programming languages. In these languages, programs resemble mathematical models (collections of formulae): they describe the final result of the calculation and define all the functions (in the mathematical sense) needed to compute it, rather than prescribing a sequence of steps. In a purely functional style, the order in which independent expressions are evaluated does not affect the result. Many programming tasks can be solved in declarative languages with less code than in imperative ones. Later declarative languages such as PROLOG and Haskell built on ideas pioneered by LISP.

Let's return from distant history to our current task: we need to define the factorial function. In SubL, this definition looks like this:

(DEFINE FACTORIAL (x)
  (PIF (> x 1)
       (* x (FACTORIAL (- x 1)))
       1))

This code fragment defines the function FACTORIAL, which takes one argument, x. The function checks whether x is greater than one: (> x 1). If this condition is true, FACTORIAL returns x times the factorial of x-1, which SubL's prefix notation writes as (* x (FACTORIAL (- x 1))); in infix notation this would be x * FACTORIAL(x - 1). If x is less than or equal to one, FACTORIAL returns 1.
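To check the recursion independently of OpenCyc, the function can be mirrored in Java. The sketch below also keeps the SubL source in a Java string constant; the assumption is that such a definition is sent to the server once (for example through converseObject), after which an expression like (FACTORIAL 5) is evaluated like any built-in one. The class SubLFactorial and its members are hypothetical, and whether AlgorithmicsSingleton is organized this way is not stated in the article.

```java
// Hypothetical helper around the SubL FACTORIAL function.
final class SubLFactorial {
    // SubL source of the function, to be sent to the server once
    // before FACTORIAL is called.
    static final String DEFINITION =
            "(DEFINE FACTORIAL (x) (PIF (> x 1) (* x (FACTORIAL (- x 1))) 1))";

    private SubLFactorial() {}

    // Builds the SubL call for a given argument, e.g. "(FACTORIAL 5)".
    static String call(int n) {
        return "(FACTORIAL " + n + ")";
    }

    // Java mirror of the SubL definition: same test, same recursion.
    // long overflows for x > 20.
    static long factorial(long x) {
        return x > 1 ? x * factorial(x - 1) : 1;
    }
}
```

Comparing the mirror's output with the server's answer for a few arguments is a cheap way to confirm that the definition was loaded correctly.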

Calling CycL functions from Java

One task remains: implementing exponentiation in OpenCyc. The OpenCyc knowledge base contains a predefined function, #$ExponentFn. All we have to do is invoke this function from Java and fetch its result. To do this, we have to:

  1. Formulate the query in the OpenCyc query language CycL.
  2. Execute this query in Java.

The CycL query for exponentiation is (#$evaluate ?SUM (#$ExponentFn a b)). This query binds the variable ?SUM to a^b (the variable name itself is arbitrary), so the first step is done.

The second step is incorporated in the following Java method:

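// Computes number1 raised to the power number2 by asking OpenCyc
// to evaluate #$ExponentFn.
public double exp(double number1, double number2) {
    try {
        // CycL query: bind ?SUM to the value of (#$ExponentFn number1 number2).
        CycList query = this.cycAccess.makeCycList("(#$evaluate ?SUM "
                + "(#$ExponentFn " + number1 + " " + number2 + "))");
        CycVariable sumVariable = CycObjectFactory.makeCycVariable("?SUM");

        // Ask the query in the UniversalVocabularyMt microtheory.
        CycConstant microTheory =
                this.cycAccess.getConstantByName("UniversalVocabularyMt");
        CycList response =
                this.cycAccess.askWithVariable(query, sumVariable, microTheory);

        // response holds the bindings found for ?SUM; use the first one.
        if (!response.isEmpty()) {
            Number result = (Number) response.iterator().next();
            // doubleValue() keeps fractional results such as 2^0.5.
            return result.doubleValue();
        }
        this.logger.error("OpenCyc returned no binding for ?SUM");
    } catch (CycApiException exception) {
        this.logger.error("CycL query failed", exception);
    } catch (UnknownHostException exception) {
        this.logger.error("OpenCyc host not found", exception);
    } catch (IOException exception) {
        this.logger.error("Communication with OpenCyc failed", exception);
    }

    // Fallback value when the query fails.
    return 0.;
}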

This method simply executes the query described above and returns the result as a double value. For more details, I recommend reading the OpenCyc documentation. But if you really want to learn how to work with OpenCyc, reading the docs alone won't help much. Instead, I recommend studying the following files:

Personally, I learned more from reading source code than from reading docs.
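Reading the answer of an OpenCyc query needs some care: if no binding is found, calling iterator().next() on the empty response throws NoSuchElementException. The sketch below isolates the two pure steps of the exponentiation method, building the CycL query text and reading the first numeric binding, so they can be tested without a server. CycLQueries and its methods are hypothetical names, and the response is modelled as a plain java.util.List rather than OpenCyc's CycList.

```java
import java.util.List;
import java.util.OptionalDouble;

// Hypothetical helpers for the exponentiation query.
final class CycLQueries {
    private CycLQueries() {}

    // CycL query that binds ?SUM to base^exponent, e.g.
    // "(#$evaluate ?SUM (#$ExponentFn 2.0 10.0))".
    static String exponentQuery(double base, double exponent) {
        return "(#$evaluate ?SUM (#$ExponentFn " + base + " " + exponent + "))";
    }

    // Returns the first binding as a double, or an empty result if
    // there is no binding or the first binding is not a number.
    static OptionalDouble firstNumber(List<?> bindings) {
        if (bindings.isEmpty() || !(bindings.get(0) instanceof Number)) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(((Number) bindings.get(0)).doubleValue());
    }
}
```

Returning an OptionalDouble instead of a magic 0 lets the caller tell "the result is zero" apart from "the query produced no answer".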

Final words

So now we have attained our goal. We took an existing application, extracted its algorithms, and put them into the OpenCyc knowledge base. It's time to think about what to do next. Knowledge-driven software architecture, as proposed by Jeff Zhuk and demonstrated in this example, is a rather new technology. As with everything new, it seems very promising. Papers on recent software engineering trends read like thrillers: there is a constant feeling that something very big, even revolutionary, is crystallizing in the minds of their authors. At the same time, these feelings are very abstract. I often have only a vague idea of how to apply those revolutionary ideas.

In my opinion, the time has come to reason about knowledge-driven architectures by means of practical examples. One line of code tells more than a dozen natural language words (except, perhaps, if you are coding in COBOL). What is needed most right now are simple examples that demonstrate what one can do with OpenCyc. This paper is an answer to the question "What would a calculator application look like if it were implemented with OpenCyc?".

There are many other questions which require answers in the form of code examples:

  • Can we create a calculator application in another programming language (for instance, C#) which uses the same OpenCyc database?
  • Can we extract not only algorithms, but also other properties which are common to all calculator applications, e.g., the fact that all calculator applications have number buttons? Does it make sense?
  • Are there better ways to extract algorithms from the calculator application than those shown here?
  • Can we develop OpenCyc-based foundations for other frequent types of applications (address book, email client, accounting program, application for editing a relational database)?
  • Should we create a catalog of standard applications whose common properties are stored in an OpenCyc database and only platform- and language-specific things are programmed outside OpenCyc?
  • Can we, in this way, save programming efforts of future generations?
  • Can we use OpenCyc to embed functional routines (written in SubL) in programs written in imperative languages? In other words, can we use OpenCyc as a substitute for "inline Haskell" or "inline LISP"?
  • When programming in OpenCyc, when should we use SubL, and when CycL?
  • Can we solve the problem of automatic translation by storing the statements of a text in OpenCyc and then using its paraphrasing functionality to generate natural language texts in different languages? Could you write an OpenCyc text once and have its English, Russian, and German translations generated automatically? Would this be more efficient than current approaches?
  • Can we use OpenCyc for efficient writing of user interface test cases?
  • What is the next most sensible step in exploring OpenCyc capabilities for practical work?

You probably have questions of your own after reading this article. It would be great if you shared your thoughts with other people interested in improving current software engineering practices (including me).

Appendix: Related materials

Here is a list of materials related to this article:

Recent comments

16 Oct 2005 12:01, bstarynk

Re: Programming is not about algorithms


quoting Mike:
Trying to replace programmers with a machine again?


But the goal of partly automating programming tasks has been successfully pursued for about 50 years. In the 1960s, compilers like FOR[mula] TRAN[slation] had exactly this goal (avoiding human assembler programming by having the assembler code generated by a program, which is now called a compiler; today human coders don't code in assembler anymore...). Likewise, a compiler made in France in 1959 was called Programmation Automatique des Formules (automatic formula programming), because it translated a Basic-like language into machine code (on a drum computer, the CAB500).


So my point is that software engineering, programming language design, and compiler implementation are all about automating or assisting the task of programming computers (remember, computers only understand machine code).


Some knowledge based systems may help in this task. There are lots of code generators today, and many areas have domain specific higher-level languages.


Overall, I believe we agree. I don't claim that any fully automatic software will do all the human work in the foreseeable future. But the half-century trend has been toward assisting human programmers in working faster and better, and I do feel that some progress has been made since the 1950s (we human programmers are not working the same way and with the same tools as 50 years ago).


BTW, the PAF language was designed and implemented by my late father in 1959 (the year I was born), and I know for sure that I am not programming the same way my father did. He coded in machine code, mostly with pencil and paper; I'm coding in OCaml or C, mostly with Emacs (on a PC/Linux computer costing 50 times less, running 100 000 times faster, and with 1 000 times more memory than in 1958).

So we both agree. We might disagree on what is called the strong A.I. hypothesis, but this is not very relevant here.


Respectful regards.

16 Oct 2005 09:03, msharov

Programming is not about algorithms


> Also, most of the programming effort is not in designing new algorithms, but in choosing and mixing (i.e. combining) existing algorithms. Here, a reflective

Spoken like a true academic! Or, perhaps, a pointy-haired boss. "Algorithm" is in itself an academic word, referring to canned procedures like sorting, searching, compression, etc. All of them form a truly minuscule portion of any software project, and are usually just copied from some textbook when required. I have a few myself, from my ancient college CS texts to more modern compendiums like "Numerical Recipes in C". It takes maybe an hour to type one in. The bulk of the code, however, is usually about the UI, "business logic", and sundry "glue" code. Choosing algorithms is for portions of the program that need to be fast. That used to be graphics, when software rendering was the rule and you had to implement textured polygon rendering in nine cycles per texel or die. With modern GPUs and DirectX, nobody does that anymore (except on Linux, where only OpenGL on X is allowed to use the GPU). It also applies to search, if you are Google, or have a billion-record database, both of which usually require a task-specific solution.

In the rest of the code, cleanliness and maintainability are far more important than what algorithm you are using. When you're sorting a 100-element array, bubble sort will do just as well as heapsort, but can be coded in a dozen lines (yeah, STL has the sort algorithm, but I'm trying to give an example here :) which any idiot can understand.

When you take someone else's code, it is almost never possible to just drop it in and have it work. Chances are, it uses different argument types, like QString instead of std::string, or requires malloc-allocated memory (*gasp*). Chances are it is written in C and has some horrible garbage hung on it that every C program can't live without, like running the algorithm on a file (zlib, graphics format libs, etc.) or reporting errors via callbacks (libjpeg). When you put something like that into your nice and clean C++ design, it looks like an open sore begging to be wrapped.

So you spend some time on an object wrapper, with the intention of hiding all the unpalatable stuff in one bloated object. Then you spend time debugging the wrapper, since all those translations always go wrong somewhere, exception handling doesn't go through a C call stack (requiring a state machine modified from the callback [see Xlib, the vilest and most hostile UI library in existence]), and thread safety is just not there. And then you stop, throw up your hands in disgust, and rewrite the whole bloody thing from scratch in a day or so. The next time you won't even bother with the wrapper. It's just not worth it.

This was about the algorithms. The rest of the code is usually not reusable. The UI portions are specific to the UI package you are using. Anyone writing a similar program would probably be doing it for a different UI (or else, why not just use your program?). Even if they use the same UI, their programming environment may differ markedly from yours. For example, I use STL for containers, while another programmer may be stuck with Qt, out of preference or because of legacy code. He'd have to wrap or modify to fit. But even that is an ideal scenario. One program's UI often has no relation to another program's UI at all. The structure may be completely different, preventing any code reuse at all. Finally, there's the "business logic" code, which is seldom logical, depends entirely on the customer's whim, and is completely useless to anyone but him.

> But generating code by knowledge system is in my opinion a very interesting and fruitful idea (but it is hard

Trying to replace programmers with a machine again? :) Not gonna work. The problem domain is just too wide. You can write specialty code generators for simple, well-defined tasks. Visual Studio has a whole bunch of Wizards like that. But for general purpose programming, you can forget it. Programming is not about implementing algorithms, contrary to what ivory tower academics think; it's about figuring out what the user wants and about designing components with good interfaces. It's an art, not a science, since it is about people as much as about computers. That's why it requires a human mind, human thought, and human experience. Yes, it may be possible to create an AI programmer some day, but it would not be some mindless "database". It would have to be a true sentient being just like us.

15 Oct 2005 13:10, bstarynk

Re: A fine example of what not to do
See also J. Pitrat's paper Speaking about oneself (.doc) (http://www-poleia.lip6.fr/~pitrat/Speaking_about_oneself.doc) and his paper Implementation of a Reflexive System (http://www.sciencedirect.com/science?_ob=IssueURL&_tockey=%23TOC%235638%231996%23999879997%23101983%23FLP%23&_auth=y&view=c&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=145756973eb7da12f5d48c2858b30968) (not freely available online; Future Generation Computer Systems 12, pp. 235-242, 1996).

15 Oct 2005 12:47, bstarynk

Re: A fine example of what not to do

Mike wrote: This article provides the perfect illustration why algorithms are seldom reused. The author takes simple algorithms like a+b which can literally be implemented in three characters and blows them up into an object-oriented API a few hundred lines long. ....

I understand Mike's position (and I even could share a bit of it) but the whole idea of the original article by Dmitri Pissarenko is not to generate human readable (or re-usable) code (even in Java) but runnable code.


A Java compiler already generates lots of complex code. If you define in Java a method on your subclass of Integer-s which adds them, the generated machine code will be a hundred times longer than the assembler code produced by a seasoned human assembly-language programmer performing an addition (probably a few assembler instructions).


The good insight in Dmitri Pissarenko's article is to use knowledge systems to generate programs. This has already been tried (in particular, Jacques Pitrat has been working on this theme for more than a dozen years with his Maciste system). Unfortunately, his system is not "open source" and is not well documented (but he has published some interesting papers on it).


I do agree with Mike that Dmitri's example is not a particularly good one (a calculator is quite simple, and the issues involved are not algorithmic but of software design, and highly related to graphical interfaces, not to additions!); however, giving good examples for this requires a lot of work, and the resulting article would be much too long to be accepted on Slashdot (or even elsewhere).


But generating code by knowledge system is in my opinion a very interesting and fruitful idea (but it is hard to implement in practice, because only big examples are credible).


Also, most of the programming effort is not in designing new algorithms, but in choosing and mixing (i.e. combining) existing algorithms. Here, a reflective based [meta-] knowledge system would help a lot, and such a system would generate code.


I could actually write a lot more on this, but I don't have the time or the incentive; however, you might look for Jacques Pitrat's papers and books, and also look into the Tunes (http://tunes.org/) site (which contains a lot of interesting, if old, material, but no real working code).

15 Oct 2005 09:14, bsavoie

Basic Research vs Applied Science
Thank you for your deeply thought-out article. Yes, it is too early yet to see practical applications passing Occam's test, but failing to break out of boxes is what prevents us from progressing. Think about how many light bulbs Edison tried before he found the carbon filament! We are simply too lazy to ask big questions. If you keep working in the direction of your intuition, you may invent the next computer language. Those people who learn to think outside the box unfortunately have to put up with lots of negative energy. That is the main reason why leadership is so hard. It is also the satisfaction. Keep up the good work.
