
The Importance of Bug Testing

Luke Andrews writes: "The following whitepaper discusses the importance of bug testing with respect to client and vendor environments. Various responsibilities are placed on either side of product development, and it is necessary to understand the reasons behind practicing secure coding and ethical loyalty."

Why bug test?

Why people must test programs is often debated. If everyone were ethical, we would not have to test for security risks, but the world is not entirely full of ethical people who ensure that correct data is entered into a system -- that is why safe practices need to be developed, and the only way to develop them is through bug testing. There are two perspectives to consider, the client's and the programmer's, as each has different needs and wants.

From the client's perspective, a stable program that is guaranteed to perform its desired task reflects not only on the program, but also on the company itself. Poor products reflect poorly on the company, so a solid, well-tested product needs to be ensured through bug testing before manufacturing takes place.

Management doesn't always know about product flaws; company directors assume that every function works smoothly without any defects. However, experience shows that no product/system can be deemed completely secure without controversy. There will always be bugs in a program; whether they are found or not is another question. Open Source makes it much easier to spot bugs and code flaws, and active security checks by the public help create a much more stable and operable program. This is one of the reasons why Microsoft products fail consistently when it comes to testing; their products are not Open Source, and therefore it is much harder to create a secure and flexible program without the aid of the programming community to help optimize code.

To the client, the purchaser of the software, the program's reliability is without doubt key to performing daily tasks successfully. If the program is vulnerable to overflows, lacks input checks, or even lacks encryption, it will quickly become known for its instability, and product sales will drop dramatically. Customers will purchase alternative products that perform the same task and that have been carefully checked by multiple tests, as will be seen in the testing section of this document.

There is a high level of ethics involved when a programmer is contracted to develop a program. The programmer sits at the top of the chain of importance in testing and coding a proficient software application. He/she is responsible for ensuring that all functions of the program work, and work efficiently; code optimization should be at its peak, with security functions in check. Better programs are known to have been thoroughly tested, with all sorts of data sets being properly dealt with. Operating systems like Linux are tested every day by programmers and crackers alike. Yes, security problems do exist in this environment, but most are patched or fixed, making Linux one of the most stable systems currently around.

Sloppy programmers will not care about ethics, and will simply code the program to minimally function with all its client side requirements implemented. Some programmers deem financial security more important than ethical security -- be careful whom you contract to fulfill your programming requirements.

Development Goals

Goals should be adopted by programmers to ensure software quality assurance, but the customer has a responsibility to communicate to the programmer once a bug has been found.

The vendor's goals

The primary goal of a programmer is to complete a working program that serves its purpose according to client-side requirements. Once this stage has been reached, the more advanced and less well-known methods should then be put into practice, adding functionality such as:

  • Security features
  • Help support
  • Contact addresses

Adding security features is a must, and makes code quality evident within a program. Secure functions and implementations should make themselves known at this stage; this is where the gap between sloppy and aware programmers becomes apparent. All programs should aim for a level of code quality by utilizing the secure function calls of their specific programming languages, which helps create a more reliable and flexible program. Of course, one of the only certain ways to determine a program's reliability is through testing. Testing focuses on the need for rapid feedback and the evolving nature of the program under test. This is where clients/customers come into the picture.

The public's goal as bug testers

Although programmers bear the most responsibility for code reliability, clients and customers also need to be prepared to communicate with software engineers if a bug or flaw is observed in a program. If the observed output differs from the expected output, it's time to get in contact by means of a bug discussion list, email, phone -- whatever, but be sure to advise the correct people. It is important to inform product vendors about a bug before the public knows about it, especially if the bug could lead to increased privileges. This gives the vendors time to write patches/advisories for their clients before any damage can be done.

Testing software is always a step in the right direction. Effective bug testing by customers/clients will force the programmer to improve code quality and security in future products; that's why we must support and thank the software task forces out there that make software vulnerabilities public, such as BUGTRAQ.

When reporting a bug, always be sure you can reproduce it, and always include a detailed description of exactly how the bug was found and the type of system that you tested the software on. The more information the better, but be sure not to obfuscate the description -- get as many of the basic facts down as possible. In particular, segmentation faults generally cause core dumps (a memory image of the terminated process when any of a variety of errors occur) which hold vast amounts of information to help the programmer locate where the bug took place. Remember, full disclosure is bliss.

Software Testing Strategies

Developing a program or system effectively requires thorough thought before any raw code is actually written down. One of the most important methods of establishing functional requirements is a storyboard. Prototypes may consist of a storyboard: a sequence of screens showing the end-user a typical scenario of using the program/system.

Functional prototypes

This is one of the most useful methods for making sure the programmer understands just what a program is intended to do. A functional prototype is a very limited version of the final program which gives some idea of the appearance of the final product, but with a lot of functions missing. Displaying a simple storyboard to a client or bug tester is necessary, as they will be able to comment on whether the expected input produces the expected output when the program is run. This will also force the programmer to think through many of the details of what the program is meant to do.

Test sets

Creating workable and effective sets of tests is intellectually challenging. Testing can almost never be exhaustive, and some programming flaws may remain unevaluated even after very stringent testing. In the real "commercial" world, a significant source of program defects is people running tests and not checking the results carefully; the programmers run the tests, but do not take enough care in reviewing the results to see that the tests revealed unexpected flaws in the programs.

Tests must be convincing, and must demonstrate a successful performance of the program. In a commercial setting, there are many methodologies used to produce a set of tests. One of the necessary tests that should be first evaluated covers the main function of the program. The programmer must decide on a set of tests that enable him/her to see if the code achieves its desired outcome.

All conditions of the program need to be thoroughly checked, including:

  • case, loop, and if/then/else structures
  • boundary conditions (e.g., with "IF $i<100 THEN ...", make sure that 99, 100, and 101 values for $i are properly dealt with)
  • exercises of all parts of the code (by designing a rigorous set of tests)
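As a sketch of the boundary-condition bullet above, a hypothetical rule (the routine name is invented for illustration) can be probed at 99, 100, and 101 -- one value inside the boundary, the boundary itself, and one value outside:

```c
#include <assert.h>

/* Hypothetical routine under test: some behavior applies only
 * while the value is below 100 (the "IF $i < 100" rule above). */
int below_limit(int i)
{
    return i < 100;
}
```

A boundary test must exercise both sides of the "wall": 99 should satisfy the condition, while 100 and 101 should not.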

Naturally, sets of tests will assess the same parts of the program repeatedly. This may seem like duplication, but grouping inputs that exercise the same code path -- known as "equivalence partitioning" -- is standard, economical testing. Perhaps part of the code works in one scenario but not another; this needs to be carefully checked. The first thing a programmer needs to understand is that testing can demonstrate the presence of bugs, but never their absence. Semantic errors fall into this category -- that is, errors in the logic of the program, which the compiler or interpreter is unable to help you with.

Testing falls into two broad categories:

  • Defect testing
  • Acceptance testing

Defect Testing

This type of test tries to detect all the defects the program may have. All parts of the program should be tested, and if the programmer feels that one part of the code may not properly deal with unexpected input, more rigorous tests should be performed on that area of the code. One key point to remember in this is that nobody knows a program better than the programmer himself. The programmer will know the area of the program that is most likely defective, so a designed set of tests on that area should be practiced before a Beta release is produced.

Regression testing stems from defect testing, and is the process of testing changes within the programming environment to make sure the older program still works once the new changes are implemented. Regression testing is a normal part of the program development process and, in the commercial world, is performed by code testing specialists. Test department coders develop test scenarios and exercises that will test new units of code after they have been written. These test cases form what becomes the test bucket. Before a new version of a software product is released, the old test cases are run against the new version to make sure all the old capabilities still work. The reason they might not work is that changing or adding new code can easily introduce errors into code that was not supposed to be changed. Recursive regression testing is a must!
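The test bucket described above might be sketched like this in C; the test cases and all names here are hypothetical, invented purely for illustration:

```c
#include <stdio.h>

/* Hypothetical "test bucket": every old test case is kept and
 * rerun against each new release of the code under test. */
typedef int (*test_fn)(void);

static int test_old_feature(void) { return 1 + 1 == 2; } /* from version 1 */
static int test_new_feature(void) { return 2 * 2 == 4; } /* added in version 2 */

/* Runs the whole bucket; returns the number of failing cases. */
int run_bucket(void)
{
    test_fn bucket[] = { test_old_feature, test_new_feature };
    int failures = 0;
    for (unsigned i = 0; i < sizeof bucket / sizeof bucket[0]; i++)
        if (!bucket[i]())
            failures++;
    return failures;
}
```

The point of the design is that old cases are never removed: a new release ships only when the entire bucket, old and new, reports zero failures.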

Acceptance Testing

Acceptance testing is done in conjunction with defect testing, and runs an agreed set of tests with an agreed output. These should demonstrate that the code does an agreed task well enough for the programmer and client to be satisfied. In the commercial world, acceptance tests are part of the contract, defining what the customer insists on before money ever changes hands.

Structural Prototyping

Prototyping of this nature is relatively simple. Structural prototyping is a stripped-down version of a program that shows the structure, in skeleton form, of the complete version. All major aspects of the code are written, but routines and subprograms are written only as stubs: comments or placeholder statements within the program that show the programmer that the actual routine has been called or executed.

Maintaining effective code that is easily interpreted by the programmer and other developers (and allows further extensions to be added easily) requires three code characteristics:

  • Understandability
  • Adaptability
  • Cohesion

Understandability means that programs that are easier to understand are considered better designed than ones that do the same task but are harder to understand. A key to developing stable code is a good functional prototype that allows the general idea of the program to be observed before code writing takes place. It is also worth noting that better code is clear and neatly presented, spaced out where necessary, with comments to let the reader understand what is going on.

Adaptability refers to how easy it is to modify areas of the code to perform alternate tasks. This is directly linked to understandability. The more understandable the code, the easier it is to adapt.

Cohesion refers to a routine or sub program that does one clear task which is apparent to the reader and programmer. A well-defined task should give a clear indication of what the program is intended to do; this includes well-chosen names for variables, constants, headers, etc. As small as this concept may seem, it allows any coder to pick up the source and quickly scan through and understand what the program is about.

Signs to observe

Whether you are checking the source for bugs or testing the executable for flaws, all of the above tests need to be considered and exercised. Bugs most commonly present themselves in boundary conditions. When designing a set of tests, it cannot be stressed enough that boundaries need to be checked on both sides of their "walls". Other flaws that should be checked before releasing a beta include format string bugs, such as user input passed directly as a "%s" format control. The programmer must employ capable input routines to deal correctly with user-supplied input, ensuring that all possible scenarios have been considered before adopting the most suitable code to perform the given command. This includes avoiding getenv(), strcpy(), and sprintf() wherever possible in exchange for more secure alternatives like strncpy() or snprintf(); the "n" refers to the number of bytes allowed to be copied to a buffer. Avoid the common mistake, often made by sloppy programmers, of blindly trusting user-supplied environment variables from the terminal or environment. Establish your own method of setting or checking the environment, and make it insusceptible to malformed data that could lead to unexpected outcomes such as spawning a shell -- a definite security risk, and one that is often observed in many UNIX environments. (Early ZGV [a console graphics viewer] releases were always victim to getenv("HOME") problems.)
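A minimal sketch of the safe-copy idiom mentioned above; the copy_hostname() helper is hypothetical, invented here only to contrast the bounded and unbounded calls:

```c
#include <stdio.h>
#include <string.h>

/* Sketch of the safe-copy idiom: snprintf() is told the size of the
 * destination buffer and truncates instead of overflowing, whereas
 * strcpy()/sprintf() would happily write past the end. */
void copy_hostname(char *dst, size_t dstlen, const char *src)
{
    /* BAD:  strcpy(dst, src);  -- no length check at all.          */
    /* GOOD: the "n" bounds what may be written, including the NUL. */
    snprintf(dst, dstlen, "%s", src);
}
```

Note one subtlety: strncpy() does not NUL-terminate when the source fills the buffer, so the terminator must be added by hand; snprintf() always terminates, which is why it is used in the sketch.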

Another way to use acceptance testing to expose flaws is to use the proper data set meant to be sent to the program, but to send extensive data to a particular input command, such as sending 1024 bytes to a 512-byte buffer, causing an overflow.

Sometimes, when a program appears to have decreased its efficiency in terms of speed or processing of the data, it may be directly linked to a heap or stack overflow caused by corrupt data being entered. At this stage, vital tests need to be conducted by the bug tester.

Let's take a real-life example of a program in which I exposed a flaw not long ago: the WinSMTPD mailer/pop3d daemon, versions 1.06f and 2.X.

After acceptance testing this program, everything worked well. All the desired tasks of the program were fulfilled, and the smtpd and pop3d servers performed their tasks efficiently. Now, here is where defect testing came into play:

To start an SMTP transaction, the client needs to send a "HELO %s" call, where the format string "%s" is your hostname. WinSMTPD only allowed a fixed buffer of 170 bytes before the expected output became unexpected. When I sent 150 bytes after the HELO field, the program noticeably paused before proceeding to function as normal. That told me that one of two things had happened:

  1. It had been poorly coded in terms of speed, OR
  2. It didn't deal with boundary tests, with excessive data being entered.

As it turned out, WinSMTPD was vulnerable to a stack overflow. By sending 170+ bytes to the HELO field, I got:

WINSMTP caused a general protection fault
in module WINSMTP.EXE at 0003:00002359.
Registers:
EAX=461e0001 CS=42e7 EIP=00002359 EFLGS=00000246
EBX=00807fe0 SS=4207 ESP=00007e36 EBP=00004141
ECX=00010283 DS=4207 ESI=0000544c FS=05c7
EDX=58600000 ES=461e EDI=00001547 GS=0000
Bytes at CS:EIP:
cb 49 73 49 63 6f 6e 69 63 00 00 58 4c 6f 63 00 
Stack dump:
41414141 41414141 41414141 41414141 41414141 41414141 
41414141 41414141 41414141 41414141 41414141 41414141 
41414141 41414141 41414141 41414141

Obviously, this isn't what the programmer had in mind when performing an SMTP transaction. The 41414141 that appears on the stack is the ASCII value of "A" (0x41), which I had filled the buffer with. From this general protection fault, we as bug testers and programmers are able to ascertain that this 16-bit program (judging by the segmented registers in the dump) has had its EBP register successfully overwritten (+4 bytes for EIP), and as ethical programmers/bug testers, that's all we need to know to fix this bug. If an unethical cracker were out there, loading up the stack with malicious data could allow arbitrary code to be executed from the stack, and anything is possible from there. This is why it is important to test for bugs, and especially to check the boundaries and the data that the client/user is allowed to send.
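A sketch of the missing boundary check: had the daemon rejected HELO arguments that could not fit its fixed buffer, the overflow above would not have occurred. The helper below is hypothetical, not the product's actual code, and uses the 170-byte figure reported above purely for illustration:

```c
#include <string.h>

#define HELO_MAX 170 /* fixed buffer size reported in the crash above */

/* Sketch of the missing check: refuse a HELO argument that would not
 * fit the fixed buffer, instead of copying it blindly onto the stack. */
int helo_argument_ok(const char *arg)
{
    return strlen(arg) < HELO_MAX;
}
```

With this guard in place, the 170+ byte test input is rejected at the parsing stage instead of smashing EBP and EIP.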

Although I approve of people writing "proof of concept" exploits to expose the existence of a bug in a program (as I am a firm believer in full disclosure and an advocate for Open Source), it is neither ethical nor advisable to run these scripts without the direct consent of those you are exploiting. (POC exploits are necessary in whitehat security firms to prove and demonstrate a code flaw.)

Data sets and tests fed to the program/system are effectively system calls executed by active processes. These include different kinds of programs (e.g., programs that run as daemons and those that do not), programs that vary widely in their size and complexity, and programs with different purposes. Spawns or fork()s by applications are tested when the maximum process limit is exhausted by various resource-depleting exploits; this too needs to be prepared for when writing a heavily-used program. Normal test data can be "synthetic" or "live". Synthetic traces are collected in production environments by running a prepared script, often called a driver program; the program options are chosen solely for the purpose of exercising the program (acceptance testing), not to meet any real user's requests. Live traces are collected during normal usage of a production computer system. Both methods are often put to the test when processing software applications en masse.

Bug Discovery?

So, you think you've found a bug? Then read on; here's what to do next:

Alert the vendor

If a user has somehow stumbled on a logical error or security vulnerability within a tested (beta or stable) product, it's necessary to report the bug immediately to the vendor. More of this was discussed in the "Development Goals" section, but a practical advisory was not shown. The bug report should include most, if not all, of the following information, generally in brief form:

Bug synopsis: a brief paragraph explaining the vulnerability
Description: the sequential steps taken to produce the bug
Attachments: any relevant materials, such as core dumps and message logs
Environment: system specifications and conditions used to test the bug
Contact info: how the vendor can contact you with further comments/queries

Alert the clients

If the bug has been accepted by the vendor as being a vulnerability that could lead to such problems as network/software penetration, increased privileges, or excessive system resource usage, the vendor should issue a public advisory through mailing lists, the vendor's Web site, and/or direct email to customers. It's then the responsibility of the programmer/manufacturer to offer instructions to the client to patch his/her software/system so the vulnerability is removed. The advisory should include the following information:

Date: date of the advisory's release
Affected systems: a list of the environments/settings in which the bug may occur
Description: similar to the bug reporter's description, but with more technical inside info
Patch: the URL of the patch or description of how to correct the bug
Contact: how clients can contact the vendor for more info -- phone, e-mail, URL

This communication link creates a much friendlier atmosphere between users and vendors, which helps software development become a more stable and reliable community -- one that excels in safe security practices.

Final Note

I made a generic resource kit earlier this year. It consists of seven skeletal template scripts coded in Perl for various purposes of testing network services in a Linux/Unix environment. It includes tests for malformed HTTP "GET" requests, multiple thread connections, random data streaming, ICMP error generation, etc. It's mainly used as a research and development kit to help spot bugs more easily, particularly in server/router software; feel free to expand it. It can be downloaded from http://dethy.synnergy.net/reskit.tar.


Luke Andrews (dethy@synnergy.net) works as a UNIX systems administrator for Errata Internet Solutions (a new Web hosting/shell/security company), and does Research and Development for Synnergy Research Labs as a hobby. His own code (and advisories) can be found at http://dethy.synnergy.net/.



Recent comments

17 Oct 2000 14:38 Avatar charlesmiller

Test First
One of the very best coding/testing techniques I've ever come across is taken from Extreme Programming (http://www.extremeprogramming.org/). It's not a magic bullet -- nothing is -- and while it won't prevent all bugs, it's the best way I've seen yet to reduce their number before the code is released into the wild. The technique is simple: write the tests before you write the feature.

The way it works is this:

  1. Consider the functionality of the code you want to write. Decide what you want it to do, and what you don't want it to do.
  2. Pick one thing you either want or don't want your code to do, and code a test around it. The test must be fully automated -- able to tell by itself whether it was successful or not.
  3. Write enough code for your application that the test will compile. If, after doing this, all your tests succeed, including the new one, return to step 1. If the test fails, continue.
  4. Write enough code that the new test succeeds. Ensure also that all previous tests still succeed.
  5. Return to step 1.
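As a toy illustration of the cycle above (the feature here is deliberately trivial, and the names are invented), the test is written first and decides pass/fail entirely by itself; the feature is then written with just enough code to make it succeed:

```c
/* Feature (written second, just enough to make the test pass): */
int add(int a, int b)
{
    return a + b;
}

/* Test (written first): fully automated, tells by itself
 * whether it was successful -- returns 1 on pass, 0 on fail. */
int test_add(void)
{
    return add(2, 3) == 5 && add(-1, 1) == 0;
}
```

In real use the test would fail to compile at first (add() does not exist yet), which is itself the first "red" step of the cycle.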


The advantages of this system are legion.
Testing first means more testable code.
Writing tests for previously written code can be a total nightmare. Often, it's impossible to write an automated test for something vital, because the code you're testing isn't built so that it can be tested. By writing the test first, you ensure that everything you need to test can be tested, easily and automatically. Those parts of a system that are really difficult to test automatically (GUIs, for example) are more likely to be properly separated from things that can be tested.
Often, you will find this leading to cleaner design of your code overall.
Testing first can prevent over-engineering
You know that you have finished programming when you run out of things to test. Because at each step, you have only done what was necessary in order to cause the test you wrote to succeed, this means that you haven't coded anything into your application that is not necessary.
Note that when programming incrementally like this, you must always take time to refactor your code, in order to make sure you don't end up with an unreadable mass of spaghetti.

Testing first leads to more maintainable code
When you make a modification to something you haven't looked at for half a year, you still have all the tests that convinced you then that it was working six months ago. If they still run, that means that you are as convinced now that the code works as you were before. You have a complete set of regression tests as an integral part of the coding process, not as some tacked-on-as-an-afterthought test suite.

This means that you can allow more people to modify more of your program, provided that they also follow the "test-first" rule, and that they make sure that all tests are green before they check any code in.

Testing first can help you program

Often, by the time you've finished writing the test, you've already worked out the best way to code the feature. Because you've already thought of what could go wrong, and written code to test it, when it comes to write the feature, you've already thought through how you are going to implement it, and what pitfalls you might have to avoid.

Try it some time. If you're stuck how to code something, change your mindset, and try to think of how you'd automate a test to see if it was successful. You'll find that when you do that, you'll have solved your problem indirectly.

Testing first forces programmers to think defensively:

Because the way that you determine when you've finished coding is "can I think of anything more that needs testing?", programmers are forced to continually ask themselves, "What could I do to break this?" These questions are asked right at the time the programmer is closest to the code in question, not some time later during quality control or security audit sessions. You're much more likely to notice an unchecked buffer or too-trusted input in this state.

You also find yourself more likely to think in the middle of the night, "Oh! There's something I didn't test!".

Bugs immediately become part of the testing system

When someone finds a bug in your code, write a test to reproduce the error. If the test inspires you to think of other things that could go wrong, then write more tests. Writing the tests will often show you how the bug can be fixed. These tests then become part of your regression suite, and you are protected in the future.

Charles Miller

25 Sep 2000 02:28 Avatar dethy

commercial realities behind bug testing
The article was about "The importance of bug testing", not "finding every possible bug". You said "commercial realities don't match", but in fact they do, unless you are part of a firm that doesn't count security and consumer satisfaction among its priorities. Leading software companies will reply to bugs within a few hours, and will establish a patch after suitable testing on various systems. I will point out that this was the case for the recent telnet client buffer overflow in BSD/Linux that Synnergy Networks (www.synnergy.net) stumbled upon; the FreeBSD and Slackware security advisors were quite prompt in answering and working on a patch.

You will find that if, say, patch A is applied to correct a bug, and patch A is superseded by patch B to fix a bug that was present in patch A, this kind of remedy was a "hot fix" -- something to quickly solve the problem. Microsoft often releases these, but conglomerates them all into a Service Pack (SP) later on, which will in turn fix the problem.

Testing software is important, no matter what the scenario. Although you are quite right in saying *some* circumstances may go untested because of the potential danger involved, this article is not specifically dealing with life-threatening mechanisms such as a nuclear power plant meltdown. You will probably also find that these larger machines are fault tolerant; most software (the kind available on the net or in stores) is not fault tolerant because of the exceeding monetary pricing, far beyond reach for the normal consumer. Additionally, these fault tolerant systems are often run through simulations to test for the presence of bugs, but be assured that life-critical systems like this are thoroughly tested before being implemented.

24 Sep 2000 23:54 Avatar aebrain

Impotence(sic) of Bug Testing

Good article on the ethics of fault finding. But the commercial realities don't match. Buggy programs do not lead to commercial failure. The current pace of change means that (buggy) version 2 is replaced by (buggy) version 3 (with a much-ballyhooed two extra bells and a whistle) before people can get too frustrated with version 2's many failures.

To add insult to injury, any commercially successful firm then has a Helpline where they charge the customer for reporting bugs, with no guarantee of a fix - which if it occurs will be in the next version, available RSN at a modest cost. Add a few (hundred) patches available through the net, and all responsibility can be evaded, as no two systems are likely to have the same configuration.

As an aside, safety-critical systems such as airliner avionics require quite sophisticated testing techniques. This is often because they are extremely fault-tolerant, so much so that there may be many, many bugs which are asymptomatic - only subtle tests and instrumentation of the code will show a markedly reduced efficiency as the back-up to the back-up to the back-up gets triggered. Related to this problem are the many parts of the system which cannot easily be tested: they are for contingencies that both may never occur in a system's lifetime, nor are easily simulated. For example how DO you test whether a Nuclear power plant will survive a magnitude 7 earthquake?

24 Sep 2000 10:20 Avatar izar

Nice job!
Clear, concise, straight to the point. A bit on the security side of bug hunting, somewhat ignoring the obvious facet of bug testing with regard to algorithm completeness, functionality, and overall integration in the environment (so important in these interoperability times we live in). But then... that would be a book :)

24 Sep 2000 09:37 Avatar hiro45

Bug Testing
Thank you for this clear and well said article on bug testing,
I am now fully aware of the importance of bug testing and open source!!

In your face MICROSOFT!
