Is Software Testing Production or Service?

Chang Liu writes: "Most of us would probably agree that any software package, open-source or not, can gain high quality only after rigorous testing. One source of the credibility of open-source software is the fact that it is tested by a large number of knowledgeable testers who have access to the source code and know what's going on. Yet we seldom discuss what contributes to good testing practices. Sure, everybody tests in a different way, just as everybody codes in a different fashion. But in the end, there are good practices and bad practices. It benefits the community to spread the word about good testing practices."

So what exactly is software testing? The traditional academic view holds that software testing takes programs and specifications as input, and produces a bug list as output. In other words, software testing produces bug lists for development teams. Others, especially those in the commercial world, have different expectations. They view testing as a service to development teams. Testers are expected to provide almost instant feedback at all times while programs and specifications keep evolving.

This article discusses the "production" view and the "service" view of software testing and examines how each one shapes software testing techniques.

The "production" view of software testing

Traditionally, the problem of software testing was stated as follows: given a program and a description or specification of what the program does, find out under which conditions the program does not behave as expected. There are generally two types of testing techniques used to solve this problem. One type is program-based techniques, also known as white-box testing. The other type is specification-based techniques, also known as black-box testing.

Program-based techniques develop test cases according to program structures. The central idea is that program control structures and data structures determine program behaviors. If test cases can sufficiently cover all control structures and/or data structures, we can be reasonably confident that most program behaviors are examined. Statement coverage, branch coverage, and path coverage are examples of the coverage criteria used in white-box testing.
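
To make the difference between two of these criteria concrete, here is a minimal sketch in C. The function apply_discount and its test inputs are hypothetical, invented purely for illustration. A single test with a qualifying total executes every statement, yet it never exercises the false outcome of the if; branch coverage asks for a second test that does.

```c
#include <assert.h>

/* Hypothetical function under test: takes 10 off totals of 100 or more. */
int apply_discount(int total)
{
    int result = total;
    if (total >= 100) {
        result = total - 10;
    }
    return result;
}

/* Executes every statement: the condition is true, so the body runs. */
void test_statement_coverage(void)
{
    assert(apply_discount(150) == 140);
}

/* Branch coverage also needs the false outcome of the condition. */
void test_branch_coverage(void)
{
    assert(apply_discount(150) == 140);  /* true branch  */
    assert(apply_discount(50) == 50);    /* false branch */
}
```

If the function wrongly changed totals under 100, the first test alone would not notice, because it never takes that path.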

Specification-based techniques do not assume knowledge of internal program structures. Instead, they depend on the problem specifications or descriptions to determine which test cases should be used. The central idea is that if a program is supposed to solve a problem, as long as the problem is solved, it doesn't matter how the program is constructed.

Both of these traditional types of testing techniques assume that there are static programs or specifications to work on and that a list of bugs is all development teams need.

To support "bug-list production", techniques are developed to cover the program-under-test more thoroughly under a pre-selected coverage criterion, to achieve a higher coverage with a smaller number of test cases, and to execute test cases more quickly. New coverage criteria are also invented to cover different aspects of the program-under-test.

The focus here is to produce more thorough lists of bugs, in other words, better products, even if it might take a longer turnaround time to provide such lists.

The "service" view of software testing

In practice, testers' jobs are sometimes more subtle than simply producing bug lists. I once asked a test lead from a large software company what his most important responsibility was. The answer was quite surprising to me at that time: the most important thing was to know the status of the software product at all times. After I thought about it, the idea became quite reasonable. Clearly, when both the program-under-test and the description of the problem are changing every day, it is not feasible to produce a comprehensive bug list for each daily build. Nor is it necessary. It is more useful to the development team if testers can provide constant and rapid feedback on the status of the current builds. Overview information is as important as individual bug reports. This "service" view of software testing focuses on the need for rapid feedback and the evolving nature of the program-under-test. Just as with many other services such as phone services, the need for rapid responses is paramount. When a person picks up a phone, she expects to talk right away; when a development team gets a build done, they expect feedback right away.

To perform software testing as a service, testers must be able to quickly find out the status of a new build. Automated test execution and result verification seem to be a logical way to go. However, most current test automation tools and techniques are closely tied to implementation details such as user interfaces. They are extremely sensitive to changes in the program-under-test. This creates a dilemma. On one hand, testers have to automate tests to provide rapid feedback. On the other hand, automated tests don't work very well with updated programs and thus sometimes slow down software testing. There is no perfect solution to this yet. More abstract test descriptions may be able to decouple test cases from implementation details in the future.

A key question here is, how do testers perform a small number of test cases on each build and still gain a good overall knowledge of the status of the entire program? In other words, how do they determine which test cases should be used on which builds? How do they combine the results of different test cases executed on different builds and make sense of them? I'm sure many testers are experienced enough to do this, but until we can clearly state how we do it, we cannot claim that we know how to engineer it and that we can do it successfully in the next project.

In the case of open-source software development, there are usually no deadlines. Still, builds are updated daily or weekly in many projects. It is likely that when someone declares, "Hey, I just achieved 80% test coverage for project X based on test criterion Y" (if anyone ever would), the build she used is already out of date. I wonder how people in successful projects such as Emacs, Linux, and Apache put together feedback and determine stable builds. Or do they determine a build to be stable before user feedback? Is there a systematic way to separate stable builds from other builds?

What do you think?

The production view and the service view of software testing are certainly not entirely incompatible. Many testers who provide testing services are doing a good job using techniques developed for bug production, making ad hoc adjustments to them to work with evolving environments. However, I think it is in the best interest of the software community to contemplate what we expect from software testing and what the best way to provide it is. I can't wait to hear what freshmeat users have to say.


Chang Liu is a member of the Rosatea group (Research Organization for Specification- and Architectural-based Testing & Analysis) at UC Irvine. His research interests are centered on software testing automation, software quality assurance, and software engineering in general. He is currently working on TestTalk -- a comprehensive testing language.


Recent comments

19 Feb 2000 11:09 frankcast

Missing the most fundamental aspects of software development.

Issue

Given the author's bio, it is amazing that he missed the point that testing does NOT begin with the code.

Process

If a software development effort is not part of a process by which the requirements, analysis, design, and implementation have review and test coverage then it will probably have more defects, and not just in the code. Of course "hello world" can forgo this process, but if the objective is anything with any meat on it, do yourself a favor and plan the process.

Request Review

Just because someone thinks a feature would be neat or cool doesn't qualify it for inclusion in the deliverable. A reasonable review should be performed to validate it within the scope of the effort. A few of the questions that should be asked are:

What is the source of the request?
Does the feature fit in with the character of the target system(s)?
What are the advantages and disadvantages?
What benefit does it provide?
What is the potential cost in terms of resources?
What is the risk involved?
Is there enough information about the feature request to create a requirements document?

Requirement Review

Requirement specifications are not just the request put into a simple document, database, sticky note, or memory. The document should include every possible aspect of what the feature will and will not do. The layout of the document is a function of the organization but should include, at a minimum, statements that cover:

Roles - who are the users by category?
Use Cases - what are the uses by each role that the feature needs to consider?
Participants - what other systems participate in the feature?
Assertions - these are statements of what the feature will and will not do, using terms such as SHALL, SHALL NOT, WILL, WILL NOT, or WILL SUPPORT and WILL NOT SUPPORT.

This, from my experience, is the most critical point at which the test group gets involved. They (you, me?), being part of this review, begin to understand the objectives for their test criteria. If the test machine is not involved at this point, it becomes more expensive at the end. And I don't mean just money!

Use Case and Analysis

General Definition: An activity by which the Roles are mapped to each of their respective uses, to give a clear and consistent description of what the system will do.

This begins the identification of what relationships exist and are necessary to produce a result. In addition, this step may potentially identify roles, usage, or participants that had not been originally considered. This is also critical for the test group that is covering an existing system, as it provides insight into the potential regression areas.

The end result is, at this point, that the test group should certainly have a handle on what system, and possibly functional, tests will be realized.

Design

A number of steps that fortify the test requirements include:

Use Case systems are grouped into functional categories.
Technical aspects are added (classes, libraries, other software, networks, etc.).
All inputs and outputs are clearly defined. Relationships between components, classes, functions are fully defined (e.g. cardinality and scope).
Instance interactions under synchronous and asynchronous operation, balking, and latency, and the reactions to these, are defined.
State diagrams and/or flow charts to support interactions are produced.
Strong emphasis on constraint definition.
Exceptions, error conditions, rollbacks, recovery, and the like are clearly defined.

Such is the meat from which white-box testing can evolve. Even without a test group that can participate at this level, the developer will clearly be able to produce code that does what it is supposed to.

Implementation

There are aspects to implementation where test can and should play a mighty role:

Code review - having been involved from the beginning, the test group is in a good position to prevent a defect before it is even checked in. Standards reviews can also come into play if they exist (and they should).
Test plans, scripts, loops, tools, and so on can be ready. If white-box testing is to be performed, the potential exists that the testers are WAITING for the developers to get it on! The feedback loop latency is minimized.
System testing is fully implemented to cover all of the input and output verification as described in the Requirements.

In conclusion

Waiting for code to be developed before testing is a recipe for disaster. If we in open source and open development efforts don't consider process before starting the editor, and rely on the user community as the test group, we may be wasting our time and theirs.

While the focus on this has been test, it should be clear that the developers will have access to the same steps as the testers (and yes, I know, it may be the developers that do all of the above work). To produce wildly defective code at this point may require a review of those involved, or more importantly those not involved.

19 Feb 2000 11:24 jnewbern

Randomized Testing
I have found automated randomized testing to be very useful where it can be applied, both for finding bugs and for quickly estimating the quality of a build.

pros: much better coverage than human-made tests in less time than using a coverage tool, MTBF gives a quick estimate of quality, you can run far more testcases than if each one was hand-crafted.

cons: writing an effective randomized tester is a difficult skill to teach, random testcases are sometimes difficult to understand, reduce and debug.

what makes writing good randomized testers difficult is that you want to automatically generate testcases that cover the entire valid input space, but generate no invalid inputs. the process of eliminating invalid inputs, if not done carefully, can reduce your test coverage and cause you to miss an important bug.

the other difficult thing with random testing is knowing the correct program behavior for a particular random input sequence. crashes are obviously a bug, but beyond that you need something that can identify an invalid output. this is no problem when checking that the answer is correct is easier than finding the answer in the first place, but what about when checking the output is just as hard as generating it? then, i have used techniques like comparing the output from two different programs with the same specs, two different builds of the same program, the same program with two testcases that should generate the same output, or against a less-efficient, but easier to write program following the same spec.
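
The last of those techniques, checking against a less efficient but easier to write program, might look like the sketch below. It is only an illustration under assumed names: popcount_fast stands in for the implementation under test, popcount_reference for the slow program following the same spec, and count_mismatches for the random tester. None of these come from a real project.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Reference: slow but obviously correct, checks one bit at a time. */
static unsigned popcount_reference(uint32_t x)
{
    unsigned count = 0;
    for (int bit = 0; bit < 32; bit++) {
        count += (x >> bit) & 1u;
    }
    return count;
}

/* Implementation under test: clears the lowest set bit each iteration. */
static unsigned popcount_fast(uint32_t x)
{
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;
        count++;
    }
    return count;
}

/* Build a 32-bit value from rand(), which only guarantees 15 random bits. */
static uint32_t random_u32(void)
{
    uint32_t value = 0;
    for (int i = 0; i < 3; i++) {
        value = (value << 15) ^ ((uint32_t)rand() & 0x7FFFu);
    }
    return value;
}

/* Returns the number of random inputs on which the two versions disagree. */
int count_mismatches(unsigned seed, int iterations)
{
    int mismatches = 0;
    assert(iterations >= 0);
    srand(seed);  /* fixed seed so that any failure can be replayed */
    for (int i = 0; i < iterations; i++) {
        uint32_t input = random_u32();
        if (popcount_fast(input) != popcount_reference(input)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

Calling count_mismatches(12345u, 10000) from a test driver and expecting 0 gives a quick pass/fail signal for a build; recording the seed is what makes a failing random testcase reproducible.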

btw, i apply random testing below the GUI level. my gut feeling is that it would be much more difficult to apply this to a GUI, because testcases need to be randomized about some meaningful navigation sequence through the GUI. just randomly clicking buttons isn't likely to get you good coverage.

jeff newbern

19 Feb 2000 11:55 dank

testing
When I worked at a biggish game company, they used service
testing early on, and production testing towards the end.
The test team was told what parts of the game were ready
for test, but little else. There was no formal process.
It was quite expensive; it would have been much cheaper
if we had hired a couple more really good programmers and engaged
in better development practices (e.g. software inspections,
see www.ics.hawaii.edu/~jo... or www.west.net/~steveco/...).

I agree that testing should start before code
is written -- even on a one-man project.
In my own work, I find that writing simple unit
tests for each class while (or before!) coding
each class is a big win; not only are my classes
better understood and less buggy, but much of the
debugging takes place in a nice, happy unit test
rather than a big messy application. I now use gnu
automake, and it generates a simple makefile
rule to run all my unit tests. This comes in handy
for regression testing.

For any multiuser system, writing a system-level load test
should be done long before the system goes to QA, and
used by the developer to find his own dang bugs.
The load test can then also be used by the test team.

In a way, I feel that developers themselves are directly responsible for the quality of the code received by the
consumer. Managers should NEVER tell a programmer
"Oh, don't worry about finding all the bugs; QA will do
that for you," because the sad truth is, catching bugs in
QA is far more expensive and difficult than in development.

19 Feb 2000 12:57 zenshadow

No invalid data in test cases?
[Amusingly, Netscape locked up just as I was about to submit this comment the first time. How ironic. Someone should forward discussions of software testing to Netscape, along with a few really good tomes on the philosophy behind the design of efficient and robust software.]

In any case, I don't feel like retyping the entire post that I typed before, so I'll just say it really simply:

Someone above mentions making sure that the inputs to all of your test cases are valid. This is not a good idea, IMO. Your program should behave sanely even if it is given completely bogus input (the results might not mean much, but it should do things like bounds checking, not blowing up, etc...). Example: If you're unit-testing a function like this:

/* Writes through target without checking for NULL. */
void stuff(char *target) {
    *target = 1;
}

This is going to generate a segfault if you pass it a NULL pointer. So the test case should include stuff like that. Nothing in your program may call stuff() with a NULL pointer, and it may not even seem possible - but in the future, four developers and thousands of lines of code later, that annoying little function is going to be the source of some big, hard to track problems, and wouldn't it be nice if it at least died -gracefully-? ;-)
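
As a minimal sketch of what dying gracefully could look like here: the name stuff_checked and the convention of returning -1 for bad input are assumptions made for illustration, not part of the example above. The point is only that the unit test passes NULL on purpose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical defensive variant of stuff(): rejects NULL instead of
 * dereferencing it. Returns 0 on success, -1 on invalid input. */
int stuff_checked(char *target)
{
    if (target == NULL) {
        return -1;
    }
    *target = 1;
    return 0;
}

/* Unit test that deliberately includes the bogus input. */
void test_stuff_checked(void)
{
    char c = 0;
    assert(stuff_checked(&c) == 0);
    assert(c == 1);
    assert(stuff_checked(NULL) == -1);  /* must not crash */
}
```

Whether to return an error code, log, or abort with a clear message is a design choice; what matters for the test is that the behavior on bad input is defined and checked.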

In any case, point made. I'm off down the street to Netscape to find the engineer who had the bright idea of a multi-level undo buffer with unlimited depth (or whatever it is that keeps causing Netscape to grow to 100 MB when I type long messages in textareas...).

-ZS

19 Feb 2000 13:23 jnewbern

RE: No invalid data in testcases?
the invalid inputs i was referring to are not cases like the example given. error handling can and should be tested with a random tester. but in some cases a spec explicitly states that a given input produces undefined results. in that case, there is no possible way to test that input, except to detect the invalid input and allow anything to happen, which isn't useful.

another case when this occurs is when an input allowed by the spec triggers a bug, but the testcase is so obscure that it is decided that fixing it is a low priority (this DOES happen in business). when that decision is made, you want to eliminate the offending input so that you can continue testing without generating a lot of false positive testcases.
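
A rough sketch of excluding such inputs in the generator rather than in the checker. Everything here is hypothetical: quotient stands in for a function whose spec leaves a zero divisor undefined, and the generator simply redraws when it hits that one excluded value, so the rest of the input space stays covered.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical function under test; its spec leaves b == 0 undefined. */
static int quotient(int a, int b)
{
    return a / b;
}

/* Random integer in [-1000, 1000]. */
static int random_operand(void)
{
    return rand() % 2001 - 1000;
}

/* Draw a divisor, redrawing whenever the excluded input (0) comes up. */
static int random_valid_divisor(void)
{
    int b;
    do {
        b = random_operand();
    } while (b == 0);
    return b;
}

/* Returns the number of inputs whose result fails the cheap check
 * |a - q * b| < |b|, which is easier to verify than to compute q. */
int count_quotient_failures(unsigned seed, int iterations)
{
    int failures = 0;
    assert(iterations >= 0);
    srand(seed);
    for (int i = 0; i < iterations; i++) {
        int a = random_operand();
        int b = random_valid_divisor();
        int q = quotient(a, b);
        int r = a - q * b;
        if (abs(r) >= abs(b)) {
            failures++;
        }
    }
    return failures;
}
```

Keeping the exclusion this narrow is deliberate: filtering out more than the single undefined case would shrink coverage in exactly the way described earlier.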

19 Feb 2000 16:25 andylongton

The method depends on the environment...
There are many good comments here, so I'll keep mine fairly brief and to the point -- hopefully not too simple.

I've worked on both mass market and custom projects, and the methods chosen are exactly as you've mentioned -- both for practical reasons.

Retail/mass market software is marketing driven, and has no pressure from a customer to contain specific functionality. It does have strong time pressures, though, and must show that the results are a bargain at the price being asked.

Contract/custom software is specification driven, since the contractor (or internal department) needs to see an end to the project. It's also price driven, but tied directly to the task at hand, so there are fewer features and the result should be very efficient.

With open source projects, you usually have a mix of both; the features that creep in with the mass market software, and the limited scope of the custom software project. The developer(s) seldom have a reason to listen to anyone else if they disagree -- so they make the program they want.

Having said that, the formal, specification-driven method is generally superior from a quality standpoint. Without an end to testing, it's impossible to say just how many defects a product has. In most cases, even in financial applications, defects are acceptable...as long as they do not cause data loss or corruption.

In most open source projects, data loss or corruption are usually promoted as real possibilities by the developers...yet I have not lost any work from using an open source program...unlike other programs *cough!* *Word* *cough!* *Access*. Sorry, had to cough.

Having said that, many open source projects follow strict specifications, so testing them should be limited in scope. When they don't, the people involved wanted to write the software, so the results tend to be of higher quality because those involved actually care.

For example, the TCP/IP stack either implements the specs or does not. When the specs can be read either way, it's the responsibility of the test/QA/VV&T person to identify the choice actually taken and verify it with development. At that point, as long as the choice is consistent with other choices, the word of development is taken as the end point.

Keep in mind, though, this only occurs when the specs can have reasonable alternate interpretations. Most of the time, this is not the case, and the specs decide if a test result is correct or incorrect.

Also, with open source there is an implicit efficiency; software with a high level of peer review is ready to steal. As a developer who I respect greatly told me: "10% of programming is careful coding, the other 90% is knowing what and who to steal from".

19 Feb 2000 23:18 ksmss

building in a test path
As far as testing systems is concerned, it really helps to have an automatic or semi-automatic test path built into the product, starting at the high-level design stage. There should be a lot of points in the development process where scripts and utilities can be used to check people's work objectively.

It makes a difference to build on an architecture that enforces good structure, and where validation utilities can be used to create detailed build logs. This may mean thinking about software design in a non-traditional way.

Applications and systems that do not have a test path incorporated in them are a nightmare to diagnose when they are put into service. Usually, they are required to interoperate with other systems that are similarly lacking this feature. When these interdependent systems are put into operation, and a problem occurs, it is hard to figure out which is the offending part of the process, because none of the applications leave an audit trail.

Ultimately, poorly-designed systems never end up getting redesigned, due to expense and organizational inertia. Instead, humans are expected to make up for the application's shortcomings, by performing the task of a computer, just to keep up the illusion that things are running smoothly.

People's time should be freed-up to spend on things that computers can't do - things that don't require mindless repetition.

19 Feb 2000 23:34 markbullock

open source test cases
I recently tested a custom FTP server for the driveway.com service. I was wishing for open source FTP test cases, although the FTP RFCs provided a decent spec.

If anyone is interested in contributing test plans or test cases, I can probably supply storage and organization on www.sasqag.org.

Also, how does Linux get tested? Is there a public web site with test related information?

Thanks, Mark

20 Feb 2000 13:25 jimcox

Stability verification amidst rapid releases
I must say that it's pretty strange to see rational discussion of software testing - it's a part of the industry that's long been overlooked.

Rapid release cycles can deliver stable builds. The process is inherently difficult, but it can be done with extensive communication between the development and the testing groups.

First, the development team must meticulously maintain a change log, and before the release, communicate these changes to the test team.

The test team can then either determine with white-box methods which areas to test, or obtain the information from a less-partial developer.

This allows the test team to proceed with focused testing on areas that have been changed, in addition to the broad and shallow 'smoke-test' that is applied to the entire product before general release.
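
One way to picture this selection step is the sketch below. The test names, area tags, and the select_tests function are all made up for illustration; a real project would derive the changed areas from its own change log. Smoke tests are always selected, and the remaining tests are selected only when their area appears in the list of changes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical test catalog: each test is tagged with the area it covers;
 * smoke tests run on every build regardless of what changed. */
struct test_case {
    const char *name;
    const char *area;
    int is_smoke;
};

static const struct test_case catalog[] = {
    { "login_basic",      "auth",    1 },
    { "login_lockout",    "auth",    0 },
    { "upload_large",     "storage", 0 },
    { "upload_basic",     "storage", 1 },
    { "report_rendering", "reports", 0 },
};

/* Returns 1 if area appears in the list of changed areas, else 0. */
static int area_changed(const char *area, const char *const *changed, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(area, changed[i]) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Writes the selected tests to out (which must have room for the whole
 * catalog) and returns how many were selected. */
size_t select_tests(const char *const *changed, size_t n,
                    const struct test_case **out)
{
    size_t count = 0;
    size_t total = sizeof catalog / sizeof catalog[0];
    assert(out != NULL);
    for (size_t i = 0; i < total; i++) {
        if (catalog[i].is_smoke || area_changed(catalog[i].area, changed, n)) {
            out[count++] = &catalog[i];
        }
    }
    return count;
}
```

With a change log that mentions only "auth", this picks both auth tests plus the storage smoke test and skips the deeper storage and reports tests.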

This strategy is entirely dependent, of course, on sound design and testing principles as Frank outlined earlier.

25 Nov 2003 02:16 robdavis

Re: testing
I agree... it is a good idea to start testing the software as early as possible.

Rob

Rob Davis, PE

Software QA/Verification/Validation/Test Engineer

www.robdavispe.com

30 Nov 2007 08:46 byeaw24

Re: Missing the most fundamental aspects of software development.
What you said is true; however, you missed the whole point. He didn't miss the fact that "testing does not begin with the code." That is a well-established tenet of software engineering. Notice how you titled your post: "...of software development." You are ranting about improper software development, not improper testing. The fact is, even with proper software engineering practices, there must be a testing process before new software is released. That is the focus of the discussion. Also, it seems somewhat rude to post such a large amount of text which looks like it has been copied and pasted from a website on software engineering. You could put in a link instead.

02 Dec 2009 11:51 A1QA

Look:
"until we can clearly state how we do it, we cannot claim that we know how to engineer it and that we can do it successfully in the next project" - quality inconstancy.
"the need for rapid feedback and the evolving nature of the program-under-test" - features of intangibility.
"there are usually no deadlines" - extended timeline, results in each period are different
These are all distinguishing characteristics of services, not products.
Therefore, I'm more inclined to the "service" view of software testing.

Dmitry Plavinsky, QA Manager
www.a1qa.com
