
Coding Standards: Good Idea or Subtle Evil?

In a view-from-the-trenches editorial, Josh Fryman discusses coding standards, why they may be a necessary evil, and how they can sometimes overstep their bounds and inflict pain on the programmers who have to live with them.

I've worked in the industry for several years now, and I want to ask you all something: are the coding standards being used so rampantly throughout the programming world actually useful? Or are they just an interesting notion that got so far out of hand that no one can stop it?

Hmm, that's a tough question. Let's look at some of the background, and at what I mean by "coding standard", since we all know that's an oxymoron. First, we'll take a quick look at a generic form of variable and function naming called Hungarian notation; then we'll look at some sample corporate coding standards and why they were created. Finally, we'll look at the implications of both.

1. Hungarian Notation

The name "Hungarian Notation" is a joke. What does that tell us right away?

A fellow by the name of Charles Simonyi (a Hungarian, not surprisingly) developed it. He joined Microsoft back in 1981 and brought this notation system from his 1970s work at Berkeley and Xerox PARC to the evil empire, which subsequently spread it by force across the Microsoft planet. When he started using it, the other programmers thought names like "khprguchOkIcon" (yes, that's a valid form - a constant huge pointer to an array of unsigned characters named OkIcon) were nonsensical, as though it were Greek -- or in this case, Hungarian. Somehow this got tied to the reverse Polish notation (RPN) of calculator fame, and the result was "Hungarian Notation."

The idea is that a common prefix and type tag are attached to the front of a variable name so you know what data type the variable is carrying. This was developed for weakly typed languages (such as C) to help programmers know what data type they were working with, to prevent miscasts or bad assignment operations. Many subtle bugs come from these situations, and it seems like a good idea on the surface. For strongly (or moderately) typed languages like Pascal, this is unnecessary, as the compiler will not let you assign an integer to a character without the proper cast operation.
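For instance, here is a minimal sketch (with invented variable names) of the kind of silent conversion that weak typing permits and that the prefixes were meant to flag:

#include <stdio.h>

int main(void)
{
    int  count = 1000;   /* the value we actually care about                  */
    char c;

    c = count;           /* C accepts this silently and truncates the value   */
    printf("%d\n", c);   /* prints -24 on a typical signed-8-bit-char system  */
    return 0;
}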

While some simple examples might be "bOk" (Boolean-type variable Ok) or "wCount" (word-type variable Count), the complicated example above shows just how quickly those prefix letters can get out of hand. Suppose you have an icon image which is declared constant to move it into code space, freeing precious RAM in an embedded product. How bulky would it be to type "khprguchOkIcon" as opposed to "OkIcon", and just how useful are those prefix characters? Which one seems more readable to you if you have to perform code maintenance?

DrawIcon( &OkIcon );
DrawIcon( khprguchOkIcon );
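For comparison, the declarations behind those two calls might look something like the sketch below; the icon bytes are invented, and the "huge" part of the prefix refers to a 16-bit segmented-memory keyword that a flat-memory compiler simply drops:

/* Plain name: a constant icon bitmap, placed in read-only (code) space.       */
const unsigned char OkIcon[] = { 0x00, 0x7e, 0x42, 0x7e, 0x00 };

/* Hungarian name: "k" constant, "hp" huge pointer, "rg" array (range),        */
/* "uch" unsigned char.  On a modern flat-memory compiler the "huge" part      */
/* vanishes and this is just an ordinary constant pointer.                     */
const unsigned char * const khprguchOkIcon = OkIcon;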

It's amusing to note that because this erupted in the Microsoft empire, anyone who writes Microsoft-related code is likely to use this notational system; moreover, they're likely to use it in other places as well out of habit. Most UN*X programmers avoid this notational system like the plague they perceive it to be. But with the rapid growth of programmers moving to Linux, it's inevitable that this notational system will be appearing in code near you. The question is, will it stick around or be kicked out? Or will some happy medium be reached somewhere between not having it at all and having it with messy variable names?

2. Coding Standards

Coding standards are template documents that anyone working on a program of more than a few thousand lines is probably familiar with. As the code becomes larger and more complex, it tends to gather more authors. As each programmer invariably has his own programming style, some effort is spent to keep everyone coding in the same format. Do you place your {}s on the same line as the if/else, or do you use them for white-space offsets by placing them on lines by themselves? Does the person next to you code the same way? Does everyone in your group of 100 engineers code the exact same way?

No, of course not. Everyone is a little bit different. You might name a variable "day_count", they might name a variable "DayCount", someone else might name it "count_of_all_days_in_one_work_week" (someone from a COBOL background, perhaps). The point of the coding standard is to prevent these issues, so the code is a little more uniform, a little easier for everyone to read, and simpler for other people to analyze, debug, and maintain in general. Coding standards typically address not only style (where you place your {}s), but also naming (Hungarian notation).
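As a deliberately trivial illustration of the brace question alone (the function and variable names are invented), the same five lines of logic can legitimately be written either way:

void pay_overtime(void);
void pay_regular(void);

/* Braces on the same line as the if/else... */
void pay_week_style_one(int day_count)
{
    if (day_count > 5) {
        pay_overtime();
    } else {
        pay_regular();
    }
}

/* ...versus braces on lines by themselves, as white-space offsets. */
void pay_week_style_two(int day_count)
{
    if (day_count > 5)
    {
        pay_overtime();
    }
    else
    {
        pay_regular();
    }
}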

Of course, no one really likes writing in a format which isn't his own, but for the greater picture, people are willing to tolerate these things. Imagine a bunch of COBOL and FORTRAN programmers who learn C and decide to work on the Linux kernel, coding in their old indentation formats but using C. How readable and maintainable do you think that would be? Are you shuddering yet?

3. So what?

So what exactly is the point? Like many things which seem like a good idea on the surface, are these actually good ideas in practice? It seems a subjective issue, one that is defined like a moral issue -- there is right, there is wrong, but there are many shades of grey.

An important first question: when do these standards work? If you're maintaining someone else's code, or even your own several years later, the variable names will be unfamiliar. Worse, if the code was poorly written, with functions that run hundreds of lines long, you may not be able to readily recognize whether "data" is an integer, a character, a pointer, a structure, or some other beast. In this case, a name like "puchData" quickly tells you that this "data" is a "pointer to an unsigned character" rather than anything else.

But let's take a closer look at this argument. If the function you're looking at is hundreds of lines long, should you be trying to maintain it? Or was it poorly written in the beginning? Wouldn't it be safer to study the function and break it into smaller functions? Worse, with a poorly-chosen variable name like "data" as opposed to "input_data", we're left wondering what data "data" is holding. By choosing better variable names and writing better code, we would avoid the problem.

The counter argument is global variables, which may be defined in a separate file. How are you supposed to know what types they are? Well, as with local variables, a good name will solve the problem.
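For example, a couple of hypothetical extern declarations whose names alone tell you nearly as much as any prefix would:

/* In a shared header: the names say roughly what each global holds. */
extern unsigned int active_connection_count;
extern char         current_log_filename[256];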

Theorem One:

Hungarian Notation exists to make up for bad coding practices.

Good programming practices can avoid the situations this notational system was designed to work around. Rather than work around the problem, shouldn't we fix the problem itself?

Well, what about coding standards? Coding standards appear to be a necessary evil in today's world. There are too many schools of thought on where to place your {}s, how far to indent the next block, whether tab characters or spaces should be used, how many spaces make up a tab character, and so forth. Is there any hope of a good solution?

Some languages, like FORTRAN and COBOL (really, I'm not picking on them; they're fine for what they do) mandate certain spacing constraints. This is one solution. Another solution is to standardize among all the teaching institutions a certain generic style, and mandate its use at all times.

But can they go too far?

Some examples from industry will answer this question. The members of a large company (which I shall leave nameless) decided they wanted to make truly portable software. They were tired of the headaches of porting between the SCO, Solaris, and NT PC platforms. To avoid this, they created a coding standard that said you could not use any standard function directly; you had to use their "wrapper" function, which would be changed depending on what system you were compiling on. On the surface, this sounds like a good idea. After all, the socket programming function "connect()" takes different arguments depending on which platform you're using. So is this superficially good idea really good in practice?

They decided that to be safe, they'd better wrap everything, and I mean everything. You couldn't call "malloc" or "free" or "sprintf" - you had to call "companyX_malloc" or "companyX_free". Correct me if I'm wrong, but the ANSI and ISO C specifications include the existence of malloc, free, sprintf, and a whole battery of other functions - and define them absolutely! If your compiler wants to be ANSI or ISO compliant, those functions have to take exactly the right arguments in exactly the right order! So what did using "companyX_malloc" do? It annoyed the hell out of a lot of engineers.
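A sketch of what such a wrapper layer tends to boil down to -- the file names and bodies here are guesses, built around the companyX_malloc/companyX_free names from the story above:

/* companyX_port.h -- per the standard, nobody may call libc directly. */
#include <stdlib.h>

void *companyX_malloc(size_t size);
void  companyX_free(void *ptr);

/* companyX_port.c -- on every supported platform the "portable" body is a
 * pure pass-through, because malloc and free are already pinned down by
 * ANSI/ISO C. */
void *companyX_malloc(size_t size)
{
    return malloc(size);
}

void companyX_free(void *ptr)
{
    free(ptr);
}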

Is this an extreme example? No, not in my experience as a consultant engineer. Another example, from another company, is a coding standard that said the parameters to a function must be arranged in alphabetical order by type. At first glance, this seems like a logical notion. It allows for more efficient function calling, stack alignment, and cleaner prototypes. For example, which looks cleaner?

void my_func( int, char, bool, char *, bool, bool, int *, void * );
void my_func( bool, bool, bool, char, char *, int, int *, void * );

Arguably, the alphabetical ordering makes the function name stand out and is less confusing when searching for a function prototype. The unordered types can be somewhat distracting at a superficial glance. But is this good? In this case, a commonly used function was "draw_pel", which took a series of arguments. At some point a change required an additional argument which, instead of being appended to the end, had to be stuck in the middle of the list.

In hundreds of call sites, "draw_pel" had to be changed, and instead of doing it with a simple macro, it took hours to sit there and carefully move through the code, pushing variables around in each call list. Another argument against such an idea is that there can be a logical grouping to parameters (such as "input, output"), and alphabetical ordering may ruin that grouping.
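For what it's worth, "a simple macro" in the appended-argument case might have looked like the sketch below (the argument names, the draw_pel4 helper, and the BLEND_NONE default are all invented for illustration):

/* Hypothetical: draw_pel() grows a fourth "blend mode" argument.  Had it    */
/* been appended with a usable default, the hundreds of existing three-      */
/* argument call sites could have kept compiling through one forwarding      */
/* macro instead of being edited by hand:                                    */
#define draw_pel(x, y, color)  draw_pel4((x), (y), (color), BLEND_NONE)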

The effects of coding standards range from the mild (changing tab sizes and moving your {}s) to the severe (spending two days re-ordering arguments because you can't use a macro substitution). The grey issue is whether or not coding standards are a good thing.

Theorem Two:

Coding standards for style are a temporary necessary evil, but they should be restricted only to the alignment of source code, and not extend to naming or organizational issues.

In the future, perhaps the various institutions and groups writing software can come to a general form of commonality in the style of writing code. Until then, Theorem Two will remain.

Do you agree or disagree with these two theorems I propose to you? Can you give solid arguments to support your stance? Let's hear it!


Josh Fryman has been working as a computer professional for way too long. He spent over five years as a systems administrator in a network that was too large for one admin, and full of too many "I know what I'm doing!" users. He has also spent over seven years as a professional programmer and digital hardware designer. He's worked for research groups and large and small companies, and is presently employed as a consultant computer engineer. Josh shuns the notion of a personal WWW page, although he did create one back in 1994. He is also a fan of "vi", and actually prefers it when working with files, after finding too many instances of "emacs" not running on a crashed system. For correspondence, send Josh email to this address. These days, Josh spends his idle time far, far away from digital devices.


T-Shirts and Fame!

We're eager to find people interested in writing editorials on software-related topics. We're flexible on length, style, and topic, so long as you know what you're talking about and back up your opinions with facts. Anyone who writes an editorial gets a freshmeat t-shirt from ThinkGeek in addition to 15 minutes of fame. If you think you'd like to try your hand at it, let jeff.covey@freshmeat.net know what you'd like to write about.

Recent comments

08 May 2004 01:55 knifemakers

Good & Bad
I think they are more good than bad, though. Sometimes I love following a certain standard where all the programmers are doing the same thing. But then you get to work for some jack *** that has the dumbest standards ever.

11 Jan 2000 21:19 ajv

Read Steve McConnell's books
Steve McConnell has written some of the clearest and best-documented examples of good software engineering practice.


I do not have any higher recommendation for practicing software engineers, and I was surprised that these books were not mentioned by anyone here to date. I can recommend Code Complete, Rapid Development: Taming Wild Software Schedules, and the recently released After the Gold Rush: Creating a True Profession of Software Engineering (Best Practices).

In his books, you'll find why working insane hours is stupid and counterproductive (I used to do it, too; now I'm far more productive on 40-45 hrs/wk rather than 80-100, and I have a social life), along with good coding practices - including the relatively unimportant matters of variable, function, and class naming as well as the vastly more important software engineering process itself.


Some people hate process, but I'm afraid that's what modern medium-to-large software project creation is. You can do it the long and hard way, which is to sit in front of your PC and code. This can be intensely gratifying for those of us who like to code for coding's sake. Or you can do it much more quickly by using software engineering practices, get a more reliable product, and have fewer maintenance nightmares.


Steve's books, and good software engineering practices, help you get there quicker. If you're thinking of going up the software project management tree, these tomes will save you literally months of development. In addition, get yourself Peopleware by DeMarco and Lister. That's a gem, and you'll have staff who will literally love you when you implement its strategies for getting good flow time happening.


Click your way to your favorite online book seller and get them. They are truly excellent references.

08 Jan 2000 19:42 jdege

Regarding Hungarian notation
Hungarian Notation was invented by Charles Simonyi, and was first described in his doctoral thesis. His original idea was to name variables based on their type, possibly followed by a qualifier, if several variables of the same type were in scope. In his thesis, it is clear that by type he does _not_ mean the fundamental data types provided by a language, but rather the logical sets of quantities that are used by the programmer.

Unfortunately, most programmers have been introduced to HN through its use in Microsoft's Windows API. If the developers of that API actually read Simonyi's thesis, they certainly misunderstood it. In the Windows API, prefixes are assigned based on the fundamental data types, int, long, double, etc., which is in direct opposition to what Simonyi had suggested.

One of the examples Simonyi uses is a color graphics program that has a set of internal values that denote colors.

Simonyi would have us recognize that color was a type, and thus define a prefix for that type. In his example, he uses "co".

When comparing a local variable containing a color to the manifest value red, his code would resemble:

#define coRed 1
int co;
...
if (co == coRed) ...

In this case, the use of the "co" prefix tells us something useful, that the variable and the constant, despite being declared as ints, are actually colors, and should be treated as such. The compiler can’t catch misuse, but the pattern makes it easier for the programmer to do so. If the code assigned 42 to the variable co, the compiler wouldn't complain, but it would be likely that a reviewing programmer would notice.

This sort of type mapping makes a great deal of sense in early C. In the early days, C provided no facilities for creating user-defined types, and this sort of “mental mapping” between physical and logical types was all that was possible. Later versions of C added limited facilities to create user-defined types. A more experienced C programmer, using these limited facilities, might end up with:

typedef enum {coUnset, coRed, coBlue, coGreen} Color;
Color co;
...
if (co == coRed) ...

This doesn't provide better compile-time checking than the first, but by declaring co as of type Color you've given the maintenance programmer a better clue as to how it should be used, even if Color is simply an alias for int, and the compiler can't catch misuse. Note that the use of the type prefixes makes at least as much sense here as it did before.

Meanwhile, Microsoft's approach to HN would prefix color variables with 'i', for the underlying base type, which provides no useful information at all.

int iRed = 1;
int iColor;
...
if (iColor == iRed) ...

In C++, on the other hand, we have much better tools for creating user-defined types, and with properly organized code we can have the compiler catch the sort of problems that the type prefixes were supposed to help us with. A C++ version of the problem might look like this:

class Color
{
public:
    enum value {Unset, Red, Blue, Green};

    Color(value theValue = Unset)
    {
        myValue = theValue;
    }

    // Needed so a Color can be compared to Color::Red below;
    // the enum converts to Color through the constructor above.
    bool operator==(const Color &other) const
    {
        return myValue == other.myValue;
    }

private:
    value myValue;
};

Color theColor;
...
if (theColor == Color::Red) ...

Given this code, if we try to assign anything other than another Color object to a Color object, the compiler will complain. If we try to compare a Color object to anything other than a Color object, the compiler will complain. We don't need to encode type information in the variable names. The physical type is the logical type, and the compiler will catch any conflicts.

Even so, most OO programmers include logical type information in their variable names. This is critical for maintainability. But with OO designs producing so many user-defined types, attempting to define two-character abbreviations for them all is a hopeless task. Most C++ programmers have, instead, adopted the Smalltalk practice of naming variables as adjectiveNoun, where Noun is the Class type.

As for the fundamental datatypes provided by the language, they generally don't become a problem. The scope within which they are used is limited enough that there is almost never confusion regarding type. If, for example, you have a variable called "age", and you start getting confused as to whether it is an int or a double, you are probably also confused as to whether it represents age in years or age in seconds. If such confusion starts to arise, it's time for a new class:

Duration myAge = Date::currentDate() - myBirthDate;
cout << myAge;

To summarize, Simonyi's original idea as he originally wrote it made good sense, given languages without user-defined types. Hungarian as Microsoft decided to implement it was a perversion that ignored everything that Simonyi was trying to do.

Hungarian as practiced by Microsoft is worthless, period.

Hungarian as originally described by Simonyi made good sense in languages without user-defined types.

In languages that include support for user-defined types (which include all OO languages), HN is inappropriate. There should be, in any well-crafted program, more user-defined types than can be accommodated by any meaningful set of abbreviations.

08 Jan 2000 07:54 jburstein

Clearly an integer?
"Likewise, when I write a piece of code that says x=x+14, it seems pretty clear that x is an integer."


There is an incredible amount of ambiguity in the expression x = x + 14. Yes, x may be an integer -- but is it signed or unsigned? And by integer, do you mean char, int, long, long long, int_64, etc.?


Or, perhaps, x is a floating point number. In that case, is it a float or a double? Are there any extended fp formats to deal with (with corresponding performance implications)?


Or might x actually be a pointer? I can certainly conceive of situations in which I might want to advance a pointer into a string by 14 characters. But I'd also like to know if that string is of normal or wide chars.


Of course, x might be an instance of a C++ class that has overloaded the + operator. If this is the case, I have no idea what in the world x = x + 14 means.
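A small sketch of how differently that one statement behaves depending on what x actually is (all names invented; the C++ overloaded-operator case is left out, since it could mean anything at all):

#include <stdio.h>

int main(void)
{
    int    xi = 0;
    double xd = 0.25;
    char   message[] = "error: disk on fire (code 0x17)";
    char  *xp = message;

    xi = xi + 14;    /* plain integer arithmetic: xi is now 14            */
    xd = xd + 14;    /* floating-point arithmetic: xd is now 14.25        */
    xp = xp + 14;    /* pointer arithmetic: xp now points 14 chars along  */

    printf("%d  %g  \"%s\"\n", xi, xd, xp);
    return 0;
}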


Without sufficient context, it is impossible to attribute any semantic meaning to an isolated expression or statement. The whole point of HN is to distribute that context throughout the code as much as possible, so that a programmer spends a minimal amount of time searching for context, as opposed to using that time to actually understand the logic of a given piece of code.


The contextual hints provided by HN are especially useful when browsing through unfamiliar code, when the intention is to get an overview of what the code is doing (as opposed to deeply examining a given segment). Such browsing is quite common in any large scale project, especially when one is tasked with debugging another's code (and you don't know where the problem lies) or modifying an existing system.


And finally, as another counter to the Torvalds "changing types requires much mucking through code altering variables' names" argument: how often does one blindly change a type that is pervasive throughout a system? Altering a structure used by many components of a system carries with it drastic costs -- at the very least, the compiler errors that will occur due to (for example) changing the name of the structure member will aid in the task of examining all uses of that member to ensure that they are consistent with the new type.


Of course, elements of HN can be misused, and too strict an adherence to a set of HN rules can cause more problems than it solves (e.g., UINT vs. DWORD usage in Win32 code). But, as has been clearly pointed out, any coding standard can fall prey to this problem.

05 Jan 2000 14:30 fryman

Think more, react less
First off, I'd like to thank everyone for their comments. The editorial was meant to be somewhat controversial, but was more intended to solicit opinions on these things. Now that many people have had the chance to read and comment, I'll address some of your comments and ideas.


As many of you explained your own coding standards, you do share the same fundamental view I was trying to express -- bad variable naming or function naming = bad code. However, a lot of you argued that when I said coding standards shouldn't apply to naming, this was a bad idea, with examples of "get_c" versus "get_count" and such. This is exactly my point: bad naming = bad code. But more than the reaction of bad code, bad naming = bad programming = bad programmer. The coding standard shouldn't have to even address this issue -- the naming of variables and such should be correct and intuitive to everyone from the outset. Therefore, no, a coding standard shouldn't enforce a certain type of naming convention. We all recognize that "get_c" isn't a good idea. So why write it in the first place?


Let's talk a little more about case styles. In many cases, people argue that variables should always be "DayCount" or "day_count", one or the other. In real, complex examples, sometimes it's best to mix styles. You may receive source code from a vendor with functions such as "i2c_write()" -- in fact, you may have hundreds of functions and variables using the "x_y_z" convention. In order to distinguish your code from theirs, it's quite common to use the opposite in your own so a cursory glance will tell you if the function being called is vendor-provided or company-generated.
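A hedged example of the kind of split being described -- i2c_write comes from the paragraph above, while the company-side name is invented:

/* Vendor-supplied driver code: lower_case_with_underscores. */
int i2c_write(unsigned char device_addr, const unsigned char *buf, int len);

/* Company-written application code: MixedCase, so one glance at a call    */
/* site says which side of the fence the function lives on.                */
int SendTemperatureReading(int sensor_id, double celsius);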


I agree that across a project there should be agreement on some elements of style (indentation, whether the company uses DayCount or day_count, etc.), but all too often the "Standards" go way too far. A standard is only put in place to address a problem; in this case, it's a work-around for the fundamental issue that very few programmers are capable of being consistent with themselves, let alone with others, in naming and style.


How many of you use multiple styles depending on what you're coding and the nature of the system? I maintain about 4 different styles that I use on a regular basis. It takes a lot of effort not to mix them up. But then, I'm working on large projects with complex interactions and multi-threading issues, building a complex application around a variety of vendor-provided source routines for hardware access. It's a different world, but the problems are common.


As for Hungarian Notation, the examples I used were from the original work by Simonyi, as you can find at various places on the web or in publications. I won't profess to like it or be particularly expert in it, but as everyone points to the original work as the fount of HN, my examples should be HN. If there have been later distinctions between HN and type-based naming, hopefully an effort will be spent to distinguish between the two. Can anyone provide complete simple and complex examples of each?


What I'm getting at:


HN, other naming conventions, and coding styles exist and probably will for quite some time. But the fundamental REASON they exist is because we, programmers, are incoherent. We are lazy, too, and that makes things worse. Until we are ready to agree to a certain amount of responsibility for ourselves, these types of nonsense "patches" will be forced upon us.


And for those who complain that the big drawback of HN is that a variable that was once "puchData" and is now really "piData" -- yet the code wasn't changed to convert all puchData to piData -- I rest my case. The person doing the change was lazy, and didn't take the responsibility they should have. While this continues, we're all going to be subject to idiot managers (pointy-hair types) telling us how to do things.
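Sketched with invented declarations, the hazard being complained about is simply a prefix that has outlived the type it describes:

/* Before the change: a pointer to unsigned char, so "puch" was accurate.   */
/*     unsigned char *puchData;                                             */

/* After the change: the type is now int *, but the rename never happened,  */
/* so the prefix tells every future reader the wrong type.                  */
int *puchData;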


The issue raised about whether companyX's proprietary coding style could be copyright/patented -- this is a very valid question. And a very scary one. It's not impossible -- hell, they can patent/copyright that "Ctrl-C" is "Copy", so why not programming style? Or a language itself? An interesting area to think about...


And while it's common to find buggy implementations of various operations on different platforms, sprintf() will never change from one system to another. Putting a wrapper on it buys you nothing. Its parameters, format, and behavior are defined by ANSI and ISO. You may change compilers, but you won't change the call. If you replace sprintf() with your own version for some reason, you shouldn't call it sprintf() in the first place -- that will only lead to unnecessary confusion.


I agree that contractors and even corporate programmers must live in their environments. I am constantly working under some company's format, and it's irksome to find that no two companies like things the same way. Mostly, I want everyone to stop and think --


Why did these things get developed in the first place?

If they're a "good" thing, then why the hell aren't we doing this on our own?

Why do we let non-technical people make up our coding standards?

What are the REAL issues behind why these got started?

How can we address these REAL issues?

Think about it.


One final comment, for "Mauro":

My knowledge of mathematics is rather good, IMHO. A theorem, if you'll recall, is an item that has yet to be disproven. You have confused a Theorem with an Axiom. An Axiom is an item which has been proven and can not be disproven -- ever. A theorem is something which has been postulated, but has yet to be proven absolutely. The only restriction on a theorem is that it must have all assumptions explicitly listed. At the time of writing, my inflammatory statements had yet to be proven false (and still haven't been, I might add), and therefore qualified as "Theorems"... Granted, they are opinions, and it would be ungainly to run around all day preceding every comment you make with "My new Theorem is ...", but it would be technically correct. The funny thing is, everything is an opinion. 2+2=4 is an opinion; it assumes a lot of information about the underlying set, and isn't always true. It's just an opinion that it usually is.
