I've worked in the industry for several years now, and I want to ask you all something: can these coding standards being used rampantly throughout programming society actually be useful? Or are they just some interesting notion that got so far out of hand that no one can stop it?
Hmm, that's a tough question. Let's look at some of the background, and what I mean by "coding standard" since we all know that's an oxymoron. First, we'll take a quick look at a generic form of variable and function naming called Hungarian notation, and then we'll look at some sample corporate coding standards and why they were created. Then we'll look at the implications of these.
The name "Hungarian Notation" is a joke. What does that tell us right away?
A fellow by the name of Charles Simonyi (a Hungarian, not
surprisingly) developed it. He joined Microsoft back in 1981, and
took this notation system from his work in the 1970s at Berkeley and
Xerox Palo Alto to the evil empire, who subsequently spread it by
force to the Microsoft planet. When he started using it, the other
programmers thought names like "khprguchOkIcon" (yes,
that's a valid form - a constant huge pointer to an array of unsigned
characters named OkIcon) were nonsensical, as though it were Greek --
or in this case, Hungarian. Somehow this got tied to calculator form
RPN, and the result was "Hungarian Notation."
The idea is that a common prefix and a type tag are placed before a variable name so you know what data type the variable is carrying. This was developed for weakly typed languages (such as C) to help programmers know what data type they were working with, to prevent miscast or bad assignment operations. Many subtle bugs come from these situations, and it seems like a good idea on the surface. For languages which are strongly or moderately typed (like Pascal), this is unnecessary, as the compiler will not let you assign an integer to a character without the proper cast operation.
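As a hedged illustration of the kind of silent conversion described above, here is a minimal C sketch. The function name store_in_byte and both variable names are hypothetical, chosen only to show Hungarian-style tags next to an assignment that C accepts without a cast.

```c
#include <limits.h>

/* Hypothetical helper: stores a count in a single byte.
   C converts int to unsigned char implicitly, so no cast is
   needed and the compiler is not required to reject it. */
static unsigned char store_in_byte(int wCount)
{
    /* The value is reduced modulo UCHAR_MAX + 1. */
    unsigned char uchCount = wCount;
    return uchCount;
}
```

The prefixes make the mismatch visible to a reader (a "w" value going into a "uch" variable), which is the benefit the notation was aiming for. Many compilers can also be asked to warn about this kind of narrowing.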
While some simple examples might be "bOk" (Boolean-type
variable Ok), or "wCount" (word-type variable Count), the
complicated example above shows just how quickly those prefix letters
can get out of hand. Suppose you have an icon image which is placed
as constant to move it into the code space, freeing precious RAM in an
embedded product. How bulky would it be for you to type
"khprguchOkIcon" as opposed to "OkIcon", and
just how useful are those prefix characters? Which one seems more
readable to you, if you have to perform code maintenance?
DrawIcon( &OkIcon );
DrawIcon( khprguchOkIcon );
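To make the comparison a little more concrete, here is a small C sketch of the same constant data reached through a plain name and through a Hungarian-style name. Everything in it is an assumption for illustration: the icon bytes, the shortened prefix "kpuch" (standard C has no huge pointers, so the "h" is dropped along with "rg"), and the helper icon_checksum.

```c
#include <stddef.h>

/* Hypothetical icon data; the bytes are illustrative only. */
static const unsigned char OkIcon[4] = { 0x3C, 0x42, 0x42, 0x3C };

/* The same array through a Hungarian-style name:
   k = constant, p = pointer, uch = unsigned char. */
static const unsigned char *const kpuchOkIcon = OkIcon;

/* Sums the bytes of an icon. The parameter type already tells
   both the compiler and the reader what is expected. */
static unsigned icon_checksum(const unsigned char *icon, size_t len)
{
    unsigned sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += icon[i];
    return sum;
}
```

Both icon_checksum(OkIcon, sizeof OkIcon) and icon_checksum(kpuchOkIcon, 4) refer to the same bytes; the prefix adds nothing the declaration does not already say.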
It's amusing to note that because this erupted in the Microsoft empire, anyone who writes Microsoft-related code is likely to use this notational system; moreover, they're likely to use it in other places as well from habit. Most UN*X programmers avoid this notational system like the plague they perceive it to be. But with the rapid growth of programmers moving to Linux, it's inevitable that this notational system will be appearing in code near you. The question is, will it stick around, or be kicked out? Or is there perhaps a happy medium that will be reached somewhere between not having it and having it with messy variable names?
Coding standards are template documents that anyone working on a program of more than a few thousand lines is probably familiar with. As the code becomes larger and more complex, it tends to gather more authors. As each programmer invariably has his own programming style, some effort is spent to keep everyone coding in the same format. Do you place your {}s on the same line as the if/else, or do you use them for white-space offsets by placing them on lines by themselves? Does the person next to you code the same way? Does everyone in your group of 100 engineers code the exact same way?
No, of course not. Everyone is a little bit different. You might name
a variable "day_count", they might name a variable
"DayCount", someone else might name it
"count_of_all_days_in_one_work_week" (someone from a
COBOL background, perhaps). The point of the coding standard is to
prevent these issues, so the code is a little more uniform, a little
easier for everyone to read, and simpler for other people to analyze,
debug, and maintain in general. Coding standards typically address
not only style (where you place your {}s), but also naming (Hungarian
notation).
Of course, no one really likes writing in a format which isn't his own, but for the greater picture, people are willing to tolerate these things. Imagine a bunch of COBOL and FORTRAN programmers who learn C and decide to work on the Linux kernel, coding in their old indentation formats but using C. How readable and maintainable do you think that would be? Are you shuddering yet?
So what exactly is the point? Like many things which seem like a good idea on the surface, are these actually good ideas in practice? It seems a subjective issue, one that is defined like a moral issue -- there is right, there is wrong, but there are many shades of grey.
An important first question: when do these standards work? If you're
maintaining someone else's code, or even your own several years later,
the variable names will be unfamiliar. Worse, if the code was poorly
written, with functions that run hundreds of lines long, you may not
be able to readily recognize if "data" is integer,
character, pointer, structure, or some other beast. In this case, a
"puchData" might rapidly tell you that this
"data" is of the type "pointer to an unsigned character"
rather than anything else.
But let's take a closer look at this argument. If the function you're
looking at is hundreds of lines long, should you be trying to maintain
it? Or was it poorly written in the beginning? Wouldn't it be safer
to study the function and break it into smaller functions? Worse,
with a poorly-chosen variable name like "data" as opposed
to "input_data", we're left wondering what data
"data" is holding. By choosing better variable names
and writing better code, we would avoid the problem.
The counter argument is global variables, which may be defined in a separate file. How are you supposed to know what types they are? Well, as with local variables, a good name will solve the problem.
Good programming practices can avoid the situations this notational system was designed to work around. Rather than work around the problem, shouldn't we fix the problem itself?
Well, what about coding standards? Coding standards appear to be a necessary evil in today's world. There are too many schools of thought on where to place your {}s, how far to indent the next block, whether tab characters or spaces should be used, how many spaces make up a tab character, and so forth. Is there any hope of a good solution?
Some languages, like FORTRAN and COBOL (really, I'm not picking on them; they're fine for what they do) mandate certain spacing constraints. This is one solution. Another solution is to standardize among all the teaching institutions a certain generic style, and mandate its use at all times.
But can they go too far?
Some examples from industry will answer this question. The members of
a large company (which I shall leave nameless) decided they wanted to
make truly portable software. They were tired of the headaches of
porting between the SCO, Solaris, and NT PC platforms. To avoid this,
they created a coding standard that said you could not use any
standard function directly, but had to use their
"wrapper" function, which would be changed depending on
what system you were compiling on. On the surface, this sounds like a
good idea. After all, the socket programming function
"connect()" takes different arguments, depending on which
platform you're using. So is this superficially good idea really good
in practice?
They decided that to be safe, they'd better wrap everything, and I
mean everything. You couldn't call "malloc" or
"free" or "sprintf" - you had to call
"companyX_malloc" or "companyX_free".
Correct me if I'm wrong, but the ANSI and ISO C specifications include
the existence of malloc, free, sprintf, and a whole battery of other
functions - and define them absolutely! If your compiler wants to be
ANSI or ISO compliant, it has to take exactly the right arguments in
exactly the right order! So what did using
"companyX_malloc" do? It annoyed the hell out of a lot of
engineers.
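For readers who have not met this style, a minimal sketch of what such wrappers might have looked like follows. Only the names companyX_malloc and companyX_free come from the standard described above; the bodies are an assumption, since the real ones were never shown.

```c
#include <stdlib.h>

/* Hypothetical body for the mandated wrapper. On a conforming
   compiler it adds nothing but a level of indirection. */
void *companyX_malloc(size_t size)
{
    return malloc(size);
}

/* Hypothetical counterpart; simply forwards to free(). */
void companyX_free(void *ptr)
{
    free(ptr);
}
```

A call site then reads companyX_malloc(16) where malloc(16) would have done exactly the same thing. A wrapper of this kind only pays for itself where the platforms genuinely differ, as with the socket calls mentioned earlier.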
Is this an extreme example? No, not from my experiences as a consultant engineer. Another example from another company is a coding standard that said that the variables to a function must be arranged in alphabetical order, based on type. At first glance, this seems like a logical notion. It allows for more efficient function calling, stack alignments, and cleaner prototypes. For example, which looks cleaner?
void my_func( int, char, bool, char *, bool, bool, int *, void * );
void my_func( bool, bool, bool, char, char *, int, int *, void * );
Arguably, the alphabetical ordering makes the function name stand out
and is less confusing when searching for a function prototype. The
unordered types can be somewhat distracting to a superficial glance.
But is this good? In this case, a commonly used function was
"draw_pel" which took a series of arguments. However, a
change was required which resulted in another argument being required
which, instead of being appended to the end, was stuck in the middle
of the list.
In hundreds of call sites, "draw_pel" had to be changed,
and instead of doing it with a simple macro, it took hours of time to
sit there and carefully move through the code, pushing variables
around in the call list. Another argument against such an idea is
that there can be a logical grouping to variables (such as "input,
output") and this alphabetical ordering may ruin the logical grouping.
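One reading of the "simple macro" alternative is sketched below in C. The signatures are hypothetical, since the real draw_pel arguments are not given here: the new argument is appended to a renamed function, and a macro keeps the old three-argument calls compiling unchanged.

```c
/* Hypothetical state so the effect of a call can be observed. */
static int last_x, last_y, last_color, last_mode;

/* New version of the function, with the extra argument appended
   at the end rather than inserted in alphabetical position. */
void draw_pel_mode(int x, int y, int color, int mode)
{
    last_x = x;
    last_y = y;
    last_color = color;
    last_mode = mode;
}

/* Assumed default for callers that predate the new argument. */
#define DRAW_PEL_DEFAULT_MODE 0

/* Existing call sites keep their old three-argument form. */
#define draw_pel(x, y, color) \
    draw_pel_mode((x), (y), (color), DRAW_PEL_DEFAULT_MODE)
```

With this in place, draw_pel(1, 2, 3) expands to draw_pel_mode(1, 2, 3, 0), and only the callers that need the new behaviour have to be touched.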
The result of coding standards can range from mild (such as changing tab sizes and moving your {}s) to severe (such as spending two days re-ordering variables since you can't use a macro substitution). The grey issue is whether or not coding standards are a good thing.
In the future, perhaps the various institutions and groups writing software can come to a general form of commonality in the style of writing code. Until then, Theorem Two will remain.
Do you agree or disagree with these two theorems I propose to you? Can you give solid arguments to support your stance? Let's hear it!
Josh Fryman has been working as a computer professional for way too long. He spent over five years as a systems administrator in a network that was too large for one admin, and full of too many "I know what I'm doing!" users. He has also spent over seven years as a professional programmer and digital hardware designer. He's worked for research groups and large and small companies, and is presently employed as a consultant computer engineer. Josh shuns the notion of a personal WWW page, although he did create one back in 1994. He is also a fan of "vi", and actually prefers it when working with files, after finding too many instances of "emacs" not running on a crashed system. For correspondence, send Josh email to this address. These days, Josh spends his idle time far, far away from digital devices.
My coding standards
This is largely ripped off from the Mac framework PowerPlant, and I used to follow it...
Variable prefixes:
int i
bool b
char[] s
Prepend an m if the variable is in a class...
variable names are case-delimited, i.e. dayCount or mDayCount (the first word is lower case, the rest start with capital letters). Classes (if applicable) have the first letter of every word capitalized, i.e. GtkWidget.
Variable names should actually mean something and every declaration should have a comment that says what purpose the variable has, unless it's very obvious (and it never is).
I've seen a few variations on this scheme, for example AbiWord seems to follow it pretty closely IIRC except they prepend m_ instead of m, i.e. it looks like m_iDayCount.
I haven't been following these recently but I plan to when I return to C++. Recently I've been naming my structures 'thing' and using underscores in variable names (in defense of this method, C really should be completely lower case, and my proposed method makes annoying cases necessary).
But really, does it make sense to include the type of a variable in its name? Isn't that what variable types are for? Last time I checked, most C compilers would at least warn you if you tried to set a variable with an incompatible data type. I think it's more important for people to comment their code well than to do silly things like naming variables in a consistent way... if they tell us, it doesn't matter how easy it is to guess from the name.
My two cents (Canadian)...
A number of different ideas get lumped into "Coding Standards", including:
- naming convention for identifiers
- style (i.e., where braces go, indentation)
- commenting code, comment headers, etc
- portable code (e.g., abstracting lower level functions)
- coding practices (e.g., usage of "goto", while(1) vs for(;;))
These all have some merit when you consider some of the goals behind coding standards:
1) consistency
Having been thrown into projects in the midst of implementation, I find that consistency makes it easier to understand the code, and shortens the learning curve. This especially comes in handy when documentation is scarce. Having a published/well-known coding standard in place also makes for better code reviews, i.e., more time is spent on analyzing the logic than nitpicking someone's personal coding style.
I'm also a strong proponent of standardized comment blocks so that "Autodoc"-like tools can be used to generate external documentation (e.g., man pages).
2) maintainability
If you've ever maintained code (especially code which is shared by a multitude of projects), you know that making incremental changes, to some extent, encourages programmers to 1) follow the dominant coding convention that already exists in a source file, and 2) be very resistant to re-writing the code from scratch, i.e., minimizing "diffs".
This has the negative side effect that bad coding style begets more code of the same [inferior] quality.
3) metrics
Yes, evil evil evil. But companies continue to look for ways to measure productivity (LOC, defects per LOC, etc). And in some cases, managers-turned-spin-doctors abuse metrics as a way to explain to upper management how the latest version can have more bugs than the previous release.
Conclusion
What it really boils down to is this: a good coding standard promotes maintainable code. A bad coding standard doesn't. And only good code can come with experience.
*** Obligatory Bad Joke ***
Question:
What do you call this?
typedef char * STRLP;
STRLP customIdentifierStrlp;
Answer:
Reverse Hungarian Notation
Naming is the most important part
I strongly disagree with theorem 2.
Style is not very important; it is nice to have a single style in all files, but you can also live with a situation where style varies from file to file (or even between functions / methods).
But naming is extremely important. This does not mean that I propose Hungarian, but you should use some conventions for the names of all symbols. For example, in Java the usual style is to use capital letters to separate words, like in doTheThing(). IMHO this is extremely important when you talk about code (and it also makes remembering a function name a lot easier). If you don't do this, a method doTheThing() could be called dothething(), do_the_thing() or whatever ill minds come up with. Then you cannot simply say to a colleague "use the do-the-thing function", but you must say "use the do-the-thing function with uppercase first letters and underscores between the words". You can also make it easier to remember the name of a function if you use some simple rules as a naming scheme. In Java, it is usual practice to use get/set pairs for methods that access properties of a class. For example, if the class has a color property, you create two methods setColor() and getColor(). This is much better than every programmer inventing his own scheme. Usually it is a good idea to find some simple rules for method naming, for example verb + subject.
Personal experience with coding standards
I've been a committer with the FreeBSD project since this summer, and I've definitely learned a lot. One of the first things I learned was that style(9) is _very_ important, even if it takes time to get things to where they should be -- KNF (Kernel Normal Form reference (www.FreeBSD.org/cgi/ma...)).
Styling guidelines are really a wonderful thing. Sure, you may have a personal style to unlearn, but you save even more time after using the standard style than it took to learn it. People make mistakes with the style sometimes, but even with style bugs, things are much more readable than could be remotely possible without a standardized style.
If I'm going through the kernel, reading code and vgrepping for what I need, it takes much more effort to adjust to different styles than it does if everything is one comfortable style, and it all fits together. The biggest visual criteria covered by the style are indentation and comments. There are rules for whitespace, and rules for how comments work. In addition, tabstops are defined to look like 8 spaces, and lines are to be broken at 80 columns. Some of the more unwritten rules are ones of naming. Typically, something can be all capitals, all lowercase, and may have underscores or numbers with either one. Functions are usually all lowercase, as are variables, but exceptions that don't affect the namespace (i.e. giving your variables names you like better is usually just fine, but changing your function to something like FooBar from foo_bar is not) are okay.
Perhaps I'm a little biased by actually having to work with them, but I think that by using a good (well-defined) coding style, everyone's time can be saved, and software tends to fit together much more nicely than without a style.
Re: Coding standards and personal choice.
David B. Harrs wrote:
So standards are forced upon people so others understand better, but those others best understand their own code, which was formatted for them.
I claim that after using a coding standard for a reasonable amount of time (one or two months), you would become used to it, and it would become your own. I used to write code with the braces on the same line as the if, and non-indented. Then I went to a company where the standard was braces on a new line, indented. It was annoying for a bit, but the editor supported it, and after about a month it wasn't a problem anymore. Now I'm at a company where the standard is braces on a new line, non-indented, and the switch to that wasn't even noticeable. I don't think I'm super-flexible, or anything, but if you're writing code for eight hours a day, you'll get used to a style pretty quickly.
a) frustrate the engineer/programmer, who is an artist, and shouldn't be told how to hold his/her paintbrush
I didn't have a FreshMeat login, but this comment inspired me to get one just so that I could reply. I think that one of the main problems in the software industry is that programmers act like artists instead of engineers. If the value of being told how to hold your paintbrush is that someone else can read your programs for maintenance or debugging purposes, then that's a trade-off that everyone in this industry should be prepared, even happy, to make.
At the company I'm at now, we employ co-ops for short-term labour. The chances that they'll be around when I need to make a change to "their" code is very low, and so I am thankful that they are using the standard coding style, since it removes a large barrier to my understanding of their code.
Later,
Blake.
Coding Style
It's my personal (and probably a bit radical) belief that issues of
coding style should be moot given the right kind of coding
environment. That this is not the case seems to reflect more the lack
of the right tools than anything else.
Layout issues are particularly annoying since a tool such as indent
can reformat code to match personal preferences without a lot of
difficulty. Naming issues are a bit more thorny but we can imagine a
code editing tool that encourages commenting identifiers when they are
declared and that then facilitates looking up the identifiers easily
(I can imagine many ways to do this - including a mode in which a
small window displays the comment when the cursor is over the
variable).
I worked in an environment (mostly java coding) not too long ago in
which layout issues were a major bone of contention. One programmer
was insistent that everyone use exactly his layout and Hungarian
notation - to the point where he would tell people to make their
editing windows exactly the size that he used and would become loud
and very insistent when others did not follow his notions. The odd
thing about this was that he otherwise followed very poor (IMHO)
coding practices - and these coding practices were far more
structural - having to do with the way he built objects and methods.
I found his code hard to follow and extend - not because of his layout
(which was fairly vanilla) but because he seemed to have no idea of
how to reasonably factor his programs to make them readable and
extensible.
Style issues based on cosmetics are not trivial by any means - but
they are much less important than those based on structure - and they
would likely disappear given the right kind of programming
environment.
A Look At Your Theorems
Theorem One:
"Hungarian Notation exists to make up for bad coding practices"
Whilst it's easy to ridicule the names used in Hungarian Notation, I note that your article actually fails to include any form of proof that the basic idea itself is bad.
In a general sense, it's just a naming convention. The right naming convention is very useful for making code significantly easier to read.
For example, take libraries (such as GDBM) that prefix their function names with 'gdbm_'. That's just a naming convention, but it's an absolutely essential tool for carving up the cluttered namespace effectively and safely.
For example, take languages (such as PHP) where variables are not declared. Using prefixes on variable names (l_ for locals, a_ for functions, gl_ for globals, c_ for data members, and so on) makes such code a lot easier to read.
Personally, if I was ever to meet Linus, I'd have to thank him for coming up with the best reason ever for schemes such as Hungarian Notation. His famous objection to the use of Hungarian Notation is that, when you change the type of a variable, you have to go through and edit all the source code where the variable is used. That's actually a superb reason FOR the use of Hungarian Notation. If you change the type of a variable, should you not actually check your code to ensure that the new type isn't going to break anything? And if you don't check, isn't that unprofessional behaviour, because you're now guessing something that it is perfectly possible to check and be correct about?
--
In your "discussion" of Theorem 1, you mention the idea of wrapping even supposedly standard functions to assist portability.
As someone who maintains highly portable UNIX software, I can personally testify that this approach is sometimes necessary (and that's before you look at porting to non-UNIX). Even on UNIX, many of the "standard" functions are buggy from one vendor (or even one release) to the next, and such wrappers are an excellent way to deal with this. In an ideal world, they wouldn't be needed - but this is RealLife(TM), and practical measures are what it's all about.
I'd also question the professionalism of the "annoyed" engineers. IT staff are (in the main) highly-paid professionals. They're paid to do a job, not be prima-donnas. Unfortunately, management exists to make stupid suggestions - after all, some manager at some time must have promoted the current lot to their over-elevated position - and the IT *professional* is paid to be just that - professional. To do a job.
Many of us would like to be artists, but we're not. Most IT staff aren't even any good at their job. (Not a popular statement, but unfortunately very true.) The vast majority of programmers I've met and worked with in my career are certainly not qualified or capable of determining HOW to write a piece of software. If they were, software would be delivered on time and on budget. The OpenSource world wouldn't be littered with the remains of v0.1 releases of software that never gets completed.
Theorem Two:
"Coding standards for style are a temporary necessary evil, but they should be restricted only to the alignment of source code, and not extend to naming or organizational issues."
Temporary? Why are they temporary? What will replace them? Better educated programmers? And where are these programmers going to come from? The quality of IT teaching is currently decreasing, not increasing, and many posts these days do not require any formal IT qualification. There is a shortage of programmers (I refuse to use the term engineer for most of these people), and that only lowers the entry standards.
I cannot see any "institution or group writing software" putting forth a "general form of commonality in the style of writing code". To achieve agreement, such a style would have to be so general as to be ineffective.
Styles are designed not just to reflect best practice, but also to reflect the needs of the organisation. For example, go to a sweat-shop software firm that specialises in one-off bespoke projects, and the standards there will be high-quality and detailed, perhaps because the staff turnover runs so high. (Okay, no perhaps about it ;-) Go to a famous name computer company, especially one of the older ones, and the standards there can be low-quality, partly because of complacency, and partly because their work is research-driven (and getting researchers to work to standards is an exercise in futility at times.)
Restricted? Why? I don't see an argument supporting your theorem.
Coding standards exist to ensure that everyone in a team is producing code along the same lines. Strong, high quality standards are part of the framework (that includes proper designs and specifications) that allow managers to easily move staff around teams within a project, and even replace staff as and when necessary.
Good standards also have practical purposes, such as ensuring common coding mistakes are clearly documented (and therefore avoided), ensuring that developers know how to create a new module (e.g. which directory to put a library in, what the header files should be), and documenting the standards (as in quality of work) that your developers are expected to aim towards.
They also have the excellent role of being part of the framework that ensures that your senior developers cannot hold you to ransom. How many companies have you worked in where some of the senior staff are senior not because they're actually any good, but because they have some piece of knowledge which everyone else has to rely on?
They take the guesswork out of the process. Why have to guess how things should be, or even make things up on the spot, when you can write down what they should be? This approach is part of the work necessary to ensuring that you produce high-quality software through judgement rather than through luck.
In Summary:
Accept the principles, for they are with merit. Do all you can to defeat silly details. Be open to the benefits of any idea before you dismiss it. Try it out. Most ideas can only be appreciated once you've used them in anger, and at scale.
Guesswork is such a major cause of the low-quality work that most computer programmers output these days. Eliminate the guesswork, and the quality goes up.
Coding Standards aren't needed if...
I don't push code formats so much in my own place of employment. Call all your local vars 'i' for all I care. However, each new scope needs to be documented. Whatever happened to good, old-fashioned sub / function block comments, detailing all the vars passed in, any that are changed, local usage, etc etc etc? If that's in place, you can easily maintain even the most obnoxious of code with the most bizarre var names.
I give anyone who dares utter the phrase "oh, it's self documenting code" (often claimed by such Hungarian Notational types) a good stern frowning and they typically back off and start commenting.
I write all my code as if it were something I wouldn't see again for six months - meaning that I over-comment, so that I know my rationale for weird-looking decisions and don't break anything when going back through. Doesn't everyone write like this? With maintenance in mind?!?
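As a rough illustration of the sub / function block comment described above, here is a minimal C sketch. The function clamp_count and the header layout are hypothetical; any consistent layout that lists the arguments, what is changed, and the locals would serve.

```c
/*
 * clamp_count - limit a counter to a maximum value
 *
 * Arguments:
 *   count  (in)  current value; not modified
 *   max    (in)  largest value allowed
 *
 * Locals:
 *   i      working copy of count
 *
 * Returns: count if it is <= max, otherwise max.
 */
int clamp_count(int count, int max)
{
    int i = count;

    if (i > max)
        i = max;
    return i;
}
```

Even with a local called "i", the header says enough to maintain the function without guessing.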
Re: Re: Coding standards and personal choice.
a) frustrate the engineer/programmer, who is an artist, and shouldn't be told how to hold his/her paintbrush
I didn't have a FreshMeat login, but this comment inspired me to get one just so that I could reply. I think that one of the main problems in the software industry is that programmers act like artists instead of engineers. If the value of being told how to hold your paintbrush is that someone else can read your programs for maintenance or debugging purposes, then that's a trade-off that everyone in this industry should be prepared, even happy, to make.
At the company I'm at now, we employ co-ops for short-term labour. The chances that they'll be around when I need to make a change to "their" code is very low, and so I am thankful that they are using the standard coding style, since it removes a large barrier to my understanding of their code.
I got myself a freshmeat account to reply to this as well. I agree with what Blake said, but wanted to add my own bit. I used to be a consultant developer and programmed for a living. I changed that this summer when I finally got fed up with the effects various business practices had on my ability to create what I felt was a good product. But while I was there, I thought through many of the philosophies on craft vs. work that I write about here.
Businesses are entities which don't always mix with allowing programmers to create art. They have other priorities, like readability, scalability, maintenance, conformity to standards. This allows them to substitute coders for other coders--move people around to get the best fit. These priorities in many cases prevent absolute coding freedom. In addition, one is usually part of a team, and you have to have some team standards or the team is not productive. For example, what happens if I name all my variables in phonetic variations of Japanese? My co-workers would have beaten me with wet noodles because the standard was to use short words found in English.
I think if you want to code for a living you really have to follow standards set by the team you're working with. You're not an individual creating an artistic masterpiece, but a member of a team engineering a solution to a series of problems. In order to be a functional member of that team you have to follow standards.
I do a lot of what I call hobby-coding. My hobby projects encompass solutions for the problems I require solutions for, as well as being artistically beautiful solutions to the problems. I get to maintain my craft and my confidence as a craftsman. I have the freedom of trying new approaches, toolkits, platforms, and such. I use my own designs. I have these freedoms because I'm an individual building solutions for my problems. There aren't many other people involved.
So, I think you need to approach this like any other craftsman would. On the job you have a responsibility to yourself and your team to produce good code that follows the team's standards. But when you're not doing job-work, you can play around a bit and have the freedom to practice your craft. This then satisfies your need to create beautiful things and feel positive as well as satisfying the company's (and client's) need for a working solution that meets their criteria in a team-focused environment.
I think if you don't separate yourself this way, you will get frustrated and you'll be a huge detriment to the team. Life's too short for that. If you can't separate yourself, maybe you should code as a hobby and not as a line of work. Craft and work don't always have the privilege of going hand in hand.
/will
Evils and Goodness
In my humble opinion, using any notation where you explicitly state the type of an entity in its name is poor practice.
To my mind, the reason for this can be best explained by why this isn't necessary in normal conversation. If I say, "I fed Spot some dog food this afternoon", anyone who reads that comment can tell me that Spot is a dog.
Likewise, when I write a piece of code that says x=x+14, it seems pretty clear that x is an integer. Of course, it can get more complicated than this, but in most cases the type of an entity can be determined by the context it is used in. As further evidence, some languages actually employ this method for typing, the ones I know of being Caml (and its variants) and Icon. It's called type inference by the Objective Caml manual. In these languages, if you can't tell what type a variable is from its context, then its type doesn't matter in that context so you don't need to know.
That said, you still need to give your variables and functions meaningful names, and preferably ones which allow them to be read like an English sentence. One convention which I do support is:
Mixed case naming likeThis.
Lower case first letter nouns for variables. eg. processJournal, transactionLog, etc.
Upper case first letter nouns for functions. eg. NextElement, RoundedFloat, FirstLetter.
Upper case first letter verbs for procedures. eg. ProcessThis.
Why these? Well, using nouns and verbs in that way makes the code easier to read like an English statement, and the mixed case naming is just my preference, but I believe it's important to have a standard there if you are working with other people.
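To make the convention above a bit more concrete, here is a minimal C++ sketch. The names (Journal, transactionLog, NextElement, ProcessJournal) are made up for illustration, and since C++ has no separate "procedure" keyword, a void function stands in for one. It assumes a C++11 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Data: lower-case-first nouns for variables and members.
struct Journal {
    std::vector<int> transactionLog;  // hypothetical list of amounts
    std::size_t cursor = 0;           // index of the next unread entry
};

// Function (returns a value): upper-case-first noun.
int NextElement(Journal& journal) {
    assert(journal.cursor < journal.transactionLog.size());
    return journal.transactionLog[journal.cursor++];
}

// Procedure (does work, returns nothing): upper-case-first verb.
void ProcessJournal(Journal& journal, int& runningTotal) {
    while (journal.cursor < journal.transactionLog.size()) {
        runningTotal += NextElement(journal);
    }
}
```

Read aloud, the loop body comes out as "running total plus next element of journal", which is roughly the English-sentence effect the convention is after.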
Also, indenting standards really don't matter these days as far as I'm concerned. If you don't like someone else's indenting, run it through your favourite indenting program. Even vim does this for you.
--B!nej's Humble Opinion v0.98
Coding convention
Don't forget that not everyone reads or speaks English fluently, even though it is the language mostly used in coding, so the look of the words and conventions is very important...
Keyboards aren't always the same. I change keyboard types very often during the day: Mac, Unix, PC, QWERTY, AZERTY...
Names containing '_' or any other non-alphabetic characters should be avoided; people who change keyboards often lose time and concentration typing and reading them...
Personally, I like this sort of convention:
1) for global constant and var in the program
gTheCurrentWindow
gTheFocusText
kColorBlue
kMaxRecords
---
aVar
aGlobal
aTempVar
inside function
myIndex
myCount
for exported functions, from a library named 'Hello'
HelloInit ()
HelloClose () ..
HelloNewObject ()
for exported functions from a file named "set"
setCreate
setAdd()
setDispose()
i hate those__sort_of_things !
never_know_where_it_is_declared :)
:)))
Various comments
"Hungarian Notation exists to make up for bad coding practices." -- I rather disagree; your argument rarely touched HN, but instead focused on its retarded and redheaded step-brother, type-based naming conventions. HN isn't what you used. HN still looks like an abomination, but isn't nearly as abhorrent as the strain you used. See Hungarian Notation (cupid.suninternet.com/...) and Naming Variables and Functions (cupid.suninternet.com/...) for more discussion and explanation of this. However, note that this kind of "HN" is what's most commonly used when discussion of HN and examples of it show up.
"Coding standards for style are a temporary necessary evil, but they should be restricted only to the alignment of source code, and not extend to naming or organizational issues." -- I fully disagree with this one. Standards are standards, after all. They indicate a mindset that's supposed to be common across a project. Some (many?) standards are idiotic, no doubt. (Alphabetizing parameter TYPES? What drugs were they on?) However, well-written standards can be used to lessen the integration curve, ease testing (as there are fewer things to look for), help programmers use proper error handling ("We check every malloc(); why aren't you doing that here?" -- as opposed to "We willingly ABEND whenever malloc() fails," which is something I've actually run across), help code reviews go faster ("What's this weird macro? Oh, it's just a wrapper for setjmp()/longjmp()...") and other various minor boons. Few standards are well-written, which is sad, but hey. That's not an indictment of standards; that's an indictment of the stupid companies that write some of them.
Coding Style and Standards
Theorem One:
You missed the point. For Hungarian notation, the biggest problem is that the type is encoded in the name of the variable, argument, member, whatever. This makes it difficult, if not a complete cluster f**k, to change a type without going through potentially massive volumes of code to reflect the change throughout the system. Most times what happens is that the developer will leave the incorrect encoding in the name and expect other developers on the project to "know" what it is.
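A small C++ sketch of how that goes wrong in practice. The function name SumData is hypothetical; the stale prefix "puch" (pointer to unsigned char) is borrowed from the puchData example mentioned elsewhere in this thread. The point is only that the compiler never reads the prefix.

```cpp
#include <cassert>

// Originally the parameter was: const unsigned char *puchData.
// The element type was later widened to int, but nobody renamed it,
// so the "uch" in the name is now simply wrong. This still compiles cleanly.
int SumData(const int *puchData, int count) {
    assert(count >= 0);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += puchData[i];  // reads ints, whatever the prefix claims
    }
    return total;
}
```

A caller passing values like 300 and 400 gets 700 back, which could never have fit in the unsigned chars that the name still advertises.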
Theorem Two:
Again, I disagree. The alignment is for readability. The naming conventions should follow the domain of the problem space. This is more OO in nature than, say, assembler, but it still gives more to the reader who is looking to come up to speed on a given project.
If the problem domain was, let's say, a shopping cart metaphor, it would be easier to read:
class Cart
{
public:
    Count getTotalItemCount( void ) const;
    virtual void addItem( const Item & );
    virtual void removeItem( const Item & );
};
than
class Cart
{
public:
    int getic( void ) const;
    virtual void add_i( const unsigned int & );
    virtual void remove_i( const unsigned int & );
};
Theorems ????
I *do* understand that Your article was written mainly to stir the water and I actually somewhat agree with what You say, but I could not help noticing the only thing You managed to "prove" is Your bad knowledge of Mathematics.
You cannot debate a theorem.
It's either *proven* or it's not a theorem at all.
You could discuss the postulates the theorem descends from.
What You presented are simply opinions.
Please call them so.
This said, I can proceed to make a fool of myself by expressing my personal opinions on a subject that is evidently a huge battlefield for crusaders :) :)
1) Coding standards should be considered like "good manners" (or etiquette, if you prefer); when You visit someone's place (it is immaterial if it's a home or a program) You should learn how to behave and try to stick to the local rules.
2) Most of the coding standards are just plain ridiculous (like having to put comment headers before "local variable declarations" or "type definitions": if you cannot recognize a typedef you should really stop looking at the code NOW), but that doesn't mean that *all* of them are ridiculous.
3) I personally hate Hungarian Notation and, where necessary (someone else's BIG code), I compile even plain old C as C++ (with all warnings on) to get better type checking.
4) Coding style, in order to be effective, useful and not an unnecessary burden for the programmer, should be adapted to the project (and here we go back to (1)). My rules are relatively simple:
a) use short names for local variables (wherever possible declare local temporaries in the block that uses them). For these variables I use a Fortran-like notation (if the name starts with one of ijklmn then it's an integer, pqrs are for pointers to char, c is for chars, etc.).
b) if the project is big enough then all the exported variables (avoid them as plague, if you can!) and functions are prefixed with a short (3 char, usually) indicator of the section of the code they belong to (file, module, section or whatever is appropriate to subdivide the code in sensible chunks).
c) make heavy use of the "static" keyword.
d) enforce a consistent indentation style by running indent on the code (especially when I didn't write it) and then carefully ignore complaints that the comments got messed up. (Yes, I know this is rather drastic, but it works wonders in a very short time :) :))
e) encourage documenting the interface to sections of code in the header files (included by the users of the code) and documenting the implementation (algorithms, limitations, todo, ...) in the code itself (.c)
As You see all these are really a cross between coding rules and good programming advice.
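For what it's worth, rules (a) to (c) could look something like the sketch below. It is written to compile as C++ (in the spirit of point 3), and the module prefix "cfg" and both function names are invented for the example; nothing here comes from a real project.

```cpp
#include <cassert>
#include <cstddef>

// Rule (c): a file-local helper, invisible outside this translation unit.
// Rule (a): short Fortran-like locals: n is an integer, p a pointer to char,
// c a char.
static std::size_t count_char(const char *p, char c) {
    std::size_t n = 0;
    for (; *p != '\0'; ++p) {
        if (*p == c) ++n;
    }
    return n;
}

// Rule (b): the exported function carries a hypothetical 3-char module
// prefix ("cfg") naming the section of the code it belongs to.
std::size_t cfg_count_fields(const char *p) {
    assert(p != nullptr);
    // An empty line has no fields; otherwise fields = commas + 1.
    return *p == '\0' ? 0 : count_char(p, ',') + 1;
}
```

Only cfg_count_fields would appear in the header, together with its interface documentation as per rule (e); count_char stays private to the .c file.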
I'm completely against carving the style rules in stone but, on the other hand, arriving at some common agreement to ease everyone's work is, especially in large projects, a must.
The bottom line is that (IMHO) coding *style* and healthy coding *practices* are difficult to separate.
If You try to give only syntactic style rules you are likely to end up with bored programmers and little useful outcome.
On the other hand style is an integral part of coding guidelines.
Mauro
Style - Copyrights
So CompanyX (or is it company_X ?) decides that you need to code in styleA. Can CompanyX hold a copyright on styleA as some sort of intellectual property? If so, evilEmpireMS would be in for even more money.
And on style issues - any programmer worth her/his salary should be able to make the transition between styles, provided that one given style is used for one given code chunk (say, per function at least).
In my environment, i know that coWorker34 programs in personalStyleIdiotic and thinks in logicalStyleNeurotic. Thus whenever i see personalStyleIdiotic, i can expect that the logics of it will flow to logicalStyleNeurotic.
In that sense, it's a context-based environment.
A comparison to something non-programming:
It's like accents. Once you're good with the English language, you can interpolate for the intricacies of any given language and it won't stand in the way of a conversation. Plus, hearing a British accent will give you some context of the person you're talking to.
Not bad, not good.
Style is a laziness issue. It's a matter of being lazy to not follow the team style; it's also a matter of being lazy to not run the code through a pretty printer to correct the style problems of others. I kind of think that future development environments will allow for customizable style settings and let each developer indent as they like, maybe even name variables as they like, and when you extract the code it will be formatted into your style. It's an ease of reading issue and it should be personal and catered to each code reader.
Hungarian notation is a relic. It is one of the easier ways to make code harder to read. The idea is good, but I think that the same if not even more information can be conveyed by having good variable and function names. It's also not usually too tough to look back through the code to find the declaration; if you have to go way, way back or to a different file, I think that is a much bigger problem. Terseness in general is something that kind of bugs me. I understand the dislike of 30-character names, but the difference between a 4-letter Hungarian symbol and an 8-letter word isn't so bad. I'd rather see "dayOfWeek" than "piDay" and I think it has more meaning in some contexts.
Even variable names can be corrected easily. sed isn't that hard to use, and Hungarian notation is something that can be programmatically inserted. Conventions are a good thing in so far as they help keep the lazy coders in check. Conventions are a bad thing when they try to make good programmers out of poor ones. No amount of guidelines or coding conventions will turn a lousy programmer into a good one. If your team has someone who spends an inordinate amount of time policing code conventions then they probably aren't good conventions or you're being too pedantic with them.
Coding Standards
A contractor has little choice but to live with the coding standard imposed by the client company. The result is I've worked with most of them, except Hungarian.
You claim that Hungarian and other type prefixing standards exist to make up for bad coding practices. I claim it's because the software IDEs are so bad. There is typically no easy way to get a type definition for a variable given the scope it's in. Under UNIX you can easily write/run a small program that will show you the current type definition for the variable, where it's defined and if it hides another variable.
Encoding the type as a prefix is just as bad as adding a comment like /* increments i by one */. It's redundant, will be wrong some percentage of the time and adds noise.
Long variable names do the same thing. They often mislead the reader as to what the code really does with the variable. Debugging a program with long descriptive variables is much harder in my experience. Especially when the name describes the original intention and not the actual practice.
Standard naming conventions (caps, underscore) are important purely for communication with colleagues.
Indentation standards help to reduce the size of diffs between versions of the same program when multiple people have to cooperate together.
Portability should just be a matter of agreeing what interfaces will be available on all platforms and making sure the use of any other platform-specific API is well isolated in the code.
The rest should be left to the individual programmer to make their code as readable and understandable as possible.
Use good programming tools to aid in the development and understanding of a body of code, don't obfuscate the code to make up for a lack of good tools.
AndrewN
Think more, react less
First off, I'd like to thank everyone for their comments. The editorial was meant to be somewhat controversial, but was more intended to solicit opinions on these things. Now that many people have had the chance to read and comment, I'll address some of your comments and ideas.
As many of you explained your own coding standards, you do share the same fundamental view I was trying to express -- bad variable naming or function naming = bad code. However, a lot of you argued that when I said coding standards shouldn't apply to naming, this was a bad idea, with examples of "get_c" versus "get_count" and such. This is exactly my point: bad naming = bad code. But more than the reaction of bad code, bad naming = bad programming = bad programmer. The coding standard shouldn't have to even address this issue -- the naming of variables and such should be correct and intuitive to everyone from the outset. Therefore, no, a coding standard shouldn't enforce a certain type of naming convention. We all recognize that "get_c" isn't a good idea. So why write it in the first place?
Let's talk a little more about case styles. In many cases, people argue that variables should always be "DayCount" or "day_count", one or the other. In real, complex examples, sometimes it's best to mix styles. You may receive source code from a vendor with functions such as "i2c_write()" -- in fact, you may have hundreds of functions and variables using the "x_y_z" convention. In order to distinguish your code from theirs, it's quite common to use the opposite in your own so a cursory glance will tell you if the function being called is vendor-provided or company-generated.
I agree that across a project, there should be an agreement to some things in style (indentation, does the company use DayCount or day_count, etc), but all too often the "Standards" go way too far. A standard is only put in place to address problems; in this case, it's a work-around to a fundamental issue that very few programmers are capable of being consistent with themselves, let alone others, in naming and style.
How many of you use multiple styles depending on what you're coding and the nature of the system? I maintain about 4 different styles I use on a regular basis. It takes a lot of effort not to mix them up. But then, I'm working on large projects of complex interactions with multi-threaded issues, making a complex application with a variety of vendor provided source routines for hardware access. It's a different world, but the problems are common.
As for Hungarian Notation, the examples I used were from the original work by Simonyi, as you can find at various places on the web or in publications. I won't profess to like it or be particularly expert in it, but as everyone points to the original work as the fount of HN, my examples should be HN. If there have been later distinctions between HN and type-based naming, hopefully an effort will be spent to distinguish between the two. Can anyone provide complete simple and complex examples of each?
What I'm getting at:
HN, other naming conventions, and coding styles exist and probably will for quite some time. But the fundamental REASON they exist is because we, programmers, are incoherent. We are lazy, too, and that makes things worse. Until we are ready to agree to a certain amount of responsibility for ourselves, these types of nonsense "patches" will be forced upon us.
And for those who complain that the big drawback of HN is that a variable that was once "puchData" and is now really "piData" -- yet the code wasn't changed to convert all puchData to piData -- I rest my case. The person doing the change was lazy, and didn't take the responsibility they should have. While this continues, we're all going to be subject to idiot managers (pointy-hair types) telling us how to do things.
The issue raised about whether companyX's proprietary coding style could be copyright/patented -- this is a very valid question. And a very scary one. It's not impossible -- hell, they can patent/copyright that "Ctrl-C" is "Copy", so why not programming style? Or a language itself? An interesting area to think about...
And while it's frequent to find buggy implementations of operations in different platforms, sprintf() will never change from one system to another. Putting a wrapper on it buys you nothing. It's ANSI and ISO defined in parameters, format, etc. You may change compilers, but you won't change the call. If you replace sprintf() with your own for some reason, you shouldn't call it sprintf() in the first place -- this will lead to unnecessary confusion.
I agree that contractors and even corporate programmers must live in their environments. I constantly am working under some company's format, and it's irksome to find no two companies like things the same way. Mostly, I want everyone to stop and think --
Why did these things get developed in the first place?
If they're a "good" thing, then why the hell aren't we doing this on our own?
Why do we let non-technical people make up our coding standards?
What are the REAL issues behind why these got started?
How can we address these REAL issues?
Think about it.
One final comment, for "Mauro":
My knowledge of mathematics is rather good, IMHO. A theorem, if you'll recall, is an item that has yet to be disproven. You have confused a Theorem with an Axiom. An Axiom is an item which has been proven and can not be disproven -- ever. A theorem is something which has been postulated, but has yet to be proven absolutely. The only restriction on a theorem is that it must have all assumptions explicitly listed. At the time of writing, my inflammatory statements were yet to be proven false (and still are, I might add), and therefore qualified as "Theorems"... Granted, they are opinions, and it would be ungainly to run around all day preceding any comments you make with "My new Theorem is ...", however, it would be technically correct. The funny thing is, everything is an opinion. 2+2=4 is an opinion, and assumes a lot of information about the underlying set, but isn't always true. It's just an opinion that it usually is.
Clearly an integer?
"Likewise, when I write a piece of code that says x=x+14, it seems pretty clear that x is an integer."
There is an incredible amount of ambiguity in the expression x = x + 14. Yes, x may be an integer -- but is it signed or unsigned? And by integer, do you mean char, int, long, long long, int_64, etc.?
Or, perhaps, x is a floating point number. In that case, is it a float or a double? Are there any extended fp formats to deal with (with corresponding performance implications)?
Or might x actually be a pointer? I can certainly conceive of situations in which I might want to advance a pointer into a string by 14 characters. But I'd also like to know if that string is of normal or wide chars.
Of course, x might be an instance of a C++ class that has overloaded the + operator. If this is the case, I have no idea what in the world x = x + 14 means.
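To illustrate, here is a rough C++ sketch in which the very same statement, x = x + 14, appears four times with four different meanings. The Label class and all function names are hypothetical, made up purely for this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical class whose operator+ appends the number as text.
struct Label {
    std::string text;
};
Label operator+(const Label& label, int n) {
    return Label{label.text + std::to_string(n)};
}

// Integer arithmetic.
int AsInt() { int x = 1; x = x + 14; return x; }

// Floating-point arithmetic.
double AsDouble() { double x = 0.5; x = x + 14; return x; }

// Pointer arithmetic: advances 14 characters into the string.
// The caller must pass a string at least 14 characters long.
const char* AsPointer(const char* s) {
    assert(s != nullptr);
    const char* x = s;
    x = x + 14;
    return x;
}

// Overloaded operator: no arithmetic at all, just string building.
std::string AsLabel() { Label x{"row"}; x = x + 14; return x.text; }
```

Nothing in the statement itself tells the reader which of these four is meant; that information lives entirely in the declaration of x.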
Without sufficient context, it is impossible to attribute any semantic meaning to an isolated expression or statement. The whole point of HN is to distribute that context throughout the code as much as possible, so that a programmer need spend a minimal amount of time searching for the context, as opposed to using that time to actually understand the logic of a given piece of code.
The contextual hints provided by HN are especially useful when browsing through unfamiliar code, when the intention is to get an overview of what the code is doing (as opposed to deeply examining a given segment). Such browsing is quite common in any large scale project, especially when one is tasked with debugging another's code (and you don't know where the problem lies) or modifying an existing system.
And finally, as another counter to the Torvalds "changing types requires much mucking through code altering variables' names" argument: how often does one blindly change a type that is pervasive throughout a system? Altering a structure used by many components of a system carries with it drastic costs -- at the very least, the compiler errors that will occur due to (for example) changing the name of the structure member will aid in the task of examining all uses of that member to ensure that they are consistent with the new type.
Of course, elements of HN can be misused, and too strict an adherence to a set of HN rules can cause more problems than it solves (e.g., UINT v. DWORD usage in Win32 code). But, as has been clearly pointed out, any coding standard can fall prey to this problem.
Regarding Hungarian notation
Hungarian Notation was invented by Charles Simonyi, and was first described in his doctoral thesis. His original idea was to name variables based on their type, possibly followed by a qualifier, if several variables of the same type were in scope. In his thesis, it is clear that by type he does _not_ mean the fundamental data types provided by a language, but rather the logical sets of quantities that are used by the programmer.
Unfortunately, most programmers have been introduced to HN through its use in Microsoft's Windows API. If the developers of that API actually read Simonyi's thesis, they certainly misunderstood it. In the Windows API, prefixes are assigned based on the fundamental data types, int, long, double, etc., which is in direct opposition to what Simonyi had suggested.
One of the examples Simonyi uses is a color graphics program that has a set of internal values that denote colors.
Simonyi would have us recognize that color was a type, and thus define a prefix for that type. In his example, he uses "co".
When comparing a local variable containing a color to the manifest value red, his code would resemble:
#define coRed 1
int co;
...
if (co == coRed) ...
In this case, the use of the "co" prefix tells us something useful, that the variable and the constant, despite being declared as ints, are actually colors, and should be treated as such. The compiler can’t catch misuse, but the pattern makes it easier for the programmer to do so. If the code assigned 42 to the variable co, the compiler wouldn't complain, but it would be likely that a reviewing programmer would notice.
This sort of type mapping makes a great deal of sense in early C. In the early days, C provided no facilities for creating user-defined types, and this sort of “mental mapping” between physical and logical types was all that was possible. Later versions of C added limited facilities to create user-defined types. A more experienced C programmer, using these limited facilities, might end up with:
typedef enum {coUnset, coRed, coBlue, coGreen} Color;
Color co;
...
if (co == coRed) ...
This doesn't provide better compile-time checking than the first, but by declaring co as of type Color you've given the maintenance programmer a better clue as to how it should be used, even if Color is simply an alias for int, and the compiler can't catch misuse. Note that the use of the type prefixes makes at least as much sense here as it did before.
Meanwhile, Microsoft's approach to HN would prefix color variables with 'i', for the underlying base type, which provides no useful information at all.
int iRed = 1;
int iColor;
...
if (iColor == iRed) ...
In C++, on the other hand, we have much better tools for creating user-defined types, and with properly organized code we can have the compiler catch the sort of problems that the type prefixes were supposed to help us with. A C++ version of the problem might look like this:
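class Color
{
public:
    enum value {Unset, Red, Blue, Green};
    Color(value theValue = Unset)
    {
        myValue = theValue;
    }
    // Needed so that Color objects can be compared with ==.
    bool operator==(const Color &other) const
    {
        return myValue == other.myValue;
    }
private:
    value myValue;
};
Color theColor;
...
if (theColor == Color::Red) ...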
Given this code, if we try to assign anything other than another Color object to a Color object, the compiler will complain. If we try to compare a Color object to anything other than a Color object, the compiler will complain. We don't need to encode type information in the variable names. The physical type is the logical type, and the compiler will catch any conflicts.
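One way to check that claim without breaking the build is to ask the compiler directly. The sketch below is a self-contained variant of the Color class with an equality operator, plus a few static_assert lines using std::is_convertible and std::is_assignable from <type_traits>. It assumes a C++11 compiler, which of course postdates much of this discussion.

```cpp
#include <cassert>
#include <type_traits>

class Color {
public:
    enum value { Unset, Red, Blue, Green };
    Color(value theValue = Unset) : myValue(theValue) {}
    bool operator==(const Color& other) const {
        return myValue == other.myValue;
    }
private:
    value myValue;
};

// A plain int cannot silently become a Color...
static_assert(!std::is_convertible<int, Color>::value,
              "int -> Color should not compile");
static_assert(!std::is_assignable<Color&, int>::value,
              "Color = int should not compile");
// ...but a named Color::value can, which is what makes
// theColor == Color::Red work.
static_assert(std::is_convertible<Color::value, Color>::value,
              "Color::Red -> Color is allowed");
```

So writing theColor = 42 is rejected at compile time, where the int-based C versions would have accepted it without a murmur.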
Even so, most OO programmers include logical type information in their variable names. This is critical for maintainability. But with OO designs producing so many user-defined types, attempting to define two-character abbreviations for them all is a hopeless task. Most C++ programmers have, instead, adopted the Smalltalk practice of naming variables as adjectiveNoun, where Noun is the Class type.
As for the fundamental datatypes provided by the language, they generally don't become a problem. The scope within which they are used is limited enough that there is almost never confusion regarding type. If, for example, you have a variable called "age", and you start getting confused as to whether it is an int or a double, you are probably also confused as to whether it represents age in years or age in seconds. If such confusion starts to arise, it's time for a new class:
Duration myAge = Date::currentDate() - myBirthDate;
cout
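The post names Duration and Date but does not show their definitions, so the following is only a guess at what such classes might look like. Every member name here (FromSeconds, Seconds, Days, the seconds-since-epoch constructor) is an assumption made for illustration.

```cpp
#include <cassert>

// Hypothetical Duration: the unit is fixed inside the class, so callers
// must say explicitly whether they want seconds or days.
class Duration {
public:
    static Duration FromSeconds(long seconds) { return Duration(seconds); }
    long Seconds() const { return mySeconds; }
    long Days() const { return mySeconds / 86400; }  // 86400 s per day
private:
    explicit Duration(long seconds) : mySeconds(seconds) {}
    long mySeconds;
};

// Hypothetical Date: subtracting two dates yields a Duration, not a bare
// number. This sketch assumes the left operand is the later date.
class Date {
public:
    explicit Date(long secondsSinceEpoch) : mySeconds(secondsSinceEpoch) {}
    Duration operator-(const Date& earlier) const {
        assert(mySeconds >= earlier.mySeconds);
        return Duration::FromSeconds(mySeconds - earlier.mySeconds);
    }
private:
    long mySeconds;
};
```

With something like this in place, the "years or seconds?" question is answered by the method you call rather than by a prefix on the variable name.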
To summarize, Simonyi's original idea as he originally wrote it made good sense, given languages without user-defined types. Hungarian as Microsoft decided to implement it was a perversion that ignored everything that Simonyi was trying to do.
Hungarian as practiced by Microsoft is worthless, period.
Hungarian as originally described by Simonyi made good sense in languages without user-defined types.
In languages that include support for user-defined types (which includes all OO languages), HN is inappropriate. There should be, in any well-crafted program, more user-defined types than can be accommodated by any meaningful set of abbreviations.
Read Steve McConnell's books
Steve McConnell has written some of the clearest and best-documented examples of good software engineering practice.
I do not have any higher recommendations for practicing software engineers, and I was surprised that these books were not mentioned by anyone here to date. I can recommend Code Complete, Rapid Development: Taming Wild Software Schedules, and the recently released After the Gold Rush: Creating a True Profession of Software Engineering (Best Practices).
In his books, you'll find why working insane hours is stupid and counterproductive (and I used to do it, too. Now I'm way more productive on 40-45 hrs/wk rather than 80-100 and I have a social life), good coding practices - including the relatively unimportant variable, function, and class naming as well as the vastly more important software engineering process itself.
Some people hate process, but I'm afraid that's what modern medium-large software project creation is. You can do it the long & hard way, which is to sit in front of your PC and code. This can be intensely gratifying for those of us who like to code for coding's sake. Or you can do it much quicker by using software engineering practices, get a more reliable product and have fewer maintenance nightmares.
Steve's books, and good software engineering practices, help you get there quicker. If you're thinking of going up the software project management tree, these tomes will save you literally months of development. In addition, get yourself Peopleware by DeMarco and Lister. That's a gem, and you'll have staff who will literally love you when you implement the strategies to get good flow time happening.
Click your way to your favorite online book seller and get them. They are truly excellent references.
Good & Bad
I think they are more good than bad, though. Sometimes I love to follow a certain standard where all the programmers are following the same thing. But then you get to work for some jack *** that has the dumbest standards ever.