Writing

Software, technology, sysadmin war stories, and more.

Thursday, September 6, 2012

A proposal to enforce the 20% of C++ you actually use

One of the things I always hear about C++ is that everybody uses only some portion of the features, and nobody uses all of them. That sounds plausible, and then they add another point: the portion that gets used isn't the same portion from one programmer to the next, or from one company to another.

This actually makes sense to me, especially with any language which has a significant number of features. There are bound to be constructs which just do not sit well with the people in charge, and they will make a decree which states "do not do this". Think of it like a style guide, only instead of talking about tabs vs. spaces (how many?) or where you put your { braces }, there are statements about fundamental things you may or may not use.

Let's say that limiting such things by default is a good idea. Given this, perhaps it would be useful if you could tell your compiler exactly what you are willing to accept. After all, the compiler knows exactly which feature(s) are being invoked when it looks at a line of code. It should be able to say "oh, they're trying to do multiple inheritance here" and check to see if it's allowed.

Obviously, there will be corner cases where you need to allow use of something nasty deep down in the code, if only so that you can wrap it and provide your own custom interface to it. In those places, you'd need some way to say "it's okay to violate rule X here" to allow it to compile.
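One mechanism that already exists in GCC and Clang comes fairly close to "it's okay to violate rule X here". You mark the forbidden function `[[deprecated]]`, build with `-Werror=deprecated-declarations` so every use becomes an error, and switch that diagnostic off with `#pragma GCC diagnostic` only around the sanctioned wrapper. This is a minimal sketch: `raw_open()` and `open_log()` are hypothetical names, and the trick only covers functions you declare yourself, not language features like multiple inheritance.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical forbidden call. With -Werror=deprecated-declarations,
// every ordinary use of it fails to compile.
[[deprecated("rule X: use open_log() instead")]]
inline std::FILE* raw_open(const char* path, const char* mode) {
    return std::fopen(path, mode);
}

// The one sanctioned wrapper: silence the diagnostic for this region only.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
inline std::FILE* open_log(const char* path) {
    assert(path != nullptr);
    return raw_open(path, "a");  // "it's okay to violate rule X here"
}
#pragma GCC diagnostic pop
```

The push/pop pair keeps the exemption local, so searching for `diagnostic ignored` turns up every place where a rule is being bent on purpose.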

I can think of a bunch of things which would be good to disable by default in a code base used by a large company. Maybe you don't want people to be able to open their own FILE* streams because they tend to forget to fclose() them and thus leak resources. One way around this would be to ban the use of FILE*, and then write your own replacement.

Your replacement class would need to be written to use FILE* responsibly. This means keeping track of whether a stream is currently open, and properly closing it in the destructor if it was left hanging. It might also be useful to print a warning to stderr if the destructor does the cleanup work, since it means the outer program never explicitly called Close() on the replacement class.
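A minimal sketch of such a replacement class might look like this. The name `SafeFile` and its methods are hypothetical (only `Close()` comes from the description above). It owns at most one `FILE*`, closes it in the destructor if the caller forgot to, and complains on stderr when that happens.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for raw FILE* use: owns at most one open stream.
class SafeFile {
 public:
  SafeFile() = default;
  SafeFile(const SafeFile&) = delete;             // no accidental double-close
  SafeFile& operator=(const SafeFile&) = delete;

  ~SafeFile() {
    if (fp_ != nullptr) {
      // The caller never called Close(): clean up, but say so.
      std::fprintf(stderr, "warning: SafeFile(%s) closed by destructor\n",
                   path_.c_str());
      std::fclose(fp_);
    }
  }

  // Returns false if a stream is already active or fopen() fails.
  bool Open(const std::string& path, const char* mode) {
    assert(mode != nullptr);
    if (fp_ != nullptr) return false;  // one active stream at a time
    fp_ = std::fopen(path.c_str(), mode);
    if (fp_ != nullptr) path_ = path;
    return fp_ != nullptr;
  }

  bool Write(const std::string& data) {
    return fp_ != nullptr &&
           std::fwrite(data.data(), 1, data.size(), fp_) == data.size();
  }

  // Returns false if nothing was open or fclose() reported an error.
  bool Close() {
    if (fp_ == nullptr) return false;
    const int rc = std::fclose(fp_);
    fp_ = nullptr;
    return rc == 0;
  }

  bool is_open() const { return fp_ != nullptr; }

 private:
  std::FILE* fp_ = nullptr;
  std::string path_;
};
```

Copying is deleted on purpose: two copies holding the same `FILE*` would both try to `fclose()` it.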

This is a bit of a stretch, and possibly a contentious one, but there are other places where you might want to ban certain things. Maybe you decide that strtok is evil because it keeps hidden internal state between calls, which leads to badness when it's used from multiple threads. You'd be right. There are also things like localtime(), gmtime(), and readdir() which are bad for similar reasons. Having a way to squash them would be great.
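For strtok specifically, POSIX already provides a reentrant variant, `strtok_r()`, which keeps the parse position in a caller-supplied pointer rather than in hidden static state (`localtime_r()` and `gmtime_r()` play the same role for the time functions). Here is a sketch of a tokenizer built on it, assuming a POSIX system; `SplitTokens()` is a hypothetical helper name.

```cpp
#include <cassert>
#include <string.h>  // strtok_r() is POSIX, declared here on POSIX systems
#include <string>
#include <vector>

// Split `text` on any character in `delims`. The parse position lives in the
// local `save`, so concurrent calls from different threads don't interfere.
std::vector<std::string> SplitTokens(const std::string& text, const char* delims) {
  assert(delims != nullptr);
  std::vector<char> buf(text.begin(), text.end());
  buf.push_back('\0');  // strtok_r needs a writable, NUL-terminated buffer
  std::vector<std::string> out;
  char* save = nullptr;
  for (char* tok = strtok_r(buf.data(), delims, &save); tok != nullptr;
       tok = strtok_r(nullptr, delims, &save)) {
    out.emplace_back(tok);
  }
  return out;
}
```

Like strtok, this collapses runs of delimiters, so `"a,b,,c"` yields three tokens and empty fields are dropped.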

I am aware that some of this can be done with a very simple grep-type strategy where you look for certain key strings in source code, and that this can be part of a bigger "lint check" strategy. Unfortunately, as I know all too well, there is no guarantee that people will run these tools to catch such issues before submitting code. Also, if you only catch such violations at that stage, you have wasted a lot of programmer time, since they assumed they were nearly done.

By flagging such things at compile-time, it becomes obvious that something is wrong the first time you try to build it. This moves the failure up to the earliest reasonable point in the process. It's still annoying, but at least you haven't built an entire scheme around something which is disallowed.

I understand that Microsoft has a "banned.h" file which can be #included to turn certain evil function calls into compile-time errors. This sounds annoying to me, since it's one more piece of boilerplate to remember every time you start something new. I'd rather have it live in something fundamental that's harder to "forget" (if they're good) or "purposely omit" (if they're being bad).
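On GCC and Clang, something in the spirit of banned.h can be built with `#pragma GCC poison`, which makes any later appearance of the listed identifiers a compile-time error. This is a sketch, not a description of how Microsoft's header works. The file name `house_rules.h` and the helper `CountChar()` are hypothetical, and MSVC does not understand this pragma (it ignores unknown pragmas with a warning).

```cpp
// Hypothetical project-wide header, "house_rules.h", for GCC and Clang.
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <ctime>

#pragma GCC poison strtok localtime gmtime

// From here on, writing  strtok(buf, ",")  is a hard compile error.
// Code that sticks to the allowed subset compiles normally:
inline std::size_t CountChar(const char* s, char c) {
  assert(s != nullptr);
  if (c == '\0') return 0;  // strchr would match the terminator itself
  std::size_t n = 0;
  for (const char* p = std::strchr(s, c); p != nullptr; p = std::strchr(p + 1, c)) {
    ++n;
  }
  return n;
}
```

Note the ordering: the standard headers come first because they legitimately declare these names, and poisoning them beforehand would break the headers themselves.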

I've read about tools like the Clang Static Analyzer, but I'm not sure how far they could go in this regard. It's not clear to me whether you could truly make one enforce your local language subset rules. I've had enough trouble just trying to experiment with the compiler part of this project, so I'm in no hurry to try even more complicated parts of it.

Another answer I'm sure to hear is that of code reviews. To that, all I can say is that it's entirely based on the ability of the people doing those reviews. If all you have are amazing people who know all of the rules, even when they change, and never make mistakes, great! If not, maybe you should turn to these wonderful computers to help out.

Here's something else to consider: automated continuous builds. If you update your ruleset and existing code is now found to be in violation, it can be caught right away. Just trigger a full rebuild any time something fundamental like that happens in your build system. This way, someone can respond to it at a leisurely pace. The alternative is having it pop up when someone tries to work on something unrelated and just happens to be the first person to rebuild that code under the new rules.

That's my idea. Bring on the pitchforks! I'm ready.