“Brittle” design isn’t limited to enterprise application software. You can find brittle design in cars, bridges, buildings, TVs, or even the vegetable bin. (What are those 1/2-pint boxes of $5.00 raspberries, 3/4 of which are moldy, but examples of brittle design?)

What do brittle designs have in common? The designer chose to accentuate high performance at the expense of other reasonable design parameters, like cost, reliability, usability, etc. A Ferrari goes very, very fast, and it feels good when it goes fast, and that’s a design choice. And it’s part of the design choice that the car requires a technician or two to keep it going fast for longer than an afternoon.

So why are most enterprise applications brittle? You can see this coming a mile away. It was a design choice. The enterprise applications in question were designed to be the Ferraris of their particular class of application. They were designed to do the most, have the most functionality, be the most strategic, appeal to the most advanced early adopters, be the most highly differentiated, etc.

To get Ferrari-like performance, they had to make the same design choices Ferrari did. They had to assume the application was perfectly tuned every time the key was turned, and they had to assume that the technicians were there to perform the tuning.

Enterprise applications, you see, were intended to run on the best, the highest-end machines (for their class). They were intended to be set up by experts. They were intended to be maintained by people who had the resources to do what was necessary. They were intended to satisfy the demands of good early-adopter customers who put a lot of pressure on them, with complex pricing schemes or intricate accounting, even if later on, it made setting the thing up complex, increased the chance that there were bugs, and made later upgrades expensive.

This wasn’t bad design; it was good design, especially from the marketing point of view. The applications that put the most pressure on every other design parameter got the highest ratings, attracted the earliest early adopters, recruited the most capable (and highest-cost) implementers, etc., etc. So they won in the marketplace and beat out other applications in their class that made different design choices.

I lived through this when I worked at QAD. At QAD, the founder (Pam Lopker) made different design choices. She built a simple app, one that was pretty easy to understand and pretty easy to set up and did the basics. And for about two years, shortly after I got there, she had the leading application in the marketplace. And then SAP and JDE and PeopleSoft came in and cleaned our clock with applications that promised to do more.

Now, none of the people who actually bought SAP instead of QAD back then or chose (later) to try to replace QAD with SAP did this because they really wanted high performance, per se. They wanted “value” and “flexibility” and “return on investment” and “marketable skills.” They literally didn’t realize that the value and flexibility came at a cost, that the cost was that the application was brittle and that therefore, the value or flexibility or whatever was only achievable if you did everything right.

If they had realized this, would they have made different choices?

I don’t know. I remember a company that made kilns in Pittsburgh that had been using QAD for many years. The company had been taken over by a European company that used SAP, and the CIO had been sent over from Germany to replace the QAD system with the one that was (admittedly) more powerful. He called me in (years after I had worked at QAD) to help him justify the project.

I looked at it pretty carefully, and I shook my head. Admittedly, the QAD product didn’t do what he wanted. But I didn’t like the fit with SAP. I was worried that the product designed for German kiln production just wasn’t going to work. I didn’t want to be right, and I was disappointed to find out that two years later, despite very disciplined and careful efforts, he was back in Germany and QAD was still running the company.

I’m glossing over a lot, of course. There are secondary effects. Very often, the first user of an application dominates its development, so the app will be tuned to the users strengths and weaknesses. It will turn out to be brittle for other users because they don’t have the same strengths. Stuff that was easy for the first user then turns out to be hard for others.

Two final points need to be made. First, when a brittle application works, it’s GREAT. It can make a huge difference to the user. Brian Sommer frequently points out that the first users of an application adopt it for the strategic benefit, but later users don’t. He thinks it’s because the benefit gets commoditized. But I think it’s at least partially because the first users are often the best equipped to get the strategic benefit, whereas later users are not. I think you see something of the same issue, too, in many of Vinnie Mirchandani’s comments about the value that vendors deliver (or don’t deliver).

Second, as to the cause of failure. Michael Krigsman often correctly says that projects are a three-legged stool and that the vendors are often blamed for errors that could just as easily be blamed on the customers. Dennis Moore often voices similar thoughts. With brittle systems, of course, they’re quite right; the failure point can come anywhere. But when they say this, I think they may be overlooking how much the underlying design has contributed.

It may be the technician’s fault that he dropped the beaker of nitrogycerin. But whose brilliant idea was it to move nitroglycerin around in a beaker?

Advertisement

In the last few posts, I’ve been talking about “brittle” applications, applications that just don’t work unless everything goes right. We all know lots of analogues for these apps in other areas: the Ferrari that coughs and chokes if it’s not tuned once a week or the souffle that falls if you just look at it funny. But it doesn’t occur to most people that many, if not most enterprise applications fall into the same category.

They don’t realize, that is, that ordinary, run of the mill, plain-vanilla enterprise apps are kind of like Ferraris: it takes a lot to get them to do what they’re supposed to do.

Here are some examples.

Consider, for instance, the homely CRM application. Many, if not most executive users go to the trouble of buying and putting one in because they want the system to give them an actionable view of the pipeline. Valuable stuff, if you can get it. When the pipeline is down, you can lay people off or try new marketing campaigns. When it’s up, you can redeploy resources.

Unfortunately, though, it takes a lot to get a CRM system to do this. If, for instance, the pipeline data is inaccurate, it won’t be actionable. Say a mere 10% of the salesforce is so good or so recalcitrant that their pipeline data just can’t be trusted. You just won’t be able to use your system for that purpose.

It’s kind of funny, really. Giving you actionable pipeline data is a huge selling point for these CRM systems. But when was the last time you personally encountered one that was 90% accurate.

Another example I ran into recently was workforce scheduling in retail. A lot of retailers buy these things. But it’s clear from looking at them that the scheduling can be very easily blown.

Finally, take the example I talked about before, MRP. When you run that calculation, you have to get all the data right, or the MRP calculation can’t help you be responsive to demand. If even 30% of the lead times are off, you won’t be able to trust much of the run. And when was the last time more than 70% of the lead times were right.

Notice that if you underutilize a brittle system, it can be somewhat serviceable. If you use the CRM system to follow the performance of the salespeople who use it and don’t try to use the totals, the inaccuracy doesn’t matter. If you just use MRP to generate purchase requisitions, the lead times don’t matter. But, as I said in the earlier post, if you do underutilize the system, the amount of the benefit available falls precipitously. And then the whole business case that justified this purchase and all this effort just crumples.

What makes me think that a lot of enterprise applications are brittle? Well, I have a lot of practical, personal experience. But setting that to one side, it’s always seemed to me that the notion of brittle application explains a lot of data that is otherwise very puzzling.

Take for instance all the stuff that Michael Krigsman tells us about project failures. He is constantly reminding us that the cause of failure is a three-legged stool, that it could be the vendor or the consultant or the company, and he is surely right. What ought to be puzzling about this, though, is why there is no redundancy, why the efforts of one group can’t be redoubled to make up for failings in another. If the apps themselves are brittle, however, it takes all three groups working at full capacity to make the project work.

Or, more generally, look at the huge number of project failures (what’s the number, 40%, are abandoned?). Given how embarrassing and awful it is to throw away the kind of money that enterprise app projects take, you’d think that most people would declare some level of victory and go home, unless the benefit they actually got was so far from what they were hoping for that the whole effort became pointless.

How can you tell whether the enterprise app you’re looking at is brittle? Well, look at the failure rate. If companies in your industry historically report a lot of trouble getting these things to work, or if the “reference” installations that you look at have a hard time explaining how they’re getting the benefits that the salesman is promising you, it’s probably a brittle app.

Two questions remain. When do you want to bite the bullet and put in a brittle app? And why would vendors create apps that are intrinsically brittle?

Next post.

A Flaw in Business Cases

September 19, 2009

A software business case compares the total cost of software with the benefits to be gained from implementing the software. If the IRR of the investment is adequate, relative to the company’s policies on capital investment, and if the simple-minded powers that be have a good gut feel about the case itself, it is approved.

Clearly, a business case isn’t perfect. Implementing software is an uncertain business. The costs are complex and hard to fix precisely. Implementation projects do have a way of going over; maintenance costs can vary widely; the benefits are not necessarily always realizable.

That’s where the gut feel comes in. If an executive thinks the benefits might not be there, the implementation team might have a steeper learning curve than estimated, the user acceptance might be problematic, he or she will blow the project off, even if the IRR sounds good.

Business cases of this kind are intended to reduce the risk of a software purchase, but I think they’ve actually been responsible for a lot of failures, because they fail to characterize the risk in an appropriate way.

There’s an assumption that is built into business cases, which turns out to be wrong. The fact that this assumption is wrong means that there’s a flaw in the business case. If you ignore this flaw (which everybody does), you take on a lot of risk. A lot.

The flaw is this. Business cases assume that benefits are roughly linear. So, the assumption runs, if you do a little better than expected on the implementation and maintenance, you’ll get a little more benefit, and if you do a little worse, you’ll get a little less.

Unfortunately, that’s just not the case. Benefits from software systems aren’t linear. They are step functions. So if you do a little worse on an implementation, you won’t get somewhat less benefit; you’ll get a lot less benefit or even zero benefit.

The reason for this is that large software systems tend to be “brittle” systems. (See the recent post on “Brittle Applications.”) With brittle systems, there are a lot of prerequisites that must be met, and if you don’t meet them, the systems work very, very poorly, yielding benefit at a rate far, far below what was expected of them.

This problem is probably easier to understand if we look at how business cases work in an analogous situation. Imagine, for instance, the business case for an apartment building. The expected IRR is based on the rents available at a reasonable occupancy rate. There is, of course, uncertainty, revolving around occupancy rates, rentals, maintenance costs, quality of management, etc. But all of this uncertainty is roughly linear. If occupancy goes up, return goes up, maintenance goes up, return goes down, etc. The business case deals with those kinds of risks very effectively, by identifying them and insuring that adequate cushions are built in.

But what if there were other risks, which the business case ignored? These risks would be associated with things that were absolutely required if the building was to get any return at all. Would a traditional business case work?

Imagine, for instance, that virtually every component of the building that you were going to construct–the foundation, the wiring, the roof, the elevators, the permits, the ventilation, etc.,–was highly engineered and relatively unreliable, required highly skilled people who were not readily available to install, fit so precisely with every other part of the system that anything out of tolerance caused the component to shut down, etc., etc. The building would be a brittle system. (There are buildings like this, and they have in fact proven to be enormously challenging.)

In such a case, it is not only misguided to use a traditional business case, it is very risky. If one of these risky systems doesn’t work–if there’s no roof or no electricity or no elevator or no permit–you don’t just generate somewhat less revenue. You generate none. Cushions and gut feel and figuring that an overrun or two might happen simply lead you to a totally false sense of security. With this kind of risk, it’s entirely possible that no matter how much you spend, you won’t get any benefit.

Now, businessmen are resourceful, and it is possible to develop a business case that correctly assesses the operational risk (the risk that the whole thing won’t work AT ALL). I’ve just never seen one in enterprise applications. (Comments welcome at this point.)

The business cases and business case methodologies that I’ve seen tend to derive from the software vendors themselves or from the large consulting companies. Neither of these are going to want to bring the risk of failure to the front and center. But even those that were developed by the companies themselves (I’ve seen a couple from GE) run into a similar problem: executives don’t want to acknowledge that there might be failure, either.

But the fact that the risk makes people uncomfortable doesn’t mean that it’s a risk that should be ignored. That’s like ignoring the risk that a piton will come out when you’re mountain climbing.

Is the risk real? Next post.