The case against metrics


Anyone who has worked in a fairly large organization will sooner or later be confronted with metrics to measure performance. The idea of metrics, setting out measures to see if one is meeting one’s goals, is not in itself bad. The problem arises when metrics are created not to provide valuable feedback but almost exclusively to determine rewards and punishments. Then one frequently finds that metrics distort performance, as people game the system to meet the requirements of the metrics even if the actual results of doing so are deleterious. Badly designed metrics also focus on the things that can be measured easily rather than on the things worth measuring.

Historian Jerry Z. Muller describes the problematic features of metrics:

More and more companies, government agencies, educational institutions and philanthropic organisations are today in the grip of a new phenomenon. I’ve termed it ‘metric fixation’. The key components of metric fixation are the belief that it is possible – and desirable – to replace professional judgment (acquired through personal experience and talent) with numerical indicators of comparative performance based upon standardised data (metrics); and that the best way to motivate people within these organisations is by attaching rewards and penalties to their measured performance.

The rewards can be monetary, in the form of pay for performance, say, or reputational, in the form of college rankings, hospital ratings, surgical report cards and so on. But the most dramatic negative effect of metric fixation is its propensity to incentivise gaming: that is, encouraging professionals to maximise the metrics in ways that are at odds with the larger purpose of the organisation. If the rate of major crimes in a district becomes the metric according to which police officers are promoted, then some officers will respond by simply not recording crimes or downgrading them from major offences to misdemeanours. Or take the case of surgeons. When the metrics of success and failure are made public – affecting their reputation and income – some surgeons will improve their metric scores by refusing to operate on patients with more complex problems, whose surgical outcomes are more likely to be negative. Who suffers? The patients who don’t get operated upon.

Compelling people in an organisation to focus their efforts on a narrow range of measurable features degrades the experience of work. Subject to performance metrics, people are forced to focus on limited goals, imposed by others who might not understand the work that they do. Mental stimulation is dulled when people don’t decide the problems to be solved or how to solve them, and there is no excitement of venturing into the unknown because the unknown is beyond the measureable. The entrepreneurial element of human nature is stifled by metric fixation.

When I was teaching, I worried about how to assess my students fairly because any experienced instructor knows that if one is not careful, many subjective factors can color one’s perception of an essay or exam or other measure of performance. So at one point, I created metrics that broke down the performance measures into smaller chunks, to make sure that I was weighing all the relevant factors and not forgetting any. My initial mistake was to quantify those small chunks on a numerical scale and then add those numbers to get the overall grade. The problem was that the final result didn’t always agree with my professional judgment of the quality of the overall performance. I then changed the system, keeping the metrics as feedback but replacing the numerical scores with a three-point ‘good, satisfactory, unsatisfactory’ judgment. The metrics still made sure that I was paying attention to all the important elements but now left me free to use my professional judgment to make the final evaluation. That system worked out well.

In the last decade or so of my university career, I was director of a center, and periodically I would get metrics from central administration that were sent out to all departments to be filled out. These generic metrics were useless because they were of the ‘one size fits all’ variety: they did not really measure things of value, and many of their features did not apply to my center at all. I initially tried to persuade the administration to change the metrics, to customize them to make them more useful, but after failing a couple of times, I largely ignored them, filling them out perfunctorily in a few minutes. I preferred to focus my time and energy on doing the things that I thought were important for my center to do. That worked out well too, because I realized that the main thing was that the people my center was designed to serve, i.e., the faculty who cared about improving teaching, were pleased and satisfied with the assistance we were providing.

In short, metrics can be very useful as a means of providing detailed feedback but they become harmful when they are used as rote evaluation tools that replace expert, professional judgments.

Comments

  1. starskeptic says

    As a personal example: A high “administrative meetings per patient-bed” number probably doesn’t lead to better-quality health care…

  2. Charly says

    I think this is one of many symptoms of the illness that has spread from the USA in the last decades -- professional managers. People whose sole purpose in life is being managers, without actually having any real-world practical knowledge of the field they manage. Know-nothing PowerPoint and Excel jockeys whose sole skillset consists of crunching numbers, making graphs and riding on the backs of others (or stabbing them) have a tendency to try and reduce absolutely everything into one or two numbers. They tend to use statistics a lot, without actually understanding its true advantages and limitations or the underlying principles. This tendency often devolves into pure management-woo.

    It also leads to absurd situations where it can be easier and less trouble to replace a high-ranking manager than it is to replace a janitor, despite the manager having wages many times higher (speaking from experience here) and more “responsibility” (*spits*). In the last decades, many people in the higher ranks at the company where I work have been people who have no clue whatsoever about the technical side of what the company does. And the higher up in the hierarchy, the lower (often, not always) the knowledge. And the higher up someone sits, the more they demand that everything be reduced to a simple metric.

    One of the managers, who now luckily poisons another company, got a real technical engineering degree, and afterward he got an MBA in New York. The MBA turned him into a completely “number oriented” manager, i.e. an insufferable a*hole.

    One of the technically savvy managers actually told me that he cannot tell his superior that certain problems cannot be solved the way the superior desires, even though the proposed solutions defy the laws of physics, because he would be deemed incompetent. Thus lower (competent) managers are actually forced to lie to their (less competent or downright incompetent) supervisors if they want to keep their jobs.

    In my experience, the less managers are involved in decision making and the more they leave it to actual experts, the better the job gets done. Most, if not all, problems stem from idiots with overinflated egos and big power thinking they actually have something to say.

  3. starskeptic says

    Charly @2
    Indeed -- this whole idea of managing as a separate discipline, completely divorced from the process….

  4. says

    -- or used as clubs for beating individuals to work harder. In a sense, the stock market share price is a metric of investor confidence in Amazon; Amazon in turn measures its meat robots’ performance. It’s not value-neutral, but metrics are not always a negative.

  5. Pierce R. Butler says

    Sounds like the graduates of the G. Dubious Bush measure-and-reward/punish-public-schools-by-multiple-choice-tests regime have gotten their MBAs and set about turning everything else into the same systems which brought them success and flunked out the actual functional humans.

  6. Daniel Schealler says

    A friend of mine gets this all the time.

    He works for a cinema company. There’s a bunch of metrics they measure around the food and drink they sell at the snack bar. And that’s crucial for their financial success as a company, because the studios take a massive share of the actual ticket price. Cinemas live and die on their snack bar.

    One of the rules of selling products is around price optimization: The goal should be to maximize profit. If you drop prices a little, you may sell more units, and if you sell enough units at a lower price you could wind up with more profit even though you dropped prices. But if you drop prices too much, you can’t sell enough units to make up the per-unit shortfall. So you ideally want to optimize the price to get the balance exactly right for the highest possible profit.
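That balance can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the linear demand curve, the 50-cent unit cost, and every number below are hypothetical, not figures from the cinema.

```python
def units_sold(price, base_units=1000.0, slope=150.0):
    """Hypothetical linear demand: each extra dollar of price loses `slope` buyers."""
    return max(0.0, base_units - slope * price)


def profit(price, unit_cost=0.50):
    """Profit = per-unit margin times the number of units sold at that price."""
    return (price - unit_cost) * units_sold(price)


def best_price(candidates):
    """Return the candidate price with the highest total profit."""
    return max(candidates, key=profit)


# Candidate prices from $0.50 to $7.00 in 25-cent steps (assumed range).
prices = [0.25 * i for i in range(2, 29)]
best = best_price(prices)

for p in (4.00, 3.50, 3.00):
    print(f"price ${p:.2f}: margin ${p - 0.50:.2f}/unit, "
          f"units {units_sold(p):.0f}, profit ${profit(p):.2f}")
print(f"best candidate price: ${best:.2f}")
```

With these assumed numbers, dropping the price from $4.00 to $3.50 lowers the margin on each unit but raises total profit, while dropping it further to $3.00 goes too far. A rule that looks only at per-unit margin would reject the $3.50 price.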

    They did a trial recently on the price of popcorn, and they were sort of scientific about it. They had two cinemas with very similar demographics and metrics. They trialed a price drop of 50 cents on all their popcorn sizes at one store, and left the other store unchanged as a control. They compared the metrics of both stores, and they found that the drop in price at the test store led to a significant increase in the percentage of people who bought popcorn compared to the control store.

    They crunched the numbers, and the increase in popcorn sales alone led to enough additional revenue to justify the price drop. But on top of that, once people bought popcorn, they were also likely to buy a drink or some other snack along with it, even though the prices on those items hadn’t been dropped!

    They thought this was great, and appealed to the head office to approve dropping the popcorn prices by 50 cents at all the stores that fit that demographic profile.

    Approval should have been a no-brainer. But head office rejected the proposal. They explicitly cited as their reason that the revenue-per-unit markup on each popcorn sale was down at the trial cinema, and they considered that to be unacceptable. So the price went back up to the original at the trial cinema, and of course revenue-per-head went back to normal.

    Then they received a directive to find a way to get the revenue-per-head at the trial store back up to where it was during the trial, but without dropping the price of popcorn. When that inevitably failed to happen, they were chastised.

    It was total madness: They couldn’t see the forest because their view was blocked by a single tree, and there was a massive reduction in employee engagement and morale to boot.

  7. Nomad says

    This exists all over. Years ago I used to deliver pizza for a carry out and delivery only store. You know, the hole in the wall places you see in strip malls that only have a few benches for people waiting on their carry out orders to sit on. It was a corporate owned store, not a franchise, so we had this regular group of middle managers that would occasionally tour through the area to tell us that we’re supposed to sell more pizza by working harder.

    This was in the late 90s and early 2000s; I expect everything has only gotten worse since then. But even at the time we had a store network (no Internet ordering yet) that connected our monochrome CRT POS terminals to a local server that phoned home on a modem at night to report all our data to our corporate masters.

    We used conveyor belt pizza ovens. They use a metal conveyor belt to take the pizzas through the oven so that the pizzas get cooked for a fixed time. We had two ovens, so that gave us a very hard upper limit on how many pizzas at a time we could make. At times our order volume would go up beyond the capacity of our ovens and we’d get a backup. To any reasonable person this is unavoidable. But our corporate overlords were not reasonable. The computers tracked how long it took to get delivery orders out the door and it didn’t take our oven capacity into consideration. If there was too long of a gap between when a delivery order was placed and when it was sent out the door we were penalized, no matter what.

    Our managers ended up gaming the system by faking it. The order tracking process was fairly basic; it didn’t track when things came out of the oven like many stores do now. It only knew when an order was either picked up by a carry-out customer or sent out the door with a delivery driver. So at the busiest times our managers would tell the system that pizzas were out on delivery before they were even out of the oven. And then the orders would be cashed in early as well so that the system didn’t think the driver was out for too long either. This would also be done if we simply lacked enough drivers to handle the volume. This created additional problems. For instance, an order that couldn’t be delivered for any reason was already marked as delivered in the system, so additional fakery was required to deal with the money that the computer thought the store had received but hadn’t.

    And no, getting more drivers wasn’t always the answer. Having too many was also a problem; our labor hours were another metric that we were constantly judged on. I remember one night I was working until close and we had a mountain of dishes piled up because we’d had a lot of orders and didn’t have enough drivers scheduled to do the work (delivery drivers are really multi-purpose minimum wage slaves; dedicated inside workers actually get paid more because they don’t get tips, but drivers are the ones who end up having to wash dishes). You’d think corporate would love that, but they still expected everyone to clock out on schedule no matter how busy we’d been.

    Well, later on that night, once the store was closed, the manager turned to me and said that he had to have me clock out because of the importance of keeping our labor hours down. “I won’t ask you to keep working,” he said. Just understand that he was a friend, and I knew that no matter what I did he’d have to keep working off the clock. This wasn’t his fault; he was, if anything, more powerless than I was. So I stayed and worked off the clock, voluntarily letting the company steal my wages so he didn’t get screwed even worse than he already was.

    So yes, this is what I think of when I hear “metrics”. I think of my managers working out a system to systematically deceive our computer system to keep the metrics in line with the impossible demands made of us. I think of the middle managers and their numbers that reflected the alternate reality that we had constructed for them, a fiction that further isolated them from the reality we were facing on the ground and unfortunately reinforced their confidence in a rotten system.

  8. Peter Butler says

    A story from my dad, a longtime AT&T -> Pac Bell engineer. He was involved when BART dug up San Leandro for BART tracks. BART paid for cable relocation but the phone company could take advantage of the relocated underground construction by pulling more cable and installing empty pipe for future expanded service.

    A mainframe at Bell Labs computed the optimum mix of new cable and empty pipe. What it was unable to account for was the expected hillside upscale homes and their demand on telephony services. Bottom line: input to the computer was faked until the desired result was produced.

    Intelligent boots on the ground win over any computer ~3k miles away. Given how fast the upscale homes sprouted, even the boots on the ground underestimated the future demand.

  9. flex says

    There are good metrics, but there are a lot of stupid ones. The CEO of the company I just left decided that the engineering budget should be a fixed percentage of sales, and the value he used was lower than any division in the company was operating under. No one knows where he got the metric, probably from some generic presentation or magazine article. Then he decided that all hiring approvals needed to come through him. When people left the company, he wouldn’t approve any replacements because this one metric showed that there were enough people to handle the work. Eventually the customers said they would not award any new business until enough people were hired to support the business the company already had. Even customers explicitly telling the executive board that they would not award any new business did not shift the CEO’s opinion of his metric. Just after I had accepted a new job elsewhere the shareholders stepped in and threw out the CEO.

    But the stupidest metric I saw was when a new drawing release system was introduced and a new metric was created along with it: the time from a drawing change request until the drawing was released to production. That’s not a bad metric if it is used to evaluate the system and look for improvements. Instead it was used to personally evaluate the poor sods who worked in that department. After a near-revolt just after their annual performance reviews, which directly impact their raises, the team was allowed to adjust the starting time for the metric to exclude the time it took to accomplish tasks they were not responsible for. Of course the metric showed a tremendous improvement. It’s hard to not show an improvement on a metric measuring duration when the people being evaluated can change the starting time as much as they want.

    I am a fan of responsible metric use. To evaluate the performance of a system they are useful. But I’d rather have no metrics than ones which are used incorrectly. I have similar feelings about statistics.

  10. lanir says

    I’m not entirely certain there is a direct causal line between the cult of management and the cult of metrics but they do blindly worship similar philosophies. Both are blind worship of observably false nonsense, that much is easy to prove. But because they both move all power up the chain and away from anyone doing the work or near enough to see how it’s actually done, they’re both very popular.

    The logic isn’t all BS. Sometimes you can measure important things. One of our critiques of the sorts of corporate systems that tend to subscribe to these cults is also a metric: worker pay vs management pay. And every company lives or dies by the metrics of money in compared to money out.

    But I think the point of the post is that they’re often applied in a universal way but treated as though they’re customized to the job at hand, which they’re often not. It’s the same false premise the entire cult of management is based on, the idea that specific knowledge is universal and universal knowledge is specialized knowledge. This concept is popular because scarcity of knowledge is a factor in how much an employee is paid. When you willfully fudge the line between general knowledge and specific knowledge, you get to the idea that anyone can be replaced by any warm body off the street except for management (of course, what a surprise). Since specialized knowledge is one of the key things any employee can bring to the table to justify a favorable pay rate, there’s a reason people join the cult and choose to be willfully ignorant about this. As with most religions in my experience, there aren’t many true believers. If there were, you’d be hired and fired based on metrics alone and nothing else would matter. But in most places that doesn’t happen. Largely because no matter what they tell you, somewhere along the way there’s an awareness that the metrics being measured don’t have any relation with the only metric that’s actually important to the owners: the profit margin.