Don Reinertsen gave a wonderful keynote at the Lean Software and Systems conference today. He brought in a lot of new ideas (and many old ones too!) with many insights. I especially liked how he looked at flow as an independent concept from lean. A lot of the talk resonated with what we are trying to do with Silver Stories, so that was a timely talk in more ways than one.
One of the great advantages of being in a startup is that it teaches you to appreciate both the technical and business points of view. Here are some of my observations on the talk.
Lean is just the first peak
A nice analogy by Don was likening Lean to the first peak in a range. Mountain climbers often experience this when they climb the tallest peak they can see and find a taller peak beyond that, and another one, and another one. Lean is just the first peak in the range, a waypoint on the journey. Lean has some good ideas, but also some ideas that don’t work in product development. We need to look beyond lean and also look to other domains that we can learn from. Most of the talk looked at flow through an Internet packet switching analogy, though Don mentioned some other potential areas to learn from.
An economic view of projects and features
One of the highlights of the talk was the way Don linked everything to the economics of running a business. This is something that I feel is really needed in the agile community (in fact this is the primary goal of Silver Stories), so there was a good resonance with the idea. A lot of agile processes talk in terms of development, like burndowns and stories and velocity. We need to translate things into business terms before we can make business decisions on them. A great example is prioritization: How do we prioritize features? Do we use gut instinct? Don proposed (about 20 years ago, but it’s only now coming into the agile community) looking at factors like cost of delay and risk factors to make prioritization decisions analytically.
An assumption here, though, is that we have the data to make these quantitative decisions. For instance, we rarely know a priori exactly what the cost of delay curve for a particular feature is going to be. We can broadly classify them as emergency, linear cost, fixed cost and such, but we rarely know the exact numbers (do we need to?). The only way to find out is to actually delay and see what the cost is. Once we have enough numbers we can plot cost against time. This seems to be infeasible in certain cases (for little benefit?).
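Even with only rough classifications, the cost of delay idea translates into a simple prioritization rule: rank work by cost of delay divided by duration (sometimes called CD3 or weighted shortest job first). A minimal sketch, where all feature names and numbers are invented for illustration:

```python
# Hypothetical sketch of cost-of-delay prioritization (CD3: cost of
# delay divided by duration). All names and numbers are made up.

features = [
    # (name, cost_of_delay_per_week, duration_weeks)
    ("compliance-report", 8_000, 1),
    ("checkout-redesign", 30_000, 6),
    ("search-tuning", 12_000, 3),
]

def cd3(feature):
    _, cost_of_delay, duration = feature
    return cost_of_delay / duration  # higher score = do it sooner

ranked = sorted(features, key=cd3, reverse=True)
for name, cod, dur in ranked:
    print(f"{name}: {cod / dur:,.0f} per week of effort")
```

Note how the small, cheap feature with a modest cost of delay jumps ahead of the big-ticket item: the economics, not gut feel, drive the ordering.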
Killing sacred cows
Along the way Don had the audience in splits as he went about killing some sacred cows. One of the examples was about operating an emergency room in a hospital using FIFO ordering. In this setup, a person with an ear ache gets treated while a person with a heart attack is waiting. The argument was a bit of a strawman, because I don’t think anyone really does strict FIFO in software. There is almost always a method to expedite urgent items, and most teams triage at some level. But analysis usually stops at urgent/not-urgent coupled with a gut feel, when in fact there are many axes to consider. Kanban methods promote risk-based prioritization using classes of service, which is a big step forward. (Did I mention that this is one of the focus areas of Silver Stories? No? ) So although the examples Don presented were meant to make a funny point, there is quite a kernel of truth behind them.
Points on variance
Don spent some time talking about variance, and whether it is good or bad. Don pointed out that high variance is good for product development. If you don’t have variance then you aren’t taking the risks needed to generate learning. The maximum amount of information is generated when you fail 50% of the time. I want to point out that we should be clear on the causation: taking risks leads to variation, but just because you have variation doesn’t mean you are doing something right – you could just have an unreliable process.
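Don’s claim that the maximum information is generated at a 50% failure rate has a standard information-theoretic reading: the entropy of a binary outcome peaks when both outcomes are equally likely. A quick sketch of that curve (my framing, not from the talk):

```python
import math

def bernoulli_entropy(p):
    """Bits of information per trial when the success probability is p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Information peaks at p = 0.5 and falls off toward certainty either way:
# experiments you already know the answer to teach you nothing.
for p in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(f"p = {p:.2f}: {bernoulli_entropy(p):.3f} bits")
```

A sure-thing experiment (p near 0 or 1) yields almost no learning, which is exactly why risk-free product development generates so little new knowledge.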
Don had an interesting tweet on variance – “If two runners have identical mean race times the runner with the greatest variance will win more races”. This is true, but it is slightly misleading, since most runners try to improve the mean rather than increase the variance. There is something to be said for consistency too – the great sportsmen, for example, are consistently good over a period of time (a better mean), not one-shot wonders (higher variance). Venture capitalists are perhaps the only group I can think of that intentionally tries to maximize variance.
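One way to see the effect behind the tweet is in a field of runners, where the winner is whoever posts the single best time: occasional great days count for more than bad days cost. A toy Monte Carlo simulation (all the timing numbers are invented for illustration):

```python
import random

random.seed(1)

def erratic_win_rate(trials=20_000, field_size=7):
    """One high-variance runner against a field of consistent runners.

    Every runner has the same mean time (600s); only the spread differs.
    Since the winner is the single lowest time, extreme good days matter
    more than typical days. All numbers are invented for illustration.
    """
    wins = 0
    for _ in range(trials):
        erratic = random.gauss(600, 20)  # same mean, high variance
        field = [random.gauss(600, 5) for _ in range(field_size)]
        if erratic < min(field):
            wins += 1
    return wins / trials

rate = erratic_win_rate()
print(f"erratic runner wins {rate:.0%} of races; "
      f"a fair share among 8 runners would be {1 / 8:.0%}")
```

The erratic runner wins far more than a fair one-in-eight share, even though on an average day they are no faster than anyone else.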
Another interesting comment came during the audience question on the difference between product variance and process variance. Product variance is variance on high level strategy decisions – which product and feature ideas to pursue, and so on, and the associated variation in the outcome of those decisions. I can see that variation at this level is good – that’s what generates learning on “building the right software”. This is usually an area of high uncertainty so variation is an outcome of that. Process variation is the variation in actually “building the software right”. I would imagine that variation here would be a lot smaller as it is generally much better understood. There are lots of pieces of uncertainty for sure, but a lot less than at the high level.
Standardizing the process
A very insightful comment from Don came during the Q&A when a question came up on standardizing the process. Don mentioned that most organizations tend to standardize at the top of the process. A better way would be to standardize the bottom. With a standard alphabet and a standard grammar, we can generate an infinite variety of proper sentences. Don suggested a similar method of “building” a process for each project using a standardized set of building blocks. Doing so allows you to create an infinite variety of processes to suit each situation. This is a very insightful idea, especially in the current backlash against scrum as being too prescriptive, but I want to note that languages have a well defined grammar that tells us how to put blocks together. Without the grammar, you could make an infinite variety of illegal sentences. So what’s the grammar for a software process that prevents an infinite number of bad processes? I think that is still an open question.
The Internet as a metaphor for flow
Don used the Internet as a metaphor for flow. This was a very interesting analogy. One of the main points was how we break up packets, route them independently, then reassemble them at the destination. This leads to much better throughput, and superior error handling. A very nice analogy which brought out the benefits of small batch sizes. The other point was about calculating the correct batch size. We don’t use single byte packets because the overhead is too much. So we need to find the point where we balance the transaction cost with the need for small batches. Don also pointed out the benefits of sliding window throttling to control traffic. We don’t want to wait until we are at full capacity since this leaves us open to congestion collapse when a bursty stream comes in. In terms of software development, this would translate to dynamically adjusted work in progress limits.
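The batch size point is the classic U-curve: per-batch transaction cost pushes you toward big batches, while the cost of items waiting pushes you toward small ones. A minimal sketch of that trade-off, with all cost numbers invented for illustration:

```python
def cost_per_item(batch_size, transaction_cost=100.0, holding_cost=1.0):
    """U-curve of batch size economics (all numbers invented).

    transaction_cost: fixed overhead paid once per batch (the packet
    header, the release ceremony, the regression run).
    holding_cost: cost per item per unit time spent waiting; an item
    waits on average for about half the batch, hence batch_size / 2.
    """
    overhead = transaction_cost / batch_size
    waiting = holding_cost * batch_size / 2
    return overhead + waiting

best = min(range(1, 101), key=cost_per_item)
print(f"best batch size: {best}")  # analytic optimum: sqrt(2*100/1) ~ 14
```

Tiny batches drown in overhead (the one-byte packet), huge batches drown in waiting cost; the sweet spot sits in between, and moves whenever you lower the transaction cost.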
Don also mentioned how the Internet is a “best effort” delivery network. The idea is not to prevent errors, but to simply re-request when an error is found. It turns out that re-requesting is a lot cheaper than setting up a guaranteed delivery system (like circuit switching). This ties back to the discussion on economic benefits when sometimes it is simpler to rework than to put in a lot of effort to prevent the rework.
Although it sounds new, I would say that a large part of agile rests on this principle. In agile terms, we would say it’s cheaper to deliver, collect feedback, and iterate than to spend a huge amount of time up front getting the requirements perfect before starting. In many cases it is cheaper to prevent the error, so this doesn’t mean you shouldn’t prevent errors; it means you shouldn’t blindly focus on preventing errors alone.
My thoughts: In packet routing, none of the routers process the information in the packets, so it’s easy to route them independently. In the case of work routing, each workcell does process information, and that requires context. So for example, it’s not easy to do half a feature in California, decide there is congestion there, and send the other half to Bangalore: you lose the context for the work. You see this situation, for example, in load balancing when state is stored on the server, which dictates that all requests for a particular session must always route to the same server. Similarly, it seems to me that work item routing can only be done at a much higher level of granularity – possibly at the MMF level. (Another one of the problems we are looking to solve with Silver Stories!)
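The load-balancing analogy can be sketched as sticky (session-affinity) routing: because state lives on one server, the router loses the freedom to send requests wherever capacity is available. A hypothetical sketch (server names are made up):

```python
import hashlib

SERVERS = ["us-west", "us-east", "eu-central"]  # hypothetical server pool

def route(session_id):
    """Sticky routing: hash the session id so every request in a session
    lands on the same server, because that server holds the session's
    state (the 'context' for the work)."""
    digest = hashlib.sha256(session_id.encode()).digest()
    return SERVERS[digest[0] % len(SERVERS)]

# Repeated requests for one session always hit the same server, so the
# balancer cannot reroute them mid-session when that server is congested.
print(route("session-42"), route("session-42"))
```

This is exactly the constraint context puts on work routing: once a team holds the context for a feature, you can’t cheaply reroute the remainder elsewhere.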
Don also brought up the point that people and work are not always fungible. I would think specializing generalists would help in this kind of situation.
That’s a wrap on my notes. I’m really looking forward to seeing the talk again when the video is uploaded. This is one that is worth watching multiple times, getting something new out of it each time.
Were you in the session today? Let me know your thoughts in the comments below. And do link up your blog posts too.