Robotic Process Automation versus Software Development – the case for Agile RPA

Recently I had the great opportunity to learn and then apply RPA in a real-life project. I think the method is nothing short of revolutionary and will definitely change the way we think about process improvements and, more importantly, how we actually improve processes. I also plan to write several blog posts about my experiences, as they will probably be useful, or at least entertaining, to those who embark on the same path.

I spent about 15 years of my professional life as a software developer at various big companies like Siemens or GE, so I was naturally intrigued (well, the more accurate term would be “pissed off”) by the advertisements of several RPA providers, who all claim that scripting robots is NOT software development and that it is in fact something anyone can do without previous knowledge of programming.

This claim is based on the fact that RPA systems allow one to capture user activities and replay them, so it is indeed possible to develop a script without having to “program”. Unfortunately, there are some basic problems with this view.

The first is the confusion between “typing code” and programming. The idea that clicking on icons and dragging them to various places on a screen is somehow different from, and easier than, actually writing a program looks very tempting at first sight. After all, programmers spend most of their time typing gibberish that only they understand, while business analysts and managers build presentations using pretty pictures – or so the stereotypes go. So, if we can do away with the gibberish (and with the typing), then business analysts and even managers will be able to produce robots, and we will need no expensive programmers no one understands anyway – this feels like a very tempting proposition.

Unfortunately this idea is very far from being true. At the risk of stating the obvious: programming is not about typing. Programming means describing procedures in a highly structured way, so that no other knowledge, of the kind humans have, is necessary to execute them, and, even more importantly, keeping these descriptions (aka “programs”) in a state where they can be updated, modified and generally used by people who did not participate in the development. This is absolutely necessary, as programmers tend to wander off to other projects, customers discover new needs and bugs rear their ugly heads. Programs are never finished in the sense that bridges, for instance, are, so they need constant care, which would be next to impossible if the program were not developed with this in mind.

In Software Engineering terms this means that programming is mostly concerned with the famous “ibilities” – usability, reliability and maintainability. Usability means developing a program (or script) that the customers find useful and are willing to pay for; reliability that the program runs correctly most of the time; and maintainability that the developed code is easy to understand and to modify without the risk of breaking it by introducing changes that have unforeseen effects.

Achieving these goals has absolutely nothing to do with the way a program is developed – by typing text or by assembling pretty little pictures. In this sense the message “RPA is absolutely not like programming” is wrong and probably dangerous. My uncomfortable feeling is that many companies will translate this marketing message into something like “we can now finally forget all the lessons SW Engineering learned in the last 50 years and just work spontaneously as we see fit, because THIS IS NOT PROGRAMMING”. I think the first lesson to be forgotten will be Agile development.

To be fair, Agile has already taken some hits due to the hype that has been going on for some years now. Most companies have developed an uneasy relationship with the concept, which makes it all the riskier to take the position that RPA initiatives absolutely need Agile as their development methodology. Let me show why it is a must, using the “ibilities” as examples:

Usability

How do we make sure we develop robots that do things people find useful? The only way I can imagine is to send the RPA developer to sit with the people who actually do the work that will be automated (at least partially), observe what they do and discuss with them what they need. (Remember the lean term Gemba? The Agile idea of including customers in the development team? This is the same, only ten times more necessary.) This is only possible in loops – ideas are captured, robots with minimally useful functionality implemented, and feedback from the direct customers (the office workers whose work will be made easier) collected for the next version. And the next version will be developed the next week, so that the customers see the effects of their participation immediately. This also means that in the early phases errors are tolerated or even welcome. After all, each error discovered in this phase is one error less in the final robot.

Unfortunately, if the organization is unaware of (or wilfully forgets) what we all learned about SW development (and remember, they bought into RPA with the idea that it is NOT SW), the spontaneous way to develop will be the well-known waterfall. We will send an e-mail to process experts asking them to please describe their processes in as much detail as possible, and when the spec is ready somebody who has probably never seen a live process worker, somewhere in the basement of the IT organization, or maybe somewhere even farther away, will develop a robot. The robot will be tested on a number of (ideally well-chosen) test cases and then deployed. It will also quickly fail – because requirements change, the users forgot to mention the odd extraordinary case and so on… we all know the examples from real-life projects. However, by this time the developers have other robots to develop, and the whole deployment degenerates into an acrimonious discussion of who is to be blamed for what. We have all been there and done that – and we risk starting the cycle once again.

Reliability

This “ibility” is about testing. Again, the Agile way – develop small useful chunks, immediately let them be tested by the users and repeat the cycle – is the best way of achieving it. The traditional way of developing a number of test cases and running them cold has the weakness that we never know for sure whether the test cases really cover all the eventualities and whether a “tested” SW is really safe to deploy. I once worked in a project where a team spent a year writing test cases, and when we checked later it turned out that all the cases together covered less than 20% of all eventualities. Designing tests is an important part of the SW Engineer's know-how – but remember, RPA is NOT SW? This way we risk ending up with the worst of both worlds: no timely, direct feedback from the people who use the robot, and no really usable test cases either.

Maintainability

This is where I find the marketing line that RPA is not SW the most dangerous. In the effort to sell RPA as NOT SOFTWARE, most RPA providers embrace a visual programming style. This is all very nice and easy at a marketing show, where anybody can drag up to five icons onto a screen and visually link them with nice arrows – but then real life will necessarily kick in after the purchase. It is no accident that other industries have already experimented with visual programming and then returned to text. The problem is that in a visual program a LOT of essential information is hidden in small dialog boxes attached to these nice icons. And by essential I mean things like delays before a mouse click, waiting times before a terminal message is sent, or even the input parameters given to what goes for a function in this visual paradigm. Now, imagine the simplest of maintenance actions: find the places where a timeout has a given small value and change it to something longer. In the good old text-programming world this would be a simple search-and-replace operation. Maybe there is a better way in an RPA visual program, but the only way I can see now is opening each and every program that uses this timeout and clicking through each and every icon to separately edit each dialog box. Good luck doing this with a few hundred icons (aka lines of code), and especially good luck finding people willing to do this brainless work for days on end. (Well, maybe we could write code-maintenance robots to do it.)

And this is just the tip of the iceberg, and a pretty trivial task at that. Once robots are deployed in numbers there will be many such tasks, and more complicated ones – we know from SW development that in big organizations code maintenance takes up 80% or more of a developer's time. Unless we can freeze the processes with robots down to the last mouse-click, I see no reason why this percentage should be different for robots.

So, where does this leave us? Is RPA bad for the companies?

I definitely do not think so. The message that RPA is easy and thus not like SW is, on the other hand, dangerous and damaging. If anything, we need to be more agile in RPA development than in “normal” SW development, and this definitely needs planning and organization before the deployment. Call me a maniac, but I strongly believe that Lean and its SW offshoot Agile are the answer to most of the problems. It only takes the will of the organizations to implement it – and to not fall for the siren song of “this is easy, anyone can do it” from the marketing types. It is not easy, and many will fail if they implement it mindlessly – but it has a huge potential to make life better for the people working in processes, and we know how to do it right. So, as in so many other things in life: PLAN, DO, CHECK, ACT – and reap the benefits.

 

Value Stream Analysis in a Digital World

Capturing and analysing value streams is one of the most used and liked methods in the lean process improvement methodology. In a sense we all grew up, as lean coaches, reading Mike Rother's brilliant book “Learning to See” and applying it in all possible situations. Most of us are also familiar with objections of the type “our process is far too complex for a value stream analysis to work” and have learned how to work around them, mostly by eliminating unnecessary complexity from the analysis. I think there is a consensus among us lean coaches that value streams work very well and are the most important step in analysing a process, be it manufacturing or administration.

We must recognize, though, that the easy application of a VS rests on a few premises:

  1. Each process step is executed by dedicated resources who only work in that process step
  2. The processes described by the value stream are standardized to the extent that they have little enough variation that their mean values describe the process well.

The Formula

If we are looking at a manufacturing operation, like a production line, both of these premises are almost certainly true. However, as soon as we move to administrative processes the situation starts to look a bit shakier. It is common knowledge that resources are not dedicated to a single task but have several tasks related to different value streams: e.g. a person answering customer enquiries about new products might also be responsible for handling customer complaints, someone managing finished-goods deliveries will also work on planning the production, or a maintenance engineer will classify incoming defects and also repair parts, and so on. To add insult to injury, in many of these cases the processing times will be wildly variable, ranging from minutes to days for the same type of task.

An important question for any VS specialist is how to handle these situations. One obvious answer would be to just explain the premises and fall back to “normal” process mapping like the swim lane. This approach has the downside that it represents, rather than eliminates, the “unnecessary” complexity of the process; indeed, one of the goals of such a mapping exercise is to show everyone how complex the process is, in order to create an impetus towards simplifying it. So, a swim lane is a great tool to shock the stakeholders into action, but much less suitable for actually analysing a process.

A better approach would be to extend the value stream methodology to handle deviations from the two premises. There are two steps needed to do this, mostly corresponding to each of the two premises.

If the problem is that the people working in one of the boxes of the value stream also perform several unrelated tasks, and that more than one person works on those tasks in parallel, we can extend the concept of processing time to something we call “effective processing time”: the average time between two finished products leaving the process step, provided the step is well supplied (i.e. it does not have to wait for materials, input, etc.). We have a nice formula for the general case of several resources with differing efficiency, allocated to different tasks as well. The derivation of the formula, for those interested, can be found here:

https://www.xing.com/communities/posts/institute-for-lean-six-sigma-process-excellence-lounge-1010755515

and it looks like this:

PTeff = 1 / (A1/P1 + A2/P2 + … + An/Pn)

where A1 is the fraction of time resource 1 spends working efficiently on the task related to our value stream, and P1 is the processing time – the time it takes resource 1 to accomplish the task, provided there are no interruptions.

For example, imagine we have two maintenance engineers working on repairs. The first one is more experienced, and he averages 2 hours per repair, the less experienced colleague averages 2.5 hours. However the experienced engineer will also have to manage the suppliers of spare parts which takes about 3 hours of his day, the less experienced one is only dedicated to repair jobs. What will be the effective processing time of the repair step?

Using the formula with P1=2 and P2=2.5: the first engineer can only work (8-3)/8 = 0.625 (62.5%) of his time on repairs, the less experienced one 100%. Putting it all together, the PTeff of the step will be

1/(0.625/2 + 1/2.5) = 1/0.7125 ≈ 1.4 hours.

So, about every 1.4 hours the team finishes a repair job. In a value stream map we can represent this step the same way as if we had one resource that finishes one repair every 1.4 hours. By applying the formula we managed to eliminate the complexity generated by the unequal processing times, by more than one resource in the process step and by additional tasks not related to the value stream – basically eliminating the problems related to the first premise.
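To make the formula concrete, here is a minimal Python sketch of the calculation (the function name is mine); note that (8-3)/8 works out to 0.625:

```python
def effective_processing_time(allocations, processing_times):
    """Effective processing time of a step staffed by several resources.

    allocations[i]      -- fraction of time resource i works on this step (0..1)
    processing_times[i] -- time resource i needs for one unit, uninterrupted
    """
    # Each resource contributes a rate of A_i / P_i units per time unit;
    # the step's combined rate is the sum, and PTeff is its reciprocal.
    rate = sum(a / p for a, p in zip(allocations, processing_times))
    return 1 / rate

# The two maintenance engineers: engineer 1 spends (8 - 3)/8 = 0.625 of his
# day on repairs at 2 h per repair; engineer 2 is fully dedicated at 2.5 h.
pt_eff = effective_processing_time([(8 - 3) / 8, 1.0], [2.0, 2.5])
print(round(pt_eff, 2))  # 1.4 hours per finished repair
```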

Simulation

The second premise also raises problems in practice. One of the most frequently heard comments when training lean, and especially value streams, is that this only applies to car manufacturing, exactly because they have very highly standardized processes. In cases where we have too many random influences, our analysis of the value stream will miss important effects, because we only concentrate on the average behaviour.

Another problem is that we have very little intuitive understanding of how a value stream will behave, and especially of how random effects will influence a process. This would require a dynamic view of the value stream, and our mapping is essentially static. The way out of this has been known for a long time: build and analyse a simulation of the value stream. The problem is (or rather was) that simulation software used to be expensive and specialized. As far as I know, there was no standardized and cheap way of easily building a simulation.

This changed, as so much else in statistical analysis, with the advent of R (and, to be fair, Python as well). Today we have open-source, widely used software that is a de facto standard for system simulations. This means that any value stream we build in the traditional static way can easily be transformed into a dynamic view. A dynamic view means we can build a much better intuition of what the value stream is doing, get a picture of the effects of random variations, and moreover answer hypothetical questions about how our value stream would change if we introduced specific changes in the process.

As an example I will take an interesting process proposed by an especially talented trainee group we work with: a visit to the doctor. The process has four steps, plus a background task:

  1. The nurse receives the patient and prepares the patient file for the doctor
  2. The doctor examines the patient
  3. The nurse updates the patient file
  4. The doctor signs the documents
  5. During the day random calls for future appointments by patients also have to be answered by the nurse.

As we can see, this is by no means a complex process, yet it already violates both premises. In order to map the process we need to work out the effective processing times for each step, and to do this we need some average values. In a real case these would need to be measured or estimated; for the sake of the analysis let us just assume them like this:

  1. Step 1 takes on average 2 minutes
  2. Step 2: 8 minutes
  3. Step 3: 4 minutes
  4. Step 4: 0.5 minutes
  5. A call takes on average 4 minutes, and about one call arrives every 10 minutes

We also assume one patient arriving every 9 minutes.

Using the formula from before we can calculate the effective processing times for the nurse like this. On step 1 she can spend 2/(2+4+4) = 20% of her time. The processing time is 2 minutes, so the effective processing time is 2/0.2 = 10 minutes: on average she can prepare one patient file every 10 minutes. The effective processing time of the doctor is 8.5 minutes. The standard value stream analysis tells us that patients will queue waiting for the nurse, and that there will be no queue waiting for the doctor.
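These back-of-the-envelope numbers are easy to check in a few lines (a sketch; the variable names are mine):

```python
# Quick check of the static numbers for the doctor's-office example.
# Assumed averages: step 1 = 2 min, step 2 = 8 min, step 3 = 4 min,
# step 4 = 0.5 min; calls take about 4 minutes out of every 10.

# Nurse: her time splits roughly 2 : 4 : 4 between step 1, step 3 and calls,
# so the share available for step 1 is 2 / (2 + 4 + 4) = 20%.
share_step1 = 2 / (2 + 4 + 4)
pt_eff_step1 = 2 / share_step1        # 10 minutes per patient file

# Doctor: steps 2 and 4 together take 8 + 0.5 minutes per patient.
pt_eff_doctor = 8 + 0.5               # 8.5 minutes

arrival_interval = 9                  # one patient every 9 minutes
print(pt_eff_step1 > arrival_interval)   # True: a queue builds at the nurse
print(pt_eff_doctor > arrival_interval)  # False: no queue at the doctor
```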

Can we get a better view of what is going on by using a simulation? Using the R library simmer we can easily build one and check the queue lengths over a working day. If we do not consider any randomness, it will look like Case 1 on the graph.

 

We can see that the real behaviour is a bit more complex than our static view. Even in the absence of random effects we might see a bit of a queue at the doctor, but essentially our view is correct: there is no lasting build-up at the doctor, but a steadily increasing queue at the nurse.

Now let us introduce some randomness into the process. To do this properly we would need more detailed information about the distribution of the processing times for each of these steps – that would mean detailed measurements and a longer period of data collection. However, there is a quick-and-dirty way of introducing such assumptions into a simulation: the so-called triangular distributions. These are defined by 3 numbers: the minimum, the maximum and the most frequently occurring value (aka the mode). The shape of the distribution is triangular, so we will miss the finer details, but for a first impression the details are generally not that important, and they can be refined in later steps if necessary.

Let us take an example: assume that the doctor's exam time varies according to the triangular distribution (5, 14, 8). The mean time would then be 9 minutes per exam, with variations between 5 and 14 minutes. Let us also assume that the patients arrive randomly, with interarrival times described by the distribution (5, 15, 8), that is, on average one patient every 9.33 minutes. To keep things simple, let us assume the nurse works in a standardized way, that is, her working time is constant in each phase, with no variation.
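The dynamic view here was built with R's simmer; to show the idea without it, here is a minimal back-of-the-envelope sketch in Python (my own toy model, doctor only, ignoring the nurse), using the triangular distributions assumed above:

```python
import random

random.seed(1)  # make the run reproducible

def simulate_doctor_queue(n_patients=50):
    """Single-server queue: patients arrive and are examined by the doctor.

    Interarrival times ~ triangular(min=5, max=15, mode=8) minutes and
    exam times ~ triangular(min=5, max=14, mode=8) minutes, as assumed
    in the text. Returns each patient's waiting time in minutes.
    """
    t_arrival = 0.0
    doctor_free_at = 0.0
    waits = []
    for _ in range(n_patients):
        t_arrival += random.triangular(5, 15, 8)
        start = max(t_arrival, doctor_free_at)  # wait if the doctor is busy
        waits.append(start - t_arrival)
        doctor_free_at = start + random.triangular(5, 14, 8)
    return waits

waits = simulate_doctor_queue()
print(round(max(waits), 1))  # longest wait of the day; varies run to run without the seed
```

Each run gives a different picture, which is exactly the point: even though the average exam time (9 minutes) is below the average interarrival time (9.33 minutes), a run of slow exams or early arrivals can build a temporary queue.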

Now that we have introduced some randomness into the simulation, we will get a different picture at each run. One example can be seen in Case 2. Even though, statically seen, the doctor has time, we see a build-up developing around mid-day at the doctor. This is purely bad luck: the doctor had a few random patients who took longer and/or arrived earlier than expected. This effect is hard to predict from the static value stream alone. The nurse is still overworked – she can finish one patient in 10 minutes (all phases considered) while patients arrive every 9.3 minutes, so that by the end of the day we predictably have a queue in front of the nurse. The doctor, however, managed to eliminate the queue, which was to be expected over the longer term.

 

Just to illustrate how this analysis could go on, let us consider the idea of outsourcing the incoming calls and letting the nurse work only with the patients. The result is seen in Case 3. By applying the formulas we could have more or less predicted this result, but it is still nice to see our prediction realised.

Now there is room for further ideas. Obviously the new bottleneck is the doctor, so what would we need to do in order to reduce waiting times and queuing?

The above is just a simple example of combining a more detailed value stream analysis with simulations. But imagine the power of this method in a real workshop where we have the people actually working on the process, together with an analyst and a simulation, where we could try ideas and hypotheses on the fly, coming up with new scenarios and quickly answering questions like the one above. This would be a whole new level of understanding the processes we work with, and we should definitely apply the method as a new standard of digital value stream analysis.

 

Designed Experiment to Re-engage Silent Customers

In the spring I had a chance to work on a project with a very special problem. We had to convince the customers of an energy company to stay at home for a day, so that the company could upgrade a meter in their home. The problem was special because the upgrade was mandated by government policy but offered basically no advantages to the customers.

Obviously this is a great challenge for the customer care organization – they need to contact as many customers as they can and convince them to take a day off and wait at home for the upgrade. The organization needs to send out huge numbers of messages in the hope that enough customers will react to them. This necessarily means that we also get a great number of so-called “silent customers” – people who decide not to react to our first message in any way.

As we obviously do not have an infinite number of customers to convince, silent customers have great value – at least they have not said no yet. The question is how to make them respond. If we learn how to activate at least some of them, we can use this knowledge in the first contact message and make our communication more effective.

The problem is of more general interest than this special project – just think of NGOs who depend on donors. Learning how to make prospective donors more interested at the first contact has a very definite advantage for them as well.

So, how do we go about this? Coming from the Lean/Six Sigma world, our first idea was to actually LEARN what is of interest to the customers. Previously there had been many discussions, and many hypotheses were floating around, mostly based on personal experiences and introspection. Some had already been tried, but none were really successful.

We changed the game by first admitting that we did not know what is of interest to our customer base – the customers had wildly differing demographic, age and income profiles, which made all these discussions quite difficult. Once we admit ignorance, though (not an easy thing to do, BTW), our task becomes much simpler. Just one question is left in the room – how do we learn what the customer preferences are? – instead of the many questions we used to have, along the lines of “how do we interest hipsters?” or “how do we interest families with small children?” and so on. Coming from the Lean Six Sigma world, there is just one answer to this question: we run a designed experiment to find out.

It is important to realize that we run the experiment to LEARN and not to improve anything. Confusing the two is an error in industrial settings as well, but in this project managing the expectations was even more important. As we stuck to our goal of learning about the customer, designing the experiment became much simpler, because we avoided useless discussions about what would be beneficial and what not. Every time an objection came up about the possible usefulness of an experimental setting, we could just give our standard answer: we do not know, but if you are right it will be shown by the experiment.

As we went on designing the experiment, we realized that we only needed (and were allowed) to use two factors: communication channel and message type. All the previously so bothersome issues of age distribution, locality and such we solved by requiring large random samples across all these factors. Having large samples was, unlike in manufacturing, no problem at all: we could decide to send an e-mail to a thousand customers or two thousand without any great difficulty or cost. As we were expecting weak effects anyway, large sample sizes were essential to the success of the experiment.

Finally we decided on the following: we used two communication channels, e-mail and SMS, and three message types. One message targeted the geeks by describing how much cooler the new meter is, one targeted the greens by describing how the new meters contribute to saving the environment, and one appealed to our natural laziness by describing how much easier the new meter will be to read. So, in the end we had a 2×3 design: two channels times three message types. And this is where our problems started.

Customer contacts are different from settings on a complex machine in the sense that everybody has an opinion about them, and for the machine you do not need to talk to the legal and marketing departments before changing a setting. We had several weeks of difficult negotiations trying to convince every real or imagined stakeholder that what we intended to do would not harm the company – and at every level it would have been far easier to just give up than to trudge on. It is a tribute to the negotiation skills and commitment of our team members that we managed to actually run the experiment. I rather think that this political hassle is the single greatest reason why we do not see more experiments done in customer-related businesses.

For 3 weeks we sent, every week, about 800 e-mails and about 300 SMSes per message type. We had several choices about how to measure the results. For the e-mails we could count how many customers actually clicked on the link to the company web-site, but for the SMSes it was only possible to see whether a customer chose to book an appointment or not. This was definitely not optimal, because except for the e-mails we could not directly measure the efficiency of the messages. To put it simply, whether a customer clicks on the link in the message is mostly influenced by the message content, while whether the customer books an appointment depends on many other factors. This is where randomization helps – with our sample sizes and randomization we could hope that these other factors would statistically cancel each other out, so that the effect of the message would still be visible, if a little more dimly.

Our results were finally worth the effort. The first learning was that basically no-one reacted to the SMS messages. Looking back, this has a quite clear explanation – our message directed the recipient to click on a link to the company web-site, and people are generally much more reluctant to open a web-site on a mobile phone than on a computer (at least that is what I think). The fact is, our SMSes were completely unsuccessful, despite being more expensive than the e-mails.

On the e-mails we had a response of 3.5–4% for the ones appealing to natural laziness, as compared to less than 2% for the other message types. As the contacted people were silent customers, who had once already decided to ignore our message, getting around 4% of them to answer was a sizeable success. With the sample sizes we had, proving statistical significance was a no-brainer.
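For the curious, the significance check is a standard two-proportion z-test; a sketch below, where the counts are my own illustrative assumptions (the exact numbers are not in the text):

```python
import math

def two_proportion_z(hits1, n1, hits2, n2):
    """z-statistic for comparing two response rates (pooled standard error)."""
    p1, p2 = hits1 / n1, hits2 / n2
    p_pool = (hits1 + hits2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Illustrative counts: roughly 2400 e-mails per message type over three
# weeks, 4% response for the "laziness" message versus 2% for another type.
z = two_proportion_z(96, 2400, 48, 2400)
print(round(z, 2))  # ~4.06, well above the 1.96 cutoff for 5% significance
```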

The fly in the ointment was that we failed to translate these clicks into confirmed appointments – we had basically the same, very low percentage of confirmations irrespective of channel or message type. Does this mean that our experiment failed to identify any possible improvement? At the risk of being self-defensive here, I would say that it does not. Making a binding confirmation depends on many factors outside the first priming message we were experimenting with. The content of the web-site our customers go to, to mention just one, should be in sync with the priming message, which was not the case here. So, the experiment delivered valuable knowledge about how we can make a customer come to our web-site, but not about how to convert that interest into an appointment – and this is OK. The former was exactly what we set out to investigate. As mentioned before, managing expectations is a very important element here.

What would be the next steps? Obviously we would need to set up a new experiment to investigate what factors impact the customers' willingness to accept our offer. I am certain that this is what the team will do in the next phase – after all, we learned quite a lot about our customers with ridiculously low effort (excepting the negotiations), so why not keep on learning?

Theory of Constraints meets Big Data part 2

I would like to continue the story of the hunt for the constraint, using a lot of historical data and the invaluable expertise of the local team. There is a lot of hype around big data and data being the new oil – and there is also a lot of truth in it. However, I find that ultimately the success of a data mining operation depends on the intimate process knowledge of the team. The local team will generally not have the expertise to mine the data using the appropriate tools, which is absolutely OK, given that data mining is not their daily job. On the other hand, a data specialist will be absolutely blind to the fine points of the operation of the process – so cooperation is an absolute must to achieve results. The story of our hunt for the constraint illustrates this point nicely, in my opinion.

After having found proof that we had a bottleneck in the process, our task was to find it, or at least gain as much knowledge about its nature as possible. This might seem an easy task for hardcore ToC practitioners in manufacturing, where the constraint is generally a process step or even a physical entity, such as a machine. In our process of 4 different regions, about 100 engineers per region, intricate long- and short-term planning and erratic customer behaviour, few of the known methods for finding the bottleneck seemed relevant. For starters, there was no shop-floor we could have visited and no WIP lying around giving us clues about the location of the bottleneck. The behaviour of all regions seemed quite similar, which pointed us in the direction of a systematic or policy constraint. I have read much about those, but a procedure for identifying one was sorely missing from my reading list.

So, we went back to our standard behaviour in process improvements: “when you do not know what to do, learn more about the process”. A hard-core lean practitioner would have instructed us to go to the Gemba, which, I have no doubt, would have provided us with adequate knowledge in time. But we did not have enough time, so our idea was to learn more about the process by building a model of it. This is nicely in line with the CRISP-DM methodology, and it was also our only possibility given the short time we had to complete the job.

The idea (or maybe I should call it a bet) was to build a well-behaved statistical model of the installation process and then check the residuals. If we had a constraint, we would either be able to identify it with the model or (even better) observe that the actual numbers were always below the model predictions, which would let us pinpoint where and how the bottleneck manifests itself.

Using the tidyverse (https://www.tidyverse.org/) packages from R it was easy to summarize the daily data into weekly averages. Then, taking the simplest approach, we built a linear regression model. After some tweaking and adjusting we came up with a model that had an amazing 96.5% adjusted R-squared value, with 4 variables. Such high R-squared values are in fact bad news in themselves – they are an almost certain sign of overfitting, that is, of our model tracking the data too faithfully, incorporating even random fluctuations. To test this we used the model to predict the number of successful installs in Q1 2018. If we had overfitted the 2017 data then the 2018 predictions should be off the mark – goodness knows, there was enough random fluctuation in 2017 to lead the model astray.
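The analysis itself was done in R with the tidyverse; purely as an illustration of the same workflow (fit an ordinary least squares model, compute R-squared, then check predictions on unseen data), here is a minimal Python sketch on made-up data. All variable names and coefficients are invented for the example, not the real model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the real installation data: weekly averages of
# two process drivers plus noise (these relationships are invented).
n_weeks = 52
utilisation = rng.uniform(20, 40, n_weeks)    # avg jobs per operator per week
access_rate = rng.uniform(0.5, 0.9, n_weeks)  # share of customers granting access
successes = 3.0 * utilisation + 200.0 * access_rate + rng.normal(0, 5, n_weeks)

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n_weeks), utilisation, access_rate])
beta, *_ = np.linalg.lstsq(X, successes, rcond=None)

# R-squared on the training year (the "2017" data).
pred = X @ beta
ss_res = np.sum((successes - pred) ** 2)
ss_tot = np.sum((successes - successes.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Overfitting check: predict a fresh quarter ("Q1 2018") the model never saw.
u_new = rng.uniform(20, 40, 13)
a_new = rng.uniform(0.5, 0.9, 13)
y_new = 3.0 * u_new + 200.0 * a_new + rng.normal(0, 5, 13)
pred_new = np.column_stack([np.ones(13), u_new, a_new]) @ beta
rel_err = np.abs(pred_new - y_new) / y_new
```

If the out-of-sample relative errors stay small, the high R-squared reflects real structure rather than overfitting – the same logic the Q1 2018 check relied on.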

But we were lucky – our predictions fit the new data to within +/- 5%. This meant that the fundamental process did not change between 2017 and 2018, and also that our model was good enough to be investigated for the bottleneck. Looking at the variables we used, we saw that two had a large impact and were process related – the average number of jobs an operator is given per week, and the percentage of cases where an operator was given access to the meter by the customer. The first was a thinly disguised measure of the utilisation of our capacity and the other a measure of the quality of our “raw material” – the customers. Looking at this with a process eye, we found a less than earth-shaking conclusion – for a high success rate we need high utilisation and high quality raw materials.

Looking at the model in more detail we found another consequence – there were many different combinations of these two parameters that led to the same number of successes: low utilisation combined with high quality was just as successful as high utilisation combined with much lower quality. If we plotted the contour lines of equal numbers of successes we got, unsurprisingly, a number of parallel straight lines moving from the lower left corner to the upper right corner of the graph. This delivered the message – again, not an earth-shaking discovery – that in order to increase the number of successes we need to increase utilisation AND quality at the same time.
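The geometry of those parallel lines follows directly from the linearity of the model. A tiny sketch with invented coefficients (not the real ones) shows how two very different operating points can sit on the same iso-success line:

```python
# With a linear model  successes = b0 + b1*utilisation + b2*quality,
# every (u, q) pair on the line  b1*u + b2*q = const  gets the same
# prediction - so the contours of equal success are parallel straight lines.
b0, b1, b2 = 10.0, 3.0, 200.0   # made-up coefficients for illustration

def predicted_successes(utilisation, quality):
    return b0 + b1 * utilisation + b2 * quality

# Two very different operating points on the same iso-success line:
low_util_high_quality = predicted_successes(20.0, 0.90)   # 10 + 60 + 180 = 250
high_util_low_quality = predicted_successes(40.0, 0.60)   # 10 + 120 + 120 = 250
```

Sliding along such a line trades utilisation against quality without changing the outcome – exactly the behaviour the weekly data turned out to show.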

To me, the real jaw-dropping surprise came when we plotted the weekly data from 2017 over this graph of parallel lines. All weekly performance data for the whole of 2017 (and 2018) moved parallel to one of the constant-success lines. This meant that all the different improvements and ideas tried during the whole year were either improving utilisation while reducing quality, or improving quality while reducing utilisation – sliding up and down along a line of a constant number of successes (see attached graph).

This is a clear case of a policy constraint – there is no physical law forcing the process to move along that single line (well, two lines actually), but there is something that forces the company to stay there. As long as the policies keep the operation on these lines, it will look exactly the same as a physical constraint.

This is about the most we can achieve with data analysis. The job is not yet done – the most important step is now for the local team to identify the policy constraint and to move the company from sliding parallel to the constant-success lines to a mode where they move perpendicular to them. We can provide the data, the models and the graphs, but now we need passion, convincing power and commitment – and this is the way data mining can actually deliver on the hype. In the end it is about people able and willing to change the way a company operates, and about the company empowering them to investigate, draw conclusions and implement the right changes. So, business as usual in the process improvement world.


Theory of Constraints meets Big Data

The Theory of Constraints is the oldest and probably the simplest (and most logical) of the great process optimization methodologies. One must also add that it is probably the most difficult to sell nowadays, as everyone has already heard about it and is also convinced that for their particular operation it is not applicable. Most often we hear the remark “we have dynamic constraints”, meaning that the constraint is randomly moving from one place in the process to another. Given that the ToC postulates one fixed constraint in any process, clearly the method is not applicable to such complex operations. This is an easily refutable argument, though it undoubtedly points to a missing link in the original theory: if there is too much random variation in the process steps, this variation will generate fake bottlenecks that seem to move unpredictably from one part of the process to another. Obviously, we need a more standardized process with less variation in the steps to even recognize where the true bottleneck is, and this leads us directly to Lean with its emphasis on Mura reduction (no typo – Mura is the excessive variation in the process, recognized as just as bad as its better-known counterpart, Muda). This probably eliminates, or at least reduces, the need to directly apply the Theory of Constraints as a first step.

There are other situations as well. Recently I was working for a large utilities company on a project where they need to gain access to their customers’ homes to execute an upgrade of a meter – an obligation prescribed by law. So, the process starts with convincing customers to grant access to their site and actually be present during the upgrade, then allocating the job to an operator with sufficient technical knowledge, getting the operator to the site on time and executing the necessary work. There is a lot of locality- and time-based variation in this process – different regions have different demographics that react differently to the request for access, and people tend to be more willing to grant access to the operator outside working hours, but not too late in the day, and so on.


On the other hand, this process looks like a textbook example of the Theory of Constraints: we have a clear goal defined by the law, to upgrade X amount of meters in two years. Given a clear goal, the next question will be: what is keeping us from reaching this goal? Whatever we identify here will be our bottleneck, and once the bottleneck is identified we can apply the famous 5 improvement steps of the ToC:

1. Identify the constraint

2. Exploit the constraint

3. Subordinate all processes to the constraint

4. Elevate the constraint

5. Go back to step 1

In a traditional, very much silo-based organization, steps 1-3 would already be very valuable. By observing the processes in their actual state we already saw that each silo was working hard on improving their part of the process. We literally had tens of uncoordinated improvement initiatives per silo, all trying their best to move closer to the goal. The problem with this understandable approach is nicely summarized in the ToC principle: any improvement at a non-constraint is nothing but an illusion. As long as we do not know where the bottleneck is, running around starting improvement projects will be a satisfying but vain activity. It is clearly a difficult message to send concerned managers that their efforts are mostly generating illusions, but I believe this is a necessary first step in getting to a culture of process (as opposed to silo) management.

The obvious first requirement, then, is to find the bottleneck. In a production environment we would most probably start with a standardization initiative to eliminate the Mura – to clear the smoke-screen that does not allow us to see. But what can we do in a geographically and organizationally diverse, huge organization? In this case our lucky break was that the organization already collected huge amounts of data – and this is where my second theme, “big data”, comes in. One of the advantages of having a lot of data points – several hundred per region per month – is that smaller individual random variations will be evened out, and even in the presence of Mura we might be able to see the most important patterns.

In this case the first basic question was: “do we have a bottleneck?” This might seem funny to someone steeped in ToC, but in practice people need positive proof that a bottleneck exists in their process – or, to put it differently, that the ToC concepts are applicable. Having a large and varied dataset, we could start with several rounds of exploratory data analysis to find the signature of the bottleneck. Exploratory data analysis means that we run through many cycles of looking at the process in detail, setting up a hypothesis, trying to find proof of the hypothesis and repeating the cycle. The proof is at the beginning mostly of a graphical nature – in short, we try to find a representation that tells the story in an easy-to-interpret way, without worrying too much about statistical significance.

In order to run these cycles there are a few prerequisites in terms of people and tools. We need some team members who know the processes deeply and are not caught in the traditional silo-thinking. They should also be open and able to interpret and translate the graphs for the benefit of others. We also need at least one team member who can handle the data analysis part – someone with a good knowledge of the different graphical possibilities and experience with telling a story through data. And finally we need the right tools to do the work.

In terms of tools, I have found that Excel is singularly ill-suited to this task – it handles several hundreds of thousands of rows badly (loading, saving and searching all take ages) and its graphical capabilities are poor and cumbersome. For a task like this I use R with the “tidyverse” packages and, of course, the ggplot2 graphics library. This is a very handy and fast environment – using pipes with a few well-chosen filtering and processing functions and directing the output directly to the ggplot graphics system allows the generation of hypotheses and publication-quality graphs on the fly, during a discussion with the process experts. It does have its charm to have a process expert announce a hypothesis and to show a high-quality graph testing that hypothesis within one or two minutes of the announcement. It is also the only practical way to proceed in such a case.
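The filter-then-group-then-summarise pipeline described here is R/tidyverse; for readers who prefer Python, a roughly analogous sketch with pandas looks like this (the data and column names are invented stand-ins for the real job-level records):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical stand-in for the raw job-level data.
df = pd.DataFrame({
    "region": rng.choice(["North", "South", "East", "West"], 5000),
    "week": rng.integers(1, 53, 5000),
    "access_granted": rng.random(5000) < 0.7,
    "install_success": rng.random(5000) < 0.6,
})

# The filter -> group -> summarise pipeline, pandas-style:
weekly = (
    df[df["access_granted"]]                     # keep jobs where access was granted
    .groupby(["region", "week"], as_index=False)
    .agg(success_rate=("install_success", "mean"),
         jobs=("install_success", "size"))
)
```

From here, `weekly` can be handed straight to a plotting library, much as the R pipeline feeds ggplot2 directly.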

Most of the hypotheses and graphs end up on the dung-heap of history, but some will not. They become the proofs that we do have a bottleneck, and bring us closer to identifying it. Once we are close enough we can take the second step in the exploratory data analysis and complete a first CRISP-DM cycle (https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining) by building a statistical model and generating predictions. If we are lucky, our predictions will overestimate our performance in terms of the goal – thus pointing towards a limiting factor (aka bottleneck), because we achieve LESS than what would be expected based on the model. Once here, we try some new, more concrete hypotheses, generate new graphs and models, and see how close we get to the bottleneck.

So, where are we in real life today? In this concrete example we are through the first cycle and our latest model, though overoptimistic, predicts the performance towards the goal to within -10%. We are at the second iteration now, trying to find the last piece of the puzzle that will give us the full picture – and of course we already have a few hypotheses.

In conclusion – I think that the oldest and most venerable process optimization methodology might get a new infusion of life by adopting the most modern and up-to-date one. This is a development to watch out for, and I will definitely keep my fingers crossed.


Rapid Action Workouts – Lean for NGOs

There are situations where we would like to improve a process but we do not have the luxury of working inside a full-fledged lean initiative. This means, most of all, that we cannot build on previous trainings, lean awareness or a changing culture that the teams know about. Also, in these cases, the expectation is to achieve rapid successes, as the effort of getting the teams together cannot be justified by long-term positive evolution alone. In short, the activity has to pay for itself.


In my experience, these situations can arise in two ways – either there is a simple need to improve one process in the organization, which has not (yet) thought about a long-term improvement initiative, or the exercise is intended by a promoter of a lean initiative as an appetizer to convince the organization to start a deeper and more serious lean effort. Either way, it is important to be successful in the allocated short time.

To respond to this need, we at ifss (www.ifss.net) developed a rapid process improvement methodology. The methodology addresses several of the constraints we see in this scenario:

  1. The teams are not trained, indeed, not even aware of the lean way of thinking and solving problems
  2. The costs of the workshop need to be minimal, so the action needs to be fast

Our idea is to select the minimal effective subset of the lean toolset. Each day starts with a short training (short meaning a maximum of 1 hour) focusing only on the lean tools that will be needed that day. The rest of the day is spent applying the tools the team learned that day to the problem that needs to be solved. The whole day the team has access to the coach, but they have to apply the tools themselves. At the end of the day the results are summarized and the roadmap for the next day is discussed.

Of course, for this to work, problem selection and expectation management are key. As such, the coach has to work with the organization to understand the problem before the RAW and to help the organization select an appropriate one. It would be totally disrespectful to assume that we, as lean coaches, can solve any organizational problem within a workshop of 4 days, but in most cases we can suggest improvements, achieve team buy-in and design a roadmap to be followed. Thus, we must work with the organization to define a problem where this improvement justifies the effort of organizing the workshop. In optimal cases we have the required tools to help them – Intermediate Objectives Maps or Prioritization Matrices, to name just a few. Nevertheless, the ultimate decision, and the most important one at that, is in the end the responsibility of the target organization.

The second step for the coach is to select the right tools for the RAW workshop. This can be, in theory, different for each client and problem. In practice we have a set of tools that can be utilized well in many different situations – SIPOC, Process Mapping, Root Cause Analysis, Future State Definition, Risk Analysis and Improvement Plan will (in this order) generally work. I capitalized the methods, much like chapter titles, because we have a fair number of different techniques for each “chapter”, and the coach has to pick the one that is best suited to the problem and the team.

For Root Cause Analysis, for example, the coach might pick an Ishikawa diagram if she judges the causes to be simple (and uncontroversial), or dig deep with an Apollo Chart if not. Of course, the training for the day the team starts to apply the tool will have to be adapted to the choice the coach made.

Because we generally do not get to finish all the actions, and we definitely aim for a sustained improvement effort, I always discuss PDCAs as well – and make sure that the team defines a rhythm in which the PDCA cycles will be performed and presented to local management.

This is all nice in theory, but does it really work? I had the privilege to work for several years with two NGOs on improving processes in Africa and, recently, in the Middle East. The constraints I mentioned above apply very strongly to them, and I found that this approach of combining minimal training with process improvement work met with enthusiastic support and was successful. So, hopefully we will be able to refine this approach further in the future.

Learning from a Line simulation

In this installment I would like to describe a first set of small experiments we can run using our simulation program, and show what we can learn from them. This is by no means an exhaustive list – it serves to encourage you to try it out for yourselves.

First a quick description of the program: in order to make the effects more visible we made some changes to the original dice game, strengthening the difference between the high- and low-variance cases. Now, in the simulation, each workstation will process up to 50 pieces per round, the actual processed number being uniformly distributed between 1 and 50 in the high-variance case and between 24 and 27 in the low-variance case.

The average number of processed parts and the variance are, using the formulae for the uniform distribution, mean = (a+b)/2 and variance = (b-a)^2/12, where a and b are the endpoints of the distribution. This gives us a mean of (1+50)/2 = 25.5 for the high-variance case and a mean of (24+27)/2 = 25.5 for the low-variance case. The variances, however, differ – we get V = (50-1)^2/12 ≈ 200 for the high-variance and (27-24)^2/12 = 0.75 for the low-variance case.
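These numbers are quick to verify in code. A small sketch (note that the formulae are the continuous-uniform ones; for integer-valued dice throws the exact discrete variance is slightly higher, but the high/low contrast is the same):

```python
# Continuous-uniform formulae: mean = (a+b)/2, variance = (b-a)^2/12.
def uniform_mean(a, b):
    return (a + b) / 2

def uniform_variance(a, b):
    return (b - a) ** 2 / 12

# High-variance case: pieces uniform on [1, 50].
high_mean = uniform_mean(1, 50)        # 25.5
high_var = uniform_variance(1, 50)     # 2401/12, roughly 200

# Low-variance case: pieces uniform on [24, 27].
low_mean = uniform_mean(24, 27)        # 25.5
low_var = uniform_variance(24, 27)     # 9/12 = 0.75
```

Both settings deliver the same average of 25.5 pieces per round; only the spread around that average differs, by a factor of more than 250.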

The simulation has a graphical UI that is shown in the attached graph. We can select the number of workstations and the number of simulation runs in increments of 500. We can also choose the line feed strategy – that is, the way new orders come into the system. Random Feed means that they come in the same way, uniformly distributed with high variance between 1 and 50, while Regular Mean Feed means that we feed the line with the average number of orders the line can handle, 25.5 per run, with low variance.

There is also a button called Re-run, which simply creates a new run with the same settings as the previous one. For our first experiments this is the most important button, because by re-running the same setup we can really see the influence of randomness.

We have 3 output graphs:

  1. The WIP for each run of the simulation. This is the number of work items that are somewhere inside the production line at the end of a run – the number of semi-finished goods.
  2. A graph describing the throughput of the line. This is a bit more tricky – in order to see the differences clearly we do not display the number of finished pieces per run but the deviation from the expected number of finished goods per run. So a value of 0 roughly means 25 finished pieces, a value of 20 means 25+20=45 finished pieces, while a value of -20 means 5 finished pieces. While this takes a bit of getting used to, this graph shows the deviations from the expectation much more clearly than the raw numbers.
  3. The WIP at each workstation at the end of the simulation. This can give us an idea of where semi-finished goods accumulate, and this is often proposed as a way of finding the bottleneck on a line. We will soon know better.
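For readers without access to the simulation UI, the core loop can be sketched in a few lines of Python. This is a simplified stand-in, not the original program: each round, new orders enter the first buffer, and each station processes the minimum of its random capacity and its buffer content.

```python
import random

def simulate_line(n_stations=5, n_runs=1000, low=1, high=50, seed=None):
    """Balanced line with uniform random capacities and random feed.

    Returns the final WIP in front of each station and the number of
    finished pieces per run (from which the deviation graph is derived).
    """
    rng = random.Random(seed)
    buffers = [0] * n_stations          # WIP waiting in front of each station
    finished_per_run = []
    for _ in range(n_runs):
        buffers[0] += rng.randint(low, high)      # random feed of new orders
        for i in range(n_stations):
            capacity = rng.randint(low, high)     # this round's dice throw
            processed = min(capacity, buffers[i]) # can't process what isn't there
            buffers[i] -= processed
            if i + 1 < n_stations:
                buffers[i + 1] += processed       # pass work downstream
            else:
                finished = processed              # last station: finished goods
        finished_per_run.append(finished)
    return buffers, finished_per_run

wip, finished = simulate_line(seed=42)
total_wip = sum(wip)
```

Calling `simulate_line` repeatedly with different seeds is the equivalent of pressing Re-run: the same "balanced" line, wildly different WIP pictures each time.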

So, let us run a first simulation! I set the number of workstations to 5, for 1000 runs. I am interested in the effects of randomness, so I pick the worst scenario – highly variable demand and highly variable machines. The output is on the graph.

Remember, from a traditional point of view this is a nicely balanced line. Still, after 1000 runs, we have almost 2000 pieces scattered around production (graph 1) and our throughput was more or less unpredictable, between 20% (5 pieces) and 180% (45 pieces) of the expected volume. Obviously we had some good and some bad days, without any chance of predicting what the next day would bring. By looking at the last graph we can readily identify the bottleneck – workstation 2, with workstation 3 trailing quite close behind it. So, to improve the line performance we need to focus on these two, right?

Before investing resources in improvement work, press the Re-run button. You will obviously have different results, but very likely something similar to what I see now will happen – the bottleneck at workstation 2 disappeared and this workstation now has the best performance, while workstation 1 accumulated the most WIP. Obviously the threat of starting an improvement project at their workstation worked wonders – however, at workstation 1 people became complacent and need more supervision. Also, the WIP in general has evolved in a much better way: we had a long stretch of runs where the WIP actually decreased, and we are now at about 1500 pieces instead of 2000!

So, let us press Re-run again: the WIP trend is a complete mess now, steadily increasing to about 2500 at the end of the runs, and workstation 2 started to accumulate more WIP – however, workstation 1 is still the black sheep. Press again: workstation 1 has improved, however workstation 2 became the bottleneck again, while our focus on the overall WIP trend paid off and we are again down at about 1500 pieces.

The one thing that did not change across all these simulations was the daily performance – it stayed just as variable in all 3 re-runs.

Of course this is all a joke – there is no way the simulation would react to the threat of an improvement project. What we see is the effect of random variations in the line – and the most visible one is the random movement of the “bottleneck”. This, by the way, is the core problem of the Theory of Constraints methodology – we hear very often that a line does not have a single bottleneck but “moving bottlenecks”, which are in fact the effect of random variations in the line.

The second visible effect is the accumulation of WIP. We would naively expect no WIP, or very little, as the line is balanced on average. In reality the random variations cause ever larger queues at random locations in our balanced line, and the prediction of the factory physics formulae is that the WIP will continuously increase as a result. Looking at the graphs, it does seem that the WIP is levelling off and stabilizing at around 2000 pieces, but this is an illusion. If you increase the number of runs to 2000 or more you will see that the increase is continuous.

The third, even more worrying effect is the high variability of the throughput. This line, though balanced, can give no guarantee of the number of pieces produced per day (run), despite the best efforts of the production people. So, what will happen in real life? Important orders will be moved forward (expedited) while less important ones will stay in the WIP – all this will create even more randomness and increase the negative effects we already see. I think this is not unlike many situations we see in production lines today.

As we are not trained to discern random effects we will necessarily come up with explanations for purely random phenomena – like my example of the threat of running an improvement project having positive effects or people becoming complacent.  This is also something we see very often in real life.

So, how would you investigate our line further? Please feel free to try.