Designed Experiment to Re-engage Silent Customers

In the spring I had a chance to work on a project with a very special problem. We had to convince the customers of an energy company to stay at home for a day, so that the company could upgrade a meter in their home. The problem was special because the upgrade was mandated by government policy, but offered hardly any advantage to the customers.

Obviously this is a great challenge for the customer care organization – they need to contact as many customers as they can and convince them to take a day off and wait at home for the upgrade. The organization needs to send out huge numbers of messages in the hope that enough customers will react to them. This necessarily means that we also get a great number of so-called “silent customers” – people who decide not to react to our first message in any way.

As we obviously do not have an infinite number of customers to convince, silent customers have great value – at least they have not said no yet. The question is: how do we make them respond? If we learn how to activate at least some of them, we can use this knowledge in the first contact message and make our communication more effective.

The problem is of more general interest than this particular project – just think of NGOs who depend on donors. Learning how to make prospective donors more interested at the first contact has a very definite advantage for them as well.

So, how do we go about this? Coming from the Lean/Six Sigma world, our first idea was to actually LEARN what is of interest to the customers. Previously there had been many discussions and many hypotheses floating around, mostly based on personal experiences and introspection. Some had already been tried, but none were really successful.

We changed the game by first admitting that we did not know what is of interest to our customer base – they had wildly differing demographic, age and income profiles, which made all these discussions quite difficult. Once we admit ignorance though (not an easy thing to do, BTW), our task becomes much simpler. There is just one question left in the room: how do we learn what the customer preferences are – instead of the many questions we used to have along the lines of “how do we interest hipsters, or families with small children?” and so on. Coming from the Lean/Six Sigma world there is just one answer to this question: we run a designed experiment to find out.

It is important to realize that we run the experiment to LEARN and not to improve anything. Mixing these two up is a common error in industrial settings as well, but in this project managing the expectations was even more important. As we stuck to our goal of learning about the customer, designing the experiment became much simpler, because we avoided useless discussions about what would be beneficial and what not. Every time an objection came up about the possible usefulness of an experimental setting we could just give our standard answer: we do not know, but if you are right it will be proven by the experiment.

As we went on designing the experiment we realized that we only needed (and were only allowed) to use two factors: communication channels and message types. All the previously so bothersome issues of age distribution, locality and such we solved by requiring large random samples across all these factors. Having large samples was, unlike in manufacturing, no problem at all. We could decide to send an email to a thousand customers or two thousand without any great difficulty or cost. As we were expecting weak effects anyway, having large sample sizes was essential to the success of the experiment.

Finally we decided on the following: we used two communication channels, e-mail and SMS, and three message types. One message targeted the geeks by describing how much cooler the new meter is, one targeted the greens by describing how the new meters contribute to saving the environment, and one appealed to our natural laziness by describing how much easier it will be to read the meter. So, in the end we had a 2x3 design: two channels times three message types. And this is where our problems started.
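For illustration, here is a minimal sketch in R of how such a 2x3 random assignment could be set up – the customer pool, the IDs and the equal group sizes are invented, not the ones we actually used:

# A minimal sketch of a 2x3 random assignment; all numbers are invented.
set.seed(42)

design <- expand.grid(channel = c("e-mail", "SMS"),
                      message = c("geek", "green", "lazy"))

silent_customers <- data.frame(id = sample(1:50000, 6000))   # hypothetical pool

# assign each sampled customer to one of the six design cells at random
cell <- sample(rep(seq_len(nrow(design)), length.out = nrow(silent_customers)))
silent_customers <- cbind(silent_customers, design[cell, ])

table(silent_customers$channel, silent_customers$message)    # check the cell sizes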

Customer contacts are different from settings on a complex machine in the sense that everybody has an opinion about them, and for the machine you do not need to talk to the legal and marketing departments before changing a setting. We had several weeks of difficult negotiations trying to convince every real or imagined stakeholder that what we intended to do would not harm the company – and at every level it would have been way easier to just give up than to trudge on. It is a tribute to the negotiation skills and commitment of our team members that we managed to actually run the experiment. I rather think that this political hassle is the single greatest reason why we do not see more experiments done in customer-related businesses.

For 3 weeks we sent every week about 800 e-mails and about 300 SMSes per message type. We had several choices about how to measure the results. With the e-mails we could count how many customers actually clicked on the link to the company web-site, but for the SMSes it was only possible to see whether a customer chose to book an appointment or not. This was definitely not optimal, because except for the e-mails we could not directly measure the efficiency of the messages. To put it simply: whether a customer clicks on the link in a message is mostly influenced by the message content, while whether the customer books an appointment depends on many other factors. Here is where randomization helps – with these sample sizes and randomization we could hope that the other factors statistically cancel each other out, so that the effect of the message is still visible, if a little more dimly.

Our results were finally worth the effort. A first learning was that we had basically no-one reacting to the SMS messages. Looking back, this has a quite clear explanation – our message directed the recipient to click on a link to the company web-site, and people are generally much more reluctant to open a web-site on a mobile phone than on a computer (at least that’s what I think). Fact is, our SMSes were completely unsuccessful, though more expensive than the e-mails.

On the e-mails we had a response of 3.5–4% for the ones appealing to natural laziness, as compared to less than 2% for the other message types. As the contacted people were silent customers, who had once already decided to ignore our message, getting around 4% of them to answer was a sizeable success. With the sample sizes we had, proving statistical significance was a no-brainer.
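With sample sizes like these, a simple test of proportions tells the story. The counts below are invented to roughly match the percentages above (3 weeks times about 800 e-mails per message type), not the real data:

# Rough significance check for the e-mail response rates; illustrative counts only
clicks <- c(lazy = 96, geek = 45, green = 43)     # assumed click counts
sent   <- c(lazy = 2400, geek = 2400, green = 2400)

prop.test(clicks, sent)                                    # are the three rates equal?
prop.test(clicks[c("lazy", "geek")], sent[c("lazy", "geek")])  # pairwise comparison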

The fly in the ointment was that we failed to translate these clicks into confirmed appointments – we had basically the same, very low percentage of confirmations irrespective of channel or message type. Does this mean that our experiment failed to identify any possible improvement? At the risk of being self-defensive here, I would say that it does not. Making a binding confirmation depends on many factors outside the first priming message we were experimenting with. The content of the web-site our customers go to, to mention just one, should be in sync with the priming message, which was not the case here. So, the experiment delivered valuable knowledge about how we can make a customer come to our web-site, but not about how to make the customer book an appointment – and this is OK. It was exactly what we set out to investigate. As mentioned before, managing expectations is a very important element here.

What would be the next steps? Obviously we would need to set up a new experiment to investigate what factors impact the customers’ willingness to accept our offer. I am certain that this is what the team will do in the next phase – after all, we learned quite a lot about our customers with a ridiculously low effort (excepting the negotiations), so why not keep on learning?


Theory of Constraints meets Big Data part 2

I would like to continue the story of the hunt for the constraint using a lot of historical data and the invaluable expertise of the local team. There is a lot of hype around big data and data being the new oil – and there is also a lot of truth in this. However, I find that ultimately the success of a data mining operation will depend on the intimate process knowledge of the team. The local team will generally not have the expertise to mine the data using the appropriate tools, which is absolutely OK, given that data mining is not their daily job. On the other hand a data specialist will be absolutely blind to the fine points of the operation of the process – so cooperation is an absolute must to achieve results. The story of our hunt for the constraint illustrates this point nicely, in my opinion.

After having found proof that we had a bottleneck in the process, our task was to find it, or at least gain as much knowledge about its nature as possible. This might seem an easy task for hardcore ToC practitioners in manufacturing, where the constraint is generally a process step or even a physical entity, such as a machine. In our process of 4 different regions, about 100 engineers per region, intricate long- and short-term planning and erratic customer behaviour, few of the known methods for finding the bottleneck seemed relevant. For starters, there was no shop-floor we could have visited and no WIP lying around giving us clues about the location of the bottleneck. The behaviour of all regions seemed to be quite similar, which pointed us in the direction of a systematic or policy constraint. I have read much about those, but a procedure for identifying one was sorely missing from my reading list.

So, we went back to our standard behaviour in process improvement: “when you do not know what to do, learn more about the process”. A hard-core lean practitioner would have instructed us to go to the Gemba, which, I have no doubt, would have provided us with adequate knowledge in time. But we did not have enough time, so our idea was to learn more about the process by building a model of it. This is nicely in line with the CRISP-DM methodology and it was also our only possibility given the short time period we had to complete the job.

The idea (or maybe I should call it a bet) was to build a well-behaved statistical model of the installation process and then check the residuals. If we have a constraint, we shall either be able to identify it with the model or (even better) we shall observe that the actual numbers are always below the model predictions and thus we can pinpoint where and how the bottleneck manifests itself.

Using the tidyverse (https://www.tidyverse.org/) packages in R it was easy to summarize the daily data to weekly averages. Then, taking the simplest approach, we built a linear regression model. After some tweaking and adjusting we came up with a model that had an amazing 96.5% adjusted R-squared value, with 4 variables. Such high R-squared values are in fact more bad news than good in themselves – they are an almost certain sign of overfitting, that is, of the model tracking the data too faithfully and incorporating even random fluctuations. To test for that we used the model to predict the number of successful installs in Q1 2018. If we had overfitted the 2017 data then the 2018 predictions should be off the mark – god knows, there was enough random fluctuation in 2017 to lead the model astray.
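To give an idea of what this step looks like, here is a much simplified sketch in R. The column names are placeholders, the data is randomly generated stand-in data (so the fit itself is meaningless here), and only the two drivers discussed below are included:

# Sketch of the weekly summarization and the regression step.
# All names and values are invented stand-ins for the real extract.
library(tidyverse)
library(lubridate)

set.seed(1)
daily_data <- tibble(
  date                = rep(seq(as.Date("2017-01-02"), by = "day", length.out = 350), 4),
  region              = rep(c("N", "S", "E", "W"), each = 350),
  jobs_per_operator   = runif(1400, 3, 6),
  access_granted      = rbinom(1400, 1, 0.6),
  successful_installs = rpois(1400, 40)
)

weekly <- daily_data %>%
  mutate(week = isoweek(date)) %>%
  group_by(region, week) %>%
  summarise(installs          = sum(successful_installs),
            jobs_per_operator = mean(jobs_per_operator),
            access_rate       = mean(access_granted),
            .groups = "drop")

model <- lm(installs ~ jobs_per_operator + access_rate, data = weekly)
summary(model)   # with real data this is where the (suspiciously) high R-squared showed up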

But we were lucky – our predictions fit the new data to within +/- 5%. This meant that the fundamental process did not change between 2017 and 2018, and also that our model was good enough to be investigated for the bottleneck. Looking at the variables we used, we saw that two had a large impact and were process-related: the average number of jobs an operator is given per week and the percentage of cases where the operator was given access to the meter by the customer. The first was a thinly disguised measure of the utilisation of our capacity and the other a measure of the quality of our “raw material” – the customers. Looking at this with a process eye, we found a less than earth-shaking conclusion – for a high success rate we need high utilisation and high quality raw materials.

Looking at the model in more detail we found another consequence – there were many different combinations of these two parameters that led to the same number of successes: low utilisation combined with high quality was just as successful as high utilisation combined with much lower quality. If we plotted the contour lines of an equal number of successes we got, unsurprisingly, a number of parallel straight lines running from the lower left corner to the upper right corner of the graph. This delivered the message – again, not an earth-shaking discovery – that in order to increase the number of successes we need to increase the utilisation AND the quality at the same time.
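Continuing the sketch above, the contour picture can be produced along these lines (with the real data the contours are exactly the parallel straight lines described here, since the model is linear in the two drivers):

# Contours of equal predicted weekly installs over a grid of the two drivers
grid <- expand.grid(
  jobs_per_operator = seq(3, 6, length.out = 50),
  access_rate       = seq(0.4, 0.8, length.out = 50)
)
grid$predicted <- predict(model, newdata = grid)

ggplot(grid, aes(jobs_per_operator, access_rate, z = predicted)) +
  geom_contour(colour = "grey40") +
  geom_point(data = weekly, aes(jobs_per_operator, access_rate),
             inherit.aes = FALSE, colour = "steelblue") +
  labs(x = "Jobs per operator (utilisation)",
       y = "Share of customers granting access (quality)")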

To me, the real jaw-dropping surprise came when we plotted the weekly data from 2017 over this graph of parallel lines. All weekly performance data for the whole of 2017 (and 2018) moved parallel to one of the constant-success lines. This meant that all the different improvements and ideas that were tried during the whole year were either improving the utilisation while reducing the quality, or improving the quality while reducing the utilisation – sliding up and down along a line of a constant number of successes (see attached graph).

This is a clear case of a policy constraint – there is no physical law forcing the process to move along that single line (well, two lines actually), but there is something that forces the company to stay there. As long as the policies keep the operation on this one line (or two), it will look exactly the same as a physical constraint.

This is about the most we can achieve with data analysis. The job is not yet done – the most important step is now for the local team to identify the policy constraint and to move the company from sliding in parallel to the constant-success lines to a mode where they move perpendicular to them. We can provide the data, the models and the graphs, but now we need passion, convincing power and commitment – and this is the way data mining can actually deliver on the hype. In the end it is about people able and willing to change the way a company operates, and about the company empowering them to investigate, draw conclusions and implement the right changes. So, business as usual in the process improvement world.

(Figure: Historical 2017 with weeks)

 

Theory of Constraints meets Big Data

The theory of constraints is the oldest and probably the simplest (and most logical) of the great process optimization methodologies. One must also add that it is probably the most difficult to sell nowadays, as everybody has already heard about it and is also convinced that it is not applicable to their particular operation. Most often we hear the remark “we have dynamic constraints”, meaning that the constraint is randomly moving from one place in the process to another. Given that the ToC postulates one fixed constraint in any process, the method is clearly not applicable to such complex operations – or so the argument goes. This is an easily refutable argument, though it undoubtedly points to a missing link in the original theory: if there is too much random variation in the process steps, this variation will generate fake bottlenecks that seem to move unpredictably from one part of the process to another. Obviously, we need a more standardized process with less variation in the steps to even recognize where the true bottleneck is, and this leads us directly to Lean with its emphasis on Mura reduction (no typo, Mura is the excessive variation in the process, recognized as just as bad as its better known counterpart Muda). This probably eliminates, or at least reduces, the need to directly apply the theory of constraints as a first step.

There are other situations as well. Recently I was working for a large utilities company on a project where they need to gain access to their customers’ homes to upgrade a meter – an obligation of the company prescribed by law. So, the process starts with convincing customers to grant access to their site and actually be present during the upgrade, then continues with allocating the job to an operator with sufficient technical knowledge, getting the operator to the site on time and executing the necessary work. There is a lot of locality- and time-based variation in this process – different regions have different demographics that react differently to the request for access, people tend to be more willing to grant access to the operator outside working hours, but not too late in the day, and so on.

 

On the other hand this process looks like a textbook example of the Theory of Constraints: we have a clear goal defined by the law, to upgrade X meters in two years. Given a clear goal, the next question is: what is keeping us from reaching it? Whatever we identify here will be our bottleneck, and once the bottleneck is identified we can apply the famous 5 improvement steps of the ToC:

1. Identify the constraint

2. Exploit the constraint

3. Subordinate all processes to the constraint

4. Elevate the constraint

5. Go back to step 1

In a traditional, very much silo-based organization, steps 1-3 would already be very valuable. By observing the processes in their actual state we already saw that each silo was working hard on improving their part of the process. We literally had tens of uncoordinated improvement initiatives per silo, all trying their best to move closer to the goal. The problem with this understandable approach is nicely summarized in the ToC principle: any improvement at a non-constraint is nothing but an illusion. As long as we do not know where the bottleneck is, running around starting improvement projects is a satisfying but vain activity. It is clearly a difficult message to send to concerned managers that their efforts are mostly generating illusions, but I believe this is a necessary first step in getting to a culture of process (as opposed to silo) management.

The obvious first requirement, then, is to find the bottleneck. In a production environment we would most probably start with a standardization initiative to eliminate the Mura, to clear the smoke-screen that does not allow us to see. But what can we do in a geographically and organizationally diverse, huge organization? In this case our lucky break was that the organization already collected huge amounts of data – and this is where my second theme, “big data”, comes in. One of the advantages of having a lot of data points – several hundred per region per month – is that smaller individual random variations are evened out, and even in the presence of Mura we might be able to see the most important patterns.

In this case the first basic question was: “do we have a bottleneck?” This might seem funny to someone steeped in ToC, but in practice people need positive proof that a bottleneck exists in their process – or, to put it differently, that the ToC concepts are applicable. Having a large and varied dataset, we could start with several cycles of exploratory data analysis to find the signature of the bottleneck. Exploratory data analysis means that we run through many cycles of looking at the process in detail, setting up a hypothesis, trying to find proof of the hypothesis and repeating the cycle. The proof is at the beginning mostly of a graphical nature – in short, we try to find a representation that tells the story in an easy-to-interpret way, without worrying too much about statistical significance.

In order to run these cycles there are a few pre-requisites in terms of people and tools. We need some team members who know the processes deeply and are not caught in the traditional silo-thinking. They should also be open and able to interpret and translate the graphs for the benefit of others. We also need at least one team member who can handle the data analysis part – has a good knowledge of the different graphical possibilities and has experience with telling a story through data. And finally we need the right tools to do the work.

In terms of tools I have found that Excel is singularly ill-suited to this task – it handles several hundred thousand lines badly (loading, saving, searching all take ages) and its graphical capabilities are poor and cumbersome. For a task like this I will use R with the tidyverse packages and of course the ggplot2 graphics library. This is a very handy and fast environment – using pipes with a few well chosen filtering and processing functions and directing the output straight into the ggplot graphics system allows the generation of hypotheses and publication-quality graphs on the fly, during a discussion with the process experts. It does have its charm to have a process expert announce a hypothesis and to show a high quality graph testing that hypothesis within one or two minutes of the announcement. It is also the only practical way to proceed in such a case.
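To give a flavour, a typical throwaway pipeline might look like this – the data frame and all column names are hypothetical, with a few lines of random stand-in data so the snippet runs on its own:

# Hypothetical job-level extract; names and values are made up
library(tidyverse)
set.seed(7)
jobs <- tibble(
  year      = 2017,
  week      = sample(1:52, 5000, replace = TRUE),
  region    = sample(c("N", "S", "E", "W"), 5000, replace = TRUE),
  completed = rbinom(5000, 1, 0.55)
)

# "Does the weekly success rate differ between regions?" – answered in one pipe
jobs %>%
  filter(year == 2017) %>%
  group_by(region, week) %>%
  summarise(success_rate = mean(completed), .groups = "drop") %>%
  ggplot(aes(week, success_rate, colour = region)) +
  geom_line() +
  labs(title = "Weekly success rate per region", y = "Share of completed jobs")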

Most of the hypotheses and graphs end up on the dung-heap of history, but some will not. They will become the proof that we do have a bottleneck, and bring us closer to identifying it. Once we are close enough we can take the second step in the exploratory data analysis and complete a first CRISP-DM cycle (https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining) by building a statistical model and generating predictions. If we are lucky, our predictions will overestimate our performance in terms of the goal – thus pointing towards a limiting factor (aka bottleneck), because we achieve LESS than what would be expected based on the model. Once here, we try some new, more concrete hypotheses, generate new graphs and models and see how close we get to the bottleneck.

So, where are we in real life today? In this concrete example we are through the first cycle, and our latest model, though overoptimistic, predicts the performance towards the goal to within about -10%. We are at the second iteration now, trying to find the last piece of the puzzle to give us the full picture – and of course we already have a few hypotheses.

In conclusion – I think that the oldest and most venerable process optimization methodology might get a new infusion of life by adapting the most modern and up-to-date one. This is a development to watch out for and I will definitely keep my fingers crossed.


Rapid Action Workouts – Lean for NGOs

There are situations where we would like to improve a process but do not have the luxury of working inside a full-fledged lean initiative. This means, most of all, that we cannot build on previous trainings, lean awareness and a changing culture that the teams know about. Also, in these cases, the expectation is to achieve rapid successes, as the effort of getting the teams together cannot be justified by long-term positive evolution alone. In short, the activity has to pay for itself.

 

In my experience, these situations can arise in two ways – either there is simply a need to improve one process in the organization and no one has (yet) thought about a long-term improvement initiative, or the exercise is meant by the promoter of a lean initiative as an appetizer, to convince the organization to start a deeper and more serious lean effort. Either way, it is important to be successful in the allocated short time.

To respond to this need we at ifss (www.ifss.net) developed a rapid process improvement methodology. It addresses several of the constraints we see in this scenario:

  1. The teams are not trained, indeed, not even aware of the lean way of thinking and solving problems
  2. The costs of the workshop need to be minimal, so the action needs to be fast

Our idea is to select the minimal effective subset of the lean toolset. Each day starts with a short training (short meaning a maximum of 1 hour) focusing only on the lean tools that will be needed that day. The rest of the day is spent applying the tools the team learned that day to the problem that needs to be solved. Throughout the day the team has access to the coach, but they have to apply the tools themselves. At the end of the day the results are summarized and the roadmap for the next day is discussed.

Of course, for this to work, problem selection and expectation management are key. As such, the coach has to work with the organization to understand the problem before the RAW and to help the organization select an appropriate problem. It would be totally disrespectful to assume that we, as lean coaches, can solve any organizational problem within a workshop of 4 days, but in most cases we can suggest improvements, achieve team buy-in and design a roadmap to be followed. Thus, we must work with the organization to define a problem where this improvement justifies the effort of organizing the workshop. In optimal cases we have the right tools to help them with this – Intermediate Objectives Maps or Prioritization Matrices, to name just a few. Nevertheless, the ultimate decision, and the most important one at that, is in the end the responsibility of the target organization.

The second step for the coach is to select the right tools for the RAW workshop. This can be, in theory, different for each client and problem. In practice we have a set of tools that can be used well in many different situations – SIPOC, Process Mapping, Root Cause Analysis, Future State Definition, Risk Analysis and Improvement Plan will (in this order) generally work. I capitalized the methods, much like chapter titles, because we have a fair number of different methods for each “chapter”, and the coach will have to pick the one that is best suited to the problem and the team.

E.g. for Root Cause Analysis the coach might pick an Ishikawa diagram if she judges the causes to be simple (and uncontroversial), or dig deep with an Apollo chart if not. Of course the training for the day the team starts to apply the tool will have to be adapted to the choice the coach made.

Because we generally do not get to finish all the actions, and we definitely aim for a sustained improvement effort, I will always discuss PDCA as well – and make sure that the team defines a rhythm in which the PDCA cycles will be performed and presented to the local management.

This is all nice in theory, but does it really work? I had the privilege to work for several years with two NGOs on improving processes in Africa and, recently, in the Middle East. The constraints I mentioned above apply very strongly to them, and I found that this approach of combining minimal training with process improvement work met with enthusiastic support and was successful. So, hopefully we will be able to apply and refine this approach further in the future.

Learning from a Line simulation

In this installment I would like to describe a first set of small experiments we can make using our simulation program, and show what we can learn from them. This is by no means an exhaustive list – it serves to encourage you to try it out for yourselves.

First a quick description of the program: in order to make the effects more visible we made some changes to the original dice game, making the difference between the high and low variation cases stronger. In the simulation, each workstation will process up to 50 pieces per round, the actual number processed being uniformly distributed between 1 and 50 in the high variance case and between 24 and 27 in the low variance case.

The average number of processed parts and the variance can be computed using the formulae for the uniform distribution: mean = (a+b)/2 and variance = (b-a)^2/12, where a and b are the endpoints of the distribution. This gives a mean of (1+50)/2 = 25.5 for the high variance case and (24+27)/2 = 25.5 for the low variance case. The variances, however, differ: (50-1)^2/12 ≈ 200 for the high and (27-24)^2/12 = 0.75 for the low variance case.

The simulation has a graphical UI, shown in the graph. We can select the number of workstations and the number of simulation runs in increments of 500. We can also choose the line feed strategy – that is, the way new orders come into the system. Random Feed means that they come in the same way as the processing, uniformly distributed with high variance between 1 and 50, while Regular Mean Feed means that we feed the line with the average number of orders the line can handle, 25.5 per run, with low variance.

There is also a button called Re-run, which simply creates a new run with the same settings as the previous one. For our first experiments this is the most important button, because by re-running the same setup we can really see the influence of randomness.

We have 3 output graphs:

  1. The total WIP for each run of the simulation. This is the number of work items that are somewhere inside the production line at the end of a run – the number of semi-finished goods.
  2. A graph describing the throughput of the line. This is a bit more tricky – in order to clearly see the differences we do not display the number of finished pieces per run but the deviation from the expected number of finished goods per run. So a value of 0 roughly means 25 finished pieces, a value of 20 means 25+20=45 finished pieces, while a value of -20 means 5 finished pieces. While this takes a bit of getting used to, this graph shows the deviations from the expectation much more clearly than the raw numbers.
  3. The WIP at each workstation at the end of the simulation. This can give us an idea of where we have an accumulation of semi-finished goods, and it is often proposed as a way of finding the bottleneck on a line. We will soon know better.
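For readers who prefer code to prose, here is a rough sketch of what such a simulation core could look like in R – the function and variable names are mine, and the real program adds the Shiny UI and a few more options on top:

# Rough sketch of the simulation core (not the actual app)
simulate_line <- function(n_stations = 5, n_runs = 1000,
                          high_variance = TRUE, random_feed = TRUE) {
  buffer <- rep(0, n_stations)              # WIP waiting in front of each station
  wip <- throughput <- numeric(n_runs)

  for (run in seq_len(n_runs)) {
    # new orders entering the line
    buffer[1] <- buffer[1] + if (random_feed) sample(1:50, 1) else 25.5
    # each station draws its capacity for this round
    lims <- if (high_variance) c(1, 50) else c(24, 27)
    capacity <- sample(lims[1]:lims[2], n_stations, replace = TRUE)
    for (s in seq_len(n_stations)) {
      moved <- min(buffer[s], capacity[s])  # cannot move more than is waiting
      buffer[s] <- buffer[s] - moved
      if (s < n_stations) buffer[s + 1] <- buffer[s + 1] + moved
      else throughput[run] <- moved         # the last station delivers finished goods
    }
    wip[run] <- sum(buffer)
  }
  list(wip = wip, throughput = throughput, station_wip = buffer)
}

res <- simulate_line()
plot(res$wip, type = "l", xlab = "Run", ylab = "Total WIP in the line")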

So, let us run a first simulation! I set the number of workstations to 5, for 1000 runs. Now, I am interested in the effects of randomness, so I pick the worst scenario – highly variable demand and highly variable machines. The output is on the graph.

Remember, from a traditional point of view this is a nicely balanced line. Still, after 1000 runs, we have almost 2000 pieces scattered around production (graph 1), and our throughput was more or less unpredictable, between 20% (5 pieces) and 180% (45 pieces) of the expected volume. Obviously we had some good and some bad days without any chance of predicting what the next day will bring. By looking at the last graph we can readily identify the bottleneck – workstation 2, with 3 trailing quite close behind it. So, to improve the line performance we need to focus on these two, right?

Before investing resources in improvement work, press the Re-run button. You will obviously have different results, but very likely something similar to what I see now will happen – the bottleneck at 2 disappeared and this workstation now has the best performance, while workstation 1 accumulated the most WIP. Obviously the threat of starting an improvement project at their workstation worked wonders – however at workstation 1 people became complacent and need more supervision. Also, the WIP in general has evolved in a much better way: we had a long stretch of runs where the WIP actually decreased and we are now at about 1500 pieces instead of 2000!

So, let us press Re-run again: the WIP trend is a complete mess now, steadily increasing to about 2500 at the end of the runs, and workstation 2 started to accumulate more WIP, however workstation 1 is still the black sheep. Press again: workstation 1 has improved, however 2 became the bottleneck again, while our focus on the overall WIP trend paid off and we are again down at about 1500 pieces.

The one thing that did not change across all these simulations was the daily performance – it stayed just as variable in all 3 re-runs.

Of course this is all a joke – there is no way the simulation would react to the threat of an improvement project. What we see is the effect of random variations in the line – and the most visible one is the random movement of the “bottleneck”. This, by the way, is the core objection raised against the Theory of Constraints methodology – we hear very often that a line does not have a single bottleneck but “moving bottlenecks”. This is the effect of random variations in the line.

The second visible effect is the accumulation of WIP. We would naively expect no WIP or very little, as the line is balanced on average. In reality the random variations cause ever larger queues at random locations in our balanced line, and the prediction of the Factory Physics formulae is that the WIP will continuously increase as a result of these random variations. Looking at the graphs, it does seem that the WIP is levelling off and stabilizing at around 2000 pieces, but this is an illusion. If you change the number of runs to 2000 or higher you will see that the increase is continuous.
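If you defined the simulate_line sketch from above, you can check this yourself with a longer run:

# Longer run with the sketch defined earlier: the total WIP keeps drifting upwards
res_long <- simulate_line(n_stations = 5, n_runs = 5000, high_variance = TRUE)
plot(res_long$wip, type = "l", xlab = "Run", ylab = "Total WIP")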

The third, even more worrying effect is the high variability of the throughput. This line, though balanced, can give no guarantee of the number of pieces produced per day (run), despite the best efforts of the production people. So, what will happen in real life? Important orders will be moved forward (expedited) while less important ones will stay in the WIP – all this will create even more randomness and increase the negative effects we already see. I think this is not unlike many situations we see in production lines today.

As we are not trained to discern random effects we will necessarily come up with explanations for purely random phenomena – like my example of the threat of running an improvement project having positive effects or people becoming complacent.  This is also something we see very often in real life.

So, how would you investigate our line further? Please feel free to try.

A Simulation in Lean Says More Than 1000 Words

 

In general, we are sadly missing an intuitive understanding of random processes. Several authors have written about these blind spots in recent years – Taleb in Fooled by Randomness (https://www.amazon.com/Fooled-Randomness-Hidden-Markets-Incerto/dp/0812975219/ref=sr_1_1?ie=UTF8&qid=1503991947&sr=8-1&keywords=fooled+by+randomness) or Sam L. Savage in The Flaw of Averages (https://www.amazon.com/Flaw-Averages-Underestimate-Risk-Uncertainty/dp/1118073754/ref=sr_1_1?s=books&ie=UTF8&qid=1503992030&sr=1-1&keywords=flaw+of+averages), just to name a few.

To my mind, the basic problem is that we rarely have the opportunity to experience long stretches of random phenomena. Of course we have all seen dice being thrown and the stock market fluctuate – but this is not enough to develop an intuition about random events. To do that, we would need to observe and record long stretches of the phenomenon – and I have yet to see the person who spends hours and hours throwing a balanced die and recording the results, not to mention analyzing them as well. (Well, there was a truly renaissance personality called Cardano who did this in the 16th century, and wrote a book about it, but he was more than unusual.)

 

Studying and improving processes is a field where we really miss an intuitive understanding of how random variations influence the behavior of the system. In principle Six Sigma acknowledged the role of random variations but, sadly, not when looking at process mappings. Lean followed in the same steps, simply formulating the request to reduce the variation as far as possible in order to achieve a uniform flow. So, in the end, some practical questions are left unanswered in both methodologies, like: How much variation is acceptable? What are precisely the negative effects of randomness in our concrete process? How could we make a cost-benefit analysis of the variation reduction?

To answer these questions, we would need to understand the effects of randomness on a process. To my knowledge there is one discipline that has really attempted to do this: Factory Physics (https://en.wikipedia.org/wiki/Factory_Physics). In this theory we apply queuing theory to manufacturing processes, and the result is a formula that precisely describes the effects of random variations on a manufacturing operation: the (in)famous Pollaczek-Khinchine equation, eventually combined with Little’s Law.
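For reference, the standard textbook forms (quoted here only for orientation, not derived from any data in this post) are the Pollaczek-Khinchine mean waiting time for an M/G/1 queue and Little's Law:

$$ W_q \;=\; \frac{\lambda\, E[S^2]}{2\,(1-\rho)}, \qquad \rho = \lambda\, E[S], \qquad L = \lambda\, W $$

where λ is the arrival rate, S the service time and ρ the utilisation; Little's Law then converts waiting times into queue lengths, i.e. WIP. The message is already visible in the formulas: both a more variable service time (through E[S²]) and a higher utilisation (through the 1/(1−ρ) term) blow up the queues.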

Now, if you are the type of person who was trained in reading and interpreting equations (like physicists are, no surprise here), then all you need is to look at this equation to understand the effects of variation on a value stream. Unfortunately, very few people can do that, so for the rest of us the equations provide a tantalizing hint of what is going on, but fail to deliver the kind of intuitive understanding we would find useful. To achieve that, we would need to be able to observe and track processes with random variations over a long period of time.

This is generally not feasible in real life, due to constraints of cost and time. Processes are rarely constant over a long enough period that we could observe their behavior at leisure. Changing some parameters, like the average processing time or the standard deviation of the processing time at a given step, just to be able to observe the effects of such a change, is of course impossible. So, we need a random system that is cheap and fast to change and also easy to observe.

This is why in the eighties Goldratt invented the Dice Game – an abstract model of a processing line where the only influencing factor is the randomness of the processing times. The game is played as follows: 5-8 people sit at a table, each with a six-sided die. There are matchsticks to the right and to the left of each person, one person’s right matchstick heap being the left heap of the next person. There is also a “feeder” – someone who provides new matchsticks at the input of the process. At a sign from the moderator, each person throws their die and moves as many matchsticks from their left to their right as the number of points on the die. If they do not have that many sticks to their left (say they have 3 sticks and the die says 5), they move as many sticks as they can. The feeder regularly inputs 3, then 4, then 3 again to the process at the beginning of each cycle.

 

The operation is then repeated for as many times as the moderator sees fit – generally not more than 20 cycles, to avoid boring people to death. Then the exercise is repeated, but instead of a die a coin is tossed. The rule is that heads means moving 3 sticks, tails means moving 4.

 

Now, consider the following: in both cases the average performance of each workstation will be 3.5 sticks per cycle, that is, the line is perfectly balanced in both cases (a quick numerical check follows after the questions). Before reading on, think about the following questions:

  1. Would you expect a difference between these two cases (die vs. coin)?
  2. Do you expect a difference in the throughput (finished products per cycle) between the two cases?
  3. Where do you expect to see the bottleneck?
  4. If each step can still move 3.5 pieces on the average, how would you improve the process?
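As a quick sanity check of the “balanced on the average” claim – nothing you could not do in your head:

# Both "processing" distributions have the same mean but very different spread
die  <- 1:6
coin <- c(3, 4)
c(mean(die), mean(coin))   # 3.5 and 3.5 – balanced on average in both cases
c(sd(die), sd(coin))       # about 1.87 vs about 0.71 – the die is far more variable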

 

As we are missing an intuitive understanding of random processes, it is not easy to answer these questions. Getting enough people together to play the dice game is not easy either, especially if we want to see the longer term effects – no one will waste time playing this for several hundred cycles, right? Still, understanding the behaviour of the Dice Game is an important step towards understanding random variation in a value stream.

 

The solution is to write a computer simulation with a comfy user interface and let people simulate the system. In this case the individual steps are fairly simple, so writing the simulation is not a complicated job, and adding a user interface is also pretty much standard using R and the Shiny framework. This way we can play several thousand rounds without getting bored, we can measure the results, change the parameters or simply repeat the runs several times to see how the system behaves. There is no better way to develop our intuition, and in my experience there wasn’t a single person who was not surprised by some aspect of what they saw.

 

As it happens, we at ifss developed such a simulation and it can be accessed by anyone via the Internet. It is not up all the time, but if anybody would like to figure out the answers to the above questions, we would be happy to start the simulation. Just send a message some time in advance and we shall find an opportunity to activate it.

So, feel free to explore and to start developing your intuition about random processes. We would be happy to hear from you, and if you try the simulated VSM Game, sharing what you experienced would be even nicer.

The Joy of Principal Component Analysis

Making sense of a large pile of data – as in several hundred measurements of 5 to 10 different characteristics of a product – is a data analyst’s dream come true. In manufacturing environments, especially in the chemical industry, it often happens that we measure a wealth of data on each finished product batch. Some of these measurements are important as they might directly relate to customer requirements, some are “informative” only, and the goal of collecting all the data is to have an idea of what is happening in the production process. That is the theory, at least. Of course, life is always a bit more complicated, and many operations will simply be overwhelmed by the amount of data without having the adequate tools to make sense of it. So, the data will be faithfully collected, after all it is in the SOP, but it will not be used to draw conclusions.

What I would like to show, is a method, to actually use this wealth of information. The method is by no means new but it definitely failed to find its way into the manufacturing environment. I still recall the first reaction of a colleague when I showed it to him: “oh, this is for PhD work” he said – meaning more or less “this is way too complex and practically useless”.

The method is called principal component analysis. The idea is to generate artificial variables out of the set of really measured ones in such a way that the new variables capture most of the variation present in the data set. To be more precise, we generate linear combinations of the original variables such that the first linear combination captures the most of the total variation, the second captures the most of the residual variation, and so on. If there is some systematic trend in the data, the hope is that the first few artificial variables (called principal components) will capture this systematic pattern, and the latecomers will only catch random variation. Then we can restrict our analysis to the first few principal components and will have cleansed our data of most of the random variation.

To make this clearer, imagine a room in which we have temperature sensors in each of the 8 corners. We will have 8 sets of measurements that are somewhat correlated but also vary independently of each other. Running a PCA (principal component analysis) will yield a first principal component that is essentially the average of the 8 measurements – as obviously the average temperature of the room captures most of the variation (just think winter vs. summer, night vs. day). The second principal component will capture the difference between the average of the top 4 sensors and the average of the bottom 4 sensors (it is generally warmer on top, right?), the third component the difference between the side with the windows and the side without… and the rest probably only meaningless measurement variation, and possibly some complex but small differences due to the patterns of air currents in the room.
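If you want to see this with your own eyes, here is a small invented version of the 8-sensor room in R – all numbers are simulated, the point is only to show that PC1 ends up behaving like the room average:

# Toy version of the 8-sensor room with invented data
set.seed(123)
n <- 200
room_avg   <- 20 + 5 * sin(seq(0, 4 * pi, length.out = n))  # seasonal swings
top_vs_bot <- rnorm(n, mean = 1, sd = 0.3)                   # top of the room is warmer

sensors <- sapply(1:8, function(i) {
  height_effect <- if (i <= 4) top_vs_bot / 2 else -top_vs_bot / 2
  room_avg + height_effect + rnorm(n, sd = 0.2)              # individual sensor noise
})
colnames(sensors) <- paste0("sensor_", 1:8)

pca <- prcomp(sensors, scale. = TRUE)
summary(pca)                             # PC1 captures the bulk of the variance
abs(cor(pca$x[, 1], rowMeans(sensors)))  # close to 1: PC1 is essentially the average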

What did we gain? Instead of the original 8 values we only need to analyze 3 – and we will gain a much better insight into what is going on. We did not reduce the number of measurements though – we still need all the sensors to construct our principal components.

What did we lose? As my colleague put it, this is PhD work, so we lose a bit of transparency. It is relatively easy to explain a trend in a sensor at the top of the room, but a trend in the first principal component sounds like academic gibberish. So, we will have to learn to explain the results and to relate them to real-life experiences. This is not as difficult as it sounds, but it definitely needs to be carefully prepared, with the message tailored separately to each practical case.

Now to close with a practical example: I just had the good luck to run into a problem at a customer I work for. They measured 5 components of a product and had the respective data from 2014 onward. All data points represented product batches that were accepted by QA and delivered to the customers, as they were all in spec. The problem was that the customers still complained of highly variable quality, and especially of variations in quality over time.

Looking at five control charts, all consisting of quite skewed, non-normal data, basically just increased the confusion. There were regularly points that were out of control, but did we have more of them in 2015 than in 2014? And if yes, was the difference significant? No one could tell.

So, I proposed to analyze the data with PCA.

In R one easy way to do this is by using the prcomp function in the following way:

Results <- prcomp(InputDataFrame, scale. = TRUE)

Here the InputDataFrame contains the input data – that is, a table where each column contains the values of one specific measurement of the product (like viscosity or pH or conductivity etc.) and each line is one set of measurements for one batch of the product. The scale. = TRUE option asks the function to standardize the variables before the analysis. This makes sense in general, but especially when the different measurements are on different scales (e.g. the conductivity could be a number between 500 and 1000 whereas the pH is less than 10). If they are not scaled, then the variables with greater values will dominate the analysis, something we want to avoid.

The Results object will contain all the outcomes of the PCA. Its first element, sdev, is a list of the standard deviations captured by each principal component. The first value is the greatest and the rest follow in decreasing order. From these values we can generate the so-called “scree plot”, which shows how much of the total variance is captured by the first, second etc. principal component. The name comes from the image of a steep mountainside with rubble at the bottom – scree. The steeper the descent, the more variation is captured by the first few principal components – that is, the clearer the discrimination between systematic trends and useless noise.

A quick way to generate the scree plot from the results of prcomp is: plot(Results$sdev^2/sum(Results$sdev^2), type = "l")

Simply calling plot on Results will give us a barchart of the variances captured by each principal component, which has comparable information, though it is not exactly the same.

The next element of the prcomp return structure is called “rotation”, and it tells us how the principal components are constructed out of the original values. It is best to see this in a practical example:

                             PC1           PC2
Turbidity             0.45146656   -0.59885316
Moisture              0.56232155   -0.36368385
X25..Sol.Viscosity.   0.06056854    0.04734582
Conductivity         -0.44352071   -0.59106943
pH..1..               0.52876578    0.39686805

In this case we have 5 principal components, but I am only showing the first two as an example. The first component is built as 0.45 times the standardized value for Turbidity, plus 0.56 times Moisture, plus 0.06 times the viscosity, minus 0.44 times the conductivity, plus 0.53 times the pH. For each measured batch we can generate a new data point by multiplying the standardized original measurements with these weights (called loadings in PCA language) and summing them up.

What good does that bring us? The benefit is that, looking at the scree plot, we can see that most of the variation in our data is captured by the first two (maybe three) principal components. This means that we can generate new artificial values that have only two components (PC1 and PC2, instead of the original 5: Turbidity, Moisture etc.) and still describe the systematic trends of our data. This makes graphical analysis much easier – we can restrict ourselves to calculating the new artificial values for PC1 and PC2 and look at a two-dimensional graph to spot patterns. It also means that we have eliminated a lot of random variation from our data: it is captured by the remaining components PC3 to PC5, which we conveniently leave out of the analysis. So, not only do we get a graph we can actually understand (I find my way in two dimensions much faster than in five), but we also manage to clean up the picture by eliminating a lot of randomness.

As it is, we don’t even need to do the calculation ourselves – the Results object contains the recalculated values for our original data in an element called x, so Results$x[,1] contains the transformed values for the first principal component and Results$x[,2] those for the second. Plotting one against the other can give us a nice picture of the system’s evolution.
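A sketch of that final plot could look like this – BatchYear is a hypothetical vector holding the year of each batch, in the same order as the rows of InputDataFrame; it is not part of the prcomp output:

# Sketch of the PC1 vs PC2 plot, colored by (hypothetical) batch year
library(ggplot2)

scores <- data.frame(PC1  = Results$x[, 1],
                     PC2  = Results$x[, 2],
                     Year = factor(BatchYear))

ggplot(scores, aes(PC1, PC2, colour = Year)) +
  geom_point() +
  labs(title = "Product batches in principal component space")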

Here is the resulting plot of PC2 versus PC1  for the example I was working on.

(Figure: pca_example – PC2 versus PC1, colored by year)

The colors show different years, and we can see that between 2015 and 2016 there was a very visible shift towards increasing PC1 values. Now, looking at the loadings, we see that the PC1 values increase if Turbidity, Moisture and/or pH increase and/or conductivity decreases. Remember, these are all changes that are well within specs and also within the control limits, so they were not discernible with simple control charts. Still, once we knew what we were looking for, trends in the increase of moisture and the decrease in conductivity were easily identifiable.

Now we can identify the process changes that led to these trends and strengthen the process control. PCA can also be used to plot the measured values of future batches, to check that the values are not creeping away over time. Not bad for an academic and useless method, right?