Metrics and Models for the Evaluation of Supply Chain Integration
Jonathan A. Morell, Ph.D.

A version of this article appeared in an issue of the EDI Forum that was dedicated to Electronic Commerce in Manufacturing (volume 10, #1, 1997). The EDI Forum is published by the EDI Group, Ltd., 221 Lake Street, Oak Park IL 60302, 708 848-0135, www.edigroup.com. Reprinted here with their permission.
Supply chain communication refers to the flow of information among several layers of a manufacturing supply chain. In such a system a set of customers and suppliers are constantly transmitting information about material needs, production schedules, and product availability. Supply chain integration (SCI) refers to a state where this information is transmitted quickly among trading partners, and where the partners are able to readily adjust their activities based on changes in requirements for material or the exigencies of production and delivery schedules.

Supply chain integration is important because information flow affects the timeliness of delivery to market, because miscommunication at any point in the chain can cause serious problems at remote parts of the chain, and because the open-system nature of manufacturing requires constant adjustments to production plans. Supply chains compensate for these problems by building inventory, and inventory is expensive. Proper coordination is important because it allows information to substitute for material.

Making SCI happen is difficult because the process requires new business processes, new technology, and coordinated action among sets of companies, not all of whom are in a direct supplier-customer relationship with each other. In addition to being difficult, SCI affects critical-path activities in delivering a product to market. The combination of difficulty and criticality makes SCI risky, and thus companies considering SCI need assurance that it has worked for others, and that it is likely to work for them. That assurance can come from systematic evaluation of many exercises in SCI, which, in the aggregate, can build a body of information on what works under what circumstances. As knowledge builds about best practice and best process for SCI, it will become easier (i.e. less risky) for more and more sets of trading partners to coordinate their activities.
A related justification for metrics is their value in helping to monitor intermediate progress, and thus in providing a basis for continuous improvement, a basis for mid-course corrections, and confidence during the inevitable early stages when investment in SCI is high and its payoff is low.

Good evaluation of SCI requires all the science and craft developed by evaluators for the assessment of real-world interventions. It is beyond the scope of this article to show how an entire evaluation should be done. Rather, my intent is to provide examples of how we have dealt with one particular aspect of SCI evaluation, namely the development and use of metrics and models to monitor SCI pilot programs.

What makes a "good" metric?

From a psychometric point of view, metrics must be reliable and valid. Reliability refers to consistency, i.e. if circumstances do not change, the metric should read at a consistent value. Unreliable metrics can be thought of as indicators subject to a large amount of random variation. They may truly assess whatever concept they are supposed to be measuring, but one cannot trust the precision of the assessment. A valid metric is one that actually measures the concept we think it is measuring.

To illustrate, one indicator of the success of an electronic commerce (EC) trading partner relationship between an OEM and its suppliers may be the number of "expedited purchase orders". The validity question is whether this metric truly indicates the effectiveness of electronic commerce. One could argue in various ways. On one view, part of EC is a good planning process supported by technology, and because successful EC implementation requires good planning and efficient communication, one would expect EC to decrease the need for expediting. On a contrary view, expediting is mostly driven by unexpected changes in the OEM's market, and thus is not an indicator of the effectiveness of OEM/supplier EC.
And so on, as many variations, nuances, and partial effects may be considered. The correct answer, of course, depends on context. If historically, expedited purchase orders in the OEM-supplier relationship came from poor internal planning and data transmission within the OEM, then expedited purchase orders are a valid indicator. If the source of uncertainty has been the market, then the metric is not valid. Whether valid or not, though, the measurement itself may be reliable, as in both cases one may be able to accurately assess the number of expedited purchase orders.

In addition to the "scientific" concerns of validity and reliability, good metrics must also be practical, in the sense that the data can be obtained at reasonable cost and effort; and salient, in that they must mean something to the people who will use the information. To build on the previous example, many companies, for reasons of running their own business, keep good records of the number of expedited purchase orders they receive. Further, managers care about this number because the greater the expediting, the greater the trouble and cost to all involved.

The challenge is to jointly maximize the four characteristics of a good metric: validity, reliability, practicality, and salience. As in any design effort, good metric design requires not only a joint optimization of these characteristics, but also a cross-functional design team whose members collectively understand, and have a stake in, all of the desired design parameters.

Metrics are most useful if they are embedded in a model which shows how an electronic commerce relationship works, and how each metric relates to the others. To illustrate this point, consider a very simple example in which a company implements EDI with its suppliers in the hope of improving production systems and thereby shortening delivery time to its customer.
What could we say about the value of this electronic commerce innovation if upon its implementation, we discovered that delivery times to customers were unaffected? In fact we could say very little because we would not know whether EDI was the cause of the problem, whether other critical-path factors were unaffected by EDI, or whether the experiment in electronic commerce had any beneficial impact at all. Now imagine that evaluation were guided by the model presented in figure 1.
The model tells us that we assume a causal relationship in which EDI causes fewer data input errors by a supplier, which in turn leads to fewer improper deliveries or shipping delays, which, in combination, lead to faster production and delivery by the OEM to the end user. The model has guided us to metrics. Because of the articulated model, we have measured the extent of EDI implementation, numbers of improper or late deliveries, production time, and delivery time to end users. And because we have that data we can tell what happened, as illustrated by the contrasting stories told in tables 1-3, which show three sets of possible results.
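As a minimal sketch of how a model guides interpretation, the causal chain described above can be written down as an ordered list of metrics and checked link by link. The Python below is illustrative only: the `Metric` class, the `diagnose` function, the 10% improvement threshold, and every number in it are assumptions made for this sketch, not values taken from the article's tables.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    """One measured link in the causal chain (hypothetical structure)."""
    name: str
    before: float
    after: float
    lower_is_better: bool = True

    def improved(self, threshold: float = 0.10) -> bool:
        """True if the metric moved in the desired direction by at least
        `threshold`, expressed as a fraction of the baseline value."""
        if self.before == 0:
            return False  # no meaningful relative change from a zero baseline
        change = (self.before - self.after) / abs(self.before)
        if not self.lower_is_better:
            change = -change
        return change >= threshold


def diagnose(chain, threshold=0.10):
    """Walk the chain from upstream to downstream.

    Returns (first_break, unexplained), where first_break is the name of the
    first metric that failed to improve (or None), and unexplained lists the
    downstream metrics that improved even though an upstream link did not.
    """
    first_break = None
    unexplained = []
    for metric in chain:
        ok = metric.improved(threshold)
        if first_break is None and not ok:
            first_break = metric.name
        elif first_break is not None and ok:
            unexplained.append(metric.name)
    return first_break, unexplained


# Hypothetical before/after readings, ordered from upstream to downstream.
chain = [
    Metric("extent of EDI implementation (% of orders)", 10, 80, lower_is_better=False),
    Metric("supplier data input errors per month", 40, 12),
    Metric("improper or late deliveries per month", 15, 15),
    Metric("OEM production time (days)", 20, 15),
    Metric("OEM delivery time to end user (days)", 30, 24),
]

result = diagnose(chain)
print(result)
```

With these made-up readings the check reports that the chain first breaks at improper or late deliveries, while OEM production and delivery times improved anyway. Downstream improvement without the hypothesized upstream change is the kind of pattern that points to factors the model has not captured.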
In table 1 we see that EDI was never really implemented very well, which implies that as an innovation in electronic commerce it was not tested. In this case we don't know if it would have worked or not. The data also tell us that our model is weak because it has not accounted for many reasons why change may take place. For instance, there is noteworthy change in "suppliers' data input accuracy" and "OEM production time", even though no useful change took place in "shipping by suppliers", "improper deliveries by suppliers", or "OEM delivery time".

In table 2 we observe that EDI was implemented, that errors were reduced, but that improper or late deliveries were unaffected. We also observe, however, that considerable favorable change took place in the OEM's behavior: both production time and delivery time improved. On one hand these data show us again how weak the model is, because "downstream" variables changed in spite of no change in what were hypothesized as upstream, critical-path activities. On the other hand, the model saves us from making an incorrect inference. If we had data only for EDI implementation and OEM behavior, we would conclude (incorrectly) that our experiment in SCI succeeded. This pattern of results also illustrates the value of models in guiding continuous improvement. Here we know that EDI did work in terms of having the proximate effect of decreasing data errors, but that better data accuracy is not, by itself, sufficient to affect shipping delays or improper shipments. The model lets us identify where we need to look for further improvement.

The pattern in table 3 gives us great confidence that SCI worked as intended. First, all metrics improved as expected. Second, the pattern of results was plausible, i.e. it made sense in terms of what we know about SCI.
We all realize that no model can identify all relevant variables, and that the further one gets from "proximate effects", the more complicated the world gets, and the greater the number of unanticipated factors which may influence events. And what do we see? Large changes in suppliers' ability to ship as planned, and lesser changes in the OEM's ability to ship as planned. This makes sense because the OEM's behavior is only partially determined by suppliers' behavior.

The lesson in this example is that because we have a model we can understand what happened. Without the model we would not have been directed to collecting metrics on degree of EDI implementation, input errors, improper deliveries to the OEM, shipping delays to the OEM, and production time. We would only have data on delivery time to the end user, and thus, little ability to use the data in a constructive fashion.

Tables 1-3 also illustrate several other important points about the value of models. First, metrics can differ in terms of scale, precision or their specific meaning. Observe how "extent of EDI implementation" is measured in the same way in tables 2 and 3, but in a different way in table 1. (Compare italics for variable 1 in table 1 with the corresponding definitions in tables 2 and 3.) Both methods are valid in that they accurately reflect "extent of EDI implementation", but they certainly differ in degree of precision, and in the type of statistical analysis that can be conducted. While ratio scales of measurement and high degrees of precision are always desirable, these concepts are different from the four critical measurement criteria discussed earlier (validity, reliability, salience and ease of collection). A metric may be precise and not valid or reliable (we have all seen meaningless data carried to four decimal places), reliable but qualitative (e.g. experts agree on categorizing into a few broad categories), and so on.
Thus "scale quality" and "precision" must be seen as fifth and sixth attributes of measures, and added into the mix of trade-offs when models and metrics are developed. As an example of these trade-offs, consider that a company's information system may make it easy to accurately determine the number of shipping delays per month, but not the number of shipping delays per week. Do we care enough about the added precision of weekly data to take on the burden of a special data collection effort? Are we willing to put up with increased risk of unreliable data due to a new and unproved data collection mechanism? Without careful consideration of specific context and needs for data, these questions are unanswerable. But left to chance, incorrect answers may cause great harm to the proper collection of powerful data.

To see how meaning can differ, compare the definition of "data accuracy" (variable 2) and "OEM delivery time" (variable 6) in the tables. (Note the italicized text in the definitions in table 3.) These are all useful ways to measure accuracy and lateness, but they do not mean exactly the same thing across all three scenarios.

A second insight from comparing our hypothetical results is that models are never likely to account for all relevant factors. One problem is that fixed models cannot ever fully specify an open system. Over and above this theoretical problem, models are only as good as people's imperfect ability to identify relevant metrics. Finally, each element added to a model requires the investment of time, effort, and money to collect data and to assure its quality.

Two approaches can help bolster the inherently limited power of models. The first is to embed model-based data collection in a qualitative analysis of events and circumstances. As an example, consider table 3. Do we really want to believe that improvement in suppliers' ability to ship affected the OEM's ability to ship?
We would feel a lot more certain about it if we had a set of interviews with key people within the OEM, all of whom said something like: "One of our big problems in satisfying our customers was always that we could never really trust our suppliers' shipping schedules. Since this EDI stuff started, that problem has pretty much been solved, and because of that we have gotten a lot better with our own internal planning." On the other hand, we might not trust the quantitative data if all interview responses were along the following lines: "Not being able to rely on our suppliers' deliveries was always a problem for us, but it never really got in the way of our delivery to our customers. Sure we had to scramble a lot more than we wanted, but we always knew what we had to do to get our shipments out. I can only think of a few times in the last year or so when problems with delivery from our suppliers really caused us to miss a production deadline."

A second approach is to be ready and willing to add to a model as events unfold. To extend the previous example, imagine that the interview results came halfway through an eighteen-month Pilot. In such a situation, it may be worth the effort to add a metric to the model and to specifically measure the OEM's internal planning.

As a final point about models, it is important to realize that models are context dependent. Models can differ a great deal depending on the people involved, their needs for data, and the way a project unfolds. As an example of how context can lead to different models, consider figures 2 and 3.

Figure 3: Metric Model for the ECOTS Supply Chain Integration Pilot
Figure 2 is a model developed for the Manufacturing Assembly Pilot (MAP), an effort to integrate an automotive supply chain, starting with a tier-one automotive supplier (Johnson Controls), and extending four levels deep. Figure 3 is a model developed for Electronic Commerce for Oshkosh Truck Suppliers (ECOTS), an effort to establish electronic commerce relationships between Oshkosh Truck and six of its direct suppliers. Both pilots are similar in important ways. Both are in the automotive industry, and both focus on material flow, i.e. the ordering or scheduling of parts.

Despite these similarities, the models used are very different. The MAP model (figure 2) shows a clear chain from three immediate changes wrought by SCI (faster information, more complete information, and more accurate information) to a variety of outcomes at intermediate levels, to the business outcomes that people really care about. It does not, however, detail internal changes within Johnson Controls, nor does it articulate which data elements are being collected from which participants. Levels of change are not as clear in the Oshkosh model (figure 3), but the Oshkosh model does detail internal changes in the OEM, and it does state which data are coming from which companies. There are considerable differences in the actual metrics specified in each model. Finally, style, form, and layout are very different.

The reasons for these differences are rooted in the realities of what stakeholders in each Pilot cared about, what data were available, the number of elements in each model, and a variety of other factors that led to specific models for specific circumstances. Both, though, are real models that were (are) used to evaluate the effectiveness of real industry Pilot projects in SCI. Each in its own way is equally useful. Neither Pilot would have had good evaluation if it had only a laundry list of metrics without a model of how each metric fits into an overall system.
Metrics and Model Development as an Iterative Process

For the sake of clarity of presentation, I have broken the treatment of "metrics" and "models" into separate sections, and thus left the false impression that these tasks are independent. They are, in fact, highly interdependent, and should be developed in an iterative and concurrent manner. Models are useless unless they are populated with good metrics, i.e. metrics which jointly optimize validity, reliability, salience, and ease of collection. Metrics, in turn, are useless unless they can inform a relevant model (i.e. one that speaks to critical business problems that may be affected by SCI), and map changes in critical business and technological processes that may affect business problems. In engineering terms, the challenge is to jointly optimize the quality of the metrics and the quality of the model.

Developing Your Own Metrics and Models

In light of the need for metrics and models to be context-specific, what are the general guidelines that can help a group develop useful metrics and models for their specific circumstances? Those guidelines fall into two categories: development process and meta-models.

Development process

In terms of development process, metric and model development must be viewed as any complex design problem wherein good process is characterized by the principles of integrated product-process design (IPPD). These principles, along with examples of applications to SCI, are presented in table 4.
The scale and criticality of model and metric development should determine the rigor with which these principles are applied. To see why, contrast the following two cases.

Case #1: Three people are developing an assessment of a small-scale experiment in which technical product data is enclosed in MIME-enabled messaging software and sent to a single supplier. The object of the exercise is to determine whether this process has any cost or speed advantages over sending faxes or express mailing disks.

Case #2: Representatives of two large OEMs and many common suppliers are working on an effort to revamp internal systems and to put both EDI technology and supportive business process in place so that time and costs are reduced for all participants in the network of trading partners.

While good design principles are needed in both cases, these two efforts differ greatly in factors such as the time it will take to do the development, difficulty of managing the process, consequences of failure, diversity of group members and their vested interests, complexity of the model, and number of metrics. Case two certainly requires a more rigorous design process than does case one.

Meta-models

While it may be true that models for evaluating SCI are unique to their specific purposes, it is also the case that all specific models can be seen as derivative of an over-arching model of system functioning based on general systems theory. By using basic concepts from systems theory it is possible to assure that appropriate specific metrics are chosen. While there is not a one-to-one correspondence between specific models and a general systems model, that general model works well as a heuristic to guide metric development. This notion is illustrated in table 5, which presents specifics of the MAP and ECOTS models, and maps them into a set of characteristics that describe open systems.
Relation Of Metrics And Models To The Business Case For Supply Chain Integration

One of the chief reasons for evaluating innovations in SCI is to help build a business case for change, i.e. to provide information that will help others institute similar changes. For this process to be effective it is important to appreciate the difference between metrics and models on the one hand, and a business case on the other. Metrics and models help tell the story of what happened during an exercise in SCI. A business case is a story that helps business people to commit to action. The two stories are not, and cannot be, the same. The challenge is one of approximation, i.e. to design a system of models and metrics that is as close as possible to a business case. Figure 4 represents this challenge.

Why can't a business case be completely coincidental with metrics and a model? Because:
The challenge is to develop models and metrics that will, to the greatest degree possible, result in a business case for SCI. The keys to success are to include appropriate decision makers in the metric and model design team, and as the data collection process unfolds, to continually educate key people on the value of what is being discovered. The key to good mental health is to accept that given the constraints and complexity of good business practice, models and metrics should not equal a business case for SCI.

Manufacturing faces a growing challenge to shorten cycle times, reduce costs, and respond more quickly to markets. Supply chain integration is a critical element in meeting this challenge. It is also, however, a risky endeavor which entails large-scale changes in business process and technology, and which can only be implemented during limited windows of opportunity. Because of the risk, scale of change and limited opportunity, high-quality information is needed to encourage action and to improve the probability of success. That information can come only from a collaboration between social scientists who understand data and modeling, and the executives, managers and staff who are responsible for assuring the vitality of their businesses. A chief goal of that collaboration must be the deployment of SCI testing models which have both methodological power and practical value.

Jonathan A. Morell, Ph.D.