Evaluation of the Effectiveness of

Training Programs

 

(based on Don Kirkpatrick's 4 Levels of Evaluation)

 

 

by Debra M. Farmer

EDUC 602

Instructional Systems Development I

Dr. Charles Hodell, Professor

 

 

Table of Contents

 

Definition of Evaluation As It Fits Into the ISD Model

Need for Evaluation

Don Kirkpatrick's 4 Levels of Evaluation

Design and Implementation

Evaluation Matrix Worksheet

Questionnaire/Survey

Interviews

Tests

Observations

Performance Records

Summary of Evaluation Instruments

Recommendations

References

 

Abstract

 

Even though Evaluation is listed as the last phase of the Instructional Systems Design ("ISD") model, evaluation actually takes place during all of the phases. Evaluation is used during the "ISD" process to evaluate the training program itself. This type of evaluation is referred to as internal evaluation. Internal evaluations are done during the "ISD" process and are associated with the analysis, design, development and implementation stages. Each of these four steps is tied directly to the others through the elements of the evaluation process.

 

Evaluation is also done to determine whether the learners have mastered the objectives of the training program. This type of evaluation is referred to as external evaluation. This paper focuses on external evaluation as a method to determine effectiveness of training programs. Evaluation can be used to determine whether the training program achieves its objectives. Evaluation can also assess the value of training, identify improvement areas, and identify unnecessary training that can be eliminated.

 

Don Kirkpatrick's 4 levels of evaluation form the basis of this discussion of the effectiveness of training programs. Level 1 measures the learners' reaction to the training program. Level 2 measures the learning that has occurred. Level 3 measures changes in behavior on the job as a result of the training program. Level 4 measures the results of the training program as they affect the company's bottom line.

 

The levels are presented, in order, from simple and inexpensive to complex and costly. Each level has its advantages and disadvantages. It is important to plan the evaluation process as the training itself is being planned, and to consider all levels at the outset, even though only one or two levels may ultimately be used.

 

Various evaluation instruments are discussed, as they relate to the 4 levels of evaluation. Multiple instruments should be used in any evaluation. Each instrument has inherent strengths and weaknesses. Multiple instruments can compensate for the weakness in another instrument or complement the strengths in another. Multiple instruments also provide more credibility and may produce different results that could be missed with a single evaluation instrument.

 

Recommendations are presented. Evaluation of training programs needs to be conducted before, during, and after training to truly assess the effectiveness of training. Level 1 evaluation should be done for all courses. Level 2 evaluation should be done for any course in which the trainees need to retain a set of knowledge or apply a specific skill. Level 3 evaluations are necessary in cases in which the course objective is to change behavior on the job. A Level 4 evaluation should be done in those cases in which the results represent a top priority to the company. With all the effort involved, however, it would be impractical for most companies to conduct Level 3 and Level 4 evaluations on every single course. Recommendations include concentrating on the most expensive programs, courses with strategic value, or courses that have high priority to upper management.

Definition of Evaluation As It Fits Into the ISD Model

The Instructional Systems Development ("ISD") model approaches training development through a series of 5 phases:

1. Analysis

2. Design

3. Development

4. Implementation

5. Evaluation

 

The Analysis phase involves gathering and analyzing information in order to determine the appropriateness of training, training goals and objectives, profiles of the training recipients, and available resources. Design is the phase where structure and sequence of the training program are determined. This phase is when the learning objectives are written and evaluation criteria and tools are determined. The third phase, Development, is when the actual materials are prepared. This phase also includes the actual preparation of the evaluation materials. The Implementation phase is when the training program is actually carried out. This phase would also include evaluation procedures. Even though Evaluation is listed as the last phase, evaluation actually takes place during all of the phases. (Grafinger, 1994).

Evaluation is used during the "ISD" process to evaluate the training program itself. This type of evaluation is referred to as internal evaluation. Internal evaluations are done during the "ISD" process and are associated with the analysis, design, development and implementation stages. Each of these four steps is tied directly to the others through the elements of the evaluation process. ("Introduction", 1994). Needs assessment and evaluation of training especially work hand-in-hand; they work together as parts of a continuous feedback cycle to help training planners determine content. (Stoneall, 1991).

Evaluation is also done to determine whether the learners have mastered the objectives of the training program. This type of evaluation is referred to as external evaluation. ("Introduction", 1994).

This paper will focus on external evaluation as a method to determine the effectiveness of training programs. Donald Kirkpatrick's 4 levels of evaluation, which he proposed in 1959, will be the basis of this discussion. The model maintains that there are four levels at which to measure the quality or effectiveness of a training course.

 

Need for Evaluation:

Training and development staff are becoming more and more accountable for the effectiveness of their programs. Evaluation can be used to determine whether the training achieves its objectives. Evaluation can also assess the value of the training, identify improvement areas, and identify unnecessary training that can be eliminated. (Kirkpatrick, Kramer & Salinger, 1994).

Many training professionals agree that evaluation is important to successful training, but few conduct complete and thorough evaluations. Evaluation can seem anticlimactic after the excitement and creativity of creating a new course. (Birnbrauer, 1987). Typically, evaluation is an afterthought or is not done at all. "Evaluation builds in rigor. It's an integral part of the whole quality effort. If you don't measure, how do you know whether what you've done is worthwhile?" (Meeting Management News, 1992).

With more emphasis on return on investment, companies are asking what the value of training is. Too often, training departments have little or no idea how their training relates to the business objectives of the company. This may be due partially to trainers' lack of measurement and evaluation skills, which results in measurements that are not valid, reliable, or even useful to the management of the company. (Davidove, 1993).

The training department that measures only the increase in the number of students is in trouble. A training department concerned only with counting the number of students in seats probably isn't measuring whether the students learned anything or whether the skills they learned are helping them perform their jobs more efficiently. (Geber, 1994).

 

Don Kirkpatrick's 4 Levels of Evaluation:

One of the most widely used models for evaluating training programs is one proposed in 1959 by Donald L. Kirkpatrick. The model maintains that there are four levels at which to measure the quality or effectiveness of a training course. The table below presents these levels, in order, from simple and inexpensive to complex and costly. Each level has its advantages and disadvantages. It is important to plan the evaluation process as the training itself is being planned, and to consider all levels at the outset, even though only one or two levels may ultimately be used.

The following is a description of Kirkpatrick's 4 levels of evaluating training:

 

 

Donald Kirkpatrick's 4 Levels of Evaluating Training

Levels

Description

Comments

Level 1

Reaction

Trainee reaction to the course. Does the trainee like the course? Usually in the form of evaluation forms, sometimes called "smile sheets".

Most primitive and widely used method of evaluation. It is easy, quick, and inexpensive to administer. Negative indicators could mean difficulty learning in the course.

Level 2

Learning

Did trainees learn what was intended, based on the course objectives?

Learning can be measured by pre- and post-tests, either written tests or performance tests.

Level 3

Behavior

Trainee behavior changes on the job - are the learners applying what they learned?

Difficult to do. Requires follow-up questionnaires or observations after the training class has occurred. Telephone interviews can also be conducted.

Level 4

Results

Ties training to the company's bottom line.

Generally applies to training that seeks to overcome a business problem caused by lack of knowledge or skill. Examples include reductions in costs, turnover, absenteeism and grievances. May be difficult to tie directly to training.

 

(Kirkpatrick, 1959; Kirkpatrick, 1975; Geber, 1995; "Meeting Management News", 1992).

 

Level 1 (Reaction) is the most commonly used method of evaluation, probably because it is the easiest type of evaluation to administer and evaluate. This level produces what some people dub the "smile sheet", which measures how well the students like the training. Level 2 (Learning) is not as widely used in business settings as an evaluation technique; school settings are more likely to use Level 2 techniques. Level 2 evaluation techniques are most reliable when both pre- and post-evaluations are utilized.

The message that managers are delivering is that the training department needs to show concrete evidence that training is achieving its goals of changing behavior on the job (Level 3) and contributing to the company's bottom line (Level 4). Reasons for this include the influence of the quality movement, with its emphasis on measurement, and cost-cutting measures that force training departments to use money more wisely. Trainers are realizing that their goal is to effect results, not just to put people in seats: "Learning that doesn't change the business isn't useful". Another reason for the increased interest in evaluation is the rise of technology, which has eased much of the burden of data-gathering for evaluating training. (Geber, 1995).

Unfortunately, for most trainers, doing Level 3 and Level 4 evaluations is the "trainerly equivalent of flossing your teeth". (Geber, 1995, p. 27). Trainers will probably not do Level 3 and Level 4 evaluations unless they are told to do so. Executives who are getting sophisticated measurements from the rest of the company expect the same from the training department. Level 3 evaluations are difficult because human behavior must be measured. Level 4 evaluations may actually be easier to accomplish than Level 3, since Level 4 evaluations are tied to measurable information. Some trainers believe that a positive Level 3 evaluation implies success at Level 4, and some executives are willing to assume that if employees are exhibiting the desired behavior on the job (Level 3), that behavior will have a positive influence on the company's bottom line.

Even though they are more difficult, Level 3 and Level 4 evaluations provide other advantages besides contributing to company goals. These evaluations can be a "value added" service that the training department provides. They can also be instrumental in overhauling the current curriculum: if a course is not meeting company objectives, then either change the course or stop offering it altogether. Eliminating unnecessary courses could positively affect the company's bottom line. Another benefit of deeper evaluations is that they can uncover the barriers that prevent the training from being applied on the job. (Geber, 1995).

 

Design and Implementation

To collect accurate information with evaluation instruments, you need a basic knowledge of statistics and research methods. You need to know how to use various instruments and be able to select the most appropriate instrument for each evaluation. Multiple instruments should be used in any evaluation. Each instrument has inherent strengths and weaknesses. Multiple instruments can compensate for the weakness in another instrument or complement the strengths in another. Multiple instruments also provide more credibility and may produce different results that could be missed with a single evaluation instrument. (Marrelli, 1993).

Conducting evaluations, particularly Level 3 and Level 4 evaluations, can seem overwhelming, and if done as an afterthought it can be very difficult. However, it is much easier to design evaluation into the course as the course is being developed; evaluation must be plotted while the training course is still a fresh idea. (Geber, 1995). The following evaluation matrix worksheet could be used to assist in the design of the evaluation:

 

 

Evaluation Matrix Worksheet

Levels

What might be measured?

What are the data sources?

How should data be collected?

What are potential problems?

Level 1

(Reaction)

 

 

 

 

Level 2

(Learning)

 

 

 

 

Level 3

(Behavior on job)

 

 

 

 

Level 4

(Results)

 

 

 

 

 

(Birnbrauer, 1987)

Evaluation instruments need to be selected based on the design of the training program, as discussed earlier. Before selecting an evaluation instrument, the following should be considered:

· Will the instrument answer your question? The instruments you select must be appropriate for the questions you ask.

· Does the instrument suit the evaluation design?

· Is the instrument valid? The instrument selected must accurately measure course objectives.

· Is the instrument reliable? The instrument must provide consistent information.

· Is the instrument practical? Consider the reading and vocabulary levels of printed materials and tests; consider the time and monetary resources needed to produce the evaluation.

 

The following are some sample evaluation instruments and the evaluation levels for which they are most appropriate.

Questionnaire/Survey (Level 1, Level 2, Level 3)

A questionnaire or survey is a printed or computerized form using question types such as multiple-choice, ranking scale, rating scale, or open-ended. Questionnaires/surveys can be given to the learners (Level 1, Level 2) or to supervisors to assess on-the-job behavior changes (Level 3).

Some suggestions for using questionnaires and surveys are as follows.

· Avoid using general "one size fits all" questions. Make questions specific to the training.

· Ask to what degree the training met its objectives.

· Make the form easy to complete.

· Allow ample time (make sure that students are not in a hurry or tired).

· Ask about confidence level. If confidence is low, the learner probably won't use the learning.

 

(Meeting Management News, 1992)

 

Other suggestions include:

 

· Include space for additional comments

· Don't require participants to sign the form.

· Establish numerical equivalents (e.g. 1=poor, 5=excellent)

 

(Kirkpatrick, Kramer & Salinger, 1994).
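Using the numerical equivalents suggested above (1 = poor, 5 = excellent), Level 1 "smile sheet" responses can be tallied with a short script. This is only an illustrative sketch; the question names and ratings below are hypothetical, not data from any actual course.

```python
# Hypothetical Level 1 ratings on a 1 (poor) to 5 (excellent) scale.
# Each key is a survey question; each list holds one rating per respondent.
responses = {
    "content relevance": [5, 4, 4, 3, 5],
    "instructor": [4, 4, 5, 5, 5],
    "materials": [3, 3, 4, 2, 4],
}

def summarize(ratings):
    """Return the average rating per question, rounded to two decimals."""
    return {question: round(sum(r) / len(r), 2) for question, r in ratings.items()}

print(summarize(responses))
```

Averaging per question (rather than overall) makes it easier to spot a single weak area, such as materials, that an overall course average would hide.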

 

Interviews (Level 1, Level 2, Level 3)

A face-to-face interview involves an individual responding orally to questions asked by an interviewer. Interviews can be either structured or unstructured. Structured interviews consist of a list of predetermined questions. Unstructured interviews begin with standard questions but base subsequent questions on the interviewee's responses to previous questions. Interviews provide a means to collect in-depth information from participants who are reluctant to fill out a questionnaire. Interviews can be time-consuming and expensive, must be conducted by a skilled interviewer, and some participants may be less willing to reveal information in an interview. (Marrelli, 1993). However, interviews (versus surveys) are more likely to get people to "tell stories" and give specific illustrations. (Stoneall, 1991).

Interviews can also be done in small groups of 5-12 people. This type of group interview is often called a "focus group" and can be used to collect in-depth qualitative information. Before the focus group meets, methods for recording, reviewing, and synthesizing information should be established. (Marrelli, 1993)

Tests (Level 2, Level 3)

Tests can be administered as a standardized method for measuring knowledge (paper-and-pencil tests) and skills (performance tests). To measure training effectiveness, pre- and post-tests are given to determine the change after training.

To increase the effectiveness of the test, the following are suggested:

· Draft sample questions during program development; make sure all objectives are included and all questions not related to objectives are deleted. Relevant tests yield more valid results. Review test before administering.

· Plan test details: schedule, timing, instructions, and scoring.

· Avoid trick questions or questions with more than one answer. Write questions as clearly as possible.

· Use a random arrangement of answers to keep test-takers from guessing the pattern of correct answers.

· Vary the difficulty of the questions.

(Marrelli, 1993)
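The pre-/post-test comparison described above amounts to simple arithmetic on matched scores. The sketch below illustrates one way to compute it; the trainee names and scores are hypothetical examples, not a prescribed tool.

```python
# Hypothetical pre- and post-test scores (0-100) for a Level 2 evaluation.
pre_scores = {"trainee_a": 55, "trainee_b": 70, "trainee_c": 62}
post_scores = {"trainee_a": 80, "trainee_b": 85, "trainee_c": 78}

def score_gains(pre, post):
    """Return each trainee's score gain and the average gain across trainees."""
    gains = {name: post[name] - pre[name] for name in pre}
    average = sum(gains.values()) / len(gains)
    return gains, average

gains, average_gain = score_gains(pre_scores, post_scores)
print(gains)         # per-trainee change after training
print(average_gain)  # one overall indicator that learning occurred
```

A consistently positive average gain suggests the course produced learning; individual gains near zero flag trainees (or objectives) needing follow-up. As the text notes, relevant, objective-based questions are what make such numbers valid.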

 

Observations (Level 2, Level 3)

 

The work behavior of trainees is observed before, during, and after training. A trained observer watches and records the behavior; sometimes the behavior is videotaped or audiotaped for later study. This method provides evaluation of both verbal and nonverbal behavior. The major disadvantages of observations include behavior modified by the observer's presence, poorly trained observers who collect unreliable data, and the expense and time that observations require. The impact of the observer can be minimized by carefully choosing observers, giving observers standard forms to fill out, and minimizing the observer's presence. (Marrelli, 1993)

Performance should be measurable with observable results, based on the objectives for the training program. There should be a systematic appraisal of performance on the job both before and after training to determine changes, if any. The observers could include instructors, supervisors, co-workers, or professional observers. As with any post-training evaluation, it must be determined whether the performance is a result of training or of other factors. Barriers to using knowledge could include lack of management support, low priority, or lack of proper tools or equipment. Control groups, which do not receive training, can be used to measure other factors. (Kirkpatrick, Kramer & Salinger, 1994).

Performance Records (Level 4)

Performance records can be used to evaluate a training program's effect on the company's bottom line. Data such as costs incurred, amounts produced, revenue generated, or time required to complete tasks, would be measured both before and after training to be able to quantify the effects of training. Any measurable savings to the company could be compared with the actual cost of delivering training. (Marrelli, 1993)
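The comparison of measurable savings against the cost of delivering training is a straightforward calculation. The following sketch illustrates it with entirely hypothetical figures; the cost categories and amounts are assumptions for illustration only.

```python
# Hypothetical Level 4 figures; all values are illustrative assumptions.
monthly_cost_before = 120_000.0  # e.g., cost of rework measured before training
monthly_cost_after = 95_000.0    # the same cost measured after training
months_measured = 12             # period over which performance records were kept
training_cost = 60_000.0         # total cost of delivering the program

# Savings attributed to training over the measurement period.
savings = (monthly_cost_before - monthly_cost_after) * months_measured
net_benefit = savings - training_cost
roi_percent = 100.0 * net_benefit / training_cost

print(f"Savings: {savings:.0f}, net benefit: {net_benefit:.0f}, ROI: {roi_percent:.0f}%")
```

As the text cautions, the hard part is not the arithmetic but establishing that the before/after difference is actually attributable to the training rather than to other factors.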

Summary of Evaluation Instruments:

The following is a summary of various evaluation instruments and the evaluation levels at which they can be used:

 

 

Evaluation Level:

Evaluation Instrument(s):

Level 1:

Questionnaire, survey, interview

Level 2:

Questionnaire, survey, interview, observations, written/performance tests

Level 3:

Questionnaire, survey, interview, observations

Level 4:

Performance Records

 

 

 

Recommendations:

Evaluation of training programs needs to be conducted before, during, and after training to truly assess the effectiveness of training. Ideally, more than one evaluation instrument should be used.

Level 1 evaluation should be done for all courses. Level 2 evaluation should be done for any course in which the trainees need to retain a set of knowledge or apply a specific skill. Level 3 evaluations are necessary in cases in which the course objective is to change behavior on the job. A Level 4 evaluation should be done in those cases in which the results represent a top priority to the company; the evaluation should be realistically linkable to hard financial information. (Geber, 1995).

With all the effort involved, however, it would be impractical for most companies to conduct Level 3 and Level 4 evaluations on every single course. Recommendations include concentrating on the most expensive programs, courses with strategic value, or courses that have high priority to upper management. (Geber, 1995).

 

References

 

Birnbrauer, H. (1987). Evaluation Techniques that Work. Training and Development Journal, July, 1987, 53-55.

 

Davidove, E. A. (1993). Evaluating the Return on Investment of Training. Performance & Instruction, January, 1993, 1-8.

 

Evaluation: A Way to Stay on Track. (1992, December). Meeting Management News, pp. 1-4.

 

Geber, B. (1995). Does Training Make a Difference? Prove It! Training, March , 1995, 27-34.

 

Geber, B. (1994). Re-Engineering The Training Department. Training, May, 1994, 27-34.

 

Grafinger, D. J. (1994). Basics of Instructional Systems Development. In C. Hodell, Instructional Systems Development (pp. 1-14). Alexandria, Virginia: American Society for Training and Development.

 

Introduction (1994). In C. Hodell, Instructional Systems Development (pp. i-xxxvii). Alexandria, Virginia: American Society for Training and Development.

 

Kirkpatrick, D. L., Kramer, G., & Salinger, R. (1994). Essentials for Evaluation. In C. Hodell, Instructional Systems Development (pp. 191-207). Alexandria, Virginia: American Society for Training and Development.

 

Kirkpatrick, D. L. (1975). Evaluating Training Programs. Madison, Wisconsin: American Society for Training and Development.

 

Kirkpatrick, D. L. (1959, December). Techniques for Evaluating Training Programs - Part 2: Learning. Journal of the American Society of Training Directors, 21-26.

 

Marrelli, A. F. (1993). Ten Evaluation Instruments for Technical Training. Technical & Skills Training, July, 1993, 7-14.

 

Stoneall, L. (1991). Inquiring Trainers Want to Know. Training and Development, November, 1991, 31-39.

 

Galas, K. C. (1983). The Formative Evaluation of Computer-Assisted Instruction. Educational Technology, January, 1983, 26-28.

 

The Other CBT. (1995, October). Training, pp. 127-128.

 

Tracking Training on the Network. (1994, November). Training, p. 66.