Portfolio Assessment of Teachers:
A Critical Examination 

Martin A. Kozloff
[Revised January, 2003]

I.  Introduction

Strengthening student involvement and raising student achievement require that educators know their business--for example, (1) that teachers know the subject matter (e.g., history, reading, math), know how to design instruction, know exactly how to teach, and know how to evaluate and revise instruction; (2) that school principals and state departments of public instruction know how students best learn reading, math, and other subjects, and know which assessment instruments, curricula (e.g. in reading), and teaching methods are validated by scientific research; and (3) that schools of education ensure that new teachers and administrators have the knowledge listed above.  Therefore, assessment of graduating education students, new teachers, school principals, and schools of education is essential.

Many persons and groups in education are dissatisfied with standardized and/or more quantitative (objective) forms of assessment of new teachers that involve focus on (e.g., direct observation of) specific performances; e.g., how well and how often teachers correct errors and how clearly teachers present information.  These persons and groups favor more qualitative, portfolio assessment, which allegedly provides new teachers greater leeway in presenting a case for themselves by selecting "evidences" of their skills and growth; e.g., lesson plans, summaries of books they have read, and "reflections" on their teaching.  Portfolio assessment is offered as a form of assessment that may largely replace (e.g., in the case of teachers and principals) or supplement (in the case of ed students) more objective forms of assessment.  However, as with many educational reforms over the past 100 years, it is likely that in the absence of careful critical analysis and substantial pilot projects and field testing, wide-scale (e.g., state level) portfolio assessment of new teachers will turn out to be far less useful, far more expensive, and have more unanticipated and undesirable consequences than anyone imagined.  This paper is offered as a way of encouraging discussion of wide-scale portfolio assessment.

II.  The Argument

Clearly, portfolios appeal to many persons.  Portfolios are consistent with these persons' theories of learning, pedagogical preferences, and styles of expression and self-development.  This is well and good at the individual level or in the relationship between individuals and mentors.  However, when portfolio creation and assessment are used on a wide scale (e.g., by state education agencies), and when the portfolio process is controlled by one group (a "dominant minority" to use Arnold Toynbee's term) for use with another group (a "subordinate majority"), then portfolio assessment is no longer merely a matter of individual preference.  Sociological and political issues must be considered, as must the examination of the validity of the entire portfolio assessment process and its details.

Research on and a critical examination of wide-scale (e.g., state level) portfolio assessment of new teachers must address the following propositions.
1.   The wide-scale use of portfolios is enormously expensive and organizationally unwieldy.

2.   Portfolio assessment lacks sufficient face validity, criterion validity, and pragmatic validity.

3.   Portfolios have not been shown to be more valid, more feasible, or less costly than existing school-based assessment methods.

4.   Wide-scale portfolio assessment disempowers individuals and groups at the local level and increasingly empowers dominant minorities who control the assessment process.  Note that wide-scale portfolio assessment was created by the very groups who benefit most (in social position and power) from its wide-scale use.

5.   Wide-scale portfolio assessment is an unwarranted invasion of individual privacy and a clear and present danger to individual liberties and to the civil liberties of minority groups.

III.  The Wide-Scale Use of Portfolios Is Expensive and Unwieldy

The wide-scale use of portfolio assessment of new teachers means that every year state Departments of Public Instruction will have to process tens of thousands of portfolios created by students and by teachers seeking initial licensure.  These portfolios will have to be stored, mailed from one assessor to another, and mailed back to assessees.  Assessors will have to be selected and trained and be subjected to frequent reliability checks.  The assessment of any one portfolio will require at least several hours.  These facts will require the development of a large and costly assessment apparatus, without which there will surely be chaos.  Indeed, considering the enormous number of portfolios every year, it is doubtful that the job can be done at all.
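The scale of the scoring burden can be made concrete with simple arithmetic.  The sketch below is illustrative only: the function name and every input figure (20,000 portfolios, 3 hours per reading, 2 independent readings per portfolio, 1,500 scoring hours per assessor per year) are hypothetical assumptions chosen to match the rough magnitudes mentioned above, not data from any state agency.

```python
import math


def assessor_workload(portfolios, hours_per_reading, readings_per_portfolio,
                      assessor_hours_per_year):
    """Return (total_scoring_hours, full_time_assessors_needed) for one year.

    All arguments are assumptions supplied by the caller; nothing here is
    drawn from actual agency figures.
    """
    total_hours = portfolios * hours_per_reading * readings_per_portfolio
    # Round up: a fraction of an assessor still means hiring one more person.
    assessors_needed = math.ceil(total_hours / assessor_hours_per_year)
    return total_hours, assessors_needed


# Hypothetical inputs, for illustration only.
total_hours, assessors = assessor_workload(
    portfolios=20_000,            # "tens of thousands" per year (assumed)
    hours_per_reading=3,          # "at least several hours" (assumed)
    readings_per_portfolio=2,     # passed from one assessor to another (assumed)
    assessor_hours_per_year=1_500,
)
print(total_hours, assessors)
```

Under these assumed figures the scoring alone consumes 120,000 hours, or the full-time work of 80 assessors, before any allowance for training, reliability checks, storage, or mailing.  Readers can substitute their own state's numbers.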

IV.  Portfolio Assessment Is Neither Validated Nor Validatable

The portfolio assessment process rests on: (1) the validity of scoring rubrics; (2) the validity of scorings; and (3) the validity or usefulness of portfolio assessment compared to other forms of assessment.  In all three ways, wide-scale portfolio assessment either has weak validity or is unvalidated.

Questionable Validity of Scoring Rubrics
There are at least three sorts of validity with respect to scoring rubrics: face validity, criterion validity and pragmatic validity.

1.  Face validity is the subjective judgment (or intersubjective judgments) of rubric creators that an item (measure) in a rubric actually measures what it purports to measure.  Face validity requires that words in items are sufficiently clear that their meaning (referent) is obvious.  For example, hitting another person probably has face validity as an indicator or measure of the concept "aggression." However, many items or measures in portfolio scoring rubrics do not have face validity.  In view of the many "approaches," "theories," and "perspectives" in education, it is doubtful that terms such as "learning," "classroom management," and "developmentally appropriate," to name a few, have common meaning to assessors and to the individuals (e.g., teachers seeking licensure) who are creating portfolios guided by these terms.  Moreover, there are no reported studies of the face validity (e.g., inter-assessor judgments) of the concepts used in portfolio rubrics.

2.  Criterion validity is the extent to which a new (candidate) instrument, measure, or item is correlated with a widely used, already-validated and generally standard measure.  For example, a standard measure of upper-body strength is the number of repetitions of bench presses a person can do with certain amounts of weight.  The criterion validity of a new candidate measure (e.g., the density of fiber bundles in chest muscles) would be indicated by the degree of correlation between the number of bench presses persons do and the density of muscle fiber bundles.  If the correlation is low, we judge the candidate measure to have little validity.

However, there are no substantial studies of the criterion validity of items in portfolio rubrics.  Moreover, criterion validation is simply impossible for a large percentage of items.  This is because the items are actually hypothetical constructs, psychological dispositions, or what some philosophers call "mental predicates."  In other words, they cannot be measured.  Therefore, one cannot determine their correlation with standard measures.  Examples include dispositions to consider the whole child, attitudes toward life-long learning, reflectiveness, and openness to new ideas.

3.  Pragmatic validity is perhaps the most important test of portfolio assessment.  After all, the whole point is to determine how well a new teacher (for example) teaches.  Yet, there are no significant studies (indeed, there do not appear to be any studies) of the extent to which scores on any part of the portfolio assessment, or scores on the portfolio overall, predict or are correlated with the achievement of a teacher's students.  Naturally, portfolio scores ought to predict many other things that good teachers and principals do.  But there is no validation of portfolios as predictors of these, either.  In other words, there is little if any validation of the extent to which portfolio assessment serves its stated function (Hambleton et al., 1995; Koretz et al., 1994; Pacific Research Institute, 1999).
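Both criterion validity and pragmatic validity, as described above, come down to a correlation between a candidate measure and some outcome or standard measure.  The following is a minimal sketch of how such a check might be computed, using the ordinary Pearson correlation coefficient.  The function name and the two small data lists (portfolio scores for six teachers and the average achievement gains of their students) are invented for illustration; they are not results from any study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Made-up data for six hypothetical teachers (illustration only).
portfolio_scores = [2, 3, 3, 4, 1, 4]   # rubric score assigned to each portfolio
student_gains = [5, 9, 4, 8, 6, 7]      # mean achievement gain of each teacher's students

r = pearson_r(portfolio_scores, student_gains)
print(round(r, 2))
```

A coefficient near zero would indicate that portfolio scores tell us little about how much a teacher's students learn; a coefficient near 1 would support the claim of pragmatic validity.  The point of the argument above is that no one has published such a computation on real data.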

V.  Portfolios Are of Little Use

In addition to the questionable validity of scoring rubrics, there is the question of the comparative validity or usefulness of portfolios.  However, no substantial studies have been done of the extent to which portfolio assessment provides more information, more useful information, or more pragmatically valid information (Hambleton et al., 1995; Koretz et al., 1994; Pacific Research Institute, 1999) than:

1.  The current method of (for example) assessing and improving initial teacher performance by year-long interaction, supervision, and coaching by school principals and mentors; or

2.  An enriched method of this sort of assessment in which principals and mentors are trained to supervise, coach and provide formative and summative evaluation.

In the absence of solid data showing that portfolio assessment (by state agency personnel who never observe assessees) is more valid and provides more information than current or easily-enriched local forms of assessment, there is little reason to replace current forms of assessment with wide-scale and expensive portfolio assessment.

VI.  Portfolio Assessment Is Disempowering

The case for portfolio assessment often turns on an appeal to self-empowerment.  Portfolios are supposed to enable persons to express themselves in a way that is comfortable for them and without intimidating scrutiny by supervisors.  This of course is laudable and highly desirable.  At the individual and interpersonal (mentor-mentee) levels, portfolio creation and evaluation no doubt have these beneficent and beneficial effects.  However, at the wide-scale level, where sociological and political processes come into play, portfolio assessment reduces the power of individuals, mentors, and school principals, and shifts power to persons and agencies who created and who manage the assessment apparatus.

First, the items to be assessed and the scoring rubrics were not created by persons being assessed or their representatives.  This is an obvious example of alienation, disempowerment and "loss of voice." How can it be said that portfolios reveal and encourage self-reflection and self-development when the scoring rubric and portfolio instructions tell assessees what to write about and how they will be assessed?

Second, school principals may still be able to hire teachers and to write letters of recommendation, and mentors may still be able to observe and meet continually with assessees, but the major decision-making power is shifted to the agencies who have created and manage the assessment apparatus.  This is an example of disempowerment and loss of voice--in the very persons who work most closely with assessees and who, logically, ought to have the most compelling voice.

VII.  Wide-Scale Portfolio Assessment Is an Invasion of Privacy and a Threat to Individual Liberties and to the Civil Liberties of Minority Groups

This assertion may strike the reader as hyperbole.  However, if the reader will bear with me, I believe he or she will find the argument tenable.  I have spent at least 20 years studying despotism, totalitarian societies, witch hunts, religious and political persecution, spouse and child abuse, and the abuse of "inmates" in nursing homes, mental hospitals and other "total institutions."  Informative works include Rieff (Triumph of the therapeutic), Henry (Pathways to madness), Solzhenitsyn (The Gulag archipelago), Huxley (The devils of Loudun), Frank (Persuasion and healing), Gubrium (Living and dying at Murray Manor), Glasser (Prisoners of benevolence), Emberley (Values education and technology), Kelman (Crimes of obedience), Foucault (Discipline and punish), Sobsey (Violence and abuse in the lives of people with disabilities), Hughes ("Good people and dirty work"), Weber (Theory of social and economic organization), Bourdieu (Reproduction and Outline of a theory of practice), Mosca (The ruling class), Talmon (Origins of totalitarian democracy), Arendt (The origins of totalitarianism) and others.

Three empirical generalizations stand out.

First, all forms of control (superordinate-subordinate relations, or what Max Weber called "forms of domination") involve an apparatus; i.e.,

1. A set of ideas that legitimize the relations and methods of domination.

2. A division of labor (often a bureaucracy) among persons running the apparatus; e.g., persons who find "deviant" persons and groups; who "test" suspects; who transport convicted or identified "deviants"; and who carry out "treatments."

Second, forms of domination are not necessarily overtly harsh or violent.  Orwell's 1984 and Huxley's Brave new world are examples of the difference between tyranny using force and tyranny operating under the guise of benevolence--sometimes called "friendly fascism."  Examples are found in the sort of psychotherapy that convinces vulnerable children that they have been subjected to satanic abuse by their parents and teachers.  Alexis de Tocqueville (Democracy in America) saw the same thing happening in the United States in the early 1800s.  I believe wide-scale portfolio assessment is the sort of thing de Tocqueville meant in the following lines.
 

It would seem that if despotism were to be established among the democratic nations of our days, it might assume a different character; it would be more extensive and more mild; it would degrade men without tormenting them. (p. 335)

Above this race of men stands an immense and tutelary power, which takes upon itself alone to secure their gratifications and to watch over their fate.  The power is absolute, minute, regular, provident, and mild.  It would be like the authority of a parent if, like that authority, its object was to prepare men for manhood; but it seeks, on the contrary, to keep them in perpetual childhood: it is well content that the people should rejoice, provided they think of nothing but rejoicing.  For their happiness such a government willingly labors, but it chooses to be the sole agent and the only arbiter of that happiness; it provides for their security, foresees and supplies their necessities, facilitates their pleasures, manages their principal concerns, directs their industry, regulates the descent of property, and subdivides their inheritance: what remains but to spare them all the care of thinking and all the trouble of living.  Thus it every day renders the exercise of the free agency of man less useful and less frequent; it circumscribes the will within a narrower range and gradually robs a man of all the uses of himself.

Such a power does not destroy, but it prevents existence; it does not tyrannize, but it compresses, enervates, extinguishes, and stupefies a people, till each nation is reduced to nothing better than a flock of timid and industrious animals, of which the government is the shepherd. (pp. 336-337)  [Alexis de Tocqueville.  From Democracy in America, Volume II.  1840]

Third, forms of domination involve an obligation on the part of persons being tested or assessed to speak--to reveal their thoughts and feelings.  What is lost is not only the right to speak (e.g., to challenge the assessment) but the right to remain silent.

Silence--either the refusal to speak or reluctance to speak--is understood by those running the apparatus as a sign of something to hide.  Both silence and incorrect thinking (religious, political, psychiatric, domestic, and now pedagogical) are punished by ostracism, imprisonment, loss of jobs, and sometimes physical violence.  Note that the details or the contents may vary from one situation to another (e.g., witch hunt, competency hearing), but the structure of domination, the ways it is legitimized, and its effects are exactly the same--one group increasingly controls another group.

Wide-scale portfolio assessment that is conceived, planned, and controlled by a small minority of officials is an example of the loss of the right to remain silent: the loss of the right simply to do one's job well and not have to reveal how one thinks or what one feels.  It is an example of the use of therapeutic and humanistic terms (reflection, authenticity, self-development) to make palatable--even desirable--what is in fact coercion to speak or be punished; i.e., not be licensed.

In addition to loss of the right to privacy, wide-scale portfolio assessment is a clear and present threat to minority groups.  It is well known, for example, that members of Asian cultures are embarrassed by the requirement to reveal personal information.  And it is well known that different cultures (e.g., African American) have different ways of revealing themselves and different histories that leave them better or worse prepared to satisfy the "reflection" rubrics created by individuals who do not share their culture.  Just as so-called IQ tests and other standardized (dominant-culture-biased) tests leave cultural minorities at a distinct test-taking disadvantage, there is every reason to believe that this will be true of wide-scale portfolio assessment.

Moreover, what evidence is there that the assessors are knowledgeable about and open to what are currently minority pedagogies?  Is it not fair to ask whether teachers who strongly advocate focused or direct instruction on phonics and math will receive lower scores than teachers who express a more constructivist or allegedly "child-centered" philosophy?  And is it not fair to suggest that, when the pendulum swings once again (as it is already doing regarding reading), whole language teachers will be reluctant to reveal what may become a "deviant" orientation?  Even if the assessment process were "orientation-fair," assessees' knowledge of what is "in" and where they stand is likely to have a chilling effect on their experience of freedom to express themselves.

VIII.  Summary

At the individual and mentor-mentee levels, portfolio creation and review are no doubt helpful.  They foster reflection, they guide improved practice, and they facilitate communication.  At the macro or wide-scale level, however, portfolio assessment transfers substantial power from individuals and local mentors to a small minority of functionaries in whose interests it is to increase the scope of portfolio assessment.  Moreover, as a device that requires individuals to speak, and whose scoring rubrics are clearly influenced by narrow pedagogical-cultural orientations, wide-scale portfolio assessment is a clear invasion of the right to privacy and a threat to minority cultures and pedagogical orientations.  In addition, there is no solid evidence that wide-scale portfolio assessment is valid with respect to scoring rubrics or has a comparative advantage over local, more empowering, and clearly less expensive forms of school-based assessment and teacher development.

References

Hambleton, R.K., Jaeger, R.M., Koretz, D., Linn, R.L., Millman, J., and Phillips, S.E., "Review of the Measurement Quality of the Kentucky Instructional Results Information System, 1991-1994," Office of Educational Accountability, Kentucky General Assembly, 20 June 1995: 4.

Koretz, D., Stecher, B., Klein, S., and McCaffrey, D., "The Vermont Portfolio Assessment Program," Educational Measurement: Issues and Practice, Fall 1994: 12-13.

Pacific Research Institute (1999). Developing and implementing academic standards.  On-line at http://www.pacificresearch.org/pub/sab/educat/ac_standards/main.html