I. Introduction
Many persons and groups in education are dissatisfied
with standardized or more quantitative (objective) forms of assessment of
new teachers that focus on specific performances (e.g., through direct
observation), such as how well and how often teachers correct errors and
how clearly they present information. These persons and groups favor
more qualitative, portfolio assessment, which allegedly provides new teachers
greater leeway in presenting a case for themselves by selecting
"evidences" of their skills and growth; e.g., lesson plans, summaries
of books they have read, and "reflections" on their teaching. Portfolio
assessment is offered as a form of assessment that may largely replace (e.g.,
in the case of teachers and principals) or supplement (in the case of education students)
more objective forms of assessment. However, as with many educational reforms
over the past 100 years, it is likely that in the absence of careful
critical analysis and substantial pilot projects and field testing, wide-scale
(e.g., state level) portfolio assessment of new teachers will turn out to be far less useful
and far more expensive, and to have more unanticipated and undesirable
consequences, than anyone imagined. This paper is offered as a way
of encouraging discussion of wide-scale portfolio assessment.
II. The Argument
Clearly, portfolios appeal to many persons. Portfolios are consistent with these persons' theories of learning, pedagogical preferences, and styles of expression and self-development. This is well and good at the individual level or in the relationship between individuals and mentors. However, when portfolio creation and assessment are used on a wide scale (e.g., by state education agencies), and when the portfolio process is controlled by one group (a "dominant minority" to use Arnold Toynbee's term) for use with another group (a "subordinate majority"), then portfolio assessment is no longer merely a matter of individual preference. Sociological and political issues must be considered, as must the examination of the validity of the entire portfolio assessment process and its details.
Research on and a critical examination
of wide-scale (e.g., state level) portfolio assessment of new teachers must address the following propositions.
1. The wide-scale use of
portfolios is enormously expensive and organizationally unwieldy.
2. Portfolio assessment lacks sufficient face validity, criterion validity, and pragmatic validity.
3. Portfolios have not been shown to be more valid, more feasible, and less costly than existing school-based assessment methods.
4. Wide-scale portfolio assessment disempowers individuals and groups at the local level and increasingly empowers dominant minorities who control the assessment process. Note that wide-scale portfolio assessment was created by the very groups who benefit most (social position, power) from its wide-scale use.
5. Wide-scale portfolio assessment
is an unwarranted invasion of individual privacy and a clear and present
danger to individual liberties and to the civil liberties of minority groups.
III. The Wide-Scale Use of Portfolios Is Expensive and Unwieldy
The wide-scale use of portfolio assessment of
new teachers means that every year state Departments of Public Instruction will have
to process tens of thousands of portfolios created by students and by teachers
seeking initial licensure. These portfolios will have to be stored,
mailed from one assessor to another, and mailed back to assessees.
Assessors will have to be selected and trained and be subjected to frequent
reliability checks. The assessment of any portfolio will take
at least several hours. These demands will require the development
of a large and costly assessment apparatus, without which there will surely
be chaos. Indeed, considering the enormous number of portfolios every year, it
is doubtful that the job can be done at all.
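The scale of the workload described above can be roughed out with simple arithmetic. The sketch below is illustrative only: the portfolio count, the hours per portfolio, the number of readers, and the 1,600-hour working year are hypothetical placeholders chosen to match the phrases "tens of thousands" and "at least several hours," not figures reported by any state agency.

```python
def total_assessor_hours(portfolios: int, hours_each: float, readers_each: int) -> float:
    """Assessor hours needed to score every portfolio once per reader."""
    return portfolios * hours_each * readers_each


def full_time_assessors(total_hours: float, hours_per_year: float = 1600.0) -> float:
    """Number of full-time assessors implied by a yearly scoring workload."""
    return total_hours / hours_per_year


# All inputs below are hypothetical placeholders, not data from any state.
PORTFOLIOS_PER_YEAR = 20_000   # stands in for "tens of thousands"
HOURS_PER_PORTFOLIO = 3.0      # stands in for "at least several hours"
READERS_PER_PORTFOLIO = 2      # assumes a second reader for reliability checks

hours = total_assessor_hours(PORTFOLIOS_PER_YEAR, HOURS_PER_PORTFOLIO, READERS_PER_PORTFOLIO)
staff = full_time_assessors(hours)
print(f"{hours:,.0f} assessor hours, about {staff:.0f} full-time assessors")
```

With these placeholder inputs the sketch yields 120,000 assessor hours a year, or 75 full-time assessors, before any allowance for storage, mailing, selection, and training. Different assumptions will change the totals, but not the need for a sizable apparatus.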
IV. Portfolio Assessment Is Neither Validated Nor Validatable
The portfolio assessment process rests on: (1) the validity of scoring rubrics; (2) the validity of scorings; and (3) the validity or usefulness of portfolio assessment compared to other forms of assessment. In all three ways, wide-scale portfolio assessment either has weak validity or is unvalidated.
Questionable Validity of Scoring Rubrics
There are at least three sorts of validity
with respect to scoring rubrics: face validity, criterion validity and
pragmatic validity.
1. Face validity is the subjective judgment (or intersubjective judgments) of rubric creators that an item (measure) in a rubric actually measures what it purports to measure. Face validity requires that words in items are sufficiently clear that their meaning (referent) is obvious. For example, hitting another person probably has face validity as an indicator or measure of the concept "aggression." However, many items or measures in portfolio scoring rubrics do not have face validity. In view of the many "approaches," "theories," and "perspectives" in education, it is doubtful that terms such as "learning," "classroom management," and "developmentally appropriate," to name a few, have common meaning to assessors and to the individuals (e.g., teachers seeking licensure) who are creating portfolios guided by these terms. Moreover, there are no reported studies of the face validity (e.g., inter-assessor judgments) of the concepts used in portfolio rubrics.
2. Criterion validity is the extent to which a new (candidate) instrument, measure, or item is correlated with a widely used, already-validated and generally standard measure. For example, a standard measure of upper-body strength is the number of repetitions of bench presses a person can do with certain amounts of weight. The criterion validity of a new candidate measure (e.g., the density of fiber bundles in chest muscles) would be indicated by the degree of correlation between the number of bench presses persons do and the density of muscle fiber bundles. If the correlation is low, we judge the candidate measure to have little validity.
However, there are no substantial studies
of the criterion validity of items in portfolio rubrics. Moreover,
criterion validation is simply impossible for a large percentage of items.
This is because the items are actually hypothetical constructs, psychological
dispositions, or what some philosophers call "mental predicates."
In other words, they cannot be measured. Therefore, one cannot determine
their correlation with standard measures. Examples include dispositions
to consider the whole child, attitudes toward life-long learning,
reflectiveness, and openness to new
ideas.
3. Pragmatic validity is
perhaps the most important test of portfolio assessment. After all,
the whole point is to determine how well a new teacher (for example) teaches.
Yet, there are no significant studies (indeed, there do not appear to be
any studies) of the extent to which scores on any part of the portfolio
assessment, or scores on the portfolio overall, predict or are correlated
with the achievement of a teacher's students. Naturally, portfolio
scores ought to predict many other things that good teachers and principals
do. But there is no validation of portfolios as predictors of these,
either. In other words, there is little if any validation of the
extent to which portfolio assessment serves its stated function (Hambleton
et al., 1995; Koretz et al., 1994; Pacific Research Institute,
1999).
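The correlation check that defines criterion validity (and, with student achievement as the standard, pragmatic validity) can be sketched as follows. This is a minimal illustration, not a validation procedure drawn from any study: the function name and both data lists are invented for the example, and the numbers are made up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least 2 items each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical, made-up scores for five persons (illustration only).
standard_measure = [5, 8, 10, 12, 15]          # e.g., bench-press repetitions
candidate_measure = [2.1, 2.9, 3.2, 4.0, 4.4]  # e.g., the new candidate measure

r = pearson_r(standard_measure, candidate_measure)
print(f"correlation between standard and candidate measure: {r:.2f}")
```

A value near 1 would support the candidate measure and a value near 0 would count against it. The same calculation applies when the lists hold portfolio scores and the achievement of each teacher's students, which is the comparison the text says has not been reported.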
V. Portfolios Are of Little Use
In addition to the questionable validity of scoring rubrics, there is the question of the comparative validity or usefulness of portfolios. However, no substantial studies have been done of the extent to which portfolio assessment provides more information, more useful information, or more pragmatically valid information (Hambleton et al., 1995; Koretz et al., 1994; Pacific Research Institute, 1999) than:
1. The current method of (for example) assessing and improving initial teacher performance by year-long interaction, supervision, and coaching by school principal and mentors; or
2. An enriched method of this sort of assessment in which principals and mentors are trained to supervise, coach and provide formative and summative evaluation.
In the absence of solid data showing
that portfolio assessment (by state agency personnel who never observe
assessees) is more valid and provides more information than current or
easily-enriched local forms of assessment, there is little reason to replace
current forms of assessment with wide-scale and expensive portfolio assessment.
VI. Portfolio Assessment Is Disempowering
The case for portfolio assessment often turns on an appeal to self-empowerment. Portfolios are supposed to enable persons to express themselves in a way that is comfortable for them and without intimidating scrutiny by supervisors. This of course is laudable and highly desirable. At the individual and interpersonal (mentor-mentee) levels, portfolio creation and evaluation no doubt have these beneficent and beneficial effects. However, at the wide-scale level, where sociological and political processes come into play, portfolio assessment reduces the power of individuals, mentors, and school principals, and shifts power to persons and agencies who created and who manage the assessment apparatus.
First, the items to be assessed and the scoring rubrics were not created by persons being assessed or their representatives. This is an obvious example of alienation, disempowerment and "loss of voice." How can it be said that portfolios reveal and encourage self-reflection and self-development when the scoring rubric and portfolio instructions tell assessees what to write about and how they will be assessed?
Second, school principals may still be
able to hire teachers and to write letters of recommendation, and mentors
may still be able to observe and meet continually with assessees, but the
major decision-making power is shifted to the agencies who have created
and manage the assessment apparatus. This is an example of disempowerment
and loss of voice--in the very persons who work most closely with assessees
and who, logically, ought to have the most compelling voice.
VII. Wide-Scale Portfolio Assessment Is an Invasion of Privacy and a Threat to Individual Liberties and to the Civil Liberties of Minority Groups
This assertion may strike the reader as hyperbole. However, if the reader will bear with me, I believe he or she will find the argument tenable. I have spent at least 20 years studying despotism, totalitarian societies, witch hunts, religious and political persecution, spouse and child abuse, and the abuse of "inmates" in nursing homes, mental hospitals and other "total institutions." Informative works include Rieff (Triumph of the therapeutic), Henry (Pathways to madness), Solzhenitsyn (The Gulag archipelago), Huxley (The devils of Loudun), Frank (Persuasion and healing), Gubrium (Living and dying at Murray Manor), Glasser (Prisoners of benevolence), Emberley (Values education and technology), Kelman (Crimes of obedience), Foucault (Discipline and punish), Sobsey (Violence and abuse in the lives of people with disabilities), Hughes ("Good people and dirty work"), Weber (Theory of social and economic organization), Bourdieu (Reproduction and Outline of a theory of practice), Mosca (The ruling class), Talmon (Origins of totalitarian democracy), Arendt (The origins of totalitarianism) and others.
Three empirical generalizations stand out.
First, all forms of control (superordinate-subordinate relations, or what Max Weber called "forms of domination") involve an apparatus; i.e.,
1. A set of ideas that legitimize the relations and methods of domination.
2. A division of labor (often a bureaucracy) among persons running the apparatus; e.g., persons who find "deviant" persons and groups; who "test" suspects; who transport convicted or identified "deviants"; and who carry out "treatments."
Second, forms of domination are not necessarily
overtly harsh or violent. Orwell's 1984 and Huxley's Brave
new world illustrate the difference between tyranny using force and tyranny operating under the guise of
benevolence--sometimes called "friendly fascism." Examples are found in the
sort of psychotherapy that convinces vulnerable children that they have
been subjected to satanic abuse by their parents and teachers.
Alexis de Tocqueville (Democracy in America) saw the same thing happening in the United States in the
early 1800s. I believe wide-scale portfolio assessment is
the sort of thing de Tocqueville meant in the following lines.
It would seem that if despotism were to be established among the democratic nations of our days, it might assume a different character; it would be more extensive and more mild; it would degrade men without tormenting them. (p. 335)
Above this race of men stands an immense and tutelary power, which takes upon itself alone to secure their gratifications and to watch over their fate. The power is absolute, minute, regular, provident, and mild. It would be like the authority of a parent if, like that authority, its object was to prepare men for manhood; but it seeks, on the contrary, to keep them in perpetual childhood: it is well content that the people should rejoice, provided they think of nothing but rejoicing. For their happiness such a government willingly labors, but it chooses to be the sole agent and the only arbiter of that happiness; it provides for their security, foresees and supplies their necessities, facilitates their pleasures, manages their principal concerns, directs their industry, regulates the descent of property, and subdivides their inheritance: what remains but to spare them all the care of thinking and all the trouble of living. Thus it every day renders the exercise of the free agency of man less useful and less frequent; it circumscribes the will within a narrower range and gradually robs a man of all the uses of himself.
Such a power does not destroy, but it prevents existence; it does not tyrannize, but it compresses, enervates, extinguishes, and stupefies a people, till each nation is reduced to nothing better than a flock of timid and industrious animals, of which the government is the shepherd. (pp. 336-337) [Alexis de Tocqueville. From Democracy in America, Volume II. 1840]
Third, forms of domination involve an obligation on the part of persons being tested or assessed to speak--to reveal their thoughts and feelings. What is lost is not only the right to speak (e.g., to challenge the assessment) but the right to remain silent.
Silence--either the refusal to speak or reluctance to speak--is understood by those running the apparatus as a sign of something to hide. Both silence and incorrect thinking (religious, political, psychiatric, domestic, and now pedagogical) are punished by ostracism, imprisonment, loss of jobs, and sometimes physical violence. Note that the details or the contents may vary from one situation to another (e.g., witch hunt, competency hearing), but the structure of domination, the ways it is legitimized, and its effects are exactly the same--one group increasingly controls another group.
Wide-scale portfolio assessment that is conceived, planned, and controlled by a small minority of officials is an example of the loss of the right to remain silent: the loss of the right simply to do one's job well without having to reveal how one thinks or what one feels. It is an example of the use of therapeutic and humanistic terms (reflection, authenticity, self-development) to make palatable--even desirable--what is in fact coercion to speak or be punished; i.e., not be licensed.
In addition to loss of the right to privacy, wide-scale portfolio assessment is a clear and present threat to minority groups. It is well known, for example, that members of Asian cultures are embarrassed by the requirement to reveal personal information. And it is well known that different cultures (e.g., African American) have different ways of revealing themselves and different histories that leave them better or worse prepared to satisfy the "reflection" rubrics created by individuals who do not share their culture. Just as so-called IQ tests and other standardized (dominant-culture-biased) tests leave cultural minorities at a distinct test-taking disadvantage, there is every reason to believe that this will be true of wide-scale portfolio assessment.
Moreover, what evidence is there that
the assessors are knowledgeable about and open to what are currently minority
pedagogies? Is it not fair to ask whether teachers who strongly
advocate focused or direct instruction on phonics and math will receive
lower scores than teachers who express a more constructivist or allegedly
"child-centered" philosophy? And is it not fair to suggest that--when
the pendulum swings once again (as it is already doing regarding reading)--whole
language teachers will be reluctant to reveal what may become a "deviant"
orientation? Even if the assessment process were "orientation-fair,"
assessees' knowledge of what is "in" and where they stand is likely to
have a chilling effect on their experience of freedom to express themselves.
VIII. Summary
At the individual and mentor-mentee levels,
portfolio creation and review are no doubt helpful. They foster reflection,
they guide improved practice, and they facilitate communication.
At the macro or wide-scale level, however, portfolio assessment transfers
substantial power from individuals and local mentors to a small minority
of functionaries in whose interests it is to increase the scope
of portfolio assessment. Moreover, as a device that requires individuals
to speak, and whose scoring rubrics are clearly influenced by narrow pedagogical-cultural orientations, wide-scale portfolio
assessment is a clear invasion of the right to privacy and a threat to
minority cultures and pedagogical orientations. In addition, there
is no solid evidence that wide-scale portfolio assessment is valid with
respect to scoring rubrics or has a comparative advantage over local, more
empowering, and clearly less expensive forms of school-based assessment
and teacher development.
References
Hambleton, R.K., Jaeger, R.M., Koretz, D., Linn, R.L., Millman, J., and Phillips, S.E., "Review of the Measurement Quality of the Kentucky Instructional Results Information System, 1991-1994," Office of Educational Accountability, Kentucky General Assembly, 20 June 1995: 4.
Koretz, D., Stecher, B., Klein, S., and McCaffrey, D., "The Vermont Portfolio Assessment Program," Educational Measurement: Issues and Practice, Fall 1994: 12-13.
Pacific Research Institute (1999). Developing and implementing academic standards. On-line at http://www.pacificresearch.org/pub/sab/educat/ac_standards/main.html