Internet-Based Assessment 2002 - 2004
This paper focuses on pedagogy and the improvement of teaching, learning and assessment. It analyses recent developments in test theory, namely the attention given to assessment as a support for learning. In particular it evaluates the work of Wiliam, Torrance, Black, Linn and Shepard. It draws attention, for instance, to the discussion of the differences between formative and summative assessment, high-stakes and low-stakes testing, and divergent and convergent assessment.
The paper has three sections:
1. The first section discusses the role of assessment in the recent reform of higher education in Sweden.
2. The second section discusses the literature cited above.
3. The third section discusses testing, assessment and examinations as social practices to promote rather than control learning.
Overall, the paper explores possibilities for introducing these new forms of testing in Swedish higher education.
In the 1930s and 1940s, less than one percent of the Swedish population studied at universities and institutes of higher education in Sweden, compared with about 40 percent in 1999. This difference meant that there were 11 000 students at the beginning of 1940 and over 300 000 by the end of the century.
In the 1990s, student-staff ratios at universities increased from 10:1 to 15:1: whereas the number of students increased by 86 percent during the period, the number of staff increased by only 17 percent.
Such changes in student-staff ratios have changed working patterns in universities. Students have problems getting feedback from their teachers, and universities report a growing problem of stress and fatigue among staff.
Some authors claim that there is a need for a 'paradigm shift' in assessment, from what they call the current assessment paradigm to the 'problem-solving paradigm': a shift from a testing culture to an assessment culture, associated with a shift from psychometrics to the assessment of learning (Gipps, 1994, chapter 9; Shinn & Good III, 1993, in Black & Wiliam, 1998, p. 45).
Differences between these paradigms can be demonstrated with three distinctions: 1) formative and summative assessment, 2) high-stakes and low-stakes testing and 3) divergent and convergent assessment.
We can think about the function of assessment partly, at a methodological level, as the goals of assessment and partly, in a sociological or pedagogical context, as the roles of assessment (Scriven, 1967). The terms formative and summative first appeared in Scriven's article "The Methodology of Evaluation" (1967). Robert Stake has defined the difference in culinary terms: "when the cook tastes the soup, that's formative evaluation; when the guest tastes it, that's summative evaluation" (Scriven, 1991).
But how does assessment fit the distinction between formative and summative? Black & Wiliam (1998) define formative assessment as follows:
all those activities undertaken by teachers, and/or by their students, which provide information to be used as feedback to modify the teaching and learning activities in which they are engaged. (p. 2)
If teachers make an effort to develop formative assessment, it can bring the teacher closer to the students' learning and give the students a more active role in their own learning (Black, 2001). Further, assessment must be seen as a moment of learning, and students have to be active in their own assessment and picture their own learning in the light of an understanding of what it means to get better (Black & Wiliam, 1998, p. 22).
Black & Wiliam (p. 14) assert that the core of the activity of formative assessment lies in a sequence of two actions: the perception by the learner of a gap between a desired goal and his or her present state, and the action taken by the learner to close that gap in order to attain the desired goal.
The practices of formative assessment and feedback overlap. Because feedback is central to formative assessment, it is important to explore and clarify the concept (Black & Wiliam, 1998, p. 39).
Ramaprasad (1983) defines feedback as information about the gap between the actual level and the reference level of a system parameter which is used to alter the gap in some way (p. 4). For feedback to exist, the information about the gap must be used to alter the gap. If the information is not actually used in altering the gap, then there is no feedback (Black & Wiliam, 1998, p. 39).
If the term feedback is used to refer to any information that is provided to the performer of any action about that performance, the actual performance can be evaluated either in its own terms or by comparing it with a reference standard. Adopting the definition proposed by Sadler (1998), we would argue that the feedback in any assessment serves a formative function only through diagnosis (what do I need to do to get there?). In other words, assessment is formative only when comparison of actual and reference levels yields information which is then used to alter the gap. As Sadler remarks, 'If the information is simply recorded or is too deeply coded (for example, as a summary grade given by the teacher) to lead to appropriate action, the control loop cannot be closed' (Sadler, 1998, p. 121). In that case the assessment might be formative in purpose but it would not be formative in function. This suggests a basis for distinguishing formative and summative functions of assessment (Black & Wiliam, 1998, p. 45).
Understanding the type of feedback that a task might generate requires a cognitive theory which can inform the link between learners' understanding and their interactions with assessment tasks, in the light of which assessment activities can be designed and interpreted. The conclusion might be that collaboration is needed between psychometricians, cognitive scientists and subject experts (Black & Wiliam, 1998, p. 31).
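The distinction between information that is merely recorded and information that is used to alter the gap can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: the numeric scale, the function names `gap` and `formative_step`, and the learner's "close half the gap" action are hypothetical and do not come from Ramaprasad, Sadler, or Black & Wiliam.

```python
from typing import Callable, Optional, Tuple


def gap(actual: float, reference: float) -> float:
    """Gap between the reference level and the actual level (hypothetical numeric scale)."""
    return reference - actual


def formative_step(
    actual: float,
    reference: float,
    act_on_gap: Optional[Callable[[float, float], float]],
) -> Tuple[float, bool]:
    """Return the learner's new level and whether feedback occurred.

    In this sketch, feedback counts as having occurred only if the
    information about the gap was used and the gap actually changed.
    """
    current_gap = gap(actual, reference)
    if act_on_gap is None:
        # Information is only recorded (e.g. a summary grade): the loop is not closed.
        return actual, False
    new_actual = act_on_gap(actual, current_gap)
    return new_actual, gap(new_actual, reference) != current_gap


# Assumed learner action: close half of the perceived gap.
def close_half(level: float, g: float) -> float:
    return level + 0.5 * g


recorded_only = formative_step(60.0, 80.0, None)
acted_upon = formative_step(60.0, 80.0, close_half)
```

In the first call the gap is known but nothing is done with it, so by the definition above there is no feedback; in the second call the same information changes the learner's level, so the assessment is formative in function and not only in purpose.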
Assessment is one of the most potent forces influencing what teachers concentrate on teaching and what students concentrate on learning. Assessment sends a message to students about what is important to learn.
Amrein and Berliner describe the effect of high-stakes assessment in the following way:
The more important that any quantitative social indicator becomes in social decision-making, the more likely it will be to distort and corrupt the social process it is intended to monitor (2002).
High-stakes testing, as indicated, has social consequences that may not always be desirable (see below). There is a relationship between high-stakes testing and summative testing. Summative assessment is 'high stakes' if it has important consequences. Typically, it affects the life chances of students. Low-stakes assessment, on the other hand, does not have the same implications. It is a softer form of assessment which, for instance, is used to guide teaching and learning rather than to make possibly irreversible decisions about the different educational journeys that students can take.
A problem with high-stakes assessment is that it tends to inflate the impression of students' achievement. High-stakes testing programs frequently result in improved test scores, but such improvement does not necessarily imply a rise in the quality of education or a better educated student population (Moss & Herter, 1992). High-stakes assessment may also have undesirable effects on teaching and learning. It tends to lay too much stress on basic skills and leads to a narrowing of the curriculum. It also tends to drive teachers to teach to the test, and it can lead teachers to shape their own curriculum so that it lies closer to the test. As a result of these unintended negative effects, a discussion has started about the need to change the nature of assessment: a move towards more classroom assessment that brings assessment closer to the student and the student's learning (Linn, 1998).
Test security is also an important issue in all high-stakes assessment, whereas it is not a problem in low-stakes assessment. Low-stakes assessment can be offered to students at a place and time that is convenient for them. Students can be assessed at their desks or from their portable computers over a mobile phone line. It is not necessary to bring students into the classroom just to assess them, so less time and money is wasted on travel (Kleeman, 1998).
To enhance the positive impact of assessment and minimize its negative effects, Linn (1998) suggests that students must be assessed in a variety of ways. Teachers cannot rely on a single high-stakes test when they judge students. It is important to use multiple indicators, which increases validity. A key to long-term success is to create a system for evaluating both the intended positive effects and the more likely unintended negative effects of the assessment system.
Convergent and divergent assessment arise from teachers' differing views of learning and the relationship of assessment to the process of intervening to support learning (Torrance & Pryor, 1998, p. 153).
The key issue in convergent assessment is to find out whether the student has a predetermined, specific kind of knowledge, understanding or skill. It focuses on students' knowledge, understanding and skills in relation to the curriculum. It uses tick-lists and can-do statements. It prefers pseudo-open questioning and focuses on contrasting errors with correct responses. Assessment with these characteristics can be described as behaviouristic: its purpose is to assess in a linear way, and the assessment is of the student and carried out by the teacher (Torrance & Pryor, 1998).
Divergent assessment focuses on students' understanding. Its aim is to find out what a student knows or can do. The assessment is performed by the teacher and the student together. It is characterized by flexible planning, open forms of recording, an emphasis on the learner's understanding, open tasks, open questioning and descriptive, qualitative feedback. Divergent assessment strives towards teaching in the zone of proximal development. The theoretical inference from this is that divergent assessment reflects a social constructivist view of education (Torrance & Pryor, 1998).
Assessment is far from being merely a technical process. Rather, it is deeply implicated, and may have serious consequences for the lives of those it touches (Johnston et al., 1995, p. 359 in Black & Wiliam, 1998, p. 12).
This point is clearly made by Messick:
Once it is denied that the intended goals of the proposed test use are the sole basis for judging worth, the value of the testing must depend on the total set of effects it achieves, whether intended or not (Messick, 1989, p. 85).
Linn has made a similar point, arising from his work on high-stakes assessment:
As someone who has spent his entire career doing research, writing, and thinking about educational testing and assessment issues, I would like to conclude by summarising a compelling case showing that the major uses of tests for student and school accountability during the last 50 years have improved education and student learning in dramatic ways. Unfortunately, this is not my conclusion. Instead, I am led to conclude that in most cases the instruments and technology have not been up to the demands that have been placed upon them by high stakes accountability. Assessment systems that are useful monitors lose much of their dependability and credibility for that purpose when high stakes are attached to them. The unintended negative effects of the high stakes accountability uses often outweigh the intended positive effects. (Linn, 2000, p. 14).
Assessment processes are, at heart, social processes, taking place in social settings, conducted by, on and for social actors. There are (largely implicit) expectations and agreements that have evolved between students and teachers. A particular feature of such contracts is that they serve to delimit 'legitimate' educational activity by the teacher. For example, in a classroom where the teacher's questioning has always been restricted to 'lower-order' skills, such as the production of correct procedures, students may well see questions about 'understanding' or 'application' as unfair, illegitimate or even meaningless (Schoenfeld, 1985 in Black & Wiliam, 1998, p. 47). Thus, all testing has to take account of these social phenomena in the design and administration of assessments.
The increasing number of students in Sweden makes it difficult for teachers to mark work and return individual feedback to students. The use of formative online assessment, however, allows students to test their knowledge and get immediate feedback. There is nevertheless a danger that students treat the result and the feedback as confirmation that their understanding is adequate, rather than as a way to discover their weaknesses (Judge, 1999).
How can we use information technology to transfer learning? Teachers can use systems that provide them with tools for analysing and tracking students' responses, and can then help the students who have problems. The strength of such tools is that they can promote students' learning. The use of online assessment has the advantage of enabling student responses to be marked and analysed with relative ease and speed. Properly designed online assessment allows students to test their knowledge of a topic and get immediate feedback. Important questions remain, however, about how and whether students organize, structure, and use this information in context to solve more complex problems (Miller, 1999).
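As a rough illustration of what "immediate feedback" and "tracking responses" could mean in practice, the following sketch models a single online quiz item. It is a hypothetical design and not the system used in any project mentioned here: the names `Item`, `respond` and `common_errors`, the hint texts and the sample answers are all assumptions made for the example. The question itself reuses Stake's soup analogy quoted earlier.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Item:
    prompt: str
    correct: str
    # Hypothetical diagnostic hints, keyed by a specific wrong answer.
    hints: Dict[str, str] = field(default_factory=dict)


def respond(item: Item, answer: str) -> str:
    """Immediate feedback for the student: diagnostic where a hint exists."""
    if answer == item.correct:
        return "Correct."
    return item.hints.get(answer, "Not quite; review this topic and try again.")


def common_errors(item: Item, answers: List[str]) -> List[Tuple[str, int]]:
    """Teacher's view: wrong answers ordered from most to least frequent."""
    wrong = Counter(a for a in answers if a != item.correct)
    return wrong.most_common()


soup = Item(
    prompt="The cook tastes the soup. Which kind of evaluation is this?",
    correct="formative",
    hints={"summative": "Summative evaluation is when the guest tastes the soup."},
)

# Illustrative (made-up) answers from three students.
class_answers = ["summative", "formative", "summative"]
feedback = [respond(soup, a) for a in class_answers]
errors = common_errors(soup, class_answers)
```

The hint tells the student what was misunderstood rather than only reporting a score, and the error tally gives the teacher a starting point for helping the students who have problems. Whether students actually act on such hints is the open question raised above.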
There is an ongoing discussion about learning, just as there are competing theories of learning. Methods of assessment are also determined by our beliefs about learning. In this paper we suggest that assessment can support learning as well as measure learning. An important purpose of using online assessment is the possibility of giving students immediate feedback on their understanding of course material (Judge, 1999).
It is important that feedback to students is of high quality in order to enhance the learning process. Students need feedback not only on how well they have done but also on what they have not understood. They also need help to improve their understanding (Ramsden, 1992).
Much attention has been given to ICT as a solution to the problems surrounding teaching and learning in Swedish universities. As late as 1985, computers were used only by a small elite for word processing and simple calculation. Fifteen years later, more than 50 percent of the Swedish people have access to the Internet at home or at work. In turn, new forms of online assessment have been proposed as a solution to the problems described above and are being examined in an EU project co-ordinated from Umeå University (see www.onlineassessment.nu).
This paper has pointed to problems in Swedish higher education. Sweden is committed to the Learning Society and the extension of access to education and knowledge. Historically, examinations have been used to separate successful and unsuccessful students. The question is whether examinations can undergo a paradigm shift. Can they be used to support learning for all students? International research on examination, discussed in this paper, has raised the same question. Is it possible to replace the assessment or 'audit' society (Power, 1999) with the learning society?
Amrein, A. L., & Berliner, D. C. (2002). High-stakes testing, uncertainty, and student learning. Education Policy Analysis Archives, 10(18).
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7-74.
Black, P. (2001). Dreams, strategies and systems: Portraits of assessment past, present and future. Assessment in Education: Principles, Policy & Practice, 8(1), 65-85.
Judge (1999). The production and use of on-line Web quizzes for economics. http://www.ilrt.bris.ac.uk/ctiecon/cheer/ch13_1/ch13_1p21.htm#Walstead
Kleeman, J. (1998). Now is the time to computerize pen and paper tests! A white paper by John Kleeman MA MBCS C.Eng, Managing Director and Founder of Question Mark Computing Ltd. http://www.qmark.com/company/1998paper.html
Linn, R. L. (1998). Assessments and accountability. CSE Technical Report 490. CRESST/University of Colorado at Boulder.
Linn, R. L. (2000). Assessments and accountability. Educational Researcher, 29(2), 4-16. http://www.aera.net/pubs/er/pdf/vol29_02/9403AERA004_016a.pdf
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational Measurement (3rd ed., pp. 13-103). New York: Macmillan.
Miller, K. (1999). Which assessment type should be encouraged in professional degree courses - continuous, project-based or final examination-based? In K. Martin, N. Stanley and N. Davison (Eds), Teaching in the Disciplines/ Learning in Context, 278-281. Proceedings of the 8th Annual Teaching Learning Forum, The University of Western Australia, February 1999. Perth: UWA. http://cleo.murdoch.edu.au/asu/pubs/tlf/tlf99/km/miller.html
Moss, P. A. & Herter, R. J. (1992). Assessment, Accountability, and Authority in Urban Schools. http://www.longterm.mslaw.edu/Long%20Term%20View/Vol%201%20No.%204/mossherter.PDF
Power, M. (1999). The Audit Society: Rituals of verification. Oxford: Clarendon Press.
Ramsden, P. (1992). Learning to teach in higher education. London: Routledge.
Sadler, D. R. (1998). Formative assessment: Revisiting the territory. Assessment in Education, 5(1), 77-84.
Scriven, M. (1967). The Methodology of Evaluation. In Perspectives of Curriculum Evaluation. Chicago: Rand McNally & Company.
Scriven, M. (1991). Beyond Formative and Summative Evaluation. In Evaluation and Education: At Quarter Century (pp. 19-64). The National Society for the Study of Education.
Torrance, H., & Pryor, J. (1998). Investigating Formative Assessment. Buckingham, Philadelphia: Open University Press.