Dr. Gavin Brown and Prof. John Hattie
University of Auckland, New Zealand
Many states in the USA and many nations throughout the world have introduced some form of state-wide or ‘National’ Testing. While there are examples of innovative methodologies (e.g., Kentucky, Maryland, Queensland), most systems employ a variant of the following (but not always in this order of implementation):
a. Create or agree to a ‘National/State’ Standard Course of Study/Curricula;
b. Create ‘National/State’ Tests oriented to a set of outcomes (e.g., numeracy and reading) based on these curricula;
c. Administer these tests to “all” students (usually under test-like conditions on one or more specified days of the school year, having decided which students should be “accommodated” to be included in this “all”);
d. Score, analyse, and distribute a report form to students, parents, schools, and/or Government;
e. Expect that teachers and students will learn from these reports and thence improve the quality of teaching, coverage, abilities, learning, and attitudes;
f. Expect that the reports will lead to the public having renewed and greater confidence in the public school system and support additional resources being spent on such a value-added service.
Darling-Hammond (2003, p. 1) aptly summed up the goals of these ‘National’ models of assessment:
“Advocates hoped that standards outlining what students should know and be able to do would spur other reforms that mobilize resources for student learning, including high quality curriculum frameworks, materials, and assessments tied to the standards; more widely available course offerings that reflect this high quality curriculum; more intensive teacher preparation and professional development guided by related standards for teaching; more equalized resources for schools; and more readily available safety nets for educationally needy students.”
Linn (2000) has argued that tests are used for this purpose because (a) they are relatively inexpensive to implement, (b) they can be externally mandated, (c) testing and assessment changes can be rapidly implemented, and (d) testing results can be easily reported to the public.
National assessment programs based on these principles have been shown to have significant and usually deleterious impacts (Hamilton, 2003) on curriculum (Darling-Hammond & Wise, 1985; Herman & Golan, 1993), teachers (Firestone, Mayrowetz, & Fairman, 1998; Smith & Rottenberg, 1991), and teaching (Shepard & Dougherty, 1991; see also Darling-Hammond, 2003; Klein, Hamilton, McCaffrey, & Stecher, 2000; Koretz & Barron, 1998; Koretz, Linn, Dunbar, & Shepard, 1991; Linn, 2000; Linn, Graue, & Sanders, 1990; Stecher, Barron, Kaganoff, & Goodwin, 1998). It should be noted, however, that some evidence of positive consequences is beginning to be reported (e.g., Cizek, 2001; Monfils, Firestone, Hicks, Martinez, Schorr, & Camilli, 2004).
We have argued that there are eight principles for building an excellent system of ‘national’ assessment that can help maximise the advantages and minimise the disadvantages of national assessment (Hattie, Brown, & Keegan, 2003). The principles assert that ‘national’ assessment should (a) mirror important rich ideas, (b) make rich ideas rather than items dominant, (c) have low-stakes consequences, (d) use more than tests to communicate standards, (e) ensure ‘national’ comparability information is available, (f) ensure that teachers value it as part of teaching, (g) assess what is taught, and (h) provide meaningful feedback to all participants. We wish to use the system that we have devised and implemented in New Zealand in conjunction with the NZ Ministry of Education, Assessment Tools for Teaching and Learning (asTTle), as the basis for our conceptualisation of how school-based assessment (SBA) could be introduced in developing and under-developed nations.
asTTle is a software application that has been released free of charge to all NZ schools, for voluntary use, to support standardised yet teacher-controlled and school-managed classroom assessment. At the Ministry’s requirement, asTTle generates a 40-minute test of any one subject, customised to the teacher’s priorities in terms of curriculum content and difficulty. Once student, school, and question performance data are entered into asTTle, teachers and administrators have a wide range of graphical reports with which to interpret student performance against norms, criteria, and standards; in conjunction with a website, these reports can identify appropriate teaching resources. This system supports diagnostic, formative, and summative interpretations and gives teachers feedback on priorities for teaching and learning activities and for reporting to parents, students, administration, and government. This resource is an example of SBA that meets the requirements of accountability at the national level while providing improvement information at the school level. What lessons have been learned from the design and implementation of this system that could be used in implementing SBA in developing and under-developed countries?
First, SBA tools need to be clearly aligned to the jurisdiction’s required curriculum statements so that teachers, students, and parents know exactly what is being taught and learned; otherwise, SBA functions as a proxy for general ability or intelligence. In asTTle we did this through curriculum mapping, item signatures, and an interface for teacher-controlled test assembly.
Curriculum mapping is a process by which curriculum specialists and experts analyse and describe the rich ideas underlying the content being taught (in asTTle there are no more than 8 such big ideas) and note the major developmental or learning signposts by which progress in those ideas can be identified (see asTTle Technical Reports #4 & 34 for reading, #6 & 37 for writing, #11 & 36 for mathematics, #13 & 38 for pāngarau, and #23 & 39 for pānui & tuhituhi).
Item signatures are descriptions of the important educational and technical characteristics of the assessment materials contained within the asTTle software (see asTTle Technical Reports #12, 16, 25, and 28 for a description of how these studies were conducted). Each item’s signature, or profile, is stored in a relational database and contains:
a) important curriculum-related information about the item:
a. curriculum rich idea,
b. achievement objective,
c. curriculum process (if required),
d. cognitive process (using the SOLO taxonomy; see asTTle Technical Report #43), and
e. relationship to international curriculum categories (e.g., PIRLS, PISA, TIMSS).
b) important psychometric information about the item:
a. difficulty (logit),
b. pseudo-chance value (if multiple-choice),
c. response format,
d. percentage correct at each year level, and
e. curriculum level (both as designed and as actually located through the standard-setting process).
c) important item history information:
a. answer rules,
b. status (in use, in development, retired, etc.),
c. answer image (if required),
d. stimulus material (if required),
e. source of stimulus material (if required), and
f. order of presentation within a testlet (if required).
All of this information is used within the asTTle software to create tests and to report the meaning of student performance. For example, teachers can create a standardised test by selecting curriculum rich ideas and a curriculum-level difficulty. asTTle generates reports that show the performance of students by achievement objective, curriculum process, cognitive process, and curriculum level. Future versions of asTTle, currently on the drawing board, will allow teachers to refine the test by type of stimulus material, to include or exclude specific items, and even to specify item response format.
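The kind of teacher-controlled assembly described above can be illustrated with a minimal Python sketch. It assumes a hypothetical, much-simplified item record (`ItemSignature`, with made-up field names, made-up rich-idea labels, and an assumed per-item time estimate) and uses a simple greedy filter; it is not a description of how asTTle actually assembles its tests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemSignature:
    """Hypothetical, much-simplified item signature; field names are illustrative."""
    item_id: str
    rich_idea: str            # curriculum rich idea the item maps to
    curriculum_level: int     # curriculum level assigned through standard setting
    minutes: float            # assumed estimate of time needed to answer the item
    status: str = "in use"    # e.g. "in use", "in development", "retired"


def assemble_test(bank, rich_ideas, levels, time_limit=40.0):
    """Greedily pick in-use items that match the teacher's chosen rich ideas and
    curriculum levels, skipping any item that would exceed the time budget."""
    chosen, used = [], 0.0
    for item in bank:
        if item.status != "in use":
            continue  # retired or developmental items are never offered
        if item.rich_idea not in rich_ideas or item.curriculum_level not in levels:
            continue  # outside the teacher's content and difficulty choices
        if used + item.minutes <= time_limit:
            chosen.append(item)
            used += item.minutes
    return chosen


bank = [
    ItemSignature("R001", "finding information", 3, 5),
    ItemSignature("R002", "finding information", 4, 8),
    ItemSignature("R003", "inference", 3, 10),
    ItemSignature("R004", "finding information", 3, 6, status="retired"),
]
test = assemble_test(bank, {"finding information", "inference"}, {3})
print([item.item_id for item in test])  # ['R001', 'R003']
```

A real assembly engine would also balance content coverage and item difficulty across the whole test rather than taking items in bank order; the greedy loop is used here only because it keeps the link between signature fields and teacher choices easy to see.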
Second, for SBA to be powerful for students, teachers, parents, and administrators, it must be calibrated to criteria, norms, standards, and progress. Such calibration requires the creation of large banks of valid and valued assessment tasks and items, trialled with representative samples of students and scored using item response theory (IRT). asTTle contains six banks of test questions in two languages (English and Māori): reading (1600 items), writing (60 prompts), mathematics (1500 items), pānui (600 items), tuhituhi (31 prompts), and pāngarau (600 items). All of these items have been calibrated on the performance of thousands of students (currently totalling over 90,000) in Years 4 to 12 (ages 8 to 17) and against the achievement objectives of the national curriculum statements for those subjects at Levels 2 to 6 inclusive. Calibration uses IRT statistical analyses to locate items relative to each other using large sparse matrices of data; this avoids the need for all students to answer all items, provided around 250 responses per item are available. Tests of item quality are conducted to ensure that:
(1) students of equal ability are not discriminated against if they are of a different ethnicity or sex (differential item functioning, or DIF, analyses: no systematic bias was found in asTTle reading for sex or the major ethnic groups);
(2) chance factors do not confound interpretation of student performance (identification of the pseudo-chance parameter in the three-parameter logistic, or 3PL, IRT model, found to be about .18 on asTTle’s four-option multiple-choice questions);
(3) items consistently favour students who actually have more ability over those with less (2PL IRT discrimination index: items with low values are eliminated or revised before publication);
(4) wrong options in selected-response items perform appropriately (distractor analyses based on classical test theory, or CTT, allow distractors to be eliminated or corrected and keys to be confirmed before publication);
(5) items assess their intended location in the curriculum difficulty spectrum (standard setting through the ‘bookmark’ procedure; see asTTle Technical Report #22 and the asTTle V4 Manual); and
(6) the items measure what they claim to measure (multiple validity checks are carried out through review panels, item proofreaders, teacher administrators, readability analyses, sensitivity reviews, the item signature studies described above, and software that validates item information against the rules for items and testlets).
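The checks in (2) and (3) rest on logistic IRT models. As an illustration only, the following Python sketch shows the standard three-parameter logistic item response function; the discrimination value of 1.2 is an arbitrary example, and the .18 pseudo-chance value is borrowed from the figure reported above. Estimating these parameters from response data requires dedicated calibration software and is not shown.

```python
import math


def p_correct(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under the three-parameter logistic model.

    theta: student ability (logits); b: item difficulty (logits);
    a: discrimination (slope); c: pseudo-chance lower asymptote.
    With c = 0 this is the 2PL model; with c = 0 and a = 1 it is the Rasch model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# A very weak student still succeeds with roughly the pseudo-chance probability.
print(round(p_correct(theta=-6.0, b=0.0, a=1.2, c=0.18), 3))
# At theta == b a 3PL item gives c + (1 - c) / 2, here .18 + .82 / 2 = .59.
print(round(p_correct(theta=0.0, b=0.0, a=1.2, c=0.18), 2))  # 0.59
# A poorly discriminating item (small a) barely separates weaker and stronger students.
print(round(p_correct(2.0, 0.0, a=0.1) - p_correct(-2.0, 0.0, a=0.1), 3))
```

The pseudo-chance parameter explains why a guessing student does not score zero on multiple-choice items, and a flat curve (small `a`) is exactly the low-discrimination pattern that leads to an item being revised or removed.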
Based on these calibration procedures, users can have confidence that the items are of high quality, that the reports produce valid information about students’ strengths and weaknesses, and that alternative explanations for student performance can be discounted. asTTle uses this information to generate IRT-based reports (traditionally called ‘kidmaps’) that show student performance on achievement objectives according to the difficulty of each item relative to the student’s overall performance and whether the item was answered correctly. In this way, teachers get a report on strengths, achievements, objectives still to be achieved, and gaps in a student’s learning of the curriculum (see asTTle Technical Report #15 for details). Further, given the large sample of students on which the items were trialled, it is possible to report student performance relative to that of similar students; such like-with-like comparisons help teachers make appropriate interpretations. Without them, a teacher might excuse a student’s below-average performance on the grounds that low scores are always associated with students of that kind. With a like-with-like comparison, below-average performance can no longer be excused: the teacher must accept that, on average, students like theirs do better elsewhere. This looking-in-the-mirror effect focuses the teacher’s mind on what they must do differently or better, rather than on blaming or excusing the learner for below-average performance.
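The kidmap logic described above can be sketched in a few lines of Python. The mapping of the four labels onto quadrants below follows the common kidmap convention (harder versus easier than the student’s ability estimate, correct versus incorrect) and is an assumption, not a reproduction of asTTle’s exact rules in Technical Report #15; the achievement-objective names are hypothetical, and measurement error around the ability estimate is ignored.

```python
from collections import defaultdict


def kidmap_quadrant(theta, difficulty, correct):
    """Classify one item response relative to the student's ability estimate.

    Assumed convention: items harder than the student's estimate (theta, in
    logits) answered correctly are 'strengths'; easier items answered correctly
    are 'achieved'; harder items answered incorrectly are 'to be achieved';
    easier items answered incorrectly are 'gaps'.
    """
    harder = difficulty > theta
    if correct:
        return "strengths" if harder else "achieved"
    return "to be achieved" if harder else "gaps"


def kidmap(theta, responses):
    """Group achievement objectives into kidmap quadrants.

    responses: iterable of (achievement_objective, difficulty_in_logits, correct).
    """
    quadrants = defaultdict(list)
    for objective, difficulty, correct in responses:
        quadrants[kidmap_quadrant(theta, difficulty, correct)].append(objective)
    return dict(quadrants)


report = kidmap(0.5, [
    ("find information", -1.0, True),
    ("infer meaning", 1.5, True),
    ("identify main idea", -0.5, False),
    ("evaluate argument", 2.0, False),
])
print(report)
```

The ‘gaps’ quadrant is the diagnostically interesting one: objectives the student would be expected to handle, given their overall performance, but did not.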
Additionally, this reporting system allows schools to be compared with similar schools without the invidious effects of ‘league tables’; asTTle currently compares schools on the basis of socio-economic status, region, size, ethnic mix, and sector (see asTTle Technical Report #14 for a description of how this was done in New Zealand). Again, comparing the performance of my students in my school with that of similar schools focuses the administrator’s mind not on excusing or blaming poor performance on characteristics of the school population, but on identifying the areas in which the school might benefit from consulting other schools that are doing a better job with the same type of students. This comparative (norm-referenced) information helps identify whether a learning objective or a group of learners represents a matter of concern or pride. However, without the criterion-referenced interpretations supported by the IRT diagnostics and the standards-based interpretations supported by the curriculum-level analyses, most users of assessments become complacent about rank-order scores. Further, the expense of asTTle items and software could not be justified if all that was provided to users was a rank-order score.
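A like-with-like school comparison can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each school is described by banded categorical characteristics (the key names and band labels are hypothetical) and that “similar” means an exact match on all of them; the procedure actually used in asTTle (Technical Report #14) is not reproduced here, and the scores are invented.

```python
from statistics import mean

# Characteristics on which schools are matched; names are hypothetical.
MATCH_KEYS = ("ses_band", "region", "size_band", "ethnic_mix_band", "sector")


def similar_school_comparison(schools, target, keys=MATCH_KEYS):
    """Return (target mean, mean of matching schools, difference), or None if no
    other school matches the target on every characteristic in `keys`."""
    peers = [s["mean_score"] for s in schools
             if s["name"] != target["name"] and all(s[k] == target[k] for k in keys)]
    if not peers:
        return None
    peer_mean = mean(peers)
    return target["mean_score"], peer_mean, target["mean_score"] - peer_mean


base = {"ses_band": "low", "region": "north", "size_band": "medium",
        "ethnic_mix_band": "diverse", "sector": "primary"}
schools = [
    {**base, "name": "Mine", "mean_score": 480},
    {**base, "name": "Peer A", "mean_score": 500},
    {**base, "name": "Peer B", "mean_score": 520},
    {**base, "ses_band": "high", "name": "Other", "mean_score": 560},
]
print(similar_school_comparison(schools, schools[0]))  # (480, 510, -30)
```

Note that the high-SES school is excluded from the comparison, so the negative difference cannot be explained away by intake: only schools serving similar students set the benchmark.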
Third, teacher-administrators of SBA must be given choice over what is assessed, how hard the assessment is, when the assessment takes place, who is given which assessment, what interpretations should be made, and what actions should be taken; in this way, validity is ensured. The great criticism of externally mandated, centrally controlled national assessment is the poor fit of the test to the local school context. Teachers treat the test and its scores as invalid if the material is “too hard”, “taught last year”, “taught next term”, or “too easy” (see asTTle Technical Reports #27, 33, 42, 44, 46, & 49 for New Zealand teacher feedback on this issue). Another facet of this criticism is the impression that standardised testing requires everyone to take the same test (‘one size fits all’). Clearly, the students in a class of 30 or more are not all at the same level of ability in any subject; one size does not fit all, and teachers and students deserve a mechanism by which the difficulty of the test can be custom-fitted to student abilities. Generally, in SBA teachers can to some extent do this themselves without a great deal of external assistance. However, if we want these custom-designed tests to support rich norm-referenced, criterion-referenced, and standards-based interpretations, the test questions have to be calibrated to these scales, and teachers cannot do this on their own, even with all the necessary training in assessment development. The cost and effort required to produce this information are prohibitive at the school level; only nations and large jurisdictions can underwrite such activity. But the validity of a test from the centre will always be questioned. Thus, a mechanism that gives teachers the power to create customised tests that are nonetheless all standardised would be a powerful means of improving the quality of assessment and of the information available to teachers, administrators, students, and parents.
If the scores (or any feedback) are delayed through large-scale central scoring, data entry, item analysis, and report generation (as they inevitably are, by some 3 to 6 months), then the potential for that information to shape meaningful learning activities is practically nil: the students have changed class or grade, the teachers have moved on to new material, the class may already have been successfully taught that content, and so on. Prompt feedback and teacher control over what is in the test are two key features of successful SBA, and both are delivered in the asTTle software: teachers review and validate the test content and difficulty, and they get rich interpretive feedback as soon as the test is scored and the data entered, well within the time needed to make a difference.
Fourth, SBA developers and publishers must seek to communicate with teachers, parents, and students in novel and powerful ways. Instead of focusing on numbers (especially rank-order values such as stanines, percentiles, and grade equivalents), SBA is greatly enhanced by graphical feedback to participants showing where learners are relative to teaching goals (see asTTle Technical Report #15 for details of how and why the reports were developed). Displaying performance graphically relative to a norm or a standard, rather than reporting a numeric score, reduces the need for teachers to be assessment literate in the classical sense and takes advantage of well-developed intuitive understandings of what charts mean. Simple and consistent use of conventions greatly enhances the clarity and communicability of educational reporting. For example, in asTTle we use Tukey’s box-and-whisker plot to communicate the distribution and central tendency of scores, we use red scores on blue fields to show ‘my group’ compared with the norm group, and we use point indicators broad enough to encompass the standard error of measurement. Indeed, the asTTle reports are so transparent that some schools report that students are leading parent-teacher conferences using the Individual Learning Pathways report, a variant of the ‘kidmap’ report.
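Tukey’s box-and-whisker plot is drawn from a five-number summary. The Python sketch below computes one for a group of hypothetical scale scores using Tukey’s hinges; it illustrates the statistic behind the plot, not asTTle’s own plotting code, and drawing the box itself is left to any charting library.

```python
from statistics import median


def five_number_summary(scores):
    """Tukey's five-number summary (minimum, lower hinge, median, upper hinge,
    maximum). Hinges are the medians of the lower and upper halves of the sorted
    data, each half including the overall median when n is odd."""
    xs = sorted(scores)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one score")
    half = (n + 1) // 2  # size of each half; includes the middle value when n is odd
    lower, upper = xs[:half], xs[n - half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


scores = [400, 420, 450, 470, 480, 500, 530]  # hypothetical scale scores for one class
print(five_number_summary(scores))  # (400, 435.0, 470, 490.0, 530)
```

The box spans the two hinges and the whiskers reach the extremes, so a class and its norm group can be compared at a glance without reading any numbers.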
Research into the accuracy of users’ understanding of the reports has improved the quality of the charts (see asTTle Technical Reports #1, 9, & 10) and identified professional development or training requirements (see asTTle Technical Report #35, where answers to comprehension questions about asTTle reports showed that teachers who had received professional development understood the reports more accurately than those who had not). Without IRT scoring and item calibration, educational reporting is more limited, since classical test theory can provide only a test-based score rather than an item-based report. IRT is essential for meaningful reporting in SBA because students complete different items at different times, drawn from a bank that covers a much wider range of valued content than any single test could.
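The point that IRT lets students who answer different items be reported on one scale can be made concrete with a minimal Python sketch. It assumes the simple Rasch model (asTTle’s own scoring models are not reproduced here) and item difficulties that have already been calibrated; ability is estimated by maximum likelihood with Newton-Raphson, and the standard error of measurement is returned as well. The item difficulties are hypothetical.

```python
import math


def rasch_ability(difficulties, responses, tol=1e-6, max_iter=50):
    """Maximum-likelihood ability estimate (logits) and its standard error of
    measurement under the Rasch model, given already-calibrated item
    difficulties and the student's 0/1 responses to those same items."""
    if len(difficulties) != len(responses):
        raise ValueError("need one response per item")
    score = sum(responses)
    if score == 0 or score == len(responses):
        raise ValueError("the ML estimate is infinite for zero or perfect scores")

    def probabilities(theta):
        return [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]

    theta = 0.0
    for _ in range(max_iter):
        probs = probabilities(theta)
        gradient = score - sum(probs)                    # d(log-likelihood)/d(theta)
        information = sum(p * (1.0 - p) for p in probs)  # test information at theta
        step = gradient / information                    # Newton-Raphson update
        theta += step
        if abs(step) < tol:
            break
    information = sum(p * (1.0 - p) for p in probabilities(theta))
    return theta, 1.0 / math.sqrt(information)  # SEM = 1 / sqrt(test information)


easy_test = [-2.0, -1.5, -1.0, -0.5, 0.0]   # hypothetical calibrated difficulties
hard_test = [b + 2.0 for b in easy_test]    # a different test, 2 logits harder
answers = [1, 1, 1, 0, 0]                   # the same raw score (3/5) on each test

theta_easy, sem_easy = rasch_ability(easy_test, answers)
theta_hard, sem_hard = rasch_ability(hard_test, answers)
print(round(theta_hard - theta_easy, 3))  # 2.0
```

The same raw score of 3 out of 5 yields an ability estimate 2 logits higher on the harder test, which a simple total score cannot express; the returned SEM is the quantity a report would use to size a point indicator.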
Fifth, removing central control, central reporting, or centrally mandated consequences has been found to assist in the uptake of SBA. This policy in New Zealand takes advantage of teachers’ professionalism and respects their vital role in enhancing school effectiveness, while minimising the negative consequences of national testing. Indeed, research into teachers’ conceptions of assessment in New Zealand (Brown, 2004) has shown that primary school teachers not only agree with the goal of using assessment to improve education but also agree that assessment can help identify schools and teachers who are doing a good job. Fundamentally, these teachers are willing to use assessment to demonstrate accountability because Big Brother is NOT watching. Furthermore, they have high-quality resources that provide data that external agencies accept as credible and not just a function of wishful thinking or bias.
New Zealand teachers, on the whole, are not afraid to look into the mirror of assessment and discover that students are not learning as expected or even as well as the norm. This fearlessness comes not from any special attribute of our teachers, but from a policy context of high trust and school-based management of learning against national objectives. Teachers and administrators are expected to be the first to identify learning needs in cohorts and individuals and to make appropriate plans and reports long before inspectors or external agencies come along to determine whether the school is doing a good job. In such a context, the teacher and school are supported in identifying ‘bad news’, rather than, as in the classic high-stakes context reported in many jurisdictions (Cannell, 1989), making the bad news go away by cheating, making extra accommodations, and teaching to the test. Since the data belong to the school, not the government or the developers, the school can inspect the ‘bad news’ safely, without fear of being blind-sided by unexpected public exposure or humiliation. What matters in New Zealand is not that there be NO bad news, but rather that teachers and schools identify needs, implement appropriate educational plans, and monitor the effectiveness of those plans prior to external vetting or inspection. The New Zealand inspectors want to know what evidence was used to identify a problem, what was done about it, and how the school knows that the planned intervention is working, all of which can be answered by using a high-quality SBA resource.
Sixth, the New Zealand experience has shown that making the use of a new government-sponsored SBA resource voluntary can have a positive impact, provided the resource gives teachers new and valuable information. Related to the fifth point, the psychology of educators in New Zealand is such that compulsion may have a negative impact. Since teachers could choose among the various options available to them, those who chose asTTle did so because they were convinced that it gave them information about the effectiveness of their teaching that they would not otherwise have obtained. New Zealand, consistent with Hamilton’s (2003) recommendations, provides teachers with many resources to monitor learning: not just a bank of tests, but exemplars of learning, a wide variety of assessment resources, means of monitoring system-wide developments, and high-quality curriculum-based resources. Within this multi-faceted context, teachers are gravitating to asTTle because it does tell them something they didn’t know; for example, surprisingly hard things that students could actually do, surprisingly easy things that students could not do, students who had not made much progress, and so on. Further, the asTTle website gave teachers access to catalogued, high-quality teaching resources that could meet the identified learning needs of their students. That unexpected support (teachers were no longer left on their own with unwelcome assessment results) meant that teachers could close the feedback loop and begin to respond appropriately to learning needs. Truly this is a matter of bringing the thirsty camel to the water and letting it choose which well it would drink from.
Seventh, the use of computer technology is critical to permit customised test creation and sophisticated IRT-based score calculations. Teachers simply could not create photocopy-ready classroom assessments of a learning area at a chosen difficulty level in the 7 minutes it takes asTTle to create a 40-minute test. The many hours saved here can easily be transferred to the straightforward tasks of scoring and data entry. More importantly, working by hand, teachers could calculate only a total score for a test, which assumes that all items on the test are equally difficult, although all of us know intuitively that items are not equally difficult even on the highest-quality assessments ever made. The computer, by contrast, can accurately and rapidly estimate a student’s strengths and needs, freeing the teacher to concentrate on the important decisions and actions he or she will take based on that information. The transformation of scores into meaningful pictures requires computer technology that captures expert processes, again freeing teachers from the need to be assessment literate while requiring them to be extremely literate about their teaching content and their students.
Nevertheless, the design, selection, and deployment of ICT must be consistent with the country’s infrastructure and development plans. When asTTle began in New Zealand in 2000, all that was required was that tests could be created and reported on with the kinds of computers schools and teachers already had. Hence, it ran on stand-alone computers such as Mac Classic and Windows 95 machines. Based on teachers’ positive response to that resource and their calls for improvements, versions 3 and 4 extended the ICT functionality, so that asTTle users now have the option of shared database systems (Multi-user asTTle V4) that operate on a range of server platforms: Mac OS X Panther and Tiger; Windows 2000, 2003, and NT; Red Hat Linux 9 and Red Hat Enterprise Linux; and Novell NetWare. In addition, asTTle V4 runs on laptops and desktops and still supports Windows 98 while keeping pace with other technical requirements.
However, the requests for new technical features and options never stop. Teachers are now asking for onscreen testing to save paper and data-entry work; parents want access to the reports about their children; teachers want more flexibility in test creation; and others have asked for computer adaptive testing so that less time is spent testing and more time teaching. Administrators want greater and more seamless interaction with school or student management systems. These new wish-list elements bespeak a fundamental commitment to using ICT to serve educational needs, in contrast to the traditional environment of ICT in search of a real application. asTTle is an educational resource that happens to use technology, not a technology resource for education. Assessing real learning is a real educational application that is desperately in need of better tools to relieve teacher workload and improve teacher effectiveness. The touchstone of the asTTle project has not been “are we using the newest and best?”, but rather “when teachers use asTTle, do they focus on the technology or the education?”. If the answer is the latter, then we have succeeded, and this is the benchmark against which all ‘smart’ education innovations need to be judged.
Finally, the deployment of SBA needs to take place incrementally, with research conducted along the way both to ensure that any technology is developed appropriately and to ensure that teachers understand and implement it for the purpose of improvement. The New Zealand experience clearly showed the benefit of gradual implementation. First, it allowed time for teachers to get used to the idea of customised yet standardised tests that they, rather than the central agency, controlled. Second, it gave time for item development, trialling, reviewing, calibration, norming, and so on. Third, it gave time for teachers to understand what the system could do and to make clear what they wanted it to do; experience-based requirements specification means that what the end user really wants is what is actually developed. Fourth, it gave a robust basis for determining whether what was done was having the expected impact; research-based evaluations over time meant that those ‘summative’ analyses could feed ‘formatively’ into powerful revisions. Fifth, from a government perspective, it meant that each stage of development had a fixed budget and timeline that was met without fail; success in delivery on this basis is rare in the ICT industry and even rarer in government, and so we are justifiably proud of what we did. Sixth, it meant that the technology was always at an appropriate level of complexity and was never overspecified or overdelivered for the infrastructure capacity at the time the software was released. In other words, we currently supply paper-based testing only because that is what schools can deliver now, not because it is all that we or anyone else could dream of. As infrastructure capacity and demand increase, enhancements can be put in place.
Throughout, we have treated the design and delivery of asTTle as research: our design was informed by new information and theory at every stage, and we have not mindlessly followed a master plan from the beginning. Agile responsiveness to new knowledge was made feasible by an incremental approach. Further, as a research project, we are proud to have made technical reports accessible to all users, providing a robust basis for confidence in our processes and results. This research commitment has extended to publishing articles, completing research theses, and providing training; our team members teach SBA on the basis of their success in designing and delivering a powerful mechanism for doing it.
To conclude, SBA can be introduced effectively into any jurisdiction provided certain conditions are met. There must be a will to provide teachers with educational resources that help them improve the quality of their work; we suggest doing so by giving them feedback on goals, progress, and next steps based on real observation of student performance. It is not about giving them technology, nor about implementing a centralised system of checking on teachers. It is about respecting teachers so much that we trust their professionalism to monitor their own work and respond effectively to learning needs. It is not about making teachers into world-class assessors, but rather about helping teachers do what they really exist for: improving the life chances of a nation’s young people by easily, accurately, and appropriately identifying their learning needs and responding to them. SBA, developed in this manner, can effectively assist developing and under-developed nations to meet the needs of the knowledge economy in the 21st century.
Brown, G.T.L. (2004). Teachers’ conceptions of assessment: Implications for policy and professional development. Assessment in Education: Policy, Principles and Practice, 11(3), 305-322.
Cannell, J. J. (1989). How public educators cheat on standardized achievement tests. Albuquerque, NM: Friends for Education.
Cizek, G. J. (2001). More unintended consequences of high-stakes testing. Educational Measurement: Issues and Practice, 20(4), 19-27.
Darling-Hammond, L. (2003, February). Standards and assessments: Where we are and what we need. Teachers College Record, ID Number: 11109. Retrieved 8/2/2005 from http://www.tcrecord.org
Darling-Hammond, L., & Wise, A. E. (1985). Beyond standardization: State standards and school improvement. Elementary School Journal, 85(3), 315-336.
Firestone, W. A., Mayrowetz, D., & Fairman, J. (1998). Performance-based assessment and instructional change: The effects of testing in Maine and Maryland. Educational Evaluation and Policy Analysis, 20(2), 95–113.
Hamilton, L. (2003). Assessment as a policy tool. Review of Research in Education, 27, 25-68.
Hattie, J.A.C., Brown, G.T.L., & Keegan, P.J. (2003). A national teacher-managed, curriculum-based assessment system: Assessment Tools for Teaching & Learning (asTTle). International Journal of Learning, 10, 771-778.
Herman, J. L., & Golan, S. (1993). The effects of standardized testing on teaching and schools. Educational Measurement: Issues and Practice, 12(4), 20-25, 41-42.
Klein, S. P., Hamilton, L. S., McCaffrey, D. F., & Stecher, B. M. (2000). What do test scores in Texas tell us? Santa Monica, CA: RAND. Available as ERIC Document ED447219.
Koretz, D. M., & Barron, S. I. (1998). The validity of gains on the Kentucky Instructional Results Information System (KIRIS). Santa Monica, CA: RAND. Available as ERIC Document ED428131.
Koretz, D. M., Linn, R. L., Dunbar, S. B., & Shepard, L. A. (1991, April). The effects of high-stakes testing on achievement: Preliminary findings about generalization across tests. Paper presented at the annual meeting of the American Educational Research Association, Chicago.
Linn, R. L. (2000). Assessments and accountability. Educational Researcher, 29(2), 4–16.
Linn, R. L., Graue, E. M., & Sanders, N. M. (1990). Comparing state and district test results to national norms: The validity of claims that “everyone is above average”. Educational Measurement: Issues and Practice, 9, 5-14.
Monfils, L. F., Firestone, W. A., Hicks, J. E., Martinez, M. C., Schorr, R. Y., & Camilli, G. (2004). Teaching to the test. In W. A. Firestone, R. Y. Schorr, & L. F. Monfils (Eds.), The ambiguity of teaching to the test: Standards, assessment, and educational reform (pp. 37-61). Mahwah, NJ: LEA.
Shepard, L. A., & Dougherty, K. C. (1991). Effects of high-stakes testing on instruction. Paper presented at the annual meeting of the American Educational Research Association and National Council on Measurement in Education, Chicago.
Smith, M. L., & Rottenberg, C. (1991). Unintended consequences of external testing in elementary schools. Educational Measurement: Issues and Practice, 10, 7–11.
Stecher, B. M., Barron, S. I., Kaganoff, T., & Goodwin, J. (1998). The effects of standards-based assessment on classroom practices: Results of the 1996-97 RAND survey of Kentucky teachers of mathematics and writing (CSE Tech. Rep. 482). Los Angeles: Center for Research on Evaluation, Standards, and Student Testing.
asTTle Technical Reports (available at www.asttle.org.nz)
Christensen, I., Trinick, T., & Keegan, P. J. (2003). Pāngarau curriculum framework and map: Levels 2-6 (Tech. Rep. No. 38). Auckland, NZ: University of Auckland/Ministry of Education.
Coogan, P., Hoben, N., & Parr, J. M. (2003). Written language curriculum framework and map: Levels 5-6 (Tech. Rep. No. 37). Auckland, NZ: University of Auckland/Ministry of Education.
Ell, F. (2001). Mathematics in the New Zealand Curriculum - A concept map of the curriculum document (Tech. Rep. No. 11). Auckland, NZ: University of Auckland, Project asTTle.
Fairhall, U., & Keegan, P. J. (2001). Pāngarau curriculum framework and map: Levels 2-4 (Tech. Rep. No. 13). Auckland, NZ: University of Auckland/Ministry of Education.
Glasswell, K., Parr, J., & Aikman, M. (2001). Development of the asTTle writing assessment rubrics for scoring extended writing tasks (Tech. Rep. No. 6). Auckland, NZ: University of Auckland, Project asTTle.
Limbrick, L., Keenan, J., & Girven, A. (2000). Mapping the English curriculum (Tech. Rep. No. 4). Auckland, NZ: University of Auckland, Project asTTle.
Murphy, H., & Gray, A. (2003). Review of Māori literacy framework for koeke 2-6 panui/tuhituhi of the Māori language curriculum statement, Te Reo Māori i roto i ngā Marautanga o Aotearoa (Tech. Rep. No. 39). Auckland, NZ: University of Auckland/Ministry of Education.
Murphy, H., & Keegan, P. J. (2002). Te Reo Māori literacy curriculum map: Levels 2-4 (Tech. Rep. No. 23). Auckland, NZ: University of Auckland/Ministry of Education.
Nicholls, H. (2003). English reading curriculum framework and map: Levels 2-6 (Tech. Rep. No. 34). Auckland, NZ: University of Auckland/Ministry of Education.
Thomas, G., Holton, D., Tagg, A., & Brown, G. T. L. (2003). Mathematics curriculum framework and map: Levels 2-6 (Tech. Rep. No. 36). Auckland, NZ: University of Auckland/Ministry of Education.
Brown, G. T. L. (2002). Item signature study: Report on the characteristics of reading texts and items from calibration 3 (Tech. Rep. No. 28). Auckland, NZ: University of Auckland, Project asTTle.
Meagher-Lundberg, P., & Brown, G. T. L. (2001). Item signature study: Report on the characteristics of reading texts and items from calibration 1 (Tech. Rep. No. 12). Auckland, NZ: University of Auckland, Project asTTle.
Meagher-Lundberg, P., & Brown, G. T. L. (2001). Item signature study: Report on the characteristics of reading texts and items from calibration 2 (Tech. Rep. No. 16). Auckland, NZ: University of Auckland, Project asTTle.
Thomas, G., Tagg, A., Holton, D., & Brown, G. T. L. (2002). Numeracy item signature study: A theoretically derived basis (Tech. Rep. No. 25). Auckland, NZ: University of Auckland, Project asTTle.
Brown, G. T. L. (2001). Reporting assessment information to teachers: Report of Project asTTle outputs design (Tech. Rep. No. 15). Auckland, NZ: University of Auckland, Project asTTle.
Hattie, J. A. (2002). Schools like mine: Cluster analysis of New Zealand schools (Tech. Rep. No. 14). Auckland, NZ: University of Auckland, Project asTTle.
Hattie, J. C., & Brown, G. T. L. (2003). Standard setting for asTTle reading: A comparison of methods (Tech. Rep. No. 21). Auckland, NZ: University of Auckland/Ministry of Education.
Hattie, J. A. C., & Brown, G. T. L. (2004, September). Cognitive processes in asTTle: The SOLO taxonomy (Tech. Rep. No. 43). Auckland, NZ: University of Auckland/Ministry of Education.
Meagher-Lundberg, P. (2000). Comparison variables useful to teachers in analysing assessment results (Tech. Rep. No. 1). Auckland, NZ: University of Auckland, Project asTTle.
Meagher-Lundberg, P. (2001). Output reporting design: Focus group 1 (Tech. Rep. No. 9). Auckland, NZ: University of Auckland, Project asTTle.
Meagher-Lundberg, P. (2001). Output reporting design: Focus group 2 (Tech. Rep. No. 10). Auckland, NZ: University of Auckland, Project asTTle.
Brown, G. T. L., Irving, S. E., Hattie, J., Sussex, K., & Cutforth, S. (2004, August). Summary of teacher feedback from the secondary school calibration of asTTle™ reading and writing assessments for Curriculum Levels 4 to 6 (Tech. Rep. No. 49). Auckland, NZ: University of Auckland/Ministry of Education.
Irving, S. E., & Higginson, R. M. (2003). Improving asTTle for secondary school use: Teacher and student feedback (Tech. Rep. No. 42). Auckland, NZ: University of Auckland/Ministry of Education.
Keegan, P. J., & Pipi, A. (2002). Summary of the teacher feedback from the calibration of asTTle v2 pānui, pāngarau and tuhituhi assessments (Tech. Rep. No. 27). Auckland, NZ: University of Auckland/Ministry of Education.
Keegan, P. J., & Pipi, A. (2003). Summary of the teacher feedback from the calibration of the asTTle V3 pāngarau assessments (Tech. Rep. No. 44). Auckland, NZ: University of Auckland/Ministry of Education.
Keegan, P. J., & Ngaia, T. (2004, August). Summary of teacher feedback from the V4 calibration of asTTle pānui and tuhituhi assessments for Curriculum Levels 2 to 6 (Tech. Rep. No. 46). Auckland, NZ: University of Auckland/Ministry of Education.
Lavery, L., & Brown, G. T. L. (2002). Overall summary of teacher feedback from the calibrations and trials of the asTTle reading, writing, and mathematics assessments (Tech. Rep. No. 33). Auckland, NZ: University of Auckland, Project asTTle.
Source: East Meet West KL 2005, An International Colloquium for International Assessment, APEC Paper 2