NATIONAL RESEARCH COUNCIL
+ + + + +
BOARD ON TESTING AND ASSESSMENT
+ + + + +
VOLUNTARY NATIONAL TESTS OF
READING AND MATHEMATICS ACHIEVEMENT
REGULATORY AND LICENSING ISSUES
+ + + + +
THURSDAY,
JUNE 12, 1997
WASHINGTON, D.C.

C O N T E N T S
Participants
- Richard Shavelson
Opening Remarks - Robert Linn
Presentations:
Michael Feuer
Gary Philips
Eva Baker
Richard Duran
Richard Jaegar
Janell Byrd
Discussants
Eva Baker
John Fremer
George Madaus
Discussion

LUNCHEON RECESS

Licensing - Gary Philips
Afternoon Session Questions
Federalism and Inter-Governmental Relations
Presentations:
Bruce McDowell
Beryl Radin
John Shannon
Jack Knott
Summations

June 13, 1997 Proceedings

P R O C E E D I N G S
8:35 a.m.

MR. SHAVELSON: Good morning. On behalf of the Board on Testing and Assessment and the National Academy of Sciences, it's my pleasure to welcome you to our workshop on National Tests, Regulatory and Licensing Issues.

I'm Rich Shavelson, I'm Chair of BOTA. BOTA is a standing Board of the National Research Council, which is the research arm of the National Academy of Sciences. BOTA's charge is to advise the Federal Government and nation on issues of testing and assessment, policy and practice.

Its purview is broad, encompassing the multitude of test uses in our society, including education, health, labor, and the military. Consequently, this workshop on President Clinton's proposal for national tests of Reading and Mathematics, fits squarely into our agenda.

Let me say a few words about BOTA and its members before we jump into the agenda today. We're fortunate to have with us today, seven members from the Board on Testing and Assessment. We have on my left here, Bob Linn, Vice Chair of BOTA and distinguished professor of education at the University of Colorado, and Chris Edley who is professor of law at Harvard University.

In addition, we have Art Goldberger who's a member of the National Academy of Sciences and professor of economics at Wisconsin; Carl Kaestle, who's the immediate past President of the National Academy of Education and professor of History and Education at Brown University.

I didn't see Richard Duran. Is Richard here? He's on the way. Richard is coming in from UC Santa Barbara, but if the weather's nice he may not make it. And we have Bill Taylor, attorney, and currently a visiting professor of Law and Education at Stanford University.

We thank them for their dedication to BOTA and the NRC, and for their willingness to share some of their time on behalf of the issues we're about to address today. It's worth noting again that all of our members act as volunteers, which is one way the Academy aspires towards objective and independent, scientific judgment.

A couple of words about the projects that are going on within the Board. We're currently involved in several activities that may be of interest to the group. We're involved in a 3-year evaluation of the National Assessment of Educational Progress; we're involved in a new study of Title I assessment which is about to begin this summer; a new roundtable on work, learning, and assessment, designed to bring together people from business, education, and government for discussions about schooling and the future of work, learning, employment testing, and post-secondary admissions.

We have a new project that will review and synthesize research from cognitive and neuropsychology, and its implications for assessment. And finally, last but not least, we have a new book coming out, Educating One and All, Standards-Based Reform and Students with Disabilities. It will be published next week.

In addition, the Board holds periodic workshops and colloquia on selected topics. Some of our reports are on topics as varied as IQ tests and special education, Goals 2000 and standards-based reform, the General Aptitude Test Battery, and we're currently holding conferences on TIMSS and on science assessment.

Next week we will launch a new bi-monthly luncheon colloquium series with a visit of Nancy Cole, President of ETS, who will discuss the major study of gender bias in educational testing.

BOTA projects are sponsored mostly by the Federal Government; the Departments of Defense, Education, Labor, School and Work Opportunities, and the National Science Foundation, are or have been sponsors of our work. In addition, we would also like to acknowledge the support of several foundations: The Pew Charitable Trusts, the Spencer Foundation, and the W.T. Grant Foundation.

This conference was conceived and organized solely by BOTA and the BOTA staff and is paid for by core funding we receive under a cooperative agreement with the Department of Education. We are very pleased to be able to organize events such as this one, designed to provide a forum for sharing of the information and the integration of social knowledge, science knowledge, and policy and planning.

Finally, I'd like to thank the many individuals who are not currently members of the Board, who have agreed to take time to share with us their expertise. I won't name all of them -- you can find their names in your program -- but I do want to extend a heartfelt thanks for their time and efforts.

Now, it's my pleasure to introduce my colleague, and the Vice Chair of BOTA, Bob Linn.

MR. LINN: When President Clinton announced in the State of the Union the idea of a national test, it was a remarkable event. And when you think of the period of time since February 4th when that happened to today, just barely over four months later, a great deal has happened. It shows how fast the government bureaucracy can move when the President decides it's going to.

The national test is potentially important symbolically and practically, in terms of what impact it may have on education. It's obviously being seen as a major tool of educational reform, and it is within the spirit of the Standards movement that the Administration has backed for some time.

The Board, when we met in March, discussed the national test at some length, and we thought that there was a great deal that we should take advantage of in learning from efforts that have been made by States and districts of our country in the past, who have turned to tests as a major tool of educational reform.

So when we have a new program at the national level before us, we thought it's important that we could look at what could be learned from those past experiences, both in terms of what works well and what some of the risks or downsides there may be, when a test is put in place that may have impacts that are not always as positive as those intended by the policymakers when they put them in place.

It's also clear to the Board how well a system is going to work depends on the complex of technical issues, policy issues, and how governance takes place in the way the test is administered and the scores are used.

The goal then, was to think of ways that we could maximize the intended benefits of such a testing program while minimizing the risks, eliminating them to the extent possible.

Now, there are many issues with regard to the national test that might have been addressed in a 2-day conference like this. There are important issues about the content of the test. If you're a Math educator or someone concerned with reading, you obviously know that it makes a big difference what goes into the nature of that instrument that's out there. Is it going to represent the kind of Mathematics that people have in mind with the standards, for example?

There are also issues of how this is going to fit in with the States' own assessment programs, etc. Now, our intent is not to focus on any of those issues, however. The issue for this conference, really, is licensing and regulation. What sort of mechanisms can be put in place to support quality control? How is it that you're going to license and have regulations that maximize the benefits and minimize or eliminate the risks, the unintended, negative impact that the tests might have?

So the principal role of the meeting is in keeping with part of BOTA's charge. BOTA is in part, set up to provide a scientific forum and advice to the government on policy issues -- in particular, policy issues related to testing. Our goal here then, is to end up with some constructive discussion and suggestions that can improve the test.

We could come here to debate whether or not it's a good policy to have this testing program; whether it was the right idea in the first place. That's not the intent of this forum. The intent is not really to have a discussion about that, but rather, given that we have this policy in place at this point, how can we make the best use of it, and how can we ensure that we avoid the possible downsides that I know many of you are concerned about.

So in keeping with that, the meeting is set up as you see, we will start with some overview and background on the purposes of the assessment from Gary Philips; then we'll move into a session that talks about some of what we know about the potential risks and unintended consequences of testing from other experience with testing.

The second session will deal with what we, as a profession, what the profession of measurement and professionals concerned with testing, have set up by way of mechanisms to ensure quality and to avoid misuse, such as: the Standards for Educational and Psychological Testing, the Code of Fair Testing Practices, and mechanisms that might be set up in addition to those.

The third session, the afternoon, will deal with what we can learn from other government agencies; other experiences outside the realm of testing that might be relevant for learning about how we might do a better job in this arena. In many ways, it's a new idea to have something like licensed agencies for the administration, scoring, and reporting of test scores.

At the end of the day, we'll try to come up with a few key questions so that we'll have something to do tomorrow, and tomorrow really, is to pull that together with panels to see if we can come to some agreement on sensible advice to give to the government with regard to this important initiative.

So with that, I'm ready to turn it over to the Workshop Chair -- who Rich has already introduced -- Chris Edley, who will be chairing the meeting. Thank you.

CHAIRMAN EDLEY: Thanks, Bob. According to my watch, in order to get on schedule I have to do my throat clearing in negative-2 minutes. So I think the best way to do that is for me to simply introduce Michael Feuer who is the Director of BOTA who, along with Pattie Morrison, has really done the great work -- both logistical and intellectual -- in assembling this gathering, and ask Michael to introduce our first speaker. Michael?

MR. FEUER: Good morning.

CHAIRMAN EDLEY: Be brief.

MR. FEUER: Thank you. Fortunately, we have a requirement for the BOTA staff that they take some training in improvisational theater. I didn't anticipate having anything to say here this morning -- just to listen. But it's a pleasure to welcome you all, and I think Bob and Rich have given you a very good and eloquent description of what our plans are for these two days.

There is actually a theme and a structure to the way these sessions have been organized and I hope that becomes implicitly clear as we go along. But if it doesn't, I just may occasionally remind us what that structure is.

The first thing is to give everybody here the opportunity to actually hear about the origins and status and plans for the Voluntary National Reading and Mathematics Tests, and so it's a pleasure to introduce Gary Philips who is in many ways, if not the architect of the program, he is certainly the chief engineer, and is I think, ready to give us a presentation. So this is Gary Philips from the Department of Education.

MR. PHILIPS: You have a copy of the overheads in your packets, so rather than messing with it I thought maybe we'd just -- you can use what you have.

Well, thank you. I'm very pleased to have this meeting organized by the National Academy of Sciences. This is one of those meetings where the timing is perfect in that we are in the process of constructing the RFP for the licensing of the Voluntary National Tests. We really do need this input and every word you say today I can assure you, we'll be listening to and we will consider.

Right after this meeting, today and tomorrow, we will be reviewing a draft of the RFP internally -- probably late next week -- in the Department, and then shortly after that it will be on the Web for several weeks for public comment. So we're zeroing in on the licensing procedures for the test.

In this new job I've been to lots of meetings with the Secretary and the White House, and every chance they get they talk TIMSS and NAEP. For example, two days ago there was a release of the TIMSS 4th Grade report in the Rose Garden of the White House. And again, the President mentioned the Voluntary National Test and how this is such an important aspect of the whole TIMSS effort.

The whole idea here centers around the fact that the information that TIMSS and NAEP give is really good information; it's very useful. But not a single student, not a single parent, not a single teacher, has that information about their students. The whole idea here is to take that same kind of information down to the classroom level and to provide it to parents.

The goal is to empower parents and teachers with that information, just like policymakers now have it. So it resonates well and it -- I think it's usually well received. It's hard to argue against giving parents and teachers good information that they don't currently have, and that really is the whole idea of the project.

What I would like to do is to give you a brief overview of what the plans are, and this will lead up to the licensing issues which we'll be getting into as you proceed in the meeting -- give you some background -- you know, you can ask questions as we go along. If you give me like 15 minutes or so, I'll cover some of the basic stuff and then I think you'll have some good information.

So let's do that. If you go to the overhead that says Overview of Plans, that's what I'll be talking about. First of all, the Voluntary National Testing Program is voluntary. It's voluntary in the sense that the Federal Government will not be requiring anybody to take this test.

What the Federal Government is doing is it will be funding the development of the test, standing behind its technical integrity, making sure that it's administered properly, and making it available to districts and States for their use. It really is a testing program. It's a test that will be used by districts and States; this is not a test that the Federal Government is using to collect data.

The tests are intended to provide an overall indication of proficiency in Mathematics at the 8th Grade, and Reading in English at the 4th Grade. When we say overall that means what we're trying to find out is, how well do students read? What is their Math proficiency?

This is not intended to be a diagnostic test, like the kind that districts and States already have, to get intimate information or detailed information about the content or the learning of students. This is to give an overall indication of how well they're doing compared to a national standard, and to international standards as well.

The Reading and the Math will be linked to the NAEP assessment and, in the case of Mathematics at Grade 8, to the TIMSS assessment. And there will be a separate RFP and a contract to make sure that linking is done properly.

The items will be released to the public every year. So after the test is administered the items, along with scoring guides and other ancillary materials, will be made available to the public through our Web site and through the Press and that sort of thing.

The first administration is scheduled for 1999, and we're thinking about March as the month of that administration, and we're working now to think through what days and how many days, and things like that.

Okay, let's get into some more detail, which is the second page of your handout. No individually-identifiable data will be sent back to the Federal Government on this test. Not a single test score from a single student will be sent back to Washington. This is not a Federal testing program; it's not a testing program that we're using to collect data.

When we want to get information about how districts and States and the nation are doing, we will rely on NAEP; that will continue to be the primary mechanism for understanding and monitoring and reporting on the progress in the States and the districts and the nation.

So no information on the test from individual students comes back to the Federal Government. The only information the Federal Government will get on this test will be the same way that anybody else gets information. If the district or a State produces a report, we'll get a copy of it. That's the information that we'll get back.

There will be no -- not a single dollar of Federal money will be linked to taking the test in the sense -- and what I mean by that is, Federal funding will not be contingent on taking this test. It could be used, for example, in a Federal program to assess students and report on students, but it is not required; no money is contingent on the test.

The test will be consistent with the joint technical standards that are being revised for the APA, AERA, and NCME. Those will be available I think, about the time of the administration of the test. We will make sure that what we do -- is that right, Eva? Okay. What we will do is to make sure that this test is consistent with those standards.

There will be inclusion criteria and appropriate accommodations will be available. We have already committed ourselves to having a bilingual version of the Math test at Grade 8. We won't have that at Grade 4 in Reading because it's reading in English.

Those inclusion criteria, I want you to know that we are absolutely committed to making this a testing program that all students can take. And the inclusion criteria, we'll start with the NAEP inclusion criteria and we'll work from there. And those inclusion criteria and the accommodations will be developed as part of the development process.

When the contract is awarded in September, we in earnest, will get started on that. We will have many meetings on the topics. Lots of people that have an interest in this will have an opportunity to influence the outcome. But the bottom line is, we're committing to make a test that all students can take. So we want to err on the side of inclusion and not on the side of exclusion.

We want to have the tests reported in a metric that parents and teachers can understand. Again, part of the contract is to have focus groups with students, parents, and teachers to work on reporting strategies, so that when we report the results we want to make sure that it's something that they easily grasp, that they understand, that they resonate to, and that they appreciate and can talk about. So a lot of work will go into making the reporting understandable to parents and teachers.

We will begin with the NAEP framework; that's a given. We want to use the NAEP framework because that was developed through a national consensus process. There's not a unanimous agreement that it's a great framework, but there's a vast majority of people that agree that it's a good framework.

And one of the reasons why we're able to get the testing program off the ground quickly is that we don't have to do all that work that the National Assessment Governing Board has already done to develop a framework. We also want to use the achievement levels that the National Assessment Governing Board has developed.

Again, those were developed through a national consensus process. Again, they're not universally accepted but a vast majority of people agree that those are good achievement levels and they communicate what we want to communicate. So we'll be using those two givens in the project and we'll go from there.

As I said, the tests will be linked to NAEP so when we report on achievement levels in this test it will be by way of that linking what we did to the national assessment, and the same thing will be true in TIMSS. And as I said, there will be a separate RFP that will conduct that linking process.

The tests will be up to 90 minutes of testing time; that's about twice what NAEP gives to an individual student. We think that's generally about the right amount of time to get a good, reliable, valid score on individual students. It's generally consistent with what other testing programs do as well. And of course, there's a lot of variability but -- when I say generally, I mean it's sort of in the middle, generally consistent with what other testing programs do.

About half the testing time will be spent on non-multiple choice items, and about half on multiple choice items. Eighty percent of the test will be machine-scorable and the other 20 percent will be constructed response which will have to be scored by raters. We want to have the test as machine-scorable as possible because obviously, we have a large testing program and we want the results turned around quickly. And so that's a constraint.

On the other hand, we want this test to be such that the Math community and the Reading community can stand behind it and agree that this is good Reading, this is good Mathematics. And the item and test specification work that we're doing right now through the Chiefs and MPR, those meetings are going on now and I'll talk about them in just a moment.

That is the issue that they're wrestling with and at the end, we will want them to sign off on this and to say this is good Mathematics, this is good Reading.

There will be a special booklet available at the time of administration. The special booklet will be a complete booklet of extended and constructed response items that will be developed as part of the field testing, and there will be national data on the items, for example, just like there will be on the regular test.

There will be a new booklet each year. This booklet can be used by teachers for a variety of purposes: instructional uses, classroom testing, whatever it might be. And there will be other materials given to teachers as well, and parents, as part of the testing activity.

But there will be a separate, entire booklet on extended constructed response items -- along with scoring guides, of course.

There will also be a sample test available prior to the 1999 administration. So that will be field-tested in 1998 and then prior to the actual administration we will make available to the public, a sample test along with scoring guides; again, to take the mystery out of the test so that teachers and parents and the general public will know the type of material that will be on this test.

The tests will be released, as I said, every year. The actual test itself will be released every year along with scoring guides and other materials. It will be kept secure up until the completion of the administration. There will likely be like a 1- or 2-day administration period and then maybe a day makeup, or something like that. Right after that it will be released to the public.

We want to have the results reported within the same year, so if we are administering the test in March that means the results will be out probably in May. And again, this is one of the constraints on the program, is that we want to make sure that the test is such that it can be scored quickly enough so that the results can be released during the same school year.

There will be an ongoing research component, so each year we'll be looking at research questions that need to be dealt with. The first year we'll likely look at the validity of the tests for special populations and for certain uses, and then each year new questions will come up and new research will be conducted, and that will be done on an ongoing basis. Funds will be set aside to make sure that research is done.

There will be an ongoing evaluation component so there will be an independent, prestigious group that will report on the activities of the testing program and the success of the program with an annual report to the President and to the Congress. And we're working to try to get that group in place as soon as possible because we'd like to have them here watching what we're doing now, so they can report to the President and the Congress on the success of what we're doing and make recommendations for improvements.

There will be an ongoing advisory structure. The advisory structure, or the panels that are in place now with the item and test specifications is sort of a mini-version of the more permanent structure that we'll have. And I'll get into that in just a moment when I talk about the item and test specifications.

The test is going to be on a 3-year cycle. If you go to the graph that looks like this, what this shows is that in 1999 there will be three testing assessments going on at once. There will be the administration of the 1999 assessment, the field testing of 2000, and writing items for 2001.

As you know, there's a little bit of a snag for the first assessment, in that the contract is awarded in September; we would like to have had it awarded earlier in the year, but the President didn't make his comments until February. So when the contract is awarded the contractor will have to do some catch-up, and hopefully they will be caught up by March of 1998, and then after that -- I think the work will be at a more leisurely pace once they get caught up, about March of 1998.

A few other things. The administration and scoring of the test. This in fact, is what you're going to be dealing with the rest of the meeting. What we want to do -- and I'm only going to briefly mention this right now -- but the idea here is that the Federal Government, through a contract, will make sure the test is developed properly and that it's -- stand behind as I said, its technical integrity.

But the scoring, the analysis, and reporting is a local responsibility. So this is a test like any other test. If you go enter into a contract with a testing company, or you develop your own test as a State or district, if you don't have the internal ability or capacity to do it yourself, you contract it out. This is the same thing here.

This is a test that will be made available to you. If you as a district, don't have the capacity to let's say, score it and report on it, but maybe you are able to administer it with the proper training, then you would need to go to a company that can do that for you.

What we will have in this project will be licensed companies that will be licensed by a separate contract -- that's the one that we're working on now -- so that if you're a district and you want to administer the test, you would go to one of these licensed contractors, enter into an agreement with them, and they would do the scoring and reporting for you.

We have a commitment from the Department and given Congressional funding that the scoring sites and the companies that do this will be reimbursed in 1999 for their cost. And possibly in future years, again depending on Congressional appropriations. So that's the general way it will work, and we'll get involved in more of the details, the discussion of that later today, because that's sort of the whole reason you're here.

But that's sort of a different twist on this in that we're developing the test, we're making it available to districts and States. And in order for us to maintain the quality control that we like to have without being too restrictive, we're using this licensing mechanism as a way of guaranteeing to ourselves and to the public, that there's a level playing field; that it is being scored appropriately across various districts and States; and the results are being reported appropriately -- things like that.

If you look at the RFP and timelines, the first set of activities is in the item and test specifications, which I'll get into in just a moment. Those are being conducted now by the Council of Chief State School Officers and MPR -- which is a company located in California -- and those meetings have already begun. I'll talk about them in just a moment.

Right after the item and test specifications are complete -- which was planned for August -- then the test development contract will be awarded in September and the test development contract begins where the item and test specification stops.

What the item and test specifications are, they're like the blueprint for a house, so we need to have somebody develop the blueprint. The contractor in September is like the general contractor who is going to build the house, and they start with this blueprint.

The reason we wanted to have the item and test specifications developed now rather than waiting for September is, we didn't want to wait until September to start that work; we wanted to have that work done so that when the contract is awarded in September they can start the item writing and all the other activities that are associated with the actual test development.

Other awards that will be made soon will be a technical panel. We're trying to decide how we want to structure the technical panel. It will not be a Federal Advisory Committee. It will not be under the Federal Advisory Committee Act; it will be a technical group, probably associated with the contractors to do technical work.

There will be a separate linking contract that will be awarded in October to do the linking to NAEP and TIMSS. The evaluation -- this says October but as I said, we're trying to get this nailed down before October because we really would like to have the evaluators here to see a lot of the work that we're doing right now.

The licensing and certification contract, which is the one that we're working on at the moment, will be awarded in November; hopefully the first part of the month of November. Let me go now to the item and test specifications -- which is the work that's being done right now -- and you have another set of overheads on that.

The item and test specifications are being developed for the test by the Council of Chief State School Officers and MPR, and this is very consistent with the way that the test specifications were developed for the national assessment. Those turned out to also be done by the Council of Chief State School Officers, for both Reading and Mathematics. So it's a nice continuity from the NAEP project to the Voluntary National Test.

MR. FEUER: Gary, excuse me. Can you just tell people which handout they should be looking at?

MR. PHILIPS: Yes.

MR. FEUER: The cover says National State Panel and it's in there? Okay.

MR. PHILIPS: The item and test specifications -- the goal here is to take the NAEP framework which has already been developed, and to take the specifications for NAEP that have already been developed, and to modify those so that this test can be administered to individual students, and that we can get individual student data from the test. So whatever needs to be done to accomplish that goal, that's what the Chiefs are working on.

And some of the issues that they're looking at for example, is the content and coverage of the test. For example, the national assessment has a lot of content coverage in each administration. We have 90 minutes so we don't have the same total amount of time that NAEP has; because you know, NAEP may only give 45 minutes to each individual student, but across the whole system it might be giving 180 or 200 minutes.

So they're working on ways that we can still have the same content coverage and so that's one of the things, the content coverage: the mix of items, how to weight items, how the things should be sequenced on the test, what are the scoring procedures, the use of calculators in Mathematics, the number of passages in the reading. All those things are being discussed and looked at by the item and test specification committee.

The committee is broken up into a number of groups. There's the national test panel -- and the members of that committee you have there. The meetings of the committee are at the bottom of each sheet. So you have the national test panel -- this is a sort of a policy-oriented group. And what will happen is, the Reading committee and the Math committee will report to them, and ultimately, the item and test specifications will be signed off by this overall, national test panel.

So there's a national test panel. There's already been an initial meeting of each of these. The national test panel has met once; the Reading and the Math and the Technical panel have all had one meeting. And there's another meeting, for example, next week of the Math panel along with a public hearing, in Denver, of the Math panel.

So I just wanted to make sure that you knew who the panel members were. There's also a Math committee chaired by John Dossey; the national test panel is chaired by Bill Cody; the Reading, Dorothy Strickland; and there really isn't a Chair of the Technical Advisory Group -- there eventually may be one.

So those are the people that are working, along with contractual support and commission papers, consultants, and things like that. These are the people that will complete the task of changing the item and test specifications. That work will be completed by August.

Now, I think you need to know, we are moving at a brisk pace but I don't believe we're rushing it. The item and test specifications that NAEP developed, they had basically about another month or so to do this, but they had to start from scratch.

This group is starting from the test specifications that NAEP has already developed, so it's just a matter of modifying what's already been there. So really, we are able to move, I think, quickly and still do a really good, thorough job, because we're able to build on the good work that the National Assessment Governing Board's already done.

The meetings of the item and test specification panels are all open, public meetings; you're invited. The announcements of those are on our Web site. There may or may not be a Press statement of some sort about them. They are public meetings. Each of them will have a transcript. The transcript will be on our Web site; within about a week of the meeting it appears on our Web site.

All the public meetings that we've had are on our Web site. This is a place -- we're using the Web as a way of disseminating information, announcing upcoming events, but also archiving information. If you go there you'll see in order, all the things that have happened: all the speeches, all the papers, these overheads. Everything that's publicly available is going to be on the Web site and we're going to continue doing that. So there's a complete chronology of everything that's happened.

In fact, one of the characteristics of this whole project is that it will be done in a fishbowl from the beginning to the end. When the contracts are awarded to develop the test, the linking, the licensing -- all of these are going to be public meetings; all will have transcripts; all will be on the Web.

The only thing that will not be public meetings will be of course, the writing of the items and the review of the items, because those will be kept secure. So we'll get a chance to see those items publicly. The first opportunity will be the sample test in preparation for 1999. There will likely be some items available as part of the item and test specifications, and then of course, the release of the items at the end of each Administration.

So that's the general plan, and if you have any questions I can take them. Or do you just want to go on? I don't know what the plan is. Do you want to ask a few questions? Yes?

MR. JENNING: Gary, you said materials on the test would be released. Your document says after the test but I think you said orally that materials could be before showing each of the students what was going to be tested.

MR. PHILIPS: There will be a sample test available prior to the administration in 1999. It will be an example of what the test is like; it won't be the test.

MR. JENNING: But it won't be the matter, the principles that are going to be tested?

MR. PHILIPS: Oh, yes. That will be available -- the framework is already a public document. The item and test specifications will be a public document. That will be available as well. So the blueprint will be available, and then prior to the administration in 1999, an actual example of the test will be available, along with scoring guides.

Then in '99, in March of '99, the actual test will be administered, and then that will be made available. And so every year there will be a new test that will be made available to the public, right after the administration.

MR. JENNING: So conceivably, a teacher could prepare students during the year for the test?

MR. PHILIPS: You can use the sample test for whatever you want to use it for.

MR. JENNING: A second question. Are you preparing any other test in languages besides Spanish in Math?

MR. PHILIPS: For 1999 we're only planning to do Spanish for Mathematics. In future years, we have to revisit that, but that's the plan for 1999. It is important to note that, you know, we can't do everything in 1999, and the goal is to do the most that we can. There will be incremental improvements and changes in future years as we learn more and, you know, and that sort of thing.

We're also building on the work that the National Assessment has already done. One of the reasons why we're really comfortable with doing a bilingual version of Mathematics is, NAEP has already field-tested a Spanish and a bilingual version and has used a bilingual version in Mathematics at the 8th Grade, so we can build on that. There has not been another language version of Reading in NAEP because reading is in English in NAEP.

MS. BYRD: Could you tell me what the government's purpose is -- at least at this point -- in adopting a national test? What's the kind of stated purpose that the government expects to accomplish with this?

MR. PHILIPS: Well, okay. I think you have to look at sort of a -- there has to be some context to this. You have to -- standards-based reform has been going on for over a decade. Many states have adopted standards -- content standards and performance standards -- and it's been, I think, a successful effort. There's been a lot of national attention to standards-based reform.

This is sort of like another step in that whole process. And what the goal here is, is to provide the same kind of information about content and national performance standards that policymakers have, down into the classroom.

Right now, when NAEP for example, or TIMSS produces this report -- which many people like and many policymakers make decisions as a result of it -- not a single teacher or parent or student has information about how they're doing on that test. And so the goal is to give them, simply the same -- this is an information activity -- it's to give them the same information that policymakers have.

And so that really is the whole purpose of it: simply to provide information down into the classroom level about how students are doing on a national standard and an international context. So that an individual student, starting in 1999, will know how they stack up against a criterion-referenced performance standard -- which are the ones that NAEP uses: basic, proficient, and advanced -- and they'll be able in Mathematics, to see how they stack up against students in other countries that have taken a Mathematics test.

MS. BYRD: Is it anticipated that this will affect the actual instruction in the classroom by giving it to the teachers? Do you know, is the Department going that far to make those kinds of projections as to the use --

MR. PHILIPS: Well, I would certainly hope that when teachers get better and more information that they would make use of it, and that sure, absolutely.

MR. KNOTT: What is the role of the States in this? You've referred several times to districts, about the districts contracting with licensed contractors and so on. What is the role for State governments in the implementation of --

MR. PHILIPS: Well, this again -- this is a test like any other test. A State can adopt a test from a norm-referenced testing company, they can develop their own test. This is a test that will be made available to States -- six States in fact, have already agreed that they want to use the test, so they're just like a district.

So yes, States will -- we expect States to use it, and it's completely voluntary. You know, if a State or a district decides this is not for them, that's great; it's not for them. This is a voluntary test. If they want to use it they can, and we're trying to make it, you know, something that's useful to them.

MR. HEUBERT: We're going to be discussing appropriateness in a number of uses of the tests. Before we get into any of the specifics though, is it the Department's position that it will do what it can to prevent inappropriate use of test results, or to promote their appropriate use? Or is the government's position that once we administer the test the use of it is a matter for State and Local educators and officials to decide?

MR. PHILIPS: As part of the administration of the test in 1999, there will be a number of guidelines which again, will be developed over the next several months -- well, actually not -- it will be, once the award is made in September there will be guidelines for test utilization -- which I think is what you were getting at -- and of course we will deal with the high-stakes nature of the information.

And we fortunately will have the benefit of the Code of Fair Testing Practice and the joint technical standards to help us think through that. And there will also be guidelines on reporting and other types of guidelines. And so the level of specificity I don't know yet because that hasn't happened, but the tests would be administered and utilized within the parameters set by those guidelines.

MS. BAILEY: Could you clarify your earlier point about the voluntary nature of the test? In the case that a State chooses to offer the test, does the Federal Government have a position on whether, once within the State, an LEA has an option to opt out, and who can make that ultimate decision?

MR. PHILIPS: Well, that's I think, a State decision, just like the State right now. If it adopts the test and the LEA refuses to do it, that's something the States are going to have to deal with. Now there will be, again, guidelines for reporting, so when it comes to reporting, we're likely to have some guidelines here -- I don't know what they are yet -- so that we are assured, and the public is assured that reporting is done in a valid way and that it's not misleading, and things like that.

MS. BAILEY: So your earlier comment about local district choice, that choice is available if the State would not be offering the test? I mean, that's the only voluntary option that the district has if the State has decided it's mandatory? I mean, I just want to clarify that.

MR. PHILIPS: The same situation holds with this test as it would if a State decided to adopt a norm-referenced test and the district said, I'm not going to do it. That has to be dealt with internally within that State, so that's not really an issue for us, I don't think, to get involved in.

CHAIRMAN EDLEY: Could we get people to please say their names -- that was Adrienne Bailey -- for purposes of the transcript.

MR. TAYLOR: Bill Taylor. The President has said he's against social promotions. What role if any, does the test have in furthering his view that social promotions are a bad thing?

MR. PHILIPS: I really -- I don't think I should comment. I mean, the President has many policy directions on a variety of topics. If you don't mind I'd like to stick to the test and --

MR. TAYLOR: Well, I'm not asking about -- your comment about whether you're for or against the President's policy. I'm asking you what role if any, the test has in --

MR. PHILIPS: This test --

MR. TAYLOR: -- the Federal plan in dealing with questions --

MR. PHILIPS: This test, like any other test, is information that districts and States use for a variety of purposes. So there's nothing special or different about this test.

MR. MADAUS: George Madaus. You said that there would be no Federal monies linked to this. What's the relationship between this test once it gets going, and Title I?

MR. PHILIPS: The use of this test for Title I, I think, is a Title I issue. So that has to be dealt with -- again, this test, George, is like a test that you might want to use: a norm-referenced test or a State test or a local test, this test.

There is nothing different about this test, so if this test is used for Title I then it's used in all -- then the law about Title I applies to this test.

MR. MADAUS: Then in future reauthorizations of Title I, would there be a firewall that would protect the test from being used as an evaluative mechanism? Because if it is linked to Title I, it's not voluntary any more.

MR. PHILIPS: Again, if you don't mind, I would like to restrict my comments to this test and not the Title I program and future policy directions of that program.

MR. SHANNON: John Shannon. Have any other States indicated to-date that they would not go along with the testing?

MR. PHILIPS: As far as I know, no State has indicated they will not. Six States have indicated that they will, and several -- many, actually -- others are in various stages of making a decision.

PARTICIPANT: Larry -- at one point -- plans to reimburse the -- or whoever is administering the test -- is that still in the plan?

MR. PHILIPS: That's still in the plan for 1999, and depending on, I'm assuming on the success of that, and of course, Congressional appropriations, that may happen in the future as well. But for 1999, it's still the plan. But that does require Congressional appropriations.

CHAIRMAN EDLEY: If folks in the audience can either speak into a microphone or really boom it out so that it can be picked up for the transcript, please.

MS. LEWIS: Gary, does that reimbursement cost include costs for professional development of the administration of the test?

CHAIRMAN EDLEY: I'm sorry, and the name, please?

MS. LEWIS: Sharon Lewis.

MR. PHILIPS: That cost would include reimbursement for the administration, the printing, the scoring, and reporting.

MS. LEWIS: Professional development, the training of teachers to administer the test?

MR. PHILIPS: Training is part of administration, right. And it's likely to be a fixed cost, although of course, the whole reimbursement -- part of the RFP for the licensing is to work through with us, the details of that reimbursement procedure.

CHAIRMAN EDLEY: Just a couple of more questions and then we'll take a quick break. We have guards posted at the door to keep Gary from leaving, so we'll be able to continue asking him questions through the day.

MR. DUNBAR: Steve Dunbar, University of Iowa. Gary, you mentioned the purpose of this program is to provide information about individuals to parents and teachers. Are there plans, provisions, guidelines, that you have thought about in the area of aggregated reports, either at Local, State, or national levels?

MR. PHILIPS: Yes, and this is an ongoing -- we are currently having many discussions about this. Ultimately, it will result -- there will be, as I said, a set of reporting guidelines, and the type of aggregations, levels of aggregations, and just how much we want to weigh in on that will be there. I don't have that yet, today. But I can assure you, we're talking about that quite a bit.

MR. DUNBAR: One quick follow-up. Is there any anticipation of a national report of performance on these things?

MR. PHILIPS: At the present time there is no such plan. However, let me say that as part of the field testing -- in 1998, for example, when we field test the forms, we will have national data from a national probability sample.

It will not be a large one -- it will not be like NAEP, so you can't do all the stuff you do with NAEP -- but there will be some information at a national level, on the various forms of the test, and each year as we're doing the field testing we'll continue to do that.

But we do not plan on collecting information from districts and States and adding it up and getting a national estimate. That won't happen. If you want that, that's exactly what NAEP does very well, and will continue to do that. We don't want this test to duplicate what NAEP is already doing a very good job of, and has been doing for 25 years.

MR. HAKUTA: Kenji Hakuta. How do you resolve the apparent contradiction between the voluntary nature of the test and the requirements that included or excluded special --

MR. PHILIPS: What do you mean? I'm missing the --

MR. HAKUTA: Well, one of the points here is that you'll be requiring -- you have required criteria for inclusion and exclusion of special populations.

MR. PHILIPS: No, we're not requiring. What we're doing, what we're saying is that we don't want to -- we do not want to have certain populations excluded simply because they're members of that special population.

MR. HAKUTA: Right. Under the equitable design bullet point: inclusion criteria and appropriate accommodations will be required. If we --

MR. PHILIPS: It will be required of districts -- what that means is that district -- maybe I should rephrase that. Those that administer the test will be required to use inclusion criteria and to use various accommodations -- those that we agree to -- and so, that's what that means. It doesn't mean that a student has to take the test. That's not what I was intending there.

CHAIRMAN EDLEY: So if I can just jump in to clarify. So there obviously are contemplated, at least some regulatory -- I use the term loosely -- constraints on the way in which the test is used.

MR. PHILIPS: There will likely be some -- yes. Regulatory is not the word. There will be some -- for example --

CHAIRMAN EDLEY: Operational constraints?

MR. PHILIPS: Well for example, it might be that if a student, a blind student, there may be Braille/enlarged print versions available. And depending on the degree, the decisions made by the IEP, if you administer this test for certain students, there needs to be a version, Braille/enlarged print. I'm not saying that those will be the accommodations because that has to still be worked out, but there will be accommodations.

CHAIRMAN EDLEY: I'm absolutely certain we will return to this subject a little later in the morning. I think we ought to -- Gary, why don't we cut it off there, to be continued? And if we can take a 4 minute, 59 second break, for people to get more drugs out in the lobby?

(Whereupon, the foregoing matter went off the record at 9:40 a.m. and went back on the record at 9:45 a.m.)

CHAIRMAN EDLEY: Dick Elmore, Professor Richard Elmore from Harvard's Graduate School of Education just joined us. Just to refresh your recollection, the goal in this next chunk of the discussion is to try to get out on the table some of the particular sorts of risks, based on historical experience with other large-scale assessments, that might be identified -- anticipated if you will -- in implementation of the President's testing initiative.

So my hope is that through the presentations and in the Q&A, you will all start to assemble a short or long, as the case may be, list of concerns -- questions, issues -- that we would want to see addressed in the design and implementation of the program.

And we have an extraordinary group assembled to help us with that. You have some biographical information in your booklets so I won't belabor that. We've asked each of the presenters to hold their remarks to 10 to 15 minutes, and then we have discussants as well.

And I guess we'll start with Eva Baker who's a professor in the Division of Psychological Studies and Education, and the Division of Social Research Methodology, and Acting Dean of the Graduate School of Education and Information Studies at UCLA. Eva?

MS. BAKER: Thank you. I have even more impressive titles but we'll share those later. Thank you very much for inviting me; I'm honored to make this presentation to the Board. And I have to say that Gary's presentations have evolved each time I hear them, so some of the comments he has made have caught me up a little short, and some of the questions that you have raised have anticipated concerns that I have.

However, I'm not going to be extremely flexible because I took the red-eye and I'm more likely to stick to my prepared remarks than I normally would be.

Let me simply say that we're all together here because we acknowledge the importance and the complexity of this undertaking, and because we believe that determining what and how well children learn is logically essential to promoting their intellectual growth in an optimal way.

With the right testing systems, students and parents should be able to benefit from feedback, and teachers and administrators could assess responsibility and reset their instructional priorities, and in the longer view, the public could better appreciate the goals and progress of educational systems.

But of course as we know, for many reasons, large-scale testing programs have been criticized for failing to serve well their various constituencies. Test administration and interpretation processes are alleged by sizable numbers, to constrain, warp, underrepresent, overindicate, and generally mechanize education.

Nonetheless, test results and particularly, test results in a comparative framework, have world-wide credibility in public policy and for most parents as well. So I think our problem as a group is to think hard about what we can do to reduce the negatives and to improve the role tests play in our practices.

The general principle that I'm going to advocate -- it's going to be woven through and it may not be as explicit as you would like -- but the general principle that I want you to think about is, how can we remove as much as possible, the incentives for misuse in the national testing system?

I will make some suggestions that might be acceptable, but I'm more interested in your thinking along those lines. My able and charming assistant -- fabulous.

In any consideration of large-scale testing, the frame of discussion normally encompasses at least some of the test purposes listed on the slide, and the extent to which a different or unanticipated use misleads us in our interpretation of results.

We consider here two types of issues. One is the quality of the test for the purpose it is attempting to serve, and that goes by the name of validity or validity inference, in general. And then secondly, the specific kinds of errors brought in by extending a test purpose beyond the original design.

At the heart of the discussion, again, is the validity of judgments we make from test findings, and with either sort of analysis -- either focusing on a test purpose and its validity for that purpose, or the extension idea -- there are a number of points in the chain of testing events where errors can occur.

And those include: the point of communication of purpose or purposes; the understanding taken from those statements; test administration practices, scoring, reporting, and inference-related actions.

We may make inappropriate inferences because of a mismatch of purpose, the technical design characteristics of the examinations, and/or the conditions under which particular respondents are actually involved in the testing.

Interpretation errors may apply to the entire set of results, or may be focused on the fairness of interpretation for subgroups of students -- and I know some of my colleagues are going to be addressing that issue particularly. I mean, an example is: what about the history that particular subgroups might have; think about their instructional backgrounds; think about Debra P. as an example.

Similarly, misuse analyses often focus on reporting, interpretation, and inappropriate consequences ensuing from adapting tests to purposes for which they were not designed. But before I read for you some of the -- I think it's in our general lore about what test misuses are likely to occur -- I want to raise three linked perspectives that I regard as underlying realities, that will I hope, inform the means we use to select and to promote the best use of tests.

First is the exploration of the concept of control as it seems to underlie the thrust of this conference, at least by its title, and to some degree, the model of the proposed national tests themselves as I understood them. I'm not sure as I understand them this moment.

Regulation of practice, in fact, the acceptance of purpose -- that is, the government's purpose as a context for appropriate use -- assumes that there is an optimum purpose held by an acknowledged authority, then implicitly sets the boundaries for desirable, acceptable, and unacceptable uses.

Without dropping too rapidly into the intellectual morass of multiculturalism and deconstructed meaning, let me simply assert that test purpose resides in the hearts and minds of the beholders. If a common test purpose is to be accepted by a wide range of users, that I do not believe, can be accomplished by dictum and regulation. It's a communication challenge, one in this case exacerbated, I believe, by an unrelenting schedule.

Second, there is a continuum of adequacy about test use, ranging from something that approaches perfect kinds of validity inferences where the purpose, technical quality of the measure, scoring, administration, inferences yielded, line up pretty well to some where there are logical extensions or inferences made, or other applications of data made, to some where there are clearly deliberate misuses.

Deviations then, from the intended purposes, need to be considered I think, from two perspectives. One is the intentionality of the people who are doing it; that is, what's going on and why do they believe this is appropriate.

But more searchingly, I think, from the perspective of the damage done: on the one hand to individuals, and on the other to public understanding of the educational enterprise. Balancing the consequences to individuals and to the larger enterprise creates a tension we've experienced before, and I believe we do not yet know how to resolve.

Third -- and this is the one I think is the best -- in this country we value exploration and innovation. The concept of use control and the measurement specialist's well documented arguments that certain tests should be used solely for certain purposes, flies directly in the face of a powerful legacy of tool development.

Since our prehistoric days, humans have learned and been rewarded for creating an object and testing its broader applicability in areas outside its original purpose. Tom Glenn called this propensity Technology Push almost three decades ago, when describing how new technical applications were conceived and developed.

My favorite -- although unfortunately dated example, is the creation of masking tape, an innovation that led to transparent tape to use for paper repair, color tape for decoration, and -- there's nobody in this room as old as I am, but -- tape for hairstyling -- don't nod; you don't have to reveal yourself -- and for carpet tacking, and so on.

So it's the nature of people to look at a creation, and especially in times of scarce resources, to find other reasonable ways to apply it to meet another important end. Testing, I think, is not immune from this human propensity.

So the pervasive search for this generalized application -- sometimes focused, sometimes opportunistic -- suggests to me that an idea, purpose migration, extending the use of a test to another, somewhat related need, shouldn't continue to be the annual surprise regarded with despair, but should be considered and anticipated. It shouldn't be an unanticipated outcome; it should be understood that it's going to happen.

Instead of bemoaning misuse it is our job, I think, to harness this propensity. And if not warmly welcomed, at least more than one potential use should be sketched out and a risk analysis provided for making inferences from classes of unintended uses. And maybe Gary's comments suggest that that's something that's on people's minds already.

It's my view that particular strategies of test design can also actually, help us optimize use for a broader range of purposes, but even if I'm only partially correct, I think costs will be more broadly amortized and acceptability might grow.

Initial remarks made -- let me talk about historically, two major kinds of unanticipated use problems. Actually the first will be focused on the specific migration of test purposes and what happens there, and the second will be an administration -- more forward-looking issue in the administration of these tests that deal with the security issue.

Throughout, I'll try and indicate, or at least stimulate your thoughts, about possible solutions in this area. So let's go back to the purpose chart. I think I had two of them in there, Rich.

To understand purpose migration, let's look at the chart. Look at column A, column B. For example, it's easy to see that many column A purposes based on individual student data, can to some degree, be aggregated to meet institutional purposes in column B.

For example, reporting frequencies or trends of students who receive certification such as diplomas, students who are promoted, or students who need remedial instruction, can be used to make inferences that are program evaluation inferences, or system monitoring inferences.

In each of these extensions of a test, from one kind of purpose to another, concern for the details of context have to be acknowledged. Fewer students may be placed in remedial programs because budgets were cut and not because performance was raised. SAT scores may be higher because of background characteristics rather than efficacy of a school program. We know this.

Considering this concept of purpose migration from within column A, I think, great errors could be made in using assessments designed for placement, for instance, and moving them over to certification.

Because obviously, the test content may be inappropriate and the degree of certainty that one would need for irrevocable kinds of decision would need to be higher for certain kinds of uses; that is, if we were actually certifying somebody, letting them go out, as opposed to placing people into a program where there's an opportunity to regroup if we made an error.

But I think -- and here's where I'm a little confused -- but I think the case-in-point here is using a test that -- Gary talked about it today in a way that I hadn't heard -- and maybe it's my understanding of this that is different. I really thought of this test as principally a system-monitoring test that was given at a census level, and not as an individual test provided for communication and motivation in the column A.

Now, that may be wrong-think, but let me continue and say that, if we were thinking about this test as a test that was going to be given to everybody in a system -- let's say a State agrees or a district agrees or even, I believe that there will be enormous pressures, Gary, to aggregate and report whatever data we have at some sort of higher level -- national reporting.

The question really is, what happens, as some of the participants anticipated, if States wish to use these kinds of measures for other higher stakes purposes? There's obviously a history of attempting to use these tests for system monitoring, such as State assessment tests being used to make decisions about the effectiveness of educational administrators.

And the lesson is -- and I think Dan Koretz and others have said this quite well -- is that stakes become attached to tests any time broad-based public reporting occurs, whether or not that's the purpose intended by the promulgators of the test. And of course, sometimes interpretive errors will occur.

My favorite example is from Lee Bernstein's work based on a California school district that reassigned principals based on changes in performance on the State assessment, when in fact, the real changes were due to in-migration of different kids coming to different districts. It had nothing to do with the principals', you know, propensity to be an instructional leader, but all these people were moved around because of that.

A second type of purpose migration -- I'm almost done -- involves the extent to which a test, created for broad system monitoring, is appropriately used to make student retention/promotion decisions. In the present plan, I believe this is considered as a type of misuse. It would undoubtedly result in relatively unreliable classifications of students into promotion and retention categories.

And even if the test were of such length to permit adequate classification of students, such uses would also require, I think, that the measures be closely connected to curricular offerings in the particular local setting to assure that decisions about students were made on instructionally-relevant grounds.

The next slide -- we should recognize that an assessment designed to provide student level data and perhaps also to provide system monitoring, undeniably creates an expectation for improvement in all participating districts and States. When expectations are raised, no matter what the nominal purpose of the test is, pressure for improvement occurs, perhaps specifically linked sometimes to job performance goals for Superintendents and Administrators.

If no reasonable avenue is provided for instructional improvement -- such as empirical guidance, clear and plausible choices for teacher action, or strategies for teacher preparation -- systems people have fallen back on their logical options. Students have been urged to practice test formats in the hope that they will be able to raise their scores, rather than given instruction in the content and concepts underlying the items selected.

Curriculum narrows and increased time is spent on test preparation as a separate event, sort of apart from the regular curricular and instructional focus of the schools. The test works out to become a barrier, an enemy, something that pulls time away from reality, rather than a neutral benchmark.

The design of any large-scale test should take into account the certainty of this intention to improve scores and help educators to find the balance between appropriate focus and the dysfunctional narrowing of attention.

Releasing items may simply exacerbate the problem, reinforcing some notion that the best way to do this is item-by-item practice. So I applaud the idea of the search for high quality, understandable, and concrete test specifications. If they're done right they can provide some bridges to show how the underlying constructs relate to State, district, and teacher goals, as well as legitimate instructional changes.

Precepts may guide us, reinforcing the expectation that validity evidence will be needed for every new purpose. That will be something I'll allude to briefly when I talk about the standards. But in practice, the question really will arise about who or what groups have the capability to provide such evidence, entwined with the reality of rapid and unexpected adaptations of uses of tests.

So while we have sort of rules and guidance about how we made validity arguments for different purposes, I think the reality sometimes gets ahead of us.

This next slide says, Test Administration Tech Challenges. To return to the topic of control and practicing items, the question of test security is on my mind. Let me raise two concerns, and I'll do it very briefly. The first is that even within certain school districts, instructional schedules vary considerably.

If it were intended for instance, to test all students in year-round schools in Los Angeles, a fixed testing window might very well result in great numbers of students missing the test, and the question is how make-ups would be handled, or if they would.

My second concern is far more pervasive. It involves a rapid obsolescence of the idea of centralized control in a world that is becoming far more accustomed to information access and its distribution. I've talked in the last month with a lot of people about the extent to which we believe, or it is believed, that security can be maintained in the era of the Internet, given just the numbers of people who have access to this examination. And my guess is that it's functionally certain there will be a breach of test security.

Just for a moment, even if you believe that, what would you do about it; what's the backup plan? My suggestion is probably one that -- I won't make a comment. I propose dropping the concept of test security entirely and instead, creating specifications and releasing the large set of items in advance to all students in schools.

If the set is large enough and the specifications are clear enough in their guidance, then practicing individual items won't be seen to be the optimum strategy for learning these kinds of skills, and public release would of course, up the pressure on the providers to assure that a high quality match between specifications and items occurs.

I won't go on about that, but I think that is really an important consideration for you. And if that's not the strategy you adopt I think that the message of that is that you're -- what I'm trying to do is to find a way to create a disincentive for cheating.

The larger comment that I have to make -- I'll stop -- but it really does have to do with, how do we help people understand -- and I know this isn't the topic -- but understand the relationship of these tests to NAEP, to what they've been doing in standards-based test development, commercial tests. How do we help them resolve different information?

And it seems to me that that's something that's extraordinarily important for us to do if we want to demonstrate that this assessment has value-added, over and above what is already going on in the system. I think that we have to find a way to sort through all of these needs and test interpretations, and I think the struggle is worth undertaking. Thank you very much.

CHAIRMAN EDLEY: Thank you. We clearly made a mistake: given Eva's background and her role as co-Chair of the Joint Committee on Standards for Educational and Psychological Testing and so forth, I think we should have just arranged to spend half a day with Eva. It would have been time well spent, of course. And it's really Michael Feuer -- Michael Feuer is to blame for not organizing this in that way.

Next up is Richard Duran who is a Professor in the Graduate School of Education at UC-Santa Barbara, and is a member of BOTA. His fields of expertise include assessment and instruction of language minority students, and design and evaluation of interventions assisting language minority students. Richard?

MR. DURAN: Okay. Thank you, Chris. My remarks will focus on a variety of issues surrounding exclusion and inclusion of students with disabilities and limited English proficient students in the new assessments. I'm going to focus in particular, on some of the language of the RFP in trying to get us to understand the possibilities that are there and some of the challenges, but I will move into a discussion about the purposes of the assessment and the connections of inclusion to what research knowledge tells us might be possible with examinations that might be examined in their nature.

I will not be speaking about legal requirements surrounding the need to include these students, but note that these legal and statutory mandates ought to be interpreted in a manner making best use of contemporary measurement theory, and research and policy analysis regarding fair and effective measurements of students with disabilities and LEP students. And we have plenty of experts here today that can help us with the legal origins of inclusion in terms of how it's represented within the education system currently.

So a typical cut through these issues would examine issues of inclusion, would look at validity, reliability, comparability of scores of students who receive accommodations in testing, and the issues of fairness in testing in terms of being able to actually assess what students are capable of.

But I'm going to weave the discussion of these issues more so that it's oriented to the actual language of the RFP so that we can begin to work at clarifying some of the questions in the actual design that's unfolding.

One of the goals of making the new assessments inclusive is to permit inferences about the achievement levels in Reading and Math of all students -- all students. It is recognized that exclusion of students with disabilities and limited English-proficient students from large-scale assessments has distorted the validity of large-scale assessments such as NAEP -- at least allegedly, when we asked the question carefully.

Achievement levels are most likely higher when students with disabilities or LEP students are omitted from assessments -- and some of the work of the NAE has suggested this. Maximally including students with disabilities and LEP students in the new assessments would increase the capability of policymakers, educators, and the public, to make accurate inferences about the performance levels of all children, the schools, districts, and States; subject to the caveat that we need to ask questions about the history of the students and ask whether the assessments are appropriate given their academic history.

Further, maximal inclusion of students would send the message to students, teachers, parents, and others, that education change based on assessments is seeking the same improvements in achievement for all students. Increasing inclusion of students with disabilities and LEP students in the new assessments will require accommodations in the instrumentation and administration of new assessments.

I note that the definition of assessment accommodations is -- there are many definitions out there. But a general definition that we would use is, non-standard forms of test administration or responding on tests. So a deviation from a standard version that is meant to allow students to show their ability.

So specific accommodations to be available in the new assessments are stated as follows. Students with disabilities -- the deliverables include Braille and large print on both Reading and Mathematics assessments for blind students and students with limited vision.

And for LEP students, English audio cassette version of an examination plus for Spanish LEPs, a bilingual Spanish/English version of the Mathematics examination. I need a little bit of clarification on the audio cassette version -- whether that's just going to be for the Reading or for the Math.

Now, it is noteworthy that the RFP then, for the new examination, specified deliverable accommodations for students with disabilities only for visually impaired students, and not for other categories of disabled students who form the bulk of students categorized as students with disabilities.

One must keep in mind that other assessment accommodations will be implemented for students with different disabilities, and indeed, must be implemented if so stipulated in the disabled student's IEP Plan, or by State regulation. Indeed, other forms of assessment accommodation are mentioned and intended in the design statement of the new assessments.

For example, on page 23 of the RFP in Appendix A, mention is made that it is expected that reasonable accommodations for students with disabilities or with limited English proficiency, will be provided by the school administrator. And on the top of page 19 of the RFP, the section on task 17 mentions some further details on that.

The contractor shall conduct ongoing research into the reliability and validity of the national tests. A number of issues have already been identified in order of importance, and others will arise over the course of this contract.

Validity of test scores under non-standard test conditions; that is, the impact of testing accommodations -- Braille, large print, extra time, one-on-one testing, etc. -- upon the validity of test scores and the feasibility of developing glossaries for use in the national test of mathematics for native languages other than English. That's on the table.

Now what I'd like to do is to skip the next transparency from the two following ones, the first. So mention is made then, of the need for research on validity of test scores under a variety of conditions. Now a recent report that has come out -- that I'll reference to you more carefully, by John Olson and Arnie Goldstein from NCES -- drew on another study, of 22 States, looking at the kinds of accommodations that students with disabilities receive in testing.

The relative frequency of these varies quite a bit. Some of these are very rare; others are found more. But this gives you an idea of some of the variation that's going on out there, and so when we look at what accommodations mean, and we look at how they might influence test scores, then there's quite a bit of stuff going on out there, and a lot of that is not necessarily going to be under the control of the examination system.

If you'll go to the next transparency you'll see a list of accommodations that are also provided for LEP students. Now, you'll see some overlap of course, with some other accommodations that are provided students with disabilities, but these have -- in a survey that was done by NCREL, they found that among 22 States, these were some of the accommodations that were given LEP students.

Okay, so that's kind of under control of the education system and part outside of this testing system, but raises significant challenges for thinking through how accommodations might affect test scores.

Now, if you put on the transparency we skipped -- thank you. A new NCES report by John Olson and Arnold Goldstein is an invaluable source for analyzing the foregoing issues on both policy grounds and research. The report reviews ongoing and previous research by NCES and NAEP contractors on NAEP, recent OERI-sponsored State-based studies, including studies by CCSSO and CRESST, and studies by the College Board and ETS, among other agencies on these issues.

This is a landmark document in terms of pulling together what the issues are for large-scale assessment. It's a very useful resource.

One of the main outcomes of previous research that's cited in this report has been that allowing accommodations can make an assessment easier for all examinees, regardless of students' disabilities or language status. But this finding is not uniform.

Research on the effects of accommodations on performance on the new Reading and Math test will need to contend with the mix and variation in the use of different accommodations among students with different disabilities and among LEP students. I mean, it's going to be an analytic dilemma, about how precise to be in terms of looking at how performance might change.

Further, the effects of accommodations will need to be studied among students who would otherwise not be allowed accommodation, in order to determine where the patterns of accommodation change a measurement target. So there is a lot of ongoing research in experimental design research looking at whether accommodations help students who would not be labeled as students with disabilities or LEP students, and what this means.

The new assessments are presumably intended to be power tests rather than speeded tests. Power tests are intended to gather valid and reliable information about students' maximal proficiency in a subject matter area. If extra time is found, for example, to improve the performance of all examinees substantially, and not just the performance of students with disabilities or LEP students, then it would seem very important to analyze the constructs being targeted for assessment by the new exams. Are they appropriate constructs, if they show speeded effect? Another alternative would be to build the notion of speededness into the construct, and I would not dismiss that possibility, based on cognitive research that shows that speed of processing and ability to make verbal associations are very related in terms of performance on verbal ability tests.

Design of studies to investigate the effects of accommodations will be a considerable challenge. It would seem useful, obviously, to begin carrying them out while the exams are still in the pilot administration phase, and that's the situation here -- at least it's targeted.

These studies will be very difficult because States and LEAs, as we've seen, vary in how they implement the definition of students with disabilities and limited English proficiency. This is a big problem. What do those categories mean? There's a lot of variance in the operationalization of the definition of disabilities and in how states characterize limited English proficiency.

Studies of this kind will be made further complex by the need to evaluate commonality in the actual criteria used by local assessment administrators to exclude students who are judged as incapable of being examined. And also, the actual procedures used to assign students to accommodated versus non-accommodated examinations.

And here, what I'm referring to is the difference between saying this is the way you ought to do it, and what really happens. That can be extremely noisy and is a very important issue to investigate with these new exams.

The use of an audio recorded Mathematics test and Spanish/English Mathematics examination raises special questions. More details on the precise administration procedures and materials for these accommodations are needed, though some important procedural details have been made. And these are policy decisions and you know, one needs to think carefully about what they mean in terms of what we know from research.

In the current specifications, LEP students with more than three years academic instruction will be asked to take the national test of Reading and Mathematics in English. LEP students with less than three years academic instruction in English would be given the English Reading and Mathematics assessment, unless school staff judge them as incapable of assessment in English. Criteria for this latter judgment are in need of elaboration.

Another problem to be faced is test item development procedures for handling socio-linguistic variation in Spanish -- a notorious problem to developers of assessments in Spanish. Exactly how will this be handled?

One obvious strategy is to enact an item review process that catches and edits terminology or phrasing in Spanish that would not be recognized universally by competent, native language speakers of Spanish. We strive for that, but it's hard to attain.

Scoring of LEP students' short response performance items in Mathematics raises an important issue. Will scorers be trained to focus on students' mastery of intended knowledge or problem-solving skill, given limitations students might show in English language proficiency?

Research on the former CLAS assessment in California suggests that non-English background children's writing may convey evidence of students' mastery of subject matter knowledge, despite infelicities in English, and that scorers might be trained to be sensitive to appropriateness in written content despite children's limited familiarity with English.

These are some of the underlying questions that cut across use of accommodations with respect to inclusion. I'm not going to go into these in detail, but they are issues that need more elaborated discussion.

In order to control my time, I'm going to move ahead and I'm going to talk about a couple of controversial points. We mentioned a little bit about the Spanish and English Math exam, the criteria for use, the development of Spanish translation, the training of scorers.

Now, I'm going to bring up something that is an example where, you know, researchers in the field, looking at what Reading is -- what the development of Reading is among bilinguals. Now here I cite the recent NRC report, Improving Schooling for Language-Minority Children: A Research Agenda, that was published this year.

In terms of dealing with inclusion -- and as Eva and I just had an exchange -- what do we do with a school district like L.A. on the Reading exam? What are we doing? Are we understanding the distribution of concentration of students with different language characteristics and how we have to contend with that in terms of actually getting at a kind of grassroots understanding of what achievement is?

Now, reports such as the NRC report on Improving Schooling for Language-Minority Children, I think cite plenty of evidence that students' development of skills in Reading and language skills, are transferable into a second language.

Now, like any area of research there are controversies about this, but there is a fair amount of consistency about this point, and it's one that bilingual education researchers have made over and over and over as a working hypothesis that seems to be difficult to challenge; that's a very good way of posing education change for students in terms of developing students in terms of the resources that they're capable of managing.

It's my personal opinion -- not representing BOTA or NRC -- that we still face issues of inclusion that are not dealt with well in this examination system, and that adding the possibility of an examination in Spanish in the area of Reading might be an example of a good development that would help lead to a better understanding of what students can do.

I'm not talking about knowing English when you leave high school; I'm talking about what you're doing in 4th Grade Reading as a foundation for being able to deal with text. So that's an interesting question for me.

In conclusion, I want to raise one other issue that's going to come up later on -- certainly, John Fremer's going to deal with this, and Eva in terms of the Standards. I'm not sure how responsibility for inclusion is going to be distributed across -- and responsibility for analyzing what accommodations mean and how they influence test scores -- across different agencies. It's blurry to me.

If we look at the JCTP documents on guidelines for fair and equitable testing it's clear that we can assign responsibility across just about every agency that has something to do with the development, administration, and use of tests.

But I think that in the area of inclusion and the use of accommodations, those responsibilities have to be sharpened and it has to be clear exactly what's going to happen. And there's a potential here that this could be a very litigious matter if it's not dealt with properly.

One closing comment and that is, I haven't here addressed in any depth, issues of the academic history of students in the appropriateness of tests. I've taken the tests as a given, but I think that there are other issues to pursue that deal with what inclusion means.

If students with disabilities and LEP students tend to perform lower as our data indicates so much, then I think we need to deal with appropriateness of the tests in terms of their achievement proficiency, given where they're at.

And that's a very basic issue that needs a lot more attention in order to really get at the heart of what these assessments are supposed to be doing in terms of providing information for improving educational outcomes. Thank you.

CHAIRMAN EDLEY: Richard, thank you very much for that. Our next presenter is another Richard. Richard Jaegar is the Excellence Foundation professor in the School of Education at the University of North Carolina, Greensboro. I, myself, am the Fairly Adequate Foundation professor of Law.

His fields of expertise include educational research methods, educational measurements, standard setting and performance assessments, teacher certification, and the understanding and use of test results by policymakers and others. Professor Jaegar.

MR. JAEGAR: Well, I'd really looked forward to the opportunity to hassle Rich Shavelson about handling my overheads because I have only one, but the timing is critical and now it's been blown by Gary Philips. But I too, will stick very closely to my prepared remarks because of the time limitations.

One could argue that Voluntary National Tests in 4th Grade Reading and 8th Grade Mathematics are like any other standardized tests adopted by States or school systems to assess their students' achievements. Indeed, that argument was put forth a number of times during public meetings on the national tests held on March 4th, March 26th, and May 19th, and again, here today.

However, tests that carry the Federal imprimatur and serve the catalytic objectives envisioned by the President, the Secretary, and the Deputy Secretary, cannot be like any other. Their very purpose invokes burdens of fairness, precision and validity that surpass those imposed on tests used solely for description or pulse-taking.

The consequential side of the matrix weighs heavily on a national test and the strategies and procedures used for reporting test results warrant particular scrutiny.

During the public meetings mentioned earlier, Mike Smith and Gary Philips identified reporting to parents and teachers as the central goals of the national testing program. Four challenges must be met with both groups, and those are the challenges that are on the overheads here.

Parents and teachers must be motivated to consider the results of national testing. The results of national testing must be presented in ways that parents and teachers can readily understand. Parents and teachers must be convinced that the results of national testing should be valued. Test results must be communicated to parents and teachers in ways that foster valid interpretations and inferences.

Although the proposed national tests and NAEP differ in important ways, the NAEP experience must be considered. The issue of audience motivation has plagued NAEP since its inception in 1968 when a professional journalist was employed in an attempt to develop interesting copy for major newspapers.

Granted, parents should be more interested in the test performances of their own children than in the distribution of achievement scores for their State or the nation. But the parents of children most at risk of failing to read or solve challenging mathematics problems are those least likely to attend to their children's test scores.

As a former teacher in inner-city New York testified at one of the public meetings, parents there typically did not respond to requests to sign and return their children's report cards.

The major reporting challenge is finding ways to reach parents of low-achieving children and to convince them that they should review their children's scores on a national test. It won't be easy.

From my own work with some of the nation's most talented classroom teachers, through the National Board for Professional Teaching Standards, I can tell you that teachers' reactions to standardized testing in any form, range from indifference to revulsion. Most teachers consider the information provided by externally-imposed tests to be largely irrelevant in the context of their detailed, daily observations of the academic strengths, weaknesses, capabilities, and needs of the children they instruct.

Further, they regard the high-stakes tests used in local and State programs of accountability, with fear and loathing. The best teachers regard such testing programs as unwarranted intrusions on their opportunities to function as independent professionals in selecting strategies for effective instruction, and methods for evaluating their students' growth and development.

The impact of high-stakes testing programs on the content and methods of classroom instruction has been documented extensively in studies conducted by Mary Lee Smith and Lorrie Shepard. The picture painted by their findings is neither benign nor encouraging. They discovered endless days of mindless drill and practice on the form and format of standardized test items, with consequent loss of curricular depth, breadth, and innovation.

It will be difficult to report results in ways that convince teachers that the national tests are worthy of their attention. The proposed composition of the test -- four-fifths multiple choice items -- will only exacerbate this problem.

Test results must be communicated to parents and teachers in forms they can readily understand, despite rampant innumeracy and general ignorance or fear of statistical terminology and data summaries. Aschbacher and Herman indicated that many readers of test reports do not understand such basic terms as "average" and "norm".

The Gallup/Phi Delta Kappa poll of the public's attitudes toward the public schools has been singularly successful in obtaining and communicating parents' evaluations of public schools on the traditional A through F scale. But most parents and many teachers cannot interpret test results in such widely-used scales as percentile ranks, grade equivalents, and stanines.

Hambleton and Slater found that policymakers holding advanced degrees had difficulty understanding and correctly interpreting the tabular summaries used to convey national assessment results. Parents, who are typically less well-educated and have far less frequent exposure to data summaries of any form than the Hambleton and Slater interviewees, therefore can be expected to have even greater difficulty.

One particularly disheartening finding from the Hambleton and Slater research was the difficulty their interviewees had interpreting graphical summaries. The picture being worth a thousand words adage might not hold when achievement test results are summarized unless the picture is especially simple and straightforward.

Even if parents could be motivated to read about national tests and such results can be communicated in ways that are understandable, it is not necessarily the case that they will value the results as indicators of their children's achievement or of the quality of their children's school.

Findings from three studies bear on this issue. Shepard and Bliem interviewed 105 parents of 3rd Graders in a Colorado school system about the usefulness of different types of information for learning about their child's progress in school. They found that two-thirds of respondents rated standardized tests below the midpoint of a 5-point scale, with 1 meaning not at all useful and 5 meaning very useful.

Only 14 percent regarded standardized tests as very useful, in contrast to 77 percent who so regarded "my children's teacher talking about his or her progress", and 43 percent who so regarded their child's report cards.

Yeager and colleagues conducted detailed analyses of the content of over 500 school report cards produced by school systems throughout the nation. Using protocols grounded in their content analyses, the researchers interviewed 166 parents of public school students in Greensboro, North Carolina, and Sacramento, California, to determine among other things, what parents most wanted to know about the condition and effectiveness of their children's schools.

When faced with paired choices among categories of information, parents in both cities agreed that "school environment information" -- information on the safety of the school, and the extent of involvement in the school by parents and other members of the community -- was most important to their evaluation of the quality of their child's school.

Parents ranked school success information -- that is, information on the school's graduation rate, student promotion rates, number of A grades awarded, students' after-graduation plans, student special awards or honors earned, and students' athletic accomplishments -- as second most important to their evaluation.

It is of interest here that standardized testing information, defined as statistics that could tell you about the standardized test performances of all students in your child's entire grade or your child's entire school, was rated by parents as only third most important and scaled just higher than a category labeled "student engagement information" -- which was information on the school's attendance rate, its dropout rate, and the number of students who had been suspended or expelled from the school.

These findings are consistent with those reported in the 23rd Gallup poll of the public's attitudes toward the public schools. In that survey parents were asked to rate the importance of various factors in selecting the school for their child, were school choice a possibility.

Quality of the teaching staff was rated very important by 85 percent of responding parents; followed by maintenance of school discipline by 76 percent; curriculum -- that is, the courses offered -- by 74 percent; size of classes by 57 percent; and grades or test scores of the student body by only 46 percent. One percent had a track record of graduates in high school, college, or on-the-job.

The bottom line here is that standardized test results are not regarded by many parents as important indicators of the quality of their child's school or of their child's progress in school. If the national tests are to stimulate the reforms proposed in the President's State of the Union Address, test results will have to be presented in ways that convince parents they're important and worthy of their attention and concern.

And finally, test results must be communicated in ways that sponsor valid interpretations and inferences. Again, it won't be easy. Murphy conducted research on the effect of reporting format on elementary school teachers' interpretations of standardized test results. He presented 671 teachers with score reports in both narrative and graphical tabular formats, followed by a series of interpretive statements with which they could agree or disagree on a 5-point scale.

Each statement represented an intentional overinterpretation of the data presented. Murphy included such statements as, "compared with students nationwide this class is below average in Math concepts and above average in Math computation. This student has the Math solving skills of a 3rd Grader. And compared with the nation's 5th Graders, this student is above average on the skills covered under language analysis".

Murphy found that sample teachers accepted gross overinterpretations of achievement test results regardless of the format used to report the test results and the number of courses and workshops on testing and measurement they'd completed. That last finding was particularly disheartening to me.

His conclusion is summarized in the following statement. "The overinterpretations concern concepts that are central to the field of testing: concepts of reliability, error, probability, and approximation. And if teachers cannot interpret such concepts ably, the most central concept of all, validity, becomes at issue. The inferences from test scores that were presented as part of this study were simply not valid."

In a chapter titled, "Five Common Misuses of Tests", first published in the 1982 NAS volume on ability testing, Eric Gardner cautioned against acceptance of the test title for what a test measures; ignoring error of measurement in test scores; using a single test score for decision-making; lack of understanding of test score reporting; and attributing cause of the behavior measured to the test that conveys the information.

Each of these cautions applies without modification to the planned voluntary tests in Grade 4 Reading and Grade 8 Mathematics. First, although correlations among Reading subscores are high, it is not the case that Reading is Reading is Reading, particularly when results are interpreted as an indication of what students know and can do, rather than how they fare in some relative sense.

As Gardner noted, "There is a tendency for unsophisticated users to accept the name assigned to a test as an accurate and complete description of the variable being measured".

Research by Shavelson and colleagues indicates that examinee-by-exercise interaction variance is a major contributor to individual differences among test scores, particularly when performance items are used. Student performances depend critically on the specific content of the test, not merely on the test framework, so performance information must be generalized with caution.

At a recent NAS conference on NAEP performance standards, Linn noted substantial differences between performance standards based on NAEP's dichotomously scored items and extended response items, and concluded that the proportion of students who would be classified as basic, proficient, or advanced is quite sensitive to the composition of the assessment.

He reported that 78 percent of 4th Graders would have been classified as performing in the basic category or above had the cut score been determined using only the dichotomously scored items on NAEP, but that only 3 percent of students would have been so classified had the cut score been based on extended response items.

This finding is particularly troublesome when viewed against the intention to link the national test to NAEP and to its achievement levels. Although I realize that the technical design of the national test is a work-in-progress, and that statements quoted out of context from transcripts of public meetings must not be regarded as definitive, the juxtaposition of Bob Linn's findings and Gary Philips's statements during the public meeting held on May 19th is disturbing, and I quote Gary.

"This is intended to be a test that specifically is focused on giving good information to the parents and teachers. The Reading and the Math will provide national standards and will do that through statistical linkage to NAEP, so we'll be able to provide basic, proficient, and advanced information on the test."

Linn's findings indicate that what parents are led to believe about their children's performances on the national test will depend substantially on the particular items that compose the test, and on the proportions of those items that are presented in dichotomously scored formats.

Whether Johnny Jones is a basic, proficient, or advanced 4th Grader -- 4th Grade Reader, and the percent of 4th Grade students who are classified as basic, proficient, or advanced readers in Johnny's school, district, or State, will be highly manipulable, and an artifact of the construction of the national tests. To the degree that the format composition of the national tests differs from that of NAEP, the NAEP achievement levels will not carry the same meaning for students who complete the national tests.

Public disclosure of the full test might help a bit here, but most parents cannot be expected to review the test and most will believe that it measures 4th Grade Reading or 8th Grade Mathematics regardless of its content and composition.

In keeping with the title of this session -- Potential Risks and Unintended Consequences of Testing -- the principal message conveyed by this paper is one of gloom and doom. I'd like to end on a more positive note.

The measurement literature contains several good papers on how test results should be organized and reported. The Aschbacher and Hermann report mentioned earlier -- although not grounded in new, empirical work -- draws heavily on related psychological literature and research in business and marketing.

Generalization of these findings to achievement test reports is somewhat an act of faith, but the recommendations they make are certainly sensible. Similarly, the suggestions made by Hambleton and Slater make sense, even though they haven't been validated with real consumers of test reports. They call for simplification and narrative explanation, combined with graphical display of results.

The same is true of recommendations provided by Howard Wainer in his lead article in the spring 1997 issue of The Journal of Educational and Behavioral Statistics. Wainer's recommendations are appealing and sensible. He illustrates a number of clever ways in which tabular and graphical data displays can be formulated so as to emphasize important results and eliminate the unimportant.

Even in the absence of validation it seems obvious that Wainer's recommendations must result in improved communication. Will these recommended reporting strategies effectively address the challenges described in this paper: motivation, understanding, valuing of results, and valid interpretation?

I cannot emphasize enough the importance of exploring this question through a sound program of research. If the national tests are to facilitate educational improvement as the President and the Secretary hope, the message must be understood and must be compelling.

To make it so, we must learn what parents and teachers will examine, what they infer, and how the packaging and presentation of test results can foster accurate and useful interpretation. Study of effective score reporting must be a major component of the program of research and evaluation envisioned for the national tests. Thank you.

CHAIRMAN EDLEY: Thank you very much, Richard. That's why he's the Excellence Foundation professor. Let me just -- we've just heard from the very eminent experts in this field of testing, and I was venting some anxiety before the session with Michael and Pattie that I wanted to make sure we included in the discussion some frank concerns about the program; that I didn't want unrelenting cheerleading.

I think the warning flags -- by my count, I think we're about up to 84 warning flags. But this is just to remind everybody that by the end of the show the goal of course, is to try to figure out strategies that the department might use to try to minimize these risks. So I think it's appropriate that we begin with a fairly exhaustive and comprehensive look at what those risks might be.

Let me just add that one thing that we have not been doing thus far, and I hope we'll get into it in the discussion, is trying to get from people some sense of the magnitude of these risks, and perhaps the relative importance of these risks beyond simply being exhaustive in our enumeration of them.

We're going to shift gears slightly now and hear from Janell Byrd who has really done us a great favor by, at the last minute, agreeing to come and talk informally, presenting not the perspective of an expert on testing, but rather, that perspective of several elements of the civil rights community.

Janell is an attorney with the Washington Office of the NAACP Legal Defense Fund where she's been for some years. I hope she won't mind my saying this, but she is one of the most accomplished and highly regarded Civil Rights litigators of her generation, and in particular for our purposes, has done a tremendous amount of litigation under Title VI of the Civil Rights Act, and is widely regarded as having done a state-of-the-art job of assembling expert witness testimony in some recent litigation.

So Janell, thanks very much for coming, to raise some questions from the Civil Rights perspective.

MS. BYRD: Thank you very much. And Chris correctly stated when he said I didn't have a -- my arm was somewhat twisted and I don't have a prepared text. But I did have a lot of questions before I arrived and I guess in part, because my arm was twisted, I don't really have to play by all the rules. So let me say this.

I understand the premise of the session is to come up with ways to minimize harm and to give advice and guidance to the Administration and the Department as to how to design and implement these tests. But I cannot, having listened to these three prior presentations, I can't stand here and not say, why the rush?

I mean, why are these questions not being answered in advance of the decision to move forward with these tests? It is obviously the first question: why are we doing this; is this the right thing; should we be moving forward?

It seems to me that we're saying, how do we correct this when we haven't decided that this is what we should be doing? I think that question has to be on the table, and I think that it is quite a serious problem to pre-empt it. So I encourage you strongly to take a step back and ask the first question, which the panelists, I think their presentations -- I mean, Chris said 84 alarms. I was thinking -- I was only counting the panelists -- I said three alarms. I said a 3-alarm fire. I said, my God, what are we doing?

From the Civil Rights perspective I would say that, one thing that comes to mind is the obvious concerns for minority children, for poor children, for having a national test. I mean, we're talking about communities which are often and increasingly, isolated from the majority white community; communities which have fewer resources and where the parents and the teachers and the schools are often under siege in ways -- from disadvantage and poverty -- which is not experienced in the broader community.

And we look also at the minority and poor students who are in majority institutions and ask how will those disadvantages that they face in this society be translated through the use of this exam? Now obviously the question is, what are the anticipated uses? And unfortunately, Gary Philips -- I mean, since he's from the government so he's in the hot seat, so everything's been directed at you, and to a certain extent I apologize -- but I guess you knew what was coming when you agreed to come.

But I think first and foremost, the question of how these tests will be used, and in your presentation you made a point that these tests will be used like any other tests. Well, will they be validated for any other purpose, other than just giving information to parents?

It is inconceivable to me that the test could be appropriate for use for tracking, for special education placement, for high school graduation, for promotion from grade or retention in grade, without being validated for that purpose.

And it would seem to me that the government would have the responsibility of making sure that if it is anticipated that these tests will be used for any other purpose -- and I think Eva Baker made it absolutely clear that the tests will be used for other purposes, if they're in the student's file, if the teacher has the test -- I do not believe, and I don't think many people in here would believe that the tests will not be used for high stakes purposes.

Being honest about that I think, requires us to say, one, you know, will it be valid for that purpose, and to the extent that it's not, what enforcement mechanism will there be; is it capable of being policed if there is an enforcement mechanism?

So for example, if you decide to recommend that we should say these tests should not be used for high stakes purposes, well what does that really mean? And is anybody prepared to -- is that even capable of being policed? I don't think that we can expect the Department of Education, Office for Civil Rights, the Department of Justice, to enforce that.

I think the track record in those institutions in enforcing these kinds of mechanisms is not a good one, quite frankly, and so I don't think -- I think in being honest about this, simply saying don't use it for this purpose, will be meaningless. And so it will be used for high stakes purposes, and that there are not enforcement mechanisms in place to make sure that that doesn't happen.

What is the meaning of this information to parents if there's no other information about opportunity to learn? I mean, if the purpose is you think you're empowering parents, if there is no information about, you know, teacher/pupil ratios, resources, funding, the variety of things that might conceivably put this in some context which would allow parents and teachers to do something with the information, then it's really I think, pie-in-the-sky to expect that simply giving a test score is going to have any different impact than giving grades at the end of the year that might be poor grades.

I mean, is this going to be a punitive measure? Is this just another way in which the kids and the communities and the schools which are most disadvantaged, will be blamed and said, they are the problem, they are at fault?

Further, as we talk about measuring students against a national standard but it's a voluntary test, obviously the question that presents itself is, who's most likely to opt out of this? From some of the things I've been hearing, the States that are most likely to opt out are some of those States -- some of the deep Southern States, some States with high minority populations, and query whether we then have any kind of national standard if that is indeed, at least one of the purposes of this exam.

That's just the beginning of the questions. I mean, it's obvious from the audience here, the wealth of knowledge that you have, that you all probably had all these questions to begin with and more.

I'll just say that my main point here is that this is a frightening proposition, it is being rushed into by the Administration I think, without adequate forethought, and there is reason for all of us to be concerned, and there is reason for pause, research, reflection before we move forward. Thank you.

CHAIRMAN EDLEY: Thank you very much, Janell. We now turn to two marvelous discussants, and let me start with -- Kati Haycock? Are you the first up? No, Constance Newman, who's the Under Secretary at the Smithsonian Institution and formerly directed the Office of Personnel Management. And we've asked her and Kati to provide commentary before we open it up for general discussion.

MS. NEWMAN: Thank you. I thought you would mention the thing that I'm most proud of. I was in the original BOTA group, and it's a pleasure for me to be here and to participate in this conference on Regulatory -- sorry Gary, but the title did say Regulatory -- and Licensing Issues associated with the tests.

At the outset, I'd like to say that nothing I say should be taken as an indication of my opposition to the Voluntary National Test. To the contrary, I believe that these tests represent the hope that all children everywhere will begin to master the basics.

I've been involved in the last two years in the District of Columbia on the Control Board and recognize, if you took the District of Columbia alone, testing could or should mean that in the District there would be a reversal of the trends in recent years. Over the past five years, the erosion in the District's public schools has accelerated for thousands of children, particularly those in the poorest Wards.

On the Comprehensive Tests of Basic Skills, the Math scores have declined by 6 percent and Reading by 10 percent. And on NAEP, the trial assessment, 78 percent of the 4th Grade students scored below the basic reading level. So I believe -- I understand the reservations here -- but I have hope that the tests will give parents and teachers and leaders in the school administrations an opportunity to measure progress or the lack thereof, against national standards.

So what do I have to say since I've said that? I do have some concerns and some, I will maybe call them observations, in three categories: observations with regard to the impact of the test on parents and children; secondly on teachers and school administrators; and finally, the impact of the test reporting to elected officials, funding sources, and the public.

With regard to the impact on parents and children, everyone in the testing program, developing the policies, should be concerned about the challenges presented this morning by Richard Jaegar. It's important -- it's clearly important that parents understand what the results mean. They must not become so confused by the results that they are unreasonable with their children and unfair to the teachers.

And what I mean by saying unreasonable with their children, by overreacting to what may be viewed as low scores, or by not reacting, and thus not providing the reinforcement of the teacher's effort, we, I think, would all have failed in what it is we're trying to accomplish. They can be unreasonable with regard to relating to the teachers by being unrealistic about the speed with which change in test performance can take place.

We were involved in a major change in the District of Columbia in the Administration of the school system. They've been in place about eight months and we're getting beat up on a regular basis because the students' test scores haven't improved; that they aren't reading at a higher level.

And I am concerned that unless there's clear discussion of expectations going out with the test strategy, there's going to be a great deal of unfair pressure on the teachers and the school administrators, which is not going to improve the relationships that parents have with teachers.

With regard to the impact on teachers in schools, the requirement that students with disabilities and limited English proficient students be included is not only fair but it's an honest way for schools to really understand the performance level of all the children and to act accordingly.

Hopefully, hopefully, the teachers will not view this whole process as one of competition and thereby reacting in a negative way that they are having to include all of the students and therefore bringing down the scores to not allow them to compete with whomever they think they're competing with.

We should all be concerned about the findings of Mary Lee Smith and Lorrie Shepard, that it will be difficult to report results in ways that convince teachers that national tests are worthy of their attention. My question is, how are we going to get the buy-in of the teachers?

And if there's not buy-in -- and this was pointed out earlier today -- if there's not buy-in we'll have all these scores and nothing will have changed in the classroom with regard to method and content. And so the purpose, what I believe is the ultimate purpose, will not have been met.

And finally, with regard to the impact of reporting to elected officials, to funding sources, and to the general public, I heard Gary today and have read also, the point that no data from individual students will go to the Department of Education.

So this doesn't relate -- my concern does not relate to the individual data going to the Federal Government, but it does relate to concern about the reporting through media to the public, particularly about aggregated test results at any level -- and I'm marrying some things that have been said earlier -- because it can cause incorrect conclusions to be drawn about groups -- that's minorities, the disabled, and people with limited English.

I've always been very sensitive on this point and some people say, somewhat unreasonable, but I believe this country is already divided too much. There are too many assumptions about people by groups, and a constant barrage of information that says, African-American children are performing below the national average by X percent in every venue of testing, even though it is true, if it is not described in the proper context, is going to give heart to those who want to believe and say that African-Americans are inferior.

Now I know all the academics and everybody here is going to say, you know, we have to be honest with the data. I'm only saying that when we report the aggregated data we have to be very careful about how it is used and who uses it, and what words are used, because it could be used as ammunition to further divide the nation.

And I will just say that I do believe what mitigates this concern are some things that Gary said. If it is true that in this process, guidelines are going to be developed and the public is going to be able to participate in the guidelines, others who are concerned about this and know how best to communicate this will influence what goes out, then maybe I have no reason to be as nervous as I am.

And I will just close by saying that I picked up a few action items that I hope you have on your notes, Gary. One being that there should be extensive work done to remove the incentives for misuse of these tests. There is a need to grapple with the fact that there are students whose characteristics are not being met in the initial design; for example, students with language other than English and Spanish.

You need to be sure to spend time on communication strategies, and -- this I'm just repeating -- particularly communication strategies with parents. And the first question that Janell mentioned, I think you do have a purpose. I hope you do. I thought I heard a purpose, but what I pick up from what was said today is, there is need to have a clear statement of that purpose that is broadly communicated, and a clear statement and understanding by all, about the uses.

So with that I will end. I do echo another statement made by Janell that in getting this information out, it has to be in the proper context, and we all have to be working toward ensuring that once the information is there, somebody does something to fix whatever is broken.

CHAIRMAN EDLEY: Thanks very much, Connie, for all of those thoughts. Kati Haycock is one of the nation's leading child advocates in the field of Education, and she was formerly the Executive Vice President of the Children's Defense Fund, the nation's largest child advocacy organization, and is currently the Director of the Education Trust which was established in 1992. And the focus of its activities concerning children is really the interests and concerns of poor and minority kids. Kati.

MS. HAYCOCK: Thanks, Chris. As Chris indicated, I head up an organization called the Education Trust, whose sole purpose is to improve the education provided to minority and poor children in this country and in so doing, to close the gap between groups.

I want to take a couple of minutes today and explain to you why someone like me is persuaded that done right -- and I want to emphasize "done right" -- a national test can be an important tool in the larger effort to accomplish that goal. I want to do that really, by taking us away from the technical language and talk with you a little bit about kids in classrooms.

When you spend as much time as I do, along with my staff, in classrooms, you can't help but be overwhelmed sometimes by the enormous inequities, many of which Janell described: the differences in facilities, the huge differences in instructional equipment like computers and laboratory stuff, and the large differences in the training of those who teach minority youngsters and those who teach other youngsters.

But at least in our judgment, none of the inequities are more damaging to the achievement levels of poor, minority youngsters, than the low level curriculum and low expectations that guide their education.

In middle grades classrooms in inner cities, we typically see more coloring assignments than Writing or Mathematics assignments. And I'm not kidding about that. I can take you to countless school districts where youngsters are asked to draw complicated borders on the outsides of their Mathematics homework and are graded as much for staying in the lines in their coloring as they are for the quality of their Mathematics.

I can take you to urban high schools where there's a lot of coloring that goes on too, and where English teachers think it's a criminal act to assign more, for example, than a 3-paragraph essay to 11th grade kids.

In fact, my staff came back not too long ago from Philadelphia with an assignment that was given by a Philadelphia teacher to her largely poor and minority student population. This was the assignment. The assignment was to choose an historical figure who interests you, do some research -- and then here's what you do.

You find a picture of that person, you xerox it, you glue it on the center of a posterboard, and then around the picture you illustrate, decorate the poster with colors and glitter and paint. And then on a 3X5 card in each of the four corners of that poster, write a sentence or two about what you learned.

Now if I asked you to tell me what age level kids that would be an appropriate assignment for, most of you would probably say about 4th Grade. This was an 11th Grade classroom though, and the kids had one month to do that assignment. Moreover, what they got the other months looked quite a bit the same.

This problem may be even worse in LEP classrooms where teachers routinely conclude that because youngsters lack English language skills that they also lack cognitive ability. So their science content is about building dioramas and coloring pictures of fish and hanging them in a diorama. That's the sort of sum total of the content of their science.

Basically, what we do in American schools is we take kids who have less to begin with and we teach them less in school, too. Now, these practices continue, at least in part, because they're very much hidden from public view. They're hidden from parents by the A's that their kids bring home for work that would earn a C or a D in the suburbs. They're hidden from communities by reports that their students are achieving at the 45th percentile or the third stanine, whatever that means.

They're hidden even from teachers and principals by reports from State education agencies that, your kids are performing about like kids in similar schools. Which routinely means that teachers learn that their kids -- in a high poverty school, for example -- may be performing at the 40th percentile of schools like them. They're never told that what that really means is their kids are in the bottom percentile of their State.

Moreover, as Dick suggested, the testing that we do in these schools often reinforces these patterns rather than helping to improve them. In my judgment at least, there's no more urgent agenda for this country than to change those practices, for after years of narrowing, the gap between poor minority kids and other kids is actually growing again, and rather rapidly, just at a time when the numbers of these youngsters are rising, often rapidly.

Now, I am not so naive as to assume that just having a new test will change all that. But if you ask the question in reverse -- is an assessment an important tool, and more important, can you actually improve achievements if you continue to have only a low-level assessment -- I think you have to say the answer to that is, no.

Now, many of us in this room -- Bill and George and I and others -- worked very hard in 1992 to 1994 to get a new Title I law that would bring about a more comprehensive attack on these problems. The foundation, the sort of core of that law, was supposed to be a single set of high-level standards in each state and guided by a high-level assessment in line with those standards.

The truth of the matter is that many states have not adopted high standards and that many more continue to use low-level assessments, and to tell their citizens routinely that their youngsters are doing much better than they're doing if you look at the performance of those same youngsters on national tests.

So the question for us is, given that reality, can this test begin to change some of those patterns; get better information into the hands of parents and teachers and to others who can bring about these changes? And again, we think the answer to that is, yes, with a few important "ifs".

Yes, if parents see not only the results of their own child but also the results for the school as a whole, and that is a fundamentally important part of the usefulness of this test. Yes, if communities are able to examine the data -- by race, by class -- and to be able to look at the gaps in performance so they can prioritize their own resources.

Yes, if we at least work very hard at precluding irresponsible uses of this exam. And yes, if we manage to do what Gary suggested is the aim of this test and that is, to include everybody we can. And I know I at least, feel particularly strongly, given what I said earlier about LEP classrooms, about what Richard suggested.

Our goal, obviously, for all LEP children, is reading in English. But we are educating -- at least a third of LEP children -- we are teaching them to read in their native language. It's very important that their parents and teachers too, get information on the reading capacity of their kids. So to say this is a test of reading in English and to not deliberately move ahead as Richard suggested, to build tests in Spanish and other languages, is to me, a serious mistake.

In conclusion, Janell talked about her own worries that it's essentially frightening to move ahead with an exam that's less than perfect. I feel a little bit differently. For me, I would ask really, what is more frightening? To allow the miseducation of poor minority youngsters to continue and to continue to be hidden from public view, or to move ahead with a test that's not yet perfect, but at least can provide some leverage for changing those practices?

CHAIRMAN EDLEY: Thank you, Kati. Okay, so here's our problem, everybody. Well, you know the problem, right? So what we need to do is, we'll take a few minutes for some comments. Let me start by asking people around the table for quick comments, and I think rather than do Q&A, let's just get people's comments out on the table, and then we'll take a -- well, I don't think we will break. We'll just go into the next group and we'll really have to figure out a way to leave out adverbs or something, in their presentations so that we can try to catch up and have more time for discussions.

So, folks around the table, are there any particular comments before I open it up more widely?

MS. BAILEY: Adrienne Bailey. I guess this is just an observation of everything we heard --

CHAIRMAN EDLEY: I'm sorry, is your microphone switch on?

MS. BAILEY: Yes. An observation that pulls together I think, what was said up here, but also I think a message to Gary is that, it just strikes me how unconnected this new test is to some of the other administration leadership issues. And that we could perhaps be well served, certainly in consideration of what Eva said about inappropriate uses, by figuring out how in fact, this connects to other issues: such as Title I, such as other initiatives having to do with parent activities, such as other initiatives where the Federal Government is investing resources in children's ability to read.

And if we're going to send a message out there for public informing, the relationship of these things to each other would, I think, be greatly improved if you could spend some time on that.

CHAIRMAN EDLEY: George.

MR. MADAUS: The trouble I'm having in weighing the dangers that were so well put, and the possible benefits is, how is it actually going to play out? In case you didn't know from my accent, I'm from Massachusetts, and Massachusetts I understand, is going to sign on to do the voluntary test.

And I guess the question I have is, if John Silber gets his way, then Dick's problem about parents understanding the results is taken care of because it's pass or fail. You get promoted or you don't get promoted. Then teachers are going to take it very seriously, believe me, and scores are going to go up, believe me.

I guess the question I have is, who's responsible for validation of that use? Is it Massachusetts or is it the Federal Government? And if Massachusetts doesn't do a good job then does the Federal Government have any obligation to say you can't use it that way? Those are the kinds of questions I have, and I think each State is going to play out slightly differently.

CHAIRMAN EDLEY: You mean you're wondering who will be the defendants in the Title VI lawsuit? Is that what this is? The Janell Files?

(Laughter.)

MR. MADAUS: The problem I have with that approach is, I don't think it's going to get very far.

CHAIRMAN EDLEY: Yes?

MS. AUCHTER: Joan Auchter. I have a whole litany of concerns then, and I don't even know where to start so I'll save a lot of them for tomorrow. But answering George, we played Hi-Ho-Silver with the GED testing program and John Silber, who wanted to use it at the end of this year in a 3-month decision to withhold high school diplomas. It will be asked.

And so one decision you have to make, even though you're going to contract out many parts of the program, there are central responsibilities and guidelines that really have to be clearly identified up front, and you will have to have some sort of a review board to accept each proposal that comes forth to you.

But I think bottom line for me on this whole thing is trying to play between Eva's question and what Gary said. I'm hearing that it's a test for children, for the students and for the parents, but it's not diagnostic so there's not enough information to change instruction.

However, I'm also hearing it's a benchmark of performance. You want to see how your child is performing against some benchmark. And I think if we could clearly identify that benchmark as -- this shows that your child will be successful in school, or will be successful in the workplace, or will be successful as a citizen -- some sort of benchmark that the parent can then value and buy-in outside of that school system, it may be a way to help frame the reporting.

But then that also links to, are the test specifications designed to support that type of decision? And since I haven't seen those, I really don't know. But then going into Duran's comments on the Spanish test being offered -- the test being offered in Spanish in the 4th Grade -- we're also faced with that same problem, and we're translating and giving the English tests in Spanish to show if the person actually has those skills, they can be transferred to another language.

Now there's the whole issue of non-non: students who don't perform well in either language. The parent needs to know that. They need some type of benchmark. So again -- and you need to be aware that you're going to get a lot of lawsuits on. If you're translating to Spanish you'd better be ready to translate to other languages. You need to have your defense, why, up front.

But there are, in the development stages, ways you can take care of that. There's a whole list of issues that, they've put well on the table. I do agree though, that the tests -- I think that it is a good thing as Kati said, that there needs to be a standard, but it's how you get people prepared to deal with it that's going to be the underpinning of success.

CHAIRMAN EDLEY: Jay, and then Jack.

MR. HEUBERT: There are underlying assumptions on which we don't have consensus. Some people say that having these tests will increase motivation; others say it will push students out, increase their motivation to leave; or encourage students to set lower expectations for what they need to be able to learn.

We have differences of assumptions about what the effects will be on poor children and children of color. Janell, you've suggested I think quite plausibly, that it will have an effect that will harm minority children and low SES children to be subject to high-stakes tests anyway, of this kind. Kati, you suggested that that very same use might be the one thing that will save education for poor children and children of color, or improve it if done right.

Those are all plausible assumptions. What's distressing to me is that we lack research, even after 20 years of various kinds of testing programs at the State level, on the basis of which we can say, which of those plausible assumptions is correct.

I know a group of students did a study two or three years ago that suggested that even students who passed high-stakes, 8th Grade tests across the United States -- this was based on NELS data -- showed to a statistically significant degree, less gain at low levels and high levels, between 8th Grade and 10th Grade, than did students who weren't subject to a high-stakes test at all.

Now, that's certainly not the last word on anything, but it seems to me that to be going into a program like this when we have national data on the basis of which we could be assessing the educational effects of various testing policies and not to do the research, not to use it in informing policy, is a mistake.

Even if we are to go forward with the planning of this, we have several years in which to encourage scholars and others to do that research so we have some more solid basis for believing that a national test of this nature is likely to produce improved learning.

We should know, for example, whether there are districts in the country where accurate information for parents did in fact, improve educational quality in the district. We don't know, and we're proceeding in a vacuum unnecessarily.

CHAIRMAN EDLEY: Jack Knott, and then last word from Michael Feuer before we move on.

MR. KNOTT: I'm trying to understand the connection between comments like Kati Haycock's and the nature of what's being proposed here. What's being proposed sounds very weak from a national perspective, in order to carry out what Kati is saying should be the major benefit of the testing program.

And what I suspect underlies this is issues around Federalism and regulation and the role of the Federal Government in education, and that we are proposing a tool here which is, you know, a very weak, national tool. It's voluntary, you know, it's just for information, you know, it's not anything that's going to be used for all these other, unintended consequences that Eva described.

But if that's the case, why are we talking then, about -- like Janell and Kati are talking about -- comparisons between groups, improving schools, doing something about anything? The policy mechanism being proposed doesn't correspond with those purposes, and so to me there's some kind of disjunct there.

And I don't know if our country is capable, given the kind of Federal regulatory and legal system we have, to address those issues in a national way, in the political context that we're in.

CHAIRMAN EDLEY: Michael? Bill Taylor?

MR. TAYLOR: Let me just say briefly, that I think there's a broad question within which many of the concerns that have been expressed this morning fall, and the last speaker I think, helps to identify it.

And I think the Federal Government needs to address it. Whatever the role of the Federal Government may or may not be in other areas in education, it has a fundamental role in guaranteeing equality of opportunity. I don't think that's disputable.

What a lot of people have said this morning is that there are serious implications for equality of opportunity in this test -- how it's given, how it's used, and so on. The Federal Government, the Administration has a real interest in having a buy-in to these tests by States and localities, and in the interest of that it is recoiling from the use of the word "regulation", which we heard from Chris this morning.

And it seems to me that's the problem that needs to be faced up to. How do you guarantee, how do you meet the concerns of people about equality of opportunity if, under whatever name you are describing it, you're not engaging in some form of regulation, some form of ground rules? And I think that's the question I hope to hear some answers to by the end of this conference.

CHAIRMAN EDLEY: Okay. I want to thank this first panel for a wonderful --

(Applause.)

CHAIRMAN EDLEY: And if the next set of victims could come up so that we could --

MR. FEUER: It will take us about a minute-and-a-half to switch the tent cards and get people seated. But we're going to start in a minute-and-a-half --

CHAIRMAN EDLEY: That's right.

MR. FEUER: -- whether you're back from the men's and ladies rooms or not.

(Whereupon, the foregoing matter went off the record at 11:35 a.m. and went back on the record at 11:36 a.m.)

CHAIRMAN EDLEY: Okay, ladies and gentlemen? Let me just introduce these panelists seriatim, since that might be a little more efficient. What I'd really like to do is, we're going to try to compress this a little bit so that we free up some more time for discussion. We've already heard from Eva Baker; she needs no introduction, unless you have a short-term memory problem.

John Fremer is professor in the Graduate School of Education at UC-Santa Barbara and a member of -- no, I'm reading Richard Duran's. So you don't do the same thing that Richard Duran does, okay.

John Fremer has worked for ETS for 27 years, where he has held professional and administrative positions in the psychometrics, test development, and program development areas; he currently serves as senior development leader in the admissions area. He chaired the Joint Committee on Testing Practices Work Group that developed the Code of Fair Testing Practices in Education. John, we're very grateful to you for joining us.

George Madaus is a professor of Education and Public Policy at Boston College. He's former Director of BC's Center for the Study of Testing, Evaluation, and Educational Policy, and a former Executive Director of the National Commission on Testing and Public Policy. He has served at various times on the Joint Test Standards Committee of AERA, APA, and NCME.

I hope that we're going to have a substantial amount of time after we've heard from these presentations to let everybody in the audience talk about all of the subjects that we've touched upon this morning. But with no further ado, Eva, if you want to get started?

MS. BAKER: I'm going to talk very, very briefly because I believe this group is probably familiar with the whole notion of technical standards that are developed collaboratively by AERA, APA, and NCME.

I saw Diane Schneider passing things out. What are you passing out? Pardon me? Oh, okay. The reason I ask this is that Diane is the staff person who serves the Joint Committee that's working on the revision of the Standards and I thought maybe she had something new that I hadn't seen yet.

All right. So the technical standards -- for those who don't know what that looks like -- are formulated in a book -- relatively short book. There they are. We hope to change the color. The book has some relatively brief, conceptual statements, and then provides sets of standards that are intended to guide practice in the design, development and application of tests in educational and psychological settings, including individual use in clinical settings or in industry and business settings, as well as education settings.

The main purpose I think, of the standards is to provide this guidance in terms of what are reasonable expectations for technical quality and how should people who are trying to make claims about the quality of the measure they're creating, proceed. And as I indicated in my earlier presentation, a good deal of the discussion involves designing validity studies that will assess the quality of the test in terms of the purpose or purposes for which the test is being used.

And the third point very quickly, is that this group consists of people who are now, I would say, in a situation of much more convivial agreement on a number of areas, but there is by no means a consensus on every detail of issues in testing practice.

And so as a general approach we have agreed to try and have the standards represent a consolidation of knowledge that has broad subscription at this point, rather than the standards as something that are pushing ahead and trying to create new and different ways of looking at tests.

And so could we be consolidating practice or are our standards aspirational? And I'd say our standards have taken, in this present revision, quite a conservative tilt. As I said, they cover all the types of testing, and the status of the standards is that the revision -- a second-level revision -- is out for review. Comments are due August 1.

We expect to get about what, 20 pages or so, of comments to deal with. And some time in the fall we will have revisions. Those revisions will be reviewed and we hope at that point we will be able to submit them to our sponsoring organizations for whatever ratification processes there are. We're hoping to have these standards out there in '98. We had hoped '97, but life took a funny twist.

What are the standards about? I won't go into enormous detail but in general they start with what the conception of validity is. What do we mean when we're talking about validity which is at the heart of test quality? And the standards are fairly clear on that.

The standards then provide guidance about expectations on other technical aspects that people have come to expect to be treated in a book such as this, as indicated. I think the innovation for this set of standards is, we have a special section on fairness. Fairness had, in the past, been distributed throughout different applications.

This time we are trying to make a clear statement about fairness, although legitimately, fairness concerns are part of a more overarching validity argument. We felt that we wanted to highlight that because of the importance of fairness for testing practice.

And then we, in the standards, we then moved to particular application areas. There's a cluster of applications about individual or clinical uses of assessments in psychological settings, and this year, the ones that I think are principally -- but there are maybe others that are also relevant to you -- there are chapters and associated standards that deal with educational assessment, policy and program evaluation uses, and test use with second language learners. So I just provide those as examples for you.

The next page is simply, who uses these? From analyses that have been done, really related to the sales of the book and to our own sort of anecdotal evidence, it seems to us that test development firms use these standards a lot; whether they use them in a pre-emptive way or as a way of providing guidance.

You know, there around the room people can tell us, but when I visited various test companies I'm just astonished at how carefully and how well-known various standards are to individuals who are creating commercially distributed tests. This is something that they do pay attention to.

By far, the broadest use is in graduate training. People buy these standards and use them in courses to help prepare professionals, and then there's some evidence that the standards are used in the legal profession to contend on various civil rights and other issues about appropriate test use.

One of our panel members, Diana Pullin, was doing a study related to how many cases the standards have been cited in, and so on. And they're not a huge number but they tend to be quite critical, and her sense is that the existence of the standards in some way creates an environment such that certain things don't come to court because contentions are resolved earlier.

Last of all, this is the one that bugs me the most: the question of compliance or enforcement. And I mean, this is exactly where I'm schizophrenic because in my earlier presentation I was talking about, you know, conversational -- let's all sup together at the same table to create, you know, our view of the world. But in this case I'd really like to stick it to people who don't use these standards.

And where we are right now is that there's voluntary adherence. It's a matter of professional ethics and there are no sanctions at this time, although from time-to-time they are proposed and considered.

The reason that I believe that there are, and probably will be, no sanctions associated with the use of the standards, is that the user community is very, very broad, consisting of large testing companies with presumably great resources, to Mama and Papa test developers for whom large-scale validity studies would be far beyond their marketing and even perhaps, operational capability.

So if we say that the test standards have sanctions associated with them, then they would apply to all the constituencies. And there are strong constituencies, particularly on the psychology side I would say, who would resist that approach.

So that's all I think I need to say at this point. Thank you very much.

MR. FREMER: There are two handouts for my session. One is the Code of Fair Testing Practices in Education. You should have all received one. And to just give you the summary argument -- I don't know that I particularly need to make it -- for the national test, there should be developed a document like this -- possibly incorporated. And so, all the other things I say will kind of relate to that, and hopefully they'll make sense.

Also, my talk, I'm trained as an Educational Psychologist and one of the things I learned -- I can't remember the vocabulary anymore -- but tell people in advance exactly what they're going to learn, and then they'll hear it and then review it later. This is exactly what I'm going to say, and I won't depart from it. To highlight my comments I use cartoons. It's not because I don't take you very seriously. I respect you now and I'll respect you after you've heard --

(Laughter.)

I'm going to talk about ethical conduct and high-stakes testing because that's what will matter about the national test. The times when test results are used for high-stakes are the times when we'll start to worry about ethical conduct or whether it's being engaged in, and whether or not children, schools, classes, States, remain disadvantaged.

Now, when you move into the area of standards, and you're like me and you've been collecting cartoons for 35 years, one of the things you notice is that the cartoons about research and testing often have somewhat of a religious flavor. And I absolutely, again, don't mean to offend anybody; these just seem to be connected.

And here we have -- and a creature is saying, my sermon today is on the ten loose guidelines. He says, I hate these modern translations --

(Laughter.)

And that's one of the things that we're going to have to face. We have standards and guidelines and no one can agree on what they mean and they seem to permit everything. Those are not the ones we need to have positive outcomes for our national tests.

We'll find that most attention is paid when we get into uses like grade promotion -- not that the Federal Government is urging that. You have a 4th Grade test and an 8th Grade test. There have been examples of such tests used for grade promotion. These will be very carefully constructed, high quality tests. Is it likely that someone will want to use them for that purpose? I think it's very likely.

I'm going to cover, or at least touch on, these areas: developing or selecting tests, test administration, security -- main emphasis here -- interpreting scores. And I won't really get a chance in the time limit to cover the other things, but since they're in the Code, since I think they're important, I included them in my notes and I would prefer to look at them later.

You can't pay too much attention to issues like fairness, and designing and following up on the testing program, without this specific concern we have here. I hope that my handing it out will cause you to look in detail at the Code of Fair Testing.

And one of the bits of advice I will give, if anyone chooses to follow through really and make guidelines for the national tests, as someone who worked on this project -- and also point out George Madaus worked on it, Richard Duran -- is when you get finished with it, print it on nice paper. It makes it look good and have a design.

Because if it were just 8-1/2 X 11, white, stapled upper left, people wouldn't take it seriously. That may seem like a small thing, but from someone who's been trying to communicate about testing for a long time, it really isn't.

Here's the areas that we cover. A point that we make in the Code is that it's widely endorsed, and I won't read off all the names that are in my notes, but if you just store this away, every major organization whose members use tests, and every major test publisher, has endorsed this Code. No one has declined to endorse it. So the companies and the organizations that haven't endorsed it, it's because they weren't asked.

And we just got tired, because this is a volunteer effort. We kept going around asking people and they kept saying yes. Eventually, we had reached the companies that test over 95 percent of all the kids that were ever tested in America, and we thought no one else would have the nerve then, not to endorse.

How do you not endorse a testing code that the American Psychological Association, American Educational Research Association, NCME, Psychological Corporation, and so on -- ETS, College Board, ACT -- have endorsed? So assume that it has everyone's support.

Now, what do the Standards say? Again, I won't look at them in detail but I'll look at groups of them. Here's a set on developing a test, and I'll call your attention to the kinds of words that appear: define, accurately represent, explain, describe, provide evidence. What this set of Standards says is, you have to have good communications going out about the test. I think the Department and NCES are trying to do that.

It goes on, in another case when it tells you what the developer should do, it tells you what the user should do. And the very first thing that a user ought to do is to define the purpose. Now, these are -- to cover all tests. In the case of the national tests it's coming from the Federal Government, so they have the job of defining the purpose. We've already heard here that not everyone is convinced that that's been nailed down yet.

Moving on to the development stage, the one I want to highlight here is, we talked about providing sample material, making sure it's appropriate for all the people, and then publicizing skills. One of the hardest jobs there could be is deciding what's an adequate amount of evidence to present that you have appropriate tests. It's something about which you could argue; when you say appropriateness to lots of different groups, the need for a larger body of evidence, I think, is greater.

We move on now, into the area of test preparation. And here what we want to do is avoid the situation where students -- I hope this is a safe topic -- I've gone outside our species --

(Laughter.)

You can see my concern about offending anyone. What we have here is Berryview Kennel and Dog Obedience School, and as you'll see in a minute, these dogs have just had a test and they're talking about it later. And one says to the other, I did well on the spoken commands but that essay question was a tough one.

(Laughter.)

I like that one. I like it for a lot of reasons. One of the reasons I like it is I think it makes, in a very simple way, the point that you should know what you're going to be tested on, and everyone going in should have a proper amount of information.

The issue of test prep was not covered as much when we first made that Code, as it would be if we were doing that now. So I want to go just a little bit beyond the Code into some work that Bill Mehrens, a widely-known, respected measurement person, has done on test preparation.

He's developed a scale, and I'll just give you two ends of it. He says it's always equitable to give general instruction on what the test is measuring and to teach test taking skills. I don't think anyone could argue about that, although issues like a national test get emotional. People would want to argue that anyway, even when it seems clear.

The other end of this continuum though, isn't as universally agreed upon. He said that you should never -- it's never equitable to provide practice on the actual test. I think most measurement people would agree to that. Sometimes people from the larger educational community have a disagreement.

But even within the measurement community, there is sometimes a disagreement about whether you can provide practice on a published, parallel form. And the model here is like a standard achievement test, an Iowa test, and so on. The feeling Bill Mehrens noted is that the forms are so specific and so structurally related that, if you know one form, being tested on the other cannot be a fair situation where someone else hasn't had that experience.

Whether that will be a problem with the national test will depend on the nature of the test. I worked with Ed Roeber, formerly of the Michigan Department of Ed., and with the Chiefs, on developing standards of what people should do to make sure that all students were adequately prepared.

And essentially we say, make sure that the same kind of test is available to everyone, that everyone knows the administrative procedures and they're closely followed, and that you maintain test security. And as was pointed out, with the availability of the Internet, the test security issue is going to be a very serious one.

And it isn't just a technological problem; it's a climate problem, a school climate problem. What the attitude is toward cheating in students. Now, these two people have just been told that there's going to be a test, and they've just been given an instruction by the teacher. Here's their response: what exactly do you mean by, no cheating?

The attitude that it's really probably okay to cheat is unfortunately very prevalent when you do sort of blind studies where people think they're responding anonymously and you ask them if they've ever cheated. And even if you're looking at honor students, or students at institutions that have honor codes, you find out that many of them report that they've actually cheated.

Moving on to interpreting scores, one of the areas that I think has already come up several times and it's going to be a really important one for the national test, is warning about misuse. The spirit of the Code is, it addresses people that want to do right, they just need to be sure that they understand what the right way is.

So the logic of the Code is, tell everyone what are appropriate uses and what are inappropriate uses, and make sure that's out there anyplace where it needs to be. I'm sure George will say -- well, I'm not going to try to take George's time away.

Our committee has tended to take the position that most misuses are out of ignorance rather than malevolence. It's not that people deliberately want to hurt students and misuse the test, but they just don't understand what constitutes appropriate use, particularly for teachers. Their job is helping students -- to give them a little extra help, a little more time and some more explanations.

Sometimes it's absolutely perfect and other times it destroys the ability to make inferences. So when we test users we have the argument that, the developers have to tell the users what's good and what's not good, and the users have to follow that advice.

In the area of interpretation, one of the problems we'll have -- I think it might have been Dick Jaegar that talked about innumeracy. This young man is coming home and he's going to talk about the results of a test that he took today, and he says, "I finally got 100 on a test Dad -- the SAT".

And that's maybe to argue a little bit about what parents want, although I'm flying in the face of research, which is always risky. Here -- but I'm using my friends, the cartoonists, to help. "So how is Ditto doing relative to other kids in the class?" Ditto is their son. "This isn't a competition, Mr. Klagston. We judge each individual student on his or her own abilities and achievements." "That takes all the fun out of it."

I think there's a lot of evidence that people want more than statements of what their children have learned. They want normative statements of some kind -- failure to provide them is quite apparent.

I think in terms of time, that is one of the logical stopping points in my talk and I think I'll choose that as a logical stopping point. I've covered the main areas I said I would cover, though only to my closing which I have to do with -- whatever the Federal Government does, there will be open interpretation of the results.

And back into the religious world, for example. This is Judgment Day, I assume; I assume that's Saint Peter. This poor, unfortunate is learning something about the evidence that's being looked at to make a judgment about, and it says, "You're kidding. You --

(Laughter.)

MR. MADAUS: I wish I had some cartoons. Your program says I am to address the question: are these mechanisms, the Standards and the Codes, enough?

No. I could just sit down. But they are the basis I think, for serious evaluation of any kind of testing program, and what I am going to talk about is another mechanism that builds on the use of the Codes and the Standards.

It comes out of a report done for the Carnegie Corporation and the Ford Foundation in 1993, entitled "The Proposal for a Monitoring Body for Tests Used in Public Policy". It's a very lengthy report and I'm just going to hit the highlights for you, because I think it has applicability for the Voluntary National Test.

It builds on the National Commission on Testing and Public Policy's recommendations that the enterprise of testing must be subjected to greater public accountability. Further, the National Commission recommended the development of additional, institutional means to examine the quality of tests and the assessment instruments, and to provide oversight for test use. One of the Commissioners who signed onto that recommendation was then the Governor of Arkansas.

In looking at how you would institutionalize something like that, the National Commission affirms society's interest in certifying the competence of individuals and holding institutions accountable and the important role of assessment in our national life. And it isn't the intent of the body that I'm going to talk about, to curtail or inhibit testing. And it's not that this body should wait for the perfect test; there's no such thing. And it doesn't mean that if you find that there are problems with a test once you start to use it, you have to scrap the program.

But evaluation and monitoring of testing programs means that the public who pays for such programs, and those who use and are directly affected by such tests, should have assurances that the programs are technically sound, the benefits outweigh the harms, for all groups in our society, and that negative side effects are minimized and misuses curtailed as far as possible.

When we started this project we thought that the easiest thing to do would be to have a Board -- a national Board -- and that that Board's job, the staff's job, would be to operationalize the test Standards and the Code.

That is, if you look at the Standards and the Code, the devil is in the detail, and you can get experts -- I've been involved in this -- on the standard say, this standard has been met, and another expert can come up and say no, this Standard hasn't been met. So we felt that gee, if you could get an evidence trail attached to each standard, you could then get a better idea whether the standard had been met in a particular instance.

So we were looking at first, for a specification of a generic, operational audit trail for each standard, universally applicable across context -- isn't that stupid? -- and made less -- and once we got into this we scrapped the idea, because we think testing programs evolve in different ways.

And this Voluntary National Test I think, is an example. How it gets used in Massachusetts will not be the same as how it gets used in Missouri. So while the Standards and the Code form the basis for professional practice, we do not think that we can operationalize them in unique ways that cover all avenues.

What we did then is -- in trying to come up with a design of a body is we looked at 16 different agencies who did -- regulatory, monitoring, protection, licensing, accrediting agencies -- and we tried to find out from a really, in-depth look at them, what their strengths and weaknesses are. So we picked 16.

I'll just give you, for a flavor of what they were: The Federal Deposit Insurance Corporation, The Consumer's Union, Good Housekeeping, National Collegiate Athletic Association, The Supreme Judicial Court of Massachusetts, Bar Admissions and Lawyer Discipline Body. That's just a few of the 16 that we looked at in-depth. And again, if you want details of this, the full report is available.

In looking at them, we looked at how they were controlled, what the object of regulation was, the background and rationale for each organization -- how they got started -- its organization, its funding, its staffing, its operations -- including the trigger for action -- the criteria or evidence used, results, appeals processes, and miscellaneous. And then we did an evaluation and critique of each of these agencies.

By the way -- and I just don't have time to go into this -- but this whole call for regulating tests or monitoring tests or evaluating tests -- whatever word you want to use -- has a long history going back to the 1920s that I won't get into. But at the Federal level it surfaced in the Hobbes Report of 1968 and then died.

And I also will skip over the triggers for setting up regulatory, evaluation, monitoring bodies. Basically it comes down to, you need two things. You need high hazard and high outrage before these kinds of bodies get put into effect. And I would argue that testing for some populations is very high hazard in high-stake situations, but generally there is little generalized outrage over results.

Now, let me just go on to what this Board might look like. We are in the process of investigating whether or not the National Commission on Testing and Public Policy could be reconstituted, because we think that this oversight body, monitoring body, evaluating body, should not be a Federal entity. It should be outside of the Federal realm.

And we feel that it needs to be a self-perpetuating body with a very distinguished Board of Trustees, Board of Directors, and made up of various elements in our society that are interested in testing issues; from corporate and business leaders, to politicians, and academics -- and not just academics in testing but academics in many areas -- and certainly representatives of civil rights organizations and other groups. And again, we have an outline of what this Board might look like.

We think that the Board must then, have its own permanent staff; however, given the evolving nature of the issues that arise in testing programs, the Board will have to remain flexible and adaptable, able to react in an ad hoc manner to the vast array of issues which will inevitably arise.

And we've really only touched the tip of the iceberg on the kinds of issues that are going to arise. Richard's area alone, is going to surface many, many different areas. So therefore, staffing patterns will need to change to address these emerging issues.

To get the thing going we propose the Board with a Director or co-Directors and a professional staff of six to ten people with the following skills: psychometric, assessment, and test development skills; qualitative, naturalistic evaluation skills; quantitative, pre-ordinate, quasi-experimental evaluation skills; policy analysis skills; negotiation and arbitration skills.

In fact, as we did this study more and more, we came to the conclusion that the way to approach this -- and I'll come back to this a little bit later -- is through negotiation and arbitration. And finally, administration and political skills.

We saw the Board as monitoring testing programs to help ensure that they are conceptually sound and satisfy all relevant, technical and ethical standards; that they improve the balance of benefits over harms associated with the program; that they monitor ethical and equitable use of tests and call attention to misuse; and forestall disputes and litigation concerning testing through early intervention and non-adversarial participatory procedures.

We then went on to list the principles that would guide the Board's work. There were five fundamental principles. Beneficence: all high-stakes tests should be useful and beneficial and demonstrably more appropriate for their particular uses than available alternatives. Evaluation in context: all high-stakes tests, testing programs, and test scores themselves must be examined and evaluated in the particular context of their actual use.

Precedence of interest of affected parties: in the design, development, implementation, use, and interpretation of high-stakes tests, the focus of interest and inquiry should always be the interest and rights of those individuals and institutions most affected by tests.

By the way, I keep using high-stakes tests because to me, it's taken for granted this Voluntary National Test is going to be a high-stakes test.

Open critical inquiry: all high-stakes testing programs and practices should be subject to broad, open, critical inquiry. Flexible application of recognized Standards: applicable technical -- things like the Code, ethical Standards as the APA has -- and principles should be applied to high-stakes tests and testing programs in a consistent, yet flexible manner which is adapted to the exigencies of the particular context.

In addition to the permanent staff, the Board also ought to have the capability of assembling two sorts of project teams or committees. And what I have in mind here is the sort of standing and project committees used by the National Academy of Sciences. Standing committees would be ones organized and last over a fairly long period of time, say more than two years, and would be responsible for monitoring testing programs or issues of particular importance, over an extended period.

The big stumbling block in all this, for the last four years since this idea has floated around, is how do you fund something like this? There's absolutely no question in my mind that you can get initial funding for three to maybe five years from Foundations to get this set up. But Foundations don't like to give birth to babies that then they have to raise for the rest of their lives.

So you would have to seek funding, and I think ultimately, you would have to get some Federal funding, just as the National Board for Professional Teaching Standards has had to get funding.

Well, I'm basically out of time, but I then go on to talk about the triggering mechanism that the Board would use, the criteria and evidence that it would use. Its operating strategies would be based on models of negotiated rulemaking procedures of the government, and on "getting to yes" procedures in the business sector.

Then one of the key operating strategies of such a Board would be diversity of interest. Simultaneously with offering technical competence in psychometrics and related fields, this Board would also have to embrace a diversity of interests and perspectives in its outlook, policy formation and its work.

The widespread use of tests as instruments of public policy inherently implicates other disciplines and perspectives, such as political theory, sociology, administrative and bureaucratic goals and concerns, economics, public accountability, and many others. In carrying out its mission to foster appropriate and responsible uses of tests, the Board must consider these perspectives and the broad issues of educational and social policy.

Just to close, quickly, I want to come back to something Eva said in her first incarnation up here. I've always believed that we can learn a lot from the history of technology, because testing is technology. It's a well-developed technology with a technical community -- many of whom are in this room today -- and technical underpinnings that are arcane to most people.

If you think about some of the underpinnings of our profession, one of the analogous things to think about is some of the hidden algorithms that are used in IRS audits, bank credit checks. There are things that go on behind scenes that people really don't know much about and take for granted, and testing is not unlike that.

Testing is a complex, technological system with its own infrastructure, akin to transportation, power, communications, computing, and manufacturing systems. And policy decisions about testing have technical implications, and technical implications and technical decisions have policy ramifications.

And I just close with one other metaphor here. The medical profession would not swallow, if faced with policy decisions to introduce a new, major, untried medical technology to millions of children, particularly a treatment that would be given to healthy children as well as those who are ill. The medical profession instead, would ask about safety, efficacy, quality, and social and economic effects of such treatment.

And I think that this is something that a Board, a national Board overseeing something like this, could do very, very well.

CHAIRMAN EDLEY: Thank you, George. Questions and comments? We have a good block of time available to us. Dick Elmore.

MR. ELMORE: Let me begin by saying I'm speaking for myself and not for the Board on Testing and Assessment. This discussion has to me, an astonishing and slightly creepy air of unreality about it. And let me try to make my concerns concrete with a couple of examples.

One is the famous example of John Silber proposing to use the GED as a high school exit exam, in which we just barely -- just barely -- avoided having every high school senior in the State of Massachusetts take the exam this spring.

The other is an article which appeared in The New York Times this week which has Rudy Crew, the Chancellor of the public schools in the city of New York, declaring that his goal is to have every 3rd Grader reading at grade level.

Reported, absolutely deadpan, with a completely straight face by a reporter for the newspaper of record, without even a nod in the direction of the possibility that a norm-referenced test might not be able to produce such a result.

Now, my concern is the following. As professionals, we gin up these ideas about what the appropriate use of tests is. We even, as George has suggested, gin up utopian designs for institutions that will enforce these ideas.

These are largely, if not totally, disconnected from the reality of the political incentives that are driving the use of tests and don't take account at all, of what the actual conditions are under which these things will be used, much less the issue that Bill raised earlier about what -- so you have Standards, so you have a body, so what's the regulatory or legal authority of such a body, and who cares?

I personally, would like to see the meeting between some representative of some Board and John Silber, the purpose of which would be to discuss the appropriate use of a test. I'd like to see that meeting. I think the amount of leverage any external authority could exercise over such a person would be, to say the least, minimal.

John said in his remarks that he thinks that most misuses are out of ignorance rather than malevolence. What strikes me about that formulation is that it could only come from a discipline which emphasizes individuals and not collectivities. What's missing from that formulation is the whole idea of incentives -- institutional, individual, and collective -- in which it doesn't really matter whether it's ignorance or malevolence; it's what works for the political purpose of a given elected official or institution at a given moment.

And that's what's missing from this discussion. It's great that we're at the level of talking about what the principles are that say what acceptable uses are or are not, but the world in which these things are going to operate doesn't have anything to do with that. It has to do with politicians making hay while the sun shines.

So there's an air of disconnectedness between our best professional ideas about what good uses of tests are, and the world out there in which the tests are being used, and I don't get the connection. I don't see how we have made any progress by simply specifying what the standards are and inventing institutions which are anchored by skyhooks, right, that have no basis in the actual institutions and individuals who are making policy decisions about the uses of these tests and the incentives that drive those things.

I don't think a standards discussion has any weight in the practical world of policymaking or practice in education, unless it's anchored somehow in the reality of those institutions.

CHAIRMAN EDLEY: Let me try to respond a little bit to Dick's concerns, which I obviously -- maybe not obviously, but I agree with 100 percent. Here's the theory of this session from my perspective.

One reason we wanted to have it is because there is this track record of professional norms with respect to appropriate design and use of tests, followed by misuses -- abuses of various sorts. The theory was, here we are embarked on a bold, new venture directed by the Federal Government.

And the question on the table it seems to me is, with an inventory of the potential risks, is it possible to design mechanisms that will respond to the anxiety that Dick expressed? Are there things that could be built-in to the licensing? Are there things that could be built-in to the administrative arrangements between the Department of Education and LEAs or State agencies? That could go a long way in closing what has historically been this gulf between the theoretical statement of norms and the actual practice in the field.

In inviting people for this session -- for the two days -- and in structuring it the way we have, the goal was really to some extent, to really try to break some new territory here and bring experts on testing together with some people who might have some ideas about institutional design, concerns of enforcement, and people who are education policy mavens, to see if there could be some synergy in actually trying to close the gap that Dick Elmore's comments point to.

In a way I think, where the rubber meets the road is happily, in Dick's hands in moderating the panel this afternoon.

MR. ELMORE: Nicely done.

(Laughter.)

CHAIRMAN EDLEY: Thank you, thank you. And the reason we wanted Dick to do it was because he has this same anxiety. But just to forecast it a little bit, the goal this afternoon is really to ask, when we look at examples in other contexts of a complicated, institutional structure with inter-governmental dimensions, are there any useful models for regulatory arrangements, professional arrangements, etc., that enforce norms and that create accountability to prevent abuses of various sorts?

So we've had an inventory this morning of what the risks are. We're going to look this afternoon to see whether there are examples from other fields of how risks can be addressed, alleviated. And then tomorrow, ideally, the working group will try to get very concrete, indeed, in order to give advice to the Education Department. Period. Paragraph.

Last, it obviously is not the function, the role, the talent of BOTA to engage in the lobbying activity, to get the Education Department to adopt whatever pearls of wisdom, whatever great ideas might result from Dick's discussion this afternoon and the panel tomorrow.

But I hope that simply by having the meeting and perhaps by producing a small report out of this, there will be a broader community that will be trying to work with the Education Department to assist them with the challenges that Dick has identified.

I don't know, maybe I'm on drugs. But that was the idea behind the thing. Other -- Eva?

MS. BAKER: I just would like to say that my intention in the first session, Dick, was to talk very much about the tension between seeing purposes identified by some entity and then an enforcement frame of mind to deal with the use connected to them, as opposed to looking at the practical world, seeing what the incentives are for what we would, as professionals, imagine to be misuse, and trying to find a way to vitiate or minimize the incentives for uses of that sort.

And I think that the important idea that I hope that BOTA will think about and that the Department will think about is, again, that these purposes are going to expand and grow, sort of amoeba-like in ways that we, you know, that we really don't understand and we can't control at this point. And the best that we can probably do is not think about regulation and enforcement as much as clarification, education, and the kinds of explanations of risks and mitigation for particular kinds of things that we might imagine to occur.

MR. ELMORE: -- to read through John Silber's -- points of having Standards?

MS. BAKER: I'm not talking about the Standards at all at this moment. I'm talking about simply, the approach that when we think about some innovation like this particular proposed, national test, that we try and think about it in a way that is not trying to constrain the way it's used and define people as good guys and bad guys, but to try and find ways of optimizing probable uses and explaining to people what reasonable things they can say and what reasonable things they may be able to do, and to broaden that discourse.

You know, as far as -- you know, I mean, there's all kinds of people who say crazy things about tests all the time and I think the public, probably -- I don't think The New York Times did it with a straight face; probably the reporter thought that was just an appropriate goal. I mean, I'm sure.

MR. ELMORE: I'm sure he did, actually.

CHAIRMAN EDLEY: If I can interject. There are three related issues. One is, what ought the standards be? Secondly, is it possible to get those standards adopted by Gary and company? And the third is, can we imagine an enforcement mechanism that would possibly be effective? And we have a bunch of lawyers here so I'm sure we -- at least the idea behind inviting them was so that we would get into these ideas of enforceability.

Let me just also say that if people will -- as they're giving comments, not only give their name, but there are three things in particular I suggest we try to sharpen. One is, there have been a lot of statements of folks believing that it is inevitable that the test will be misused for high-stakes purposes. I just want to test whether there really is a consensus on that.

And secondly, there are many people saying they think it's inevitable that there will be abuses with respect to the reporting and interpretation. And the third is, I think several people argued that they think that it's almost inevitable that there will be abuses with respect to inclusion and accommodations -- LEP, etc.

So if people would indulge me by commenting on whether or not those inevitabilities are broadly-shared judgments. Michael, and then -- Michael Feuer.

MR. FEUER: I want to, if I can -- and I also don't speak on behalf of the Board, although I'm not really sure what that means given that I'm Director of the Board. I want to go back to Dick's question and Chris' answer and underscore what I think the combination of that question and answer signify for the purpose and my expectations of these two days.

It's not surprising that Dick and Chris and I share some of these kinds of concerns because we have all three spent some time in and about the policy analysis movement; which is, as some of you may know, policy analysis has become a bona fide discipline for graduate study and in some places, even undergraduate study.

So it's not surprising that when we started looking at some of the issues surrounding, not just this Voluntary National Test, but educational testing more broadly, what we see is a set of risks associated with either intendedly unethical behavior -- although I don't think those are as interesting -- or ethical behavior with unintendedly unethical or undesirable consequences.

Coupled with a general sense that most people in this country -- professional educators, students, parents, the general public -- would much prefer good test use over lousy test use. And I think that's a fair statement for the test publishers, for the test developers, for the professional testing community. They would all much prefer good test use.

After all, John Fremer who's been spending most of his career working on this Code, also works for one of the largest test companies in the world, and the fact that he has devoted and they have devoted time to this is an indication that people generally prefer good test use.

Third, the sense that without some form of what the political scientist, economist Mancur Olson once called mutually-acceptable coercion, the good intentions of all of these actors don't necessarily add up to an outcome that society views to be good. Now, for people who have sort of lived in the policy analysis movement for a while, this sounds awfully familiar.

The idea that exhortation as a policy tool is perhaps necessary but not sufficient -- which is what George has been telling us -- is also not new to students of policy analysis. So all of that led us to think that what we've got here is something that sounds kind of familiar in some ways, and one needs to suspend a little bit of disbelief in order to even entertain any of these kinds of analogies.

It is safe to say I would guess, that most Americans prefer clean air to dirty air. They always have, undoubtedly, although there's probably a few crazies out there who actually like breathing fumes. But let's just assume for a moment that we've got most people preferring clean air. Clean air did not come to this country just on the basis of people's good intentions. It required law; it required other mechanisms of coercion.

In other words, we got together as a democracy and decided that it wasn't enough that we wanted something, we had to figure out institutions that could be designed to get us there.

And the irony of course here is, that we wanted to design institutions that would get us to where we want to be, which is when you think about it, why political economy and why the work of people like Mancur Olson and others has been so profoundly important, but also so profoundly lacking, I think, in recent discussions -- not just about testing, but about the general state of affairs.

Back to this conference. When we looked at those possibilities for analogy and possibility -- and now what's most important here is, our belief that the Federal Government, the Department of Education in particular, Mike Smith and Gary Philips in particular -- do indeed, have the best of intentions for the purposes and uses of this testing program. And that they have come to the National Academy of Sciences seeking guidance on the design of mechanisms to ensure that some of those risks -- all of those risks to the extent possible -- are mitigated, and that some of the benefits can be realized.

That's a fundamental, necessary condition for the whole, continuation of the discussion. People who don't believe that they have that in mind and that they are not interested in designing good institutions, you're welcome to stay but you're not going to get much out of this. Our, sort of assumption going into this, our prior, for you Bayesians, is that they do have good intentions in mind here and that they do want to design good mechanisms, and that's what brings us to the business of licensing.

They have developed a model that they believe can, at least for the purposes of starting a discussion, reduce some of the downside risks associated with this program, and elevate some of the upside benefits. And the only reason that we have rehearsed -- and I know for many of you this has been a rehearsal because you've heard these stories of how tests can be used and misused -- the only reason we've done that is to lay a firm foundation based on prior, empirical, historical evidence of the need to think carefully about institutions.

And this latest presentation about what mechanisms exist which are primarily hortatory, sets the stage for a discussion that we will begin this afternoon about what do you do that's better. In order to do that, we thought it would be useful -- and I'm very eager to conclude this little sermon so you can all go to lunch, but also so that you can be thinking about this as we prepare for the next session which Dick will moderate and through which we will hear about other examples in the management of inter-governmental relations and in the development of policies that solve some of these fundamental dilemmas of individual choice, and ultimately, collective responsibility.

CHAIRMAN EDLEY: I hate to say this, but I think we probably ought to stop. Now, here's what I would recommend. Rich and Bob and Michael and I will try particularly to make ourselves available in the next several minutes, to collect comments and ideas from people since we didn't have enough time for audience discussion.

But I think if you particularly have a point you want to make sure gets into the discussion -- a sense about risks, a sense about the relative importance of risks, etc. -- please try to get to one of the four of us and we'll collate the feedback that you give us and try to help provide a little bit more structure for the discussion.

(Whereupon, at 12:43 p.m. a brief luncheon recess was taken.)

A-F-T-E-R-N-O-O-N S-E-S-S-I-O-N

1:48 p.m.

CHAIRMAN EDLEY: Next at bat for a repeat engagement, we have Gary Philips. Now again, the theory here is that Gary, in addition to getting a chance to clarify the widespread and deep misunderstanding that was evident this morning in your concerns about the plan, is going to talk about the licensing strategy that the Department has.

In essential respects, the theory behind this is that, apart from rebuttal time, it would give everyone a sense of what the department is contemplating with respect to licensing so that Dick Elmore and his group afterwards, will in a sense have something against which to compare some of the other kinds of experiences -- inter-governmental and program experiences.

And last, by way of throat-clearing, again please be trying to accumulate for yourself a sense of what the significant, most important risks or warning flags seem to be, based upon what you've heard this morning. And now beginning to ask, to what extent does the licensing scheme contemplated by the Department satisfy you that those risks have been anticipated and suitably addressed?

With that Gary, if you will -- we welcome you back to the microphone.

MR. PHILIPS: Well, what I would like to do is, I know at the end of the day -- either today or tomorrow -- we'll have lots of time for discussion. But before I get into the licensing I want you to know that I listened carefully to what was said this morning and I understand the issues. I think there are a couple of categories of things here which I want to mention, and as I said, later I'm sure we'll get into these in more depth.

I know there's a set of policy decisions that you and maybe others would like the Department to make up-front. And I mean, the decisions have to be made eventually, particularly decisions about reporting, inclusion, that sort of thing -- and a few others.

It's not as if we haven't thought about this. We are working on these decisions and some of them will be made. But some of them need to be made I also think, in a context of a broader discussion. Once the RFP is awarded we will have opportunities to discuss these things in more depth, look at the literature, look at data, have a variety of people have their views expressed, and some of those decisions will be made then.

I do want to assure you that in terms of the quality control in this program, we will have the quality control that I think you'll be proud of, and that's really what we're working towards here. And as you hear about the licensing and then after this meeting and future meetings, I think you'll hear more and more about that.

So anyway, I do look forward to having discussions about the issues that you raised this morning. Let me talk about the licensing of the Voluntary National Tests.

The general strategy here is again, what we wanted to do was to, we want to develop the test, make sure that it has the quality control that we need, that we want, stand behind its technical integrity, but to make this something that belongs to the districts and the States.

Now, I know this is a balancing act; you need to trade off some things to get other things. And what we would like to do -- and so what we came up with was a strategy of doing that through this licensing mechanism.

The general way this would work -- and I'll get into the specifics in just a moment -- is if you're a district or a State and you want to administer the test, we will be awarding a contract to a licensing management group. We don't know who that group will be yet. It could be a company, it could be an association -- who knows? It could be a consortium of people.

The purpose of this licensing contract would be to issue licenses to companies and to others that want to administer, score, and report on the test. In order to be licensed, in order to get the license, you have to demonstrate through the standards that will be developed in this RFP, that you will -- that you have the capacity and the quality control to maintain test security, to administer the test properly, score the test properly, and report on it properly, following the guidelines that we will have available.

And there will be guidelines on test utilization, as I said; there will be guidelines on reporting; there will be rules about inclusion and accommodation; and there will be rules about administration and test security and that sort of thing. All of those have to be developed through the various contracts.

Now, in addition to that, this company -- or consortia or whatever it turns out to be that would win that award -- their responsibility is, in addition to administering the licensing, they're also responsible for monitoring a random set of the administration sites -- classrooms where the test is administered -- and the scoring sites. And this would be random visits, it would be somewhat like NAEP except probably not the same percentage of schools would be monitored.

And which school that's going to administer the test, they would not know until the day of administration that they're going to be monitored on it, so there's no way of knowing if you're going to be monitored so you have to be on your best behavior.

The purpose of the monitoring would be to provide information to the government and the public that things are going well. It really is to monitor the system. If there are systemic problems then those would be fixed in future administrations. For example, it might be that the administration is going well but the scoring needs some work, something like that.

And so the goal here is not to monitor every single school; that will not happen. The goal here is to monitor the administration process, the scoring process, and the reporting process.

Again, you should have another set of overheads in your packet, so rather than using the machine I'll just speak from the hard copy. It looks like this. Now, this is really why we wanted to have this meeting. Because, I know there are lots of reasons why you might like to have had this meeting, but this is the reason that we wanted to have the meeting.

Because the licensing concept is a new one and so we want to get as much discussion and views and input from you and others as we can get. We did have one public meeting; this is another public meeting. It's a little bit awkward for us in the process of writing an RFP, to have private meetings with people that might be bidding on -- well, it's not a little awkward, it's illegal. We can't have private meetings with people that might bid on this contract, for example.

So we need to have meetings like this that are publicly -- that are in the public, where people that might be bidding on the contract could come and listen, and if you can't make it they can see what happened in that meeting with the transcripts.

The licensing contract is one of five awards, contracts, in the Voluntary National Test. There was one for the item and test specifications that was awarded to MPR and the Council of Chief State School Officers. There will be one or two in September on the development: a Reading development and a Math development.

The evaluation as I said, will be another one, and we're trying to get that done quickly. Then there will be the one on linking; that's linking the national test to NAEP and to TIMSS. And finally, the one we're talking about now which is licensing. So the Voluntary National Testing program will be carried out, implemented, through these five awards.

The licensing contractor will develop standards and implement those standards, for printing, distribution, retrieval, scoring, and reporting. So what this means is this contractor will use for example, the guidelines that will be developed on reporting. And in addition to those guidelines they will have certain Standards. Those Standards have to be -- I don't know what those are today because they have to be proposed as part of the RFP.

And those Standards will be things like, if you're a company there will be a certain way that you can meet -- you can demonstrate that you can do the scoring. Like for example, you have to maybe have a certain number of years of experience at scoring tests, things like that.

That might be a Standard. I don't know what the Standards will ultimately be. But part of the responsibility of this licensing contractor is to develop and implement those Standards.

The licensing contractor will develop and implement guidelines for administration and training. There will be possibly, many districts and States that will administer the test but will not be able to, let's say, train the test administrators. And so they would go to a contractor who would do that training for them. Or it might be that there may be the -- there may be other aspects of the administration that would need to be handled through a contractual support.

Test security will be a major component of the administration, the printing and the distribution. So when a company is licensed, let's say, to score the test or to administer the test or to do the training or to print the test, they have to demonstrate that they can maintain absolute security.

As was mentioned this morning, I lose a lot of sleep at night worrying about the security of this test because I do know there will be thousands of computer hackers out there trying to figure out a way of breaking the security and getting this on the Internet and having a big laugh.

So the contract that we'll be awarding in September will be -- part of that will be to propose security procedures that can be used throughout the entire process. One way that I've thought about it -- again, going into that is -- we may very well have a test that's ready for use in 1999, but always maintain a backup test that's kept under the most strict of conditions in case there is a breach of security. Or we might need to have more than one backup test, I don't know. And there may be other ways of doing this as well; that has to be proposed as part of the RFP.

As I said, the licensing contractor will monitor a random sample of test administrations and they will also monitor the organizations that are doing the scoring and the printing. This is a way of assuring the government and the public that there's a level playing field, that everybody is -- that the test is being administered properly, and that the scoring is being done properly.

The licensing contractor will propose a reimbursement procedure in 1999. Remember, the plan is that we will reimburse the companies that are responsible for the training and the printing and the scoring and reporting, in 1999. And so that, we would ask the licensing contractor to propose a reimbursement procedure. That of course, they would have to work with us on that. This is also something new that we need to work through many details about, so this is part of the licensing contract.

Each license will be given to an organization or set of organizations who collectively can carry out the four functions I mentioned earlier. In other words, we're not going to give a license to a scoring company, and then another one to a printing company and, another one to do reporting.

It will be given to either a single company or groups of companies that can collectively carry out all four functions, otherwise we have too many moving parts and this is a way of making sure that we spread out the work, but at the same time reining in all the different moving parts that we can have if we had separate licenses that would go to separate companies that did separate things.

So for example, if you were a large company that could do all four, you might get a license. Or it might be that you might go in with several companies or other groups, and then the four of you or the three of you would have a -- would get a license. There usually would be like a lead organization or company with subcontractors, for example.

So if we look at the next page, this is sort of a hypothetical management scheme here. There are sort of three scenarios here, and there might be others.

For example, a single company might go to the management contractor and get a license and that company might take care of States, LEAs, or a consortia of private schools. So if you are a consortium of private schools -- it might be 15, 20, 30 schools, whatever it is -- you might get together, go to the company and ask that this test be administered in your schools. And this company then, would be responsible for the training and the scoring and the reporting and that sort of thing.

And you would enter into an agreement with that company. Or it could be a State would go to the company, or an LEA or a set of LEAs. Another option might be that a license would be given to three companies: B, C, and D. Or it could be more than three. At which point again, school districts could go to them, States, private schools, sets of private schools could go to them.

Finally, a State itself, or it might be a district, could get a license. So for example, if you're a large school district with years of experience at testing and you have the internal capacity to score the test and report the test, things like that, you might get a license. I think that might be an exception but there may be, you know, a number of those for which that could occur.

Okay, that's the general strategy. The timelines for the awards. The statement of work was drafted in May. It will be on the Web site this month, on the street in June, and we're looking for an award in November.

After the meeting today we will, late next week, we'll be having internal meetings in the Department and either late next week or maybe the week after that, after those meetings -- assuming we don't have major conflicts; sometimes we do -- we will put a draft of the RFP on the Web for public comment for two weeks -- so that's another opportunity for you and others to comment on it.

By the way, we did this with the development contract and it was an excellent strategy. What we found is that -- I mean, we got some of the best comments. I've been with the Federal Government for 11, 12 years. We've never gotten this kind of feedback that we got on the Web.

It was very thoughtful, we got a lot of it, it was lots of different perspectives, and we used it -- many of the people that are here, sitting around the room today, we used it in lots of meetings within the Department -- day-long, week-long meetings working through those comments. And it did, I think, make the RFP a much better product. And so we're looking forward to the same -- hoping we'll get the same kind of thing with the licensing contract.

After the two weeks for public comment, it's taken off the Web, it's revised, and then finally put out on the street for bidding. That's the general plan.

Okay, that in a nutshell is what the plan is for licensing. What we're hoping is that you can give us feedback today, tell us what's good or bad about this idea. If you have alternative models, we'd like to hear it. And if there are issues that we need to know about, we've overlooked, we'd like to hear those. And so, I'm all ears.

MR. FEUER: Chris is out so I'll just moderate. Yes?

MS. AUCHTER: Joan Auchter. Gary, being from a test development company that sends out tests to 3200 centers, one of the places that you're most vulnerable is when the product is at the printer and at the distribution site, because that's prior to administration. So I'd strongly encourage you to try to centralize those two functions for two reasons.

It's the most costly part. If you're doing the large printing in one process in one place, you're going to get a much better price of doing it. Secondly, I think there's a link that you've missed. Somebody from test development -- or you're going to have to assume the role of monitoring the printing process to make sure that the test is printed the way the developers developed it. So there's a real careful link that has to come in there.

But the printing -- from the development to the printing, there's a real close monitoring quality control, and from the printing to distribution there's a real close quality control. Because what gets delivered to the distribution site is not always what you thought you were getting.

MR. PHILIPS: Good point.

MS. AUCHTER: Because there's a lot of inter-quality control and I think to separate those out and to have those at many sites, you've got a lot of security issues there that you're going to need to think about.

MR. PHILIPS: Good point. What do you think about -- in terms of having a central printing place -- what do you think about GPO being a central printing place?

MS. AUCHTER: GPO?

MR. PHILIPS: Government Printing Office.

MS. AUCHTER: It could be, but they're going to learn a lot about printing they never knew before. It's a technical -- it's a real, high-quality print and they're going to have someone over their shoulders telling them what to do all the time. Printing's easy now -- you can go from disk, you can go online -- I mean, there are a lot of easy ways to make it happen and there are people out there who do it real well and can do it -- they're very inexpensive, but if it's already in their budget.

MR. FEUER: Any more -- yes, Bob?

MR. LINN: At the risk of sounding like -- when I looked at what the licensing contractor would be responsible for in terms of rules or Standards or sharper words, they are all in what I would call the category of the means support the end. We talked about something at such a much looser level, guidelines could use, and my question is, has the Department considered whether or not the licenses might also involve monitoring the use and seeing if it's consistent with --.

MR. PHILIPS: That's not currently in the plan, but I understand the points. I know there is a -- one of the messages I've received loud and clear here is that you would like the government to enter into an enforcement sort of monitoring role in all this, and I hear that loud and clear. I think, you know, that's a policy decision that we have to wrestle with. I'll put that down on the list of things to think about for the licensing.

MR. FEUER: Sharon.

MS. LEWIS: Will we see a common report format, and will the reporting be done in languages other than English?

MR. PHILIPS: What we have said to date is that the reporting of the tests is a local option, limited by broad guidelines which will be developed. And when I think about what those broad guidelines will be, in my head I think of the Code of Fair Testing Practice and the Joint Technical Standards.

And also, since we have said that this project will be operated in a way that it is consistent with the Code of Fair Testing Practice and the Joint Technical Standards, you know that at least that's what's going to be in the guideline -- which I think is a pretty good start.

I mean, I personally think the Standards and the Code are really excellent documents. John Fremer was saying, many companies have committed themselves and the Federal Government has also committed itself to it. We've said repeatedly in NAEP and in other projects that we are committed to both of those documents and we support them.

So I hope that answers your question.

MS. LEWIS: It didn't.

MR. PHILIPS: Didn't? Okay.

MS. LEWIS: No, it didn't.

CHAIRMAN EDLEY: Well, why not?

MS. LEWIS: Well, the question was, will there be a common report format? Will Michigan and Ohio and Illinois -- will there be one format that people will use if there's going to be a district report or an individual -- is there one report?

And the second question was, will it be languages other than English?

MR. PHILIPS: Okay. The current plan is that there would not be one report. I know you don't want to hear that, but the current plan is there would not be one report. It's a local option and the -- limited within those broad guidelines.

Now, whether it would be in -- the report would be in English or other languages, I think, you know, that still is a possibility. I know Ray Cortines would like very much to have the reports in various languages and we've been talking about that. Whether we make that a requirement, again, I don't know today, if that will be a requirement. That is a bit contrary to the idea of having local options in the reporting.

CHAIRMAN EDLEY: Gary, when you said, limited by the guidelines --

MR. PHILIPS: Well, what I mean is --

CHAIRMAN EDLEY: -- sense of being committed --

MR. PHILIPS: No, what I mean is that, when you look at the Code of Fair Testing Practice or the Standards, it tells you things like, you should not use a test -- high-stakes test -- for purposes for which it has not been validated, for example.

CHAIRMAN EDLEY: Right, but you're not going to keep them from doing that anyway, correct?

MR. PHILIPS: The guidelines on test utilization -- see, we have not made a prior policy decision about that. But my assumption is that when we do have these guidelines for test utilization, we are going to be consistent with the Code of Fair Testing Practice and the Standards.

And what that says -- it would say something like this: If you're a district and you want to use this test for high-stakes purposes, you need to demonstrate that that test is valid for that purpose. I mean, that's what the Code says and that's what the Standard is saying.

I'm in a little awkward position because we don't have those guidelines in place yet, and we have this policy -- we haven't made, you know, strong policy decisions. And the reasons we haven't again, is because there is a kind of a, there's a balancing act between our wanting to make this a test that's useful and used by lots of people, and on the other hand, be sensitive to the role that the Federal Government has in local education and in regulatory activities.

So, you know, it's a -- you know, this came up several times today and that's absolutely correct. That's why you don't have -- from me, you're not hearing strong decisions up-front about policy decisions that you might like us to make. And I want to encourage you to keep the pressure on because that helps get those decisions made.

MR. SHANNON: In reference to the fact that it will be a local decision on reporting, does that mean the State as well as local, or what is the --

MR. PHILIPS: No, I think what -- let me clarify that. This also gets back to the concept of voluntary. I think the way we're envisioning this is, when a jurisdiction like a State decides that it wants to participate, that's a voluntary decision.

Once the State decides it participates, then within that State that State then makes the decision as to which students take this test. But which students take this test -- once they've bought into taking the test, they must buy-in to the inclusion criteria, the accommodations, the guidelines for test utilization, the guidelines for reporting -- all the things that goes with educational jurisdiction.

So you don't just get to use the test for any purpose. You know, you buy-in to this set of things that goes along with it. Now, if you don't want to use the test, if you as a State or a district decide you're not interested, then of course you're not interested. You're not being forced to take this; it's not a requirement.

CHAIRMAN EDLEY: I'm sorry, Gary, I'm a lawyer so I'm having trouble here. You said, if they buy-in, if they make the decision to buy-in, then they must agree to the inclusion principles --

MR. PHILIPS: Yes.

CHAIRMAN EDLEY: -- and the guidelines, whatever those guidelines turn out to be, etc.

MR. PHILIPS: Yes.

CHAIRMAN EDLEY: When you say "must", to a lawyer that suggests that it's enforceable. Are you saying that you contemplate some enforcement mechanism for the license --

MR. PHILIPS: I'm not saying -- okay, the enforcement mechanism --

CHAIRMAN EDLEY: Is that they're pledging to do so?

MR. PHILIPS: Yes. And the whole enforcement mechanism issue, I think, is something that's, you know, we still need to work through within the Department.

CHAIRMAN EDLEY: Okay. All right. I just wanted to clarify that. Can we just finish quickly? George and Joan and then we'll try to move on.

MR. MADAUS: Let me refer to something you said earlier. You said that you might have guidelines for test use that ran something -- I'm paraphrasing now -- but if you're going to use this for high-stakes you've got to demonstrate validity. How can you do that beforehand? How can you guarantee that before you've got the test, given it, seen what -- I mean, validity --

MR. PHILIPS: Yes, I understand. No, I agree. And again, I don't know what those guidelines will say, but assuming that they say that, that is what the Standards and the Code says. Then of course, one way you could do this is you could administer the test for a year or so without these high-stakes, or do research studies or whatever needs to be done to demonstrate that the test is appropriate for that use.

Which means, in the first year you can't use it for high-stakes. But again, I'm not saying that is going to be the way we're going to do it. I'm just giving you an example of how it could be played out.

CHAIRMAN EDLEY: Rich, can I get your guidance here? There are lots of hands in the air, and it seems productive on the other hand -- there's a schedule. The people whose hands are up -- why don't you just make your point, state your question or whatever, and then give Gary a chance to -- all right. Joan?

MS. AUCHTER: My first point was on standardized score reporting, and I may be missing the point here but, we talked this morning about the interpretation of the results. I can see where it may vary where the State wants to put different information on the form, but if a student is proficient, the definition for proficient is going to remain the same and everyone has to use that --

MR. PHILIPS: Absolutely, right.

MS. AUCHTER: -- and that's based on a certain score. So there's certain elements of that score report that have to remain consistent that will be prescribed.

MR. PHILIPS: Oh yes, sure. Right.

MS. AUCHTER: Okay, that. And then secondly, piggybacking on -- never mind. I forgot it.

MR. PHILIPS: Right, well when I say there may be differences in States in the reporting, what is in my head is like for example, one State may want to do a score report for schools, another one may want to do it for districts. One State may want to add a background questionnaire and do an analysis by instructional practices or teacher certification, and another one may not, things like that.

But in terms of reporting on students in special populations, of course those are going to be standardized across the various educational jurisdictions through the reporting guidelines.

MR. FEUER: Larry Snowhite.

MR. SNOWHITE: Larry Snowhite. Am I correct that the licensing contractor function in the RFP will be restricted to these four issues which are, if I may, more mechanical or operational, and the licensing contractor will not oversee such questions as compliance with the Standards and the Code, or questions of utilization or determinations of validity or reliability, or any of these four macro issues, if you will, that were discussed earlier? That a licensing contractor will be restricted to these, essentially, four functions?

MR. PHILIPS: I think that's generally right. The licensing contractor is not a police enforcer of -- that's right.

CHAIRMAN EDLEY: Bill Taylor, and then Jay Heubert, and I think that's it.

MR. TAYLOR: Just a follow-up, Chris, on a question. Take hypothetically the case where you've decided that there are inclusion requirements, and one finds out, your management contractor or somehow learns that on the day before the test was given, the LEA -- which is the licensee -- told numerous LEP students and disabled students that they should stay home -- they really didn't need to come in and take that test.

Now, I understood you to say you haven't decided what consequences should flow from that, what would be a breach of the contract or a breach of the license. I want to know, is that correct, and then I also want to know, have you decided in looking at this particular arrangement, that whatever consequences would flow from action by the management contractor rather than the Federal Government?

MR. PHILIPS: Well, it's true, the enforcement mechanism is still something that's being talked about. If there are laws that are being broken, you know, lawsuits can occur, the whole legal machinery can kick in. I mean, that sort of thing can happen. But we don't have yet, in this project, a strict policing activity with sanctions and penalties and that sort of thing, associated with misuse.

Now, you know, as we think this through, I don't know how that will develop. But what I'm telling you is where we are today.

CHAIRMAN EDLEY: Okay, Jay. Last question.

MR. HEUBERT: Apart from the question of monitoring, where will the guidelines for test utilization be and originate in this model? It says, the licensing contractor will develop and implement standards for printing, distribution, scoring, reporting, administration, training.

None of those, to me, seem to cover standards for test utilization. Where in this scheme, do the standards fit in and who is it that's developing them? Are they in the RFP, are they in --

MR. PHILIPS: They're in the RFP for the development of the test, which will be awarded in September. And that's where all those guidelines will be developed -- utilization -- there will be guidelines for test security, guidelines for reporting -- just various guidelines.

CHAIRMAN EDLEY: Gary, you are a great and brave American. Dick Elmore and his merry band. And Dick, I'll just leave it to you to introduce your folks and I can retire.

MR. ELMORE: Let me see if I can set the context for this discussion and briefly introduce our panelists. The national test is an interesting experiment in Federalism and inter-governmental relations about which we know something -- the general topic, that is, of the constellation of issues that are entailed by the Federal Government attempting to assume some new responsibility and to make a niche for itself in a new area.

People we've assembled today are all experts in one way or another on this issue -- all from outside the field of education -- which is a rare treat and privilege for groups like this.

Among the constellation of issues associated with the national test that are grouped under the heading of Federalism and inter-governmental relations, are things like the following.

How does the Federal Government define its role? What does it set as its purposes? How does it communicate those purposes and how does it attempt to organize itself and act consistently with those purposes?

Another set of issues has to do with the problem of incentives. As an exercise in inter-governmental relations -- and I hope our panelists will get into this -- this is a kind of an interesting venture. It's a voluntary national effort which means that a lot hinges on the value of the test itself in the setting and context in which it's going to be used; that is to say, it's not a mandate. And somehow it has to work out to have value in a lot of States and localities for a lot of people, if it's going to function effectively as a voluntary national test.

It is also an attempt to enter a field in which there's an enormous amount of activity already going on. It's not like there aren't any tests being administered out there, nor is the problem that you can't compare an individual score or a school score or a district score or State score, to some external standard. We can also do that.

There are some distinctive characteristics and features of the proposed test which give it a kind of value-added -- it's individual, it's every year -- that open up the possibility that it will add value and will be attractive, but those features have yet to be tested. So we need to know something about that set of issues.

Then there are, as we've made abundantly clear this morning, are a whole host of kind of technical and institutional design issues having to do with how you set something like this up. The central one that I hope to get some focus on this afternoon, has to do with this licensing arrangement and how it would work and what we know from other policy areas, how this would work.

But then there are a whole set of other institutional issues, not the least of which is the question I tried to raise earlier in the day which is, what is the relationship between the formal institutions that are set up to administer the test and the political environment around standards testing in school improvement, school reform, that's going on out there in the country right now?

So that's some sense of the range of issues under this general topic of Federalism and inter-governmental relations that beg for our attention.

We will have a presentation from Bruce McDowell who is at the National Academy of Public Administration, which is a sort of inventory of a general set of questions and concerns from the perspective of someone who's spent a lot of time thinking about the problems of Federalism and inter-governmental relations.

That will be followed by a series of comments from our distinguished panelists, and if you could just in your mind, there's not much danger that you will confuse the two, but if you could just switch Jack Knott and Beryl Radin's positions on the list, that's the order in which they will go.

Beryl is a professor of Public Administration and Policy at the Rockefeller College at the State University of New York at Albany, who's on an Inter-Governmental Personnel Act. We don't call them assignments, we don't call them boondoggles anymore. She's on an assignment, an IPA assignment with the Department of Health and Human Services here in Washington.

John Shannon is a Senior Fellow at the Urban Institute and he's a former Executive Director of the Advisory Commission on Inter-Governmental Relations -- an organization that has produced a lot of interesting research over the years on problems such as this.

It's a special privilege to appear on a panel with Jack Knott because many years ago now -- it was actually 12 years ago, I guess -- I showed up in East Lansing at Michigan State University and Jack and his colleagues were kind enough -- and when I assumed my job in the College of Education -- to extend me a courtesy appointment in the Political Science Department, which is a dynamite Department.

And Jack has, over the years, spent a big piece of his professional life trying to understand problems of political economy and incentives around issues such as this.

So for opening remarks, I give you Bruce McDowell, who -- Bruce, I understand you're going to take about 40 minutes, 45 minutes? And when you see me starting to squirm visibly in my chair that means you're approaching your deadline.

MR. McDOWELL: By that time I hope to be administering a test to you, and it's a test that nobody has ever passed. So listen carefully.

I am not going to stick to my prepared remarks -- although I do hope to cover them eventually -- in the sense this is a fend-for-yourself panel. They're going to react to some stuff they haven't heard before -- well, I mean, these people have heard everything -- but not something that I sent them in advance, in part.

I used to work for the Advisory Commission on Inter-Governmental Relations; that's where I got my inter-governmental expertise. I started off in 1963, after I was about halfway through my doctoral dissertation at American University, and ACIR thought it was a great idea so they took it on as an ACIR project and it led to something originally known as the A-95 process -- now it's Executive Order 12372.

And what it provides is that if the Federal Government comes out in your territory and does anything, whether it's give a grant, build a building of its own, or whatever, they have to give the affected State and local governments an opportunity to comment and review the proposal before they make a final decision. How many of you Feds are familiar with that? There's at least three.

It's still around. It's been de-emphasized a little bit so I feel I can just about be obsolete. I'm just waiting for it to be declared dead and then I can retire.

Anyway, the point of my background is that I know John very well, worked with him for years, and toward the middle part of the '80s he was Executive Director and my boss. Actually, for part of that time I was his Executive Assistant, trying to get all the things done that he got started and needed some follow-up. So it was a great time.

But he and I had a terrific discussion yesterday morning on the phone about my paper, so I went back home last night and rewrote it. So I have my original speech, I have the one I wrote last night, and then this morning I spent the whole morning rewriting it again to make sure it was responsive to what you all wanted. So I have three speeches. I was wondering how I was going to get through 45 minutes, and I guess that's not a problem. You may have to give me two warnings.

Anyway, we have a fend-for-yourself panel. John Shannon is the guy that invented the term, "fend-for-yourself Federalism" in the '80s, as a lot of things were changing about the way Federalism traditionally had worked for say, three decades or so.

And the money started going away, the regulations started arising, and to a large extent, State and local governments were cut loose. For one thing, to find the money to do all the things the new regulations said, and to take care of the money they were losing on a lot of their Grant programs.

So this is very much a kind of a fend-for-yourself system that we have now in the Federal Government. So I'm going to start with some comments on what you all said this morning, then I'm going to go to last night's speech and I'm going to end up with the pre-prepared one, and that's where I'll give you the test.

These are my misinterpretations of what you have said, so that you won't be surprised when everybody else misinterprets the test results that you've put out. One of the things I heard is that there are no Federal sanctions anywhere along the line, so you can volunteer or not and you can probably get away with most anything, probably even after you've volunteered, in terms of there not being a real way to keep you in line with what you have said you would do when you signed up to volunteer.

You can contract out an awful lot: the administration, the scoring, the reporting. I think the message there, inter-governmentally and public/private is, there are some Federal rules for what is inherently governmental and therefore, what cannot be contracted out.

And it's things like making policy, committing the government to do something -- these are rules that would generally apply to Federal, State, or local but there's a very specific OMB circular on this for the Feds -- holding a contractor accountable, protecting the rights of citizens.

Those are the kinds of things you cannot contract out, and I think that may raise some questions with respect to the broad scale of contracting out that's been posed here for the licensees. So that needs a careful look to make sure that a private contractor is not performing inherently governmental functions.

There's a suggestion of an advisory group. My impression, and maybe I'm wrong, is that that advisory group is going to be largely the techies in this field. Elmore and others have begun to raise the issue of, what about the policy folks? To what extent is this testing stuff and the test results technical or policy, and can you separate technical from the policy?

I think you're going to find that that is going to be a tough task. For example, if you wanted to deal with some of the issues raised by Janell, are those technical issues or are they policy issues? They show up as technical parts of the test but if they're not sensitive to the potential policy uses of the test results, then they're policy. And I'm not sure how you tell the difference -- you can tell the difference real quick after the fact when somebody starts using them in an unintended fashion.

Another thing I sort of sensed was that there may be a real difference between intended or unintended consequences, and consequences that could be anticipated. And I think you can anticipate without very much imagination, that there are going to be all sorts of policy uses of the test results.

Therefore, what that means to me is in the advisory structure, there ought to be some element of policy participation, and don't just treat this as a technical exercise. You'll find out later that it wasn't, even though you thought it was.

One of the great things I heard was that all meetings are going to be public; they're going to be on the Internet. For those of you who are electronically-literate that's terrific. And I assume that every interest group has somebody that's electronically-literate, so that those of us who are not very much, will get it passed on to us rapidly from the source, and we'll know everything that's going on.

Another thing I heard was that the States may voluntarily take this thing on; however, at that point it may become mandatory for the local districts, depending on which State you're in that took this on voluntarily. The last discussion of what's voluntary and what's not reminded me of our whole history of Federal Aid programs.

Federal Aid programs, believe it or not -- I don't know if any of you administer them -- Federal Aid programs are voluntary, they're all voluntary. You don't have to take them, and the Supreme Court has made a lot of that fact when it comes to enforcing all their regulations.

So you can actually put voluntary and regulation in the same sentence, the same phrase, and actually you did it in the title of the workshop. At first, that sort of stopped me a little bit. What? Voluntary regulation; what are they talking about? But we have a long, long tradition of this. So once you're in, you're in, and voluntary kind of ceases until you re-exercise the voluntary to exit.

Now, I guess there are no sanctions in this particular program, at least not yet, so you can sort of volunteer your way through, even though you've made a moral commitment to do things. Most Grant programs are not that way. So you have at least a little window of luxury here where there's more volunteerism in this program, apparently, than there would be in a straight, Federal Aid program.

Unless you want to go for second-year funding on administering the test. At that point I presume there's a sanction. Perhaps. Who knows? We'll test that out in a couple of years and see whether there is or is not.

Gary, I think quite understandably, when he got certain questions, sort of turned them away and said, that's beyond the scope of what we're here to do today. This is not meant as criticism because this is what we generally find in these Federal programs. Each one sort of sits on its own basis, extremely reasonable.

The goals are terrific, it's not very much of a burden. We can figure out how to do it -- at least a couple of years after we're into it -- we get used to it, we learn the ropes and so we can do it. So, not a great problem. Unless you're on the other end and this is one of 200 you're dealing with, and then it adds up. So you have to think hard about which one is the one that broke the camel's back.

In the first Clinton campaign the slogan on the wall was, "It's the economy, Stupid". In the Federal system, the slogan that's up on the wall is, "It's the system, Stupid". It's not any one of these individual mandates.

But at ACIR last year before we went out of business, we took on the existing mandates issue and we asked the State and local governments which mandates bugged them, and we got about 200 and we said, we can't deal with that. So we went back and we said, which mandates bug you the most? And we came up with 14, and we tried to deal with those and the response was that we were put out of business.

So I think we have a lot of mandates with us and they're not easy to turn around. The reason is, if you take any one of them individually, they're almost unassailable. It's only at the systems level that they really give you tremendous fits.

So let me suggest that if we're not real careful on this one, even though the Feds don't mandate it, if you're in a State that mandates it, it's another one of those Goddamned Federal mandates as far as the individual school district is concerned, in terms of its own experience of the situation. So I think a little thought down that path might help.

Eva Baker made a big point of reducing the negatives and accentuating the positives, and also the concept of the adaptive use of tests. We used to call Federalism in America, pragmatic Federalism. This was back in the days before we got such contentious issues and started going fend-on-your-own. I guess we're up to coercive now, was the last description I heard of it.

It used to be cooperative. Federalism, if you look through the past 30, 40 years, has been characterized with an awful lot of very flowery words, and those are a few of them. But I think for our purposes here, pragmatic Federalism probably fits. This is going to be a new part of Federalism, these test results. And Americans are going to treat that pragmatically and they're going to get extremely adaptive, and they're going to use it in all sorts of unimagined ways.

What that means to us I guess, is we have to work our imaginations overtime to try to figure out what those adaptions might be, otherwise we'll be sort of behind, sort of in the defensive mode trying to catch up to the things that are done. It seems to me if we can handle some of those issues on an anticipatory basis, get ahead of the curve, we can turn things around from having to be defensive about them, to making the most creative and productive use of them.

So that's a challenge, I think, for all of us. Essentially do the interpretation up front before you put out -- or, as you're putting out the test results, and try as much as you can to pre-empt the misinterpretation. You know there are going to be misinterpretations.

If you come up with your interpretation after the misinterpretation has gotten out, first impressions are always the ones that people remember. It's going to be a lot tougher. So I'd make sure there's some pretty darn solid interpretation along with the test results when they go out.

Things like some of the speakers this morning were talking about. How do we interpret this in a given school district which is different from a different school district? And let people know right up front. You cannot measure each school district the same way against a measure -- or sometimes we use the words measure and standard interchangeably.

I'm not sure we're quite sure what we're talking about. A lot of what was mentioned this morning was more like standards, and then sometimes it's more like measures. And the Department's view, I guess, is that this is just information. But that ignores the possibilities for adaptive use, and let me suggest that some of the adapters will make it into more than just information.

I think we also need to understand that when people receive the test results they're going to be filtered; there are going to be an awful lot of sources out there filtering the results. And which filter are we going to listen to? The one you trust the most. Well, that's not going to be the same one for each of us.

It will be different in a disadvantaged community than it would be in a community that met the standard long ago and said, what's new? And they'll have different filters that they listen to when they receive the information. That may be true of students, parents, communities, school boards, and so on.

The point I think, that Janell brought out on this particular issue is, will the new Math and Reading tests further disadvantage already disadvantaged students, schools, and school systems? And that is something that we need to think about. There needs to be interpretation on that issue when the stuff goes out -- not let the interpretations sort of seep back in afterwards.

That's a big disincentive for any of those school districts to take this on. They're just handing ammunition out there, and particularly if it's handed out without interpretation; that's a major inter-governmental problem.

She also asked the question, which States opt in or opt out? And I think the implication was that those with the greatest need for improvement also are those with the greatest amount to lose by the misuse of the data. So the people that need it most are probably going to be the ones that opt out first. And that is a significant problem.

I think Constance Newman really sort of addressed that as well. We need to be very careful with that kind of issue and work very hard -- in Constance's terms it was, work very hard to overcome those misuses that would disadvantage an already disadvantaged district.

Kati Haycock made the point that done right, these things could be great, but I think that gives us a little bit of warning to be careful in how we do it. The test itself is not going to change anything; it's how the test is used. And I'm going to suggest, as I develop my argument, that it ought to be used as part of a performance management system.

And that is a very complex thing to think about, but mostly what we've been talking about -- except for a few of the kibitzers -- up to this point is that the test is a test, and what it gives you is test results. It's kind of unconnected unless someone else connects the dots. And the danger there is, they connect them wrong.

So I think we need to think about how they should be connected and begin making those connections in a way that benefits everyone in the inter-governmental system the most.

It was 11:30 this morning when Jack Knott finally mentioned the word Federalism. I guess he must have gotten my paper and been sensitized; thought it was about time at 11:30 to mention the word. Actually, I was kind of surprised. I didn't think it would be mentioned until I got up.

But that's great. I think it shows a sensitivity, and just before we broke for lunch that sensitivity came out more in sort of the parting remarks by Mr. Elmore. This is a very fundamental Federalism issue.

Eva, when she came back the second time, talked about one of the applications of these test results almost certainly will be to policy and program evaluation. And the second one that really has a lot of meaning for the inter-governmental system is, there will be legal applications made of it.

And the reason that is so important inter-governmentally is that the Federal Government has passed a lot of laws which rely to a large extent for their enforcement, upon giving citizens the right to go out to the State and local governments -- and for that we're very grateful. Best thing the Feds ever did to us. I'm joking a little bit.

Most State and local governments think that's the last thing they need is to have the Feds sponsoring a whole lot of legal action against them. It doesn't sound like cooperative Federalism. I think that's why we've, just in the last five or six years, started to come up more and more with the name more like coercive Federalism.

So I think we want to be careful here to try to guard against those legal applications as much as possible. Not that they don't sometimes do a lot of good in a system that otherwise might not move, but that should not be unanticipated, that these results will be used that way.

John Fremer made the same point in a slightly different way when he talked about the high-stakes issues being ones that are going to get attention. And there is really no way -- I think you summed up the sense of the group -- there's no way we're going to keep these test results from being used for high-risk issues.

Just about done with the recap, I guess. This is taking a little longer than I thought it would, but this was a great session. So if all we get out of it is these reactions, in terms of my part I think, this is what I'd prefer.

In the last Q&A before lunch, this real stark difference between the way the professionals look at this whole thing and the way the policy, political world looks at it, is really a key part of the message I want to leave you with.

At the present there's essentially a disconnect and there needs to be a strong connect. I'll give you an example. These things are not actually impossible in terms of dealing with the political field. I've done a lot of inter-governmental work on public works issues, and if you think you've got a problem with test scores, take a look at pork barrel projects going through Congress or going through a State Legislature for that matter, or even a City Council.

And the planners are working away and the planners are justifying and they're doing investment analysis, and they have worked out the projects that are going to have the greatest benefit/cost ratio, and suddenly it's not in the right district. Well, that's essentially the same kind of issue you're going to come up against here. If the test results don't work right in the right district we're going to have a lot of political problems.

Along the lines of not letting this information just sit there as information, I think what we need to do is use it to improve performance, use it to improve accountability, and use it to keep accountability at the proper level of government -- whatever that might be: State, local, or wherever.

After lunch, specifically on the licensing thing, I think more than anything else it raises that issue of what's inherently governmental. I won't say anything more about that, but that ties in very much to accountability and the uses for policy purposes.

Let me now move to a little bit of inter-governmental background that's largely a result of John Shannon's discussion with me. When I prepared my prepared remarks I zeroed right in on the issue and forgot all the context, and I forgot that this wasn't an inter-governmental group that I was talking to.

I often talk to inter-governmental groups -- unless it's a bunch of foreign visitors -- when I talk to foreign visitors I back up all the way to the Constitution and before. But I think I wanted to just make a few contextual points.

This initiative is tied very solidly to national objectives that I think we all support. In outcomes terms, what we're really trying to do here is make teaching more effective, raise the competency of K through 12 graduates -- raise that competency in terms of the level of preparation for higher education, or the level of preparation for going out into the workforce.

And, bottom line, the purpose of this whole thing is to keep the American economy competitive in a worldwide market. I think nobody could disagree with that. Secretary Riley's statement to the Congress made an extremely strong point, strong case for this policy. It's in the national interest, it's in the best interest of the students, best interests of their families, best interests of their communities.

So I think there's no question but what this is a terrific initiative. It boils down to the devil's in the details, I think. A part of the detail is that our country is more decentralized than most. It's a Federal system and you can count the Federal systems on just a little bit over one hand. Most countries in the world are unitary systems. There are lots of implications of that, but let me just throw out one.

The Federal Government in our country collects about two-thirds of all the tax dollars, and it turns out that that is just slightly lower than the lowest amount collected by any unitary government. The average unitary national government collects 85 percent of the revenue in the country, and France collects 99 percent -- almost 98.7 percent of all the revenues.

Therefore, if you come to a management problem you just follow the dollars and you've got it managed. It's not that simple, obviously, but you have a hell-of-a-lot of leverage if you're a national government in most countries. In this country, you're not at the bottom of your direct influence, but you're pretty far down.

So we have to think of a lot of other ways to keep the inter-governmental system going as a system which can meet national goals. And we're pretty inventive on that. Let me give you four ways that we do it.

One is Federal Aid, and this is money, but it's in decline. I would bet if we had taken an initiative like this 20 years ago, without a doubt this would have been a major new Grant program. We wouldn't be talking voluntary and blah-blah-blah, except in the sense that Federal Aid is voluntary, till you take it.

What happened in the '80s -- this was the way we did everything, kind of from the '30s up through the '70s -- it kind of peaked out in '78. But it's now not the preferred method. In the '80s the preferred method became regulatory, and ACIR got a whole series of studies on regulatory Federalism out of that shift.

Before then we had been studying grant management till we were blue in the face. We knew everything there was to know about Grant management and had done it over again on the second round of studies. And suddenly we're thrown into regulatory Federalism. That peaked out in 1995.

The 200 mandates I mentioned -- there were also 440 pre-emptions. The pre-emptions tell you what you can't do; mandates tell you what you have to do. So you've got 640 orders from the Federal Government that you've got to live with if you're a State and local government.

So the system got kind of burdened and in 1993 the State and local governments all got together -- hardly ever all get together on one issue; the last time was general revenue sharing which was about 1980 or so -- no, '70. So from '70 to '93 we were without a unifying force to keep all the State and local governments on a single issue with a single way of thinking about it.

The result of that a couple of years later, early 1995, was the Unfunded Mandates Reform Act which Clinton signed in the Rose Garden with the statement that we'll never see another Federal Mandate. I was quoted in the newspaper as saying -- and this was not to Clinton or anything, this was a totally separate thing, but the reporter put it together -- that history would give you some reason to doubt that optimistic view.

So I was immediately taken off the Mandate project. The Mandate project was done incorrectly, and the organization was abolished. So, a little piece of history.

These are difficult issues. The Mandate Reform legislation has three parts to it -- three major sections. The first section says, if you're in Congress and you want to pass a law and it would cost State and local governments more than $50 million a year, you have to find a way to pay for it or you have to stand before a point of order by any member of Congress in either House, and that must be overridden by a majority vote on that issue before you can go on and consider the bill.

So if this program were not voluntary, it would have to find the money or it would have to go through a point of order in Congress to get enacted under the new rules. That kicked in January 1st of '96.

The second Title of the act says that if you're in a Federal agency and you're trying to make the rules under Federal Mandate legislation, that you have to consult as widely as possible with the affected stakeholders, and you have to allow for compliance by a range of methods that includes the least costly of the methods of compliance.

Now, that comes out of a long history of the environmental regulations being written specifically to mandate the most expensive method of compliance. You know you get what you pay for, so if you pay a hell-of-a-lot more, maybe you'll get better cleanup. But then somebody gets inventive later on and you're kind of stuck with the best technical solution, regardless of affordability. So the Federal agencies need to think about what these mandates are going to cost.

The third part was to ask ACIR to do the studies that led to its demise. So that was the best Title of all.

Anyway, regulatory stuff now is a lot harder to do, and finally sort of reached a breakpoint. So if we can't use money and we can't use regulations for new initiatives, what should we use? Well, there are a couple of other possibilities that we're experimenting with, and even though some of these have long histories, they've not been major methods of inter-governmental management.

One is the cooperative partnership method -- essentially the voluntary approach. And I think the best example we've had of that is the Agricultural Extension Service where they were able, for example, by getting a farmer to take a big chance and to use fertilizer, free. And suddenly when the spring came his crops greened up a lot earlier than the crops next door, and the next-door farmer said, what have you got there? I want some of that.

And when it came harvest time, he harvested a hell-of-a-lot more than the farmer that hadn't had the fertilizer. And so the next-door farmer comes over and says, I've really got to have that; how did you get that? And before you know it, everybody's running after those benefits because they've been demonstrated next door in a very visible fashion.

And I think that's what we need to do with our voluntary programs. The message there for this program, I think undoubtedly, is that we've got to find ways to show the incentives. What is the real benefit for volunteering? And I'm not sure whether we've got a convincing story yet or not.

The fourth means of inter-governmental management which the Economists favor is competition. Essentially, if we can set up the playing field correctly, the unseen hand will make governments do what they should do in their own best interests. And I think the application here to testing goes something like this.

If you're trying to get more jobs in your community, you want more economic development -- maybe you've just had a base closing of a major military installation, and perhaps that was your primary economic base in a small community out in North Dakota or someplace. You have a real incentive to do something to show that you have a world-class workforce, and these tests just might do it.

So that kind of thinking can be brought in. That's an incentive to parents. When this new economic development is attracted to my area, I want to make sure my kid gets one of those jobs. The business community says I've been here forever, I want to stay here, I want to grow, but I can't do it if I don't have a great workforce. So they've got an incentive. And the schools have an incentive along the same lines to serve that clientele.

So I think the implication is clear. We've moved from just a Grant program, we've moved from regulated. We're down now somewhere in this cooperative, competitive area. It's a new area that's kind of untraveled, and we're going to have to be creative.

So what we're really looking for is incentives so we can recruit volunteers; on the other side, remove the barriers so when we've got a volunteer we make it easy for them -- facilitate the whole deal. Don't say great, glad to have you, the more volunteers the merrier, and you must comply with this and this and this and this, and the forms are long and the process is arduous.

Well, let me come to the third speech. How am I doing?

MR. ELMORE: You have five minutes.

MR. McDOWELL: I'll be pretty quick on this. I want to make sure I get the test in so I can flunk you all out.

National uniform standards -- I think somebody already said -- do not come into a vacuum. There's been a lot of tests, so this is a test on top of tests, unless this test replaces other tests -- maybe not at first, but ultimately.

Now, this would be an incentive. If we have the tests in the classroom, then we have the standardized tests that the school system's been giving for years -- I mean, we had Iowa tests when I was coming up and then I think we switched to California tests, standardized tests. I'm not all that young anymore, so they've been around for a long time.

And then we've got these new sampling tests that tell us where we are with relation to other countries. So we've got at least three sets of tests already, maybe more, and we're talking about putting a fourth one in, and wouldn't you please volunteer for that. And there is some reaction. All I know is what I read in the newspaper, but some people have said, we've got enough tests already and more tests out of classroom instruction time is not what we need to improve.

Regardless of the merits of that, I think it illustrates that you're piling something new into a system that's already fairly full and you need to think about that: why this test rather than another; can this test, if it's really better than the others, replace the others; and actually, can it have a potential for doing better and cheaper, what we need done in the current system? I think that would be the ultimate incentive. So think about that one a little bit. I don't know your field well enough to talk about that much.

Negotiate the rules, build consensus. If you don't, you're not going to have trust that the test is something you want to volunteer for. You'll let somebody else demonstrate it first. And this is only really going to work if you get everybody in. So I think your goal is, get everybody in the system participating in this new, better test, but cut the burdens.

I'll give you an example from Transportation -- well, maybe I shouldn't. Transportation reports to Congress every two years on the status and performance of the highway system, and recently they've come down to an annual report that includes transit also. Where do they get their data? They get it all from the State Departments of Transportation and the Metropolitan Planning organizations.

They have to have high quality data coming in or their report to Congress is no good. So their methodologists got out and they set up this reporting system that could not be generated from the data that were being used by the States and the Metropolitan areas, so it's a new, go out and get it and report it to me please, and it's an extra burden.

Well, they've been doing this for 16 years now and the States have come back and said, it's too much of a burden; let's cut the burdens. And the reaction from The Federal Register is a whole bunch of options to cut data, not to cut the burden. I mean, cut the burden by cutting data.

What we need more than ever is more data. We're not going to get to a high performance society unless we've got the data to measure it, and the question is, why can't we work out a system where the Feds can use and report to Congress, the information that's needed to manage the blasted systems in the first place? So that we don't have to go out with a redundant data call. And why can't we just have everybody put that on the Internet and there's no reporting? If the Feds want a report they download the data needed in the report and they've got it. Because it's an agreed-upon, working system.

So I'd urge you to look for those kinds of synergies. Let me just shift to four or five quick slides which give you an idea of the kind of performance system that I think might meet everybody's needs within one system. This is certainly not proven out so don't go home and tell anybody that we've got a working system.

This is kind of a home-grown set and I did do it myself. It kind of fits together and builds upon each sheet. What we're really interested in is better outcomes. We want an America that has a cleaner environment, a stronger economy, and a healthier population and a better workforce, and that's what makes the difference.

And how do we get there? We have programs. So we try to either intervene on what is happening out there in our country and make it better. Why don't we succeed? Well, there's a hell-of-a-lot out there working against us. The system is being stretched.

Take this just as workforce, and take your tests as the input to improve the workforce. In a dynamic way, interchanging the test results with the teachers and the students, making something of it.

At the same time you've got drugs, crime, one-parent families that don't have time for the kids, fighting you. And you're working harder than ever and you can't understand why you don't see results over there. It's because there's a lot working against you.

Well, that's not a terrific situation. So what's your reaction? Don't hold me accountable for my program because it isn't going to get results. I'm in a situation I can't win. The powers that are fighting me are bigger than my programs, so I either need a bigger program or don't hold me accountable -- one way or the other.

So what are we trying to do? We start off and we start measuring. We measure what's going on in the workforce, we measure the tests -- and these are the tests you're talking about. A new measure of the output of your education program -- test 4th Grade, 8th Grade. And we're putting a certain amount of budget in there and we're measuring that; we measure all three of these things.

And where does it get us? Well, sometimes it doesn't get us very far. This may not end up real precise but, if we could actually find a way to link these three sets of measures together in some sort of related way -- maybe not totally causative, but anyway, related in some fashion -- and use those datalinks to do some program evaluation, what's the relationship between these three measures? And that might give us some insights. So some people are starting to do that.

I might say, what I'm putting up here is the results of a study of 14 Federal agencies in the Public Works field, and five of them were using this system -- the basic concept -- to struggle towards some solutions. We still haven't got a complete solution, as you probably recognize, but we could also go out and measure some of the things that are fighting us over here.

And we could take that measure and add it to our program evaluation, see what difference that would make. And maybe that gets us a little further along the way. But we're still not there; haven't solved the problem yet.

Suppose, instead of just trying to raise the budget, increase your program, suppose you could push some of the activity up to a prevention strategy that comes over and reduces what's fighting against you? So now we've got these three measures -- this measure -- we've got four things that we're measuring and we're trying to link them all the time.

So if you can get that far, all I'm really trying to illustrate is your tests alone are not going to cut it, they're not going to get you any improvement unless you can link up with a larger system. But it's very important -- I don't mean to reduce the importance of this measure because the whole thing falls apart if you don't have that measure.

All I'm trying to suggest is that if you leave this alone as an unconnected piece, it's not likely to have the kind of impact that everybody's expecting. Somebody, I think, talked about expectations earlier today, and those expectations are going to be real hard to meet unless you take that kind of approach.

So I lay that out, not with the thought that it's simple. You're not going to get to this kind of a system real quick; nobody that I know has mastered it yet. But I think in my own mind, that's the direction to go. Thank you very much, Bruce.

MR. ELMORE: Thank you very much. Beryl.

MR. McDOWELL: The test, by the way, was whether you could understand that set of slides.

MR. ELMORE: How many people understood it? So you have a small group that's reached mastery; the rest of you will stay after and remain here until you do.

MS. RADIN: I want to do three things this afternoon. First, to give you a feel of the inter-governmental world as I see it today; second, to focus on two issues that were discussed this morning that seemed to me to be particularly interesting; and then finally, to give you an example of an effort that may have some illustrative quality for this enterprise we're in.

I think as Bruce suggested, the inter-governmental field is really best described as an area that's constantly in flux and it's full of contradictions. And one of the things that's interesting in the field is that it is -- it's got a literature that's always searching for metaphors or images that describe what it's about.

And sort of the two kind of classic ones started with the layer cake and then moved into the marble cake. But I think neither of those, sort of baked-in images, captures the uncertainty and dramatic swings that have occurred in the field today.

So the image that I came up with is that of a juggler who is dealing with a variety of objects that have to be balanced and kept in the air. I don't know if any of you have ever seen that juggling group called the Brothers Karamazov who juggle all kinds of things. In fact, I once saw them juggle jello, cats, and automobile parts at once.

And I think that's really what the inter-governmental field is like. And many domestic policy areas are designed and implemented in a framework that does contain something that's similar to these multiple objects. And what I tried to do in characterizing this field is to draw from some other policy fields -- Human Services, Welfare, Rural Development -- areas that I've worked in, in the past.

And I like to sort of characterize four realities that I think people have to deal with today as they're talking about the inter-governmental field. The first is devolution. This is an era when there's a great move to minimize the Federal Government role and to move responsibility to lower levels of government. And what we've got is an erosion of the legitimacy of an active Federal role.

Most of this devolution has occurred to the State level, but we've also seen that the States, in many areas, have moved to devolve further to the local level. Or even to the private sector or to others who actually deliver services.

Now, devolution means that, rather than a single way of doing things the political cultures of States and localities really produce really striking differences. Now I think as was suggested this morning, there are two faces of this concept of differences.

Differences can mean the lack of consistency and some real equal protection questions for the lawyers, but also differences can mean creativity, and you know, let the 1000 flowers bloom. And I think that even in an era when we had Federal policies that were regulatory and what we usually call command and control, that we found there was a great variety of responses to Federal policies around the country.

Now, my second reality that I think the inter-governmental field has to deal with is a concern about performance. And this is, as again has been noticed, an era of rhetoric about accountability, and we are asking public organizations to be able to tell the public and their political masters, what they are doing and how they demonstrate their effectiveness.

This movement means that we are not satisfied with the professional norms that have been used traditionally to really establish some of the accountability relationships. We see the establishment of a set of standards in a number of policy areas, and I think it's important to acknowledge that this moves in a direction that's very much the opposite direction to the devolution imperative.

Some of the people who are sitting in this room have been living with the Government Performance and Results Act, which assumes that the Federal Government is able to define performance standards for programs and measure progress against them. This is despite the reality that many Federal programs are not administered or even well-defined by the Feds. And so we have one, a problem of establishing the causative relationship of a Federal intervention, and also the whole problem of data.

The third reality that I think my juggler is dealing with is the move toward standardization. And that again, is a reality of the world that we live in, and it's found in professional socialization, in training -- in many, many areas where you have a creation of norms and expectation that really transcend jurisdictional lines. So that we find in fact, those jurisdictional lines both in the U.S. and internationally as well.

And increasingly, people talk to one another across the boundary lines. I found in some of the work I've been doing this year in HHS that there's a great interest in a number of fields in peer to peer technical assistance, where people can learn from one another and develop the standardization through that exchange. But there's often not agreement on standard data definitions or even what people should be doing.

Now, the last reality that my juggler is dealing with is the reality of hidden agendas. And it's a kind of a shadow that we know is there but it's not visible. And it's what we've talked about throughout this day as kind of the political overlay for many issues.

In a number of policy areas it's very hard to know whether one is talking about abstractions in Federalism and inter-governmental relations, States' rights, appropriate roles, or whether the discussion of issues -- like mandates, to give an example that Bruce gave -- is really a way for people who oppose specific policies to use an argument that wraps itself around an inter-governmental appropriate role argument.

We know that States are extremely protective of their autonomy in ways that are not intuitively obvious, and it becomes difficult to figure out how they're going to interpret something. I know in the Human Services Field, States are often willing to create national standards if they feel that they're the actors deciding and the Federal Government is not involved at all, but not if the Federal Government plays any role, particularly a regulatory role.

Now, another piece of this hidden agenda issue, or reality, is a fear that the Federal Government's promises today may be broken tomorrow. And sometimes that happens when issues resurface in the form of auditing or reporting requirements where the Feds say, I'm not going to require you to do something, but then five years later when the auditors come around, they end up making policy through those auditing requirements.

Another piece of the hidden agenda, which we talked about, is that data that is there does have a life of its own. While the assessment design says that the Feds will not get the information, I can't believe, with pressure on the Federal Government to describe what it's doing with its funds, that that information won't be used in budget hearings in the future.

Now, the challenge it seems to me, is to craft policies that juggle these conflicting attributes, and the technical answer -- the answer given by the professional that reflects best practice -- can't be viewed in isolation from these other issues.

As I listened this morning and read the materials, I tried to think about experiences in other policy areas that might be relevant to this effort. I'm not sure there's any direct analogy, but there may be some experience that is useful in crafting the assessment process. As I noted, I'd like to describe two issues and then conclude with an example that might contain, at least a partial lesson for this process.

The first issue that seemed to me that we were talking about a number of ways this morning, is how were we thinking about the Federal role in this effort? In a somewhat simplistic way, we can say that the Federal role in education has either been active or passive. In the active role it's had a regulatory mode of operation; it's been a kind of command and control mode. The Feds tell others what to do and they've been more or less successful in accomplishing this.

But the second way, this more passive mode, is developmental and kind of a capacity development approach. In some ways it's like the Federal role before ESEA where you established dialogues, norms, and others used them. And I think one of the problems as we were discussing this morning, is that there is a piece of both of these approaches in what we've heard so far. If we take the developmental approach for example, and we care about parents and the classroom, why do we need to even have a national compilation of data?

There are a number of mechanisms that have been developed recently to try to move the regulatory approach to more collaborative relationships; the partnership movement that Bruce described. But that really has -- the partnership approach doesn't have a single product. It takes many forms and it's constantly changing.

Some of the work that I've done that has looked at -- the National Rural Development Partnership -- shows that there is a huge investment in the process of devising partnerships. This process investment is extremely important but even when you go through it, it does not always lead to an outcome approach, particularly an outcome approach that allows one to look at a national program or set of behaviors rather than a collection of multiple experiences that are not comparable.

Also, when you take the partnership approach, because the actors change you can't assume that agreement that is reached by those stakeholders at any one point, will stick in the future, because we are living in this environment that is constantly changing.

The second issue that concerned me that we discussed this morning was what I'd call the level of analysis issue. We're proposing to devise an assessment system that can be used by many with very, very different agendas. And I think it's extremely important to think this through, as several others have suggested.

The parents want individual information, and they want it more often than once every four years. We hope that teachers want information for feedback to improve teaching. Principals are likely to use this information for performance assessment of their teachers. School systems may use this information to allocate resources and to devise reward systems.

At the State level you can imagine the State Education Department using this to compare districts, legislatures to devise funding streams, gubernatorial candidates to use against one another in a political campaign, and as I noted at the Federal level, we can't believe that the Department won't use it.

I think at the minimum, members of Congress who are opposed to the Federal role in education will use it to show that programs are not working, while others will show that there are unmet needs. Each level has its own dynamic and spin, playing out different values. And it seems to me, only if you ignore value conflicts and the realities of politics, can you think that you can have a win-win situation with one system.

Now, let me very briefly talk about an example in another policy area that might be useful to explore. Healthy People 2000 was an effort that was started ten years ago. It was the development of a set of national health promotion and disease prevention objectives. The main player was a group called Healthy People 2000 Consortium, that was funded by the Public Health Service, U.S. Government, in cooperation with the Institute of Medicine of the National Academy of Sciences.

It was made up of 330 member organizations representing the professional, voluntary, and corporate sectors, as well as State and territorial health departments. The process established 300 specific objectives in 22 separate priority areas. Most of these had baseline data. But as they developed, a number of patterns surfaced.

For example, they found that groups of people who bear a disproportionate burden of suffering compared to the total population weren't really being picked up, so the next iteration of this which is Healthy People 2010, is attempting to focus on those specialized populations.

Now, the objectives that are developed in Healthy People 2000 are out there; it's a menu that others can use. And to this point, the Federal role has been mainly developmental but it may not be for long. In several areas Congress has incorporated the Healthy People 2000 objectives into national legislation, even though that was not what was envisioned at the beginning.

And there is also a requirement that the data be used by States to track progress in several program areas. There are differences between Healthy People 2000 and what we've been talking about: the data isn't at the individual level; there's nothing analogous to information to the parent; there are a large number of elements; choices are given to users to mix and match. And although, as I say, it wasn't envisioned at the beginning, the data is linked to Federal resources allocation processes.

Now, it's interesting to compare I think, Health and Education, because the data in the Health area has been used to document needs and problems, and there doesn't seem to be the difficulty of blaming the citizen or the health provider for the non-achievement of those goals. So that's just an example that we might want to think about.

MR. SHANNON: This morning, Bill Taylor raised -- was the first, I believe, to raise the issue of the role of the national government. And he asserted that the primary role of the national government as far as elementary and secondary education is concerned, was equality of opportunity. I mean, that there would be at least a great consensus in that area.

And of course, that came out of the Civil Rights experiences of the 1960s. But we have a new and I think, a continuing and a second great need for a strong Federal role that's coming out of the experience of the '80s and the '90s, and certainly into the 21st Century. And that is the realities of a globalizing economy.

And you can almost lay it out with the logic of an Aristotelian syllogism. The major premise is that American students should at least be roughly competitive with the students of the other advanced nations of the world in this increasingly competitive -- the minor premise is that there is persuasive evidence that the students, American students, the products of our highly decentralized system, are not competitive in many cases with the products of the centralized educational system of many of the advanced countries.

And so the national policymakers are confronted with a problem. How do they raise up the general level of a highly decentralized educational establishment so that its products are fairly competitive at the junior high level and the high school level, with the foreign counterparts? And this problem isn't going to go away unless we get more results as we received yesterday; that at least at the 4th Grade level our students are not doing too badly -- in fact, they're holding their own.

But with the dynamic and the tension of the globalizing economy, there's going to be unremitting pressure on national leaders to do anything they can to try to raise the general level.

Now, as Bruce laid out, the Federal Government in the past has had three classic ways of influencing State and local governments. They can use bribes -- we've got 600 grants going; they can use mandates as they increasingly did in the '80s -- unfunded mandates but that's no longer fashionable; or they can use cooperation and jawboning. Those have been the three classic techniques for trying to get State and local governments to do what the national policymakers think they should be doing.

But money is short; political realities will not go for mandating -- especially unfunded; and there are limits to jawboning. President Bush and the Governors have been meeting for -- started back in about 1990. So there's an urgency to the problem. And frankly, I think this testing -- it's a unique way of trying to confront the realities that you can't bribe them and you can't mandate them and jawboning isn't necessarily going to do the job -- at least not quickly.

And when you really look at testing, it is putting a political burr under the saddle of the State and local educational establishment. It's designed to activate parents, activate political leaders, to demand much more out of an underperforming State and local school establishment. Underperforming in many of the suburbs and in some cases, nonperforming in the central cities.

So it's kind of a unique way of trying to hit the problem when the more traditional, classical techniques are not available. So this is unique. I often say that there's nothing new under our complicated Federal sun. But this is a pretty novel and unique approach at trying to -- at one level, trying to influence 50 States and 18,000 school districts.

And of course, it's going to raise all kinds of problems. I noticed that at 10:45 Chris Edley said he's spotted up to -- 84 warning flags had gone up. My latest count this afternoon is that it's gone up to about 127, so there are plenty of problems with this very innovative and in some ways, risky way of trying to stir up the State and local educational establishment.

I think though, there's a danger in concentrating too much on warning signals. One becomes like Hamlet, too wise to act, when you see all of the problems. I'm reminded, to paraphrase his soliloquy, great ideas become sicklied over with a pale cast of doubt and lose their name of action.

And if you concentrate always on these negative problems -- and there are some, and there are going to be misuses -- this is an explosive technique of testing. Because it's going to put local school establishments on the spot -- no question about it.

The Austrian economist used to talk about creative destruction. You may get some of it with national testing of all these local school districts. But the end product is to introduce more parental demand and political demand for a more responsive educational system, more competitive, vis-a-vis, other major countries. So in that sense, I think that this may well turn out to be a rather historic endeavor.

And the last thing I'd like to leave you with is that the American political system -- especially when you get to these real tough issues, is riddled with political paradox. It took Richard Milhous Nixon, a hater of Communists, to open the gates to Red China. And it may take William Jefferson Clinton, the great friend of teachers' unions, to pull off a revolutionary development at the local school level. Thank you.

MR. KNOTT: Since I understood this to be a discussion I'm just going to sit here. I'll try to be brief; it's almost 4 o'clock already, so we're over time. I'm going to go in a little different order than I thought I would just based on some of the comments that have already been made.

One of the things that I'm particularly interested in is this question of licensing. Unfortunately, at least from my work in inter-governmental relations, there hasn't been a lot of careful study of licensing, especially as it relates to outcomes. There's a lot of study of the processes, you know, whether you do licensing in one way or another and whether it produces different kinds of outcomes is something that I don't think has been studied very well.

There's one exception to that that I want to refer to, because I think it may have some lesson, at least for this question of enforcement that we've been talking about. And that is, a recent study by Brent Miller of the University of Arizona that's looking at licensing in different States in the area of mental health.

And one of the things that he found was a sharp contrast between Ohio and Arizona -- two of the States included in the study -- in the kind of outcome they had, and they had performance measures in the various community mental health organizations. And the key variable that determined the difference in outcome between those two States which had a very similar licensing structure, was the question of enforcement.

In Ohio, they actually had worked out in advance and saw it through, a fairly careful monitoring and enforcement system. So that, what they did is, they then franchised to a State-wide agency, these mental health services which then in turn, licensed local providers or gave contracts to local providers that were licensed providers. So it was a 2-stage kind of arrangement in which you had sort of a general contractor that then contracted out to subcontractors.

And the interesting thing about it is, in the study, those were the two States that had a similar kind of structure and so you could, in some ways, see the effect of the differences in effort that was put into enforcement.

And in Arizona, there was very little monitoring done of the general contractor, and very little work put into trying to figure out what it is that the ultimate licensed firms were supposed to be doing and what it was that they were going to monitor and what was included and so on.

And they ended up with the general contractor eventually building a surplus of funds and not passing it down, and they looked at patterns also, of who they had contracted with, which turned out not to be all that desirable, etc.

So, I won't go into the details of it but it just struck me when I was listening to this question of enforcement here, and that might be a useful reference for you to look at because it also looks at the incentives that the local firms had for getting involved in the contracting process.

And again, I won't go into the details of that here but there are a lot of games that these firms can play within a State context, in terms of getting the contract, who gets the contract and so on, and who is administering it. So I refer that to you, given the limited time I won't go into the specifics of it, but I do think that that is a very important issue.

For me, the crux of the issue, aside from this question of whether you do it -- you know, the fact that it hasn't been thought through that much yet, although you know, it will be -- is the question of what is going to be included in what the contractee is going to be responsible for and what is some other authority going to be responsible for -- either the State government or the Federal Government.

And given the fact that what I've heard is a consensus up here, that in fact, these test results will be used for things beyond what was originally stated. In other words, they're not going to be just used to inform students and parents of how kids are doing. That becomes a very real implementation, legal issue.

So I think that's really one of the key, crux decisions that's going to have to be made. Is there going to be a role for the Federal Government or State government, especially if these other kinds of uses are going to come into play, and what role does the contractor have in that?

A second point that I don't think has been made too much mention of, although there are several lawyers in the audience, I understand, and that is this question of judicial challenge -- not necessarily by the Federal Government, you know, in case somebody's not carrying out their contract correctly -- but by the States against this policy.

If we look at sort of recent history in Federalism, even say the gun-free legislation, there was State challenge to that, and it was a kind of, you know, it wasn't a direct, sort of Federal program, but there was legal challenge brought. And I would guess that you're going to see legal challenge against this.

Now, that may not happen because it's billed as a voluntary program, but to the extent that it includes requirements, and to the extent that people who engage in, voluntarily participating in the program, get involved in meeting or not meeting these requirements, and the State government gets involved also, and therefore mandating it to local governments, I think there's a risk here of some kind of judicial challenge from the other side, you know, saying that this is Federal Government interference with States' rights and in the field of Education.

Let me just make a couple of other points. One is that I see this as a kind of a dilemma in the context of what our other speakers have been saying. And that is, the force in inter-governmental relations over the past several years has been away from a strong role for the Federal Government. And you know, I think Bruce originally said, you know, if this had been done 20, 30 years ago it would be a large Grant program, and instead what we have is a voluntary program that promises not to connect it to anything else.

And when you look at the requirements it seems like these requirements are almost reacting to a fear that if the Federal Government actually did anything in a policy sense with regard to this program, that it would be bumping up against this political and institutional States' rights movement in the country, and also the legal approach, the development that has occurred that's protected States' rights.

So on the one hand you have the Federal Government, I think, wanting to do something in policy but putting forward something that really doesn't address the policy questions. It's kind of putting out a seed, as John was saying, and hoping that this is going to sprout into something -- with a lot of uncertainties around it -- rather than putting forward a policy that requires legislation that's going to have money and enforcement, etc., with it.

On the other hand, at the State level there are all kinds of incentives I think -- and these came out already -- to use this in a policy way. So that you're putting this seed into a context that is going to be very politically and policy intense. So that this issue of other uses is not a question, I think, it's a fact. There are going to be strategic high-stakes uses for this.

And so to pretend that there isn't at this point, I don't think makes any sense. That's the dilemma. How does the Federal Government pursue a policy which it really can't pursue, given the political, institutional, legal context? And yet do something that's going to produce a result that somebody really wants, or that the Federal Government wants. And I think that's where the dilemma lies.

Now, in terms of -- and let me just say one other thing about the Federal role. I was very intrigued by that panel, Kati and -- I don't know their last names -- but Kati and Eva, I think were on this panel -- and they were both talking about these uses, strategic uses. And one was looking at it as a risk, you know, we've got to worry about this; and the other was looking at it as, yes, that's exactly what we ought to be doing.

So part of this question is, when you think about risk, you can have risk in two different ways. You can engage in risk by doing something that might cause harm, but you can also engage in risk by not doing something and then allowing a harm to manifest itself or continue. And I think that's what both of them were talking about.

Kati is really concerned about the risk of not doing something about these huge inequalities and huge needs that some of our children have. And Eva is saying, well if we're doing this and it leads to this, that may be a risk in terms of what the stated policy is.

And so there's some sorting out here, I think. That term risk, I think, is problematic in that respect. It's really a policy question that you're facing, and it's going to be decided at the Federal Government or State or local government by one or the other.

Let me just say something about incentives; the kind of context that this goes into. This is a complex issue. You know, it's hard to know where to begin with this. I was yesterday, at a Kellogg Foundation Conference and you know, there were several representatives there from Indian tribes in New Mexico.

And they were arguing against Head Start as a program saying that it was one of the worst things that happened to their tribe, and I thought that was kind of surprising since Head Start I thought, was one of the more successful pre-school programs.

And their point was that it encourages -- or, excuse me -- it discourages the use of Native languages and Native concepts in the development of their early child education. And that that leads to status distinctions and differences among the tribe, and they would like to see Head Start operated in a different way, and they had some suggestions around that.

And they gave several examples of this kind of thing. Now, this is just one small group that is going to have the same kind of problem with this national test, I suspect. It's going to encourage the use of English and knowing English by the 4th Grade among kids on the reservations, etc., and the kind of implications it might have. That's just one group.

When you start looking into this, I think the complex of incentives is really mind-boggling. I think of my own State, Michigan. Governor Engler is in support of this, I think because he's very supportive of Education. It also has political uses that he's going to use this for, I think, in order to promote, you know, Michigan in the world economy and that sort of thing.

But also, he's very interested in taking over poor-performing schools in the State, which he has proposed as a policy. He wants to put them in receivership and have the State government take them over if they don't meet certain test-level standards. You know, there's a different kind of incentive, a different kind of use.

You know, you can go through a lot of these things. My advice I guess, would be for the Department and for those involved in this, if they're going to throw out a seed like this, to at least start mapping out to some extent, what they think those various incentives for use or not use, might be. What States, what localities, what kind of districts are likely to adopt this? What kinds are not likely to adopt it? And is that going to introduce biases and distortions in the way this test is used, or politics that you might not like, based on how it gets adopted?

And I don't know to what extent that can be done. You know, I've sort of been overwhelmed by all the information that I've received in the new policy area just today, but I think it's something that, if I were an administrator, I would be concerned about.

Just finally, since it is late, I do want to say that I've just directed a dissertation that deals with the use of waivers in the health policy area -- studies waivers -- and also just read an article about the States' response to the failure of the Clinton Health Care Reform.

And the conclusion of both of those studies is that initiative by the Federal Government, at least in this case, in the Health Policy area, did have a positive impact on State responses. In other words, the States that actually engaged in waivers -- and this was a fairly careful, statistical study over some years -- responded to key Federal initiatives.

It wasn't sort of like the kind of term of, you know, the Laboratory of Democracy where States sort of come up with these things and do them on their own, but they responded to something that the Federal Government put into place, in both cases.

For example, with the Clinton Health Reform, a lot of States were moving towards various aspects of reform in their own health systems. When that reform failed, a lot of those States stopped moving in that direction. So I want to finish on that kind of encouraging note; that this is a stimulus from the Federal Government, even despite the fact that we're in a States' rights political institutional environment.

I think evidence suggests that this is going to stimulate a number of States and localities to join this or do something in this area that they might not already have done.

MR. ELMORE: Thank you, Jack. I'd like to thank your panelists, and I would also like, just in conclusion, to pluck three or four themes out of the conversation that I think we should take away from this broader understanding of the institutional context of Federalism and inter-governmental relations.

I want to stress that these are addressed to us collectively, and not just to Gary, because I think we've been sort of training our beady eyes on him all day as if he were somehow supposed to produce answers to these questions on the spot. I think these are things that as a professional research policy community, we ought to be agonizing about as this thing moves along, and ought to be participating in understanding.

The thing that strikes me about this conversation -- I've been in and out of this inter-governmental literature over the course of my academic career for a variety of purposes and haven't been squarely in it for a couple of years -- is how important the message is that this is a fast changing environment, and the Federal Government is getting maneuvered into terrain here that it's never been in before.

And that the national test is just symptomatic of just -- it's just one example of this; that it raises questions that we've never confronted in quite this form before. We can talk about where we've been and what the trajectory is and what the uncertainties are, but we don't really know the exact consequences of this -- and let me just particularize this to the field of Education in the following way.

The Federal Government's in an odd position here because it's trying -- this is my metaphor -- it's trying to kind of elbow its way into a game which has been in progress for quite some time. There are States out there that have invested substantial resources on their own in creating their own assessment systems, which will produce data on every student in every school in every district in the State.

There are States that have Standards but don't yet have assessments. There are States like Mississippi that are using off-the-shelf assessments until they can build their own assessments. There are States at all levels of this, but everybody's kind of inching into the Standards game. By the Consortium for Policy Research in Education's last count we had something like 43 States that had declared themselves to be in this game.

Before too long, it's going to be impossible for a State not to be in this game. Along comes the Federal Government; the question is, what's the value-added and what's the role that the Federal Government contributes to this discourse, and how do you tailor the national test so that it lands squarely in this rather fast-moving environment of inter-governmental relations in education in a way that people see as generally constructive and productive and say yes, this makes sense for the Federal Government, they ought to be in this game, and it resonates with me?

I don't think we're quite there yet in explaining the purpose of this, but we can get there. This boils down to a somewhat more specific question which I would like to see us address, just as clearly as we possibly can, which is: what is the -- and it's presented by a range of participants, but by Beryl and Jack most recently -- what is the incentive for a State or a locality to participate in this?

And what are the ranges of incentives that will be operating out there? I think -- I got the sense that people came to this meeting operating under the assumption that everybody was going to want to get in and the question was, how do you control the abuses? Given the state of play out there, it's not at all clear to me that everybody's going to want to be in, especially the States that are high capacity States that have been moving on this issue for some time.

I can see reasons why they might want to be in; I can see reasons why they might want to stay away; I can see them calculating issues like test burden; I can see them calculating issues like consistency with their existing assessment system.

I'm not expressing skepticism about why people would want to be in. I'm just sort of saying, do we understand enough about the relationship between what's going on out there and why a State or a locality would want to participate in this, and what does this tell us about what the test ought to try to do, or how it can contribute to national discourse about performance and standards?

Then I think we need to address, really quite squarely, an issue that Bruce and John raised which is, what's the underlying model here? What's the theory? Is it a kind of cooperative partnership model in which the Federal Government is trying to engage various jurisdictions in some sort of constructive dialogue over the long term about student performance, and play a role in shaping discourse?

Is it an essentially competitive model in which we're trying to generate enough data so that we can get parents to stick it to schools, local jurisdictions to compete with each other, States to compete with each other, etc.? Or is it both?

Now, I'm not saying that the President has to stand up and declare that it's one or the other, but if you don't have some kind of theory underlying this, it's going to be really hard to say what a good system is versus a bad system.

The inherent logic of Goals 2000 always seemed to me to be really strong, which is, you have a very diverse, Federal system out there that's inching its way towards standards-based reform. There's a big problem of variable capacity. One role the Federal Government can play is to put some money out there to heat up the discourse, and especially to get the low-performing States engaged in the debate so that they're not always hanging out there all by themselves.

It's that kind of logic that I think needs to be more explicit in the national tests. Maybe it's there, maybe it isn't; I don't know. I haven't heard it yet.

Another message is that this idea is playing out across very diverse policy contexts. Many of you have heard my pet theory which is that national standards and national tests could very well increase the diversity of outcomes among schools, districts, and States in this country, rather than narrow the diversity of outcomes.

Because when you lay a uniform standard down on a highly -- a system that's characterized by a highly variable capacity, the high capacity jurisdictions tend to capitalize on it and push on, while the low capacity jurisdictions are still struggling to try to figure out what the game is about.

Testing, as several people have said, does nothing about the capacity problem. Part of the theory here is that somehow testing and feeding results back to the system entails the building of capacity. We don't know anything about that and we don't know exactly what to predict the results of such a system would be. We need to know more about the effects of a single test, a single set of expectations, across widely varying political environments and widely varying capacity.

On the issue of licensing, it seems to me that this question of what is governmental and what is not, is a key design issue in a licensing system. Maybe it's clearly defined in the existing set of specifications. I didn't get the sense that it is.

Whether it's clearly defined or not, there's a remaining issue about how well defined it is for the various parties to this transaction, and whether we know enough about how to define the governmental role so that we don't get ourselves into trouble in this kind of a licensing arrangement.

And then Jack's question about licensing which is, can we actually structure the incentives here so that we know what we're getting? Can we figure out why a person would want to participate and why an organization would want to participate in such a licensing arrangement? And how to channel that behavior in a direction that's consistent with the collective ends, rather than the interests of the contractor.

While it is true that we don't know a whole lot empirically about the results of licensing arrangements, we've got a ton of literature on principal-agent theory, which Jack knows well. This may be the first real opportunity in which this literature is actually applied to a public policy problem for some benefit.

But I think this is the classic principal-agent problem, and understanding it a little bit from the perspective of how you define this relationship and how you understand the interests of the various parties to the relationship in a sort of hard-headed way rather than in a way that is characterized mostly by your aspirations for how it will work -- you know, who's actually going to do this work and what are the incentives operating on them, and what's a good result -- which is my last question.

I think those of us in the policy research community need to keep hammering away on this question of, what is a good result? I asked this to a room full of chief State school officers a few years ago, and I even gave them a prompt, you know. I didn't just say, what's a good result? I said -- I gave them a little multiple-choice test.

I said, would a good result be shortening the distance between the bottom and the top performing students in schools? Would a good result be that the bottom comes up but the top can go anywhere? It can continue to go up, it can go up at a faster rate, etc. Would a good result be that a very substantial proportion of the population reaches some standard which would be a high minimum, and we don't care what happens to the rest of the population?

Just choose one or give me one that's better than any of the ones I have given you. The response I got back from them was, gee, we really haven't thought about that. Now, these are people, all of whom were involved in high reform States; all of whom had staked their personal and political reputation on standards-based reform; all of whom were active in a very entrepreneurial way in their States and on the national scene in standards-based reform.

Imagine the answer you would get from people who had little or no involvement in this. I think there is a general lack of discussion of what a good result is, in the political discourse around this. I can understand why political leaders don't want to be explicit about it in public discourse because it can incite a high degree of political conflict.

I don't understand why, in the privacy of, you know, four walls, you can't engage people in a serious discussion of it. I mean, why we haven't thought it through more clearly. I think this is symptomatic of, you know, if you don't know where you're going you're pretty likely to get there, you know, or not to get there, or whatever.

And I think this is a role that the policy research community can play; which is to keep provoking this question about what a good result is, and whether various designs push us in the direction of one sort of result or another. That should be part of the political discourse that we can bring to the table.

I'm sure there are many other themes that you took away from this discussion. Those were a few that I plucked out. Chris, Bob, Mike -- what next?

CHAIRMAN EDLEY: I was just writing Michael a note saying that I had to beg off. I mean, I have to apologize to everybody because as you may know, the President's announcing this big Race Relations Initiative or something, and I'm in the middle of that, so I've been bopping in and out all day in phone calls with White House Staff and 6,000 reporters.

So I don't have enough of a sense, really, of the flow of all the discussion to do justice by way of summation, and was hoping that Bob and Rich would pick up that task this afternoon. But I will caucus with several people this evening and promise to have something that won't be too embarrassing to me, to contribute in the morning to help frame the discussion tomorrow.

So, my apologies.

MR. LINN: (Not on mike) I will -- let me say that I'm sure that Chris would do a better job of summarizing this and not --

(Laughter.)

Fortunately, Bruce McDowell -- assembling a lot of things that transpired today -- here -- out to me that -- three critical questions and might want to say something about those. When he earlier said, which of this team -- testing program. Can we get -- to adopt that -- and third, can --. And put that into the context that a lot of things that have been said in this last session were incentives.

I think incentives haven't been in play --. The first place they've been in play, in my mind, is when we're thinking about the meaning for some kind of -- to encourage the -- and encourage the --.

These incentives that started -- what is the incentive for -- . To imagine some sort of -- regulation -- and in addition to that, this is all --. So if I -- would want to put in any kind of description. Okay, so I think -- and those of us who have -- have heard rhetoric that is similar to what we're picking up on the national test, many, many times you can make mistakes. It happens to be a mistake -- and how many times have you heard, oh yes, the test --

(On mike) Having watched in many States, the lack of any sort of mechanism to look at the downsides and to make sure that you're compensating for, or avoiding some of the misuses of tests, I as a professional, have a mixed feeling here. Can we design a better test? Well, maybe. But there are some realities we hadn't talked about -- the disconnect between theory and reality that Dick brought up earlier in one context.

Well, there's a disconnect possibly, between the nature of the test that is desired -- that I desire, that I think everyone that is going to support this movement desires -- and the realities of what is going to happen in a period from March to May, which is all the time you're going to have to score those tests.

So what does that say about the nature of the examination? How different will it be from tests that are produced by CTB, by Riverside, by Psych Corp, in terms of what it's going to be? So there are a lot of tensions here and I think that we need to worry about the incentives for making sure that we accomplish the goal of having a test that's worth teaching to.

At the same time we don't let that rhetoric capture us and then just produce anything that can manage to fit within that reality. So I'm at a loss, not being a person who deals with regulations and policies and things like that, to know how that's to be accomplished.

There are a lot of people here that know more about that than I do, and I just hope that we'll think some about that whole series of incentives. And as I said, you would have been much better off having Chris summarize something when he wasn't here than let me make those kinds of statements.

MR. KNOTT: I'd like to explain about, what is a good result? This is, at least in Michigan, a very political question. We recently did a major finance reform of how K-12 education is financed, and we've shifted it from the property tax to a State sales tax, and then evenly distributed money across jurisdictions, so a lot of poor, rural jurisdictions which were only spending maybe $2500 or $3000 per student, now suddenly got $5000, or close to that, which became the State-wide standard.

The problem was, is that the State defined as a good result, bringing up those districts but not allowing the suburban, wealthy districts to be as good as they could be, and set a cap on how much money they could raise for spending on education. In other words, they couldn't raise the property tax above a certain amount.

And this is really at the heart, now, of what political controversy is over this finance reform -- or one of the key elements of it is, if you raise up the bottom, at least in this case, of spending, but don't allow the top to go even higher, you're going to get a strong opposition from those high-capacity areas.

So I kind of understand why this isn't on the table in some ways, because it seems like a very political issue. It sounds like a technical question when you ask, what is a good result? But in fact, it depends on who it benefits and who doesn't benefit, and how you answer that question benefits one group differently than another.

CHAIRMAN EDLEY: Bruce, and then John.

MR. McDOWELL: We did a study of that at ACIR just a few years ago, after a lot of this State education finance equalization had been in effect. And the general effect across the country was that the bottom raised up but the top kept pace -- a little bit better than kept pace -- and the actual range grew a bit, because they didn't cap the suburban areas that wanted to move ahead.

CHAIRMAN EDLEY: Okay.

MR. ELMORE: Just a quick response to that. I think it's quite possible that the aggregate effect of the introduction of broad-scale performance testing and Standards, could be a substantial widening in the variability of performance among students, because the policy itself does not contain the information necessary to succeed, right?

My favorite formulation of this is, you don't make the pig heavier by weighing it. Which is, what you're hoping the test entails is a heavy investment in teacher competency, classroom process, and school organizational capacity. When you drop such a system down on a system that has highly variable capacity to start with, you're introducing the same incentives that you described, Bruce, which is the high performers will crack the code and push farther, and the low performers will still be trying to figure out the rules of the game.

I think this is not a trivial question when you're tinkering with policies that set high expectations and that try to deliver information as a commodity for improvement.

CHAIRMAN EDLEY: John, and then George, and then Dick, and then Scott.

MR. SHANNON: I take a more optimistic view of the way that it will play out. It's possible or probable that in the very immediate future, the winners might win a little bit more. But I think over the long run a national assessment will focus like a laser beam on the real, real poor performers and both the national leaders and the State leaders will not be able to ignore it.

So that I think it lays the groundwork for some very surgical, some very specific, remedial action at higher levels.

CHAIRMAN EDLEY: George Madaus.

MR. MADAUS: On Bob's point, another very good -- and Dick's point about increasing -- a good example you should look at is the English experience for the standard assessment test. Thatcher wanted a pretty simple, multiple-choice test to get league tables to rank schools.

The educationers, as she called them, took over and they got a very elaborate performance-based test that they gave first at age 7. And it just blew up in their face, and now they're retreating back to what Bob described as, how is it any different than the regular kind of tests that we're used to?

The other thing about the English experience was, at least at Grade 7 and at least in the first year -- it stopped after that so we don't know what might have happened -- it did increase the variability. It increased it very, very much among schools and it increased it among populations like kids with learning disabilities, kids on free lunch, kids with second language problems.

The data are all there. So you can learn a lot from the English experience, versus the way they did it in Scotland where it was not a high-stakes situation at all, and the teachers bought into it, the parents liked it, and you never got the turmoil that you got in England and Wales. So I think we can learn a lot from that and from, as Bob points out, other State testing programs.

MR. JAEGAR: Gary, I think I've heard two levels of concern expressed here today. One has to do with the national test as a policy and folks who would probably prefer it if the program didn't exist at all.

But setting that aside, there's a second level of concern and that has to do with the proposed operation. And I think that a lot of folks would strongly prefer pretty heavy-handed and strong government control if you're going to do it. If you think about the aspects of NAEP that make it most desirable, it's because it's trusted, compared to a lot of testing alternatives.

And the reason it's trusted is because you have very strong and firm controls on quality, on almost all aspects of the test. And so it also occurs to me that the areas in which the Federal Government ought to be most skittish are those in which you are exercising some control. Those have to do with the content of the tests, the framework, the construction of the items and so on. You have a contractor who's doing that.

Presumably you're going to improve those products. I mean, those have to do with Federal control of Education. It's not the quality control aspects of the testing operation that any State is going to worry about. So it would seem to me that you can afford to be most direct and most directive on those aspects of the testing program that don't have to do with content, but do have to do with quality control. And those are the areas in which you have the most to gain.

CHAIRMAN EDLEY: Okay, I can see the light at the end of the tunnel. Bob Linn and then Mike Feuer, and we'll close by giving Gary an opportunity to say a Benediction on the whole matter if he so chooses.

MR. LINN: I just had a brief comment on John's point on the laser beam. I think that is an important goal, to figure out what it is that's going to be different about this test that would make it actually used that way. But we have to put that in a context of the fact that we know a lot already. Washington, D.C., has participated as a State in NAEP, and look at the performance that has been reported on that, and look at what sort of actions have been taken to change that.

Now, one difference is that this is an individual child test. Chicago, to pick another city at random, reports results on the Iowa Test of Basic Skills on every child, okay. They're not reporting all that great of results, yet there has not been a movement in the State of Illinois to really rectify that in the laser beam sense.

Okay, that's different; it's not standards-based; it's not the kind of test that we're talking about. But how are we going to ensure that this test is enough different to really make sure that it serves that function of the laser beam? I support that very much but I don't think we should be naive that it will just happen. I'm not suggesting you're naive, but.

MR. FEUER: I want to suggest that some light bulbs went on over our heads here today, and I think these are light bulbs that can really shine some new light on the question that you came to us with, which is essentially about the licensing component of this whole program.

And I just want you to know that I sympathize because I sense there's a certain kind of Casey Stengelism going on here. Casey Stengel, as some of you New Yorkers might recall, when asked what time it was, would usually give a long history into the technology and development of the wristwatch.

There's even a variation on the Casey Stengel problem here, because you've asked what time it is and we're asking -- and some of the answers have been, why do you even want to know? But I think what we've gotten here in looking at the history and technology of the wristwatch, at least partly, is some really important information that should go into the development of this model for licensing.

As I understand it, there are -- and I think this is the insight from Jack and others -- there are risks of action and risks of inaction, and there are benefits from action and there are benefits from inaction.

The Federal Government has taken a gamble that benefits from action -- in this case from this test -- will outweigh the combined risks or costs of inaction from just, as Kati Haycock said, letting things go as they are, plus the risks of action that are associated with whatever happens as a result of this test.

I think that's a useful way to frame the discussion for tomorrow, Gary, which ought to be, how can the specifics of this licensing model that you have started to lay out -- or variations to that model -- be tailored so as to at least raise the probability that the gamble will come out right?

And of course, I can't guarantee that as a result of all of this deliberation, you in the government may not want to sort of rethink the gamble. But that's not really our business here today or tomorrow.

So I would really hope that everybody comes back tomorrow thinking about how to put all of this very important and useful and rich information back into the context of the first sketch of the licensing plan, which is in these handouts. And I would ask you then, to say anything you'd like before we turn off the microphones tonight.

MR. PHILIPS: First, I just want to thank you for an excellent meeting, and this has been a good one. I know, some of you sort of came up to me and said, you know, are you okay? As if I was having some problems. Actually I'm not, I'm very comfortable. It's been a great meeting.

The issues that you've brought up are exactly why we wanted to have this meeting. I mean, you're right on target; it's exactly what we wanted to talk about. And --

MR. FEUER: We're not coming back tomorrow.

MR. PHILIPS: Really, it's been very good. Another thing about this, too, is that, you know, we're really I think, in the middle of making history here, and history is messy. It's not -- you know, when you read about it in the history books it's one thing, but the way it's actually done -- this is the way it's actually done.

And you know, if this does go forward, and all indications that I see are that it will, this is going to be historic. And there are issues, you know, that we need to deal with, and there will be decisions that need to be made and compromises that need to be made. So it's a difficult situation and it's not easy.

But I do think that we are making history here. This reminds me -- this is on a grander scale -- of meetings I used to have on NAEP when we decided to do State-by-State assessments. Lots of people said there's just no way this is ever going to work. States are never going to sign up, it's not the right thing to do, it's the wrong policy decision. And we know that that turned out to be -- that worked, that seemed to work.

Maybe this might go the same route, I don't know. I think, my instinct is that this is a great idea and that the timing is just about right. And so that's my instinct. But there are lots of issues that need to be discussed, and I'm glad that we're discussing them and I want to encourage you tomorrow to continue with this.

CHAIRMAN EDLEY: Okay. Can some BOTA members caucus for a few moments over here? And thank you everyone, and get a good night's sleep.

(Whereupon, the meeting of the Board on Testing and Assessment was concluded at 4:41 p.m.)

To provide feedback about this web page, contact:

Jane Phillips, jphillip@nas.edu

Page Last Updated: 09/16/97 12:03:21 PM


Copyright (c) 1997 by the National Academy of Sciences. All rights reserved.