Chapter 8

From Testing Malaise and School Accountability to Neo-Vygotskian Approaches (1981-2002).

Paul F. Ballantyne

Starting in 1981 a considerable professional malaise regarding the content and procedural aspects of ability testing was in evidence. [172]   The content aspects (such as how to define human intelligence, and how to guide the design of better ability tests) had long been under debate and were now regarded by some as unresolvable.  The procedural aspects, however (such as how best to apply or interpret the results of already existing standardized tests in accordance with recent legal and professional guidelines), were under renewed disciplinary, legal, and media scrutiny.  This testing malaise became particularly acute with regard to higher-profile issues such as: (1) addressing the professional implications of ongoing class and racial bias of existing tests, and (2) obtaining expert opinion on the newer, largely extradisciplinary, debate over the fairness of existing higher educational admissions tests.

Between 1981 and 1983 it appeared that the use of standardized testing (at least in the arena of higher education) would be severely curtailed due to the efforts of testing critics who had already exposed ETS tests (like the SAT and GRE) as a specialized form of fraud (Nairn, 1980).  Had the expected downsizing of test use by universities actually occurred (voluntarily or by way of legal action), it would have necessitated a brief, uncomfortable period of disciplinary adjustment and theoretical housekeeping in psychology (Mayr, 1991).  This adjustment would then have been written up in the history books as a logical outcome of hard-won battles for social justice during the previous era of equality (1964-1980) -in which all of the necessary federal protections, funding arrangements, and legal precedents had been set down to promote both fair treatment in the workplace and equality of opportunity to education. [173]

What was supposed to be a short period of apprehension, however, turned out to be a protracted period of increasingly general discomfort.  The initial malaise over testing, that is, did not dissipate but grew steadily worse as the federal government (under Reagan and Bush) ushered in the so-called era of public school accountability (1981-2002).  Despite both the veracity of past testing critiques and costly Equality era litigation (which stunted the growth of vocational and educational testing during the mid-late 1970s) an unanticipated expansion of standardized tests in public schools went forward unabated by debate over the ethics of applying a free market model of competition to the nation's schools.  ETS testing for higher education was also thereby granted a new lease on life.

The predominant mid-1980s disciplinary reaction to this politically mandated Accountability era was what Paul Kline's Psychology Exposed (1988) has called tactical eclecticism.  Comments on the significance of ongoing "race differences" in test performance or on the origins of human intellect itself, and especially on the ethics of expanding testing in public education, became intentionally tentative, subdued, ambiguous, and noncommittal.  The logical and disciplinary contradictions between the content aspects of testing history and the ongoing procedural-administrative use of tests (to assess students, schools, and teachers) were disturbing to be sure, but what to do about them in the wider politically correct climate of accountability was treated as an entirely different matter. [174]

While grappling with these disciplinary dichotomies, Linn (1986) concluded:

"Although testing has served...important educational functions, we are still a long way from reaching Scarr's (1981) goal of ensuring that testing is 'always...used in the interests of the children tested'...This goal implies the simultaneous pursuit of excellence and equity. Achieving this goal will require a better scientific understanding of [human intellectual development] and the construct validity of our measures of those processes" (p. 1159).

Another important analytical bridge between the two sets of testing discourses was provided in Thorndike & Lohman's A Century of Ability Testing (1990).  As is usual with such insider histories, that book (for the most part) highlighted the procedural aims and accomplishments of past testing traditions. [175]  Yet Thorndike & Lohman also ended their celebratory account by recognizing the potential of Neo-Vygotskian approaches for expanding the content and procedural aspects of ability testing.

While such attempts at inclusion from within the testing subdiscipline were the exception up to that time, by the late 1990s legal and legislative events again intervened to create a crisis which impacted upon both aspects of testing.  That is, when various states began to outlaw Affirmative Action (the primary institutional means of ensuring ethnic diversity on college campuses) -originally adopted as a corrective device for the bias of standardized admissions testing- the existing testing malaise became further exacerbated and generalized.  As Richards (1997) put it, the sustained disciplinary cult of impotence had finally caught up with us.

With a few notable exceptions (e.g., the interactionist accounts in Sternberg & Detterman, 1986; and various under-recognized Neo-Vygotskian approaches), the expected serious and communal act of theoretical housekeeping (regarding the content and procedural aspects of testing) was tactfully deferred until after the Accountability era played itself out. [176]   Having now arrived at that historical juncture, we must ask: Were the sporadic revised interactionist accounts of human intellect (in 1980-90s book-length anthologies) and the new consensus on testing procedures (as outlined in special issues of American Psychologist) sufficient to address the current disciplinary crisis of relevance?  The answer, I think, is no.

The saving grace is that alternative solutions have been put forward.  The best of these have not only criticized the interactionist attempt at historical synthesis as unconvincing and inadequate to the requirements of a truly democratic society but have also put forward prescriptions for improving the state-of-the-art of testing itself.  The classical Vygotskian approach to human intellect, in particular, is utilized here to indicate where the next wave of assessment techniques is likely to come from and to what theoretical or practical aims they might be addressed. 

In contradistinction to insider aims of testing for testing's sake, or of limiting discussions to administrative utility (rather than content), these aims will surely include: (1) outlining a truly developmental approach to intellectual assessment; (2) establishing a lawful approach to vocational selection or training policies; (3) setting up a Constitutional approach to ethnic diversity in higher education; and (4) providing defensible means of curricular assessment in public education.  Interactionism and its long-standing system of government-sanctioned psychometric testing procedures have failed us on all four of these fronts.

Chapter Overview

Section one details how the proposal and implementation of the so-called era of public school accountability was driven by the free market corporate-oriented ideology of the federal government under presidents Reagan and Bush (1981-1991) and not by the historical facts of psychometric testing outlined in prior chapters. The impact of A Nation at Risk (1983) and the Draconian 1988 amendments to the Elementary and Secondary Education Act (1965) are both outlined.  Other state and federal developments including: (1) funding for Charter school programs during the Clinton years (1992-1999); (2) the ongoing abandonment of former Affirmative Action policies for higher educational admittance; and (3) the planned continuance of testing under G.W. Bush, are also given consideration.  The dismal record of the school testing boom (in promoting educational excellence and fair access to higher education) is then outlined with reference to recent works by Sacks (1999) and Lemann (1999).

Section two highlights more subdisciplinary and disciplinary concerns.  The strengths and weaknesses of revised (a.k.a. dynamic) interactionism as a description of human intellect are outlined.  It is pointed out that under revised interactionism, so-called expert opinion (as evidenced in the content of 1980s-90s anthologies) had hardly changed from the traditional (yet highly problematic) interactionist position of forty years earlier.  In other words, the genes plus environment fallacy was still held and was reflected in both: (1) individualist (rather than social or societal) portrayals of human intellect; and (2) the additive (rather than transformative) empirical testing procedures being utilized.  The approaches of Zigler (1986), Snyderman & Rothman (1988), Flynn (1984-94), and Gould (1996) are all mentioned in this regard.

Finally, in section three, the utility of Vygotsky & Luria's cultural-historical account of mentality is highlighted not only as (1) a means to better understand the origins of the "Flynn effect" of rising average test scores; but also as (2) an indication that we must explicitly adopt a heretofore under-utilized transformative (Neo-Vygotskian) account of human mentality if we are ever to improve upon contemporary standards of ability testing.

Section One:

Cultural and Testing Malaise during the 1980s & 1990s

This section covers a period in which: (1) the public schools were used as a political scapegoat for an ailing American society; and (2) the progressive spirit of the Elementary and Secondary Education Act (1965) was compromised along the regressive lines of conservative corporate-oriented ideology in order to mandate an unjust and expensive era of high-stakes testing.  Given that the so-called era of accountability was dictated by political ideology and not the proven empirical worth of existing standardized testing procedures, we should know something about the federal administrations that brought it into being before outlining the means by which it was implemented.

From Carter to Reagan: malaise to the self-deceit of easy illusion

As the 1980 presidential election approached, the country was in a state of profound gloom. [177]   This national malaise played into the hands of the right-of-center Republican Ronald Reagan who campaigned against both Jimmy Carter and against the Democrats as the party of failed solutions (Neustadt, 1990).  Reagan's skill at confidently proposing simple-minded solutions to complex societal and economic problems was first learned during a long career in the entertainment industry and as a public relations officer for General Electric.  This glowing public persona had served him well during two terms as the Governor of California and (in 1980) it was just what a sizable majority of the voting public wanted in their next president.  To this group of voters, the substance of Reagan's campaign (aside from his promise of no new taxes) didn't matter.  Given the choice between ineffectual leadership in Washington and the self-deceit of easy illusion they voted for the latter.

Reagan's conservative convictions, however, and his so-called grassroots citizen politics were particularly appealing to the religious right.  Reagan, who was both divorced and only nominally religious, garnered considerable support among a movement of well-organized fundamentalist Christians who believed the nation to be in a state of moral degeneration. [178]   In August 1980, Reagan appeared in Dallas at the national Moral Majority rally to indicate his endorsement of their cause.  By the time of the Republican national convention, the Moral Majority had taken control of the party's conservative wing and had managed to register 4 million new voters that year (Georgianna, 1989).

Reagan promised to restore American government to its pre-depression era of simplicity (i.e., when individualism, self-reliance, and a free market economy were the American way of life). [179]   The two preceding decades, in particular, Reagan argued, were a period of egalitarian excess and of excessive cuts to national defense industries (Amaker, 1988; Johnson, 1991).  According to the gospel of supply-side (a.k.a. trickle-down) economics, tax cuts and government downsizing would promote increased investment which would in turn stimulate economic growth and higher tax revenues (Canto et al., 1983).  An anti-tax message and a plan to roll back welfare were at the core of his successful 1980 campaign. [180]

Reagan on education

In the area of education Reagan was an opponent of the so-called public school monopoly.  On the occasion of the release of his policy statement on education, A Nation at Risk (1983), Reagan said: "Our educational system is in the grip of a crisis caused by low standards, lack of purpose, and a failure to strive for excellence.  Our agenda is to restore quality to education by increasing competition and by strengthening parental choice and local control" (April 26, 1983).

This so-called Excellence in Education movement stressed the use of high-stakes testing in order to assess the performance of schools along the corporate model (Cuban, 1993).  Much like a failing business, the poorly performing schools would be shut down and reorganized.  The particulars of how the Accountability era was implemented will be covered shortly but it should be pointed out up front that the rhetoric of excellence and the succeeding reality of testing in the schools were two different things.  That is, during a period of wider fiscal belt-tightening and social program cutbacks, State and local tax revenue fell dramatically and federal help for education was not forthcoming.  School building maintenance, curriculum modernization, and serious funding for addressing the rise of heavy drug use (and gang violence) in schools were all neglected (Wilson, 1985; Fukuyama, 1999).

Proposal and implementation of high-stakes testing (1983-2002)

Traditionally, American public schools had educated citizens to live in a democracy.  They were the melting pot in which immigrants embraced the American dream and they were at the forefront of the struggle for equality (Spring, 1994).  But Reagan now blamed Civil Rights enforcement for hurting basic education over the prior twenty years: "The schools were charged by the Federal Courts with leading in the correcting of long-standing injustices in our society: Racial segregation, sex discrimination, lack of opportunity for the handicapped.  Perhaps there was simply too much to do in too little time..." (June 30, 1983).

Reagan's argument at this time was that American schools did not need vast new sums of money as much as they needed a few fundamental reforms.  In accordance with this rationale, the federal government scaled back its funding role in education, shifting the burden of reform to state and local authorities.  Thus, in the mid-1980s public schools were asked to compete in a business-driven world where their corporate bottom line was performance on standardized tests.  This policy shift was proposed and implemented in three steps: (1) publication of A Nation at Risk (1983) which announced an educational crisis; (2) Draconian 1988 amendments to the Elementary and Secondary Education Act of 1965 (which gutted the spirit of this Equality era legislation); and (3) implementation of standardized ETS tests in the schools.  Reagan was long gone from the political scene by the time the pedagogical results of this testing boom were finally available.

A Nation at Risk

Near the end of Reagan's first term, a landmark government report called A Nation at Risk: The Imperative for Educational Reform (1983) was widely distributed and covered extensively in the press.  It was authored by the specially formed National Commission on Excellence in Education and funded under the auspices of Education Secretary Terrel Bell.  This report was a marvel of alarmist propaganda and mobilized both military and corporate analogies throughout:

"Our Nation is at risk....while we can take justifiable pride in what our schools and colleges have historically accomplished...the educational foundations of our society are presently being eroded by a rising tide of mediocrity that threatens our very future as a Nation.... If an unfriendly foreign power had attempted to impose on America the mediocre educational performance that exists today, we might well have viewed it as an act of war.... We have, in effect, been committing an act of unthinking, unilateral educational disarmament... Knowledge, learning, information, and skilled intelligence are the new raw materials of international commerce and are today spreading throughout the world as vigorously as miracle drugs, synthetic fertilizers, and blue jeans did earlier.... Learning is the indispensable investment required for success in the information age we are entering..." (COEE, 1983).

By referring to a supposed "unbroken decline" in average 1963-1980 era SAT scores (Wirtz & Howe, 1977), Risk claimed that the Equality era schools, by adopting "minimum requirements" (see Pipho, 1978) and by making gratuitous course choice options available to students (instead of sticking to the traditional "standards" and "basic curricula" of the past), had wasted away the so-called "competitive edge" of American schools achieved during the prior Sputnik era. [181]   This appeal to a long decline in SAT scores was somewhat disingenuous because it flew in the face of the 1980 response of NEA to the College Board's original (1977) claim about the span and reasons for the decline.  It was also eventually pointed out that the de facto SAT score decline was concentrated primarily in the 1970s, a time of tremendous cultural turmoil in America, and had begun to subside near the end of that decade (Stedman & Kaestler, 1991; Fukuyama, 1999).

When one considers both the political contingencies behind the formation of the COEE and the timing of the report, it is not surprising that the real culprits of educational erosion (including a crumbling school infrastructure and increased drug use among students) would not be addressed by the new federal mandates.  For instance, while the issue of poor textbooks was mentioned in Risk it was paired with a claim that the fault for this lay in the liberal dumbing-down of the curriculum (rather than the lack of funding for modernizing such teaching resources).  With regard to the timing of the report, the committee had clearly been given an 18-month period (from August 26, 1981 to April 26, 1983) to produce a report that would be in the households of America just in time for Reagan's upcoming 1984 re-election bid.

Despite proclamations in the opening lines of the report to the contrary, the schools would indeed now be used as a political "scapegoat" upon which to blame the ills of the American economy.  The advocacy in Risk of a back-to-basics curriculum emphasis was also very much in keeping with the Reagan administration's nostalgic and overly simplistic approach to genuinely complex educational issues.  Finally, the single mention in the report of the "twin goals of equity and high-quality schooling" must be viewed as merely rhetorical because it was a wholly anti-equality, regressive reform mandate that was being set into motion.  While this report brought the issue of education to the forefront of political debate in the 1984 election, it also gave political impetus for the adoption of standardized testing technologies even as those tests were becoming known to be of dubious utility.

According to the logic of the report, higher test scores translated into smarter workers, a growing economy, and superior international competitiveness in the global economy.  Thus Risk recommended that: "Standardized tests of achievement...should be administered at major transition points from one level of schooling to another and particularly from high school to college or work....and administered as part of a nationwide (but not federal) system of State and local standardized tests" (COEE, 1983).

Both educators and elected State officials would be held responsible for accomplishing this federal reform agenda.  This policy statement became a veritable New Testament for the modern accountability movement both during and after Reagan's second term (see fig. 56).

Figure 56 Reagan meets with the COEE in 1984 after his re-election. Members of the committee included: 4 University or College Presidents; 1 School Board President; 1 Bell Telephone Chairman; 1 Commissioner of Education; 3 High School Principals; 1 President of the Foundation for Teaching Economics; 1 President of the National School Boards Association; 3 University Professors (Physics, Mathematics, Languages); 1 private consultant; 1 member of Virginia State Board of Education; 1 Former Governor; and 1 Superintendent of Schools for the State of Minnesota (photo from Tyack et al., 2001).

Title 1 Amendments (1988):

The era of high-stakes testing was taken one step farther in 1988 when the accountability section of the Elementary and Secondary Education Act (1965), called Title 1, was re-written to impel States to adopt standardized tests as a means of measuring the results of school reform (Madaus, 1994).  This was a fundamental corruption of the progressive intent of the Act and of Title 1, so a brief elaboration is necessary.

The 1965 Act was originally used as a financial 'carrot' to the 'stick' of the Civil Rights Act (1964) which had impelled integration of the public school system (Heubert, 1999).  Title 1, in particular, was written up to help financially strapped schools (especially those with high concentrations of poor and minority children) to show their need by way of norm-referenced monitoring (such as average socio-economic status of parents in their school district; percentages of visible minority students; or Iowa Test of Basic Skills scores).  It was a well-intentioned inclusion in the Act which aimed at assuring federal funds to schools which integrated and to rural or urban schools in need (Katz, 1968, 1971).

In 1988, the loose Title 1 accountability provisions were sharpened to require standardized testing.  Local schools were now required to develop desired outcomes for Title 1 funds with results to be measured by standardized test scores.  Schools failing to meet test score objectives (set by the state) were required to submit a program improvement plan to federal authorities.  Thus, the progressive funding incentive of Title 1 had become a Draconian barrier to funding access particularly for those needy and failing schools for which it had originally been written.

Implementation and costs

Standardized testing in schools has assumed an even greater dominance since the time of the Nader report (1980).  The adoption of educational competency tests in particular, however, was partly a continuation of a pre-1981 trend.  From 1976 to 1980, the number of states requiring some form of minimum competency testing (MCT) increased from 8 to 38 (Learner, 1981).  This rise was merely the first sign of what was to come later (Linn, 1986).  By the end of 1981, just about half the states had adopted mandatory public education testing programs, and by 1998 all but two had done so (see Sacks, 1999; McGinn, 1999).

In the new Accountability era, each local school and each district would have to prove to the taxpayer that its schools deserved the state and federal funding they were receiving.  The implications of implementing higher standards were first felt at the local school level and then at the district and state levels as a formalized state-by-state test comparison system was established.  In turn, when the results of these comparisons became known in each school district and in each state, calls for further school reform and for further school choice were forthcoming.

In the years immediately following A Nation at Risk (1983), 35 states adopted highly political school discipline and austerity mandates.  Tougher standards on the local school and district levels initially translated into get-tough grade policies setting minimum GPA requirements for participation in so-called extracurricular activities such as music and sports programs.  This emphasis on the narrow band of academic education was a highly characteristic public relations bracing-tactic on the part of public school administrations in preparation for the eventual coming of age of standardized school-by-school testing.

After 1988, test scores from nearly every school and district in the country were collected and compared annually under the new Title 1 Evaluation and Reporting System (TIERS), which permits comparison of results by State, region (urban, suburban, rural), type of school (private, public, charter), and level (elementary, middle school, high school).  By 1997, $200 million was being spent annually on these public school testing programs alone (Sacks, 1999, p. 12) and by 2001, this monetary cost had risen to $500 million.

Aside from the monetary testing costs, there have also been curricular, social, and individual costs.  For one thing, test preparation and test taking began monopolizing about one month of each school year (Frederiksen, 1984).  With test scores being published in the newspapers and school budgets contingent on those scores, property values also began to be linked to the results of testing.  Further, the individual costs for those who failed to meet the new test standards for graduation from one grade to the next were considerable.  While individual school averages would be bumped up by such retentions (due to practice effect and extra test coaching for retained students) it is notable that Sacks (1999) suggests that many retentions are still made on the basis of district or state test scores (typically in the third, eighth, and twelfth years) rather than actual attained school course work grades.  Such practices have led to numerous court challenges (see Sacks, 1999).

Early challenges to the 1979 Florida State (MCT) test determining the award of high school diplomas (e.g., Debra P. v. Turlington, 1981; 1983) provide an example.  The 1981 ruling required that a massive study of the "instructional validity" of the State Student Assessment Test-Part II (SSAT-II) be conducted.  On the resulting evidence, the Court of Appeals in 1983 accepted the SSAT-II as a fair test of what was actually being taught in Florida classrooms (Linn, 1986).  The score cut-offs for the Florida test were also under a highly politicized revision during this period (resulting in a decrease in test "failures" on their MCT from 6% in 1979 down to 1.4% in the graduating class of 1983).  The ethics of such educational testing was also under some consideration by Messick (1980, 1981, 1984).  This pattern of legal, empirical, political, and ethical consideration of MCT test implementation would soon be repeated in other states too (see Sacks, 1999).

Despite these procedural advances in psychometric test validity assessment and the related politically guided damage control criterion adjustments, both the narrowing effect on the curriculum (in educational districts which teach to the test) and the negative psychological impact of grade retentions themselves came to be increasingly criticized in the media by both educational theorists and student activists themselves (see Linn, 1994; Kreitzer & Madaus, 1995; McGinn, 1999).

Ongoing Politics of School reform

During the mid-late 1990s there was further pressure put on the public school system by way of calls for expanded private school voucher systems (Doerr et al., 1996; Dwyer, 2002); from highly publicized so-called competition from charter schools (Nathan, 1996; Finn et al., 2000); and from calls for public support of religiously based home schooling.  The main consideration here is whether these new school choice options actually posed a serious threat to the public school system.  The short of the story is that they did not.  Instead, these choice options tended to be adopted on a limited term basis and applied topically in areas where (or toward student populations for which) the public school system had already failed. [182]

By the 2000-2001 school year, for instance, 90 percent of school-age children still attended traditional (although now highly test oriented) public school institutions.  Students using publicly funded vouchers (in Milwaukee, Cleveland, and Florida) constituted only 0.03 percent of the national school-age population and 2.5 percent were schooled at home.  Similarly, public schools numbered over 90,000 and charter schools numbered 2100, with only 173 of these being run by for-profit companies (Tyack & Anderson, 2001).

In the 2000 election, both George W. Bush (Republican) and Al Gore (Democrat) advocated a continuance of the standardized testing movement in public schools.  Most notably, however, the political minefield of higher education admissions policy was downplayed by both candidates because it was hard to gauge how voters felt on affirmative action.

Upon winning the election, Bush then announced on C-SPAN that: "Educational excellence for all is a national issue and of this moment is a presidential priority... Children must be tested every year in reading and math... Not just in the third grade and the eighth grade [as was done under the Clinton administration], but in the third, fourth, fifth, sixth, seventh, and eighth grade..." (January 23, 2001).

By January of 2002, Bush had signed into law the reauthorized Elementary and Secondary Education Act (or as he prefers to call it, the "No Child Left Behind Act" of 2001).  The act, while increasing federal funding to education by some 40 percent, also mandates the testing of every school child in America (in reading and math) from the third to the eighth grades.  Since that time, the New York Times Magazine (April 7) has run a cover story on the growing "Class War Over School Testing" which, among other things, indicates that a middle-class backlash against testing may soon be underway on the grounds that generalized school testing is actually pulling down the educational standards of better funded school districts which will now be forced to teach to the test (Traub, 2002).

While the recent increase in federal funding is welcomed by all, the debate over school testing itself will surely continue as the breadth of its application expands yet again.  Educators and psychologists now have an unavoidable obligation to provide the public with informed opinions regarding the past results and likely impact of continued testing on the quality of public education.

Analyzing the Results and Prescriptions of the Testing Boom

The assumptions and prescriptions of A Nation at Risk (1983) have had a free hand in the ensuing years.  Has this unexpected era of testing boom actually raised the pre-Risk level of academic achievement in public schools while ensuring equality of opportunity as promised?  The results have clearly been mixed and any analysis of these results requires bearing in mind the underlying issue of the mission of public education (and public higher education in particular).  To this end, after looking at Sacks's (1999) critical analysis of the results of Risk, the recent tenuous status of SAT, GRE, and affirmative action based admissions policies for higher education will be outlined.

It will be argued that the past costs of testing (described above) have by far outweighed the meager observed benefits (described below).  Further, it is noted that the assumptions of the post-Risk school testing initiatives and those of the longer-standing (but currently ailing) higher educational admittance tests are virtually identical.  This fact is used as an indication that both of these forms of testing will eventually peter out over the next decade or so.  That is, even under the new conditions of improved federal funding, it may be that the considerable state-funded efforts currently going into the blanket application of standardized testing initiatives would be better spent on ensuring the equitable allocation of the kinds of basic (infrastructure, curricular, teaching ratio, library, and technological) resources that have historically been shown to be required for actually promoting quality public education.

Did testing produce improved performance?

As noted at the outset of this chapter, groups that had historically lagged behind in access to quality public education had already been afforded specific legislative attention during the Equality era of public schools (1964-1980).  So, by 1983 the public school system was clearly doing a better job in this respect than at any time in the past (Heubert, 1999).  But in that year, the past emphasis on equality of access was swept away in favor of concerns over the lagging economic competitiveness of America, which was in part blamed on the schools.  In addition to the earlier mentioned false reference in A Nation at Risk (1983) to a sustained decline of SAT scores corresponding to the Equality era, there are other indications that: (1) the primary claim of contemporaneous curricular degeneration; and (2) the assumption that high-stakes testing would improve academic attainment were both ill-founded.

In Standardized Minds (1999), Peter Sacks critiqued the assumptions and actual pedagogical outcome of the Risk agenda.  If there were deep problems with American educational attainment and skills in the early 1980s, one might expect that by 1994 (during a period of economic boom) the reforms would have brought about higher levels of educational attainment.  But the numbers do not bear this expectation out.

First of all, according to the National Education Goals Panel: In March 1979, 85.6% of Americans (age 25-29) had completed high school.  By the mid-1990s, this number had risen only 1% (see NAEP, 1996).  Similarly for higher education, in 1979, 3 in 10 Americans (age 25-29) had obtained a bachelor's degree and this percentage had not changed by the mid-1990s (Sacks, 1999). [183]   As Sacks points out, therefore, contrary to the logic of the American school bashers, "the U.S. economy [was] hardly at the brink of ruination because of a dysfunctional education system" (p. 86) and in any case the schools certainly were not subsequently credited with the improved economy of the mid-1990s.

Secondly, given the costly imposition of accountability testing programs under Title 1 amendments, one would expect to see an improvement in measured performance if the equation implied in Risk between smarter students and measurable testing results is to hold up.  According to the periodically gathered National Assessment of Educational Progress (NAEP, 1996) data, however, such an equation does not hold up. [184]   Math proficiency and science achievement comparisons from 1973, 1982 and 1992 remain steady, suggesting that: (1) prior to Risk, there was not a "crisis of mediocrity;" and (2) the imposition of expensive testing programs did not improve overall public school academic performance in any case (Sacks, p. 84).

Further, NAEP data also indicate that up to 1995, the four lone states which had not yet adopted a high-stakes accountability testing program still met or surpassed the national 1994-95 average requirements for math and science achievement.  Hence, the politically charged accountability rhetoric in Risk may have been a complete red herring in the first place, and adopting high-stakes tests alone does not necessarily lead to higher overall educational quality for the nation (Sacks, 1999, pp. 88-93; Lemann, 1999a).

At the state level, one valuable lesson learned over the past few years is that the use of off-the-shelf standardized tests (e.g., the Stanford 9; Iowa Test of Basic Skills) as a means of assessing, accrediting, and monetarily rewarding public schools has not produced the desired gains.  The main advantage of these commercially available tests is that they are cheap to administer and score.  Harcourt Brace's Stanford 9, for instance, costs only $6 per student to administer.  The disadvantages are that these tests: (1) were originally constructed (and have been successively revised) so as to produce results which fall along an ideologically loaded and discriminatory "normal" statistical curve (with non-discriminating items being routinely thrown out); and (2) being intended for a national marketplace, do not necessarily reflect the content of the courses taught in the state schools.

This situation may change as high profile states such as California, which has the nation's largest public school system (and which tied Louisiana for last on the 1995 NAEP), switch over to state specific (ostensibly instructionally valid) tests, but only time will tell if the $20 million per year (per state) price tag for doing so will produce the desired ends (see Merrow, 2002).  I suspect, however, that (despite the current administration's support of a further expansion in testing) there will eventually be a political backlash against it on the grounds that the costs outweigh the observed gains.

A clear indication of the likelihood of a backlash against public school testing is to be found in the renewed controversy surrounding the issue of standardized testing for higher educational access.  Indeed the ongoing Accountability era testing boom in public schools has modeled itself on a system of higher educational testing that was on the verge of failure prior to being given a short-term lease on life by way of Risk.  The politically charged cycle of events there will likely be repeated in the arena of public school accountability so some degree of elaboration is necessary.

Tenuous status of the SAT, GRE, and Affirmative Action

The case for abolishing the SAT and GRE tests as techniques for higher educational admittance (e.g., Crouse & Trusheim, 1988) is finally starting to become well known.  James Conant (one of the architects of the modern American educational system) had considered the adoption of the SAT and GRE for college admissions to be a great equalizer.  He viewed the tests as an objective, uncontaminated measure of merit; that is, as a fair and reliable measure of ability to succeed in higher education (Lemann, 1999b).  They are not (e.g., see Steele & Aronson, 1995; Steele, 1997; Spencer et al., 1999 on stereotype threat in intellectual test performance).

Conant also believed that standardized tests would function to cancel out the economic advantages that parents traditionally pass on to their children by sending them to better schools (which had been favored by pre-W.W.II higher institutions of learning).  They don't.  Instead, incredible energies and money are now annually put into preparing well off students for these tests so that the haves in America ensure their children will be haves also (Lemann, 1999a).

Ninety-seven percent of students writing the SAT now use some form of test preparation.  But hidden within that universe of testing preparation is a world of socio-economic difference.  Those with sufficient economic means pay 500 dollars an hour for private tutors, those of moderate means pay for group prep courses or purchase CD ROMs (from Princeton Review or Kaplan), and those without means attend self-help study groups or free courses at their local high school (which themselves vary in quality from region to region).

While it is still claimed that the SAT can predict 15 percent of the variance in grades in the first year of college (and similarly that the GRE can predict 11 percent of the variance in first year graduate course grades), it is no longer claimed by anyone that these tests have any predictive validity for selecting those students who will complete a degree or go on to make real world contributions in any given area of expertise.  Instead, it is now realized that these standardized admissions tests function to reproduce the class system (from generation to generation), not turn it on its head (Lemann, 1999a).  Better educated and prosperous parents produce children who score better on standardized tests.

Ironically, these relatively recent realizations are in part due to the political fallout surrounding a conservative backlash (from 1985 onward) that has successively struck down former State run Affirmative Action initiatives (which had tended to utilize different test score cut-offs for students of different ethnic backgrounds).  These realizations also result from the ongoing institutional search for other means of promoting ethnic (and class) diversity on college campuses.  As Sacks (1999) put it: "In one of the great ironies of recent American history, the very existence of affirmative action itself -and the recent legal and popular attacks on it- will force the hands of educational decision makers regarding the real utility of gate keeping tests. Only then will the prevailing merit system be officially rendered obsolete" (p. 284).

In July of 1995, public acrimony over a two-tiered gatekeeping system moved the California Board of Regents to vote to phase out (over three years) all consideration of ethnicity and gender in admissions.  A year later California voters passed Proposition 209, which officially ended Affirmative Action in public agencies and higher education.  By 1998, at UC-Berkeley, the number of black students admitted to the freshman class had dropped 60 percent.  Hispanic admissions were off 40 percent.  At UCLA, UC-San Diego, Davis, Irvine, and Santa Barbara, black admissions had dropped anywhere between 14 and 46 percent.  Hispanic admissions had also declined by 9 to 33 percent (Sacks, 1999).

Ward Connerly and Terence Pell (of the Center for Individual Rights based in Washington, DC) have sued public universities in the states of Texas, Michigan, and Washington, claiming that affirmative action programs discriminate by applying different test score standards to different races. [185]   Their fundamental argument is threefold: (1) that these universities are faced with real disparities in standardized test scores (along racial lines); (2) that any institution using standardized tests in its admissions process has got to find a way around those disparities if it desires ethnic diversity; and (3) that the typical way to do so is by the illegal use of racial preferences (Connerly, 2000).

In Hopwood v. State of Texas, plaintiffs including Cheryl Hopwood sued because the University of Texas School of Law systematically passed White students over in favor of ethnic minorities with lower LSAT scores.  The court ruled (in March 1996) that the School of Law's use of race as a factor in its admissions equation was prohibited by the U.S. Constitution.  The weight of the Hopwood decision was then felt in other legal challenges.  The University of Michigan's law school (Grutter v. Bollinger, et al.) and then its undergraduate College of Literature, Sciences, and the Arts (Gratz v. Bollinger, et al.) soon faced lawsuits that essentially duplicated the one brought in Texas.  Similarly, in November 1998, Washington State voters approved a ballot measure banning consideration of race in hiring or college admissions decisions.  Using racial classifications to achieve diversity or proportional representation no longer passes constitutional muster, nor does it draw clear voter support.

New attempts at diversity

The hunt is now on to find ways that will ensure ethnic diversity on campus without breaking the recent anti-affirmative action rulings.  Some of the solutions have been half-steps (combining both standardized test scores and non-test criteria) and others have foregone the standardized testing dilemma altogether.

Berkeley, for instance, now admits half of the freshman class on pure academic criteria (grades, course difficulty, and SAT scores).  But in choosing the other half of the class, admissions officers consider other factors (a student's activities, community service, and past ability to overcome life obstacles).  For this reason, they require each applicant to submit an essay concerning their personal background and planned academic intent.  While the ethnicity and gender blanks of the applications are occluded, there are many ways in which this information is revealed in the essays, and it remains to be seen whether this new half-step procedure will pass legal muster.  Other half-step programs which both predate and follow the Hopwood decision (including that of Texas) are covered in Sacks (1999) -who argues that in most cases they are not particularly effective in ensuring proportional representation of students from Black and Mexican ethnic backgrounds.

As for the issue of why universities should bother to strive for ethnic diversity, Bowen & Bok's The Shape of the River (1998) studied the impact of affirmative action at 28 selective universities and found evidence indicating the dramatic success of minority students.  The leadership role that black matriculants play in civic and community life, in particular, is quite telling.  The ratio of black matriculants leading civic organizations outstrips that of Whites by 2 to 1 in some areas.  Bowen & Bok argue, therefore, that choices during the freshman admissions process have to be based not on who has achieved a given test score result at this point in their life but on which set of applicants will contribute most to the quality of education at the institution and to the larger purposes of society (i.e., to the need of society for diverse leadership).  As Nairn (1980) and many other sources since that time have indicated, neither the SAT nor the GRE provides any such predictive validity for future success.

Sacks (1999) has argued that the small percentage of institutions that have forgone the traditional test-based entrance exams have been successful in promoting diversity on campus, and decidedly not at the expense of high academic standards.  The overall degree completion rate, for instance, tends to remain comparable after the abandonment of standardized entrance tests.  While the percentage of test-optional four-year institutions was still small (about 12% in 1997), he correctly predicted the trend toward broader admissions criteria would continue into the near future.

Other higher profile attempts have also been made to circumvent the testing bias dilemma by allowing high school grade ranking to be the predominant criterion for student acceptance.  In Texas, a Uniform Admissions Policy (SB 588) was signed in 1997 which required public universities to automatically accept any Texas student ranked in the top 10% of their graduating class.  The bill meant that the highest performing students from struggling urban schools in San Antonio would be on an equal footing with those from elite schools in Dallas.  The bill also left room for each campus to extend its admissions to the top 25 percent of each high school class.

Sacks (1999) notes that the plan seems to be working: At the University of Texas, for instance, of the 12,000 new students in the 1998 class, 4,000 were ranked in the top 10 percent of their high school class.  Admissions for black students who ranked in the top 10 percent of their class rose from 87 percent in 1995 (prior to the effects of Hopwood) to 97 percent in 1998.  In other words, the traditionally low SAT scores for this group of applicants no longer posed a barrier (p. 295).

Similarly, the One Florida Program, adopted by executive order in November 1999 (partly as a means to pre-empt planned legal action by Connerly), outlawed race, ethnic and gender preferences in state contracting, college admissions, and some state hiring.  Governor Jeb Bush thereby eliminated racial quotas in admissions and guaranteed college placement for the "top 20 percent" of Florida high school seniors.  Initial Fall 2000 enrollment figures provided by Florida's 10 public universities and the Board of Regents indicated that an additional 1,234 African-American, Hispanic, Asian and Native American students had entered the university system as compared with the Fall 1999 numbers. [186]

It remains to be seen whether legal cases will be forthcoming against this new high school percentile-based approach, but initial indicators are that such plans have certainly surpassed the pure standardized test based admissions policies of the past in terms of promoting both ethnic diversity and equality of access for female versus male students.  This latter point is especially important because it solves a well-known, long-standing gender bias weakness in test based admissions policies.  That is, while males tend to outperform females on the SAT (by about 40 points on average), the actual attained freshman (and end of degree) GPAs of females who are accepted tend to be higher than those of males (Lemann, 1999a).

Most recently, the ETS has responded to the current 20% drop in its college admittance market share by revising the structure of the SAT exam.  In June of 2002, it proposed dropping the analogy section altogether and introducing a 25 minute written essay section (so as to better assess critical reading skills).  The reply from Bob Schaeffer (of FairTest) was immediate: these cosmetic changes (proposed to come on stream by 2005) will likely exacerbate the already existing biases of the SAT.  In other words, the use of the "revised" SAT will continue to discriminate against the poor, against minorities, and against both older applicants and women.

Striking at the very heart of the role of higher education and of educational access itself Schaeffer summed up:

"The issue isn't the old SAT versus the new SAT or even the alternative, the ACT. The real question is why any college needs to use a test. And there are already 391 colleges and universities in this country that don't require test scores to admit substantial numbers of their applicants.  Some of the most competitive schools....believe that every child can learn and...that every child can show it in real academic work, [and] not largely [by] filling in bubbles and writing one formulaic essay in three hours on a Saturday morning. The high school record is much, much richer. It includes lots of tests, lots of essays, and all kinds of other information" (Schaeffer, The News Hour with Jim Lehrer, July, 2, 2002).

Before returning to these rather weighty and politicized issues, we must first address the more disciplinary consideration of how well the 1980s-1990s interactionist consensus on testing (within psychology) did at: (1) informing our understanding of what human intellect is and how it develops; and (2) providing specific provisions or alternatives for concrete professional issues (such as the effectiveness of school reform or adjustments to admissions policies).

Section Two:

Strengths and Weaknesses of Revised Interactionism

During the early 1980s, critiques of the older IQ testing tradition such as Gould's Mismeasure of Man (1981) began to achieve the status of conventional wisdom within academic circles.  The main point of Gould's book was that brain primacy theory and assumptions about a racial hierarchy of mentality had historically been brought directly into the intelligence testing tradition of the past.  In an era in which the intellectual descendants of these tests were making a resurgence, it was incumbent upon middle-of-the-road psychologists to delicately grapple with the content aspects of human intellect in a manner that would not rip the discipline apart at the professional seams between supporters and detractors of testing.

Occasionally, the reticence to make pronouncements went far beyond political correctness to become either: (1) an incredulous denial of past testing endeavors as reflective of anything of ontological reality (e.g., Mensh & Mensh, The IQ Mythology, 1991); or (2) a social constructivist account of the historicity of past views on human intellect as somehow insoluble in principle (e.g., Richards, Race, Racism and Psychology, 1997).  While the vast majority of professionals fell into neither of these two radical anti-ontological camps, it had become politically incorrect in academic circles either to acknowledge or emphasize past racial differences in test performance or to postulate what they might actually mean (see E. Hunt, 1995a).

I say "academic" circles here because there had now emerged a professional schism, over this and other matters, between the professorial/clinical specialties of the APA on the one hand and the experimental/psychometric specialties of the APS on the other.  In 1988, a group of psychometric and research-oriented psychologists founded their own organization, the American Psychological Society (APS).  This was the professional refuge toward which proponents of psychometric tests and other reductive accounts of animal and human intellect gravitated. [187]

Also in that year, a considerable divergence between the professional persona of the so-called anti-testing consensus and the private beliefs of psychologists was noted.  Snyderman & Rothman's The IQ Controversy, the Media and Public Policy (1988), referring to the results of their questionnaire, indicated that "whatever the conventional wisdom holds," the vast majority of survey respondents continued to believe that: (1) intelligence can be measured; (2) genetic endowment plays an important role in individual differences in IQ; and (3) IQ was an important determinant of general success in American society.  The latter two of those beliefs have already been brought into considerable doubt.  My point in mentioning them here is that under such a professional climate of tactical eclecticism (and de facto fractionation of affiliation), no clear or sustained effort at theoretical housekeeping on the issue of defining and measuring the content aspects of distinctly human intellect was likely to occur.

There were, however, sporadic attempts to reach consensus on human intellect; both within psychology (e.g., Sternberg & Detterman's important anthology What is Intelligence?, 1986) and outside it (e.g., two interdisciplinary anthologies intended to counteract psychometric racism by Fraser, 1995; and Kincheloe, et al., 1996).  By considering the content of such attempts, in combination with more individual extradisciplinary efforts (e.g., Flynn, 1984-1990; Gould, 1996) we can tease out the strengths and limitations of the new so-called revised (a.k.a. dynamic or social) interactionism of the Accountability era.

I will argue that the resulting professional consensus on testing procedures remained vulnerable to the biogenic arguments (e.g., Anderson, 1992; Rushton, 1990, 1995; Herrnstein & Murray, 1994) for precisely the reasons mentioned in the Snyderman & Rothman survey conclusions.  In other words, while both the historicity of human intellect and its extra-individual (socio-societal) origin were beginning to be recognized from 1981 onwards, the methodological and professional implications of these realizations had not yet been fully recognized, elaborated, disseminated, or implemented.

Weakness of Accountability era anthologies

It is now clear that the error of Darwin's continuity view of mental evolution was repeated by the 1950s-70s interactionist account of human intellect.  Interactionism (like racism itself) is a fundamentally conservative position.  As such, its more liberal proponents unintentionally continued throughout the 1980s & 1990s to buffer scientific racism and other forms of biopsychological reductionism from direct attack.  That is, in both dynamic interactionist views on intelligence (circa 1980s) and subsequent movements toward recognizing an historically situated social intelligence (circa 1990s), the postulation of nature plus nurture was retained, thereby allowing (by far) too much room for biogenic proponents to maneuver.

Defining intelligence: The new interactionism

The issue of how far psychology had come in understanding human intellect was faced head-on by Sternberg & Detterman's anthology What is Intelligence? (1986).  The editors opened the anthology with the statement that: "theories in this volume identify three main loci of intelligence -intelligence within the individual, intelligence within the environment, and intelligence within the interaction between the individual and the environment" (p. 3).  The brief contributions of the volume, whether advocating psychometric "g" (e.g., Eysenck, Jensen); multiple "types" of intellect (e.g., Gardner, Schank, Sternberg, Scarr); or whether refusing to make pronouncements on such ontological questions (e.g., Horn, Humphreys, Hunt, Zigler), were therefore unified in their thoroughgoing interactionist rather than transformative approaches to human intellect (see also Gardner, 1983, 1985, 1987, 1993).

Edward Zigler's (1986) contribution is especially important because while attempting to outline a rather progressive developmental stage approach to human intellect, he also exhibits an untenable ontological agnosticism by: (1) emphasizing the arbitrary nature of definitions (as being not 'right or wrong' but merely more or less 'useful'); and (2) retaining the historically problematic IQ measure in his diagram of differential intellectual development.  Zigler's diagram on intellectual levels (p. 86) is very important because it graphically portrays the hidden additive (and individualistic) assumptions of so-called dynamic interactionism (see Figure 57).

Figure 57 Zigler's depiction of intellectual development. The vertical arrow represents time's passage. Horizontal arrows represent "events which effect the individual" (indicated by a pair of vertical lines). "Cognitive" development is depicted as an ascending spiral, in which the numbered loops represent successive stages of "intellectual growth" (from Zigler, 1986; see also Kimble et al., 1980/1984). Despite his commendable attempt to depict a stage approach, Zigler's retention of the IQ measure (at the base of the diagram) is highly problematic because it had already been abandoned by the Consortium for Longitudinal Studies (1983) in their analysis of the success of Head Start and seems to indicate (though unintentionally) that a fixed mental capacity is being implied. Although the conflation of IQ scores with both mental capacity and the number of achieved levels of mental development was also very common between 1965 and the mid-1980s, Zigler, an important player in Head Start for decades, would later openly lament his own part in such lapses (see Zigler & Muenchow, 1992).

The retention of the IQ measure is stated here as problematic because it had already been abandoned by the Consortium for Longitudinal Studies (1983) in their analysis of the success of Head Start.  This having been said, however, Zigler's accompanying textual qualifications on the above diagram come closest of all the 1986 anthology contributions to elaborating what a valid assessment of an individual's intellect might entail:

"A valid assessment of an individual's functioning would consist of a variety of measures, including a test of formal cognitive ability (such as the standard IQ test or Piagetian model of cognitive functioning), and achievement measure (e.g., the PIAT) and some indicator of motivational and emotional variables (such as self-image or locus of control)" (p. 151).

When I first saw Zigler's diagram, I was both appalled and elated.  The inclusion of an apparently stable IQ measure as correlated with the number of developmental levels achieved by an individual, for instance, was most alarming.  According to the diagram, the 150 IQ child had completed 9 levels of development by age 20 whereas the 66 IQ child had only completed 4.  What is implied in this depiction of a stable IQ attribution?  No answer is given by Zigler, but I suggest that what is implied is the very argument which Head Start was set up to counter (i.e., fixed mental capacity).  Even a comparison of Zigler's diagram with the former additive mental ladder of the 1920s and 1930s shows that the older diagrams at least allowed for some variance of IQ scores to be attained by individuals across the life-course.

Similarly disturbing is the equal width of the various (undefined) developmental levels depicted.  In my opinion, other older diagrams of developmental stages (e.g., Arnold Gesell's spiral mental growth cycle) had it more right than Zigler by depicting each new developmental stage (or rather mental milestone) as expanding the horizons of children as they pass through it.  Thus, the diagrams used in educational texts for many years had correctly depicted successive stages of mental "growth" as wider than the previous ones (e.g., Lindgren, 1956, p. 49).  We will return to this issue of depiction shortly.

Despite the limitations of Zigler's diagram, however, I was also elated by his explicit mention of motivational, emotional, and social assessment as being necessary because this meant that I could eventually include such concerns in my own developmental hierarchy of intellect without raising too many eyebrows.  Indeed, it is necessary to do so in order to demonstrate the difference between the individual, social, and societal mechanisms of rising from one intellectual level to another.  Before moving on to that account, however, we must first return to the issue of how far the so-called dynamic and social interactionism went in describing these causal mechanisms of mental transformation.

A little while after the Sternberg & Detterman anthology, the third chapter of Snyderman & Rothman (1988) nicely outlined the logic of the interactionist account as it played out in 1980s empirical terms -including contemporaneous arguments over the assumption of covariance of human intellect with IQ score (pp. 80-81).  That chapter also provides forward-looking summaries of so-called dynamic meta-analysis evidence (from longitudinal studies, twin studies, semi-historical comparisons, and cross-race studies) which would become predominant in the testing subdiscipline thereafter.  All of these points intentionally work toward their argument that the empirical rigor of such studies had increased (p. 92), but also (unintentionally) toward demonstrating the ongoing hegemony of the continuity view of mind in even the best proponents of mainstream interactionist psychology.

In short, what has always been missing in these interactionist accounts is a serious account of the role of societal-historical existence in not only providing a context for higher mental processes to be expressed in human activity but also in the very formation and divergence of our higher mental processes (a.k.a. intellectual stages) from the sort of lower mental processes we share with animal mentality.  All the same, by at least considering the empirical tools and theory production methods of the past, and by expanding them to include an emphasis on longitudinal (a.k.a. life-span developmental) and motivational approaches, the dynamic interactionist account has clearly made an important half-step beyond the unreflective (ahistorical) traditional interactionist account.

It came to be recognized that what was needed was not merely further recitation of data, nor further bows to measurement technologies produced in related disciplines, but rather a comparative historicized re-analysis of the practical applications of the past knowledge products of the testing industry.  Yet a firm argument regarding the theoretical and methodological basis upon which to guide this re-analysis was not forthcoming from within the loosely affiliated dynamic/social interactionist camp.  Thus, 1980s-1990s interactionist interpretations of the available longitudinal data (including those appealing to social influences, motivation, and social-historical context for intellectual growth) did not surmount but rather merely postponed resolution of the nature and nurture debate.

Revised interactionism versus scientific racism

How well did the de jure revised interactionism deal with blatant forms of scientific racism and statistical methodolatry during the Accountability era?  The answer is not all that well.  That is, while the best interactionist theorists and extradisciplinary commentaries of the era were successful in exposing the duplicitous procedural tactics used by Philip Rushton and by Charles Murray in their accounts of ongoing race differences in IQ, great difficulties were encountered when attempting to provide viable theoretical counter-arguments and methodological options on the content end of the testing debate. [188]   The new claims that the updated biogenic hypothesis was somehow data driven are as disingenuous as those of Jensen, so we will dispense with a description of those claims.  The main issue here is: What kinds of counter-arguments and methodological alternatives were put forward in their place?

Specific contributions to two of the most recent anthologies intended to counteract psychometric racism (Fraser, 1995; Kincheloe, et al., 1996) are highly enlightening on this issue.  First of all, Stephen J. Gould's comments in The Bell Curve Wars (Fraser, 1995) are indicative of the fact that he had not yet worked out the difference between social interaction and a truly transformative approach to mental evolution.  In both that volume and in the second edition of Mismeasure of Man (1996) Gould appropriately criticizes Herrnstein & Murray's The Bell Curve (1994) for failing to distinguish between statistical and cultural bias in intelligence test measurements. [189]   But in the anthology, he elaborates by stating that we "do not yet know the answer" to the question as to "whether blacks average 85 and Whites 100 because society treats blacks unfairly -that is, whether lower black scores record [social biases]" (Gould in Fraser, 1995, p. 18).  This flaccid statement in the realm of mentality is hardly the kind of stand one would expect from an outspoken proponent of punctuated equilibria (and its implications) in the realm of organic evolution.

Having become embroiled in the interactionist's false subdisciplinary dichotomy of the "g vs. specific abilities" debate, Gould (1995; 1996) then throws in his chips against Spearman's g factor and for Guilford's (1947, 1952, 1959, 1966a&b, 1967, 1971) multiple intelligence model (and hence with the thoroughgoing interactionism of Guilford and Howard Gardner).  Yet this fifty-year-old "one intellect vs. many" debate was far from the most central or timely methodological issue at stake in contemporaneous test interpretation.

In other words, given the increasing engagement of Gould's "punctuated equilibria" account (Eldredge & Gould, 1972; Gould, 1980a&b) in anthropological circles around the issue of the organic growth of hominid brain size (see Falk, 1992; Lewin, 1993) -an account which postulates a transformative (i.e., episodic step-wise) rather than strictly gradual growth role of tool use and language along the lines of Russel Wallace- it is highly ironic that Gould's revised Mismeasure of Man (1996) does not explicitly extend the concept of punctuated equilibria from the organic to the mental realm.  In failing to do so clearly, he has apparently repeated Darwin's (1872) mental continuity error.  No account of gradual changes in mentality leading to qualitative shifts in mental kinds is to be found in either work and Gould (1996) makes only one highly tangential reference to punctuated equilibria itself.

Similarly, Kincheloe & Steinberg's commendable effort to characterize the "hopeful" socio-cultural counter-argument (in their introduction to Measured Lies: The bell curve examined, 1996) also falls flat because they seem to lump into one (albeit heterogeneous) camp the proponents of both interactionism and socio-cultural (transformative) analysis:

"We argue in this book that there is reason for hope. Ignoring literally scores of studies that document the benefits of educational intervention, Herrnstein and Murray would rob the poor and non-white of future promise. An entire school of psychological analysis has emerged over the last two decades that views the development of higher orders of thinking around sociocultural interaction (Bohm and Edwards, 1991; Gardner, 1983, 1991; Hultgren, 1987; Kincheloe, 1993; Lave, 1988, Raizen, 1989; Vygotsky, 1978; Walkerdine, 1984, 1988; Wertsch, 1991; Wexler, 1992)" (p. 36).

While the above reference to Vygotsky (1978), and to sociocultural analysis itself is highly encouraging, the label of "sociocultural interaction" is not because Vygotsky (as described below) belongs to a group of thinkers who provided a truly transformative (rather than additive or merely social interactionist) approach to human mentality.  Examples of why this kind of loose appeal to sociocultural analysis of individual differences (and to mental levels) is problematic have already been given but one final example may be useful.

Despite its claims to be dynamic and developmental, the new revised interactionism is ironically left completely vulnerable to the outright separation of higher intelligence from development contained in the work of Mike Anderson (1992) who suggests (along the lines of Jensen) that an innately given "speed of processing" (operationalized as reaction time differences) is the basis of all observed individual differences. [190]   Why are the sociocultural interactionists vulnerable to this speed of processing argument?  Because their accounts are consistent with (or complacent toward) Anderson's main premise: That "lower level theories" of intellect requiring little or no appeal to knowledge (e.g., reaction time tasks in Jensen, 1982; Eysenck, 1986) explain the regularities in the test score data (i.e., no racial differences on these lower tasks) and that the "high level theories" of intellect, which make direct appeal to "cognitive" processes (e.g., Hunt, 1980; Sternberg, 1985) explain exceptions to those regularities (i.e., that blacks consistently score lower than Whites and that Asian immigrants have recently out-performed Whites on those same tests or that individuals may perform better on the mathematical vs. verbal sections of such tests).

The traditional interactionist approach to human intellect (by assuming an unreflective genes plus environment stand) treated mere empirical descriptions as if they were explanatory.  This approach to data was subsequently brought directly into the so-called cognitive science variants of interactionism (of the 1980s and 1990s) which tended to be unapologetically reductionist. [191] Anderson (1992), as one of these proponents, argues that (apparently inborn) speed of processing underlies all subsequent "development" of higher intellectual functions and he then mobilizes a convoluted array of cognitive science based experimental data to purportedly support his views.

The one kernel of truth in Anderson's argument is that both the "lower" and "higher" theoretical camps regarding human intellect have (historically and at least implicitly) assumed that genetic endowment is (to some degree) responsible for providing a basis (and upper limit) upon which both the growth of intellect and effectiveness of educational interventions occur.  This shared additive assumption was the platform upon which both Rushton and Murray made their mark in the 1980s and 1990s and no unequivocal reply to this shared view appeared in the resulting anti-racist anthologies.

It is crucial that we now explicitly escape the conceptual and practical bounds of such additive analysis (i.e., nature "plus" nurture).  Only then can we investigate the developmental aspects of intellectual growth by way of concretely outlining the typical patterns of transformation of lower mental processes into higher mental processes.  That is, only then will we have an explanatory understanding of human intellect and be able to produce tests which measure those processes.  The great difficulty in this, of course, is to find a way of doing so that is not associationist, reductive, or even interactionist (see fig. 58).

Figure 58 Bell Curve Controversy. Replying to Herrnstein & Murray's (1994) statistically veiled Mental Darwinist book, eminent figures from various fields mobilized divergent sets of interactionist concepts with no underlying unified conclusions being reached as to what should be done about improving ongoing testing technologies and interpretation (photo from Fraser, 1995). Unless a serious and sustained effort at theoretical housekeeping is carried out to promote transformative (rather than additive) approaches to ability testing, further resurgences of the biogenic account (disingenuous or otherwise) are sure to follow.

By way of demonstrating its utility in explaining the so-called Flynn effect of rising average intelligence test scores, I will argue shortly that only the transformative account of mentality explicitly abandons this long-held associationist/additive methodology for an emergent evolutionary mental ladder approach to intellectual capacity.  It does this by explicitly requiring careful reference to the typical three-fold (phylogenetic, ontogenetic, and socio-historical) pattern of quantitative expansion and qualitative transformations in human mentality.  This methodology guides research by viewing the higher rungs of the mental ladder not as merely added to fundamentally unchanging lower rungs but rather as transforming (both in the process of their development and afterwards) those lower rungs into something qualitatively different.

The proximate historiographic lesson for the present subsection, however, is simply that while successful (i.e., explicit and convincing) counter-arguments to biogenic accounts were not forthcoming from within the diverse revised interactionist camp, important methodological half-steps were at least hinted at by testing outsiders James Flynn (political scientist) and Stephen J. Gould (evolutionary biologist).  These included: (1) a recognition of the historicity of change in human intellectual performance across generations; and (2) an appeal to the concept of punctuated equilibria to support a theory of mental levels.  We will now concentrate on elaborating the first of these half-steps (namely historicity of human intellect) primarily because it deals specifically with the subdiscipline of psychometric ability testing; but both are important.

The Flynn Effect (Historicity of IQ scores recognized)

One part of the sociocultural message (i.e., the historical origin of human intellect) was hinted at by the work of Flynn (1984, 1987, 1990).  Indeed the "Flynn Effect," the recognition that average IQ test score performance is rapidly rising relative to the standardized samples periodically used to update them -both in the United States (Flynn, 1984) and other technologically advanced nations (Flynn, 1987)- has obtained considerable disciplinary cachet.  For instance, the theme of the historicity of test performance and the societal origins of the Flynn effect was touched on in the APA Task Force report "Intelligence: Knowns and Unknowns" (Neisser et al., 1996) and was then the subject of both a well-known follow-up article by Neisser (1997) appearing in American Scientist (see fig. 59) and an interdisciplinary anthology called The Rising Curve (Neisser, 1998).


Figure 59 Rising scores on the IQ tests (from Neisser, 1997). The above panel indicates that if children of 1997 were to take the 1932 Wechsler test, their average would be somewhere around 120. Alternately, if the generation of 1932 took the present test, their average IQ would have been 80 (with up to one quarter of them being assessed as mentally deficient). The lower panel indicates that the largest Flynn effects appear on tests of visual reasoning (such as the Raven's Progressive Matrices). The rate of increase on these so-called culturally reduced tests is roughly twice that on broad spectrum tests like the WISC or WAIS. The Dutch data, depicted here, show a 21 point difference between 1952 and 1982 (which extrapolated back to the early 1930s produces a 35 point increase). That is, the average 19 year old in the Netherlands is now producing scores that would have been 2 standard deviations above the mean for their grandfathers. Moving beyond the considerable limits of Flynn's initial accounts, Neisser (1997) puts forward an important visual stimulation hypothesis to explain the cultural-technological origins of these differential changes in average test performance gains.
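
The extrapolation reported in the caption above can be checked with simple arithmetic.  The following is only an illustrative sketch: it assumes a strictly linear yearly gain, takes 1932 as the "early 1930s" starting year, and uses the conventional IQ standard deviation of 15 points.  The function and variable names are hypothetical and are not drawn from Flynn or Neisser.

```python
# Illustrative arithmetic only. Assumptions (not from the source):
# a constant yearly gain, 1932 as the starting year for the
# extrapolation, and an IQ standard deviation of 15 points.

SD_IQ = 15  # conventional IQ standard deviation (assumption)


def yearly_rate(gain_points, start_year, end_year):
    """Average gain in IQ points per year over the observed interval."""
    return gain_points / (end_year - start_year)


def extrapolated_gain(rate, start_year, end_year):
    """Total gain if the same yearly rate held over a longer interval."""
    return rate * (end_year - start_year)


# Dutch Raven data as reported in the caption: 21 points, 1952 to 1982.
dutch_rate = yearly_rate(21, 1952, 1982)
gain_since_1932 = extrapolated_gain(dutch_rate, 1932, 1982)
gain_in_sd = gain_since_1932 / SD_IQ

print(f"rate per year: {dutch_rate:.2f}")
print(f"extrapolated gain 1932-1982: {gain_since_1932:.0f} points")
print(f"in standard deviation units: {gain_in_sd:.2f}")
```

Under these assumptions the 21 point gain amounts to 0.7 points per year, which yields a 35 point gain over fifty years, or somewhat more than two standard deviations.  This agrees with the figures given in the caption.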

The considerable theoretical and procedural limitations of Flynn's 1980s research have tended to be glossed over in subsequent psychology textbook accounts, and the 1998 anthology (despite its sincere efforts) has left many issues to be resolved clearly if a viable future course for the testing subdiscipline is to be mapped out.  Ongoing uncertainty about the subdisciplinary implications of Flynn's work, for instance, led Deary (2001) to conclude that: "If there was a prize to be offered in the field of human intelligence research, it would be for the person who can explain the Flynn effect of rising IQ" (p. 112).

As for one of these theoretical limitations, Neisser (1997) has correctly fingered Flynn's initial belief in psychometric "g" as highly problematic and (as shown below) this belief clearly pervades the basic assumptions, data handling procedures, and conclusions within Flynn's (1984 and 1987) accounts.  It must also be said up front, however, that the still more prevalent theoretical assumption of the individualistic nature of human intellect -as something inside the test subject's head- has gone virtually unquestioned both within Flynn's work and in the subsequent reviews.  This omission itself has profound importance for the future course of intellectual assessment.  Indeed, no true explanation of the Flynn effect is possible without throwing out this obsolete assumption explicitly.  I will argue, therefore, that by combining both the visual stimulation hypothesis of Neisser (1997) and the cultural evolutionary argument of Greenfield (1998) with the (only recently made available) socio-cultural methodology of Vygotsky & Luria (1930/1993), we can get fairly close to explaining the Flynn effect in its fuller significance.

Flynn's account

It is likely the more conservative psychometric aspects of Flynn's approach that have been instrumental in its ready inclusion into the psychological literature.  Flynn (1984) started with the seemingly mundane assumption that any statistically reliable method of assessing IQ test performance over time requires that the initial standardization samples (used to establish test norms) be to some degree representative of the American test taking population (as they were in the years sampled).  In his meta-analysis of past test score comparison studies, he carefully selected out the best 73 studies (representing 7,500 subjects ranging in age from 2-48 years) in which two or more Stanford Binet and/or Wechsler tests were given to the same group of subjects.  In order to investigate whether a pattern of test performance gains could be ascertained between the years 1932-1978, Flynn reasoned that improved performance over time would be reflected in subjects finding earlier test norms easier to exceed than later ones (which is, in part, the rationale for periodic restandardization of test norms in the first place).

To control for the varying de facto quality of particular test versions and for the degree to which the tests were in fact representative, however, Flynn provides a brief consideration of the history of successive standardizations of SB and Wechsler tests noting (among other things) that the SB-(1932) was standardized on Whites only and that Wechsler-Bellevue Form 1 (normed 1935-38) had drawn on a sample of New York area residents (pp. 30-32).  Following from this, Flynn's survey compared the selected testing data against two sets of norms: (1) against the norms in which the tests taken were originally standardized; and (2) against a "uniform scoring convention" which he worked out to control for various forms of statistical bias and "confounding variables" in past tests or standardization samples.  The latter entailed statistical transposition of "mixed-race" or "minority" data into so many standard deviations above or below the "White" mean.

The data and conclusions of Flynn's 1984 research must therefore be recognized as relating to a fictitious ethnically amorphous generalized American mind (i.e., a hypothetical statistical conglomerate rather than any actually ontologically existing category of mentality or actually existing test population).  This having been said, interesting results were obtained and Flynn attempted to move beyond both contemporaneous subdisciplinary (operationist) agnosticism and the primarily correlational concerns of past IQ test follow-up accounts (e.g., Owens, 1953; Campbell, 1965) by actually teasing out a descriptive account of: (1) to what degree; and (2) which parts of the tests were now being performed better.  He also attempted (somewhat less successfully) to link this pattern of changing test scores to observable patterns of changes in contemporaneous modern society. [192]

The overall finding that Americans did a "better and better job" on IQ tests over a period of 46 years (amounting to "an American IQ gain" of 13-15 points between the period 1932 to 1978), with a consistent linear "year by year" performance gain of between .25-.440 points (p. 32), is the one picked up in subsequent textbooks.  It is arguable, however, that the most important results of Flynn's research are in the finer details regarding which aspects of the tests were being performed better.

Firstly, Flynn (1984) indicates that the highest gains over time on the Wechsler scales were concentrated on the symbol coding subtests (which went unchanged between the WISC of 1947-48 and the WISC-R of 1971-73).  Flynn (1987) also indicates that even for post-1950 adult data, Wechsler performance subtest gains were greater than verbal subtest gains in various nations, "sometimes by as much as 16 points" (p. 186).

No clear statement is made with regard to SB tests in isolation but Flynn (1984 and 1987) comments on the "divergence" between overall full IQ gains (including SB tests) and contemporaneous declines on SAT-Verbal test scores, suggesting to the charitable reader that his initial, rather historically naive, belief in a unitary "g" is beginning to waver.  For instance, noting that overall IQ score increases occurred during a period of lowering of SAT-Verbal scores (mid-1960s and late-1970s), Flynn (1984) asks: "[H]ow can school children gain so much in overall intelligence [i.e., full Wechsler or Binet IQ scores] and make so little progress in terms of enhanced vocabulary? What causal factors could increase intelligence and somehow withhold their potency from the world of words [as indicated by SAT-V scores]?" (p. 46).

Flynn (1984) eventually turns to "environmental factors" which differed between generations (including socio-economic status, increased test-wiseness, and educational improvements) as a starting point for further analysis, concluding that while a case for the "malleability of IQ performance within times of normal environmental change" has been made, the reference to such variables only takes us about "half-way" in accounting for the magnitude of observed overall gains (p. 48).  Flynn (1987) then elaborates on this point claiming that "higher levels of education contribute 1 point, SES may contribute 3 points, and ...test sophistication perhaps 2 points" (p. 188) to these overall increases.

Importantly, his "Massive IQ gains in 14 nations: What IQ tests measure" (1987) attempts to further tease out the differential magnitude of score gains on various kinds of IQ tests, eventually arguing that existing IQ tests do not test human intelligence per se but rather a little understood "correlate" with a weak causal link to intelligence called "abstract problem-solving ability" (p. 188).  He then elaborates on this point.  Learned strategies of problem solving picked up at home and in school play a critical role in how well one does on IQ tests but the effect of each is actually differentially reflected in test gains for different kinds of tests.

That is, respectively, the relevant required "learned content" for various tests is as follows: (1) Wechsler tests -elementary academic skills; (2) military test batteries (such as the ASVAB) -simple arithmetic, word knowledge and paragraph comprehension picked up at the elementary or middle school level; and (3) SAT-Verbal -advanced academic skills gained from high school English courses (1987, p. 189).  Thus as the assessment of young adults moves from tests of problem-solving ability with a "moderate reliance on elementary academic skills" (Wechsler tests), to one with "heavy reliance" on at least elementary academic skills (ASVAB), and then to one with heavy reliance on "advanced academic skills" (SAT), the overall cohort test scores (with respect to earlier norms) changed "from gains to no gains and from no gains to losses" (p. 189).  That is, claims Flynn, high school students in 1981 "did not have higher [general] intelligence than their counterparts in 1963, they merely had higher APSA" (p. 189).

The above is hardly earth-shattering news for those acquainted with the historical details of both past psychometric test development and the actual utilization of the Wechsler and military tests in American society, yet it is indeed a testament to the pervasive reification of the unitary view of human intellect that Flynn even felt compelled to elaborate (at length) upon those points.  His (1987) disclaimer regarding the non-necessity of a "commitment to the unitary theory of intelligence" (p. 188) further indicates his demur from that problematic approach.

Once again, it must be emphasized that the most important aspects of Flynn's 1987 meta-research survey lie elsewhere.  That is, in the issues surrounding the observation that the largest test score gains were not found in the forms of "crystallized" formal knowledge demanded by the so-called learned content aspects of the above named tests (i.e., the sorts of things that people accumulate throughout their lives: general knowledge, vocabulary, mathematical skill) but rather in other aspects of testing which have been characterized as "fluid intelligence" (i.e., which make the subjects demonstrate decontextualized problem-solving ability on the spot).  These latter kinds of tasks include both the performance subtests of the Wechsler scales (which were initially constructed so as to minimize the influence of differential educational attainment in test subjects), and also specially constructed so-called "culturally reduced" tests such as the Raven's Progressive Matrices, Norwegian matrices, Belgian Shapes test, and Jenkins and Horn tests (all covered in Flynn, 1987).

In particular, the strongest data presented in Flynn's 1987 investigation concern the historical rise in relative performance on the Raven's Progressive Matrices, a test of abstract visual reasoning (first published in 1938 by Spearman's student John C. Raven).  Formerly thought to be a good indication of psychometric "g," Raven's matrices data were carefully collected in the Netherlands as part of its mandatory military induction process.  The Dutch case, showing a 20 point gain on Raven scores as measured in terms of 1952 norms (see figure 15b), is especially highlighted by Flynn but his vacillations regarding their meaning, disciplinary significance, and origin are highly instructive for all concerned with truly understanding human intellect.

It is here that Flynn exhibits a problematic confusion between mental evolutionary assessment (which entails comparisons of individual or group intellectual abilities -including generational differences therein); and cultural evolutionary assessment (which entails comparisons between human preliterate, illiterate, and literate individuals, groups, or societies -including the generational differences therein).  Put quite simply, the two sorts of analyses are nested but not identical.  Further, measurable progress in one (especially by way of the traditional individualistic psychometric tools) is not necessarily reflective or indicative of progress in the other.

The confusion becomes particularly manifest while he discusses the Raven test results, where Flynn seems to indicate that the intellectual abilities of Dutch inductees could not have been raised as much as the observed test scores suggest.  He points out, for instance, that test scores alone would suggest that 25% of the Dutch teenagers "qualify as gifted" because those with estimated IQs of 150 and above have increased by a factor of almost 60.  For him this meant that if these tests were truly measuring general intelligence the "result should be a cultural renaissance too great to be overlooked" (p. 187).  But his search of Dutch education journals from the 1960s onward indicated no mention of a great increase in intellectual achievements by newer generations.  For Flynn, this lack of predictive validity of test scores in the cultural realm of life-accomplishment indicated that the Raven (and probably other IQ tests including the Wechsler scales) do not measure intelligence but "merely" abstract problem solving ability.
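
The order of magnitude of these proportions can be illustrated with an idealized normal curve.  The sketch below is not Flynn's own calculation.  It assumes normally distributed scores with a standard deviation of 15, a uniform upward shift of 20 points in the mean (the Dutch gain cited above), and a hypothetical "gifted" cutoff of 130; all names in the code are likewise illustrative.

```python
from statistics import NormalDist

# Idealized model only. Assumptions (not from the source): normally
# distributed scores, SD of 15, a uniform 20 point shift in the mean,
# and a "gifted" cutoff of IQ 130.

SD_IQ = 15
earlier = NormalDist(mu=100, sigma=SD_IQ)  # scored against older norms
later = NormalDist(mu=120, sigma=SD_IQ)    # same norms, mean shifted by 20


def share_above(dist, cutoff):
    """Proportion of the population scoring above the cutoff."""
    return 1.0 - dist.cdf(cutoff)


gifted_before = share_above(earlier, 130)
gifted_after = share_above(later, 130)
ratio_150 = share_above(later, 150) / share_above(earlier, 150)

print(f"share above 130, before shift: {gifted_before:.1%}")
print(f"share above 130, after shift:  {gifted_after:.1%}")
print(f"growth factor for scores above 150: {ratio_150:.0f}")
```

Under these assumptions roughly a quarter of the shifted population falls above 130, and the share above 150 grows by a factor on the order of fifty.  That is the same order of magnitude as the figures Flynn reports, though an idealized normal curve should not be expected to reproduce his empirical estimate exactly.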

As indicated earlier, lack of predictive validity for life-achievements has also been shown in the American context for the SAT, GRE, and other forms of generalized ability tests (including various vocational aptitude batteries).  But does this mean that great leaps in mental and cultural evolution have not occurred in the 20th century?  Or is it simply evidence of a weakness in the kinds of tests that traditional psychometrics has produced?  The answer to that question depends upon one's definition of human intellect itself.  More specifically, I would argue that the false disciplinary conundrum which Flynn's research has drawn out into the open is one produced by not only his initial generalized view of human intellect, but is also a result of his ongoing unequivocal commitment to the individualistic nature of human intellect (i.e., as something inside the head of an individual).

While Neisser (1997) would later apply metaphoric ice to the tender Achilles heel of Flynn's argumentation (i.e., psychometric "g"), the faltering limb within which that argumentation is located (i.e., the individualist definition of human intellect) remained untreated.  That is, in an attempt to surmount the logical contradictions between overall psychometric test increases and the real world contingencies of reasonable intellectual assessment, Neisser suggests the following "many" intelligences versus the "one" intellect argument:

"Flynn's argument that real intelligence cannot have gone up as much as scores on the Raven assumes that there is a...unitary quality of mind not unlike Spearman's g. Abandoning that assumption, we may think instead that different forms of intelligence are developed by different kinds of experience. The [differential test gain] paradox then disappears: We are indeed very much smarter than our grandparents where visual analysis is concerned, but not with respect to other aspects of intelligence" (Neisser, 1997, p. 447).

My point here is that without a careful, explicit, retirement of the individualist approach to intellectual assessment itself, the differential test gain paradox will not in fact "disappear" and ensuing disciplinary progress on testing issues is likely to be carried out through mere limps instead of leaps.  This having been said, it must also be emphasized that the most historically important aspect of Neisser's (1997) article itself is not his stand on the 'one versus many debate,' but rather how Neisser actually pushed causal analysis of the Flynn effect one step further by proposing a visual stimulation hypothesis as a means of linking the observed pattern of test score rises with 20th century technological advances.

Explaining the Flynn effect (visual stimulation and cultural evolution)

Starting with a description of Flynn's observations on differential test gains, Neisser (1997) puts forward an "increasing complexity of visual and technical environment" hypothesis to explain the pattern of such gains.  Perhaps the most striking 20th-century change in the human intellectual environment, he argued, has come from the increase in exposure to many types of visual media:

"From pictures on the wall to movies to television to video games to computers, each successive generation has been exposed to far richer optical displays than the one before.... Beyond merely looking at pictures we analyze them. Picture puzzles, mazes, exploded views and complex montages appear everywhere" (1997, p. 446).

Exposure to visual displays may have produced genuine increases in some forms of intelligence and perhaps the scores on tests like the Raven's Progressive Matrices have been increasing so fast because they measure these visual analysis capabilities.  The disciplinary implication of Neisser's hypothesis, while not stated outright by him, should not be overlooked at this point: Far from being extraneous variables to be controlled out of experiments, these changes in the technology of everyday life may have been at the very heart of the observed patterns of rising test scores.  While Neisser puts forward the visual stimulation hypothesis tentatively (including a disclaimer that "no direct [empirical] evidence" yet exists to back it up, p. 447), the fact is that a considerable corpus of anthropological evidence, natural (i.e., historically serendipitous) empirical comparative evidence, and modern experimental evidence does exist to back up the hypothesis.

Natural experimental evidence

Foremost among the so-called natural experimental evidence are the serendipitous observations of A.R. Luria's Moscow-based group of researchers, which found no visual illusions in illiterate Uzbekistan countryside villagers during the summers of 1931 and 1932 (Luria, 1976; 1979).  Drawing upon the transformative methodological position put forward in Vygotsky & Luria (1930/1993) -described below- this research investigated the historically unique circumstances of the ongoing shift from individual peasant farming practices to collectivized farming.  Their particular focus was on establishing a descriptive outline of the observable pattern of changes to the predominant mental tools used by these villagers resulting from this technological shift. [193]

As the educational level of these peasant groups increased so did the appearance of visual perceptual illusions and distinctly modern forms of "conceptual" (i.e., abstract relational) thinking.  Luria argued that what they were observing in these outlying Soviet villages was a condensed form of the very historical shifts in mentality "over a brief period," which under ordinary circumstances had "required centuries" in other locations (Luria, 1976, p. 164).

Their procedural method of empirical enquiry is also highly instructive. [194]   By utilizing a probing but conversational method of leading questions (rather than simply applying standardized tests), Luria's researchers managed to assess the way these villagers understood and approached the world (Luria, 1976).  The data obtained supported their earlier position (based on anecdotal anthropological evidence) that the adoption of literacy in any preliterate culture resulted in a transformative effect on the speech, counting techniques, and modes of memory of those involved (Vygotsky & Luria, 1930).

It was found that the preliterate members of the Uzbekistan villages were still using the traditional primarily functional-descriptive reflection of reality (the very form of thinking which had been successful for their survival for centuries) while those with even a modicum of formal schooling were now using (to various degrees) the characteristically modern abstract-conceptual approach to reality.  Through basic education and on the job experience, modern society was truly rearming the minds of these villagers to deal with the contingencies of the technological (economic and political) shift from individual farming to collective farming practices.

Unfortunately, Luria's empirical research in this regard remained unknown in the West right up to the 1970s (Luria, 1976, 1979; Vygotsky, 1978) and Vygotsky & Luria's (1930) book-length survey account was not published in English until 1992 (soft cover) and 1993 (hard cover by a different translator).  Nevertheless, these sources are now readily available and drawing attention to them is vital for our understanding of the transformative power of cultural evolution as reflected in the pattern of test score gains.

Lest we be charged with cultural imperialism here (as were Luria & Vygotsky in their time), it should be pointed out that similar historical evidence regarding the comparative lack of intellectual and specifically visual sophistication (this time in German children) is provided by Raspe (1924).  In Raspe's study children's judgments regarding the causes of changing visual displays were recorded.  It was found that when coincidental events (such as the striking of a metronome) accompanied shifts in the visual display, they were typically judged by young children to have caused those shifts.

Both Raspe's main result and its eventual cultural historical interpretation would most certainly hold up today with respect to younger children.  The point of mentioning the study at this juncture, however, is that Raspe also found this 'rush to judgment' effect in children up to 10 years of age.  This is a result which would certainly not hold up today and is highly indicative of the kinds of mental transformations which have subsequently accompanied the ubiquitous modern cultural shift from the use of predominantly print and radio media toward visual media (such as movies and television).

Contemporaneous (late-1920s) American research by the Iowa Child Welfare Institute into test performance of rural one-room school versus urban consolidated school children is also mentionable here because while no between group differences in basic performance test tasks (e.g., block manipulation, puzzle tasks) were found, there were marked differences observed in both more complex performance (symbol manipulation) tasks and in verbal subtests (Baldwin & Stecher, 1925; Baldwin et al., 1930).  This means that while the differences observed by Luria were truly of a different order (or quality) than those typical of the contemporary American 'rural versus urban' divides, there were still measurable quantitative score differences between these groups which fall directly in line with the general pattern of average national score gains (as later noted by Flynn).

Another early American study (as described by Greenfield, 1998) is informative with regard to the overall magnitude of rural score gains associated with the transition from one-room to consolidated schools between 1930-1940.  Wheeler (1932, 1970), that is, observed huge score gains in East Tennessee "hill children" during this important ten year interval.  In 1930, approximately 1,000 children in Grades 1 through 8 in 21 mountain schools were tested with the Dearborn 1A and IIC Intelligence Tests.  A subset of children in Grades 3 and 8 were also given the Illinois Intelligence Test and a representative sample of children from the same schools was subsequently tested in 1940.  Two thousand additional children from 19 other mountain schools were also tested in 1940.  Wheeler found that average IQs had risen 11 points across Grades 1 through 8.  As Greenfield (1998) points out, this rate of change is about double that of the respective estimated Flynn effect rise in the rest of the country during the same period.

As mentioned earlier, not only improved teacher training but also school materials, school attendance, and community attitudes toward education were in transition during that period (see also Sherman & Key, 1932; Sherman & Henry, 1933; Edwards & Jones, 1938).  The Iowa group, for instance, suggested that their observed average score differences were indicative of both the respective urban versus rural availability of library resources and of existing differences in community-based attitudes toward schooling.  Similarly, Wheeler (1970), while noting both a rise of 32 percent in average daily attendance and a rise of 17 percent in enrollment between the 1930 and 1940 samples, also addressed the broader issue of how new road networks were allowing these Tennessee school children historically unprecedented access to urbanized centers and to library resources.

These historical-comparative studies are highly suggestive that: (1) a qualitative shift from the predominance of concrete perceptual thinking toward a predominance of abstract thought is associated with basic literacy (i.e., print media); and (2) that these shifts (in rural America at least) were not only followed by a graded series of quantitative growth of mental ability test performance (as measured by IQ tests -including complex symbol manipulation and verbal test gains) but also that this growth was associated with: urbanization, technological change (including the diffusion of print and radio media), and improvement in teacher training and attitudes toward formal education (indicative of the shift from one-room to consolidated schools).  That is, as the technological sophistication of their schooling and everyday lives increased so did their standardized test scores.

Experimental evidence on visual media

A tenfold increase in secondary education (up to 1930), the post-W.W.II rise of television, and then the rise of computer technologies were also to have differential impacts on both thinking patterns and on observable test score patterns.  In particular, given that school attendance and school equality issues had reached critical mass by the late-1960s, Flynn (1987) himself suggested that the continuing rise of test scores on visual reasoning tasks must be due to something else, perhaps "television" (p. 189).  This suggestion was then formalized in Neisser's (1997) visual stimulation hypothesis, which also briefly mentioned the use of computer technologies.

Flynn (1987) recognized that the commonplaceness of such between-generation cultural differences should not be used to downplay their importance because, until their effects are identified in detail, we cannot in fact tell whether observed "between-groups cultural" differences in test performance are "dissimilar" to (or by extension due to) the respective diffusion of "between-generations cultural" differences.  While I will argue shortly that clarity on this point is the distinctive gift of the Neo-Vygotskian transformative methodology to contemporary mental assessment issues, let us first look at some of the standard correlational-experimental evidence in this regard.

Patricia Greenfield's (1998) article called "The Cultural Evolution of IQ" briefly summarizes the available evidence that a post-1950s change in the balance of use between print and visual media has had considerable, historically traceable, differential effects on test score gains.  Starting from the assumption that only an account which focuses on "cultural history" can explain the differential pattern of test score changes (p. 86), evidence including content analysis of television programs and cross-cultural research on the use of computer technology is presented along with a rough outline of their respective impact on verbal and performance test scores.

The basic argument (as presented in Greenfield, 1984; Greenfield et al., 1994a, 1994b) is that both television watching and use of computer software have provided specific forms of "cognitive socialization" and that these differ from that provided by the former era of mere print and radio media.  Greenfield's (1998) efforts to parse out the differential effects of these technologies with regard to the observed pattern of test score gains were concurrently given further weight by Flynn's own contribution to the same anthology, which suggested an acceleration of test score gains (in the magnitude of half a point per year) between 1972-1989.  This spurt in the Flynn effect, that is, coincided with the successive ascendance of television and computer technologies as near universal aspects of everyday life in North America.

Film, television, computer spreadsheets, and computer games all favor iconic representations over symbolic representation.  That is, they tend to favor image over word (Greenfield, 1998, p. 99).  Experience with the iconicity of television media as part of everyday life certainly coincided with an initial (late 1950s-early 1960s) rise in Wechsler verbal scores (see Flynn, 1984) but was then followed by a decline in measurable vocabulary performance between 1974-1990 (Glenn, 1994).  The significance of this correlational pattern is not clear, however, until it is recognized that viewing of television network or talk-show material provides exposure to verbal content at roughly the grade 4 level of vocabulary (the very level which Flynn had argued was necessary for success on Wechsler and Stanford-Binet test standards).  Greenfield dutifully references both content analyses and review articles on television viewing (including Beentjes & Van der Voort, 1988; Healy, 1990; Glenn, 1994) in this regard but much more evidence (especially with respect to the differential content of network versus Public Broadcasting System station viewing) is still needed. [195]

Indeed for the correctly oriented reader, Greenfield's eventual (1998) contrast between contextualized language (in print and radio media) and the subsequent historical rise in decontextualized language used in television media (accompanied by a decline in reading of print) is especially informative because it exemplifies the ongoing need for a distinct set of analytical concepts to track the content aspects of the mechanisms responsible for the observed test score changes.  In short, the logic of the implied methodology used by Greenfield's mobilization of experimental evidence treats score gains not as a matter of the interaction between genes and environment but rather as a matter of cultural diffusion of the technology of thinking.  Bearing this implied methodological distinction in mind helps us better understand the implications of the experimental data presented in her work.

With regard to the content aspects (of what is being provided by each form of media), Greenfield proceeds toward providing evidence for her conclusion that: "If modern computer technology is making people more iconic in their style of representation, it follows ...that people will do better on nonverbal IQ tests" (pp. 102-103).  She initially references the Okagaki & Frensch (1994) study investigating the effects of a computer game called Tetris on mental manipulation of spatial imagery tasks similar to those used in nonverbal IQ tests.  That study demonstrated that 6 hours of playing Tetris (a dynamic spatial puzzle game) enhanced performance on several paper-and-pencil tests (similar to both the Object Assembly subtests of the Wechsler and to the Block Design tests of the Stanford-Binet).  Thus Greenfield argues: "The effect of Tetris play on the mental manipulation of spatial imagery indicates, on a theoretical level, that external forms of representation stimulate internal forms of representation" (pp. 93-94).

She supports this suggestion with references to her own collaborative experimental research.  Firstly, Greenfield et al. (1994a) indicated that expertise on the computer game The Empire Strikes Back was positively correlated with performance on Stanford-Binet paper-folding tasks.  Secondly, Greenfield et al. (1994b) presented a study of university students (in Los Angeles versus Rome) in which both a group/cross-cultural difference (regarding their predominant initial reporting style), and an important experimental treatment difference (regarding a shift in representational style from verbal to iconic) were found. [196]

The cross-cultural difference on the initial pre-treatment reports was that students from Rome used "predominantly symbolic" representations whereas students from Los Angeles used a predominantly iconic reporting style (1998, pp. 104-105).  For Greenfield, the cause of this cross-cultural difference lay in the "greater diffusion in the United States, compared with Italy (in the late 1980s), of all the electronic media that feature iconic imagery" (p. 105).  Similarly, the important experimental treatment effect noted was that post-treatment "communication" about the animated simulation became more iconic (and less symbolic) after participants played the computerized version of Concentration but not after playing the same game on a physical board.  Greenfield et al. (1994b) had thus demonstrated that even brief exposure to a computer game could produce a shift in the initially predominant mode of answering.

Greenfield (1998) then moves on to the argument that (given that the largest historical gains in test scores have been on performance IQ), traditional terms such as "culturally reduced" used to describe certain tests are complete "misnomers" (p. 106).  In particular, tests which rely on the use of matrices (including the Raven and various Wechsler and Stanford-Binet measures) have actually been shown to be more sensitive to changes in historical tools of thought than standardized tests of school-based knowledge.

The comprehension and use of matrices, she suggests, is taught in a particular learning environment: the post-primary classroom.  In order to carry out Raven Progressive Matrices tasks efficiently and to utilize a computer spreadsheet (such as Microsoft Excel) -which is essentially a blank matrix- one needs to know that a matrix is organized in rows and columns.  Such tasks, therefore, presuppose much more than mere acquaintance with simple shapes -as was formerly argued by test producers (see Carpenter et al., 1990).  By way of referring to a 1956 African study (covered in Wober, 1975) which found a divergence on the Raven between schooled and unschooled persons beginning at age 12-13, she then concludes: "This finding could be extremely relevant to the Flynn effect for the particular period in different countries when post primary education was greatly expanded" (p. 109).

It should be mentioned here that Greenfield's opinion regarding the cultural specificity of matrices tasks is in complete concurrence with the related results of Luria's Uzbekistan research.  The Uzbeki peasants, had they been given such matrices tasks, would not have known what to make of them.  As was the case with other visual and conceptual abstraction tasks (which were in fact given), they would have tended to concretize these questions.  That is, they would have simply said that "these are tiles on the floor and one is missing."  When asked what the missing tile might look like, they would have replied: "I don't know, I have never seen it."  Their unwillingness or inability to address the kinds of abstract problems commonly presented in modern standardized tests is elaborated in considerable detail in Luria's Cognitive Development: Its Cultural and Social Foundations (1976).

While Greenfield mentions cross-cultural data in this regard, the historical advent of such skills is severely glossed over in her account.  The ontogenetic aspects of such intellectual transformations (i.e., the process of the development of the required skills to perform matrices tasks and other forms of "abstract problem-solving ability") are also largely absent.  Yet is this not the very essence of the explanatory account that a truly "cultural evolutionary" approach must provide?

Section Three:

Toward an explicit Neo-Vygotskian Methodology

It seems North American psychologists are finally beginning to recognize that changes in test score patterns are due to the huge historical transformations of the standard technologically guided tools of thought used in modern (and postmodern) society.  But recognition of this correlation alone is merely a necessary but not sufficient condition for detailing the needed shift in content and procedural standards of contemporary intellectual assessment.  Such a recognition must be accompanied by an explicitly stated and a viable methodological position to guide further empirical investigation of these transformations and transitions.  Yet it is in this very area of openly addressing the methodological implications (for carrying out future research or testing) that the successive North American anthologies and research summaries have been lacking.

For example, while the mobilization of cultural evolution as a potentially explanatory concept alone is to be applauded, it must be pointed out that Greenfield's (1998) article was also historically remiss (as was the whole anthology in which it was contained) in the notable absence of any reference to, coverage of, or purchase on the theoretically convergent works of Vygotsky and Luria.  This is a shame because the founding works of the cultural-historical (a.k.a. Neo-Vygotskian) movement in psychology have set down both valuable methodological guidelines and produced a set of viable analytical theoretical concepts which must be understood by anyone who might wish to describe the reasons for and disciplinary implications of the Flynn effect.

Vygotsky & Luria (1930/93)

First of all, Vygotsky & Luria (1930/93) made the important methodological point that the most appropriate starting point for data collection with regard to human beings is not merely one of assessing individual or even social adjustment to environmental niche (as was argued up front by Greenfield) but rather one of assessing the typical developmental pattern of ability to use culturally and historically provided tools of intellectual analysis.  In doing so, they provided an important historical exemplar of transformative methodology to mental assessment. [197]

Concerned with carrying out a survey of the available scientific evidence in order to tease out the continuities and discontinuities between the mental evolution of "ape, primitive, and child," the first chapter of their 1930 work started with careful consideration of experimental evidence regarding so-called tool use in chimps by Kohler (1926).  Apes use tools as a natural extension of reaching in the forest when some obstacle, delay, or physical barrier presents itself (e.g., fruit just out of reach).  Primate tool use in experimental situations (such as Kohler's) showed that they can extend and transfer their use of tools beyond the specific situation in which such use was first discovered (e.g., from single to multi-section sticks; to use of ropes or jump poles; to box stacking).

Yet so-called 'tool use' (or rather the use of available implements such as rocks or sticks), for primates, never becomes their predominant means of achieving their individual or shared goals.  Nor are the implements used actually reflected by the apes as tools.  Instead, they are reflected as extensions of the arms, legs, or teeth (i.e., as a relatively fleeting intermediary step in the ongoing fulfillment of immediate biological or social needs).  We know this because such implements are discarded instead of retained by the ape after use (see Leontiev, 1981).

What exists as a rudimentary form in the ape constitutes an outstanding characteristic in human beings.  Use of implements and then tools as such by groups of upright hominids was not only the prerequisite of later culturally embedded cooperative labor activity; our very existence (as a species) is also the consequence of it.  Thus, while recognizing the ape's use of implements (in the absence of labor) brings us closer to understanding the upper reaches of their mentality, it also highlights the qualitative difference between them and us.  In short, Luria & Vygotsky (1930) argue that the dividing line between the mental development of the ape and that of the older human child is the appropriation of cultural evolutionary tools of thought.  Again, Leontiev would later clear up this distinction by pointing out that use of implements (by the younger human child) and then of tools as such (by the older human child) can be described as a stretching and then a transformation from the predominance of two-phase actions toward two-phase activities (i.e., doing one thing to get another in the strictly biological or social realms versus the cultural realm as well).

Utilizing both late-19th century and early-20th century anthropological sources (including Boas, 1916; Levy-Bruhl, 1923, 1926), the second chapter of Luria & Vygotsky's work then presents a compelling anecdotal argument that with changes to the predominant life-technology of a given society or historical era (e.g., with the introduction of literacy, invention of numerical systems, and/or a written system of memory in a given society) the very structural and functional pattern of mental processes predominantly used in that society also changed.  "Primitive" modes of thought are not inferior forms of our own kind of thinking but are qualitatively different from it, with their own internal kind of logic.  Further, they are not strictly replaced by modern forms but rather displaced in terms of their predominance of use.  Here too, the so-called Baldwin effect (i.e., the 'gain something lose something effect'), better known in prior eras of psychological thought than at present (cf. Haddon, 1901/1971), is mentioned to emphasize this point.

In the historical advent of modern memory, the predominance of retentive memory (reliance on perceptual complexes) used extensively in preliterate cultures to keep track of livestock and food stores historically gave way to mnemotechnics (use of knots and tally sticks) and these were in turn displaced by externalized mediated memory (writing system, books, libraries).  Each of these qualitative shifts expanded the quantity and flexibility of human memory in a way that was superior to what had gone before but transitions through the lower forms are to be understood as an intimate part of their development.

The implication is that, in contradiction to the standard North American psychological textbook fixation on the so-called universal modern bounds of retentive memory capacity (i.e., of a digit span of "7 plus or minus 2"), the de facto overall historical expansion of human memory capabilities cannot be denied as long as 'that which is written down' is recognized as a distinctive part of modern human memory.  In the same light, numerical complexes (i.e., reckoning without numbers per se) gave way to tally systems and then to formal numerical systems and machine mediated means of computation (including calculators and computers).  The ability to utilize these culturally provided tools of computation and their relative predominance in given societies (or subcultures within a given society) are surely among the characteristics which postmodern ability testing must address. [198]

Thus, what comes before (in terms of its socio-historical emergence) is not lost per se but rather nested into higher forms of modernized functioning.  Likewise, the task of understanding the development of these higher forms (both historically and ontogenetically) is one of tracking the pattern of their initial emergence and predominance of use over time.

Chapter 3 of Luria & Vygotsky (1930), written by Luria, opens with a reiteration that the psychology of adult cultural man is a result of three lines of mental development: "biological evolution..., historical cultural evolution..., and individual [ontogenetic] development" (p. 140).  With respect to child development, if the use of societally provided tools to mediate (i.e., instantiate knowledge for later reference) has been an ongoing theme in the qualitative emergence and quantitative development of the modern human mind, then an observable pattern of transitional forms should be recapitulated in the mental development of each child.  The task of intellectual assessment of children is to ascertain (and measure) the various guideposts of this passage from primitive (natural and social) forms of attention and memory toward complex culturally mediated ones.

Luria's weak (i.e., non-biogenic) brand of mental recapitulationism is informed here by a dialectical materialist understanding of culture as a transformative (rather than simply additive or interactive) force on the development of human mentality.  The mind of the child "not only matures, but also becomes rearmed" ... as he transforms into a cultural adult (p. 168).  Luria is careful to point out, therefore, that the characteristic internalization of cultural "contents and mechanism" of human mentality is in no way "similar to putting on a new dress" (pp. 170-171).  That is, culture is not simply added to the individual and social levels of mentality but produces "deep transformations" of the content and mechanisms of these earlier forms of mentality as they are used in everyday life.

During the course of exemplifying this viewpoint, Luria initially refers to various available contemporary sources (including Stern, 1924; Piaget, 1928) to indicate that the young child's logic is based on qualitatively different principles than those of the adult (pp. 142-168).  To this end, young children's confusion between cause and effect; their lack of distinction between reality and fantasy; their lack of ability to think in terms of relativity (i.e., that 'one and the same object can be on the right in relation to one thing and on the left in relation to another'); and their inability to recognize the presence of contradictory beliefs (i.e., holding simultaneous beliefs which defy the laws of formal logic) are all covered. [199]

The highlight of the chapter, however, comes when Luria's (and A.N. Leontiev's) own ontogenetic experiments, designed to specifically trace the transformative development of cultural techniques of memory from childhood forms through to full-fledged adult forms, are presented.  Here, Luria provides a levels of analysis account by distinguishing so-called "natural" (retentive) memory in the individual's head, from both: (1) transitional social-cultural "tally-like" systems (which utilize auxiliary external memory aids); and (2) full-fledged "culturally mediated" (external) forms of human memory.

In Luria's experiment, preschool children were presented with the task of memorizing a list of numbers.  Once they recognized the difficulty of this task, they were provided with a set of materials (e.g., buck shot, paper, pieces of rope) and informed that these materials might help them remember more numbers if they were used properly.  By way of this guiding activity of the experimenter, preschool children tended to create various tally systems (e.g., tearing paper pieces, or piling kernels of grain) to represent each number to be recalled.

Interestingly, Luria found that when school-age children (who had already learned a number writing system) are presented with the same experimental situation, they do not revert to the tally form of representation, but rather tend to tear out pieces of paper in the visual shape of numbers (thus eliminating the necessity to count up tallies).  The tendency to reproduce the shapes of numbers proves very strong.  These latter, distinctly culturally mediated forms of memory do not (in the strict sense) reside in the head of an individual.  They are not an improvement of natural retentive memory but rather an example of the pervasive pattern of substituting culturally provided auxiliary tools which, in effect, increase the power of human memory many times. [200]

The fundamental theoretical implication with regard to intellectual assessment itself must not be understated.  That is, instead of an implicit theory of intelligence (which remains constant across historical and cultural contexts) we need to move toward an explicit theory of intelligence that can account for the varying pattern of transformation of human intellect across historical, cultural, and political eras.  After all, when intellectual ability among humans differing in historical era or contemporaneous cultural experience is defined by mere quantity (as was done routinely in the pre-Flynn era), analysis of developmental aspects of intelligence (and especially of distinctly human patterns of transformations) is ruled out a priori.  Yet isn't the awareness of this pattern of developmental transformation the very essence of explanation?  Clearly it is.

Taking the above lines of enquiry together, we are now much closer to explaining the origins and disciplinary implications of the Flynn effect than we were just a few years ago.  While I don't seriously expect that a "prize" for figuring all this out is actually to come my way, I do hope the point has been made that the salvation (or rather redemption) of the testing subdiscipline itself lies in the explicit recognition of cultural evolution and its extra-individual unit of analysis.

Impact and Potential of Neo-Vygotskian approaches to ability testing

Although the task of tracing the achieved professional inroads and potential of Neo-Vygotskian attempts at ability testing deserves a separate book-length treatment in its own right, some brief statements are at least in order.  Stated briefly, the current strengths of the contemporary Neo-Vygotskian approach to ability testing are that: (1) its extra-individual unit of analysis is already promoting a wider definition of human (and animal) intellect; and (2) its recognition of the so-called Zone of Proximal Development is already being used (by some) to guide the empirical procedures of intellectual assessment (toward the goal of pulling out the best possible performance from each examinee).  The potential strength of the approach, however, namely the recognition of societally mediated processes as being vital for both (1) our contentual portrayal or procedural measurement of specifically human intellectual transformations; and for (2) informing public policy (regarding the fairness of existing tests or the formation of new transformative test standards), is just now beginning to take shape.

Based on developments in other subdisciplines (such as developmental and neuropsychology) it can be argued that only when both sets of strengths are explicitly realized will it become clear that a transformative approach is the methodological basis for a serious and communal act of theoretical housekeeping regarding the content and procedural aspects of 21st century ability testing. [201]

In an effort to provide such clarification, a few examples of how Neo-Vygotskian approaches to ability testing have already been utilized are given below and their present limitations are also noted.  Finally, lest we become complacent about the subdisciplinary gains thus far, it is pointed out that an adequate description of social and societal transformations of mentality relies on an appeal to a conceptual lexicon which is itself more properly termed post-Vygotskian.  A programmatic "Table" of this lexicon called the "transformative structure of animal and human intellect" (as I see it) is provided and the implications for future testing (as I see them) are then offered up for posterity.

Current strengths:

An extra-individual unit of analysis and the ZPD

While the standard interactionist approaches to ability testing have been relatively weak in their coverage of the content aspects of human intellect, Neo-Vygotskians have intentionally borrowed heavily from both North American developmental psychology and from European Vygotskian continuity/discontinuity approaches to human mental development in order to do just that.  Since mental development is understood as an increasing "spiral" successively encompassing the individual, social, and societal realms of meaning (see Wertsch, 1985a, 1985b), they seek to describe and measure the timing and mechanisms of these developmental transformations (see fig 60a).

Procedural gains and subdisciplinary penetration

As indicated above, the recognition of the historicity of human intellect has achieved considerable disciplinary cachet by way of the Flynn effect.  A call for expanding the definition and procedural standards of ability testing itself has also been communicated by interdisciplinary anthologies and subdisciplinary internal historians of intellectual assessment.  Thorndike & Lohman (1990), for instance, suggested that a solution to the woes of the interactionist approach lies in a "Vygotskian approach" to testing procedures which treats individualized analysis of crystallized intelligence as a merely descriptive starting point for further (extra-individual) analysis of abilities.

In particular, they call attention to Brown & Ferrara's article Diagnosing zones of proximal development (1985) which starts by citing one of the most methodologically significant passages from Vygotsky's Mind in Society (1978).  In this important passage, Vygotsky indicated that obtaining children's mental age by way of a standardized test is merely a starting place.  Since the capability of children (with the same measured crystallized development) to learn under a teacher's guidance varies (i.e., their Zone of Proximal Development varies) the subsequent course of their respective learning would obviously be different. [202]

Vygotsky's point was not merely one of psychometric procedure (i.e., empirical method).  Rather it was one of both content and of providing an investigatory and predictive procedural methodology (Kozulin, 1990).  Contrary to the socio-Darwinist and interactionist accounts of the past, Vygotsky's (1978) account recognizes that the child's intelligence is not just an individual affair occurring inside the head.  Rather, it is made up of a constant exchange of control from the outer interpersonal realm to the inner personal realm of mentality.  In other words, the traditional individualized unit of analysis (i.e., the already attained static "crystallized" knowledge inside the examinee's head) must be augmented or replaced with systematic analysis of an extra-individual unit (see fig. 60b).


Figure 60a & b Spiral mental development and the Neo-Vygotskian approach. The upper panel depicts an older (pre-IQ controversy) understanding of mental development as an increasing spiral, successively encompassing the individual, social, and societal realms of meaning (From Lindgren, 1956 after Arnold Gesell's 1925-29 "mental growth" cycle). The lower photo shows a (pre-computer revolution) cooperative game situation in which the development of "coordinated interdependent actions" is vital for success in this "pulleys and strings" task (From Doise & Mugny, 1985).

As Vygotsky pointed out, IQ estimates and other existing standardized tests (of academic achievement) can be used as a procedural starting point (i.e., as a mental marker of rather passive already crystallized individual knowledge) but the data they provide are at a loss to address ongoing development (i.e., the active utilization of social and societal tools or resources) which both children and adults routinely rely on in making their way about the world (see Brown & Palincsar, 1989; Rogoff & Morelli, 1989; Rogoff, 1990). This is not to say that testing procedures and ability tests per se could not be constructed to address these issues. To some extent, this is just what the Neo-Vygotskian brand of North American ability testers (e.g., Flanagan, 1997) is now attempting to do by way of establishing "test protocols" in which testers lead the mental activity of examinees in order to test the limits of current and ongoing intellectual development.

It is encouraging to note that the more enlightened intellectual assessment textbooks are now insisting that a "broader measurement of intelligence" is needed to narrow the gap between contemporary testing and past "cognitive" science.  For instance, Flanagan et al. (1997) suggest that we need to "either augment the Wechsler batteries to include measures of important abilities and processes that are not currently assessed...or to use alternative instruments and approaches to assessing intelligence" (p. viii).  At the very least, alternate methods which describe, measure, and predict on the basis of the ZPD (i.e., the difference between what examinees can do on their own and what they can do with guidance) can be a valuable addition to contemporary individualized intellectual assessment technologies.
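The arithmetic behind the ZPD definition just given can be made concrete with a minimal sketch.  It assumes, purely for illustration, that both the unaided and the guided performance are expressed on one common scale (here, mental-age years).  The class name, field names, and the scores for the two children are hypothetical and are not drawn from any published protocol.

```python
from dataclasses import dataclass


@dataclass
class Examinee:
    """Hypothetical record; both levels are in mental-age years (an assumption)."""
    name: str
    independent_level: float  # what the examinee can do on their own
    assisted_level: float     # what the examinee can do with guidance


def zpd_width(examinee: Examinee) -> float:
    """Return the ZPD as the gap between guided and unaided performance."""
    if examinee.assisted_level < examinee.independent_level:
        raise ValueError("assisted level is expected to be >= independent level")
    return examinee.assisted_level - examinee.independent_level


# Two children with the same unaided (crystallized) score, invented for illustration.
children = [
    Examinee("Child A", independent_level=8.0, assisted_level=12.0),
    Examinee("Child B", independent_level=8.0, assisted_level=9.0),
]

widths = {child.name: zpd_width(child) for child in children}
for name, width in widths.items():
    print(f"{name}: ZPD of {width} years")
```

On these invented numbers the two children are indistinguishable on the unaided measure yet differ by three years in what they can reach under guidance, which is the kind of information a purely individualized score leaves out.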

This extra-individual focus of the classical Vygotskian approach has been picked up by some of the most recent test batteries reviewed in Flanagan et al.'s Contemporary Intellectual Assessment (1997).  The best of these contributions have laid out a set of clinical protocols for investigations in the Zone of Proximal Development of children and adult patient populations (see the chapters by Naglieri; Lidz; and Feuerstein, et al.; In Flanagan, 1997).  By openly aiming to skillfully draw out the "best possible performance" from each examinee, the proponents of dynamic extra-individual assessment are actively rejecting a long tradition in America that has treated such leading activity as unacceptable error variance. [203]

It must be pointed out, however, that the subdisciplinary penetration of these innovations has not (as yet) been all that great.  Many of the standard assessment textbooks (e.g., Gregory, 1992, 1996; Tallent, 1992; Kaplan & Saccuzzo, 1997; or Hood & Johnson, 1997) still make no mention of either classical or Neo-Vygotskian approaches.  Even the ones which do (including Flanagan's) contain their own limitations.  These limitations, however, will only become clear once we discuss further the potential and argumentative strengths of the transformative approach per se.  In short, to date, only part of the implications of the Neo-Vygotskian approach (i.e., the extra-individual unit of analysis) has taken hold within the subdiscipline and a fuller remediation of contemporary standards in fact requires a further infusion of specifically post-Vygotskian views.

Inasmuch as they have adopted only part of the Neo-Vygotskian message, the above mentioned test protocol adjustments are as much a return to the Binet-style of mental orthopedics as anything else.  While these efforts are certainly one small step toward ensuring that testing is always used in the interests of the examinees tested, they are merely indicative of how profound the subdisciplinary shift in analytical concepts, testing methods, and motives for testing will be if we proceed further along the path of transformative analysis.

Explanatory Potential and Argumentative Strengths: Societal transformation and democratically informed opinion

Although the adoption of a larger unit of analysis is an exciting and much awaited subdisciplinary development, there remains another more progressive aspect of Vygotsky & Luria's (1930/1993) transformative approach which has not as yet been brought into mainstream ability testing practice: The recognition that the proverbial seven year old is in the midst of an ontogenetic transformation from social to societal emphasis in their intellect.  Their mentality is in the midst of the transformation from predominantly "biological" and "socially mediated" intelligence (as described by Doise & Mugny, 1984; Goody, 1995) toward a new, higher form of "societal intelligence" (as described by Vygotsky & Luria, 1930/1993; Leontiev, 1978, 1981; Rogoff, 1990).

That latter form of characteristically human intellect utilizes culturally specific, externalized tools (e.g., drawings, maps, writing, number systems, computers) to transform a child's individual and socially crystallized intelligence to a new collective, externally instantiated, "societal" realm of meaning.  It is precisely the recognition of the functional shift of emphasis in extra-individual intellectual relations from the social to the societal realm, which distinguishes the transformative analysis of human intellect from the more familiar (and additive) interactionist approach to mentality.

Acknowledging the historical embeddedness of the typical pattern of transformation of human mentality (which is created by way of culturally provided tools), in particular, has special potential importance when we attempt to design or implement ameliorative educational programs; assess current standards of university entrance examinations; or produce equitable standards for workplace selection or retraining initiatives.  It should be emphasized, however, that in elaborating the details of the potential application of this second aspect of the transformative (cultural-historical) approach we are encroaching upon an area of theorizing that is more properly termed post-Vygotskian (Leontiev, 1981; Cole, 1985, 1996; Wertsch, 1985a, 1985b, 1986, 1998).

Transformative structure of Animal and Human intellect

Figure 61 is presented as an indication of the considerable conceptual reconstruction necessary to put the content aspects of the subdiscipline back in line with the attained historical empirical data.  Like the older diagrams formerly used to describe the structural and functional content of intelligence, this 'new' table contains both vertical and horizontal aspects.  It presents these aspects, however, without appeal to the problematic earmarks of the past century of theorizing.  Namely, without appeal to the pure continuity view of mental evolution (Darwin, 1859, 1871, 1872; Romanes, 1882, 1888), the additive mental ladder (Sandiford, 1938), or the interactionist's rectangular mental capacity metaphor (Engle, 1945 through to Zigler, 1986).  In other words, while given species, individual organisms, or human beings may be placed along this hierarchy of intellect (depending on their ontogenetic, social, and societal maturity), the lower levels are still retained (nested) within the higher developmental levels -as are the mechanisms of their original transformation (see fig. 61).

Figure 61 Transformative Levels of Animal and Human Mentality. The vertical aspect is to be read as a description of the typical pattern of transformations from lower to higher levels of mentality. Note, however, that the overall pattern described here is circular, self-looping, and transformative. It is not strictly upwardly linear or additive but integrated as well. The left panel called STRUCTURAL/ ONTOLOGICAL ASPECTS (a.k.a. "What" is being done), describes various Levels of Mentality and Levels of Learning (a.k.a. means of environmental reciprocity). In the right panel called FUNCTIONAL/ ACTIVITY ASPECTS (a.k.a. "How" and "Why" it is being done), the "Highest functional attainment" (including their *Means of Transformation) are covered as well as the various Motivational Levels needed to explain the "Why" of what is being done.

A.N. Leontiev (an important student of Luria) devoted a whole book-length elaboration to the analytical categories running throughout figure 61.  While my initial youthful hopes (in 1986) that the transformative approach (set down in considerable detail in Leontiev's Problems of Development of the Mind, 1981) would soon be adopted in animal psychology (and elsewhere) have been dashed by the realization that such disciplinary change tends to take place over decades, there still remain some grounds for cautious optimism.

First of all, the very act of reifying this approach in a single table for the future reference of others may be helpful in elevating its disciplinary profile.  It is a shame that Leontiev did not do so because this act of externalization may very well have helped cut through contemporaneous North American reticence to engage anything that was deemed to be of Soviet origin.  The official Cold War, however, ended in 1989 (with the fall of the Berlin Wall) and it is now time to engage Leontiev's transformative "activity theory" approach openly and unapologetically.  My motive for providing the table here is to indicate (though not prove) that this is a conceptual lexicon which (1) all well-meaning contemporary researchers can live with; and (2) may actually promote efficient communication of experimental results between those with specialties in different aspects of mental assessment or ability testing.

A secondary (though related) ground for optimism is that there has been a progressive opening up of the purviews of comparative animal psychology over the last two decades.  In short, Leontiev (who originally worked out his approach in the 1950s) was "thinking big" at a time when American comparative research (in biology and psychology) was restricted -in part through academic governance (see Holmes 1989 regarding the case of Alex Novikoff)- to limited operationally defined highly circumscribed purviews.  Leontiev's book was then published in English (1981) at a time when disciplinary fractionation of the field of comparative mentality was highly acute indeed.  Times, however, have changed.

For instance, a number of books and articles which touch on or deal outright with the topics of emergent evolution or integrative levels theory (a la Morgan, 1923/31 or Novikoff, 1945 respectively) have now been published. [204]   More specifically, Ethel Tobach's Historical Perspectives and the International Status of Comparative Psychology (1987) included an important article by Charles Tolman on "The comparative psychology of A.N. Leontyev" which can serve as an initial reference point for those with interest in pursuing the matter (see Tolman, 1987b).

This is not to say, however, that the potentially explanatory gains of a transformative approach to mentality have as yet been brought into mainstream comparative animal textbooks.  In fact, it is arguable that the idols of the market place (including product appeal and political correctness) are still alive and well in such textbooks. [205]   The divergence between the concepts provided in figure 61 and those typically used in contemporary disciplinary and subdisciplinary textbooks is highly indicative of the need for a concerted (well funded) and communal effort to rethink the content aspects of both animal and human mentality.  Indeed, I had to look long and hard for comparative texts which even approached the kind of analysis necessary for such subdisciplinary progress.

For example, with regard to the upper-middle ranges of the table, the articles contained in Goody's (1995) edited volume Social Intelligence and Interaction nicely represent the ongoing interdisciplinary struggle to carry out an "augmented" interactionist approach.  On the positive side, they view intelligence as something more than the possession of a particular individual.  They also rightly point out that the social aspects of intelligence apply to the typical developmental pattern of both ape and human intellect.  Unfortunately, all too often the articles in the volume conflate ape-ape relations with human-ape and human-human relations.  This is well exemplified in Goody's own article which begins by characterizing Homo sapiens as "the clever hominid: primate social intelligence plus language" (see Goody, 1995, p. 206).  This additive view dates back to that of Romanes (1888) and is a mere methodological half-step.

A somewhat better job is done by the explicitly Neo-Vygotskian works of Doise & Mugny (1984) and of Rogoff (1990) which (respectively) take an extra-individual unit of analysis as a starting point and describe normal intellectual development of human beings as a culturally specific "apprenticeship" in thinking.  For example, according to this transformative view, the children depicted in figure 60b are using social cooperation and joint actions toward their shared goal of winning a prize for completing the pulley task (as can apes given the right kind of task) but the children are also utilizing higher levels of mentality (i.e., societally appropriated language tools) to do so.  At the same time, however, they all make many false starts and mistakes along the way.  Only once the so-called practice effect has set in (i.e., once intentional individual actions have moved downward to become more automatic operations for each child -in part by way of their guiding of each other's actions) can the group task (i.e., the joint activity) be completed smoothly.

A third ground for optimism comes from the increasingly interdisciplinary field of cultural anthropology.  Whether it is the recognition of the preadaptive (and ongoing) role of social-linguistic gossip in the development of culture (Dunbar, 1996; Dunbar et al. 1999); or Sue Blackmore's (1999) popularization of the concept of "Memetic" cultural evolution (i.e., the passing down of social gestures and societal technologies); or the more ontogenetically specialized work of Whiten concerning the emergence (around age four) of the "attribution of false belief" (Byrne & Whiten, 1988; Whiten, 1991; Whiten & Byrne, 1997); it should be apparent that these authors are opening up new vistas of explanatory power not currently addressed by standard interactionist ability testing discourse.

In short, these cultural anthropology sources are attempting to do what Gould himself did not: shed light on the social-cultural aspects of the "mind's big bang" -starting 50,000 years ago up to the present (see Heminway, 2001).  The question we must ask though is: Would not the most progressive aspects of these currently heterogeneous works be better served, emphasized, or drawn together by the adoption of an explicitly transformative methodology?  I believe the answer is yes.

We must now wind down this chapter in the hope that the possibility of a transformative (integrative levels) approach to intellect -gained by combining both (1) Flynn's recognition of the historicity of intelligence test performance and (2) Neisser's abandonment of the long postulated psychometric "g" with (3) the Neo/post-Vygotskian tradition (which suggests that social and societal intellect do not reside in the head of an individual)- has become somewhat more apparent to the reader.  In other words, whether we are assessing the broader development of animal or human mentality or whether we are concerned with more specific topics (such as the rationale for using a given test to assess the recovery potential of a given patient from frontal lobe damage) we must always consider how the continuities and discontinuities of organic, mental, and cultural evolution might be manifested in the particular subject matter under investigation.

Conclusions and prescriptions for remedial action:

Toward a 21st century transformative approach to ability testing.

I opened this work with the argument that in order to move beyond both 19th and 20th century accounts of mental ability, early-21st century psychology must explicitly adopt an historically responsible, transformative, approach to the mentality of ape, primitive, child, and adult.  By historically responsible I meant simply that the assumptions and empirical tools used by the two predominant assessment methodologies (mental Darwinism and interactionism) must be recognized as part of (and partial to) a previous era's cultural ethos -one that is completely incompatible with the observable realities of contemporary society.

The above opening argument has (I believe) been supported by the historical details provided thus far, but in the new era in which the 20th century interactionist rectangular capacity metaphor has been set aside, new historically responsible guidelines for the interpretation and construction of ability tests must also be worked out, officially adopted, and implemented. The clarion message here is that the task of bringing these guidelines to fruition requires both active individual efforts to exercise departmental persuasion and communal-professional participation in both political lobbying and ongoing legal actions.

For it is our collective cultural evolution that has stood the former mental and socioeconomic bell curve on its head. The transformative approach to mental evolution explains how this was done and the recognition of this fact necessitates that we all act to stamp out the vestiges of Mental Darwinism and interactionism in our professional conduct.

Indeed, as the past history of ability testing has shown, subdisciplinary change has usually come about as a result of either external (legal or governmental) compulsion or through internal recognition of a decrease in market share.  Rolling boycotts (of textbooks published by test industry providers); departmental affirmative action (with regard to the hiring, research granting, or professional advancement) to favor those with a firm purchase on the transformative approach; and Class Action lawsuits against both states (which utilize off-the-shelf public school assessment measures) and companies or governmental bodies (which do not provide equal access to the materials used to assess applicants on their vocational intake tests) can all be effective over time.

The key to remaining motivated in the struggle to come lies in the recognition that the testing industry is a business and that as such it relies heavily on retaining a customer base.  Test producers have always had to 'adjust the mix' of their content claims and procedural practices in order to present their tests as useful to those who might utilize them.  When one or other of these two aspects of testing is adequately called into question the test (or testing program) is eventually abandoned for another form of testing.  It is in this historical fact that I place my faith in an eventual interdisciplinary "united front" against the mistakes of past and present testing.


[172] “Malaise” is defined as: (1) a feeling of general discomfort; (2) of being below one’s normal standard of health; (3) a feeling of being emotionally ill at ease or apprehensive (Webster’s Encyclopedic Dictionary, 1988).

[173] These provisions include: The Civil Rights Act (1964); The Elementary and Secondary Education Act (1965); Title IX Education Amendments (1972) which prohibited federal grants to schools or programs that discriminated on the basis of gender; The Education for All Handicapped Children Act (1975); and even the proposed Equal Rights Amendment (slated for ratification in 1982).  The great historical irony here was that, with all of these hard won laws and rights in place, a devastating conservative backlash followed.

[174] Testing booms have always been good for some and bad for others in the discipline of psychology.  The latest one was initially good for those who concentrated their professional efforts in the areas of statistical reliability and predictive validity of tests.  It has also continued to be profitable for the major suppliers of off-the-shelf standardized tests including: Harcourt Brace's subsidiary Psychological Corporation (Stanford Achievement Test, Metropolitan Achievement Tests, Wechsler Intelligence Scales); Houghton Mifflin (Iowa Test of Basic Skills, Stanford-Binet); McGraw-Hill's California Test Bureau (California Achievement Tests, Comprehensive Test of Basic Skills, National Educational Development Test); and the ETS (SAT, GRE, LSAT, MCAT).

[175] While the aim of first quarter century testing was the measurement of general mental ability along Social Darwinist lines (e.g., W.W.I tests, Terman's National Intelligence test), the aim in the second quarter century of testing was the measurement of various special abilities along agnostic Operationist lines (e.g., W.W.II pilot tests, Cold War era Merit scholarships).  In the third quarter century of testing, the subdiscipline was assessing patterns of special abilities along Interactionist lines (e.g., educational achievement tests, different kinds of intellect). The fourth quarter century of testing brought with it not only an unexpected increase in coffers but also a self-serving search on the part of testing insiders to somehow synthesize past empirical advances in testing procedures into a coherent and useful set of empirical devices.  After all, reaching a new subdisciplinary consensus on testing procedures would help ensure the continuance of testing itself by making it less prone to so-called political whim or disruptive litigation (cf. Thorndike & Lohman, 1990).

[176] Indeed, early on in the Accountability era, such an endeavor had seemed redundant because the issue of continuing standardized testing was already decided for us in the wider arena of votes and within the only semi-compromised halls of the ETS.  By 2001, the initial discomfort of those seriously interested in working out an explanatory understanding of human intellect had taken root and grown into a full-blown subdisciplinary crisis of relevance.

[177] Between 1976-1980, the independent minded president Jimmy Carter had indeed restored dignity to the White House but had proven ineffectual in forming the bipartisan alliances necessary to help the country both to recover from fiscal debt (incurred during the Vietnam conflict) and to stem the inflationary tide caused by the OPEC oil embargo. His attempts to combat economic 'stagflation' (the combined situation of high unemployment and rising inflation) through gradual rollbacks of federal regulations on industry were unpopular with Democrats and were considered too little too late by Republicans.

[178] Starting with church membership lists, Rev. Jerry Falwell utilized the methods of direct mailing and televangelism to create a powerful political organization called the Moral Majority.  This interfaith organization claimed a citizenship mandate including: pro-life (i.e., anti-abortion); pro-traditional family values (i.e., opposition to homosexual rights or marriage); pro-moral convictions (on the issue of drugs and pornography); and pro-American (meaning strong national defense as a deterrent against communist aggression and enemy takeover).  At church picnics across the country, fundamentalist Christians burnt books, records, and magazines in protest against the corrupt values of secular society.  Between 1979-87 this movement lobbied for prayer and the teaching of creationism in public schools, while opposing the Equal Rights Amendment, homosexual rights, abortion, and the U.S.-Soviet SALT treaties.

[179] Roosevelt's New Deal had redirected the American way to include a large federal government working on behalf of the nation's poor and the next seven presidents had worked within this Keynesian liberal economic framework to expand human equality of opportunity.  To Reagan, however, big government was the enemy of the American way because it interfered with private enterprise (Neustadt, 1990).

[180] The economy of the early 1980s was in severe recession.  The successive rise of various Asian manufacturing economies then resulted in further downsizing of mid-1980s American corporations as they adjusted to the new global economy.  Deregulation, privatization, downsizing, and dismantling were the buzzwords of the era.  An across-the-board assault on former federal regulations and a privatization of public assets in the television industry, savings and loan banking, and airlines were all under way.  Government budgets and social programs at all levels were being slashed.

The disruptive economic effects of privatization were further compounded by Reagan's propaganda-based (and surreal) defense agenda.  For one thing, his advocacy of the so-called Star Wars initiative (a technopolitical defense program intended to provide an umbrella against nuclear first strike by the Soviet Union's Evil Empire) meant reversing the post-Vietnam controls on defense spending.  These new military expenditures under Reagan's two terms far outstripped the domestic spending cuts.

Reagan was re-elected on the apparent strength of a false economic recovery (where the actual national debt had tripled due to massive tax cuts to corporations during his first term), and the public faith in Reaganomics was only slightly tarnished after the stock market crash of 1987 forced him to allocate further deficit funding guarantees to shore up a failing central banking system.  But for economists who questioned the long-term wisdom of Reaganomics, this crash was an indication that mere replacement of a federally controlled economy with an unfettered corporate economy would actually promote a succession of such periodic crashes (Galbraith, 1987).  As the governments of the world began to retreat from the "commanding heights" of their national economies, the resulting periods of prosperity have been periodically interrupted by such global crashes.

Despite this fact, under the new conditions of globalized economies, the old debates about the role of the market versus the role of the state have gone by the wayside.  The world has once again become a college of corporations as much as it is a system of nation states.  Consequently, the focus of so-called progressive economic thinking is now said to be not how to harness or suppress market forces but how to utilize existing market forces to achieve democratic objectives (Yergin & Stanislaw, 1998).

[181] "College Board achievement tests also reveal consistent declines in recent years in such subjects as physics and English. The College Board's Scholastic Aptitude Tests (SAT) demonstrate a virtually unbroken decline from 1963 to 1980. Average verbal scores fell over 50 points and average mathematics scores dropped nearly 40 points....Both the number and proportion of students demonstrating superior achievement on the SATs (i.e., those with scores of 650 or higher) have also dramatically declined..." (A Nation at Risk, 1983).

[182] Between 1989-1992, George Bush reintroduced the Republican platform issue of private school vouchers into the mix of national educational issues. A limited voucher system was already under way in Milwaukee and by 1998 the numbers of voucher students there jumped from 1500 to over 6000 when it was expanded to include attendance at private religious schools (to the great consternation of public school teachers' unions). Most notably, 3/4 of these new voucher recipients had already been attending private schools (see Kozol, 1991, 2000). Another aspect of school choice eventually gained bipartisan national support in 1997 under the Clinton-Gore administration when Congress allocated $80 million for the promotion of so-called charter schools. The test case for charter schools had been carried out in Baltimore in 1990 where the existing School District sponsored 9 chartered schools for a period of 4 years. Fiscal independence (in terms of maintenance costs, teaching equipment, and teacher salaries) allowed these schools to work out intervention programs (or curricular adjustments) which the centralized administration of the public schools had not been able (or had been unwilling) to implement. Notably, the Baltimore experiment of "within system" charters was undermined and voted down in November 1995 partly due to the public schools having brought in private testing specialists to increase their own short-term standardized test performance (Sarason, 1998). The lesson for the ousted Education Alternatives Incorporated was that future charter schools would have to work externally to the public school administration with their accountability being directly to the state (rather than to the existing city and school bureaucracies).

[183] The National Education Goals Panel (NEGP) is a bipartisan and intergovernmental body of federal and state officials created in July 1990 to assess and report state and national progress toward achieving national education goals.

[184] The National Assessment of Educational Progress (NAEP), also known as "the Nation's Report Card," is the only nationally representative and continuing assessment of what America's students know and can do in various subject areas. Since 1969, assessments have been conducted periodically in reading, mathematics, science, writing, U.S. history, civics, geography, and the arts.

[185] Created in 1989 by former Reagan administration attorney Michael McDonald and conservative scholar Michael Greve with 10 full-time employees and an annual budget of $1.9 million, CIR receives most of its funding from right-wing libertarian and conservative foundations such as those run by Richard Mellon Scaife.  While the particular emphasis of CIR has been on civil rights law, freedom of speech, and the free exercise of religion it has most recently been very active in opposing affirmative action.

[186] Ironically, it was then pointed out by others that this first class of "Talented 20" enrollees in college (Fall 2000) was actually more White than the university system as a whole.  Of the 11,539 students who enrolled, about 13.1 percent were Black and 12.2 percent were Hispanic.  That pool of students turns out to be less diverse than the universities' enrollment as a whole, which was 15 percent Black and 15 percent Hispanic.

[187] This schism was partly a reflection of the changing professional makeup of the APA itself up to 1987.  Before W.W.II almost 70% of the doctorates in psychology were in experimental psychology; by 1984 that figure had dropped to 8% (Goodstein, 1988).  Before W.W.II, 75% of all psychologists with doctoral degrees worked in academic settings.  By 1989, that number had fallen to 30% (Kohout & Wicherski, 1990).

[188] "Duplicity" is defined as: (1) deception by pretending to feel and act one way while acting another; (2) the technically incorrect use of two or more distinct items (as claims, charges, or defenses) in a single legal action (Webster’s Encyclopedic Dictionary, 1988).

[189] His first edition (1981) had already made a similar point with respect to the older techniques of multivariate analysis including those of Lombroso, Goddard, Burt, and Spearman. Gould exposes all such searches for an aculturally decontextualized measure of intelligence for what they were: statistical methodolatry (see further Bakan, 1967). That is, they are appeals to the mystique of biological, genetic, or statistical methods or processes in the clear absence of a viable developmental theory of mental evolution.

[190] Around 1962, Jensen began extensive testing of Black, Mexican-American, and other minority group school children, developing a series of "culturally-free" intelligence tests that could be administered in any language.  The results of that program soon led him to distinguish between two separate types of learning ability (or intelligence): Level I, or associative learning, may be defined as simple retention of input -that is, rote memorization of simple facts and skills; Level II, or conceptual learning, is roughly equivalent to the attribute measured by I.Q. tests -the ability to manipulate and transform inputs- that is, the ability to solve problems.  Statistical analysis of his findings led Jensen to conclude that Level I abilities were distributed equally among members of all races, but that Level II occurred with significantly greater frequency among Whites than among Blacks and among Asians somewhat more than among Whites (see Jensen, 1982; Jensen, A. & Vernon, 1986).

[191] In that sense, the so-called cognitive revolution was not a revolution at all (see Gardner, 1985).

[192] Flynn was not the first to point out that IQ tests often have "unknown environmental factors" that make raw scores vary across decades or even generations. In the late 1940s, W. Owens (1953) at Iowa State University discovered a batch of Army Alpha examinations that had been given to psychology classes at Iowa in 1919.  In a follow-up study he located 127 men of the original sample and retested them after a lapse of 31 years, finding no decline in the average score on any subtest of the Alpha (even though it is a speed test); on several subtests (involving verbal skills, information, comprehension, arithmetic, and memory span for digits) there were appreciable increases.  Similarly, David Campbell (1965) carried out a follow-up on former University of Minnesota students after a 25 year lapse.  Their scores exceeded those obtained on their entrance tests as college freshmen.  The same sample of subjects at age 45 appeared brighter than they did when they were 20 (the longitudinal comparison), but at the same time their performance was not as good as that of contemporary 20 year olds on current entrance exams (a cross-sectional comparison).  James Flynn is a political scientist (not a psychologist).  He is currently at the University of Otago in Dunedin, New Zealand.

[193] Luria's Uzbekistan research was intended as an interdisciplinary pilot project and included anthropologists, translators, and female researchers on the research team.  Their subject groups included: (1) Illiterate village women (interviews conducted by women researchers); (2) Illiterate male peasant farmers; (3) Women who had attended short-term courses on the teaching of kindergarten age children (almost no literacy training); (4) Barely literate Collective farmers and male young people who had taken short farming courses (but who already had experience in planning farm production, distributing labor and taking stock of farm output); and (5) Women students admitted to teachers' school after two or three years of remedial school study (their educational qualifications being still quite low with respect to contemporaneous American standards of education).

[194] Luria's research group contextualized their study by utilizing the anthropological technique of grounding observation (and the collection of data) within the culture under study: "As a rule our experimental sessions began with long conversations...with the subjects in the relaxed atmosphere of a tea house... or...around the camp fire.... Only gradually did the experimenters introduce the prepared tasks which resembled the 'riddles' familiar to the population and thus seemed a natural extension of the conversation" (Luria, 1976, p. 16).

[195] Likewise, Greenfield points out that the decline in reading for pleasure (associated with the rise of television viewing) was historically accompanied by a decline in test scores on the SAT-verbal exam (p. 114), which requires higher levels of verbal competence.  With respect to Flynn's hypothesis that "Advanced Problem Solving Ability" (and not innate intelligence or even "g") is being measured by such tests, we can surmise (albeit speculatively) from the pattern of Flynn's data that the relative ability of test populations to carry out visual comparison of abstractions went up and the relative ability to carry out verbal comparison of abstractions went down along with the cultural diffusion of these visually biased technologies.

[196] The procedure of the study was as follows: After exposure to an animated video display (regarding the logic of electronic circuitry) student participants were first required to provide a pre-treatment report consisting of free-style answers to a paper-and-pencil test of comprehension.  They were then retested after exposure to an intervening experimental condition involving a distracter task -a game of spatial memory called Concentration (given on either a physical board or on a computer display).

[197] This methodological point (regarding the appropriate concepts and starting point for investigation of the human mind), while seemingly obvious, is still important to state explicitly because biogenic language (no matter how well accompanied by empirical evidence) gets in the way of explanation. When Greenfield, for instance, opens with the question: "What constitutes specieswide adaptation to the human niche?" this predisposes the resulting account to a litany of problematic biogenic concepts including: "successful adaptation to a niche," linguistic "communication," "phenotypic" intelligence, and "social" organization.

[198] It must be emphasized that modern culture still utilizes numerical complexes both in the commonplace descriptions of time passage or of military groupings, and in the more profound tasks of attempting to understand the significance of very large numbers.  A week, month, year, century, and a millennium are all commonplace examples of the former usage.  Similarly, ten soldiers walking alone are simply ten men but when in formation with a corporal become a platoon.  The ongoing importance of perceptual complexes, however, becomes especially clear when we consider their use in the latter, non-commonplace manner.  For instance, to be informed that Event Bravo (i.e., the hydrogen bomb exploded in 1954) produced a yield of "15 megatons" holds relatively little meaning until we are also told that this quantitative yield is roughly equal to 1,000 Hiroshima size bombs, or that the yield of all of the explosives used during W.W.II was equal to "only" two megatons.  Indeed, it was through the very use of a numerical complex that Eisenhower (in 1953) first drove home the socio-economic costs of Cold War thermonuclear proliferation to the American public (see chapter 7).

[199] "Thinking in adults proceeds according to laws of complex combination involving the accumulation of experience and inferences from generalizations.  It follows the laws of inductive-deductive logic, whereas thinking for the young child is, according to Stern [1924], 'transductive'... It develops neither from the general to the specific nor from the specific to the general; it simply infers from one episode to another, guided each time by new features that catch the child's attention" (pp. 163-164).

[200] "When we study the memory of cultural man, strictly speaking, we do not study an isolated 'mnemonic function' -we study all the strategies and techniques aimed at fixing experience in memory and developed in the course of cultural maturation" (p. 186).

[201] I make this historiographic claim fairly confidently because this has indeed been the pattern of professional change in other subdisciplines such as: developmental psychology (see Bruner, 1985; Ratner, 1991; Rogoff, 1989, 1990); neuropsychology (Luria, 1973; Goldberg, 1990); and educational psychology (Moll, 1990).

 

[202] "Suppose I investigate two children ..., both of whom are ten years old chronologically and eight years old in terms of mental development.  Can I say that they are the same age mentally?  Of course.  What does this mean?  It means that they can independently deal with tasks up to the degree of difficulty that has been standardized for the eight-year-old level.  If I stop at this point, people would imagine that the subsequent course of mental development and of school learning for these children will be the same, because it depends on their intellect...Now imagine that I do not terminate my study at this point, but only begin it. These children seem to be capable of handling problems up to an eight-year-old's level, but not beyond that.  Suppose that I show them various ways of dealing with the problem.  Different experimenters might employ different modes of demonstration in different cases: some might run through an entire demonstration and ask the children to repeat it, others might initiate the solution and ask the child to finish it, or offer leading questions.  In short, in some way or another I propose that the children solve the problem with my assistance.  Under these circumstances it turns out that the first child can deal with problems up to a twelve-year-old's level, the second up to a nine-year-old's.  Now, are these children mentally the same?...This difference between twelve and eight, or between nine and eight, is what we call the zone of proximal development.  It is the distance between the actual developmental level as determined by independent problem solving [crystallized knowledge] and the level of potential development as determined through problem solving under adult guidance or in collaboration with more capable peers" (Vygotsky, 1978, pp. 85-86).

[203] As Danziger (1990) pointed out, this individualized American research tradition contrasts strongly with the clinical (French) and physiological (German) traditions of research, both of which permit the guidance of an expert (be that expert the researcher, the clinician, or the specially trained subject).

[204] Robert Richards (1987), for instance, compares and contrasts various positions on evolution including those of Darwin, Spencer, and Baldwin.  He touches briefly (though not sufficiently) upon emergent evolution as a possible alternative for psychology.  Alan Costall (1993) focused on the reception of C.L. Morgan's emergent evolution by early 20th century North American animal psychology and thereby promoted the rehabilitation of Morgan's Canon (properly understood).  Morgan, that is, is just now beginning to receive long-delayed credit for working out a reasonable emergent evolutionary account of the variety and scope of animal mentality (Wozniak, 1993).  The republication of G.H. Lewes' Study of Psychology (1879) in D.N. Robinson's (Editor) "Significant contributions to the history of psychology" series is also encouraging (see also Robinson, 1978; Lewes, 1877).  Similarly, the philosopher David Blitz's Emergent Evolution: Qualitative Novelty and the Levels of Reality (1992) has established the historical pedigree of emergent evolutionary thought in the wider interdisciplinary context of integrative and levels theory analysis.

[205] For example, the well-intended but conceptually inadequate textbook by Gould & Gould (1994) called The Animal Mind, like many of its competitors, is completely devoid of a coherent theoretical position regarding the phylogenetic continuity and discontinuity of mental evolution. Indeed, the dictates of the marketplace demand that textbooks be written so as to allow the chapters to be read in any order and in complete isolation from each other.  The result is a pastiche of unrelated and often logically contradictory views depending upon which chapters are read.  In the Gould book, needless reduction and shameless anthropomorphism sometimes occur within the same chapter.  This tenuous position, of course, is not particular to the Gould book per se but is a wider problem of comparative animal psychology textbooks.  What makes the Gould book especially noteworthy is that the very theoretical tools which might have provided an overarching unity of subject matter were already presented, sixty-four years earlier, under the identical title by C.L. Morgan (1930).  Had the authors known of Morgan's emergent evolutionary approach to the animal mind, perhaps their work would have been the first modern North American textbook to present a truly integrative account of comparative psychology.  As it stands, this text (published under the auspices of American Scientist) is simply one of many that do not.  Surely American Scientist can do better. Lest this critique be faulted for possibly being slightly out-of-date, the same sort of weaknesses can be found (to a lesser degree) in Animal Minds (Griffin, 2001).