Predictive Modelling Methodology
Excerpted from Dalla Bona, Luke (1994). "Volume 3: Methodological Considerations." A Report Prepared for the Ontario Ministry of Natural Resources. Lakehead University: Center for Archaeological Resource Prediction, Thunder Bay, Ontario.
Abstract
The theoretical and applied aspects of archaeological predictive modelling make up a relatively new field within archaeology. The field has its basis in studies conducted during the 1950s and 1960s but gained prominence during the late 1970s and 1980s, coinciding with a surge in cultural resource management in the United States. During the 1980s, the development of geographic information system (GIS) technology brought computer technology into archaeological predictive modelling. Predictive models developed to date are either inductively or deductively derived. Inductively derived models depend upon a database from which to generate models and are thus subject to any biases existing in that database. Deductively derived models begin with theories predicting human behaviour. While deductive models better encompass the range of human behaviour, they suffer from changing interpretations and theoretical viewpoints. Two main directions are taken in the development of a predictive model: the numerical approach and the weighted value approach. The numerical approach makes use of statistical methodology to discover associations between archaeological sites and characteristics of the physical environment. Within these parameters, models are physically generated by either an intersection method or a weighted value method. The intersection method begins with the basic assumption that all variables used in generating a predictive model contribute equally to the determination of site location potential. Calculating high, medium and low potential areas is then simply a matter of determining where the greatest number of variables converge. The weighted value method begins with the basic assumption that each variable contributes differently to the final determination of site location potential. This is accomplished by developing and applying a weighting scale that effectively ranks variables numerically.
Site potential is determined by the arithmetic addition of all variables: areas of high potential have the largest numeric values, and areas of low potential have the smallest. During the development of a predictive model, a number of issues must be considered. These include how representative the variables are of the phenomenon being modelled, the quality of the databases consulted, the scale at which modelling should take place and the manner in which potential is presented. Predictive modelling is presented as a three-stage process. Primary stage predictive modelling includes hypothesis development, organization and data collection. Secondary stage modelling includes initial model development and testing, and is the stage at which most predictive models stop. Tertiary stage modelling includes continued application of the model and ongoing refinement. Ideally, tertiary stage modelling is a never-ending process in which lessons learned from previous model applications are incorporated into new and future applications, maintaining or increasing the predictive robustness of the model. The introduction of geographic information systems into archaeological research had two profound results. First, research approaches such as predictive modelling could now be applied over relatively large areas. Second, GIS allowed large areas to be analysed uniformly. At the same time, GIS introduced a range of considerations not traditionally part of archaeological research: issues surrounding digital data, cartographic theory and general data integrity became an integral part of research design and strategy.
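The intersection and weighted value methods described in the abstract can be sketched as simple raster overlays. This is a minimal illustration, assuming each environmental variable has already been reduced to a grid of 0/1 values (1 = favourable for site location); the layer names, weights and class cut-offs are hypothetical, and real weighting schemes may score variables on graded scales rather than as presence or absence.

```python
# Minimal sketch of intersection vs. weighted value overlays on a toy raster.
# Assumption: every variable layer is a grid of 0/1 values (1 = favourable).
# Layer contents, weights and cut-offs are hypothetical illustrations.

Grid = list[list[float]]


def intersection_score(layers: list[Grid]) -> Grid:
    """Intersection method: all variables count equally, so a cell's score
    is the number of favourable variables converging on that cell."""
    rows, cols = len(layers[0]), len(layers[0][0])
    return [[sum(layer[r][c] for layer in layers) for c in range(cols)]
            for r in range(rows)]


def weighted_score(layers: list[Grid], weights: list[float]) -> Grid:
    """Weighted value method: each variable is multiplied by its weight,
    and the weighted values are added arithmetically."""
    rows, cols = len(layers[0]), len(layers[0][0])
    return [[sum(w * layer[r][c] for layer, w in zip(layers, weights))
             for c in range(cols)]
            for r in range(rows)]


def classify(score: float, low_max: float, medium_max: float) -> str:
    """Map a numeric score onto low / medium / high potential."""
    if score <= low_max:
        return "low"
    if score <= medium_max:
        return "medium"
    return "high"


if __name__ == "__main__":
    near_water = [[1, 1, 0], [1, 0, 0]]      # hypothetical layers
    gentle_slope = [[1, 0, 1], [1, 1, 0]]
    well_drained = [[1, 1, 1], [0, 0, 0]]
    layers = [near_water, gentle_slope, well_drained]

    counts = intersection_score(layers)
    scores = weighted_score(layers, weights=[3.0, 2.0, 1.0])
    print(counts)   # [[3, 2, 2], [2, 1, 0]]
    print(scores)   # [[6.0, 4.0, 3.0], [5.0, 2.0, 0.0]]
    print([[classify(s, 1, 2) for s in row] for row in counts])
```

In this toy grid the intersection method ties three cells at a score of 2, while the hypothetical weights separate them (5, 4 and 3). That is the practical difference between treating variables as equal and ranking them.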
Introduction
Predictive modelling is an avenue of research within archaeology that has gained prominence over the past two decades. Predictive modelling for archaeology is defined as a "...simplified set of testable hypotheses, based either on behavioral assumptions or on empirical correlations, which at a minimum attempts to predict the loci of past human activities resulting in the deposition of artifacts or alteration of the landscape" (Kohler 1988:33). Parker (1985) sees predictive modelling as a natural outgrowth of the theories and methodologies of spatial archaeology, and predictive modelling has become the focus of a number of archaeological studies (e.g., Allen et al. 1990; Brown and Stone 1982; Judge and Sebastian 1988; Kvamme 1992).
Settlement Studies
Predictive modelling has its basis in the settlement studies first carried out in the 1950s and 1960s by Gordon Willey and other archaeologists. Willey (1953) set out to examine archaeological data at a regional level in an effort to understand the processes inherent in the settlement systems of the Viru Valley in Peru. On the whole, Willey succeeded in developing settlement pattern archaeology, and his work stimulated further settlement studies. Subsequent publications by Willey (1956), Willey et al. (1965), Chang (1968) and Adams and Nissen (1972) established the study of settlement patterns as a valued research methodology within archaeology. Archaeologists further refined settlement pattern studies and used them to conduct catchment analyses (Vita-Finzi and Higgs 1970) and to interpret social and technological change at the regional level (Adams and Nissen 1972), while others focused on the environmental determinants of settlement location (Haury 1956; Heizer and Baumhoff 1956; Williams 1956). Throughout much of the 1950s and 1960s, archaeologists operated within an inductive framework in which research into settlement patterns was based upon little or no theory. Haggett et al. (1965) gave archaeologists a more solid grounding in locational theory by introducing many relevant concepts from geography into the discipline. Haggett influenced a generation of archaeologists by outlining theories of settlement hierarchies, sampling procedures and hexagonal lattices (Haggett et al. 1965). Trigger (1968) outlined more clearly the various aspects of settlement patterns and offered some determinants of settlement location. Concurrent research in other fields of archaeology was beginning to emphasize the importance of ecological variables in understanding settlement variability (e.g., Flannery 1968). In the decade that followed, the manner in which archaeological data were handled changed considerably.
Many archaeologists adopted more systematic approaches to collecting and analysing data. Computers allowed archaeologists to manipulate greater amounts of data, generate more detailed analyses and, more generally, ask a greater variety of questions of the data. Studies ranged from examinations of minute differences in artifact types, to macroscopic studies of ceramic variability, to studies of prehistoric culture change (e.g., Flannery 1976). These studies further refined the level of detail at which archaeologists presented settlement variability. As a result of settlement pattern studies, the research emphasis of some archaeologists was shifting from the study of single sites to the study of regions and their archaeological contents. For example, following closely from the settlement studies discussed above, the Southwestern Anthropological Research Group (SARG) set out to determine "why prehistoric populations locate sites where they did" (Plog and Hill 1971:8). Clearly stated in this research goal was the delineation of the "formal variability in sites, variability in temporal loci of sites, and variability in the spatial loci of sites" (Plog and Hill 1971:8). Indeed, settlement pattern research had turned, at least in print, from the elementary description of archaeological remains to the recognition of patterning in site distributions. SARG presented a detailed research design for the study of human settlement systems. Its members recognized that a regional approach to studying variation in human settlement patterns was absolutely necessary to understand settlement systems. Previous research in the American Southwest had concentrated primarily upon a few core areas, and the resulting interpretations were then generalized to the entire region. The need for more detailed and standardized investigations prompted the formation of SARG.
Foremost among the goals outlined by the project leaders was the explanation of: "variability in the distribution of prehistoric sites - settlement and limited activity sites... Why do we want to explain site location or settlement system patterning?... The most important reason for explaining settlement locations is that we hope to arrive at tested and useful laws that can be used by social scientists to predict site locations anywhere at any time, including the present and the future" (Plog and Hill 1971: 10-11, orig. emphasis). The majority of settlement studies carried out in the Americas contained more description of settlement locations than explanation of why they existed where they did. Plog and Hill (1971) recognized the need to explain the 'system behind the settlement pattern' but strove to arrive at explanation via other avenues. Realizing that the explanation of settlement systems derives from an understanding of their mechanics, SARG sought to predict unknown site locations from the principles of the known settlement systems. Thus, SARG's goals anticipated those of many archaeologists by several years. At the same time, much of the discipline was embroiled in a methodological debate concerning paradigms and polemics, but several research projects that complemented the directions and goals set out by SARG eventually emerged. Plog and Hill (1971) were not the only archaeologists intent upon predicting site locations. Although prediction was not yet an explicitly stated research strategy, it was making its way into the archaeological literature as a subset of settlement pattern analysis. Perhaps the first settlement pattern study designed to identify sites using prediction was that carried out in the Reese River Valley in the Great Basin of the American Southwest (Williams et al. 1973). The authors carried out a settlement pattern study in central Nevada focusing on winter village placement.
They stated that "given the proper set of environmental conditions, [they] could successfully predict presence/absence of archaeological sites" (Williams et al. 1973:215). Wanting to confirm their intuition about 'where sites could be found', they developed hypotheses based on those intuitions and tested their soundness. The variables, definitions and criteria used to develop their predictions, seven criteria in all, were carefully outlined (Williams et al. 1973:227).
These criteria were not revolutionary by any means; in fact, they appear quite obviously related to site location. What was new was their clear definition and their implementation within the overall research strategy. If any five of the seven criteria were met, "the locus was recorded as an area of potential habitation, whether or not cultural material was found" (Williams et al. 1973:231). The results of the research were positive: the variables were present at 97% of the sites in the study area, while 85% of the potential loci contained sites (Williams et al. 1973:233). Although the authors acknowledged that the prediction criteria could be refined, on the whole they were successful. They showed that no single variable determined the location of a prehistoric habitation in this area. Although a single locational criterion would not significantly restrict the spatial distribution of sites, combinations of two or more mildly restrictive criteria would quickly reduce the number of possible locations fitting the specified criteria (Williams et al. 1973:234). On a more general level, the authors confirmed a suspicion held by many archaeologists concerned with site location, one usually acknowledged only as a 'feel' or 'insight' gained from intimate familiarity with the data (Williams et al. 1973:217). Indeed, new archaeological insight was gained into the prehistoric inhabitants of the Reese River Valley and their choice of activity loci. This example successfully demonstrated that prediction could provide insight into the explanation of a settlement system.
At approximately the same time, Green (1973) carried out another settlement pattern study in British Honduras (now Belize). Drawing her methodology primarily from Haggett et al. (1965), Green mirrored the questions posed by SARG: "the analysis is aimed at answering the question: why did the ancient inhabitants settle where they did?" (Green 1973:279). Although the author's primary goal was to explain the variability in settlement locations, "a corollary goal of the analysis is to predict the location of sites in portions of the region which have not yet been explored archaeologically. Prediction, in this case, is based on determining the correlation between sites and environmental features in the known region and projecting this knowledge to environmentally similar areas. The method can also suggest locations within the study area which should be rechecked for the presence of undiscovered sites" (Green 1973:279). Central to Green's analysis was the proposition that sites were located so as to minimize the effort expended in acquiring critical resources (1973:279). Green worked with a partial sample of the entire archaeological database. The results showed a strong association between site locations, soil types and vegetation (Green 1973:287). The study also made apparent the importance of proximity to navigable water. In fact, the author concluded that the location of every site in the sample can be explained by association with these three variables (Green 1973:289). Green's attempt to predict the locations of undiscovered sites based upon these criteria met with less success than the Reese River Valley study discussed earlier. Statistical tests produced high measures of variability, and the areas predicted to have potential for site location were very large and impractical to survey efficiently because of limited accessibility and the nature of the physical landscape. Overall, despite the questions raised by Green's conclusions, and given the pioneering nature of her study, the results of her attempt to use prediction to help explain the settlement pattern were promising.
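Green's corollary goal, correlating sites with environmental features in the known region and projecting that knowledge to environmentally similar areas, can be illustrated with a simple frequency table. This is a hedged sketch rather than Green's actual statistical procedure: it assumes each survey unit is described by the three variables her study highlighted (soil type, vegetation and proximity to navigable water), and the category labels and records are hypothetical.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical environmental signature of a survey unit:
# (soil type, vegetation, near navigable water?)
Signature = tuple[str, str, bool]


def site_rates(surveyed: list[tuple[Signature, bool]]) -> dict[Signature, float]:
    """Proportion of surveyed units with each signature that contained a site."""
    totals: dict[Signature, int] = defaultdict(int)
    with_sites: dict[Signature, int] = defaultdict(int)
    for signature, has_site in surveyed:
        totals[signature] += 1
        if has_site:
            with_sites[signature] += 1
    return {sig: with_sites[sig] / totals[sig] for sig in totals}


def project(rates: dict[Signature, float],
            unexplored: list[Signature]) -> list[Optional[float]]:
    """Carry known rates over to unexplored units with the same signature.
    None marks environments never sampled in the known region, where the
    known data offer no basis for prediction."""
    return [rates.get(sig) for sig in unexplored]


if __name__ == "__main__":
    surveyed = [
        (("loam", "forest", True), True),
        (("loam", "forest", True), True),
        (("loam", "forest", True), False),
        (("clay", "swamp", False), False),
        (("clay", "swamp", False), False),
    ]
    rates = site_rates(surveyed)
    print(project(rates, [("loam", "forest", True), ("sand", "savanna", False)]))
```

The lookup by exact signature is the simplest possible notion of "environmentally similar"; any real application would need to decide how close two environments must be before knowledge is transferred between them.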
While some archaeologists were using more sophisticated analytical techniques in regional archaeological analyses, many more used techniques scarcely more advanced than those of Willey (1953). Peregrin's description of work conducted earlier in his career exemplifies this point: "We began by laying out the Rosario phase 1:20,000 map. One inch colored beads were used to designate sites by level of population, mounded architecture, specialized activities, pottery characteristics, etc. By standing up on stools we could get a visual impression of settlement patterns and other prominent aspects of the regional system" (1988:875, emphasis added). In summary, predictive modelling developed from studies of settlement patterns. Settlement studies often provided data on site locations and their distribution, and it was with this information that researchers attempted to predict other, unknown site locations. The first predictive models attempted to turn an understanding of specific settlement systems into predictions of site location that would, it was hoped, contribute to explaining the settlement pattern. The above examples of prediction within settlement studies are representative of the directions and results of research in this field of archaeology. By the 1980s, researchers were building upon the base provided by settlement studies. Predictive modelling became the subject of much research, but its role in settlement pattern research diminished as it was increasingly applied as a cultural resource management tool.
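The Reese River decision rule reviewed above (a locus is recorded as potential habitation when any five of seven criteria are met) and the two measures used to report its success can be expressed compactly. In this sketch the criterion names are hypothetical placeholders, since the seven criteria of Williams et al. (1973:227) are not reproduced here, and the test data are invented for illustration.

```python
def is_potential_habitation(criteria_met: dict[str, bool], threshold: int = 5) -> bool:
    """Flag a locus when at least `threshold` of its criteria are satisfied
    (five of seven in the Reese River study)."""
    return sum(criteria_met.values()) >= threshold


def success_rates(loci: list[tuple[bool, bool]]) -> tuple[float, float]:
    """Each locus is (flagged_as_potential, site_present).

    Returns:
      the share of sites that fall on flagged loci (the kind of measure
      behind the 97% figure), and
      the share of flagged loci that contain sites (the kind of measure
      behind the 85% figure).
    """
    flags_at_sites = [flagged for flagged, site in loci if site]
    sites_at_flags = [site for flagged, site in loci if flagged]
    if not flags_at_sites or not sites_at_flags:
        raise ValueError("need at least one site and one flagged locus")
    return (sum(flags_at_sites) / len(flags_at_sites),
            sum(sites_at_flags) / len(sites_at_flags))


if __name__ == "__main__":
    # Hypothetical criterion names standing in for the seven original criteria.
    locus = {f"criterion_{i}": i <= 5 for i in range(1, 8)}
    print(is_potential_habitation(locus))   # True: five of seven are met
```

Both measures are worth reporting together because either one alone can mislead: a rule that flagged every locus would capture all sites while saying little about where to look.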
Predictive Modelling
The literature concerning predictive modelling grew more extensive throughout the 1980s. For the most part, it has reflected the fact that developing a predictive model is not as straightforward as the approach outlined by Williams et al. (1973) suggests. Although much of the literature on predictive models, and many examples of them, lie buried in government files and consulting reports to business, some academic archaeologists doubt the efficacy and value of prediction in archaeology (Kohler and Parker 1986:396). Prediction is seen as an expensive exercise in discovering the obvious, regarded as suspect or unreliable, or considered to be of limited value (Kohler and Parker 1986:398). The concerns of cultural resource managers, contract archaeologists and academic archaeologists have resulted in a body of literature that begins to address some of the issues relevant to developing a predictive model (Brown 1981; Carr 1985; Kohler and Parker 1986; Limp and Carr 1985; Ebert and Kohler 1988; Judge and Sebastian 1988; Kohler 1988; Kvamme 1988a, 1988b, 1989, 1990; Warren 1990). Kohler has contributed extensively to the predictive modelling literature, in both published and unpublished (contract/government) forms, and sees predictive modelling developing in two directions: inductive versus deductive modelling (Kohler and Parker 1986:399; Kohler 1988:37), the latter elsewhere called the behavioral approach (Hay et al. 1982:14).
Inductive Models
An inductive model usually begins with data and then builds its conclusions upon those data, carrying forward all the biases inherent in the original data set. "They begin with survey data... and then they estimate the spatial distribution of the population of archaeological materials from which the sample was drawn... Any inferential locational model predicts only what would have been found had the population of space from which the sample was drawn been surveyed in the same manner as was the sample, using the same rules for attribute coding, site recognition and data analysis. Such inferential models predict neither the systemic interaction between a cultural system and a landscape nor the archaeological context resulting from it; rather they predict what we will find and how we will interpret it if we consistently follow a particular set of rules" (Kohler 1988:37). Inductive models form the basis for a large proportion of the predictive models developed to date. Because large site databases already exist for many areas of North America, their examination can provide tremendous amounts of site-related information, and these data are readily integrated into many predictive models. "[F]or this particular exercise the computerized database (AZITE computer database) faithfully represents our current knowledge of site location... [and] contains a variety of descriptive information pertaining to the environment, location, cultural affiliation, site function and temporal components..." (Altschul 1990:228). While existing databases contain a wealth of invaluable information, these data are not without error and bias. For example, site locations may be incorrectly recorded, environmental information may be recorded in too little detail, data may be missing from some records, or information gathered by previous researchers may not meet the standards of present-day archaeologists. More seriously, systematic biases may exist in the current site inventory.
Because this biased inventory forms the basis of the search for landscape correlates of site distribution, the original biases will be perpetuated in the resulting predictive model. This is not to say that the information should not be used; rather, it should be used carefully, and only after its integrity as a complete database has been evaluated.
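Kohler's point that an inductive model predicts only what the original surveys would have found can be made concrete with a density estimate built from survey records. A minimal sketch, assuming hypothetical survey units recorded as (landform, area surveyed in km², sites found): landforms that were never surveyed receive no estimate at all, and any systematic under-recording in the surveys passes straight into the densities.

```python
from collections import defaultdict
from typing import Optional


def inductive_densities(units: list[tuple[str, float, int]]) -> dict[str, float]:
    """Sites per km² for each landform, estimated only from surveyed units."""
    area: dict[str, float] = defaultdict(float)
    found: dict[str, int] = defaultdict(int)
    for landform, area_km2, sites_found in units:
        area[landform] += area_km2
        found[landform] += sites_found
    return {lf: found[lf] / area[lf] for lf in area if area[lf] > 0}


def predict(densities: dict[str, float], landform: str) -> Optional[float]:
    """The model can only echo the sample: unsurveyed landforms return None."""
    return densities.get(landform)


if __name__ == "__main__":
    # Hypothetical records: past surveys concentrated on terraces.
    units = [("terrace", 2.0, 10), ("terrace", 3.0, 5), ("upland", 5.0, 1)]
    densities = inductive_densities(units)
    print(densities)                         # {'terrace': 3.0, 'upland': 0.2}
    print(predict(densities, "floodplain"))  # None: never surveyed
```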
Deductive Models
"The challenge for deductive models is to build the bridge to the analytic context from the systemic context, which is where the outputs of the system can be observed. This bridge-building... is called explanation" (Kohler 1988:37). Kohler, with Parker, sees deductive models as encompassing three considerations.
"From the standpoint of human adaptation, patterns of local vegetation are of crucial concern. Many plants serve as primary food and technological resources as well as secondary resources which attract economically important animals. The distribution of non-food resources, especially water and fuel, can be equally important to settlement decisions. Diversity is also beneficial when considering non-food resources. In addition to fuel, a variety of trees provide the raw materials for tools, utensils, shelter, and weapons, pitch for sealing seams, and fibres from the inner bark for cordage, bags, and nets. A variety of plants can be used to make dyes, reeds can be woven into mats, and clay from local stream banks can be made into pottery. Evaluations of topography, water, soils, vegetation, precipitation, temperature, and availability of rock outcrops or glacial till exposures are all important in decisions about the adequacy of shelter and the availability of economic resources" (Schermer and Tiffany 1985:220). Dean (1983:11) has pointed out that people may look for only a few clues in their surroundings when identifying and selecting activity locations, rather than processing the entire range of environmental "cues" available. It may be only these basic variables that have any real association with archaeological sites. This raises interesting questions about the analyst's choice of the proper environmental variables to include in the modelling process. "Perhaps in building predictive models we are too ready to make the assumption that only a complex multivariate model can adequately account for human locational behaviour, when in fact, a few (proxy?) variables, observed in the highly correlated data base that is our environment, may be sufficient for forming locational decisions" (Kohler and Parker 1986:433). This position is supported by the fact that archaeologists have presented successful predictive models using very few variables.
For example, Altschul (1990) developed a predictive model for the 9,000-acre Mount Trumbull area of Arizona. There were 228 known sites in the study area, which had been sampled by various agencies in the past. Three environmental variables (elevation, slope and aspect) were identified as accounting for the majority of site locations (Altschul 1990:229-230). Altschul concluded that in this area "over 70 per cent of all component locations can be predicted with just three variables" (1990:234). However, Altschul does not discuss what his three variables are measuring. What are they 'proxy' variables for? Without this information, we can neither discuss why sites are found where they are nor offer explanations for the settlement systems in the area.
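A three-variable model of the kind Altschul describes can be sketched as a rule over elevation, slope and aspect, evaluated by the share of known sites it captures. The thresholds, the aspect convention (compass bearing the slope faces, 0° = north) and the cells below are hypothetical; Altschul's actual variable definitions are not given here. The sketch also reports the share of the study area the rule flags, since a rule that flags most of the landscape will capture most sites without saying much about where to look.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    elevation_m: float
    slope_deg: float
    aspect_deg: float  # compass bearing the slope faces; 0 = north (assumed)


def favourable(cell: Cell, max_elevation: float = 1800.0,
               max_slope: float = 10.0,
               aspect_range: tuple[float, float] = (135.0, 225.0)) -> bool:
    """Hypothetical rule: low enough, gentle enough and roughly south-facing."""
    low, high = aspect_range
    return (cell.elevation_m <= max_elevation
            and cell.slope_deg <= max_slope
            and low <= cell.aspect_deg <= high)


def evaluate(site_cells: list[Cell], all_cells: list[Cell]) -> tuple[float, float]:
    """Return (% of known site cells flagged, % of study-area cells flagged)."""
    pct_sites = 100 * sum(favourable(c) for c in site_cells) / len(site_cells)
    pct_area = 100 * sum(favourable(c) for c in all_cells) / len(all_cells)
    return pct_sites, pct_area


if __name__ == "__main__":
    sites = [Cell(1500, 5, 180), Cell(1600, 8, 200),
             Cell(1700, 12, 180), Cell(1400, 3, 150)]
    others = [Cell(2000, 5, 180), Cell(1500, 5, 10),
              Cell(1500, 20, 90), Cell(1000, 2, 220)]
    print(evaluate(sites, sites + others))  # (75.0, 50.0)
```

A rule like this can score well on the first number while leaving open the question raised above: it records where sites are, not what elevation, slope and aspect stand in for.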