Predictive Modelling Methodology (continued)
Introduction
Another point involves land use choices that stem from 'habitual behaviour' shaped by cultural norms, traditions and spiritual proscriptions, rather than from an overriding consideration of the economic attractiveness of a specific locality (Kohler and Parker 1986:435 citing Wright and Dirks 1983). Factors related to actions having little archaeological visibility, such as spiritual influences, may have resulted in activities being located in less 'typical' locations. Choice of activity location may also be the result of historical events that override environmental considerations.
Archaeologists have recognized other criteria as important in choosing activity location. Flannery (1976) and Reynolds (1976) discuss social factors that condition site placement. Jochim (1976:12) details criteria of economic relevance and assumes that "the determination of resource use tends to precede and condition the site placements and demographic arrangements of a hunter-gatherer group". A predictive model may take into account distance to resources and activities carried out at a location. Wood (1978:161-162) offers the following criteria for different site types:
2) Multiple activity sites with dominant subsets of activities will be located so that the distances between a site and the matching resources indicated by the dominant subsets are minimal;
3) Multiple activity sites will be located so that the average distance to all of the critical resources is minimal.
In addition to the inductive and deductive theoretical frameworks, the methodological approaches employed in predictive modelling may be separated into two different groups. The first is described as the numerical approach, and the second as the graphical approach.
The numerical approach may be considered a direct outgrowth of the emphasis placed on the statistical analysis of archaeological data since the early 1970s. Predictive models using the numerical approach employ multivariate statistics as a discovery technique to identify associations among variables which ultimately lead to predictions of areas with archaeological resources. This approach makes a number of primary assumptions that are crucial to the validity of the model. The first relates to the nature of the sample. Because statistical methodology discovers meaningful associations among variables from known site information, it is important that the known site information is representative of the actual sites that exist.
"Probabilistic designs are of little use if the population sample is not the same as the population across which predictions are to be made (the target populations)... As one practitioner remarks, '[We] cannot make inferences about the archaeology of verdant grasslands with good intermittent and permanent streams from a sample restricted to scoria ridge tops, badlands and breaks' (Peebles 1983:8)" (Parker 1985:406).
Roper echoes this view in her comments on a predictive model developed for the Vermilion River/Embarras River region of Illinois: "Methodologically, multiple regression should eventually be a valuable predictive tool but its use with the poor data available for east central Illinois is unwarranted. The discriminant function analysis at the end of the report is an interesting idea, but I wonder if it is really describing where sites are located or where people have intuitively felt they should be and have therefore looked for them" (Roper 1981:149). Thus, users of predictive models derived using the numerical approach must carefully evaluate the nature of the existing database. In addition to a very careful examination of the representativeness of these data, an assessment must be made as to whether known site locations reflect the actual distribution of archaeological sites, or simply reflect where archaeologists have conducted their surveys (e.g., Acheson and French 1992).
It is also important to recognize that the physical and cultural environment has changed over time, and these changes may have affected the choice of activity location through time. Kohler and Parker state that "... despite numerous studies in diverse areas indicating change in site location through time in response to changes in adaptation type, and despite evidence that within any adaptation type, functional subsets of sites may have differing environmental determinants, most empirical correlative models aggregate sites of all types and ages together for prediction" (1986:408). Models developed using the numerical approach rarely address temporal considerations. Some researchers opt to avoid the issue of 'time' and develop a generalized model, such as that generated by Lewis and Murphy (1981). Other researchers do not avoid 'time' as a variable; rather, they suggest that discernible patterns of human behaviour cross-cut considerations of time.
This perspective is discussed by Kvamme (1992:23): "By associating sites representing many different functional, chronological, and cultural types into a single open-air class, a great deal of locational variability is introduced to the modeling problem, thereby reducing the potential power of the result. Nevertheless, it is believed, and it has been elsewhere shown (e.g., Kvamme 1985, Kvamme and Jochim 1989), that there are common locational tendencies that may cross-cut functional categories, such as preferences for level ground or proximity to water."
Few researchers have developed models applicable to specific time periods (e.g., Lewis and Murphy 1981). The reasons for this are not clearly presented in the literature. In fact, there seems to be a fixation on creating one model to explain everything, as if one magic set of variables could predict all site types in all time periods. At one level, it is recognized that many different factors influenced site location through time, just as different factors influenced the location of different site types. Perhaps controlling for many factors, including changes in physical geography, climate, flora and fauna, cultural groups and technology, proves too formidable a task for archaeologists working under tight budgets and/or strict mandates. Whatever the reason, the majority have developed predictive models that encompass all prehistoric time periods and all site types.
Another consideration relates to the choice of variables, and the detail with which information will be selected and manipulated. The choice of variables is determined by the nature of the predictive modelling project, the type of data available, the nature of the study area, and other considerations. Parker (1985) describes two characteristics of variables used in predictive models. The first, site-focused data, require measurement at the site level. Examples of site-focused data include distance to water, vegetation and slope.
The second characteristic, and the one Parker suggests is commonly employed, is quadrant data. These are data that are generalized from survey quadrants. In some cases, where a high resolution model is being developed, quadrant data may closely resemble and augment site-focused data. In other cases, where coarse resolution models are developed, the quadrant data may generalize the study area to the point where the data are less meaningful than is preferred (Kohler and Parker 1986:408). An example of research using the numerical approach is Sandra Parker's Sparta Mine predictive model. In this study, Parker aims to "...develop an explanatory model relating site locations in an area to the biophysical characteristics of that area. To perform the desired functions, such a model must allow one to state the probability that a particular geographic unit in the area would have been selected for the location of a site. Such a model may be in the form of a prediction equation in which the dependent variable is site presence/absence and the independent or predictor variables are the biophysical variables" (Parker 1985:176). Parker's primary means of discovering associations between variables and site locations is multivariate statistics. Two basic data collection methods were employed. First, biophysical data were collected from United States Geological Survey (USGS) 7.5 minute topographic maps to provide the independent or predictor variables for the entire Sparta area. Secondly, a field survey was conducted to provide data about site presence/absence, the dependent variable (Parker 1985:182). The model was evaluated using a number of different tests: observed vs. predicted site frequencies, cross-validation tests, and field tests. Overall, Parker (1985:198) demonstrates by these tests her "confidence in the validity of the model". 
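A prediction equation of the kind Parker describes, with site presence/absence as the dependent variable and biophysical measurements as predictors, can be illustrated with a logistic regression. The sketch below is an illustration only, under stated assumptions: Parker's equation and data are not reproduced in this text, so the survey units, the two predictors (distance to water and slope) and all function names are invented, and a logistic regression fitted by plain gradient ascent stands in for whatever multivariate procedure was actually used.

```python
import math

# Invented survey units: (distance to water in km, slope in degrees, site present?).
# These values are illustrative assumptions, not data from the Sparta study.
UNITS = [
    (0.1, 2.0, 1), (0.2, 4.0, 1), (0.3, 3.0, 1), (0.4, 10.0, 0),
    (0.5, 5.0, 1), (0.8, 6.0, 0), (1.0, 12.0, 0), (1.2, 3.0, 0),
    (1.5, 15.0, 0), (0.2, 14.0, 0), (0.6, 2.0, 1), (1.8, 8.0, 0),
]


def sigmoid(z):
    """Map a linear score onto a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, learning_rate=0.02, epochs=5000):
    """Fit an intercept and one coefficient per predictor by gradient ascent."""
    n_predictors = len(rows[0]) - 1
    coefs = [0.0] * (n_predictors + 1)  # coefs[0] is the intercept
    for _ in range(epochs):
        grads = [0.0] * len(coefs)
        for *xs, present in rows:
            p = sigmoid(coefs[0] + sum(c * x for c, x in zip(coefs[1:], xs)))
            err = present - p  # gradient of the log-likelihood for this unit
            grads[0] += err
            for j, x in enumerate(xs):
                grads[j + 1] += err * x
        coefs = [c + learning_rate * g / len(rows) for c, g in zip(coefs, grads)]
    return coefs


def site_probability(coefs, distance_km, slope_deg):
    """Estimated probability that a unit with these measurements holds a site."""
    return sigmoid(coefs[0] + coefs[1] * distance_km + coefs[2] * slope_deg)


coefs = fit_logistic(UNITS)
print("near water, gentle slope:", round(site_probability(coefs, 0.1, 2.0), 3))
print("far from water, steep slope:", round(site_probability(coefs, 1.5, 15.0), 3))
```

With real survey data the fitted probabilities would then be checked in the ways the text lists: observed against predicted site frequencies, cross-validation, and field tests.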
The numerical approach is certainly a valuable method which can lead to the discovery of significant associations between site locations and variables. However, it is an approach which requires a high degree of statistical training and competence in order to develop the model, interpret the results and validate/replicate the results. Invariably, with the results of the model presented in a numerical table outlining the associations between variables and sites, a great deal of interpretation is required to relate the results to 'on-the-ground' locations. Roper, commenting on a specific predictive model, summarizes some deficiencies of the numerical approach: "While the authors make a reasonably good start at such, they fail to produce a satisfactory end product because of naive use of statistics. They begin with cross tabulations of variables... cross tabulation of each variable with each other variable is not an efficient use of statistics, and does not discern those variables that do or do not have predictive power. Further, this report declines to summarize those statistics into a meaningful interpretation (i.e., predictive model) of site location patterns; rather, it assumes that the tables will speak for themselves. The text reflects neither a good understanding of statistical analysis nor an ability to employ statistics in interpretation of archaeological data" (Roper 1981:150-151).
Roper's criticism of the above model does not invalidate the use of statistics as a means for developing predictive models. In fact, the development of predictive modelling is coincident with the use of advanced statistical techniques. However, despite the sustained use of statistics in archaeology, there are still those, including some developers of predictive models, who are familiar with only basic statistical procedures and tests. The use of multivariate logistic regression requires an advanced level of understanding of statistical theory and techniques.
Accepting the validity of a model like Parker's requires a tacit acceptance of the calculations and results presented. Verification and/or duplication of the methodology might prove daunting to some archaeologists who may ultimately accept the results primarily on faith. This may contribute to the statistical approach not being the choice of some developers of predictive models. Thus, while the statistical approach is still used to generate valid predictive models, other models utilize different approaches.
The Graphical Approach
An example of the graphical approach is the model developed for the Mt. Trumbull area of northern Arizona. Jeffrey Altschul began the study by asking cultural resource managers if modelling was a useful tool for their specific needs. He discovered that what "...managers need to know is where the 'red flags' are...what is needed are not models predicting the unknown, but rather models that bring some order and direction to the huge databases that have been, and are continuing to be, amassed" (Altschul 1990:227). In other words, in some jurisdictions data have been amassed at such a rate that information managers cannot adequately cope with them. Predictive modelling becomes useful as a means of aiding the identification of landscape variables that are consistently correlated with known site distributions. These correlates offer a means of organizing the existing database, and identifying presently uninvestigated localities which have a high probability of containing sites similar to the presently known sites. In essence, this is not investigating the unknown; rather, it is merely investigating more of the same, focussing only on areas not yet field surveyed. This can be viewed as modelling existing assumptions and expectations. To predict the unknown requires that archaeologists step outside what is 'expected' and employ a modelling rationale that does not build exclusivity into its results. From its very inception, this type of modelling approach necessarily will consider all possible options, allowing areas to be excluded because of the manner in which variables interact. With these red flag models, Altschul takes an entirely different approach from previous orientations. While valuable, this approach has weaknesses.
The predictions generated tend to perpetuate the 'presently known' site distribution, and enable the prediction of 'average', repeatedly used, site locales that cluster around stable and important environmental variables. These models, no matter how sophisticated, discover nothing more than new sites that conform to presently known site distribution patterns. However, using the graphical approach, one may identify landscape characteristics that are associated with a significant proportion of the site inventory, thereby highlighting a minority of sites that clearly are different. This smaller subset can then be subjected to another round of analysis to determine patterns of association with different landscape variables. This second round of analysis may lead to significant new insight.
A predictive model that seeks site localities that do not conform with the known pattern has the advantage of providing new information of a different order: types of sites, land use strategies, and idiosyncratic behaviour that are presently unknown. In the event that the model identifies sites that do not conform to the expected site distribution, a resource manager is in the position to re-focus research to identify new sorts of ancient land use that are not immediately apparent in the current heritage resource inventory. This will result in an ongoing refinement of what appears to be of 'low potential' for containing archaeological sites.
"Sites in settings presumed to be anomalous according to conventional wisdom, by definition, must be the result of behaviour that does not fit current explanations of why prehistoric inhabitants settled where they did. Under any definition, these sites must be significant, for they more than any others have the potential of telling us something new about prehistory" (Altschul 1990:228).
Additionally, as more anomalies are identified, patterns may emerge, become predictable and therefore no longer be anomalous.
Those sites whose locations remain anomalous become the target of further study, for it is these sites that will give us greater insight into the past (Altschul 1990:228).
Altschul developed a predictive model for the 9000-acre Mount Trumbull area of Arizona. There were 228 known sites in the study area that had been sampled by various agencies in the past. Three environmental variables were defined to account for site location: elevation, slope and aspect. Altschul outlines four steps in the development of his predictive model (1990:230-232):
Step 1: Data exploration
Step 2: Confidence and independence
Step 3: The favourability map
Step 4: The red flags
Altschul's approach differs from that of other researchers in that he is focusing on predicting anomalies rather than predicting the already known. He is trying to identify why sites are located in unexpected places; in other words, why did we not expect them to be there? Rather than "viewing models as end products, we view them as analytical tools" (1990:237). Altschul's graphical approach is more useful to cultural resource managers because patterns or non-patterns are more readily apparent. While statistics are used for validation and confirmation, the results need not be translated from statistical tables to archaeologically-meaningful statements. Using the graphical approach "...at a glance, managers can determine the likelihood of determining sites on a particular development project. For archaeologists, [a map such as this] represents a compilation of the relationship between environmental variables and site location" (Altschul 1990:233).
MODELLING PROCEDURES
The Intersection Method
The assumption that all variables contribute equally to the determination of the predictive model is one that does not accurately reflect the complexity of human land use decision-making. For example, if modelling the location of prehistoric fishing camps, variables such as 'proximity to water' are of greater significance than such variables as 'vegetation zone'. Thus, while the intersection method is employed in a number of predictive models, it does not result in a model which faithfully reflects the true range of prehistoric decisions employed in determining the location of activities.
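The equal-contribution assumption can be made concrete with a small sketch. The description of the intersection method itself is not preserved in this text, so the code assumes one common reading: each variable is reduced to a favourable/unfavourable condition, and a location's potential is the number of favourable conditions that intersect there. The thresholds, variable names and the two example cells are hypothetical.

```python
# Hypothetical favourable conditions; thresholds and category names are
# illustrative assumptions, not values taken from any published model.
FAVOURABLE_VEGETATION = {"mixed forest"}


def favourable_flags(cell):
    """Reduce each variable to a yes/no 'favourable' flag."""
    return {
        "proximity_to_water": cell["water_m"] <= 200,
        "slope": cell["slope_deg"] <= 10,
        "vegetation_zone": cell["vegetation"] in FAVOURABLE_VEGETATION,
    }


def intersection_score(cell):
    """Count the favourable flags; every variable contributes exactly one point."""
    return sum(favourable_flags(cell).values())


shoreline_cell = {"water_m": 50, "slope_deg": 20, "vegetation": "bog"}
inland_cell = {"water_m": 900, "slope_deg": 3, "vegetation": "mixed forest"}

print("shoreline cell:", intersection_score(shoreline_cell))
print("inland cell:", intersection_score(inland_cell))
```

Under these assumed thresholds the inland cell outscores the shoreline cell (2 against 1), even though proximity to water would matter most for a fishing camp. This is the limitation described above.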
The Weighted Value Method
A value (V) is applied by the researcher to the category to reflect its importance and contribution in the modelling process. In addition, variables are assigned weights (W) to reflect differences within categories in their contribution to the modelling process. For example, the category Proximity to Water might be given a value of 3 to reflect its importance in determining site location. The variable 0-100 metres from water might be given a weight of 3, 101-200 metres a weight of 2, 201-400 metres a weight of 1, and 401+ metres a weight of 0. By multiplying the category value by the weight of the variable (W x V), a weighted value is defined for each variable used in the modelling process (Table 1).
The determination of the numerical weight or value is researcher-specific, but there must be some basis upon which the researcher makes these numerical assignments. Reference may be made to previous archaeological work which has identified characteristics of the landscape presumed to be associated with archaeological sites. Ethnographic, ethnological, historic or ethnoarchaeological studies may also provide a basis for weighting variables. The experience of the archaeologist and of colleagues working in the area may likewise contribute to determining a weighting scheme. Additionally, the nature of the project itself may have some bearing on the weighting applied to variables. For example, a researcher applying a predictive model within a given theoretical framework may give more importance to economically-related variables than some geographic variables. In another instance, a researcher may combine his or her own experience with data obtained from the ethnographic literature and derive weights and values accordingly. In conclusion, the manner in which weights and values are applied is subjective, yet it is based upon data obtained and evaluated by the researcher from a variety of sources and applied within project-specific frames of reference.
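The arithmetic of the Proximity to Water example can be sketched as follows. The category value and the class weights are those quoted above; the constant and function names are illustrative, and the handling of boundary distances (for example, exactly 100 metres) is an assumption.

```python
# Category value V and class weights W for Proximity to Water, as quoted in the text.
WATER_VALUE = 3
WATER_WEIGHTS = [
    (100, 3),  # 0-100 metres from water
    (200, 2),  # 101-200 metres
    (400, 1),  # 201-400 metres
]
WATER_DEFAULT_WEIGHT = 0  # 401+ metres


def water_weight(distance_m):
    """Return the weight W of the distance class that contains distance_m."""
    for upper_bound_m, weight in WATER_WEIGHTS:
        if distance_m <= upper_bound_m:
            return weight
    return WATER_DEFAULT_WEIGHT


def weighted_value(weight, value):
    """Weighted value of a variable: W x V."""
    return weight * value


for distance_m in (50, 150, 300, 500):
    w = water_weight(distance_m)
    print(distance_m, "m: W =", w, " W x V =", weighted_value(w, WATER_VALUE))
```

For the four distance classes this gives weighted values of 9, 6, 3 and 0.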
Because the weighted value method allows certain variables to have more 'predictive strength' than other variables, it results in a model that better reflects the decisions made by prehistoric people when choosing their activity locations. In addition, because it is imperative that the manner in which the categories and variables are weighted is clearly outlined, the contribution of each variable to the final model is also clearly established. This final point is the most important of all. For any model to be valid, it must be reproducible and defensible. With the weighting factor of each variable clearly defined, discussions can occur concerning the weights of individual variables, and the effects of changing weights can be tested. The results of these tests can then be evaluated. In the end, one is left with a model for prehistoric activity location which is clearly defined, testable and reproducible.
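One way to test the effect of changing weights is to score the same locations under two explicit weighting schemes and compare the resulting rankings. The sketch below is hypothetical throughout: the two cells, their class weights, the two schemes of category values, and the rule of summing W x V across categories are all assumptions made for illustration, since the text does not state how weighted values are combined.

```python
# Hypothetical cells, each described by the class weight (W) it receives per category.
CELLS = {
    "A": {"water": 3, "slope": 1},
    "B": {"water": 1, "slope": 3},
}


def total_score(cell_weights, category_values):
    """Sum W x V over all categories (summation is an assumed combination rule)."""
    return sum(w * category_values[category] for category, w in cell_weights.items())


def rank_cells(cells, category_values):
    """Return cell names ordered from highest to lowest total score."""
    return sorted(
        cells,
        key=lambda name: total_score(cells[name], category_values),
        reverse=True,
    )


# Two candidate sets of category values (V) to compare.
water_dominant = {"water": 3, "slope": 1}
slope_dominant = {"water": 1, "slope": 3}

print("water-dominant scheme:", rank_cells(CELLS, water_dominant))
print("slope-dominant scheme:", rank_cells(CELLS, slope_dominant))
```

Because both schemes are written out explicitly, a reversal in ranking such as the one between cells A and B here can be traced to a specific change in category value and then debated on archaeological grounds.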