Published online 8 March 2006
Published in Vadose Zone J 5:248-260 (2006)
DOI: 10.2136/vzj2005.0025
© 2006 Soil Science Society of America
677 S. Segoe Rd., Madison, WI 53711 USA
SPECIAL SECTION: FROM FIELD- TO LANDSCAPE-SCALE VADOSE ZONE PROCESSES
Automatic Calibration and Predictive Uncertainty Analysis of a Semidistributed Watershed Model
Zhulu Lin* and
David E. Radcliffe
Dep. of Crop and Soil Sciences, Univ. of Georgia, Athens, GA 30602
* Corresponding author (linzhulu{at}uga.edu)
Received 14 February 2005.
 |
ABSTRACT
|
|---|
Semidistributed models are commonly calibrated manually, but software for automatic calibration is now available. We present a two-stage routine for automatic calibration of the semidistributed watershed model Soil and Water Assessment Tool (SWAT) that finds the best values for the model parameters, preserves spatial variability in essential parameters, and leads to a measure of the model prediction uncertainty. In the first stage, a modified global Shuffled Complex Evolution (SCE-UA) method was employed to find the "best" values for the lumped model parameters. In the second stage, the spatial variability of the original model parameters was restored and a local search method (a variant of LevenbergMarquart method) was used to find a more distributed set of parameters using the results of the previous stage as starting values. A method called "regularization" was adopted to prevent the parameters from taking extreme values. In addition, we applied a nonlinear calibration-constrained method to develop confidence intervals for annual and 7-d average flow predictions. We calibrated stream flow in the Etowah River measured at Canton, GA (a watershed area of 1580 km2) for the years 1983 to 1992 and used the years 1993 to 2001 for validation. The Parameter Estimator (PEST) software was used to conduct the two-stage automatic calibration and prediction uncertainty analysis. Calibration for daily and monthly flow produced a very good fit to the measured data. Nash-Sutcliffe coefficients for daily and monthly flow over the calibration period were 0.60 and 0.86, respectively. They were 0.61 and 0.87, respectively, for the validation period. The nonlinear prediction uncertainty analysis worked well for long-term (annual) flow in that our prediction confidence intervals included or were very near to the observed flow for most years. It did not work well for short-term (7-d average) flows in that the prediction confidence intervals did not include the observed flow, especially for low and high flow conditions.
Abbreviations: AWC, available water capacity BMP, best management practice ET, evapotranspiration HRU, hydrologic response units MCMC, Markov Chain Monte Carlo PEST, Parameter Estimator SAC-SMA, Sacramento Soil Moisture Accounting SCE-UA, Shuffled Complex Evolution SCEM, Shuffled Complex Evolution Metropolis STATSGO, State Soil Geographic database SWAT, Soil and Water Assessment Tool
 |
INTRODUCTION
|
|---|
AUTOMATIC CALIBRATION has been applied to conceptual rainfallrunoff models for more than three decades, usually to lumped models. Even when a (semi)distributed model that allows spatial variability of parameters is calibrated using an automated process, the parameters of the model are often lumped over space so that the model is simplified to a lumped model (e.g., Eckhardt and Arnold, 2001; Doherty and Johnston, 2003; van Griensven and Bauwens, 2003). The reasons are twofold. First, the available data used for watershed model calibration are normally limited to rainfall, temperature (minimum and maximum), and watershed outlet streamflow and water quality, which are obviously not sufficient to identify a semidistributed model with scores (or hundreds) of model parameters. Jakeman and Hornberger (1993) applied time series analysis techniques to determine how many parameters are appropriate to describe the rainfallrunoff relationship using transfer function models. They found that, for catchments in temperate climates covering a wide range of spatial scales, only a handful of parameters (47 parameters) can be reliably estimated from rainfallrunoff data. This result coincided with suggestions made by others using conceptual or more physically based models (e.g., Beven, 1989). Second, except for a few cases (e.g., Doherty and Johnston, 2003), most of the recent rainfallrunoff model automatic calibrations were conducted using Monte Carlo based global methods, such as the Shuffled Complex Evolution developed at University of Arizona (SCE-UA). While these global methods were robust in finding the global minimum in the hyper-surface of the objective function that may be pitted with local minima with regions of attraction of varying sizes (Duan et al., 1992), the cost in terms of model runs is prohibitive when the number of the estimated model parameters exceeds a dozen. For instance, Eckhardt and Arnold (2001) reported that it took nearly 18 000 model runs to achieve convergence when they used the SCE method to automatically calibrate 18 parameters of a variant of the SWAT model (Arnold et al., 1998). Therefore, it has become a common practice among modelers to reduce the number of model parameters to be estimated at the expense of the model's spatially differentiated representation of a real watershed.
However, it is often necessary to recognize the spatial distributions of morphology, land use, and soils of the entire watershed when modeling the impacts of land cover changes and/or best management practices (BMPs) on flow and water quality of streams. For instance, land disturbances and/or BMPs far from streams may have less impact than those near the streams. In addition, the fact that more and more spatially distributed information is available encourages the use of semidistributed or distributed watershed model representations with a large number of parameters. However, this is not to say that a (semi) distributed model with a large number of parameters is necessarily favored over a lumped model with just a few parameters if sufficient observations are not available. To calibrate these (semi)distributed watershed models, it may be necessary to make more field measurements at fine-grained levels of the watershed (e.g., stream discharges and water quality parameters at subbasins outlets) and subsequently use the multiobjective optimization methods to simultaneously calibrate the (semi)distributed model at different scales (e.g., Yapo et al., 1998; Gupta et al., 1998; Madsen, 2000). For a discussion of recent developments in automatic calibration of conceptual watershed models, refer to the excellent review by Gupta et al. (2003).
Given that modelers usually only have access to a set of observations of stream discharge (and perhaps water quality parameters), neither a sampling-based global nor a gradient-based local method is suitable for such a situation. On one hand, the cost in terms of model runs required for a sampling-based method to simultaneously calibrate the large number of parameters of a (semi)distributed watershed model may be too high. On the other hand, the highly correlated relationships among parameters and the susceptibility to initial conditions of the local methods make it unlikely for a gradient-based local method to find a global optimal set of values for all the parameters. The goal of our study was to develop a two-stage procedure for automatically calibrating the semidistributed SWAT watershed model, while preserving the spatial variability of some essential model parameters. In the first stage of this proposed calibration scheme, an affordable (i.e., requiring fewer model runs) global searching scheme was employed to find the "best" values for the lumped model parameters. That is, with the aim of limiting the number of calibrated parameters, model parameters were assumed to be identical for different subbasins and hydrologic response units (HRUs, the basic calculation unit in the SWAT model). In the second stage, the spatial variability of the original model parameters was restored and the number of the calibrated parameters was dramatically increased. In this stage, a local search method was preferred to find the more distributed set of parameters using the results from the previous stage as starting values. To prevent numerical instabilities and parameters taking extreme values due to overparameterization, a strategy called "regularization" (Hensen, 2001) was adopted in the second stage, by which the distributed parameters were constrained to vary as little as possible between HRUs and subbasins. Regularization is commonly used in calibrating groundwater models (Doherty, 2003a).
This proposed two-stage automatic calibration also leads to a measure of the model prediction uncertainty. Environmental models are normally used for predicting the behavior of natural systems under situations similar to those in which the model has been calibrated using historical data. Since there are inherent uncertainties associated with the model structure, parameter values, initial conditions, and measurements, predictions of the model are inevitably fraught with uncertainty. While the influence of model structure uncertainty is difficult to assess, a number of methods are available to assess the effect of parameter uncertainty (i.e., initial conditions can be treated as model parameters) and measurement error on model prediction uncertainty (Beck, 1987; Beven, 1993; Omlin and Reichert, 1999; Qian et al., 2003; and many others). Among these methods, Monte Carlo and Markov Chain Monte Carlo (MCMC) methods have many advantages, primarily providing probability distribution functions for model predictions. Such methods include the recently developed Shuffled Complex Evolution Metropolis optimization algorithm (SCEM-UA, a modified version of the SCE-UA method) that converges to a stationary approximation of the posterior distribution of the parameter values (Vrugt et al., 2003a, 2003c, 2004). The SCEM-UA provides both an estimate of the mode of the posterior parameter distribution (the "best" parameter set) and a sample set of parameter values describing the probabilistic representation of remaining parameter uncertainty. This posterior description of parameter uncertainty is then used to produce probabilistic model forecasts (such as 95% confidence intervals at each time step). However, their main disadvantage is the high cost in terms of model runs, especially in semidistributed watershed models where there are more than just a few parameters. Linear (or first-order) methods have the advantage that they can be implemented with virtually no computational burden, but they require that the model output be linear with respect to its parameters, which is normally not the case in large complex environmental models. In our approach, a nonlinear calibration-constrained method based on the theory presented by Vecchia and Cooley (1987), which utilizes the preceding automatic calibration results, will be used to provide an estimate of the flow prediction uncertainty of the SWAT model under a variety of flow conditions.
 |
MATERIALS AND METHODS
|
|---|
Watershed Model and Study Site
The SWAT model was developed by the USDA-ARS to predict the impact of land management practices on water, sediment, and agricultural chemical yields in large basins (Arnold et al., 1998; Neitsch et al., 2002). In SWAT, a watershed is partitioned into a number of subbasins. Each subbasin is simulated as a homogeneous area in terms of climatic forcing (e.g., precipitation, air temperature), but with additional subdivisions within a subbasin to represent different combinations of soils and land uses. Each land use and soil combination is referred to as an HRU and is assumed to be spatially uniform in terms of soil, land use, topography, and climatic data. A water budget is computed for each HRU on the basis of precipitation, runoff, evapotranspiration (ET), percolation, and return flow from subsurface and groundwater flow. Subdivision of a watershed into HRUs allows the model to reflect heterogeneity of the watershed. Thus, the SWAT model is considered a semidistributed watershed model.
The Etowah River basin, located in central north Georgia just north of Atlanta, is one of the richest river systems in the world in terms of fish and mussel biodiversity (Burkhead et al., 1997). From the USGS gauging station at Canton, GA, we delineated an area we called the Upper Etowah River watershed, which covered approximately 1580 km2 (158 112 ha) (northernmost 35, southernmost 34, easternmost 84, westernmost 85). The watershed was subdivided into six subbasins (Fig. 1
) and 48 HRUs. The primary land covers were forest (89.7%), grassland or pasture (7.9%), and agriculture (1.9%); urban cover was <0.5%. The soils consisted primarily of mapping units with hydrologic group categories of B and C, having slow to moderate infiltration rates. The land use data were obtained from the National Land Cover Dataset (19911992), and the soils data were obtained from the State Soil Geographic database (STATSGO).
The Etowah River is the main tributary of Lake Allatoona, which has a total maximum daily load limit for P. Our long-range objective is to use SWAT to determine the uncertainty in nonpoint sources of P loading to Lake Allatoona and develop a framework for trading P credits between point and nonpoint sources in the Etowah River basin. In this study, the hydrologic phase of SWAT was calibrated against the daily streamflow records from the USGS gauging station (No. 02392000) at Canton, GA (Fig. 2
and also available online at http://waterdata.usgs.gov/nwis, verified 5 Dec. 2005). Rainfall data included measurements at four different weather stations within the watershed (19-yr daily precipitation at Canton, GA is displayed in Fig. 2). The minimum and maximum air temperature measurements were taken from two weather stations within and near the Upper Etowah River watershed. A summary of the weather and flow data for the Upper Etowah River watershed modeling is provided in Table 1. All these weather stations are administered by the National Climatic Data Center. The Hargreaves method (Hargreaves et al., 1985) was used to calculate potential ET.

View larger version (24K):
[in this window]
[in a new window]
|
Fig. 2. Daily precipitation (dashed line) and stream flow observations (solid dot) at Canton, GA (19832001).
|
|
Numerical Methods and Software
Two-Stage Automatic Calibration Strategy
Our automatic calibration scheme of the semidistributed SWAT model consisted of two stages. In the first stage, we simplified the SWAT model to a lumped model by eliminating the spatial variability of parameters that could not be calculated from available GIS measurements and needed to be determined through calibration. By this simplification, the dimension of the adjustable parameter space was reduced so that a global search algorithm (i.e., SCE-UA) could be used in searching for the optimal parameter sets. Although the parameter estimation problem of the semidistributed watershed model was then a tractable one after the lumping process, it is still difficult to obtain a unique calibration solution due to the complex model structure and large number of estimated parameters. Our primary goal of the first stage was not to find the unique global minimum of the objective function (if it existed), but to find a reasonable set of parameter values that would serve as starting points for the parameters to be estimated in the second stage. Therefore, we used a less stringent convergence criterion for the global searching method than would be used to find the exact minimum, dramatically reducing the number of computer runs. The SCE-UA method was described by Duan et al. (1992, 1993, 1994), and it has been successfully applied directly or indirectly to rainfallrunoff model calibration on many occasions (e.g., Yapo et al., 1996, 1998; Hogue et al., 2000, 2005; Madsen, 2000; Bastidas et al., 2001; Eckhardt and Arnold, 2001; Wang et al., 2001; van Griensven and Bauwens, 2003). In brief, the SCE-UA algorithm involves the following steps. First a sample of parameter sets is randomly generated from the feasible parameter space (usually defined by lower and upper bounds of parameter values) and partitioned into a number of complexes. Next each of the complexes is allowed to evolve in the direction of global improvement, using competitive evolution techniques that are based on the downhill simplex method. At periodic stages in the evolution, the entire set of points is shuffled and reassigned to new complexes to enable sharing of information.
In the second stage, the heterogeneous design of the SWAT model that was lost in the lumped stage was restored and a local method (a variant of the MarquardtLevenberg method) was employed to calibrate the distributed parameters. In SWAT, the fundamental calculation unit is the HRU, which includes a unique combination of land cover and soil type. In the previous lumping process, the distinction between different soil types was ignored. That is, the same curve number is assigned to each category of land cover, regardless of the underlying soil types. In the second stage, however, each HRU potentially has a different curve number, and the different characteristics of the major soil types are explicitly incorporated by using different parameters. In our approach, when a local search algorithm was used to calibrate these parameters that characterize the heterogeneity of the watershed, these spatially variable parameters were not allowed to change freely. Instead, they were only allowed to change by the limited amount that was required to lower the objective function to an acceptable level. Constraining a parameter's variability not only helped to stabilize the inverse problem but also prevented the parameters from taking unrealistic numeric values. This numerical treatment is called "regularization" (Hensen, 2001) and has been successfully applied to groundwater model calibration in conjunction with "pilot points" as a methodology for spatial hydraulic property characterization (Doherty, 2003a). Through such regularization, additional information about the parameter estimates can be incorporated into the calibration process to stabilize the inversion problem and ensure that all parameters take on physically reasonable values.
Nonlinear Calibration-Constrained Predictive Uncertainty Analysis
For a nonlinear model, a 100(1
)% joint confidence region for all model parameters (ß) can be approximately given by:
 | [1] |
where
(ß) is the objective function calculated for a parameter set ß and
(
) is the objective function calculated for the optimized (or estimated) parameter set
(i.e., the minimized objective function obtained in the automatic calibration process), p is the number of parameters to be estimated, n is the number of observations, F
(p, n p) is the F distribution with (p, n p) degrees of freedom, and s2 =
(
)/(n p) is the sum of squared errors. When the number of observations (n) is large, this approximation is usually good. For a specific model prediction, a 100(1
)% confidence interval can then be formed through minimizing or maximizing that prediction while constraining the model parameters (ß) to lie within the joint confidence region defined by Eq. [1]. Readers are referred to Vecchia and Cooley (1987) for theoretical details, Doherty (2004) for development of the software, and Doherty and Johnston (2003) for its application.
When applying this nonlinear calibration-constrained predictive uncertainty analysis method, we first specified a prediction of interest (e.g., an annual flow volume or a 7-d average flow rate) whose uncertainty required exploration. Then we found a parameter set that maximized or minimized that prediction while still maintaining the model in a calibrated state. The calibrated state was defined by an upper objective function limit, 
(ß), a number that was slightly larger than the minimum objective function,
(
), obtained in the model calibration stage. For example, to construct a 95% confidence interval for a 7-d average flow prediction, the calibrated state was calculated using Eq. [2], which resulted from manipulating Eq. [1]:
 | [2] |
Care should be exercised in applying Eq. [2] too rigorously because its validity is based on the assumption that the n observations used for model calibration are independent from each other, which is rarely the case in environmental modeling. Hence, a more qualitative assessment of 
(ß) may be warranted, perhaps based on visual inspection of model-to-measurement misfit (Doherty, 2004). Also note that it is the maximum and the minimum values of the specified model prediction instead of the parameter sets generating these extreme values; that is our major interest. Therefore, the prediction confidence interval limited by the maximum and minimum values represents the prediction uncertainty of the watershed model on the specified prediction. Compared with the Monte Carlo and the MCMC methods, the nonlinear calibration-constrained method requires much fewer model runs to give an estimated confidence interval for a model prediction. But unlike the Monte Carlo and the MCMC methods, the nonlinear calibration-constrained method is unable to provide a corresponding probability or cumulative probability distribution function for the model prediction.
PEST Software
The automatic model calibration and predictive uncertainty analysis were undertaken using PEST (Doherty, 2004) in conjunction with a suite of utility software written to support the use of PEST in the surface water modeling context (Doherty, 2003b). PEST, a model independent parameter estimator, implements a particularly robust variant of the MarquardtLevenberg method for parameter estimation and uncertainty analysis. When PEST is used to estimate a great number of parameters contained in a semidistributed model like SWAT, the regularization mode can be invoked to avoid parameters taking extreme values.
 |
RESULTS AND DISCUSSION
|
|---|
Model Calibration
A weighted multiobjective function similar to that used in Doherty and Johnston (2003) was adopted in this research. This objective function, which the automatic calibration method minimized, combined daily flow with monthly flow volume and exceedance times, as shown in Eq. [3].
 | [3] |
where
(ß) was the total objective function; M, O, w represented model outputs, observations, and weights, respectively; subscripts "D", "M", and "E" denote daily flow rate, monthly flow volume, and exceedance times, respectively. Thus, n1 and n2 were the number of days and months within the calibration period and n3 was the number of flow categories. wD,i is defined in Eq. [4] and ensured that high flows did not dominate the parameter estimation process simply because of their large numerical values
 | [4] |
where c, wM, and wE were chosen such that daily flow terms in Eq. [3] were not allowed to either dominate the objective function or be dominated by other terms.
In selecting the calibration period, our goal was to choose a dataset representative of the various hydrologic phenomena experienced by the watershed. For example, Sorooshian et al. (1983) pointed out that it is important that the calibration data contain a wide variety of hydrologic behaviors from dry to wet conditions. Research by Yapo et al. (1996) indicated that approximately 8 yr of data, specifically including some of the wettest years, were adequate to ensure a quality calibration of the Sacramento Soil Moisture Accounting (SAC-SMA) model. This is also consistent with the experience of National Weather Service hydrologists that approximately 11 yr of data should be used to calibrate the SAC-SMA model (Hogue et al., 2000; Gupta et al., 2003). Therefore, the selected calibration period for this research includes 10 yr, from January 1983 to December 1992, in which, the spring of 1990 was the wettest season and the years of 1985 and 1986 were the driest years during the study. The remaining 9 yr of flow observations (Fig. 2) from January 1993 to December 2001 were then selected as a validation period.
Like many other watershed models, SWAT is based on conceptual representations of the physical processes that govern the flow of water through the watershed. It typically has two types of parameters, "physical" parameter and "process" parameters (Sorooshian and Gupta, 1995). The physical parameters represent the physically measurable properties of the watershed. For example, the areas of the watershed, subwatersheds, and HRUs, as well as the length of overland flow paths and average slopes within each HRU. The process parameters, such as those listed in Table 2, represent watershed properties that are not directly measurable at the scale of application and need to be determined through calibration. Therefore, in this research, only the process parameters listed in Table 2 were subject to optimization through automatic calibration. The physical parameters were calculated in accordance with watershed geography using ArcView GIS tools (ESRI, Redlands, CA). Preceding the automatic model calibration, the spatially variable parameters in SWAT to be estimated were lumped so that the entire watershed was treated as a homogenous unit, except that four different land coversforest, grassland or pasture, agriculture, and urban (in which we have special interests for development of a nutrient trading framework)were accounted for separately. The STATSGO database provided the available water capacity (AWC) for each soil layer of each mapping unit. We introduced a new parameter
AWC to describe the change of the AWC (increase or decrease) applied to each soil layer during calibration. Therefore, the calibrated AWC in each layer of each soil type was the sum of the original AWC and
AWC. It was assumed, in the first calibration stage, that
AWC was identical across all soil layers and soil mapping units. Sixteen parameters (listed in Table 2) governing the rainfallrunoff process were estimated in the first stage of the automatic calibration process. Note that the parameters governing the snowmelt process were not included. Most of the lower and upper bounds of variation were chosen from tables in the SWAT User's Manual (Neitsch et al., 2002). For parameters for which SWAT does not explicitly give ranges, we set their boundary values arbitrarily to encompass the default values given by SWAT (e.g., SURLAG and GW_DELAY).
View this table:
[in this window]
[in a new window]
|
Table 2. SWAT parameters, descriptions, default values, first stage optimized values, and lower and upper bounds.
|
|
First Stage Calibration
In the first stage of the automatic calibration process, the SCE-UA global method was used to find a suitable set of the initial values for the spatially variable parameters to be refined in the second stage. Most of the control parameters of the SCE-UA optimization listed in Table 3 were suggested by Duan et al. (1994), except that the required percentage for the method's converging criterion was deliberately set higher (2%). The normal (suggested) value for this control parameter is 1%. By increasing the "required percentage" from 1 to 2%, the number of model runs of the autocalibration routine was substantially reduced. The fourth column of Table 2 lists a typical parameter set obtained from the SCE-UA optimization in the first stage of the routine. Most of the parameter values are different from the default values provided by the SWAT manual (the third column of Table 2). Our first stage SCE-UA optimization was able to reduce the objective function value by about 80%. It should be noted that different runs of the SCE-UA optimization using different random number generating seeds may generate different "optimal" parameter sets. This was not caused by the loose convergence criterion. Even when a more strict convergence criterion (1%) was used, the SCE-UA optimizations using different random number generating seeds still gave different "optimal" parameter values. Vrugt et al. (2003b) argued that the convergence problems of the SCE-UA algorithm might be caused by the large number of interacting parameters in models and the highly complex, nonconvex shape of the response surface mapped out in the parameter space. Another possible reason why the SCE-UA algorithm failed to converge to a single optimal solution may be attributed to inappropriate setting of the algorithm parameters.
Second Stage Calibration
In the second stage, a local gradient-based method combined with regularization was used with the "optimal" parameter values obtained in the first stage serving as initial values for the estimated parameters. In this stage, the number of the parameters to be estimated was increased from 16 to 69 to take into account the impact of different land use and soil combinations in the watershed on stream discharge. We included three major types of soil in the Upper Etowah River basin. They were the dominant (in terms of area) soil series in the STATSGO mapping units GA015, GA025 and GA028, respectively. At this stage, instead of assigning the same curve number (CN2) for a given type of land cover to the entire watershed, we specified a curve number for each HRU, which was a unique combination of land use and soil within each subbasin. Furthermore, a different soil evaporation compensation factor (ESCO) and change of available water capacity (
AWC) were given to these three different soil types. The variations of the estimated curve numbers, ESCO, and
AWC across the different HRUs in the watershed are displayed in Table 4. The parameters' overall mean values (Column 5 in Table 4) obtained after the second stage were very similar to the corresponding values from the first stage (Column 4 in Table 2), but the parameters took different values across soil types. As noted earlier, forest is the most common land cover in the Etowah watershed (89.7% of the total area). Although the curve number for forest land cover (CN2_FST) varied over a substantial range (see the minimum and maximum values), there was little difference among the mean values for soil mapping units. However, mapping unit averages for
AWC were substantially different, with the highest value occurring in GA025. An increase in AWC favors greater infiltration in SWAT through its effect on curve number antecedent moisture condition. Soils within mapping unit GA025 are all hydrologic group B ("moderate" infiltration rate according to the National Soil Survey Handbook; USDA-NRCS, 2003), whereas soils within mapping units GA015 and GA028 are hydrologic groups B and C ("slow" infiltration rate). As such, one might expect greater infiltration in GA025 compared with the other mapping units. There was no consistent pattern among other parameters. It is also noted that the parameters in some of the HRU's took boundary values.
Graphical comparisons between modeled and observed daily stream discharge through part of the calibration period, between modeled and observed monthly volumes for the entirety of the calibration period, and between modeled and observed exceedance fractions pertaining to the whole of the calibration period are shown in Fig. 3a
, 3b, and 3c. Note that the restriction of graphed flows in Fig. 3a to only a part of the calibration period is done for the sake of clarity. Graphs over the remainder of the calibration period were similar. Note also that the flow axis is logarithmic in Fig. 3a to afford a better comparison between model outputs and field measurements under both high and low flow conditions. Overall, the model predictions fit the observed data rather well. There was a tendency for SWAT to underpredict peak flow, especially at the daily time step.

View larger version (15K):
[in this window]
[in a new window]
|
Fig. 3. Observed and modeled stream discharge at Canton, GA during the calibration period (19831992): (a) 1-yr daily flow (1987), (b) monthly flow volume, and (c) exceedance fractions.
|
|
Model Validation
Flow data recorded at the same gauge station (No. 2392000) used for calibration were used for the validation period from 1993 to 2001. Figure 4a
shows a comparison between observed and model generated daily stream discharge during part of the validation period. As indicated above, the fitting of the remainder of the validation period was similar. Observed and model generated monthly volumes and exceedance fractions pertaining to the whole of the validation period are shown in Fig. 4b and 4c. The model fit during the validation period was similar to that found during the calibration period.

View larger version (16K):
[in this window]
[in a new window]
|
Fig. 4. Observed and modeled stream discharge at Canton, GA during the validation period (19932001): (a) 1-r daily flow (1995), (b) monthly flow volume, and (c) exceedance fractions.
|
|
The calibration and validation results obtained from our proposed two-stage automatic calibration procedure were also shown in Table 5. Besides values of the objective function, the NashSutcliffe Coefficient (E) and Index of Agreement (d) (Legates and McCabe, 1999) were used to evaluate the closeness of fit between modeled and observed time series. E and d are defined as follows:
 | [5] |
 | [6] |
where Pi values are model-simulated data, Oi values are observed data,
is the mean of observations, and N is the total number of observations.
It should be remembered that the "optimal" parameter set obtained through our two-stage automatic calibration routine is not unique. Since semidistributed watershed models are highly overparameterized, many sets of parameters can lead to a similarly good agreement of model-generated results with measurements. Besides overparameterization, uncertainty analysis should be also included as integral part of hydrologic modeling, due to inherent uncertainty/errors in input, output, model structure and parameters. For these reasons, it is important to proceed to an uncertainty analysis for the model prediction rather than just stop at the model parameter estimation stage.
Predictive Uncertainty Analysis
The previous calibration process found that the minimum objective function value was 1450 [i.e.,
(
) = 1450]. If we want to construct a 95% confidence interval over a specific SWAT model flow prediction, the upper objective function limit that defines the calibrated state should be around 1500 [i.e.,
0.05(ß)], according to Eq. [2]. In other words, as long as the objective function value is <1500, the corresponding parameter set that generates this objective function value is considered statistically allowable. Therefore, the minimum and maximum values of the SWAT prediction of interest construct the lower and upper bounds of the confidence interval that represents the uncertainty associated with the particular SWAT prediction. We used this approach to predictive uncertainty analysis for SWAT long-term (annual) and short-term (7-d average) stream discharge predictions. Table 6 shows the 95% confidence intervals for SWAT predictions, along with the observed stream discharge at Canton, GA for the validation period. The 95% confidence intervals for annual flow volume predictions are listed in the left panel and those for 7-d average flow predictions in the right panel. The prediction intervals for annual flow were surprisingly narrow, within 15% of the observed value. For 7-d average flow, the prediction intervals were wider (within 48% of the observed flow), as might be expected. Most of the observed annual flow volumes fell inside of the prediction confidence intervals, while most of the observed 7-d average flows fell outside of the intervals. Figure 5
displays flow observations (daily flow, monthly volumes and exceedance fractions) along with all the model predictions from parameters sets that resulted in a calibrated state. From Fig. 5a through 5c, the observed quantities are indicated by solid dots, while each of the thin colored solid lines represents the model-generated quantity that was generated by the parameter set, which, in turn, produced one of the lower or upper bounds (minimum or maximum) of the confidence intervals in Table 6. It is obvious from Fig. 5 that each of those parameter sets has calibrated the model and the resulting model-generated time series or exceedance fractions are equivalently close to the observed values.

View larger version (15K):
[in this window]
[in a new window]
|
Fig. 5. Observations (solid dot) and model-generated stream flows from predictive uncertainty analysis (thin solid lines) during calibration period (a) 1-yr daily flow (1987), (b) monthly flow volume, (c) exceedance fractions.
|
|
Returning to Table 6, the 3 yr whose annual flow volumes were not included in the confidence intervals were either the wettest years (1996 and 1998; the observed annual flow volume in 1998 coincides with the lower bound of the corresponding confidence interval) or driest years (1999 and 2000) in Georgia during the validation period. This implies that SWAT is less reliable in predicting extreme wet or dry years than normal years, which is not surprising. Table 6 shows the three 7-d average flow conditions (low, medium and high) across 3 yr of different weather conditions (1993 as a normal year, 1996 as a wet year and 1999 or 2000 as dry years) that were examined. The low flow condition was defined as a flow below the 7Q2 flow rate (7-d average flow rate with 2-yr returning period, that is, 9.543 m3 s1 calculated from the historical flow data at Canton, GA). For the high flow condition, we used a flow above 80.00 m3 s1. For the medium flow condition, we used flow between 9.543 and 80.00 m3 s1. For each year (1993, 1996, 1999, or 2000), each 7-d period was classified as low, medium, or high flow. From this set, we randomly chose three 7-d intervals for analysis (Table 6). The observed 7-d average low or medium flow usually exceeded the lower bound of the prediction confidence interval, while observed 7-d average high flow exceeded the upper bound. It's clear from Table 6 that SWAT predictions are reasonably accurate for long-term (annual) average flows, but less accurate for short-term (7-d average) flows.
Another reason why the prediction confidence interval did not encompass the observed flow under certain condition is that the application of the nonlinear calibration-constrained method to the determination of model prediction uncertainty is not without problems. Most of the problems are associated with the determination of the upper objective function limit, 
(ß), which is only approximated by Eq. [1] or Eq. [2] for several reasons. First, for a nonlinear model, the confidence regions based on Eq. [1] are not ellipsoids; therefore, there is a slight approximation involved in selection of the F statistic, which is mitigated by using a large sample size (n). Second, even when n is large (such as in our case), the residuals of flows are normally both serially correlated and heteroscedastic unless an appropriate error model can be established, which is evidently extremely difficult (Doherty and Gallagher, 2004, unpublished data). The failure to account for correlation of residuals may result in underestimation of the width of the confidence interval because an erroneously small sum of squared error (s2) has been used in determining the upper objective function limit, 
(ß), in Eq. [2]. Perhaps this has been the case in our study because many of the uncertainty intervals calculated for the flow predictions do not include the actual flow observations, especially for the daily flow predictions (see Table 6). Third, to the extent that determination of the global objective function minima is not assured during automatic model calibration, it is also problematic in the uncertainty assessment in that 
(ß) is used to calculate the minimum and maximum of the confidence interval. Another significant problem is that, like all single-objective function methods, the current uncertainty analysis method did not account for the model structural errors.
In addition to the above-mentioned difficulties pertaining to selection of the upper objective function limit, the uncertainty assessment of model prediction is also susceptible to the definition of the single-valued objective function (Doherty and Gallagher, 2004, unpublished data). In the definition of the objective function of the current study, we adopted a weighting strategy defined by Eq. [4] to ensure that high flows would not dominate the parameter estimation process because of their large numerical values. It is possible then that the model parameters favored better matches in low flow conditions than in high flow conditions, which is immediately apparent from Fig. 3a and 4a. Therefore, the calibrated model may have been more reliable in making low flow rather than high flow predictions. This may explain why the prediction confidence intervals of low and medium flow scenarios are closer to the observations while those of high flow scenarios are far from their corresponding observations (see the right panel of Table 6). In spite of the difficulties of applying the nonlinear calibration constrained for uncertainty assessment, the method provided us with some important indications of SWAT model prediction uncertainty (summarized in Table 6) at minimal computational cost and with no assumption of model linearity.
It also should be mentioned that these problem can be avoided if a multicriteria optimization method is adopted to explicitly recognize the multiobjective nature of the calibration problem. Because of the presence of structural inadequacies in the hydrologic models, any single (scalar) objective function, no matter how carefully chosen, is inadequate to properly measure all of the characteristics of the observed data deemed to be important (Vrugt et al., 2003b). Recently, in the context of hydrologic modeling, the use of a multiobjective function composed of a number of different criteria describing different aspects of the fit between model outputs and field data is now commonplace (Gupta et al., 2003, 1998, 1999; Yapo et al., 1998; Boyle et al., 2000; Madsen, 2000; Vrugt et al., 2003b). The result of a watershed model calibration using multicriteria optimization method is a discrete collection of possible parameter sets (so-called trade-off set, nondominated set, or Pareto set) that represent trade-offs between different optimal ways of constraining the model to be consistent with the observed data. The Pareto parameter set, which explores the parameter uncertainty (or parameter nonuniqueness), can also be used to generate a Pareto ensemble of model simulations to represent model prediction uncertainty.
 |
CONCLUSIONS
|
|---|
We employed a two-stage strategy, designed to exploit the merits of both global and local optimization methods while avoiding their faults, to automatically calibrate the SWAT watershed model. In addition, a nonlinear calibration-constrained method based on Vecchia and Cooley's (1987) theory was applied to analyze uncertainty of flow predictions. Since the software used for the computation is available to modelers at no cost and independent of the model, it may be extended to the automatic calibration of other (semi)distributed watershed and water quality models. The two-stage automatic model calibration not only produced a very good fit of the model against observations, but also preserved the heterogeneity of the watershed, which is essential in the analysis of the impact of the land uses and soils to water quality in streams. Although regularization was used to prevent parameters from taking extreme (and perhaps meaningless) values, nonuniqueness of parameter sets remains an issue for physically based semidistributed models such as SWAT. Like most calibration procedures for these types of models, this two-stage calibration scheme must deal with difficulties such as local minima, valleys, and plateaus arising from the use of threshold parameters, from intercorrelation between parameters, from insensitive parameters, and from autocorrelation and heteroscedasticity in the residuals. A comprehensive uncertainty analysis is all the more useful to assess the relative reliability of model predictions. The results of our uncertainty analysis indicated that SWAT was more reliable in long-term (annual) flow prediction than short-term (7-d average) flow prediction.
 |
ACKNOWLEDGMENTS
|
|---|
The research was sponsored by a USDA CSREES Water Quality grant GEO-2003-04944 entitled "A Framework for Trading Phosphorus Credits in the Lake Allatoona Watershed." In addition, we acknowledge the kind assistance of John Doherty in guiding our use of PEST. The authors also thank the three anonymous reviewers for their invaluable comments and suggestions to make this paper more balanced and accessible to readers.
 |
REFERENCES
|
|---|
- Arnold, J.G., S. Srinivasan, R.S. Muttiah, and J.R. Williams. 1998. Large area hydrologic modeling and assessment. Part I. Model development. J. Am. Water Resour. Assoc. 34(1):7389 (JAWRA).
- Bastidas, L.A., H.V. Gupta, and S. Sorooshian. 2001. Bounding the parameters of land-surface schemes using observational data. p. 6576. In V. Lakshmi et al. (ed.) Land surface hydrology, meteorology, and climate: Observations and modeling. AGU, Washington, DC.
- Beck, M.B. 1987. Water quality modeling: A review of the analysis of uncertainty. Water Resour. Res. 23:13931442.
- Beven, K. 1989. Changing ideas in hydrologyThe case of physically-based models. J. Hydrol. (Amsterdam) 105:157172.
- Beven, K. 1993. Prophecy, reality and uncertainty in distributed hydrological modeling. Adv. Water Resour. 16:1451.
- Boyle, D.P., H.V. Gupta, and S. Sorooshian. 2000. Toward improved calibration of hydrologic models: Combining the strengths of manual and automatic methods. Water Resour. Res. 36:36633674.[CrossRef]
- Burkhead, N.M., S.J. Walsh, B.J. Freeman, and J.D. Williams. 1997. Status and restoration of the Etowah River, and imperiled southern Appalachian ecosystem. In G.W. Benz and D.E. Collins (ed.) Aquatic fauna in peril: The Southeastern perspective. Southeast Aquatic Research Institute, Decatur, GA.
- Doherty, J. 2003a. Ground water model calibration using pilot points and regularization. Ground Water 41:170177.[Medline]
- Doherty, J., 2003b. PEST surface water utilities user's manual. Watermark Numerical Computing, Brisbane, Australia and University of Idaho, Idaho Falls.
- Doherty, J. 2004. PESTModel-independent parameter estimation user's manual. 5th ed. Watermark Numerical Computing, Brisbane, Australia.
- Doherty, J., and J.M. Johnston. 2003. Methodologies for calibration and predictive analysis of a watershed model. J. Am. Water Resour. Assoc. 29(2):251265.
- Duan, Q.Y., V.K. Gupta, and S. Sorooshian. 1993. Shuffled complex evolution approach for effective and efficient global minimization. J. Optimiz. Theory Appl. 76:501521.[CrossRef]
- Duan, Q.Y., S. Sorooshian, and V. Gupta. 1992. Effective and efficient global optimization for conceptual rainfall runoff models. Water Resour. Res. 28:10151031.[CrossRef]
- Duan, Q.Y., S. Sorooshian, and V.K. Gupta. 1994. Optimal use of the SCE-UA global optimization method for calibrating watershed models. J. Hydrol. (Amsterdam) 158:265284.
- Eckhardt, E., and J.G. Arnold. 2001. Automatic calibration of a distributed catchment model (Technical note). J. Hydrol. (Amsterdam) 251:103109.
- Gupta, H.V., L.A. Bastidas, S. Sorooshian, W.J. Shuttleworth, and Z.L. Yang. 1999. Parameter estimation of a land surface scheme using multicriteria methods. J. Geophys. Res. 104(D16):1949119503.[CrossRef]
- Gupta, H.V., S. Sorooshian, T.S. Hogue, and D.P. Boyle. 2003. Advances in automatic calibration of watershed models. p. 928. In Q.Y. Duan et al. (ed.) Calibration of watershed models. AGU, Washington, DC.
- Gupta, H.V., S. Sorooshian, and P.O. Yapo. 1998. Toward improved calibration of hydrologic models: Multiple and noncommensurable measures of information. Water Resour. Res. 34:751763.[CrossRef]
- Hargreaves, G.L., G.H. Hargreaves, and J.P. Riley. 1985. Agricultural benefits for Senegal River Basin. J. Irrig. Drain. Eng. 111:113124.
- Hensen, P.C. 2001. Regularization tools: A Matlab package for analysis and solution of discrete ill-posed problems. Department of Mathematical Modelling, Technical University of Denmark, Lyngby.
- Hogue, T.S., L. Bastidas, H. Gupta, S. Sorooshian, K. Mitchell, and W. Emmerich. 2005. Evaluation and transferability of the Noah Land Surface Model in semiarid environments. J. Hydrometeorol. 6:6884.
- Hogue, T.S., S. Sorooshian, H. Gupta, A. Holz, and D. Braatz. 2000. A multistep automatic calibration scheme for river forecasting models. J. Hydrometeorol. 1:524542.[CrossRef]
- Jakeman, A.J., and G.M. Hornberger. 1993. How much complexity is warranted in a rainfall-runoff model? Water Resour. Res. 29:26372649.[CrossRef]
- Legates, D.R., and G.J. McCabe, Jr. 1999. Evaluating the use of "goodness-of-fit" measures in hydrologic and hydroclimatic model validation. Water Resour. Res. 35:233241.
- Madsen, H. 2000. Automatic calibration of a conceptual rainfall-runoff model using multiple objectives. J. Hydrol. (Amsterdam) 235:276288.
- Neitsch, S.L., J.G. Arnold, J.R. Kiniry, R. Srinivasan, and J.R. Williams. 2002. Soil and water assessment tool user's manual. Version 2000. Texas Water Resources Institute, College Station.
- Omlin, M., and P. Reichert. 1999. A comparison of techniques for the estimation of model prediction uncertainty. Ecol. Modell. 115:4559.[CrossRef]
- Qian, S.S., C.A. Stow, and M.E. Borsuk. 2003. On Monte Carlo methods for Bayesian inference. Ecol. Modell. 159:269277.[CrossRef]
- Sorooshian, S., and V.K. Gupta. 1995. Model calibration. In Vijay P. Singh (ed.) Computer models of watershed hydrology. Water Resources Publications, Highlands Ranch, CO.
- Sorooshian, S., V.K. Gupta, and J.L. Fulton. 1983. Evaluation of maximum likelihood parameter estimation techniques for conceptual rainfall-runoff models: Influence of calibration data, variability and length on model credibility. Water Resour. Res. 19:251259.
- USDA-NRCS. 2003. National soil survey handbook. Title 430-VI [Online]. Available at http://soils.usda.gov/technical/handbook/ (verified 5 Dec. 2005).
- van Griensven, A., and W. Bauwens. 2003. Multiobjective autoclibration for semidistributed water quality models. Water Resour. Res. 39(12):1348. doi:10.1029/2003WR002284.[CrossRef]
- Vecchia, A.V., and R.L. Cooley. 1987. Simultaneous confidence and prediction intervals for nonlinear regression models with application to a groundwater flow model. Water Resour. Res. 23:12371250.
- Vrugt, J.A., W. Bouten, H.V. Gupta, and J.W. Hopmans. 2003a. Toward improved identifiability of soil hydraulic parameters: On the selection of a suitable parametric model. Available at www.vadosezonejournal.org. Vadose Zone J. 2:98113.[Abstract/Free Full Text]
- Vrugt, J.A., H.V. Gupta, L.A. Bastidas, W. Bouten, and S. Sorooshian. 2003b. Effective and efficient algorithm for multiobjective optimization of hydrologic models. Water Resour. Res. 39(8):1214. doi:10.1029/2002WR001746.[CrossRef]
- Vrugt, J.A., H.V. Gupta, W. Bouten, and S. Sorooshian. 2003c. A Shuffled Complex Evolution Metropolis algorithm for optimization and uncertainty assessment of hydrologic model parameters. Water Resour. Res. 39(8):1201. doi:10.1029/2002WR001642.[CrossRef]
- Vrugt, J.A., G. Schoups, J.W. Hopmans, C. Young, W.W. Wallender, T. Harter, and W. Bouten. 2004. Inverse modeling of large-scale spatially distributed vadose zone properties using global optimization. Water Resour. Res. 40:W06503. doi:10.1029/2003WR002706.[CrossRef]
- Wang, J., Z. Lu, and H. Habu. 2001. The SCE-UA to solution of constrained nonlinear problem. J. Hohai Univ. 29(3):4650 (Natural Sciences).
- Yapo, P.O., H.V. Gupta, and S. Sorooshian. 1996. Calibration of conceptual rainfall-runoff models: Sensitivity to calibration data. J. Hydrol. (Amsterdam) 181:2348.
- Yapo, P.O., H.V. Gupta, and S. Sorooshian. 1998. Multi-objective global optimization for hydrologic models. J. Hydrol. (Amsterdam) 204:8397.
This article has been cited by other articles:

|
 |

|
 |
 
T. Wohling, J. A. Vrugt, and G. F. Barkle
Comparison of Three Multiobjective Optimization Algorithms for Inverse Modeling of Vadose Zone Hydraulic Properties
Soil Sci. Soc. Am. J.,
January 25, 2008;
72(2):
305 - 319.
[Abstract]
[Full Text]
[PDF]
|
 |
|

|
 |

|
 |
 
D. L. Corwin, J. Hopmans, and G. H. de Rooij
From Field- to Landscape-Scale Vadose Zone Processes: Scale Issues, Modeling, and Monitoring
Vadose Zone J.,
March 8, 2006;
5(1):
129 - 139.
[Abstract]
[Full Text]
[PDF]
|
 |
|