VZJ Journal of Natural Resources and Life Sciences Education
Published online 17 May 2007
Published in Vadose Zone J 6:423-431 (2007)
DOI: 10.2136/vzj2006.0131
© 2007 Soil Science Society of America
677 S. Segoe Rd., Madison, WI 53711 USA

ORIGINAL RESEARCH

Artificial Neural Network Estimation of Saturated Hydraulic Conductivity

W. A. Agyare(a,*), S. J. Park(b), and P. L. G. Vlek(c)

a Savanna Agricultural Research Inst. (SARI), CSIR, P.O. Box 52, Tamale, Ghana
b Dep. of Geography, Seoul National Univ., Shilim-Dong, Kwanak-Gu, Seoul, Korea
c Center for Development Research (ZEF), Univ. of Bonn, Walter-Flex-Str. 3, 53113 Bonn, Germany

* Corresponding author (wagyare{at}yahoo.co.uk).

All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher.


Received 11 September 2006.



    ABSTRACT
Soil data serve as an important initialization parameter for hydro-ecological and climatological modeling of water and chemical movement, heat transfer, or land-use change. Most soil hydraulic properties are difficult to measure and therefore have to be estimated in most cases. Efficient methods for estimating soil hydraulic properties are lacking for tropical soils. This study uses easy-to-measure soil properties together with terrain attributes in artificial neural networks (ANNs) to estimate saturated hydraulic conductivity (Ks), one of the key soil hydraulic properties, for two pilot sites in the Volta basin of Ghana. It was observed that good data distribution, range, and amounts are prerequisites for good ANN estimation and, therefore, that data preprocessing is important for ANNs. With adequate and sensitive data, ANNs can be used to estimate Ks from soil properties such as sand, silt, and clay content, bulk density, and organic carbon. Although the inclusion of terrain parameters can improve the estimation of Ks using ANNs, they cannot be relied on as the sole input parameters because they yield poor results at the scale considered in this study. The source of training data was found to significantly influence the estimation of topsoil Ks, whereas subsoil Ks was not sensitive to the training data source.

Abbreviations: ANN, artificial neural network • CEC, cation exchange capacity • LS factor, length–slope factor • MSE, mean square error • NMSE, normalized mean square error • PE, processing element • PTF, pedotransfer function.


    INTRODUCTION
Artificial neural networks have become a common tool for modeling complex "input–output" dependencies. In the past, neural network models have been used as a special class of pedotransfer functions (PTFs) using feed-forward back propagation or radial basis functions to approximate any continuous (nonlinear) function (Hecht-Nielsen, 1990; Pachepsky et al., 1996; Schaap and Bouten, 1996; Minasny and McBratney, 2002; Pachepsky and Schaap, 2004).

Studies in which ANNs have been examined from a statistical perspective indicate that ANN models with certain geometries, connectivities, and internal parameters are either equivalent or close to existing statistical models (White, 1989; Cheng and Titterington, 1994; Hill et al., 1994; Sarle, 1994). Moreover, they are flexible and simple; by altering the transfer function or architecture, one can vary model complexity. They can also be easily extended from univariate to multivariate cases incorporating nonlinearities effortlessly. In neural networks, the concern is primarily one of estimation or prediction accuracy and methods that work, whereas the main objective of statisticians is to develop a universal methodology and to achieve statistical optimality (Breiman, 1994; Tibshirani, 1994).

Saturated hydraulic conductivity, among other soil hydraulic properties, is important for initializing climate and hydrologic models. However, measuring Ks is time consuming and expensive. As a result of the high variability associated with soil hydraulic properties (Warrick and Nielson, 1980; Wilding, 1984), most work performed in the past has been limited to the use of empirical and physical relationships referred to as pedotransfer functions (Bouma and van Lanen, 1987; Bouma, 1989; Rawls et al., 1992) and in recent times, ANN (Pachepsky et al., 1996; Schaap and Bouten, 1996; Schaap et al., 1999; Minasny and McBratney, 2002).

Terrain plays a fundamental role in modulating earth surface and atmospheric processes because the landform configuration frequently governs the movement of materials and water on the landscape (Burt and Butcher, 1986; Moore et al., 1993; Gessler et al., 1995; Western and Blöschl, 1999; Park and Vlek, 2002; Romano and Chirico, 2004) and, consequently, the catchment hydrology at the topo-scale (Wilson and Gallant, 2000). The use of terrain attributes for modeling Ks may serve as a suitable alternative, as terrain data are fairly easy to collect compared to intensive soil sampling. The influence of including terrain data in PTFs for estimating soil water content and van Genuchten hydraulic parameters has recently been studied (Romano and Palladino, 2002; Sharma et al., 2006), with mixed results. The terrain parameters have a low to appreciable influence in producing unbiased estimates, depending on the model and the parameter of interest. In the numerous soil–landscape studies performed in the past, the relationship between terrain attributes and Ks has not been adequately addressed. The important question is: Will the inclusion of terrain attributes in estimating Ks also help improve ANN model performance?

Some of the difficulties in the use of ANNs concern the form or distribution, sensitivity, and amount of data required to make a good estimate of the parameter of concern. Also, most ANNs in the past have relied solely on soil physical properties (such as sand, silt, and clay content and bulk density) for estimating Ks. It is noteworthy that many of the studies on PTFs are based on soil data from the temperate region because data are lacking from the tropics. These PTFs are inadequate for predicting hydraulic characteristics of tropical soils due to differences in physical and chemical properties (Tomasella and Hodnett, 2004). In view of this, three objectives are proposed to investigate the use of ANNs for estimating Ks in the Volta basin of Ghana: (i) to identify the data form or distribution, the sensitive parameters among soil and terrain parameters, and the data size that may be needed for estimating Ks, (ii) to model Ks using soil and/or terrain parameters, and (iii) to estimate Ks for sites different from those of the training data.


    Principles of Artificial Neural Network
An ANN consists of many interconnected simple computational elements called nodes or neurons. A neuron has multiple inputs and a single output, and within a neuron, each input is weighted and combined (also suitably biased) to produce a single value (Maier and Dandy, 2001). The input, z, is operated on by an activation or transfer function f to produce an output. Equation [1] is an example of a sigmoidal activation function such as used in this study:

[1] (equation image not preserved: sigmoidal activation function f(z))

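As a minimal sketch of the neuron described above, the following weights and sums the inputs, adds a bias, and applies a sigmoidal activation. Which sigmoidal form the original Eq. [1] showed is not known from this copy; the logistic function and the hyperbolic tangent (the latter reported in Materials and Methods) are both given here as assumptions, and all function names and numeric values are hypothetical.

```python
import math


def logistic(z):
    """Logistic sigmoid; output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias, activation=math.tanh):
    """Combine weighted inputs and a bias into z, then return f(z)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Hypothetical inputs and weights, for illustration only.
x = [0.2, 0.5, 0.3]
w = [0.4, -0.6, 0.1]
y_tanh = neuron_output(x, w, bias=0.05)                           # in (-1, 1)
y_logistic = neuron_output(x, w, bias=0.05, activation=logistic)  # in (0, 1)
```

The two activations differ mainly in output range, which matters later when deciding the range to which the data should be scaled.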
An ANN is fitted on a training dataset, and its performance is evaluated on an independent testing dataset. Training a neural network entails (i) calculating output sets from the input sets, (ii) comparing measured and estimated outputs, and (iii) adjusting the weights and bias in the transfer function for each neuron to decrease the difference between measured and estimated values. The mean of the squared differences between measured and estimated values serves as a measure of the goodness-of-fit.

In general, learning rate, momentum factors, network architecture, stopping criterion, and transfer function in a feed-forward ANN affect the development and results of training with a certain dataset. Artificial neural network learning extracts the required information from the input data with the help of the output. This is done through the selection of initial weights, learning rates, search algorithms, and stop criteria. The learning is a stochastic process that depends not only on the learning parameters but also on the initial conditions (data quality). The momentum learning used in this study utilizes a memory term (i.e., the past increment to the weight) that speeds up and stabilizes convergence. The weights in this method are changed proportionally, based on how much they were updated in the previous iteration. For further reading on different types of algorithms, see Golden (1996), Principe et al. (2000), and Yu and Chen (1997).
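The momentum rule described above can be sketched as a gradient step plus a fraction of the previous weight increment. This is a generic textbook form, not the NeuroSolutions implementation; the function name, the default values, and the toy error surface are assumptions made for illustration.

```python
def momentum_step(weights, gradient, prev_delta, step_size=0.01, momentum=0.5):
    """One weight update with a memory term.

    delta(t) = -step_size * gradient + momentum * delta(t - 1)
    """
    delta = [-step_size * g + momentum * d
             for g, d in zip(gradient, prev_delta)]
    new_weights = [w + d for w, d in zip(weights, delta)]
    return new_weights, delta


# Toy error surface E(w) = 0.5 * sum(w_i^2), whose gradient is w itself.
w = [1.0, -2.0]
delta = [0.0, 0.0]
for _ in range(200):
    w, delta = momentum_step(w, gradient=w, prev_delta=delta, step_size=0.1)
```

On this toy surface the weights shrink toward the minimum at zero; the memory term carries part of each previous increment into the next update.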

How and when to stop the learning machine after it has learned the task is one of the major dilemmas in ANN. The selection of an acceptable output mean square error (MSE) level is useful in stopping the training. However, it does not address the problem of generalization performance on data that do not belong to the training set. One approach to solving this problem is to stop the training at the point of maximum generalization and avoid overtraining (Vapnik, 1995) using early stopping, or stopping with cross-validation (smallest error in the validation set).
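Stopping with cross-validation can be sketched as follows: training continues while the validation error keeps improving and halts once it has failed to improve for a set number of epochs. The `patience` parameter, the callback names, and the simulated error curve are assumptions for illustration; the text does not state the exact stopping rule used.

```python
def train_with_early_stopping(train_one_epoch, validation_error,
                              max_epochs=1000, patience=20):
    """Return (best_epoch, best_error) on the validation set.

    train_one_epoch() is expected to update the model in place, and
    validation_error() to return the current validation-set error.
    A full implementation would also save the weights at the best epoch.
    """
    best_error = float("inf")
    best_epoch = -1
    for epoch in range(max_epochs):
        train_one_epoch()
        error = validation_error()
        if error < best_error:
            best_error, best_epoch = error, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch, best_error


# Simulated validation curve: falls until epoch 30, then rises (overtraining).
curve = [abs(e - 30) + 1.0 for e in range(1000)]
state = {"epoch": -1}


def step():
    state["epoch"] += 1


def val():
    return curve[state["epoch"]]


best_epoch, best_error = train_with_early_stopping(step, val)
```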

The size of the training set directly influences the performance of any regression that is trained nonparametrically (e.g., neural networks). This class of learning machines requires much data for appropriate training because there are no a priori assumptions about the data. Artificial neural networks belong to the class of data-driven modeling approaches that can determine which model inputs are critical, so there is no need for a priori rationalization about relationships between variables (Lachtermacher and Fuller, 1994). In situations with a large number of input variables, sensitivity analysis can be used to determine the relative significance of each input variable in the trained network (Maier, 1995; Faraway and Chatfield, 1998).

There are two basic approaches that deal with the size of a learning machine (Hertz et al., 1991). By heuristics, we either start with a small machine or large machine and increase (growing method) or decrease (pruning method) its size, respectively.

It is generally accepted that ANNs cannot extrapolate beyond the range of the data used for training (Minns and Hall, 1996). Consequently, it is unlikely that ANNs can account for deterministic components in the data (such as trends in the mean or variance, seasonal and cyclic components in time series). Methods of removing and dealing with trends in ANNs are well covered in the literature (see Chng et al., 1996; Khotanzad et al., 1997).

In contrast to earlier suggestions, the probability distribution of the input data needs to be known (Burke and Ignizio, 1992). Since the MSE function is used to optimize the connection weights in ANN models, data need to be normally distributed for optimal results (Fortin et al., 1997; Minasny and McBratney, 2002).

Generally, different variables span different ranges. To ensure that all variables receive equal attention during the training process, they should be transformed to uniform ranges that are commensurate with the limits of the activation functions in the output layer (Masters, 1993). The transfer function in the output layer has an influence on the range to which the data should be scaled (Maier and Dandy, 1999).
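A minimal sketch of scaling a variable to a uniform range, here 0 to 1, is shown below. The function name and the sample bulk density values are hypothetical; in practice the minimum and maximum would normally be taken from the training data and reused for the testing data.

```python
def min_max_scale(values, lower=0.0, upper=1.0):
    """Linearly rescale values so the minimum maps to `lower`
    and the maximum maps to `upper`."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("cannot scale a constant variable")
    span = (upper - lower) / (v_max - v_min)
    return [lower + (v - v_min) * span for v in values]


# Hypothetical bulk density values (g/cm^3), for illustration only.
bulk_density = [1.20, 1.45, 1.60, 1.70]
scaled = min_max_scale(bulk_density)
```

Passing `lower=-1.0, upper=1.0` would instead match the output limits of a hyperbolic tangent.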


    Materials and Methods
Study Areas
The study was performed at two locations in the Volta basin of Ghana: Tamale (9°28' N, 0°55' W) and Ejura (7°19' N, 1°16' W). The sites were selected to represent the diverse geomorphologic, relief, climatic, and sociocultural conditions in the basin across the country. The Volta basin in Ghana covers almost 69% of the country's total land area of 238,539 km². The Tamale and Ejura sites are located in the Guinea savanna and forest–savanna transition zones, respectively. The geology at both pilot sites is Voltaian sandstone, made up mainly of sandstone, quartzite, shale, and mudstone, which covers the largest portion of the basin (about 45% of the country). The soils found in the Volta basin of Ghana are predominantly Lixisols, Leptosols, Plinthosols, Acrisols, and Luvisols (SRI, 1999). The soils formed from the Voltaian sediments vary widely in terms of soil properties (Agyare, 2004).

The Tamale site in northern Ghana has a tropical continental or interior savanna climate, mainly influenced by the tropical continental air mass. According to the Köppen climate classification, it is a dry, hot, low-latitude climate (Aw). This study area receives about 1000 to 1200 mm of rainfall in a single rainy season (April–October), with a mean annual temperature of 28°C. The climate at the Ejura site is classified as tropical monsoon climate (Am). This area experiences a bimodal rainy season with a mean annual rainfall of 1200 to 1300 mm. It has high annual and monthly rainfall variability and no clear-cut beginning and end of rains. The mean annual temperature in the Ejura area is 26.6°C.

Soil Sampling and Analysis
Soil data were collected from an area of 6 km² on a rectangular grid of 100 by 200 m at the Tamale site. At the Ejura site, an area of 0.64 km² was sampled using a triangular grid, with a base of 40 m and height of 40 m. At the grid points, disturbed and undisturbed soil samples were collected from 0 to 15 cm (topsoil) and 30 to 45 cm (subsoil) depths. All disturbed soil samples were air dried and sieved (2 mm and 0.5 mm). The samples were analyzed for particle size distribution, pH, organic carbon, and cation exchange capacity (CEC).

Undisturbed samples were taken using a cylindrical metal core (10 cm long, 8.3-cm diam.) with the help of a ring holder. The undisturbed soil samples were analyzed gravimetrically for bulk density and for saturated hydraulic conductivity using the falling head method.
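The article does not give the working equation for the falling head method. As a hedged sketch, the standard relation Ks = (a L)/(A t) ln(h1/h2) is assumed below, where a is the standpipe cross-sectional area, A the sample cross-sectional area, L the sample length, and h1 and h2 the heads at the start and end of the elapsed time t. The standpipe size, heads, and time are hypothetical; only the core dimensions come from the text.

```python
import math


def falling_head_ks(standpipe_area, sample_area, sample_length,
                    head_start, head_end, elapsed_time):
    """Ks = (a * L) / (A * t) * ln(h1 / h2), in length units per time unit."""
    if not head_start > head_end > 0:
        raise ValueError("heads must satisfy head_start > head_end > 0")
    geometry = (standpipe_area * sample_length) / (sample_area * elapsed_time)
    return geometry * math.log(head_start / head_end)


# Core dimensions from the text: 10 cm long, 8.3 cm diameter.
core_area = math.pi * (8.3 / 2.0) ** 2  # cm^2

# Hypothetical set-up: standpipe with the same cross-section as the core,
# head falling from 20 cm to 10 cm in 600 s.
ks = falling_head_ks(core_area, core_area, 10.0, 20.0, 10.0, 600.0)  # cm/s
```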

Terrain Data Generation
A differential global positioning system (DGPS), Ashtech brand (Santa Clara, CA; Ashtech, 1998), was used to generate point measurements. For details on procedures of DGPS point elevation mapping, see Agyare (2004).

Semivariogram analysis was conducted using S-Plus software (Mathsoft, 1999) before kriging interpolation of the elevation data. The point measurements were interpolated at a grid size of 30 m, using the Surfer 7 program (Golden Software, 1999). The grid data were then imported into DiGeM (Conrad, 2001), a terrain analysis program, and subsequently used to generate nine different terrain parameters. The morphometric properties (aspect and slope gradient) and curvatures (plan curvature, profile curvature, and curvature) were generated using the Zevenbergen and Thorne (1987) method. See Romano and Chirico (2004) for an explanation of the different terrain parameters. In addition, an upslope contribution area was generated using the multiple flow direction algorithm (Freeman, 1991). Finally, topographic indices (wetness index, stream power, and length–slope [LS] factor) were generated. The resulting grid point data were used for further ANN analysis.
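Two of the topographic indices named above have widely used textbook definitions, sketched here for a single grid cell: the wetness index ln(a / tan β) and the stream power index a · tan β, where a is the specific catchment area and β the slope gradient. Whether DiGeM uses exactly these forms is an assumption, and the function names and cell values are hypothetical.

```python
import math


def wetness_index(specific_catchment_area, slope_deg, min_tan=1e-6):
    """Topographic wetness index, ln(a / tan(beta))."""
    # Clamp tan(beta) so that flat cells do not cause a division by zero.
    tan_beta = max(math.tan(math.radians(slope_deg)), min_tan)
    return math.log(specific_catchment_area / tan_beta)


def stream_power_index(specific_catchment_area, slope_deg):
    """Stream power index, a * tan(beta)."""
    return specific_catchment_area * math.tan(math.radians(slope_deg))


# Hypothetical cells: a steep hillslope cell and a gentle valley cell.
twi_hillslope = wetness_index(30.0, 5.0)
twi_valley = wetness_index(300.0, 1.0)
```

The valley cell, with a larger contributing area and a gentler slope, receives the higher wetness index.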

Artificial Neural Network Procedure and Statistical Analysis
The ANN structure used in this study is the multilayer perceptron, which is the most commonly used neural network structure in ecological modeling and soil science (Schulze, 2000; Dawson and Wilby, 2001). The nonlinear hyperbolic tangent transfer function was used to introduce nonlinearity during training or calibration. For parsimony and ease of data interpretation and on the basis of recommendations by Principe et al. (2000), one hidden layer was adopted. The number of neurons or processing elements (PEs) was determined by trial with different numbers. Four PEs were adopted for all analyses. Furthermore, through a number of trials using the momentum-learning rule, step sizes of 1.0 and 0.01 were adopted for the hidden and output layers, respectively. A momentum value of 0.5 was used.
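The architecture described above (one hidden layer, four PEs, hyperbolic tangent transfer function) can be sketched as a forward pass in plain Python. This is an illustration, not the NeuroSolutions model: the class name, the random weight initialization, and the linear output unit are assumptions, since the text does not state the output-layer transfer function.

```python
import math
import random


class OneHiddenLayerMLP:
    """Multilayer perceptron with one tanh hidden layer and one output."""

    def __init__(self, n_inputs, n_hidden=4, seed=0):
        rng = random.Random(seed)
        # One weight vector per hidden processing element (PE).
        self.w_hidden = [[rng.gauss(0.0, 0.1) for _ in range(n_inputs)]
                         for _ in range(n_hidden)]
        self.b_hidden = [0.0] * n_hidden
        self.w_out = [rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
        self.b_out = 0.0

    def forward(self, x):
        """Return the network output for one input vector x."""
        hidden = [math.tanh(sum(w * xi for w, xi in zip(weights, x)) + b)
                  for weights, b in zip(self.w_hidden, self.b_hidden)]
        # Linear output unit (an assumption; see the note above).
        return sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out


# Six inputs, e.g. the six soil parameters, each scaled to 0-1.
net = OneHiddenLayerMLP(n_inputs=6)
y = net.forward([0.5] * 6)
```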

The maximum number of epochs was set at 1000, but early stopping was applied during training by a cross-validation procedure using 10% of the total training dataset. Each analysis was performed using five runs on different realizations of the dataset, simulated by repeating 10 times with different randomization of the data. The best performance is that with the lowest normalized mean square error (NMSE) (Eq. [2]) and the highest coefficient of determination (R2). The variability of the randomized estimations is quantified by standard deviations (assuming the estimated parameters had an approximately normal distribution):


[2] (equation image not preserved: definition of NMSE)

where x_i and its estimated counterpart (symbol not preserved) are the measured and estimated parameters, respectively.
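Because the image of Eq. [2] is not available in this copy, the following is a sketch under a stated assumption: NMSE is taken as the mean square error divided by the variance of the measured values, and R2 as the squared Pearson correlation. The R2 and NMSE pairs reported in the Results for the Ejura topsoil (0.69 with 0.31, and 0.64 with 0.36) each sum to 1, which is consistent with a variance-normalized definition but does not confirm the exact form used. The sample values are hypothetical.

```python
def nmse(measured, estimated):
    """Mean square error divided by the variance of the measured values."""
    n = len(measured)
    mean_measured = sum(measured) / n
    mse = sum((m - e) ** 2 for m, e in zip(measured, estimated)) / n
    variance = sum((m - mean_measured) ** 2 for m in measured) / n
    return mse / variance


def r_squared(measured, estimated):
    """Squared Pearson correlation between measured and estimated values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_e = sum(estimated) / n
    cov = sum((m - mean_m) * (e - mean_e)
              for m, e in zip(measured, estimated))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov ** 2 / (var_m * var_e)


# Hypothetical Ks values after scaling to 0-1, for illustration only.
measured = [0.1, 0.4, 0.5, 0.9]
estimated = [0.2, 0.3, 0.6, 0.8]
error = nmse(measured, estimated)
fit = r_squared(measured, estimated)
```

Under this definition, a model that always predicts the mean of the measured values has an NMSE of 1, and a perfect model has an NMSE of 0.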

A sensitivity analysis (Eq. [3]) was performed to determine which input data have the most influence on the output data or, alternatively, how changes of an input variable affect the output variable. The data in different formats (such as raw, normalized, z-scored transform, or minimum–maximum transform), with the range of 0 to 1 for all continuous data and all categorical data coded as presence (1) or absence (0), were evaluated for their potential in estimation of the output, with the stability evaluated based on the standard deviation from 10 randomized datasets (see Agyare, 2004 for a detailed presentation of these transformations). In addition, the optimum net for sensitive input parameters and data size were determined:

[3] (equation image not preserved: sensitivity measure)

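The exact form of Eq. [3] is not recoverable from this copy. One common way to carry out such an analysis, assumed here purely for illustration, is sensitivity about the mean: each input is swept across its observed range while the others are held at their means, and the spread of the model output is recorded. The function names and the toy model standing in for a trained network are hypothetical.

```python
import math
import random
import statistics


def sensitivity_about_mean(model, rows, n_steps=50):
    """Return one score per input: the standard deviation of the model
    output while that input is swept from its minimum to its maximum and
    all other inputs are fixed at their means."""
    n_inputs = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_inputs)]
    means = [sum(col) / len(col) for col in columns]
    scores = []
    for j, col in enumerate(columns):
        low, high = min(col), max(col)
        outputs = []
        for k in range(n_steps):
            probe = list(means)
            probe[j] = low + (high - low) * k / (n_steps - 1)
            outputs.append(model(probe))
        scores.append(statistics.pstdev(outputs))
    return scores


def toy_model(x):
    """Stand-in for a trained network: strong dependence on input 0,
    weak dependence on input 1, none on input 2."""
    return math.tanh(2.0 * x[0] + 0.2 * x[1])


rng = random.Random(0)
data = [[rng.random() for _ in range(3)] for _ in range(100)]
scores = sensitivity_about_mean(toy_model, data)
# Input indices ordered from most to least sensitive.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```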
The input parameters used include site (Tamale or Ejura, which takes into account differences in environment, parent material, and climate), soil depth (topsoil or subsoil), soil properties (sand, clay, silt, CEC, organic carbon, bulk density, pH, gravel concretion, subangular blocky structure, granular structure, weak structure, moderately strong structure, strong structure, fine structural size, medium structural size, coarse and medium structural size), and terrain attributes (profile curvature, plan curvature, curvature, elevation, wetness index, upslope contribution area, stream power index, slope gradient, LS factor, and slope aspect).

Table 1 presents the descriptive statistics for the continuous data used for the ANN analysis. It also gives the equations used for normalization (Hamilton, 1990), as also used by Minasny and McBratney (2002). Shown in the table are the 10 most sensitive parameters obtained from the sensitivity analysis and used for further analysis.



 
TABLE 1. Descriptive statistics of continuous parameters used and their transformation parameters.

 
Out of the complete dataset of 1126 data points, a test dataset of 126 data points was set aside after each randomization for testing, and the same test dataset was used for the different training data sizes (200, 400, 600, 800, and 1000). The ANN performance criteria adopted in this study were based on model accuracy or ability (NMSE and R2 of the training dataset) and generalization or estimation ability (NMSE and R2 of the testing dataset). The models were implemented using the commercially available software package NeuroSolutions 4.0 (Principe et al., 2000).
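The data partitioning described above can be sketched as follows: the sample indices are shuffled once per randomization, 126 are held out for testing, and training subsets of each size are drawn from the remaining 1000. Drawing the subsets as nested slices of one shuffled pool is an assumption; the text states only that the same test dataset was used for every training size. The function name and seed are hypothetical.

```python
import random


def split_fixed_test(n_total=1126, n_test=126,
                     train_sizes=(200, 400, 600, 800, 1000), seed=0):
    """Return (test_idx, train_idx), where train_idx maps each training
    size to a list of sample indices disjoint from test_idx."""
    order = list(range(n_total))
    random.Random(seed).shuffle(order)
    test_idx = order[:n_test]
    pool = order[n_test:]
    train_idx = {size: pool[:size] for size in train_sizes}
    return test_idx, train_idx


test_idx, train_idx = split_fixed_test()
```

Changing `seed` gives a different randomization while keeping the test set fixed across training sizes within that randomization.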

Analysis of variance (ANOVA) was used to identify differences between the estimated Ks from the different data distributions, groups, and source. This was performed in SPSS (SPSS, 1999).


    Results and Discussion
Sensitivity Analysis of Artificial Neural Network
To understand the effect of input and output data distribution or form on ANN models, raw, normalized, z-scored, and minimum–maximum (0–1) transformed data were used. The results presented in Fig. 1 show the sensitivity of the ANN output to soil input parameters in different data forms. The raw, normalized, and z-scored data exhibit high standard errors, which indicate instability. Also, contrary to expectation, the sensitivity of the categorical input parameters was comparatively higher than that of the continuous parameters, except when minimum–maximum (0–1) data were used.



 
FIG. 1. Sensitivity of different soil input parameters for different data forms, (A) raw, (B) normalized, (C) z-scored, and (D) minimum–maximum (0–1) data, for estimating saturated hydraulic conductivity in the artificial neural network. (SABST, subangular blocky structure; GST, granular structure; WKSG, weak structure; MSG, moderately strong structure; SSG, strong structure; FSS, fine structural size; MedSS, medium structural size; CSS, coarse and medium structural size; OC, organic carbon; CEC, cation exchange capacity; BD, bulk density.)

 
To establish the effect of data distribution on ANN output, the R2 and NMSE for the different data distributions were compared using ANOVA. Presented in Table 2 are the R2 and NMSE for training and testing for four different data forms (raw, normalized, z-scored, and minimum–maximum [0–1]). The analysis was done using only the 10 most sensitive parameters which, in order of decreasing sensitivity, were bulk density, sand content, site (Tamale or Ejura), gravel and/or concretion, soil sampling depth (topsoil or subsoil), soil structural grade (strong), structural type (subangular blocky), clay content, silt content, and structural size (coarse). These are the parameters with high contribution to R2 or low contribution to NMSE. The results show that the raw data have significantly lower R2 and higher NMSE compared with the other data forms and are least suitable. Among the remaining data forms, the minimum–maximum data form with the range of 0–1 performed slightly better and was selected for use in further analysis.



 
TABLE 2. Comparison of artificial neural network (ANN) coefficient of determination (R2) and normalized mean square error (NMSE) for training and testing datasets for saturated hydraulic conductivity based on different data forms using ANN.

 
To determine the importance of the individual input parameters in modeling Ks, a sensitivity analysis was performed using datasets of 1000 and 126 data points for training and testing, respectively. Figure 2 illustrates the effect of increasing the number of input parameters on the variation of R2 and NMSE for estimating Ks for training and testing data. The input parameters were selected based on their sensitivity, with the most sensitive entering first. Figure 2A depicts a rapid improvement in R2 for both training and testing data for the first two most sensitive parameters; the increase then becomes gradual, with the training data maintaining a plateau after inclusion of about eight input parameters, whereas for the testing data, R2 declines with additional input parameters. The NMSE shows an opposite trend to that of the R2, with a more or less constant value for the training data and a gradually increasing NMSE for the testing data (Fig. 2B). This illustrates the need to use only the most sensitive input parameters to keep the model simple for estimation purposes.



 
FIG. 2. Variation in saturated hydraulic conductivity estimation with increasing number of input parameters for (A) coefficient of determination (R2) and (B) normalized mean square error (NMSE) in the artificial neural network.

 
Figure 3 shows variations in R2 and NMSE for training and testing datasets as training data size is varied. According to the figure, as the training data size increases, the R2 and NMSE for the training data linearly increase and decrease, respectively. This is an indication of the increasing ability to train the ANN as the size of the input data is increased. However, in the case of the testing data, the R2 and NMSE increase and decrease, respectively, at a decreasing rate following a natural log function. This shows that with additional data (>1000 data points), one can expect an increase in model performance. Thus, with the maximum data size used, the model still does not capture all the relationships between the input parameters and Ks. The trend depicted by the testing data indicates that increasing the training data size can improve the generalization ability of the ANN, but that beyond a certain training data size, there will be no further increase in the ability to generalize or estimate.



 
FIG. 3. Trend of training data size effect on (A) coefficient of determination (R2) and (B) normalized mean square error (NMSE) for training and testing data in estimating saturated hydraulic conductivity using combined data from Ejura and Tamale sites in the artificial neural network.

 
Artificial Neural Network Modeling with Soil and Terrain Parameters
To investigate the importance of different data groups and the effect of soil and terrain data on ANN for Ks estimation, four parameter groups were considered. Table 3 presents the ANOVA for the R2 and NMSE with their standard error for training and testing datasets for the different groups of input data: (A) all parameters, (B) the 10 most sensitive parameters (see Table 1), (C) the 6 most sensitive soil parameters (bulk density, sand, silt, clay content, CEC, and organic carbon), and (D) terrain attributes only.



 
TABLE 3. Coefficient of determination (R2) and normalized mean square error (NMSE) for saturated hydraulic conductivity, Ks, using different data groups for all sites and sampling depths with standard error in parentheses.

 
Using only terrain attributes (D) gives an R2 and NMSE that are significantly lower and higher, respectively, than for the other three parameter groups for both training and testing datasets. Using all input parameters (A), 10 most sensitive (B), and 6 most sensitive soil parameters (C), R2 and NMSE are not significantly different for both training and testing datasets, suggesting that a minimum parameter set can be defined.

In investigating the effect of data source on R2 and NMSE, an ANOVA was performed for data from the two sites and soil depths (topsoil [0–15 cm] and subsoil [30–45 cm]). Table 4 shows the R2 and NMSE for the training and testing datasets for the different sites and soil depths using the six most sensitive soil parameters. It shows that training and testing ability may differ when data from the different sites and soil depths are used. These results are comparable to those obtained by Schaap and Leij (1998) (0.44–0.67 for the training dataset and 0.28–0.55 for testing with different datasets). The generally low R2 obtained for the Tamale topsoil compared with that of the Ejura site underlines the fact that it is more difficult to estimate saturated hydraulic conductivity for highly disturbed land areas such as in Tamale. Grouping based on soil depth improves PTF estimation, similar to the finding of Romano and Palladino (2002). The terrain parameters introduce a process-based rationale related to pedogenesis and also account for the local environment. This supports the view that extrapolating Ks estimation using ANNs does not yield good results and can only be done with confidence within restricted soil types and environmental conditions (Bruand, 2004). Therefore, some data from the site of interest need to be included in the training process for good estimation.



 
TABLE 4. Comparison of artificial neural networks coefficient of determination (R2) and normalized mean square error (NMSE) with the standard error of saturated hydraulic conductivity, Ks, for training and testing data at different sites and sampling depths using the six sensitive soil parameters.

 
To illustrate the relationship between measured and estimated data, a line plot of measured saturated hydraulic conductivity (Ksm) and estimated saturated hydraulic conductivity (Kse) for a given randomization of training and testing data was generated (Fig. 4). For this particular Ejura topsoil dataset, an R2 of 0.69 (NMSE = 0.31) and 0.64 (NMSE = 0.36) was obtained for the training and testing data, respectively, using only the 10 most sensitive input parameters. The figure illustrates a very good relationship between Ksm and Kse.



 
FIG. 4. Measured compared to estimated saturated hydraulic conductivity (Ks) for (A) training and (B) testing data for 60 randomly selected Ejura topsoil datasets in the artificial neural network with Ks transformed between 0 and 1.

 
Estimating Saturated Hydraulic Conductivity Using Artificial Neural Network
To evaluate the potential of using ANNs to estimate Ks for sites very different and far removed from the site whose data are used to build the model, different model scenarios were investigated. For this purpose, data from the Ejura site were used to build a model, which was then tested with data from both sites, and vice versa. Figures 5, 6, and 7 illustrate the estimation potential.



 
FIG. 5. Comparison of coefficient of determination (R2) for estimated saturated hydraulic conductivity for different testing data using training data from the same and different sites, with SE bars, in the artificial neural network (a, b, and c show the Bonferroni mean separation test results; bars with the same letter are not significantly different at p < 0.05).

 


 
FIG. 6. Comparison of normalized mean square error (NMSE) for estimated saturated hydraulic conductivity for different testing data using training data from the same and different sites and indicating their SE bars in the artificial neural network.

 


 
FIG. 7. Comparison of measured (Ksm) and estimated (Kse) saturated hydraulic conductivity for the Tamale (A and B) and Ejura (C and D) sites, using training data from a different site, in the artificial neural network with Ks transformed between 0 and 1.

 
Figures 5 and 6 illustrate the R2 and NMSE, respectively, for the different sites by soil depth when Ks is estimated with testing data from the same site as the training data or from a different site. Shown on the graphs are the Bonferroni mean separation results, using a, b, and c. Also marked on the graphs are standard error bars. The two figures show a trend of higher R2 and lower NMSE for testing data when they are from the same site as the training data. The R2 for the topsoil at the two sites is significantly higher when the training and testing data are from the same site than when the testing data are from a site different from that of the training data. The R2 for the subsoil at both sites was not significantly different whether or not the testing and training data were from the same site. A consistent but opposite trend was observed for the NMSE. These results indicate a more site-independent Ks estimation for the subsoil than for the topsoil. This is further illustrated in Fig. 7, which shows, for 80 samples, a closer relationship between measured (Ksm) and estimated (Kse) saturated hydraulic conductivity for the subsoil than for the topsoil. This difference seems mainly due to the greater influence of management practices on Ks of the topsoil compared with the subsoil.


    Conclusions
 
On the basis of the analysis presented, transforming the data into a maximum–minimum range of 0–1 was found to yield the best ANN model performance. With this transformation, all data fall within the same range, so the undue influence of contrasting data ranges is minimized. The use of sensitive input data was also found to be critical for modeling Ks with ANN, as too many insensitive input parameters reduced the generalization or estimation accuracy of the model. Furthermore, it was evident that increasing the training data-set size, even beyond the maximum of 1000 used in this analysis, may help improve the performance of the ANN.
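
The maximum–minimum transformation to the 0–1 range can be sketched as follows. This is a minimal illustration, not the implementation used in this study; the function names are hypothetical, and taking the minimum and maximum from the training data and reusing them for the testing data is an assumption.

```python
def minmax_fit(values):
    """Return the (minimum, maximum) of a variable, e.g. from training data."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant variable")
    return lo, hi


def minmax_transform(values, lo, hi):
    """Map values linearly so that lo becomes 0 and hi becomes 1."""
    return [(v - lo) / (hi - lo) for v in values]


def minmax_inverse(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]
```

If the bounds are taken from the training data only, testing values outside the training range map slightly below 0 or above 1. The inverse function returns estimates of Ks to their original units.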

The artificial neural network yields a high R2 if adequate and sensitive data are used, suggesting good estimation of Ks. Artificial neural networks can be used to estimate Ks from soil data, with possible improvement in model performance when additional parameters from relevant terrain attributes are included. Although the inclusion of terrain parameters can improve the estimation of Ks using ANN, at the scale of data considered here they cannot be relied on alone for modeling Ks. Because cross-sectional data (in contrast to aggregate time series or replicate data) are used, an R2 ≥ 0.5 is high (Greene, 2000), given that Ks is a highly variable parameter.

It is evident from the results that, when topsoil Ks is being estimated with ANN, it is important to train the network with data from the same environment. In the case of subsoil, where the land management effect is minimal, it is possible to estimate Ks for an area far away from the training data site (within the Volta basin of Ghana) if the necessary input data exist. This analysis needs to be repeated with different datasets, however, to ascertain the general applicability of the result.

The good performance of ANN and the fact that it does not require an a priori model make it well suited for estimating Ks in the Volta basin of Ghana. However, it should be remembered that increasing the number of samples increases the estimation potential of ANN and that adequate and sensitive data are important for model development.


    ACKNOWLEDGMENTS
 
The authors wish to express their gratitude to the German government through its Ministry of Education, Science, and Technology (BMBF), for its financial support. This gratitude also goes to the management and staff of Zentrum für Entwicklungsforschung (ZEFc), University of Bonn, Bonn, Germany, and the Savanna Agricultural Research Institute (SARI), Tamale, Ghana, for their technical and administrative support.


    REFERENCES
 




