VZJ Download to Citation Manager
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


Published online 9 April 2007
Published in Vadose Zone J 6:255-259 (2007)
DOI: 10.2136/vzj2006.0156
© 2007 Soil Science Society of America
677 S. Segoe Rd., Madison, WI 53711 USA
This Article
Right arrow Abstract Freely available
Right arrow Figures Only
Right arrow Full Text (PDF) Free
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Similar articles in Web of Science
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Web of Science (4)
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Hardelauf, H.
Right arrow Articles by Vereecken, H.
Right arrow Search for Related Content
PubMed
Right arrow Articles by Hardelauf, H.
Right arrow Articles by Vereecken, H.
GeoRef
Right arrow GeoRef Citation
Agricola
Right arrow Articles by Hardelauf, H.
Right arrow Articles by Vereecken, H.
Related Collections
Right arrow Coupled Flow/Transport Models
Right arrow Vadose Zone Processes and Chemical Transport

TECHNICAL NOTES

PARSWMS: A Parallelized Model for Simulating Three-Dimensional Water Flow and Solute Transport in Variably Saturated Soils

H. Hardelauf, M. Javaux, M. Herbst, S. Gottschalk, R. Kasteel, J. Vanderborght* and H. Vereecken

Agrosphere, ICG-IV, Forschungszentrum Jülich GmbH, D-52425 Jülich, Germany
* Corresponding author (j.vanderborght{at}fz-juelich.de).

All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher.


Received 24 October 2006.


ABSTRACT

Three-dimensional vadose zone models are used more and more for solving hydrological problems on a broad range of scales with large amount of nodes. Currently, the problems we can solve in reasonable computational time may have up to 5 · 106 nodes. However, distributed models may need up to 1010 nodes to properly predict flow and transport at the watershed scale. The speed and efficiency of current flow and transport models therefore need to be improved. The parallelization of the code is one possible way to decrease the computational time by distributing a complex large geometry problem over multiple processors working in parallel. This is the solution we implemented by developing PARSWMS, a parallelized version of SWMS_3D (Simunek et al., 1995). The objective of this technical note is to describe the PARSWMS model, test its reliability, and show its performance and efficiency compared with single processor runs.

Abbreviations: CDE, convection–dispersion equations • MPI, Message-Passing Interface • PETSc, Portable, Extensive Toolkit for Scientific Computation.

With the increasing computational power in the 1980s, numerical three-dimensional models for water flow and solute transport have been developed. The objective was to build codes capable of solving the nonlinear Richards' and convection–dispersion equations (CDE) for any domain geometry and for complex boundary and initial conditions. Since then, three-dimensional models have been used for a wide range of problems in terms of scale, boundary conditions, and processes involved. Nowadays, processes such as biodegradation, nonlinear sorption, geochemical reactions, and runoff, which render the numerical problem even more complex, have been added to the original Richards' equation and the CDE. From simple and local-scale problems (Kasteel et al., 2000), the current use of three-dimensional flow and transport models has been extended to much larger scales and to bigger number of nodes. Three-dimensional vadose zone models are being used more and more for solving hydrological problems at the catchment scale (Herbst et al., 2006). Yet, a key issue of the distributed modeling is the maximum grid resolution at which such models are valid. Theoretically, the grid resolution should be of the same order of magnitude as the Darcy (or elementary representative volume) scale, that is, between 10–2 and 100 m. This implies that the computational need for a large-scale, distributed flow and transport model will increase in the future. Harter and Hopmans (2004) estimated that 1010 nodes would be needed for an appropriate modeling of a 10-km2 watershed. Even for smaller-scale problems but with very fine spatial discretization, the number of nodes may quickly become limiting, considering that today, the maximum number of nodes we can handle with reasonable computation time is about 106 (Harter and Hopmans, 2004). Cirpka and Kitadinis (2002) used more than 2 x 106 nodes to simulate solute dispersion in a 5 by 10 by 4 m heterogeneous soil plot. Russo et al. (2006) used 3.84 · 106 cells for a plot of 24 by 16 by 20 m.

The major drawback of fine-grid models when large-scale problems have to be faced is the enormous demand for computational time and resources. The speed and efficiency of current models therefore have to be improved. The parallelization of the code is one possible way to decrease the computational time, by distributing a complex, large geometry problem over multiple processors working in parallel (Vereecken et al., 1996; Elmroth et al., 2001; Ashby and Falgout, 1996). This is the solution we implemented by developing PARSWMS, a parallelized version of SWMS_3D (Simunek et al., 1995). The objective of this technical note is to describe the PARSWMS model, test its reliability, and show its performance and efficiency as compared to single processor runs.

Model Description

SWMS_3D Main Features
SWMS_3D is a code for simulating water flow and solute transport in three-dimensional, variably saturated media (Simunek et al., 1995). This model was chosen for its flexibility; its performances are ensured by its broad use through the vadose zone hydrology community (e.g., Herbst and Diekkruger, 2002, 2003; Lewandowska et al., 2004; Javaux and Vanclooster, 2006; Javaux et al., 2006). Water flow is solved using Richards' equation, while transport is described with the CDE. Galerkin-type linear finite element schemes are used for the discretization of the flow and transport equations. The resulting equations are solved in an iterative fashion, by linearization and subsequent Gaussian elimination for banded matrices, a conjugate gradient method for symmetric matrices, or the ORTHOMIN method for asymmetric matrices (Mendoza et al., 1991). Additional measures are taken to improve solution efficiency in transient problems, including automatic time step adjustment and checking if the Courant and Peclet numbers do not exceed preset levels. The water content term is evaluated using the mass-conservative method proposed by Celia et al. (1990). For any additional information about the code, refer to Simunek et al. (1995).

Overview of the Parallel Implementation
The PARSWMS code is based on the SWMS_3D code. Most of the subroutines, functions, and variable names have been kept the same. Apart from the parallelization itself, the main change concerns the programming language, which was switched from FORTRAN 77 to C++ to take advantage of the dynamic allocation of all variables. The output file format was also slightly modified. Basically, the code gives one series of output files by processor. Therefore, a postprocessing code was written to merge all the data together after running. Note, however, that the input files are the same as for SMWS_3D. The parallelization essentially involves three different parts: communication between different processors, the distribution of the problem over the different processors, and the solution of the subproblems on individual processors. The following section presents the implementation of these three parts in the parallelized code.

Communication between Processors
The new code is based on Message-Passing Interface (MPI), a library specification for message passing between different processors. Message-Passing Interface provides source-code portability and allows efficient implementation across a range of computer architectures. It is free software for LINUX or UNIX operating systems, which needs to be installed on the working machine.

Grid Distribution
The distribution of the flow domain or partitioning of the problem between processors is automatically performed using algorithms of the ParMETIS library. With the aid of this library, even the partitioning process itself is parallelized. ParMETIS is a MPI-based parallel library that implements a variety of algorithms for partitioning unstructured graphs, meshes, and for computing fill-reducing orderings of sparse matrices (http://glaros.dtc.umn.edu/gkhome/views/metis/parmetis/). This library allows dynamic partitioning and an adaptive or irregular grid, and it can be run on heterogeneous clusters. Currently, the partitioning is static, in the sense that it is done once at the beginning of the run. The algorithm optimizes the partitioning between processors based on weights given by the user to each node or element by minimizing the surface of the subvolumes on each processor as well as the connection numbers between processors.

Matrix Solving
The solution of the system of linear equations is achieved for the nodes of the sub-volume allocated on each processor. The PETSc (Portable, Extensive Toolkit for Scientific Computation) library (http://www-unix.mcs.anl.gov/petsc/petsc-as/) was used because it allows the user to choose between a large range of solvers and preconditioners that can handle linear and nonlinear problems in parallel mode. Therefore, the solver implemented in PARSWMS is different than SWMS_3D and could easily be further optimized. Herbst et al. (unpublished data) investigated this problem by comparing the efficiency of several preconditioning approaches used in PARSWMS. For water-flow simulations, we used a Jacobi preconditioning method for a parallel Conjugate Gradient solver.

Methodology

To check the accuracy and stability of the new code, we first compared PARSWMS with SWMS_3D simulations for a few benchmark scenarios developed for testing one-dimensional flow and transport numerical codes (Vanderborght et al., 2005). Then, we evaluated the performance of the PARSWMS code in terms of computational time by running two computationally demanding problems for different numbers of processors on a massive parallel system.

Benchmarking Scenarios
The benchmarking scenarios (S1) considered simple one-dimensional flow processes for simple boundary conditions applied to uniform (loam) or two-layered soils (loam over sand). The problem was discretized into 100 elements of 1-cm thickness each along the z axis. We found that the SWMS_3D code compared favorably with the analytical solutions (results not shown) so that we focused on the comparison between SWMS_3D and PARSWMS. A summary of the scenarios we used is given in Table 1. More detailed information about these scenarios is given by Vanderborght et al. (2005).


View this table:
[in this window]
[in a new window]

 
TABLE 1. Benchmark scenario.

 
Three-Dimensional Flow and Transport Scenarios
Two other scenarios were tested to compare the performances of SWMS_3D and PARSWMS models for more complex three-dimensional cases. The first (S2) concerned solute transport and was derived from a study of Javaux et al. (2006). It consists of a steady-state short pulse release of an inert tracer on the surface of a cylindrical heterogeneous soil monolith. The column geometry is described by a mesh of 492 264 nodes. The heterogeneity of the subsoil core was characterized at two spatial levels. The macrostructure was described by the delineation of three texturally contrasted bodies (sand, clay, and concretions). Additionally, a microvariability was implemented in the sand body with isotropic scaling factor distributions for the van Genuchten–Mualem parameters (Vogel et al., 1991). Table 2 shows the three averaged parameter sets. The scaling factor auto- and cross-variograms were parameterized using a nested model with a short range (range = 0.25 m) exponential behavior and a larger scale (range = 0.9 m) Gaussian model. More details may be found in Javaux et al. (2006).


View this table:
[in this window]
[in a new window]

 
TABLE 2. Parameters of van Genuchten–Mualem soil hydraulic functions for the complex solute transport scenario (S2).

 
A second complex scenario (S3) consisted of bare soil submitted to a time series of variable climatic conditions. The description of the upper boundary condition is shown in Fig. 1 . The problem geometry consisted in a cube of 2.5-m side length, discretized in 275 706 nodes in total. The nodal distance was 5 cm in the horizontal direction compared with 2.5 cm in the vertical direction, with a spatial refinement in the upper 2.5 cm. This soil cube had a heterogeneous parameter distribution represented by independent random fields of scaling factors for the water retention curve and the hydraulic conductivity function. Averaged soil characteristic curves were modeled using the modified van Genuchten–Mualem model with parameters {theta}r = 0.078, {theta}s = 0.430, {theta}m = 0.431, {alpha} = 0.036 cm–1, n = 1.56, Ks = 10.4 cm h–1, Km = 1.04, and l = 0.5, and linear scaling factors were used to account for the spatial variability of the soil hydraulic properties (Vogel et al., 1991).


Figure 1
View larger version (25K):
[in this window]
[in a new window]

 
FIG. 1. Cumulative outflow (upper subplot) and average surface pressure head (middle subplot) resulting from the atmospheric boundary conditions (bottom subplot).

 
Time Scaling
We investigated the behavior of PARSWMS in terms of computational time reduction when the number of processors is increased. A massive computer system was used to run the scenarios S2 and S3 on a range between 1 and 256 processors. Details about the technical features of the supercomputer are given in the Appendix.

Results and Discussion

One-Dimensional Benchmarking Scenarios
First results confirm that SWMS_3D and PARSWMS give convergent simulations for all cases, under several benchmarking scenarios for water. Visual inspection of the data revealed identical results. The computed coefficient of determination R2 between SWMS_3D and the parallelized version are 0.9999468, 0.9999994, 0.9999982, 0.9999913, and 0.9997436 for scenarios S1.1 to S1.5, respectively (Table 1). R2 is always larger than 0.999, which means that the new encoded PARSWMS routines and the new parallelized solver do not generate discrepancies between simulations.

In addition, the PARSWMS simulations for scenarios S2 and S3 (Table 1) compared well with the SWMS_3D results. Figure 2 shows a cross-section of the cylindrical lysimeter concentration distribution during the solute breakthrough, and again, both simulations yield almost identical results.


Figure 2
View larger version (39K):
[in this window]
[in a new window]

 
FIG. 2. Comparison of the SWMS_3D (upper row) and PARSWMS (bottom) simulations of solute transport in a heterogeneous monolith (vertical cross-section): scenario S2.

 
These tests indicate a successful parallelization and reliability of the new implementation on modeling simple and complex flow and transport cases.

CPU Time Scaling Behavior
The speedup of using parallel processors on a problem compared with using only one serial processor is governed by Amdahl's law (Amdahl, 1967). This law states that if F is the fraction of a calculation that is sequential, and (1 – F) is the fraction that is parallelized, the maximum speedup that can be achieved on np processors is

Formula

In our case, since roughly 100% of the code is parallelized, speedup versus np should optimally be a 1:1 line (or –1 slope line when time gain is plotted in function of np). Figure 3 shows the time scaling for the scenarios S2 and S3 compared with Amdahl's law. We plotted the relative time (tnp/tnp=1) as a function of the number of processors np used for the run. The time scaling is almost linear and remarkably close to the optimum defined by Amdahl's law. There is a short deviation from the linear trend for large numbers of processors, which is likely due to the architecture of the supercomputer, which is composed of a series of nodes having 32 interconnected processors. The internode being lower than the intranode connection speed, doubling the number of processors beyond 25, apparently decreases the code performance.


Figure 3
View larger version (22K):
[in this window]
[in a new window]

 
FIG. 3. Time gain compared with the one processor run (in log2) as a function of the number of processors np (in log2) for (i) solute transport scenario S2 with 49 2264 nodes (open circles) and (ii) water flow with atmospheric upper boundary conditions (scenario S3, open diamonds). Dashed line represents Amdahl's law.

 
Fitted linear equations give a slope of –0.93 and –0.7, respectively, for scenarios S2 and S3. Thus, doubling the number of processors results in a computational time decrease of 47.5 and 39%, respectively, while Amdahl's law predicts an optimum of 50%. The difference in performance between scenarios depends on the boundary conditions and the type of geometry, both of which impact the quantity of information that has to be exchanged between processors. In the S3 case, the complex atmospheric boundary conditions caused a frequent switch between head- or flux-type boundary conditions, which needed more numerical time steps to converge with lower mass balance errors on PARSWMS compared with SWMS_3D.

Conclusions

We successfully developed a parallel code based on SWMS_3D for solving Richards' equation and CDE under variably saturated conditions. The PARSWMS code was developed so that it can be compiled and run on any network (or even a uniprocessor) of LINUX or UNIX workstations or on specialized parallel hardware as long as MPI, PETSc, and PARMETIS freeware is installed. A comprehensive benchmarking was performed to validate the new code. The performances were investigated on a massive parallel computer. The two case studies indicate that doubling the number of processors may lead to a decrease of the computational time up to 48%. However, this figure may be affected by the type of boundary conditions and the geometry of the problem. Thanks to this parallelized algorithm and with better accessibility to cluster calculation and supercomputers, we expect that problems with larger amount of nodes may be tackled in the near future.

Next steps include a coupling with a particle tracking code and with advanced root soil modeling approaches.

Appendix

The scale evolution of computational time performance was performed on the jump supercomputer at the Forschungszentrum Juelich IBM Regatta p690+. Main characteristic features of this machine include the following:

ACKNOWLEDGMENTS

We thank the John von Neumann, Institute for Computing (NIC), Jülich, for providing the opportunity to use their parallel computing resources.

REFERENCES




This article has been cited by other articles:


Home page
Vadose Zone JHome page
H. Vereecken, P. Burauel, J. Groeneweg, E. Klumpp, W. Mittelstaedt, H.-D. Narres, T. Putz, J. van der Kruk, J. Vanderborght, and F. Wendland
Research at the Agrosphere Institute: From the Process Scale to the Catchment Scale
Vadose Zone J., August 11, 2009; 8(3): 664 - 669.
[Abstract] [Full Text] [PDF]


Home page
Vadose Zone JHome page
T. Schroder, M. Javaux, J. Vanderborght, B. Korfgen, and H. Vereecken
Effect of Local Soil Hydraulic Conductivity Drop Using a Three-Dimensional Root Water Uptake Model
Vadose Zone J., August 13, 2008; 7(3): 1089 - 1098.
[Abstract] [Full Text] [PDF]


Home page
Vadose Zone JHome page
J. Simunek and S. A. Bradford
Vadose Zone Modeling: Introduction and Importance
Vadose Zone J., May 27, 2008; 7(2): 581 - 586.
[Full Text] [PDF]


Home page
Vadose Zone JHome page
J. Simunek, M. Th. van Genuchten, and M. Sejna
Development and Applications of the HYDRUS and STANMOD Software Packages and Related Codes
Vadose Zone J., May 27, 2008; 7(2): 587 - 600.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow Figures Only
Right arrow Full Text (PDF) Free
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Similar articles in Web of Science
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Web of Science (4)
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Hardelauf, H.
Right arrow Articles by Vereecken, H.
Right arrow Search for Related Content
PubMed
Right arrow Articles by Hardelauf, H.
Right arrow Articles by Vereecken, H.
GeoRef
Right arrow GeoRef Citation
Agricola
Right arrow Articles by Hardelauf, H.
Right arrow Articles by Vereecken, H.
Related Collections
Right arrow Coupled Flow/Transport Models
Right arrow Vadose Zone Processes and Chemical Transport


HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
The SCI Journals Agronomy Journal Crop Science
Journal of Natural Resources
and Life Sciences Education
Soil Science Society of America Journal
Journal of Plant Registrations Journal of
Environmental Quality
The Plant Genome