|
|
||||||||
Agrosphere, ICG-IV, Forschungszentrum Jülich GmbH, D-52425 Jülich, Germany
* Corresponding author (j.vanderborght{at}fz-juelich.de).
All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher.
Received 24 October 2006.
ABSTRACT
Three-dimensional vadose zone models are used more and more for solving hydrological problems on a broad range of scales with large amount of nodes. Currently, the problems we can solve in reasonable computational time may have up to 5 · 106 nodes. However, distributed models may need up to 1010 nodes to properly predict flow and transport at the watershed scale. The speed and efficiency of current flow and transport models therefore need to be improved. The parallelization of the code is one possible way to decrease the computational time by distributing a complex large geometry problem over multiple processors working in parallel. This is the solution we implemented by developing PARSWMS, a parallelized version of SWMS_3D (Simunek et al., 1995). The objective of this technical note is to describe the PARSWMS model, test its reliability, and show its performance and efficiency compared with single processor runs.
Abbreviations: CDE, convectiondispersion equations MPI, Message-Passing Interface PETSc, Portable, Extensive Toolkit for Scientific Computation.
With the increasing computational power in the 1980s, numerical three-dimensional models for water flow and solute transport have been developed. The objective was to build codes capable of solving the nonlinear Richards' and convectiondispersion equations (CDE) for any domain geometry and for complex boundary and initial conditions. Since then, three-dimensional models have been used for a wide range of problems in terms of scale, boundary conditions, and processes involved. Nowadays, processes such as biodegradation, nonlinear sorption, geochemical reactions, and runoff, which render the numerical problem even more complex, have been added to the original Richards' equation and the CDE. From simple and local-scale problems (Kasteel et al., 2000), the current use of three-dimensional flow and transport models has been extended to much larger scales and to bigger number of nodes. Three-dimensional vadose zone models are being used more and more for solving hydrological problems at the catchment scale (Herbst et al., 2006). Yet, a key issue of the distributed modeling is the maximum grid resolution at which such models are valid. Theoretically, the grid resolution should be of the same order of magnitude as the Darcy (or elementary representative volume) scale, that is, between 102 and 100 m. This implies that the computational need for a large-scale, distributed flow and transport model will increase in the future. Harter and Hopmans (2004) estimated that 1010 nodes would be needed for an appropriate modeling of a 10-km2 watershed. Even for smaller-scale problems but with very fine spatial discretization, the number of nodes may quickly become limiting, considering that today, the maximum number of nodes we can handle with reasonable computation time is about 106 (Harter and Hopmans, 2004). Cirpka and Kitadinis (2002) used more than 2 x 106 nodes to simulate solute dispersion in a 5 by 10 by 4 m heterogeneous soil plot. Russo et al. (2006) used 3.84 · 106 cells for a plot of 24 by 16 by 20 m.
The major drawback of fine-grid models when large-scale problems have to be faced is the enormous demand for computational time and resources. The speed and efficiency of current models therefore have to be improved. The parallelization of the code is one possible way to decrease the computational time, by distributing a complex, large geometry problem over multiple processors working in parallel (Vereecken et al., 1996; Elmroth et al., 2001; Ashby and Falgout, 1996). This is the solution we implemented by developing PARSWMS, a parallelized version of SWMS_3D (Simunek et al., 1995). The objective of this technical note is to describe the PARSWMS model, test its reliability, and show its performance and efficiency as compared to single processor runs.
Model Description
SWMS_3D Main Features
SWMS_3D is a code for simulating water flow and solute transport in three-dimensional, variably saturated media (Simunek et al., 1995). This model was chosen for its flexibility; its performances are ensured by its broad use through the vadose zone hydrology community (e.g., Herbst and Diekkruger, 2002, 2003; Lewandowska et al., 2004; Javaux and Vanclooster, 2006; Javaux et al., 2006). Water flow is solved using Richards' equation, while transport is described with the CDE. Galerkin-type linear finite element schemes are used for the discretization of the flow and transport equations. The resulting equations are solved in an iterative fashion, by linearization and subsequent Gaussian elimination for banded matrices, a conjugate gradient method for symmetric matrices, or the ORTHOMIN method for asymmetric matrices (Mendoza et al., 1991). Additional measures are taken to improve solution efficiency in transient problems, including automatic time step adjustment and checking if the Courant and Peclet numbers do not exceed preset levels. The water content term is evaluated using the mass-conservative method proposed by Celia et al. (1990). For any additional information about the code, refer to Simunek et al. (1995).
Overview of the Parallel Implementation
The PARSWMS code is based on the SWMS_3D code. Most of the subroutines, functions, and variable names have been kept the same. Apart from the parallelization itself, the main change concerns the programming language, which was switched from FORTRAN 77 to C++ to take advantage of the dynamic allocation of all variables. The output file format was also slightly modified. Basically, the code gives one series of output files by processor. Therefore, a postprocessing code was written to merge all the data together after running. Note, however, that the input files are the same as for SMWS_3D. The parallelization essentially involves three different parts: communication between different processors, the distribution of the problem over the different processors, and the solution of the subproblems on individual processors. The following section presents the implementation of these three parts in the parallelized code.
Communication between Processors
The new code is based on Message-Passing Interface (MPI), a library specification for message passing between different processors. Message-Passing Interface provides source-code portability and allows efficient implementation across a range of computer architectures. It is free software for LINUX or UNIX operating systems, which needs to be installed on the working machine.
Grid Distribution
The distribution of the flow domain or partitioning of the problem between processors is automatically performed using algorithms of the ParMETIS library. With the aid of this library, even the partitioning process itself is parallelized. ParMETIS is a MPI-based parallel library that implements a variety of algorithms for partitioning unstructured graphs, meshes, and for computing fill-reducing orderings of sparse matrices (http://glaros.dtc.umn.edu/gkhome/views/metis/parmetis/). This library allows dynamic partitioning and an adaptive or irregular grid, and it can be run on heterogeneous clusters. Currently, the partitioning is static, in the sense that it is done once at the beginning of the run. The algorithm optimizes the partitioning between processors based on weights given by the user to each node or element by minimizing the surface of the subvolumes on each processor as well as the connection numbers between processors.
Matrix Solving
The solution of the system of linear equations is achieved for the nodes of the sub-volume allocated on each processor. The PETSc (Portable, Extensive Toolkit for Scientific Computation) library (http://www-unix.mcs.anl.gov/petsc/petsc-as/) was used because it allows the user to choose between a large range of solvers and preconditioners that can handle linear and nonlinear problems in parallel mode. Therefore, the solver implemented in PARSWMS is different than SWMS_3D and could easily be further optimized. Herbst et al. (unpublished data) investigated this problem by comparing the efficiency of several preconditioning approaches used in PARSWMS. For water-flow simulations, we used a Jacobi preconditioning method for a parallel Conjugate Gradient solver.
Methodology
To check the accuracy and stability of the new code, we first compared PARSWMS with SWMS_3D simulations for a few benchmark scenarios developed for testing one-dimensional flow and transport numerical codes (Vanderborght et al., 2005). Then, we evaluated the performance of the PARSWMS code in terms of computational time by running two computationally demanding problems for different numbers of processors on a massive parallel system.
Benchmarking Scenarios
The benchmarking scenarios (S1) considered simple one-dimensional flow processes for simple boundary conditions applied to uniform (loam) or two-layered soils (loam over sand). The problem was discretized into 100 elements of 1-cm thickness each along the z axis. We found that the SWMS_3D code compared favorably with the analytical solutions (results not shown) so that we focused on the comparison between SWMS_3D and PARSWMS. A summary of the scenarios we used is given in Table 1. More detailed information about these scenarios is given by Vanderborght et al. (2005).
|
|
r = 0.078,
s = 0.430,
m = 0.431,
= 0.036 cm1, n = 1.56, Ks = 10.4 cm h1, Km = 1.04, and l = 0.5, and linear scaling factors were used to account for the spatial variability of the soil hydraulic properties (Vogel et al., 1991).
|
Results and Discussion
One-Dimensional Benchmarking Scenarios
First results confirm that SWMS_3D and PARSWMS give convergent simulations for all cases, under several benchmarking scenarios for water. Visual inspection of the data revealed identical results. The computed coefficient of determination R2 between SWMS_3D and the parallelized version are 0.9999468, 0.9999994, 0.9999982, 0.9999913, and 0.9997436 for scenarios S1.1 to S1.5, respectively (Table 1). R2 is always larger than 0.999, which means that the new encoded PARSWMS routines and the new parallelized solver do not generate discrepancies between simulations.
In addition, the PARSWMS simulations for scenarios S2 and S3 (Table 1) compared well with the SWMS_3D results. Figure 2 shows a cross-section of the cylindrical lysimeter concentration distribution during the solute breakthrough, and again, both simulations yield almost identical results.
|
CPU Time Scaling Behavior
The speedup of using parallel processors on a problem compared with using only one serial processor is governed by Amdahl's law (Amdahl, 1967). This law states that if F is the fraction of a calculation that is sequential, and (1 F) is the fraction that is parallelized, the maximum speedup that can be achieved on np processors is
![]() |
In our case, since roughly 100% of the code is parallelized, speedup versus np should optimally be a 1:1 line (or 1 slope line when time gain is plotted in function of np). Figure 3 shows the time scaling for the scenarios S2 and S3 compared with Amdahl's law. We plotted the relative time (tnp/tnp=1) as a function of the number of processors np used for the run. The time scaling is almost linear and remarkably close to the optimum defined by Amdahl's law. There is a short deviation from the linear trend for large numbers of processors, which is likely due to the architecture of the supercomputer, which is composed of a series of nodes having 32 interconnected processors. The internode being lower than the intranode connection speed, doubling the number of processors beyond 25, apparently decreases the code performance.
|
Conclusions
We successfully developed a parallel code based on SWMS_3D for solving Richards' equation and CDE under variably saturated conditions. The PARSWMS code was developed so that it can be compiled and run on any network (or even a uniprocessor) of LINUX or UNIX workstations or on specialized parallel hardware as long as MPI, PETSc, and PARMETIS freeware is installed. A comprehensive benchmarking was performed to validate the new code. The performances were investigated on a massive parallel computer. The two case studies indicate that doubling the number of processors may lead to a decrease of the computational time up to 48%. However, this figure may be affected by the type of boundary conditions and the geometry of the problem. Thanks to this parallelized algorithm and with better accessibility to cluster calculation and supercomputers, we expect that problems with larger amount of nodes may be tackled in the near future.
Next steps include a coupling with a particle tracking code and with advanced root soil modeling approaches.
Appendix
The scale evolution of computational time performance was performed on the jump supercomputer at the Forschungszentrum Juelich IBM Regatta p690+. Main characteristic features of this machine include the following:
ACKNOWLEDGMENTS
We thank the John von Neumann, Institute for Computing (NIC), Jülich, for providing the opportunity to use their parallel computing resources.
REFERENCES
This article has been cited by other articles:
![]() |
H. Vereecken, P. Burauel, J. Groeneweg, E. Klumpp, W. Mittelstaedt, H.-D. Narres, T. Putz, J. van der Kruk, J. Vanderborght, and F. Wendland Research at the Agrosphere Institute: From the Process Scale to the Catchment Scale Vadose Zone J., August 11, 2009; 8(3): 664 - 669. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. Schroder, M. Javaux, J. Vanderborght, B. Korfgen, and H. Vereecken Effect of Local Soil Hydraulic Conductivity Drop Using a Three-Dimensional Root Water Uptake Model Vadose Zone J., August 13, 2008; 7(3): 1089 - 1098. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Simunek and S. A. Bradford Vadose Zone Modeling: Introduction and Importance Vadose Zone J., May 27, 2008; 7(2): 581 - 586. [Full Text] [PDF] |
||||
![]() |
J. Simunek, M. Th. van Genuchten, and M. Sejna Development and Applications of the HYDRUS and STANMOD Software Packages and Related Codes Vadose Zone J., May 27, 2008; 7(2): 587 - 600. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
| The SCI Journals | Agronomy Journal | Crop Science | |||
| Journal of Natural Resources and Life Sciences Education |
Soil Science Society of America Journal | ||||
| Journal of Plant Registrations | Journal of Environmental Quality |
The Plant Genome | |||