Overview
What is flowCart?
What kind of discretization does it use?
How is it integrated into the Cart3D package?
What's clic?
Parallelization and Domain Decomposition
Multigrid
Further Documentation:
What is flowCart?
flowCart is a scalable, multilevel solver for the Euler equations governing the inviscid flow of a compressible fluid. It treats meshes from cubes as unstructured collections of Cartesian cells, and wherever possible it exploits the fact that the cells are Cartesian to reduce the operation count. Both parallelization and multigrid are completely transparent to the user and are turned on by simple command line arguments to encourage their use. The OpenMP and MPI versions of flowCart use the same command line arguments and scale similarly.
What kind of discretization does it use?
Spatial Discretization:
flowCart uses cell-centered, finite-volume, upwind differencing. Several flux functions are provided, and the solver is extensible, so you can add your favorite to the growing list of available flux functions. The linear reconstruction scheme makes it formally second-order accurate, and great pains have been taken to ensure that this accuracy is preserved through mesh refinement boundaries and in the arbitrarily shaped cells that appear against geometric boundaries.
The stencil for the gradient reconstruction is a central difference, and the flux is upwind. In the implementation, the gradient is computed using a least-squares reconstruction, and the least-squares problem is solved via the normal equations. At cut-cells, and at cells that neighbor refinement boundaries, the reconstruction is always performed using true cell and face centroids. This reconstruction is linearity preserving, even where the mesh is extremely non-uniform, which substantially improves accuracy at wall boundaries and through refinement interfaces.
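As a rough illustration of what "least-squares reconstruction via the normal equations" and "linearity preserving" mean, here is a minimal 2D sketch in Python. It is not flowCart's code: the function name `lsq_gradient`, the 2D setting, and the sample centroids are all assumptions made for the example. It fits a gradient to the differences between a cell and its neighbors, and shows that a linear field is recovered exactly even when the neighbor centroids are irregularly placed.

```python
def lsq_gradient(center, u_center, neighbors):
    """Estimate (du/dx, du/dy) at `center` from neighboring centroid values.

    neighbors: list of ((x, y), u) pairs, one per neighboring cell centroid.
    The 2x2 normal equations (A^T A) g = A^T b are solved in closed form.
    """
    xc, yc = center
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (x, y), u in neighbors:
        dx, dy, du = x - xc, y - yc, u - u_center
        a11 += dx * dx
        a12 += dx * dy
        a22 += dy * dy
        b1 += dx * du
        b2 += dy * du
    det = a11 * a22 - a12 * a12
    if det == 0.0:
        raise ValueError("neighbor centroids are collinear with the cell center")
    gx = (a22 * b1 - a12 * b2) / det
    gy = (a11 * b2 - a12 * b1) / det
    return gx, gy


def linear_field(x, y):
    # Hypothetical test field with known gradient (2, -3).
    return 1.0 + 2.0 * x - 3.0 * y


# Irregularly placed "true centroids", as one might find near a cut-cell.
center = (0.3, 0.4)
points = [(1.1, 0.35), (0.25, 0.9), (-0.2, 0.45), (0.32, 0.1), (0.7, 0.8)]
neighbors = [(p, linear_field(*p)) for p in points]
gx, gy = lsq_gradient(center, linear_field(*center), neighbors)
print(gx, gy)  # close to 2 and -3, up to round-off
```

The point of the sketch is that exactness for linear data depends on using the actual centroid locations in the fit, which is why the text stresses true cell and face centroids.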
Numerical accuracy assessments (AIAA 2002-0863, 2.4Mb, pdf format, and AIAA 2000-0808, 610Kb, ps.gz format) and validation cases demonstrate true second-order accuracy both formally and in practice. Results with this solver are comparable with those of body-fitted structured and unstructured solvers for comparable numbers of cells. Most users with experience in other packages notice that flowCart has substantially improved wave propagation and very low dissipation.
Temporal Discretization:
Advance to steady state is performed by an unstructured, nested multigrid procedure. The solver is not time-accurate, and no time-accurate mode is currently provided. A user-selectable Runge-Kutta scheme provides inner smoothing to drive the multigrid. A collection of Runge-Kutta schemes is provided for you to choose from (click here to see how), and you can substitute your favorite one in addition to those provided. Multigrid performance is competitive with results found in the literature, and multigrid is really easy to use; again, see AIAA 2000-0808 for details and examples. Of course, if you just want to drive your solutions with plain old Runge-Kutta, you're free to do so. In all cases, flowCart drives the cut-cells somewhat more aggressively than Tiger does, since it uses a monotonicity-preserving timestep in these cells rather than a simple area scaling. As a result, even single grid (no multigrid) runs converge somewhat faster.
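For readers unfamiliar with multistage Runge-Kutta smoothers, the following Python sketch shows the general shape of such a scheme on a scalar model problem du/dt = -R(u). The stage coefficients (1/4, 1/3, 1/2, 1) are a commonly used four-stage set chosen here purely as an assumption; the source does not say which schemes flowCart ships with, and `rk_multistage_step` is a hypothetical name.

```python
import math


def rk_multistage_step(u, residual, dt, alphas):
    """One multistage step for du/dt = -residual(u).

    Each stage restarts from the initial state u0:
        u_k = u0 - alpha_k * dt * residual(u_{k-1})
    """
    u0 = u
    for alpha in alphas:
        u = u0 - alpha * dt * residual(u)
    return u


def decay_residual(u):
    # Model problem: R(u) = u, so the exact solution is u0 * exp(-t).
    return u


alphas = (0.25, 1.0 / 3.0, 0.5, 1.0)  # assumed coefficients, for illustration
u_new = rk_multistage_step(1.0, decay_residual, 0.1, alphas)
print(u_new, math.exp(-0.1))
```

Swapping in a different scheme only means passing a different `alphas` tuple, which is the sense in which a smoother of this kind is "user selectable".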
How is it integrated into the Cart3D package?
flowCart is tightly integrated into Cart3D, and pre/post-processing operations have been substantially reduced. Since it was written specifically as a module for Cart3D, file translation steps have been completely eliminated. The cubes mesh generator puts out Mesh.c3d files, and flowCart can use these directly, without translation. flowCart can be asked to extract Cp's and other flow quantities directly, both on the body's surface and on cutting planes through the domain (-T and -clic options).
It also provides both residual and lift/drag information for convergence monitoring (the -his option).
After a run, you can extract a variety of information using clic. Load and moment information is conservatively transferred back to the input surface triangulation, so you can postprocess on the surface without having to load the entire discrete solution.
What's clic?
clic is a component-based force and moment module developed as a post-processor and data-extractor for Cart3D. It's an extremely flexible and powerful package. Using clic, you can extract Cp cuts on any component or group of components in your configuration, compute LDM (lift, drag, moment) for components, component groups, or configurations, and extract the usual bevy of point-moments, line-moments ("hinge moments"), etc. If you want to see some of what it was designed to do, take a look at the original ISO software project plan (here, 64kb, acrobat format). clic can be run as a postprocessor or called directly through an API. The clic home page will get you started with the package.
Parallelization and Domain Decomposition
flowCart uses a domain decomposition approach to parallelization. As a result, scalability on large numbers of processors is quite good: speedups in excess of 60 on 64 processors are typical. In domain decomposition approaches, the computation proceeds on subdomains that are farmed out to the machine's processors. After a certain amount of work is done, each subdomain passes information to its neighboring subdomains in an explicitly coded communication step.
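The compute-then-communicate pattern described above can be sketched on a toy 1D problem. The Python below is only an illustration under stated assumptions: a simple Jacobi averaging sweep stands in for the real solver work, the subdomains are plain lists processed one after another rather than on separate processors, and the names `smooth_serial` and `smooth_decomposed` are hypothetical. It shows that exchanging one layer of neighbor values before each sweep reproduces the undecomposed result.

```python
def smooth_serial(u, sweeps):
    """Jacobi averaging on the whole array; the two end values are held fixed."""
    u = list(u)
    for _ in range(sweeps):
        new = u[:]
        for i in range(1, len(u) - 1):
            new[i] = 0.5 * (u[i - 1] + u[i + 1])
        u = new
    return u


def smooth_decomposed(u, sweeps, nparts):
    """Same smoothing, but on contiguous subdomains with an explicit exchange."""
    n = len(u)
    parts = [list(u[p * n // nparts:(p + 1) * n // nparts]) for p in range(nparts)]
    for _ in range(sweeps):
        # Communication step: every subdomain receives one value from each neighbor.
        ghosts = []
        for p in range(nparts):
            left = [parts[p - 1][-1]] if p > 0 else []
            right = [parts[p + 1][0]] if p < nparts - 1 else []
            ghosts.append((left, right))
        # Compute step: each subdomain now updates using only its own data plus ghosts.
        for p in range(nparts):
            left, right = ghosts[p]
            padded = left + parts[p] + right
            new = parts[p][:]
            for i in range(len(parts[p])):
                j = i + len(left)
                if 0 < j < len(padded) - 1:  # skip the fixed global end points
                    new[i] = 0.5 * (padded[j - 1] + padded[j + 1])
            parts[p] = new
    return [v for part in parts for v in part]


u_init = [float(i * i) for i in range(16)]
serial = smooth_serial(u_init, 5)
decomposed = smooth_decomposed(u_init, 5, 4)
print(serial == decomposed)
```

In a real parallel code the compute step runs concurrently on all subdomains, and the quoted speedup of 60 on 64 processors corresponds to a parallel efficiency of about 94%.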
OpenMP
flowCart was developed using the OpenMP standard. However, it is set up like a distributed memory code, in that it uses domain decomposition and memory locality is carefully controlled. Thus, although it uses explicit message passing, it is formally a shared memory model. For machines with physically distributed memory, OpenMP requires a distributed shared memory layer (DVSM) to be present on the machine so that physically distant memory is addressable by all processors. This paradigm works well on a variety of existing machines, and more machines are being developed with this programming model in mind.
MPI
mpi_flowCart first appeared as part of the v1.2 release of Cart3D. This required adding an MPI backend to most of the communication and partitioning routines. As of release v1.3, this version is fully supported. It runs on shared and distributed platforms, using either the native MPI or MPICH where no native MPI is available (distributed clusters, for example). Scalability of the MPI code is almost as good as that of the OpenMP version (see a comparison here on an SGI Origin 3800).
Domain Decomposition
Our domain decomposition approach was designed to be transparent to the user. It relies on a space-filling-curve based reordering of the cubes mesh, which is then partitioned on the fly when you start up the solver. All you need to know is that if you want to run in parallel, you just have to (1) reorder the mesh with the reorder utility, and (2) set your OMP_NUM_THREADS environment variable to the desired number of processors; flowCart does everything else. Here's how you'd do that for a 16 CPU case using OpenMP (in csh):
| 1.% reorder | # reorder the mesh (using default file names) |
| 2.% setenv OMP_NUM_THREADS 16 | # choose number of processors |
| 3.% flowCart | # start the run |
With MPI, the procedure is basically the same, but you use the mpirun utility to start the solver:
| 1.% reorder | # reorder the mesh (using default file names) |
| 2.% mpirun -np 16 mpi_flowCart | # use mpirun to set num. of processors and run |
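To give a feel for why reordering along a space-filling curve makes on-the-fly partitioning easy, here is a small Python sketch. It assumes a Morton (Z-order) curve on a uniform block of cells, which is an illustrative choice only; the text above does not say which curve the reorder utility uses, and `morton_key` and `partition` are hypothetical names. Once cells are sorted along the curve, a partition is just a contiguous slice of the sorted list.

```python
def morton_key(i, j, k, bits=10):
    """Interleave the bits of integer cell indices (i, j, k) into one key."""
    key = 0
    for b in range(bits):
        key |= ((i >> b) & 1) << (3 * b)
        key |= ((j >> b) & 1) << (3 * b + 1)
        key |= ((k >> b) & 1) << (3 * b + 2)
    return key


def partition(cells, nparts):
    """Sort cells along the curve, then cut the list into contiguous chunks."""
    ordered = sorted(cells, key=lambda c: morton_key(*c))
    n = len(ordered)
    return [ordered[p * n // nparts:(p + 1) * n // nparts] for p in range(nparts)]


# A 4 x 4 x 4 block of cells split over 8 "processors".
cells = [(i, j, k) for i in range(4) for j in range(4) for k in range(4)]
parts = partition(cells, 8)
for p, part in enumerate(parts):
    print(p, len(part), part[0], part[-1])
```

Because cells that are close along the curve are also close in space, each slice is a compact region (here each one is a 2 x 2 x 2 block), and the number of partitions can be chosen at startup without re-sorting the mesh.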
Multigrid
flowCart uses Full Approximation Storage (FAS) multigrid for convergence acceleration. You invoke it via the -mg %d command line option (e.g., for 4 levels of multigrid you'd type "% flowCart -mg 4"). By default it's set up to use a full multigrid startup procedure. This means that computation starts on the coarsest mesh; once that converges a bit, the next finer mesh gets into the act and the solver does two-level multigrid. After a while the evolving solution gets transferred to the next finer mesh, and it does three-level multigrid. This continues until we've reached the finest mesh, and then the entire mesh hierarchy is used to converge the solution on the finest mesh. When you run in parallel, each processor gets its own hierarchy of fine to coarse meshes to work with; here is what it would look like for 2 processors:
[Figure: multigrid mesh hierarchy partitioned across 2 processors]
Multigrid obviously needs a sequence of coarse meshes to get going. These meshes are derived from the fine grid that you make with cubes. We use a special coarsening procedure which attempts to coarsen with an 8:1 coarsening ratio. Any time a parent cell finds all its children at the same level of refinement, that parent cell gets inserted into the coarse mesh. Of course, there are situations where one or more children have further subdivision, and coarsening gets "suspended" in that cell. As a result, coarsening ratios with this algorithm approach (but do not exceed) 8:1. In practice, finer meshes achieve coarsening ratios above 7:1, which is sufficient to ensure good smoothing. W-cycle multigrid requires at least 4:1 to maintain its (theoretical) constant time bound. Meshes at any level in the hierarchy fully cover the computational domain, and may contain cells at many levels of refinement.
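The coarsening rule described above can be sketched in a few lines of Python. This is an illustration under assumptions, not mgPrep's implementation: leaf cells are represented as hypothetical `(level, i, j, k)` tuples in an octree, cut-cells are ignored, and `coarsen` is an invented name. A parent is inserted only when all eight of its children are leaves; otherwise coarsening is suspended and the existing children are carried over unchanged.

```python
from collections import defaultdict


def coarsen(leaves):
    """Return the next coarser mesh for a set of (level, i, j, k) leaf cells."""
    coarse = set()
    siblings = defaultdict(list)
    for cell in leaves:
        level, i, j, k = cell
        if level == 0:
            coarse.add(cell)  # the root cell cannot be coarsened further
        else:
            siblings[(level - 1, i // 2, j // 2, k // 2)].append(cell)
    for parent, kids in siblings.items():
        if len(kids) == 8:
            coarse.add(parent)   # all children are leaves: insert the parent
        else:
            coarse.update(kids)  # a sibling is refined further: suspended
    return coarse


# Uniform level-2 mesh: 4 x 4 x 4 = 64 cells.
uniform = {(2, i, j, k) for i in range(4) for j in range(4) for k in range(4)}

# Same mesh with one cell split into its 8 level-3 children (71 cells).
refined = (uniform - {(2, 0, 0, 0)}) | {
    (3, i, j, k) for i in range(2) for j in range(2) for k in range(2)
}

for name, mesh in (("uniform", uniform), ("refined", refined)):
    coarse = coarsen(mesh)
    print(name, len(mesh), len(coarse), len(mesh) / len(coarse))
```

The uniform case reaches the full 8:1 ratio, while the locally refined case falls short because coarsening is suspended around the refined cell. The coarse mesh in the second case still covers the whole domain and mixes cells from two refinement levels, as the text notes.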
The mgPrep utility takes a reordered fine mesh (output from reorder) and creates a sequence of coarse meshes directly from this mesh. Since the fine mesh has already been reordered, the coarse meshes will automatically be ready for domain decomposition, should you wish to run the final multigrid run in parallel. When a multigrid mesh is run in parallel, all meshes in the hierarchy get automatically partitioned using the exact same space-filling curves, so the coarse meshes in a given partition overlap maximally with the finer grids that they support.
Building coarse meshes with mgPrep is extremely easy. mgPrep takes in a reordered mesh (usually called Mesh.R.c3d) and outputs the input mesh together with the series of coarse meshes generated. The hierarchy of meshes is stored in a single file, usually called Mesh.mg.c3d. Here's how (starting from the Mesh.c3d output from cubes):
| 1.% reorder | # reorder the mesh (using default file names) |
| 2.% mgPrep -n 6 | # generate coarser meshes (orig + 5 coarser) |
| 3.% setenv MP_SET_NUMTHREADS 16 | # choose number of processors |
| 4.% flowCart -mg 6 | # run on 16 CPU's with 6 level multigrid |
Note: This example has some options that you could play with. For example, step 2 creates 5 levels of coarser meshes (with the original mesh this makes 6 meshes total). By default it creates Mesh.mg.c3d. flowCart's input control file now needs to point to this file as the input mesh. In step 4, the example uses all 6 meshes in the hierarchy, but you don't need to; you could use anywhere from 1 to 6 levels of mesh, so running on 3 meshes (for example) with "% flowCart -mg 3" would be an equally valid command line. Even running single mesh with "% flowCart" will work fine (-mg 1 is the default).
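The full multigrid startup described earlier in this section, applied to a given -mg level count, can be pictured with a short Python sketch. It only lists which meshes take part at each stage; the numbering (level 1 is the finest mesh, level N the coarsest) and the function name `fmg_schedule` are assumptions made for the example, and the sketch says nothing about how long flowCart stays in each stage.

```python
def fmg_schedule(nlevels):
    """List the active mesh levels at each stage of a full multigrid startup.

    Level 1 is the finest mesh and level `nlevels` the coarsest (assumed
    numbering). Stage 0 uses only the coarsest mesh; each later stage adds
    the next finer mesh until the whole hierarchy is in use.
    """
    return [list(range(nlevels - stage, nlevels + 1)) for stage in range(nlevels)]


for stage, levels in enumerate(fmg_schedule(4)):
    print("stage", stage, "active levels (finest first):", levels)
```

With a single level the schedule has one stage containing only the fine mesh, which matches the single-mesh run mentioned in the note.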
Further Documentation:
Technical information on all topics covered in this overview is addressed in
last update Jun. 2004, M. Aftosmis