Overview
What is flowCart?
What kind of discretization does it use?
How is it integrated into Cart3D?
What's clic?
Parallelization and Domain Decomposition
Multigrid
Further Documentation:
What is flowCart?
flowCart is a scalable, multilevel solver for the Euler equations governing the
inviscid flow of a compressible fluid. The solver is intended to (ultimately)
replace the Tiger code, which most existing Cart3D users are familiar
with. Meshes from cubes are treated as unstructured collections of Cartesian
cells, and the solver exploits the fact that the cells are Cartesian wherever
possible to reduce the operation count. Both parallelization and multigrid
are completely transparent to the user and are turned on by simple
command-line arguments to encourage their use.
What kind of discretization does it use?
Spatial Discretization:
flowCart uses cell-centered, finite-volume, upwind differencing. Several
flux functions are provided, and the list is extensible, so you can add your
favorite to it. The linear reconstruction scheme makes the solver formally
second-order accurate. In van Leer's notation,
this is a "kappa = ?" scheme, implying that the stencil for the gradient
reconstruction is central difference. In the implementation, this gradient
is computed using a least-squares reconstruction, and the least-squares
problem is solved via the normal equations. At cut cells, and at cells
that neighbor refinement boundaries, the reconstruction is always performed
using true cell and face centroids. This results in substantially improved
accuracy at wall boundaries and through refinement boundaries. Numerical
accuracy assessments (AIAA 2000-0808; 610Kb, ps.gz format) demonstrate an
order of accuracy of 1.88, and validation cases have demonstrated that
discrete solutions from this solver are comparable with those of body-fitted
structured and unstructured solvers for comparable numbers of cells. If
you're an existing Cart3D user, you can expect noticeably lower dissipation
at wall boundaries and substantially improved shock resolution and
propagation.
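As a rough illustration of a least-squares gradient solved through the normal equations, here is a minimal two-dimensional Python sketch. It is not flowCart's code: the function names `lsq_gradient` and `reconstruct`, the unweighted fit, and the restriction to 2D are all assumptions made for brevity. The point it shows is that using true centroid offsets keeps the fit exact for linear data even when the neighbor stencil is irregular, as it is at cut cells and refinement boundaries.

```python
def lsq_gradient(xc, uc, neighbors):
    """Unweighted least-squares gradient at one cell (hypothetical helper).

    xc        -- (x, y) centroid of the cell
    uc        -- cell value at that centroid
    neighbors -- list of ((x, y), u) pairs for the neighboring cells

    Each neighbor contributes one row (dx, dy) of A and one entry du of b.
    The 2x2 normal equations (A^T A) g = A^T b are solved in closed form.
    """
    sxx = sxy = syy = bx = by = 0.0
    for (x, y), u in neighbors:
        dx, dy, du = x - xc[0], y - xc[1], u - uc
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
        bx += dx * du
        by += dy * du
    det = sxx * syy - sxy * sxy
    # Collinear (or too few) neighbors make A^T A singular.
    if abs(det) <= 1e-14 * sxx * syy:
        raise ValueError("degenerate stencil: neighbors are collinear")
    gx = (syy * bx - sxy * by) / det
    gy = (sxx * by - sxy * bx) / det
    return gx, gy


def reconstruct(xc, uc, grad, xf):
    """Linear reconstruction of the cell value at a face centroid xf."""
    return uc + grad[0] * (xf[0] - xc[0]) + grad[1] * (xf[1] - xc[1])
```

For a linear field sampled at irregularly placed neighbor centroids, `lsq_gradient` recovers the gradient to round-off, so `reconstruct` returns the exact face value.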
Temporal Discretization:
Advance to steady state is performed by an unstructured, nested multigrid
procedure. The solver is not currently time-accurate, and no time-accurate
mode is provided yet. A user-selectable Runge-Kutta scheme provides inner
smoothing to drive the multigrid. A collection of Runge-Kutta schemes is
provided for you to choose from (click here to see how), and you can
substitute your favorite one in addition to those provided.
Multigrid performance is competitive with results reported in the literature,
and multigrid is easy to use; again, see AIAA 2000-0808 for details and
examples. Of course, if you just want to
drive your solutions with plain old Runge-Kutta, you're free to do so.
In all cases, flowCart drives the cut cells somewhat more aggressively
than Tiger does, since it uses a monotonicity-preserving timestep in these
cells rather than a simple area scaling. As a result, even single-grid
(no multigrid) runs converge somewhat faster.
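To make the idea of a multistage Runge-Kutta smoother concrete, here is a minimal Python sketch of one pseudo-time step with a per-cell (local) timestep. Everything in it is illustrative: the function name `multistage_rk`, the model residual used in the usage note, and the five stage coefficients (a commonly quoted set from the multistage literature) are assumptions, not necessarily any of the schemes shipped with flowCart.

```python
def multistage_rk(u, residual, dt, alphas=(0.25, 1.0 / 6.0, 0.375, 0.5, 1.0)):
    """One multistage Runge-Kutta step toward steady state for du/dt = -R(u).

    u        -- list of cell values at the start of the step
    residual -- callable returning the list R(u), one entry per cell
    dt       -- list of local (per-cell) timesteps
    alphas   -- stage coefficients (assumed values, see lead-in)

    Every stage restarts from the initial state u0:
        u_k = u0 - alpha_k * dt * R(u_{k-1})
    """
    u0 = list(u)
    uk = list(u)
    for alpha in alphas:
        r = residual(uk)
        uk = [u0_i - alpha * dt_i * r_i for u0_i, dt_i, r_i in zip(u0, dt, r)]
    return uk
```

A quick way to try it is a decoupled linear model, R(u)_i = lam_i * (u_i - target_i), with dt_i = 1 / lam_i. Repeated calls then drive every cell toward its target at the same rate regardless of how stiff that cell is, which is the purpose of a local timestep.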
How is it integrated into Cart3D?
flowCart is tightly integrated into Cart3D. Pre- and post-processing
operations have been substantially reduced. Since it was written specifically
as a module for Cart3D, file translation steps have been completely
eliminated. The cubes mesh generator writes Mesh.c3d files, and these can be
used directly by flowCart, without translation. flowCart can be asked to
extract Cp's and other flow quantities, both on the body's surface and on
cutting planes through the domain, directly (the -T and -clic options). It
also provides both residual and lift/drag information for convergence
monitoring (the -his option). After a run, you can extract a variety of
information using clic. Load and moment information is conservatively
transferred back to the input surface triangulation, so that you can
postprocess on the surface without having to load the entire discrete
solution.
What's clic?
clic is a component-based force and moment module developed as a
post-processor and data extractor for Cart3D. It's an extremely flexible and
powerful package. Using clic, you can extract Cp cuts on any component or
group of components in your configuration, compute LDM (lift, drag, moment)
for components, component groups, or configurations, and extract the usual
bevy of point moments, line moments ("hinge moments"), etc. If you want
to see some of what it was designed to do, take a look at the original
ISO software project plan (here, 64kb acrobat
format). clic can be run as a postprocessor or called directly through
an API. The clic home page will get you started
with the package.
Parallelization and Domain Decomposition
flowCart uses a domain decomposition approach to parallelization. As a
result, scalability on large numbers of processors is quite good. Speedups in
excess of 56 on 64 processors are typical. In domain decomposition
approaches, the computation proceeds on subdomains that are farmed off to the
machine's processors. After a certain amount of work is done, each subdomain
passes information to its neighboring subdomains in an explicitly coded
communication step.
OpenMP
In its current form, flowCart communicates using the OpenMP standard.
Thus, although it uses explicit message passing, it is formally a
shared-memory model. For machines with physically distributed memory, OpenMP
requires a distributed shared memory layer (DVSM) to be present on the
machine so that physically distant memory is addressable by all processors.
This paradigm works well on a variety of existing machines, and more machines
are being developed with this programming model in mind. We currently have an
MPI port in progress, but it is not the highest priority. In addition, we
are working with KAI to ensure compatibility
with the DVSM that they are developing for Beowulf-type machines. We're
not tied to KAI, nor are we promoting their products. If you are developing
a DVSM and want an ANSI-standard C application to test it on, contact
us.
Domain Decomposition
Our domain decomposition approach was designed to be transparent to the
user. It relies on a space-filling-curve-based reordering of the cubes mesh,
which is then partitioned on the fly when you start up the solver. All you
need to know is that if you want to run in parallel you just have to
(1) reorder the mesh with the reorder utility, and (2) set your
MP_SET_NUMTHREADS environment variable to the desired number of processors.
flowCart does everything else.
Here's how you'd do that for a 16-CPU case (in csh):
| 1. % reorder | # reorder the mesh (using default file names) |
| 2. % setenv MP_SET_NUMTHREADS 16 | # choose number of processors |
| 3. % flowCart | # start the run |
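The reorder-then-partition idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a Morton (Z-order) curve, whereas the text above does not say which space-filling curve flowCart uses, and the names `morton_key`, `sfc_reorder`, and `partition` are hypothetical and unrelated to the actual reorder utility. Cells are represented by integer (i, j, k) coordinates.

```python
def morton_key(i, j, k, bits=10):
    """Interleave the bits of (i, j, k) into a single Morton (Z-order) key."""
    key = 0
    for b in range(bits):
        key |= ((i >> b) & 1) << (3 * b)
        key |= ((j >> b) & 1) << (3 * b + 1)
        key |= ((k >> b) & 1) << (3 * b + 2)
    return key


def sfc_reorder(cells, bits=10):
    """Sort cells along the curve so nearby cells end up close in the list."""
    return sorted(cells, key=lambda c: morton_key(c[0], c[1], c[2], bits))


def partition(ordered_cells, nparts):
    """Cut the ordered list into nparts contiguous, nearly equal segments."""
    n = len(ordered_cells)
    bounds = [(p * n) // nparts for p in range(nparts + 1)]
    return [ordered_cells[bounds[p]:bounds[p + 1]] for p in range(nparts)]
```

Because the curve preserves locality, each contiguous segment of the ordered list is a spatially compact subdomain. The partitioning step itself is just an array split, which is why it can be done on the fly for any processor count.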
Multigrid
flowCart uses Full Approximation Storage (FAS) multigrid for convergence
acceleration. You invoke it via the -mg %d command-line option (e.g., for 4
levels of multigrid you'd type "% flowCart -mg 4"). By default it's set up to
use a full multigrid startup procedure. This means that computation starts
on the coarsest mesh; once that converges a bit, the next finer mesh gets
into the act and the solver does two-level multigrid. After a while the
evolving solution gets transferred to the next finer mesh, and the solver
does three-level multigrid. This continues until we've reached the finest
mesh, and then the entire mesh hierarchy is used to converge the solution on
the finest mesh. When you run in parallel, each processor gets its own
hierarchy of fine-to-coarse meshes to work with. Here is what it would look
like for 2 processors:
[Figure: fine-to-coarse mesh hierarchy on 2 processors]
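The full multigrid startup described above amounts to a schedule of which meshes take part at each stage. A minimal Python sketch of that schedule follows. The function name `fmg_startup` and the level numbering (0 is the finest mesh) are assumptions made for illustration, and the sketch says nothing about how long flowCart spends on each stage.

```python
def fmg_startup(n_levels):
    """List the active mesh levels for each stage of a full multigrid startup.

    Level 0 is the finest mesh and level n_levels - 1 the coarsest
    (an assumed numbering). Stage 1 runs on the coarsest mesh alone;
    each later stage adds the next finer mesh, until the last stage
    cycles over the whole hierarchy.
    """
    coarsest = n_levels - 1
    return [list(range(finest_active, n_levels))
            for finest_active in range(coarsest, -1, -1)]


if __name__ == "__main__":
    for stage, levels in enumerate(fmg_startup(4), start=1):
        print("stage", stage, "cycles over levels", levels)
```

With `-mg 1` in this picture there is a single stage containing only the fine mesh, which corresponds to a plain single-grid run.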
Multigrid obviously needs a sequence of coarse meshes to get going. These
meshes are derived from the fine grid that you make with cubes. We use a
special coarsening procedure that attempts to coarsen with an 8:1 coarsening
ratio. Any time a parent cell finds all its children at the same level of
refinement, that parent cell gets inserted into the coarse mesh. Of course,
there are situations where one or more children have further subdivision, and
coarsening gets "suspended" in that cell. As a result, coarsening ratios with
this algorithm approach (but do not exceed) 8:1. In practice, finer meshes
achieve coarsening ratios above 7:1, which is sufficient to ensure good
smoothing. W-cycle multigrid requires at least 4:1 to maintain its
(theoretical) constant time bound.
Meshes at any level in the hierarchy fully cover the computational domain
and may contain cells at many levels of refinement.
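The coarsening rule above can be sketched on a toy octree in Python. This is a simplified illustration, not the mgPrep algorithm: cells are (level, i, j, k) tuples, the mesh is assumed to be a complete octree with no cut cells, and the names `coarsen` and `coarsening_ratio` are hypothetical.

```python
from collections import defaultdict


def coarsen(leaves):
    """Apply one coarsening pass to a list of leaf cells (level, i, j, k).

    A parent replaces its children only when all 8 of them are leaves,
    i.e. all at the same refinement level. Otherwise coarsening is
    "suspended" there and the existing leaves are kept unchanged.
    """
    coarse = []
    by_parent = defaultdict(list)
    for cell in leaves:
        level, i, j, k = cell
        if level == 0:
            coarse.append(cell)  # root cells have no parent
        else:
            by_parent[(level - 1, i // 2, j // 2, k // 2)].append(cell)
    for parent, children in by_parent.items():
        if len(children) == 8:
            coarse.append(parent)    # all siblings present: insert the parent
        else:
            coarse.extend(children)  # a sibling is subdivided: suspend
    return coarse


def coarsening_ratio(fine, coarse):
    """Fine-to-coarse cell count ratio; at most 8 for an octree."""
    return len(fine) / len(coarse)
```

On a uniformly refined mesh the ratio is exactly 8:1. Wherever refinement levels change, some parents are suspended and the ratio drops below 8, and the coarse mesh ends up holding cells from more than one level while still covering the same domain.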
The mgPrep utility takes a reordered fine mesh (output from reorder)
and creates a sequence of coarse meshes directly from this mesh. Since
the fine mesh has already been reordered, the coarse meshes will
automatically be ready for domain decomposition, should you wish to run
the final multigrid run in parallel. When a multigrid mesh is run in parallel,
all meshes in the hierarchy are automatically partitioned using the exact
same space-filling curves, so the coarse meshes in a given partition overlap
maximally with the finer grids that they support.
Building coarse meshes with mgPrep is extremely easy. mgPrep takes in a
reordered mesh (usually called Mesh.R.c3d) and outputs the input mesh and the
series of coarse meshes generated. The hierarchy of meshes is stored in a
single file, usually called Mesh.mg.c3d.
Here's how (starting from Mesh.c3d output from cubes):
| 1. % reorder | # reorder the mesh (using default file names) |
| 2. % mgPrep -n 6 | # generate coarser meshes (orig + 5 coarser) |
| 3. % setenv MP_SET_NUMTHREADS 16 | # choose number of processors |
| 4. % flowCart -mg 6 | # run on 16 CPUs with 6-level multigrid |
Note: This example has some options that you could play with. For example,
step 2 creates 5 levels of coarser meshes (with the original mesh, this makes
6 meshes total). By default, mgPrep writes Mesh.mg.c3d. flowCart's input
control file now needs to point to this file as the input mesh. In
step 4, the example uses all 6 meshes in the hierarchy, but you don't need
to; you could use anywhere from 1 to 6 levels of mesh, so running on 3
meshes (for example) with "% flowCart -mg 3" would
be an equally valid command line. Even running on a single mesh with
"% flowCart" will work fine (in this case, -mg 1 is the default).
Further Documentation:
Technical information on all topics covered in this overview is given in
"A parallel multilevel method for adaptively refined Cartesian grids with
embedded boundaries", AIAA Paper 2000-0808, Jan 2000.
last update 21 Jun 2000, M. Aftosmis