Evaluation of the solution - Acceleration of the space-time boundary element method using GPUs

After solving the system of equations (12) or (13) we know both the Dirichlet and Neumann datau and w. For unification of the two cases, we denoteu_h =g_h orw_h=h_h as well asu=g orw=h in the Dirichlet or Neumann boundary condition case, respectively.

To get the value of the solutionuin any pointxinside the region enclosed by the discretized boundary Γ_h and in any time t∈ ⟨0, T⟩, where t=t_l+ε=lh_t+ε withε∈ ⟨0, h_t), we need to evaluate the discretized representation formula (ignoring the zero initial potential term)

u(x, t) =V^˜︁(w_h)(x, t)−W(u_h)(x, t),

Modifying the formulas and integrating analytically over time, we get

V˜︁(wh)(x, t) =

The spatial integrals are again calculated using numerical quadrature, standard techniques suffice due to absence of singularities.

3 Analysis of the current code

The code developed within this thesis is a part of the BESTHEA library (Boundary Element Solver for The Heat EquAtion[17]), which aims at efficient parallel solution of problems described in the previous section.

The goal of this section is to give the reader insight into how the library currently func-tions, with emphasis on the parts of the code which are the target of GPU acceleration. The acceleration of the code itself will be discussed in the next sections.

The library is written inC++(specifically theC++17standard [6]) and is built using CMake [3]

and GNU Make. Intel Math Kernel Library has to be installed on the system, along with any MPI library.

3.1 Overview of the library

The whole library is located inside the besthea namespace, which consists of several other namespaces containing classes and structures enabling a user to find numerical solution to the heat equation efficiently and in parallel. This includes classes for managing meshes, matrix and vector types, assembler of the boundary element matrices, evaluator of the potentials and many other. A diagram of the main namespaces, classes and their inheritance hierarchy is shown in Figure 6.

A typical workflow used to find a numerical solution to the heat equation utilizing the BESTHEA library is following.

1. Create space-time mesh,

2. create and populate the boundary condition vector, 3. create and assemble necessary matrices,

4. solve the system,

5. evaluate the solution in points of space-time grid.

An example code demonstrating the numerical solution to a Dirichlet problem with zero initial condition using the library is shown in Listing 1, which we describe in greater detail in the following paragraphs.

The library uses two aliases for basic types –sc (for scalar) representing used floating point type, and lo (for local ordinal) used for indexing. In this thesis we always use sc=double and lo=long (64-bit integer on Unix-like 64-bit systems).

The space-time mesh represented by theuniform_spacetime_tensor_meshclass is a tensor product of a spatial surface mesh and a uniform discretization of the time interval (0, T). The surface mesh (triangular_surface_mesh) can be either created from a tetrahedral volume mesh, or loaded directly from a file as shown in the example on line 31. The spacetime tensor mesh is then created and refined, that is the elements are divided into smaller elements to obtain finer discretization.

The discretized boundary element spaces X_h^0,0 and X_h^0,1 are represented by (appropriately templated) classuniform_spacetime_be_space and can be created using the space-time mesh (lines 36 and 37). Using their L2_projectionmethod (as shown on line 41), one can project a user-defined boundary condition function (we use bc_dir_funcdefined on lines 8–20) onto the

Figure 6: Diagram of the main classes and namespaces in BESTHEA library

1 # i n c l u d e < string >

Listing 1: Solution of the Dirichlet problem using the BESTHEA library

appropriate discrete space, thus populating the boundary condition vector, which is an instance of the block_vector class.

We then create and assemble all the necessary matrices required for solving the problem (lines 44–57). The main boundary element matrices are all block lower triangular with block Toeplitz structure, thus are represented by the block_lower_triangular_toeplitz_matrix class. To get an assembled main boundary element matrix, we first create an empty matrix, create instance of an assembler classuniform_spacetime_be_assembler, and finally call theassemblemethod on the assembler with the matrix as a parameter to fill it with values. Creation of the mass matrix is simpler, we only need to instantiate the class uniform_spacetime_be_identityand call its method assemble.

Since K^⊤_h^s can be obtained from K_h by transposing its blocks, only K_h is actually needed.

Whenever we need to use K^⊤_h^s, we provide K_h and specify a special indicator marking that the blocks of the matrix should be transposed. Analogous approach can be used with the mass matrices Mh and M^⊤_h.

After assembling the right-hand side vector on lines 60–62 using theapply method on the matrices to perform matrix-vector multiplication, the system of equations is ready to be solved.

This is usually done by calling themkl_fgmres_solve method on the system matrix, as shown on line 68. This method uses FGMRES algorithm (flexible generalized minimal residual method [22]) implemented in the Intel Math Kernel Library (MKL).

After the system is solved, both the Dirichlet and Neumann data u and w are known and can be used to calculate values of the solution in points of space-time grid using the represen-tation formula. We load a volume mesh (see line 73) and create appropriate instances of the uniform_spacetime_be_evaluator class for evaluating the single and double layer potential (lines 77 and 82). Using their evaluate method the values of the potentials are calculated in nodes of the volume mesh in all timesteps, as shown on lines 78 and 83. Finally, the single and double layer potentials are combined to fill the block vector with values of the solution in the space-time grid on lines 86 and 87.

In document Acceleration of the space-time boundary element method using GPUs (Stránka 31-36)