On improving accuracy of the error estimates in the conjugate gradient method
Petr Tichý
Faculty of Mathematics and Physics, Charles University
based on joint work with Gérard Meurant and Zdeněk Strakoš
PANM 19
June 24-29, 2018, Hejnice, Czech Republic
The conjugate gradient method

A is symmetric and positive definite, Ax = b.

input A, b, x0
p0 = r0 = b − A x0
for k = 1, 2, ... do
    α_{k−1} = (r_{k−1}^T r_{k−1}) / (p_{k−1}^T A p_{k−1})
    x_k = x_{k−1} + α_{k−1} p_{k−1}
    r_k = r_{k−1} − α_{k−1} A p_{k−1}
    β_k = (r_k^T r_k) / (r_{k−1}^T r_{k−1})
    p_k = r_k + β_k p_{k−1}
end for

In exact arithmetic:
orthogonality: r_i ⊥ r_j, p_i ⊥ A p_j (i ≠ j),
optimality of x_k: ‖x − x_k‖_A = min_{y ∈ x_0 + K_k} ‖x − y‖_A.
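The iteration above can be sketched in a few lines; this is a minimal NumPy version (function and variable names are mine) that also returns the coefficients α_k, β_k, since the error estimates discussed later are built from them.

```python
import numpy as np

def cg(A, b, x0, tol=1e-10, maxit=1000):
    # Conjugate gradients for SPD A, following the recursion above.
    # Returns the iterate and the coefficient sequences alpha_k, beta_k.
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    alphas, betas = [], []
    for _ in range(maxit):
        rr = r @ r
        if np.sqrt(rr) < tol:
            break
        Ap = A @ p
        alpha = rr / (p @ Ap)      # alpha_{k-1}
        x = x + alpha * p          # x_k
        r = r - alpha * Ap         # r_k
        beta = (r @ r) / rr        # beta_k
        p = r + beta * p           # p_k
        alphas.append(alpha)
        betas.append(beta)
    return x, np.array(alphas), np.array(betas)
```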
Estimating the A-norm of the error

A brief history

‖x − x_k‖²_A ... measure of the "goodness" of x_k
    [Hestenes, Stiefel 1952]
Gauss quadrature error bounds (Lanczos) → GQL
    [Dahlquist, Golub, Nash 1978], [Golub, Meurant 1994]
Estimating errors in CG → CGQL → CGQ
    [Golub, Strakoš 1994], [Golub, Meurant 1997], [Meurant, T. 2013]
Why it works in finite precision arithmetic
    [Golub, Strakoš 1994], [Strakoš, T. 2002, 2005].
Quadrature bounds

Gauss and Gauss–Radau quadrature bounds

Given μ ≤ λ_min, it holds that

    α_k ‖r_k‖² < ‖x − x_k‖²_A < α_k^{(μ)} ‖r_k‖²,

where

    α_{k+1}^{(μ)} = (α_k^{(μ)} − α_k) / ( μ (α_k^{(μ)} − α_k) + β_{k+1} ),    α_0^{(μ)} = 1/μ.

Practically relevant questions: How to get μ? Quality of the bound? Numerical behavior?
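A minimal sketch of tracking both bounds alongside CG, with the scalar recurrence for α^{(μ)} as displayed above (variable names are mine; μ ≤ λ_min is assumed, as in the statement).

```python
import numpy as np

def cg_with_gauss_bounds(A, b, x0, mu, nsteps):
    # For each iterate x_k, record the Gauss (lower) and Gauss-Radau
    # (upper) quadrature bounds on ||x - x_k||_A^2:
    #   alpha_k ||r_k||^2  and  alpha_k^(mu) ||r_k||^2,
    # with alpha^(mu) updated by the scalar recurrence above.
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    a_mu = 1.0 / mu                         # alpha_0^(mu) = 1/mu
    history = []                            # (x_k, lower, upper)
    for _ in range(nsteps):
        rr = r @ r
        Ap = A @ p
        alpha = rr / (p @ Ap)
        history.append((x.copy(), alpha * rr, a_mu * rr))
        x = x + alpha * p
        r = r - alpha * Ap
        beta = (r @ r) / rr
        p = r + beta * p
        a_mu = (a_mu - alpha) / (mu * (a_mu - alpha) + beta)
    return history
```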
Upper bound in exact arithmetic

Gauss–Radau bound, bcsstk01 matrix, n = 48,

    μ = λ_min / (1 + 10^{−m}),    m = 2, ..., 14.

[Figure: A-norm of the error and the Gauss–Radau bounds over the iterations.]
Upper bound in finite precision arithmetic

Gauss–Radau bound, bcsstk01 matrix, n = 48,

    μ = λ_min / (1 + 10^{−m}),    m = 2, ..., 14.

[Figure: A-norm of the error and the Gauss–Radau bounds over the iterations.]
Upper bound in finite precision arithmetic

μ > λ_min, bcsstk01 matrix, n = 48,

    μ = λ_min / (1 − 10^{−m}),    m = 2 : 2 : 14;    here α_k^{(μ)} < 0 can occur.

[Figure: A-norm of the error and the Gauss–Radau bounds over the iterations.]
An upper bound on the upper bound

An upper bound on the Gauss–Radau bound, μ ≤ λ_min [Meurant, T. 2018?]:

    ‖x − x_k‖²_A < α_k^{(μ)} ‖r_k‖² < (‖r_k‖²/μ) (‖r_k‖²/‖p_k‖²).

[Figure: A-norm of the error, Gauss–Radau bound, and the new bound over the iterations.]
The new bound

[Meurant, T. 2018?]

    ‖x − x_k‖²_A < α_k^{(μ)} ‖r_k‖² < (‖r_k‖²/μ) (‖r_k‖²/‖p_k‖²).

Having μ, we can compute it almost for free.
Monotonically decreasing.
Not sensitive to the choice of μ.
As good as Gauss–Radau in many cases.
It can be used even if μ > λ_min (heuristics).
How to approximate μ?
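The "almost for free" claim can be made concrete: the new bound ‖r_k‖⁴/(μ‖p_k‖²) uses only quantities already present in the iteration. A minimal sketch (my naming), assuming a value μ ≤ λ_min is available:

```python
import numpy as np

def cg_cheap_upper_bound(A, b, x0, mu, nsteps):
    # For each iterate x_k, record the new upper bound
    #   ||r_k||^4 / (mu ||p_k||^2)  on  ||x - x_k||_A^2,
    # computed from the residual and direction vectors at hand.
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    history = []                             # (x_k, cheap upper bound)
    for _ in range(nsteps):
        rr = r @ r
        history.append((x.copy(), rr * rr / (mu * (p @ p))))
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        p = r + ((r @ r) / rr) * p
    return history
```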
The conjugate gradient method

Recall the CG iteration above for symmetric positive definite A. Its coefficients define the upper bidiagonal matrix U_k with diagonal entries

    1/√α_0, ..., 1/√α_{k−1}

and superdiagonal entries

    √(β_1/α_0), ..., √(β_{k−1}/α_{k−2}),

and T_k = U_k^T U_k is the Lanczos (Jacobi) matrix.

Approximation of λ_min and λ_max in CG: the extreme eigenvalues of A are approximated by the extreme Ritz values, i.e., the extreme eigenvalues of T_k. In particular,

    T_k = U_k^T U_k  →  λ_min(T_k) = 1/‖U_k^{−1}‖².

How to approximate ‖U_k^{−1}‖?
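The connection can be checked directly: assemble U_k from the CG coefficients as described above, form T_k = U_k^T U_k, and compare λ_min(T_k) with 1/‖U_k^{−1}‖². A minimal sketch (function names are mine):

```python
import numpy as np

def cg_coefficients(A, b, x0, k):
    # k CG steps; alphas[i] = alpha_i, betas[i] = beta_{i+1}.
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    alphas, betas = [], []
    for _ in range(k):
        rr = r @ r
        Ap = A @ p
        a = rr / (p @ Ap)
        x = x + a * p
        r = r - a * Ap
        be = (r @ r) / rr
        p = r + be * p
        alphas.append(a)
        betas.append(be)
    return alphas, betas

def lanczos_from_cg(alphas, betas):
    # Assemble the upper bidiagonal U_k of the slide and T_k = U_k^T U_k.
    k = len(alphas)
    U = np.zeros((k, k))
    for j in range(k):
        U[j, j] = 1.0 / np.sqrt(alphas[j])
        if j > 0:
            U[j - 1, j] = np.sqrt(betas[j - 1] / alphas[j - 1])
    return U.T @ U, U
```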
Incremental estimation of the smallest Ritz value in CG

U_k is bidiagonal, and U_k^{−1} → U_{k+1}^{−1} by adding one column and one row.

Incremental norm estimation: incrementally improve an approximation of the maximal right singular vector.
[Bischof 1990], [Duff, Vömel 2002], [Duintjer Tebbens, Tůma 2014].
Approximation of the smallest Ritz value in CG

Having α_k and β_k, update

    σ_k = −(√(α_k β_k)/α_{k−1}) (s_{k−1} σ_{k−1} + c_{k−1} τ_{k−1}),
    τ_k = α_k b_k² τ_{k−1} + 1,
    ω_k² = (ρ_k − τ_k)² + 4σ_k²,
    c_k² = (1/2) (1 − (ρ_k − τ_k)/ω_k),
    ρ_{k+1} = ρ_k + ω_k c_k²,
    s_k = √(1 − c_k²),    c_k = |c_k| sign(σ_k),
    μ_{k+1} = ρ_{k+1}^{−1}.

Very cheap; no need to store vectors or coefficients.
Approximation of λ_min

bcsstk01, n = 48.

[Figure: relative error (λ_min − μ_k)/λ_min of the approximation, together with the A-norm of the error, over the iterations.]
Approximation of λ_min

s3dkt3m2, n = 90449, ichol.

[Figure: relative error (λ_min − μ_k)/λ_min of the approximation, together with the A-norm of the error, over the iterations.]
Bounds summary

bcsstk01, n = 48,

    α_k ‖r_k‖² < ‖x − x_k‖²_A ≲ (‖r_k‖²/μ_k) (‖r_k‖²/‖p_k‖²).

[Figure: A-norm of the error, GQ lower bound, and the upper bounds over the iterations.]
Bounds summary

s3dkt3m2, n = 90449, ichol,

    α_k ‖r_k‖² < ‖x − x_k‖²_A ≲ (‖r_k‖²/μ_k) (‖r_k‖²/‖p_k‖²).

[Figure: A-norm of the error, Gauss–Radau upper bound, GQ lower bound, and the μ_k upper bound over the iterations.]
Improving accuracy of the estimates

    ‖x − x_k‖²_A = α_k ‖r_k‖² + ‖x − x_{k+1}‖²_A

[Golub, Strakoš 1994], [Golub, Meurant 1997], [Strakoš, T. 2002]
Use a delay and bound the (k + d)th error

bcsstk01, n = 48, d = 10,

    ‖x − x_k‖²_A = Σ_{j=k}^{k+d−1} α_j ‖r_j‖² + ‖x − x_{k+d}‖²_A.

[Figure: A-norm of the error, GQ lower bound, and the μ_k upper bound over the iterations.]
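The delayed identity above turns d extra iterations into a much tighter lower bound: sum d consecutive terms g_j = α_j‖r_j‖². A minimal sketch (my naming):

```python
import numpy as np

def cg_delayed_lower_bound(A, b, x0, nsteps, d):
    # Run nsteps + d CG iterations, keep g_j = alpha_j ||r_j||^2, and
    # for k = 0..nsteps-1 form the delayed lower bound
    #   nu_{k,d} = g_k + ... + g_{k+d-1}  <=  ||x - x_k||_A^2.
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    xs, g = [x.copy()], []
    for _ in range(nsteps + d):
        rr = r @ r
        Ap = A @ p
        a = rr / (p @ Ap)
        g.append(a * rr)
        x = x + a * p
        r = r - a * Ap
        p = r + ((r @ r) / rr) * p
        xs.append(x.copy())
    nus = [sum(g[k:k + d]) for k in range(nsteps)]
    return xs, nus
```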
Use a delay and bound the (k + d)th error

s3dkt3m2, n = 90449, ichol, d = 100,

    ‖x − x_k‖²_A = Σ_{j=k}^{k+d−1} α_j ‖r_j‖² + ‖x − x_{k+d}‖²_A.

[Figure: A-norm of the error, GQ lower bound, and the μ_k upper bound over the iterations.]
How to choose d adaptively?

Define ν_{k,d} = Σ_{j=k}^{k+d−1} α_j ‖r_j‖², so that

    ‖x − x_k‖²_A = ν_{k,d} + ‖x − x_{k+d}‖²_A,
    1 = ν_{k,d}/‖x − x_k‖²_A + ‖x − x_{k+d}‖²_A/‖x − x_k‖²_A.

If ‖x − x_{k+d}‖_A / ‖x − x_k‖_A < ε, then

    ν_{k,d}^{1/2} < ‖x − x_k‖_A < ν_{k,d}^{1/2} / √(1 − ε²),    e.g., ε = 0.8.
Pseudo-algorithm: adaptive choice of d

 1: go = 1
 2: d = d + 1
 3: while (d ≥ 1) and (go) do
 4:     compute ν_{k−d+1,d}
 5:     if ‖x − x_k‖_A / ‖x − x_{k−d}‖_A < ε then
 6:         compute lower or upper bounds at iteration k − d
 7:         d = d − 1
 8:     else
 9:         go = 0
10:     end if
11: end while
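A runnable sketch of this adaptive scheme under my own indexing conventions (the ratio on line 5 is estimated, as in the talk, by the cheap upper bound at step k divided by √ν over the last d steps; names and the offline structure are mine):

```python
import numpy as np

def cg_run(A, b, x0, mu, nsteps):
    # Collect per-iteration data: g[j] = alpha_j ||r_j||^2 and the cheap
    # upper bound ub[j] = ||r_j||^2 / (sqrt(mu) ||p_j||) on ||x - x_j||_A.
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    xs, g, ub = [], [], []
    for _ in range(nsteps):
        rr = r @ r
        xs.append(x.copy())
        ub.append(rr / (np.sqrt(mu) * np.linalg.norm(p)))
        Ap = A @ p
        a = rr / (p @ Ap)
        g.append(a * rr)
        x = x + a * p
        r = r - a * Ap
        p = r + ((r @ r) / rr) * p
    return xs, g, ub

def adaptive_delay(g, ub, eps=0.5):
    # At step k, while the estimated ratio
    #   ||x-x_k||_A / ||x-x_{k-d}||_A  ~  ub[k] / sqrt(nu_{k-d,d})
    # stays below eps, release sqrt(nu_{k-d,d}) as the estimate of
    # ||x - x_{k-d}||_A and shrink d; otherwise grow d.
    released, d = {}, 0
    for k in range(1, len(g)):
        d = min(d + 1, k)
        go = True
        while go and d >= 1:
            nu = sum(g[k - d:k])            # nu_{k-d,d}
            if ub[k] / np.sqrt(nu) < eps:
                released[k - d] = np.sqrt(nu)
                d -= 1
            else:
                go = False
    return released
```

When the test passes, each released estimate is guaranteed (in exact arithmetic) to lie within the factor √(1 − ε²) of the true A-norm of the error.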
Adaptive choice of d

μ_k upper bound approach, bcsstk01. The ratio is estimated by

    ‖x − x_{k+d}‖_A / ‖x − x_k‖_A  ≲  ‖r_{k+d}‖² / (√μ_{k+d} ‖p_{k+d}‖ √ν_{k,d})  <  ε = 0.5.

[Figure: A-norm of the error with the adaptive choice of d.]
Adaptive choice of d

μ_k upper bound approach, s3dkt3m2. The ratio is estimated by

    ‖x − x_{k+d}‖_A / ‖x − x_k‖_A  ≲  ‖r_{k+d}‖² / (√μ_{k+d} ‖p_{k+d}‖ √ν_{k,d})  <  ε = 0.5.

[Figure: A-norm of the error with the adaptive choice of d.]
Adaptive choice of d

μ_k upper bound approach, s3dkt3m2 (detail of iterations 2300–2900).

[Figure: A-norm of the error with the adaptive choice of d.]
Another approach

Superlinear, linear, sublinear CG convergence.

Let α > β > γ > 0. Then

    (β − γ)/(α − β) > β/α   ⇔   β/α > γ/β.

Apply this with α = ‖x − x_{k−d}‖²_A > β = ‖x − x_k‖²_A > γ = ‖x − x_{k+d}‖²_A. Then

    ν_{k,d}/ν_{k−d,d} > ‖x − x_k‖²_A/‖x − x_{k−d}‖²_A   ⇔   ‖x − x_k‖_A/‖x − x_{k−d}‖_A > ‖x − x_{k+d}‖_A/‖x − x_k‖_A.

CG convergence:
linear → good approximation,
superlinear → upper bound,
sublinear → lower bound.
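The equivalence above reduces to β² > αγ after clearing denominators, which can be sanity-checked numerically (the helper name is mine):

```python
import numpy as np

def ratio_comparison(a, b, c):
    # For a > b > c > 0 the equivalence above reads
    #   (b - c)/(a - b) > b/a   <=>   b/a > c/b,
    # both sides being equivalent to b*b > a*c.
    return ((b - c) / (a - b) > b / a) == (b / a > c / b)
```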
Yet another approach

Decrease formula

For ℓ + e < k and d > 0, one can relate

    ‖x − x_{k+d}‖²_A / ‖x − x_k‖²_A   and   ‖x − x_{ℓ+e}‖²_A / ‖x − x_ℓ‖²_A.

Writing g_j = α_j ‖r_j‖² and ρ = ‖x − x_{ℓ+e}‖²_A / ‖x − x_ℓ‖²_A, it holds that

    ‖x − x_{k+d}‖²_A / ‖x − x_k‖²_A
        = 1 − ( Σ_{j=k}^{k+d−1} g_j ) / ( (ρ/(1 − ρ)) Σ_{j=ℓ}^{ℓ+e−1} g_j − Σ_{j=ℓ+e}^{k−1} g_j ),

which follows from ‖x − x_a‖²_A − ‖x − x_b‖²_A = Σ_{j=a}^{b−1} g_j.
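The decrease formula rests only on the identity ‖x − x_a‖²_A − ‖x − x_b‖²_A = Σ_{j=a}^{b−1} g_j, so it can be verified numerically in the rearranged form used here (my arrangement and naming):

```python
import numpy as np

def check_decrease_formula(A, b, x0, ell, e, k, d):
    # Compare E_{k+d}/E_k with
    #   1 - S3 / ( (rho/(1-rho)) * S1 - S2 ),
    # where E_m = ||x - x_m||_A^2, rho = E_{ell+e}/E_ell, and S1, S2, S3
    # are the g_j-sums over [ell, ell+e), [ell+e, k), [k, k+d).
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    xs, g = [x.copy()], []
    for _ in range(k + d):
        rr = r @ r
        Ap = A @ p
        a = rr / (p @ Ap)
        g.append(a * rr)
        x = x + a * p
        r = r - a * Ap
        p = r + ((r @ r) / rr) * p
        xs.append(x.copy())
    xstar = np.linalg.solve(A, b)
    E = lambda m: (xstar - xs[m]) @ (A @ (xstar - xs[m]))
    rho = E(ell + e) / E(ell)
    S1 = sum(g[ell:ell + e])
    S2 = sum(g[ell + e:k])
    S3 = sum(g[k:k + d])
    lhs = E(k + d) / E(k)
    rhs = 1.0 - S3 / (rho / (1.0 - rho) * S1 - S2)
    return lhs, rhs
```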
Conclusions

A new bound on the A-norm of the error:
simple, comparable with the Gauss–Radau bound,
it can be used even for μ > λ_min (heuristics).

Cheap approximations of extreme Ritz values can be used in the new "upper" bound, μ ↔ μ_k; these approximations can also be used to estimate other characteristics.

Strategy for improving accuracy of the error estimates, based on the decrease of ‖x − x_k‖_A in d iterations: three approaches, and how to choose d adaptively.

Future work → combine approaches, numerical experiments.
Related papers

G. H. Golub and Z. Strakoš, Estimates in quadratic formulas, Numer. Algorithms, 8 (1994), pp. 241–268.
G. Meurant and P. Tichý, Practical estimation of the A-norm of the error in CG, to be submitted, 2018.
G. Meurant and P. Tichý, On computing quadrature-based bounds for the A-norm of the error in CG, Numer. Algorithms, 62 (2013), pp. 163–191.
Z. Strakoš and P. Tichý, On error estimation in CG and why it works in finite precision computations, Electron. Trans. Numer. Anal., 13 (2002), pp. 56–80.

Thank you for your attention!