On improving accuracy of the error estimates in the conjugate gradient method
Petr Tichý
Faculty of Mathematics and Physics, Charles University
based on joint work with Gérard Meurant and Zdeněk Strakoš
PANM 19
June 24-29, 2018, Hejnice, Czech Republic
The conjugate gradient method

A is symmetric and positive definite, Ax = b.

input A, b, x0
p0 = r0 = b − A x0
for k = 1, 2, ... do
    α_{k−1} = (r_{k−1}^T r_{k−1}) / (p_{k−1}^T A p_{k−1})
    x_k = x_{k−1} + α_{k−1} p_{k−1}
    r_k = r_{k−1} − α_{k−1} A p_{k−1}
    β_k = (r_k^T r_k) / (r_{k−1}^T r_{k−1})
    p_k = r_k + β_k p_{k−1}
end for

In exact arithmetic:
orthogonality: r_i ⊥ r_j, p_i ⊥ A p_j (i ≠ j),
optimality of x_k: ‖x − x_k‖_A = min_{y ∈ x_0 + K_k} ‖x − y‖_A.
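The iteration above can be sketched in a few lines; this is a minimal NumPy version (function and variable names are mine) that also returns the coefficients α_k, β_k, since the error estimates discussed later are built from them.

```python
import numpy as np

def cg(A, b, x0, tol=1e-10, maxit=1000):
    # Conjugate gradients for SPD A, following the recursion above.
    # Returns the iterate and the coefficient sequences alpha_k, beta_k.
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    alphas, betas = [], []
    for _ in range(maxit):
        rr = r @ r
        if np.sqrt(rr) < tol:
            break
        Ap = A @ p
        alpha = rr / (p @ Ap)      # alpha_{k-1}
        x = x + alpha * p          # x_k
        r = r - alpha * Ap         # r_k
        beta = (r @ r) / rr        # beta_k
        p = r + beta * p           # p_k
        alphas.append(alpha)
        betas.append(beta)
    return x, np.array(alphas), np.array(betas)
```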
Estimating the A-norm of the error

A brief history

‖x − x_k‖²_A ... measure of the "goodness" of x_k
    [Hestenes, Stiefel 1952]
Gauss quadrature error bounds (Lanczos) → GQL
    [Dahlquist, Golub, Nash 1978], [Golub, Meurant 1994]
Estimating errors in CG → CGQL → CGQ
    [Golub, Strakoš 1994], [Golub, Meurant 1997], [Meurant, T. 2013]
Why it works in finite precision arithmetic
    [Golub, Strakoš 1994], [Strakoš, T. 2002, 2005].
Quadrature bounds

Gauss and Gauss–Radau quadrature bounds

Given μ ≤ λ_min, it holds that

    α_k ‖r_k‖² < ‖x − x_k‖²_A < α_k^{(μ)} ‖r_k‖²,

where

    α_{k+1}^{(μ)} = (α_k^{(μ)} − α_k) / ( μ (α_k^{(μ)} − α_k) + β_{k+1} ),    α_0^{(μ)} = 1/μ.

Practically relevant questions: How to get μ? Quality of the bound? Numerical behavior?
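A minimal sketch of tracking both bounds alongside CG, with the scalar recurrence for α^{(μ)} as displayed above (variable names are mine; μ ≤ λ_min is assumed, as in the statement).

```python
import numpy as np

def cg_with_gauss_bounds(A, b, x0, mu, nsteps):
    # For each iterate x_k, record the Gauss (lower) and Gauss-Radau
    # (upper) quadrature bounds on ||x - x_k||_A^2:
    #   alpha_k ||r_k||^2  and  alpha_k^(mu) ||r_k||^2,
    # with alpha^(mu) updated by the scalar recurrence above.
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    a_mu = 1.0 / mu                         # alpha_0^(mu) = 1/mu
    history = []                            # (x_k, lower, upper)
    for _ in range(nsteps):
        rr = r @ r
        Ap = A @ p
        alpha = rr / (p @ Ap)
        history.append((x.copy(), alpha * rr, a_mu * rr))
        x = x + alpha * p
        r = r - alpha * Ap
        beta = (r @ r) / rr
        p = r + beta * p
        a_mu = (a_mu - alpha) / (mu * (a_mu - alpha) + beta)
    return history
```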
Upper bound in exact arithmetic

Gauss–Radau bound, bcsstk01 matrix, n = 48,

    μ = λ_min / (1 + 10^{−m}),    m = 2, ..., 14.

[Figure: A-norm of the error and the Gauss–Radau bounds over the iterations.]
Upper bound in finite precision arithmetic

Gauss–Radau bound, bcsstk01 matrix, n = 48,

    μ = λ_min / (1 + 10^{−m}),    m = 2, ..., 14.

[Figure: A-norm of the error and the Gauss–Radau bounds over the iterations.]
Upper bound in finite precision arithmetic

μ > λ_min, bcsstk01 matrix, n = 48,

    μ = λ_min / (1 − 10^{−m}),    m = 2 : 2 : 14;    here α_k^{(μ)} < 0 can occur.

[Figure: A-norm of the error and the Gauss–Radau bounds over the iterations.]
An upper bound on the upper bound

An upper bound on the Gauss–Radau bound, μ ≤ λ_min [Meurant, T. 2018?]:

    ‖x − x_k‖²_A < α_k^{(μ)} ‖r_k‖² < (‖r_k‖²/μ) (‖r_k‖²/‖p_k‖²).

[Figure: A-norm of the error, Gauss–Radau bound, and the new bound over the iterations.]
The new bound

[Meurant, T. 2018?]

    ‖x − x_k‖²_A < α_k^{(μ)} ‖r_k‖² < (‖r_k‖²/μ) (‖r_k‖²/‖p_k‖²).

Having μ, we can compute it almost for free.
Monotonically decreasing.
Not sensitive to the choice of μ.
As good as Gauss–Radau in many cases.
It can be used even if μ > λ_min (heuristics).
How to approximate μ?
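The "almost for free" claim can be made concrete: the new bound ‖r_k‖⁴/(μ‖p_k‖²) uses only quantities already present in the iteration. A minimal sketch (my naming), assuming a value μ ≤ λ_min is available:

```python
import numpy as np

def cg_cheap_upper_bound(A, b, x0, mu, nsteps):
    # For each iterate x_k, record the new upper bound
    #   ||r_k||^4 / (mu ||p_k||^2)  on  ||x - x_k||_A^2,
    # computed from the residual and direction vectors at hand.
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    history = []                             # (x_k, cheap upper bound)
    for _ in range(nsteps):
        rr = r @ r
        history.append((x.copy(), rr * rr / (mu * (p @ p))))
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        p = r + ((r @ r) / rr) * p
    return history
```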
The conjugate gradient method

Recall the CG iteration above for symmetric positive definite A. Its coefficients define the upper bidiagonal matrix U_k with diagonal entries

    1/√α_0, ..., 1/√α_{k−1}

and superdiagonal entries

    √(β_1/α_0), ..., √(β_{k−1}/α_{k−2}),

and T_k = U_k^T U_k is the Lanczos (Jacobi) matrix.

Approximation of λ_min and λ_max in CG: the extreme eigenvalues of A are approximated by the extreme Ritz values, i.e., the extreme eigenvalues of T_k. In particular,

    T_k = U_k^T U_k  →  λ_min(T_k) = 1/‖U_k^{−1}‖².

How to approximate ‖U_k^{−1}‖?
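The connection can be checked directly: assemble U_k from the CG coefficients as described above, form T_k = U_k^T U_k, and compare λ_min(T_k) with 1/‖U_k^{−1}‖². A minimal sketch (function names are mine):

```python
import numpy as np

def cg_coefficients(A, b, x0, k):
    # k CG steps; alphas[i] = alpha_i, betas[i] = beta_{i+1}.
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    alphas, betas = [], []
    for _ in range(k):
        rr = r @ r
        Ap = A @ p
        a = rr / (p @ Ap)
        x = x + a * p
        r = r - a * Ap
        be = (r @ r) / rr
        p = r + be * p
        alphas.append(a)
        betas.append(be)
    return alphas, betas

def lanczos_from_cg(alphas, betas):
    # Assemble the upper bidiagonal U_k of the slide and T_k = U_k^T U_k.
    k = len(alphas)
    U = np.zeros((k, k))
    for j in range(k):
        U[j, j] = 1.0 / np.sqrt(alphas[j])
        if j > 0:
            U[j - 1, j] = np.sqrt(betas[j - 1] / alphas[j - 1])
    return U.T @ U, U
```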
Incremental estimation of the smallest Ritz value in CG

U_k is bidiagonal, and U_k^{−1} → U_{k+1}^{−1} by adding one column and one row.

Incremental norm estimation: incrementally improve an approximation of the maximal right singular vector.
[Bischof 1990], [Duff, Vömel 2002], [Duintjer Tebbens, Tůma 2014].
Approximation of the smallest Ritz value in CG

Having α_k and β_k, update

    σ_k = −(√(α_k β_k)/α_{k−1}) (s_{k−1} σ_{k−1} + c_{k−1} τ_{k−1}),
    τ_k = α_k b_k² τ_{k−1} + 1,
    ω_k² = (ρ_k − τ_k)² + 4σ_k²,
    c_k² = (1/2) (1 − (ρ_k − τ_k)/ω_k),
    ρ_{k+1} = ρ_k + ω_k c_k²,
    s_k = √(1 − c_k²),    c_k = |c_k| sign(σ_k),
    μ_{k+1} = ρ_{k+1}^{−1}.

Very cheap; no need to store vectors or coefficients.
Approximation of λ_min

bcsstk01, n = 48.

[Figure: relative error (λ_min − μ_k)/λ_min of the approximation, together with the A-norm of the error, over the iterations.]
Approximation of λ_min

s3dkt3m2, n = 90449, ichol.

[Figure: relative error (λ_min − μ_k)/λ_min of the approximation, together with the A-norm of the error, over the iterations.]
Bounds summary

bcsstk01, n = 48,

    α_k ‖r_k‖² < ‖x − x_k‖²_A ≲ (‖r_k‖²/μ_k) (‖r_k‖²/‖p_k‖²).

[Figure: A-norm of the error, GQ lower bound, and the upper bounds over the iterations.]
Bounds summary

s3dkt3m2, n = 90449, ichol,

    α_k ‖r_k‖² < ‖x − x_k‖²_A ≲ (‖r_k‖²/μ_k) (‖r_k‖²/‖p_k‖²).

[Figure: A-norm of the error, Gauss–Radau upper bound, GQ lower bound, and the μ_k upper bound over the iterations.]
Improving accuracy of the estimates

    ‖x − x_k‖²_A = α_k ‖r_k‖² + ‖x − x_{k+1}‖²_A

[Golub, Strakoš 1994], [Golub, Meurant 1997], [Strakoš, T. 2002]
Use a delay and bound the (k + d)th error

bcsstk01, n = 48, d = 10,

    ‖x − x_k‖²_A = Σ_{j=k}^{k+d−1} α_j ‖r_j‖² + ‖x − x_{k+d}‖²_A.

[Figure: A-norm of the error, GQ lower bound, and the μ_k upper bound over the iterations.]
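The delayed identity above turns d extra iterations into a much tighter lower bound: sum d consecutive terms g_j = α_j‖r_j‖². A minimal sketch (my naming):

```python
import numpy as np

def cg_delayed_lower_bound(A, b, x0, nsteps, d):
    # Run nsteps + d CG iterations, keep g_j = alpha_j ||r_j||^2, and
    # for k = 0..nsteps-1 form the delayed lower bound
    #   nu_{k,d} = g_k + ... + g_{k+d-1}  <=  ||x - x_k||_A^2.
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    xs, g = [x.copy()], []
    for _ in range(nsteps + d):
        rr = r @ r
        Ap = A @ p
        a = rr / (p @ Ap)
        g.append(a * rr)
        x = x + a * p
        r = r - a * Ap
        p = r + ((r @ r) / rr) * p
        xs.append(x.copy())
    nus = [sum(g[k:k + d]) for k in range(nsteps)]
    return xs, nus
```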
Use a delay and bound the (k + d)th error

s3dkt3m2, n = 90449, ichol, d = 100,

    ‖x − x_k‖²_A = Σ_{j=k}^{k+d−1} α_j ‖r_j‖² + ‖x − x_{k+d}‖²_A.

[Figure: A-norm of the error, GQ lower bound, and the μ_k upper bound over the iterations.]
How to choose d adaptively?

Define ν_{k,d} = Σ_{j=k}^{k+d−1} α_j ‖r_j‖², so that

    ‖x − x_k‖²_A = ν_{k,d} + ‖x − x_{k+d}‖²_A,
    1 = ν_{k,d}/‖x − x_k‖²_A + ‖x − x_{k+d}‖²_A/‖x − x_k‖²_A.

If ‖x − x_{k+d}‖_A / ‖x − x_k‖_A < ε, then

    ν_{k,d}^{1/2} < ‖x − x_k‖_A < ν_{k,d}^{1/2} / √(1 − ε²),    e.g., ε = 0.8.
Pseudo-algorithm: adaptive choice of d

 1: go = 1
 2: d = d + 1
 3: while (d ≥ 1) and (go) do
 4:     compute ν_{k−d+1,d}
 5:     if ‖x − x_k‖_A / ‖x − x_{k−d}‖_A < ε then
 6:         compute lower or upper bounds at iteration k − d
 7:         d = d − 1
 8:     else
 9:         go = 0
10:     end if
11: end while
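A runnable sketch of this adaptive scheme under my own indexing conventions (the ratio on line 5 is estimated, as in the talk, by the cheap upper bound at step k divided by √ν over the last d steps; names and the offline structure are mine):

```python
import numpy as np

def cg_run(A, b, x0, mu, nsteps):
    # Collect per-iteration data: g[j] = alpha_j ||r_j||^2 and the cheap
    # upper bound ub[j] = ||r_j||^2 / (sqrt(mu) ||p_j||) on ||x - x_j||_A.
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    xs, g, ub = [], [], []
    for _ in range(nsteps):
        rr = r @ r
        xs.append(x.copy())
        ub.append(rr / (np.sqrt(mu) * np.linalg.norm(p)))
        Ap = A @ p
        a = rr / (p @ Ap)
        g.append(a * rr)
        x = x + a * p
        r = r - a * Ap
        p = r + ((r @ r) / rr) * p
    return xs, g, ub

def adaptive_delay(g, ub, eps=0.5):
    # At step k, while the estimated ratio
    #   ||x-x_k||_A / ||x-x_{k-d}||_A  ~  ub[k] / sqrt(nu_{k-d,d})
    # stays below eps, release sqrt(nu_{k-d,d}) as the estimate of
    # ||x - x_{k-d}||_A and shrink d; otherwise grow d.
    released, d = {}, 0
    for k in range(1, len(g)):
        d = min(d + 1, k)
        go = True
        while go and d >= 1:
            nu = sum(g[k - d:k])            # nu_{k-d,d}
            if ub[k] / np.sqrt(nu) < eps:
                released[k - d] = np.sqrt(nu)
                d -= 1
            else:
                go = False
    return released
```

When the test passes, each released estimate is guaranteed (in exact arithmetic) to lie within the factor √(1 − ε²) of the true A-norm of the error.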
Adaptive choice of d

μ_k upper bound approach, bcsstk01. The ratio is estimated by

    ‖x − x_{k+d}‖_A / ‖x − x_k‖_A  ≲  ‖r_{k+d}‖² / (√μ_{k+d} ‖p_{k+d}‖ √ν_{k,d})  <  ε = 0.5.

[Figure: A-norm of the error with the adaptive choice of d.]
Adaptive choice of d

μ_k upper bound approach, s3dkt3m2. The ratio is estimated by

    ‖x − x_{k+d}‖_A / ‖x − x_k‖_A  ≲  ‖r_{k+d}‖² / (√μ_{k+d} ‖p_{k+d}‖ √ν_{k,d})  <  ε = 0.5.

[Figure: A-norm of the error with the adaptive choice of d.]
Adaptive choice of d

μ_k upper bound approach, s3dkt3m2 (detail of iterations 2300–2900).

[Figure: A-norm of the error with the adaptive choice of d.]
Another approach

Superlinear, linear, sublinear CG convergence.

Let α > β > γ > 0. Then

    (β − γ)/(α − β) > β/α   ⇔   β/α > γ/β.

Apply this with α = ‖x − x_{k−d}‖²_A > β = ‖x − x_k‖²_A > γ = ‖x − x_{k+d}‖²_A. Then

    ν_{k,d}/ν_{k−d,d} > ‖x − x_k‖²_A/‖x − x_{k−d}‖²_A   ⇔   ‖x − x_k‖_A/‖x − x_{k−d}‖_A > ‖x − x_{k+d}‖_A/‖x − x_k‖_A.

CG convergence:
linear → good approximation,
superlinear → upper bound,
sublinear → lower bound.
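The equivalence above reduces to β² > αγ after clearing denominators, which can be sanity-checked numerically (the helper name is mine):

```python
import numpy as np

def ratio_comparison(a, b, c):
    # For a > b > c > 0 the equivalence above reads
    #   (b - c)/(a - b) > b/a   <=>   b/a > c/b,
    # both sides being equivalent to b*b > a*c.
    return ((b - c) / (a - b) > b / a) == (b / a > c / b)
```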
Yet another approach

Decrease formula

For ℓ + e < k and d > 0, one can relate

    ‖x − x_{k+d}‖²_A / ‖x − x_k‖²_A   and   ‖x − x_{ℓ+e}‖²_A / ‖x − x_ℓ‖²_A.

Writing g_j = α_j ‖r_j‖² and ρ = ‖x − x_{ℓ+e}‖²_A / ‖x − x_ℓ‖²_A, it holds that

    ‖x − x_{k+d}‖²_A / ‖x − x_k‖²_A
        = 1 − ( Σ_{j=k}^{k+d−1} g_j ) / ( (ρ/(1 − ρ)) Σ_{j=ℓ}^{ℓ+e−1} g_j − Σ_{j=ℓ+e}^{k−1} g_j ),

which follows from ‖x − x_a‖²_A − ‖x − x_b‖²_A = Σ_{j=a}^{b−1} g_j.
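The decrease formula rests only on the identity ‖x − x_a‖²_A − ‖x − x_b‖²_A = Σ_{j=a}^{b−1} g_j, so it can be verified numerically in the rearranged form used here (my arrangement and naming):

```python
import numpy as np

def check_decrease_formula(A, b, x0, ell, e, k, d):
    # Compare E_{k+d}/E_k with
    #   1 - S3 / ( (rho/(1-rho)) * S1 - S2 ),
    # where E_m = ||x - x_m||_A^2, rho = E_{ell+e}/E_ell, and S1, S2, S3
    # are the g_j-sums over [ell, ell+e), [ell+e, k), [k, k+d).
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    xs, g = [x.copy()], []
    for _ in range(k + d):
        rr = r @ r
        Ap = A @ p
        a = rr / (p @ Ap)
        g.append(a * rr)
        x = x + a * p
        r = r - a * Ap
        p = r + ((r @ r) / rr) * p
        xs.append(x.copy())
    xstar = np.linalg.solve(A, b)
    E = lambda m: (xstar - xs[m]) @ (A @ (xstar - xs[m]))
    rho = E(ell + e) / E(ell)
    S1 = sum(g[ell:ell + e])
    S2 = sum(g[ell + e:k])
    S3 = sum(g[k:k + d])
    lhs = E(k + d) / E(k)
    rhs = 1.0 - S3 / (rho / (1.0 - rho) * S1 - S2)
    return lhs, rhs
```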
Conclusions

A new bound on the A-norm of the error:
simple, comparable with the Gauss–Radau bound,
it can be used even for μ > λ_min (heuristics).

Cheap approximations of extreme Ritz values can be used in the new "upper" bound, μ ↔ μ_k; these approximations can also be used to estimate other characteristics.

Strategy for improving accuracy of the error estimates, based on the decrease of ‖x − x_k‖_A in d iterations: three approaches, and how to choose d adaptively.

Future work → combine approaches, numerical experiments.
Related papers

G. H. Golub and Z. Strakoš, Estimates in quadratic formulas, Numer. Algorithms, 8 (1994), pp. 241–268.
G. Meurant and P. Tichý, Practical estimation of the A-norm of the error in CG, to be submitted, 2018.
G. Meurant and P. Tichý, On computing quadrature-based bounds for the A-norm of the error in CG, Numer. Algorithms, 62 (2013), pp. 163–191.
Z. Strakoš and P. Tichý, On error estimation in CG and why it works in finite precision computations, Electron. Trans. Numer. Anal., 13 (2002), pp. 56–80.

Thank you for your attention!