Convolutional over Recurrent Encoder for Neural Machine Translation
Annual Conference of the European Association for Machine Translation
2017
Neural Machine Translation
• End-to-end neural network with an RNN architecture, where the output of one RNN (the decoder) is conditioned on another RNN (the encoder).
• c is a fixed-length vector representation of the source sentence encoded by the RNN.
• Attention mechanism:
• (Bahdanau et al., 2015): compute the context vector as a weighted average of the annotations of the source hidden states.
Excerpt from Bahdanau et al. (2015), published as a conference paper at ICLR 2015:
The decoder is often trained to predict the next word y_{t'} given the context vector c and all the previously predicted words {y_1, ..., y_{t'-1}}. In other words, the decoder defines a probability over the translation y by decomposing the joint probability into the ordered conditionals:

p(y) = \prod_{t=1}^{T} p(y_t \mid \{y_1, \dots, y_{t-1}\}, c),   (2)

where y = (y_1, ..., y_{T_y}). With an RNN, each conditional probability is modeled as

p(y_t \mid \{y_1, \dots, y_{t-1}\}, c) = g(y_{t-1}, s_t, c),   (3)

where g is a nonlinear, potentially multi-layered, function that outputs the probability of y_t, and s_t is the hidden state of the RNN. It should be noted that other architectures such as a hybrid of an RNN and a de-convolutional neural network can be used (Kalchbrenner and Blunsom, 2013).
3 LEARNING TO ALIGN AND TRANSLATE

In this section, we propose a novel architecture for neural machine translation. The new architecture consists of a bidirectional RNN as an encoder (Sec. 3.2) and a decoder that emulates searching through a source sentence during decoding a translation (Sec. 3.1).
3.1 DECODER: GENERAL DESCRIPTION
Figure 1: The graphical illustration of the proposed model trying to generate the t-th target word y_t given a source sentence (x_1, x_2, ..., x_T).
In a new model architecture, we define each conditional probability in Eq. (2) as:

p(y_i \mid y_1, \dots, y_{i-1}, x) = g(y_{i-1}, s_i, c_i),   (4)

where s_i is an RNN hidden state for time i, computed by

s_i = f(s_{i-1}, y_{i-1}, c_i).

It should be noted that unlike the existing encoder-decoder approach (see Eq. (2)), here the probability is conditioned on a distinct context vector c_i for each target word y_i.

The context vector c_i depends on a sequence of annotations (h_1, ..., h_{T_x}) to which an encoder maps the input sentence. Each annotation h_i contains information about the whole input sequence with a strong focus on the parts surrounding the i-th word of the input sequence. We explain in detail how the annotations are computed in the next section.

The context vector c_i is then computed as a weighted sum of these annotations h_i:

c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j.   (5)

The weight \alpha_{ij} of each annotation h_j is computed by

\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})},   (6)

where

e_{ij} = a(s_{i-1}, h_j)

is an alignment model which scores how well the inputs around position j and the output at position i match. The score is based on the RNN hidden state s_{i-1} (just before emitting y_i, Eq. (4)) and the j-th annotation h_j of the input sentence.
We parametrize the alignment model a as a feedforward neural network which is jointly trained with all the other components of the proposed system. Note that unlike in traditional machine translation, the alignment is not considered to be a latent variable.
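To make Eqs. (4)-(6) concrete, the following is a minimal sketch of the attention computation in PyTorch; the language choice and the layer sizes (dec_dim, enc_dim, attn_dim) are illustrative assumptions, not taken from the paper.

import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    # Feedforward alignment model a(s_{i-1}, h_j) and context vector c_i (Eqs. 4-6).
    def __init__(self, dec_dim=1000, enc_dim=1000, attn_dim=1000):
        super().__init__()
        self.W_s = nn.Linear(dec_dim, attn_dim)   # projects the previous decoder state s_{i-1}
        self.W_h = nn.Linear(enc_dim, attn_dim)   # projects the encoder annotations h_j
        self.v = nn.Linear(attn_dim, 1)           # scores e_ij = v^T tanh(W_s s + W_h h)

    def forward(self, s_prev, annotations):
        # s_prev: (batch, dec_dim); annotations: (batch, T_x, enc_dim)
        scores = self.v(torch.tanh(self.W_s(s_prev).unsqueeze(1) + self.W_h(annotations)))
        alpha = torch.softmax(scores, dim=1)           # alpha_ij, Eq. (6)
        context = (alpha * annotations).sum(dim=1)     # c_i = sum_j alpha_ij h_j, Eq. (5)
        return context, alpha.squeeze(-1)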
Why RNN works for NMT?
✦ Recurrently encodes history for variable-length input sequences
✦ Captures the long-distance dependencies that occur frequently in natural language text
RNN for NMT:
✤ Disadvantages:
✤ Slow: does not allow parallel computation within a sequence
✤ Non-uniform composition: for each state, the first word is over-processed and the last one processed only once
✤ Dense representation: each h_i is a compact summary of the source sentence up to word i
✤ Focus on a global representation rather than on local features
CNN in NLP:
✤ Unlike RNNs, CNNs apply over a fixed-size window of the input
✤ This allows for parallel computation
✤ Represent a sentence in terms of features:
✤ a weighted combination of multiple words or n-grams
✤ Very successful in learning sentence representations for various tasks
✤ Sentiment analysis, question classification (Kim, 2014; Kalchbrenner et al., 2014)
Convolution over Recurrent encoder (CoveR):
✤ Can CNNs help for NMT?
✤ Instead of single recurrent outputs, we can use a composition of multiple hidden-state outputs of the encoder
✤ Convolution over recurrent:
✤ We apply multiple layers of fixed-size convolution filters over the output of the RNN encoder at each time step
✤ This can provide wider context about the relevant features of the source sentence
CoveR model
[Figure: CoveR model — RNN encoder hidden states h_i are passed through CNN layers producing CN_i, which the decoder attends over via C'_j = Σ_i α_{ji} CN_i]
Convolution over Recurrent encoder:
✤ Each vector CN_i now represents a feature produced by multiple kernels over h_i
✤ Relatively uniform composition of multiple previous states and the current state
✤ Simultaneous, and hence faster, processing at the convolutional layers
Figure 1. NMT encoder-decoder framework
Figure 2. Convolution over Recurrent model
In order to do this, we apply multiple layers of fixed-size convolution filters over the output of the RNN encoder at each time step. As shown in Figure 2, for our model the input to the first convolution layer is the hidden state output of the RNN encoder. Thus CN^1_i is defined as:

CN^1_i = σ(θ · h_{i−[(w−1)/2] : i+[(w−1)/2]} + b)   (8)

At each layer, we apply a number of filters equal to the original input sentence length. Each filter is of width 3. Note that the length of the output of the convolution filters reduces depending on the input length and the kernel width. In order to retain the original sequence length of the source sentence we apply padding at each layer. That is, for each convolutional layer, the input is zero-padded so that the output has the same length as the original input sequence.
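As an illustration of Eq. (8), the sketch below applies a single width-3 convolution with zero padding over the RNN hidden states so that the output keeps the source sentence length; the dimensions and the batch layout are assumptions made for the example.

import torch
import torch.nn as nn

hidden_dim, w = 1000, 3                       # filter width w = 3
# One convolutional layer over the encoder outputs, Eq. (8):
# CN^1_i = sigma(theta . h_{i-(w-1)/2 : i+(w-1)/2} + b)
conv = nn.Conv1d(hidden_dim, hidden_dim, kernel_size=w, padding=(w - 1) // 2)

h = torch.randn(1, 12, hidden_dim)            # (batch, T_x, dim): RNN encoder hidden states
cn1 = torch.sigmoid(conv(h.transpose(1, 2)))  # Conv1d expects (batch, dim, T_x)
cn1 = cn1.transpose(1, 2)                     # back to (batch, T_x, dim); length T_x preserved
print(cn1.shape)                              # torch.Size([1, 12, 1000])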
Related work:
✤ Gehring et al. (2017):
✤ Completely replace the RNN encoder with a CNN
✤ A simple replacement doesn't work; position embeddings are required to model dependencies
✤ Requires 6-15 convolutional layers to compete with a 2-layer RNN
✤ Meng et al. (2015):
✤ For phrase-based MT, use a CNN language model as an additional feature
Experimental setting:
✤ Data:
✦ WMT 2015 En-De training data: 4.2M sentence pairs
✦ Dev: WMT 2013 test set
✦ Test: WMT 2014 and WMT 2015 test sets
✤ Baseline:
✦ Two-layer unidirectional LSTM encoder
✦ Embedding size, hidden size = 1000
Experimental setting:
✤ CoveR:
✦ Encoder: 3 convolutional layers over the RNN output
✦ Decoder: same as baseline
✦ Convolutional filters of size 3
✦ Output dimension: 1000
✦ Zero padding on both sides at each layer, no pooling
✦ Residual connections (He et al., 2015) between each intermediate layer (see the sketch below)
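A minimal sketch of this encoder configuration in PyTorch; the exact placement of the residual connections and non-linearities, as well as the vocabulary size, are assumptions made for illustration based on the description above.

import torch
import torch.nn as nn

class CoveREncoder(nn.Module):
    # Sketch: 2-layer LSTM encoder followed by 3 residual convolutional layers (CoveR).
    def __init__(self, vocab_size=50000, dim=1000, n_conv=3, width=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, num_layers=2, batch_first=True)
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, dim, kernel_size=width, padding=(width - 1) // 2)
            for _ in range(n_conv))

    def forward(self, src):                    # src: (batch, T_x) token ids
        h, _ = self.rnn(self.embed(src))       # (batch, T_x, dim) RNN hidden states
        x = h.transpose(1, 2)                  # (batch, dim, T_x) for Conv1d
        for conv in self.convs:
            x = torch.sigmoid(conv(x)) + x     # zero-padded convolution + residual connection
        return x.transpose(1, 2)               # CN_i vectors that the decoder attends over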
Experimental setting:
✤ Deep RNN encoder:
✦ Comparing the 2-layer RNN encoder baseline to CoveR is unfair
• The improvement may be due simply to the increased number of parameters
✦ We compare with a deep RNN encoder with 5 layers
✦ The 2 decoder layers are initialized through a non-linear transformation of the concatenated final encoder states
BLEU scores (* = significant at p < 0.05)

Model              Dev    WMT14   WMT15
Baseline           17.9   15.8    18.5
Deep RNN encoder   18.3   16.2    18.7
CoveR              18.5   16.9*   19.0*
Result
✤ Compared to the baseline:
✦ +1.1 BLEU for WMT14 and +0.5 for WMT15
✤ Compared to the deep RNN encoder:
✦ +0.7 BLEU for WMT14 and +0.3 for WMT15
#parameters and decoding speed

Model              #parameters (millions)   avg sec/sent
Baseline           174                      0.11
Deep RNN encoder   283                      0.28
CoveR              183                      0.14
Result
✤ CoveR model:
✤ Slightly slower than the baseline but faster than the deep RNN encoder
✤ Slightly more parameters than the baseline but fewer than the deep RNN encoder
Qualitative analysis:
✤ Increased output length
✤ With additional context, the CoveR model generates complete translations
Example 1:
Source: as the reverend martin luther king jr. said fifty years ago:
Reference: wie pastor martin luther king jr. vor fünfzig jahren sagte:
Baseline: wie der martin luther king jr. sagte
CoveR: wie der martin luther king jr. sagte vor fünfzig jahren:

Example 2:
Source: he said the itinerary is still being worked out .
Reference: er sagte , das genaue reiseroute werde noch ausgearbeitet .
Baseline: er sagte , dass die strecke noch <unk> ist .
CoveR: er sagte , die reiseroute wird noch ausgearbeitet .

Table 2. Translation examples. Words in bold show correct translations produced by our model as compared to the baseline.

... the final states of all the five layers of the encoder, resulting in a vector of size 5xD ("D" is the dimension of the hidden layer), and then downgrading it to size 2xD by a simple non-linear transformation and finally splitting it into two vectors of size "D" which are used to initialize each of the layers of the decoder.
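A small sketch of the decoder initialization described above; the choice of tanh as the non-linear transformation and the batch size are assumptions made for illustration.

import torch
import torch.nn as nn

D = 1000                                      # hidden dimension of each layer
bridge = nn.Linear(5 * D, 2 * D)              # non-linear bridge from 5xD down to 2xD

# final_states: the final hidden state of each of the 5 encoder layers, each (batch, D)
final_states = [torch.randn(8, D) for _ in range(5)]
init = torch.tanh(bridge(torch.cat(final_states, dim=-1)))  # (batch, 2*D)
dec_init_1, dec_init_2 = init.split(D, dim=-1)              # one D-vector per decoder layer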
6. Results

Table 1 shows the results for our English-German translation experiments. The first column indicates the best BLEU scores on the development set newstest'13 for all three models after 20 epochs. Results are reported on the newstest'14 and newstest'15 test sets. Our CoveR model shows improvements of 1.1 and 0.5 BLEU points respectively over the two test sets. Although the deep RNN encoder performs better than the baseline, the improvements achieved are lower than that of the CoveR model.

6.1. Qualitative analysis and discussion

Table 2 provides some of the translation examples produced by the baseline system and our CoveR model. A general observation is the improved translations by our model over the baseline with regard to the reference translation, which is also reflected by the improved BLEU scores. More specifically, Example 1 shows instances where the baseline suffers in some cases from incomplete coverage of the source sentence. One reason for such incomplete translations is the lack of coverage modeling, which has been handled using coverage embeddings (Tu et al., 2016). We observe this problem frequently in instances where a specific word might signal completion of a sentence despite more words in the sequence remaining to be translated. These words can cause the generation of the next target word as the end-of-sentence "EOS" symbol. Since the beam search decoding algorithm considers a hypothesis complete when the end of sentence is generated, in such instances search stops, aborting further expansions while ignoring the remaining words.
Average output sentence length

Model       Avg. sentence length
Baseline    18.7
Deep RNN    19.0
CoveR       19.9
Reference   20.9
Qualitative analysis:
✤ More uniform attention distribution
✤ Generation of correct composite words
Qualitative analysis:
✤ More uniform attention distribution
[Attention heatmaps compared side by side: Baseline vs. CoveR]
For instance, in Example 1 in Table 2, by relying on the attention mechanism, the baseline system generates the translation of "said" as "sagte"; the model might give preference to the generation of an end-of-sentence "EOS" symbol immediately following the verb. On the other hand, for our CoveR model, at target position 8, wider context is available to the model through the convolutional layers from both directions, signalling the presence of other words remaining in the input sentence, and thus producing a more complete translation.

Figure 3. Attention distribution for Baseline (source: "he said the itinerary is still being worked out .")
Figure 4. Attention distribution for CoveR model (source: "he said the itinerary is still being worked out .")

Another difference between the baseline model and our CoveR model that can be observed in Example 2 is that attention weights are distributed more uniformly among the source words. Specifically, for target position 6, as shown in Figure 3, the baseline model pays attention only to 'itinerary'.
✤ Baseline translates 'itinerary' to 'strecke' (road, distance)
✤ It pays attention only to 'itinerary' for this position
✤ CoveR translates 'itinerary' to 'reiseroute'
✤ It also pays attention to the final verb
Conclusion:
✤ CoveR: multiple convolutional layers over an RNN encoder
✤ Significant improvements over a standard LSTM baseline
✤ Increasing the number of LSTM layers improves results slightly, but convolutional layers perform better
✤ Faster and fewer parameters than a fully RNN encoder of the same size