(1)

ACL 2013 paper

Coordination Structures in Dependency Treebanks

Martin Popel, David Mareček, Jan Štěpánek, Daniel Zeman, Zdeněk Žabokrtský

Charles University in Prague,

Faculty of Mathematics and Physics,

ÚFAL (Institute of Formal and Applied Linguistics)

September 19th, 2013, Příchovice

(3)

3

Motivation

Coordination and Dependency are fundamentally different relations

Coordinations are difficult to represent in dependency treebanks

Large inter-treebank differences

Obstacle for cross-lingual parsing (evaluation)

Example: Swedish treebank → train → delexicalized parser → parse → Danish test set

[Figure: two dependency trees for “dogs and cats” in different annotation styles]

(4)

4

Outline

Styles of annotating coordinations

Topological styles

Labeling styles

Transformation of styles

Data: HamleDT (26 languages)

(5)

5

Participants of coordination

conjunct delimiter (separates two conjuncts)

Coordinating conjunction

Comma or other punctuation (semicolon)

shared modifier (modifies two or more conjuncts)

Examples:

lazy dogs, cats and rats: more than two conjuncts (“multi-conjunct c.”)

Mary came home and cried: home is a “private modifier”

John and Mary or Peter: nested (embedded) coordinations

big and cheap apples and oranges: coordinated shared modifier

(11)

11

Special cases

Asyndetic coordination = no conjunction: “Don't worry, be happy, keep smiling”

Multi-word conjunction: “as well as”

Single-conjunct coordination: “And I love her”

One token with more roles: que = coord. enclitic in “Senatus Populusque Romanus” (The Senate and the People of Rome); “etc.”

Paratactic vs. hypotactic means (John with Mary)

red and white wine = red wine and white wine

red and white flag of Poland

(12)

12

Topological styles (family)

Main “family” – configuration of conjuncts

Prague, Moscow, Stanford

[Figure: “dogs, cats and rats” drawn in each of the three families]

(14)

14

Topological styles (head)

Choice of head (which delimiter/conjunct to choose):

rightmost

leftmost

[Figure: “dogs, cats and rats” in the Prague, Moscow and Stanford families, each with a rightmost and a leftmost head]

(16)

16

Topological styles (head)

Choice of head: leftmost, rightmost or mixed

Persian treebank: rightmost for coordination of verbs, leftmost otherwise

[Figure: “I see dogs, cats and rats” and “dogs, cats and rats sleep” with different head choices]

(18)

18

Topological styles (shared modifiers)

Attachment of shared modifiers:

below the head

below the nearest conjunct

[Figure: “lazy dogs, cats and rats” with both attachments, in the Prague and Stanford families]

(19)

19

Topological styles (conjunction)

Attachment of coordinating conjunctions:

“between” conjuncts

below the previous conjunct

below the following conjunct

[Figure: the three attachments for “dogs, cats and rats” in the Stanford family, head=rightmost]

(21)

21

Topological styles (conjunction)

Attachment of coordinating conjunctions:

“between” conjuncts

below the previous conjunct

below the following conjunct

“as the head” for Prague (the only applicable)

[Figure: the three attachments for “dogs, cats and rats” in the Moscow family, head=leftmost]

(22)

22

Topological styles (punctuation)

Attachment of punctuation delimiters:

“between” conjuncts

below the previous conjunct

below the following conjunct

[Figure: the three attachments of the comma in “dogs, cats and rats”, Prague family]

(23)

23

Labeling styles (dependency rel.)

Dependency relation at “upper level” = with the head node

Dependency relation at “lower level” = with the conjuncts

[Figure: “I see dogs, cats and rats” and “dogs, cats and rats sleep” in the Stanford family, drawn twice; labels shown in the first pair: Sb, Obj; in the second pair: Sb, Sb, Obj, Obj]

(24)

24

Labeling styles (dependency rel.)

Dependency relation at “upper level” = with the head node

Dependency relation at “lower level” = with the conjuncts

Allows different labels of conjuncts.

[Figure: “Who and why did it?” in the Prague family, drawn twice; labels shown: Sb, Adv, Coord, Conj, Conj, Sb/Adv]

(25)

25

Labeling styles (other)

Are conjuncts annotated?

additional attribute (is_member) or encoded into the dependency label: Sb_M, Obj_M, Atr_M,...

Are shared modifiers annotated?

In PDT not explicitly, but it can be deduced.

Proposed, but unseen in treebanks: co-indexation attributes or bubbles for nested coordinations and shared modifiers
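The two ways of marking conjuncts carry the same information, so one can be converted into the other. A minimal sketch, assuming the suffix is always the literal string `_M` at the end of the label; the function names are hypothetical, and real treebanks may stack further suffixes that this does not handle.

```python
MEMBER_SUFFIX = "_M"  # assumed spelling, taken from the labels Sb_M, Obj_M, Atr_M

def decode_label(label):
    """Split a label such as 'Sb_M' into (base label, is_member flag)."""
    if label.endswith(MEMBER_SUFFIX):
        return label[: -len(MEMBER_SUFFIX)], True
    return label, False

def encode_label(base, is_member):
    """Inverse of decode_label: fold the is_member flag back into the label."""
    return base + MEMBER_SUFFIX if is_member else base

print(decode_label("Sb_M"))        # ('Sb', True)
print(decode_label("Atr"))         # ('Atr', False)
print(encode_label("Obj", True))   # Obj_M
```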

(26)

26

Annotation styles – overview

How many treebanks (out of 26 in HamleDT 1.0) use a given style?

Family (Prague=14, Moscow=5, Stanford=6)

Head (Leftmost=10, Rightmost=14, Mixed=1)

Shared modifiers (below Head=11, Nearest conjunct=15)

Conjunctions (Previous=2, Following=1, Between=8, as Head=14)

Punctuation (Previous=7, Following=1, Between=15, Missing=2)

Dependency relation (Upper=17, Lower=9)

Annotated conjuncts (yes=21, no=5)

Annotated shared modifiers (yes=8, no=18)

(27)

27

Annotation styles – overview

How many possible styles?

2*3*2*3*3+1*3*2*1*3 = 126 topological

* 8 labeling variants = 1008

How many styles really found?

16 (in 26 treebanks)
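The count of possible styles can be checked by enumeration. The sketch below reads the factors as: family, head (leftmost, rightmost, mixed), shared-modifier attachment, conjunction attachment and punctuation attachment, with the Prague family allowing only the conjunction “as the head”. The 8 labeling variants are taken as three binary choices (upper/lower dependency relation, conjuncts annotated, shared modifiers annotated). This mapping of factors to parameters is an assumption inferred from the preceding slides.

```python
from itertools import product

HEADS = ["leftmost", "rightmost", "mixed"]
SHARED_MODIFIERS = ["below head", "below nearest conjunct"]
PUNCTUATION = ["previous", "following", "between"]
# Assumption: conjunction attachment depends on the family.
CONJUNCTION_BY_FAMILY = {
    "Prague": ["as head"],
    "Moscow": ["previous", "following", "between"],
    "Stanford": ["previous", "following", "between"],
}

topological = [
    (family, head, shared, conjunction, punctuation)
    for family, conjunction_options in CONJUNCTION_BY_FAMILY.items()
    for head, shared, conjunction, punctuation in product(
        HEADS, SHARED_MODIFIERS, conjunction_options, PUNCTUATION
    )
]

# Three binary labeling choices give 2 * 2 * 2 = 8 variants.
labeling = list(product(["upper", "lower"], [True, False], [True, False]))

print(len(topological))                  # 126
print(len(topological) * len(labeling))  # 1008
```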

(28)

28

Transformations of styles

Subtasks

1. Detect coordinations in a sentence (esp. boundaries of nested coordinations)

2. Classify participants of coordinations (conjuncts, commas, conjunctions, shared m.)

3. Transform each coordination to the target style (depth-first recursion, start with inner coord.)
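Step 3 can be sketched on a toy tree. The code below assumes steps 1 and 2 are already done: the input is in a Prague-like style where a conjunction node heads the coordination and conjuncts carry an `is_member` flag. It rewrites each coordination to a Stanford-like style with a leftmost head, handling inner coordinations first. The `Node` class, the function name and the attachment choices for the conjunction and the remaining children are illustrative assumptions, not the HamleDT implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    form: str
    is_conj: bool = False    # conjunction heading a coordination (Prague-like input)
    is_member: bool = False  # True if the node is a conjunct of its parent
    children: list = field(default_factory=list)

def to_stanford(node):
    """Return the subtree rewritten so that the first conjunct heads each coordination."""
    # Depth-first: inner coordinations are transformed before the outer one.
    kids = [to_stanford(kid) for kid in node.children]
    members = [kid for kid in kids if kid.is_member]
    if not (node.is_conj and members):
        node.children = kids
        return node
    others = [kid for kid in kids if not kid.is_member]  # delimiters, shared modifiers
    head, rest = members[0], members[1:]
    # The new head takes over the conjunction's role in any outer coordination.
    head.is_member, node.is_member = node.is_member, False
    node.children = []
    head.children = head.children + rest + [node] + others
    return head

# "John and Mary or Peter" as a nested coordination: (John and Mary) or Peter
inner = Node("and", is_conj=True, is_member=True,
             children=[Node("John", is_member=True), Node("Mary", is_member=True)])
outer = Node("or", is_conj=True, children=[inner, Node("Peter", is_member=True)])

root = to_stanford(outer)
print(root.form, [child.form for child in root.children])
# John ['Mary', 'and', 'Peter', 'or']
```

In this toy encoding both Mary and Peter end up directly below John, so the original nesting cannot be read off the output alone.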

(29)

29

Problematic cases

big and cheap apples and oranges

[Figure: the phrase annotated in the Prague and Moscow families]

(30)

30

Problematic cases

“Šetřete, netelefonujte, faxujte” (“Save money, don't phone, use fax.”), PDT 2.0

[Figure: three trees for the sentence, two of them labeled Prague and Moscow]

(31)

31

HamleDT v1.0 collection of treebanks

HArmonized Multi-LanguagE Dependency Treebank

http://ufal.mff.cuni.cz/hamledt/

Sources: CoNLL, ICON, other

We also tried to harmonize: prepositions, determiners, subordinate clauses, punctuation

We plan to harmonize: verb groups, tokenization, …

Recent “competitor”: Google Universal Treebanks

(32)

32

HamleDT v1.0 statistics

(33)

33

HamleDT v1.0

(34)

34

CoNLL (2006-2010)

(35)

35

Google Universal Treebank v1.0

(36)

36

Current / Future work

HamleDT 1.5 (29 languages, done)

HamleDT 2.0 (Rudolf Rosa, Jan Mašek)

More consistent, bigger, more languages (Hebrew, Polish, Korean, French, Northern Sami,...)

Stanford dependencies instead of Afun

English translations and alignments (Google Translate)

Experiments with parsers and learnability

Different styles may be better for different parsers.

[Diagram: Prague family (original treebank) → transform → Moscow family → train → “Moscow” parser → parse → “Moscow” test set → transform → “Prague” test set]

[Diagram, baseline: original treebank → train → baseline parser → parse → parsed test set]

Compare results

(37)

37

Thank you

Questions?
