
Growing Trees:

Non-Linear Incremental Parsing during Writing

Cerstin Mahlow

Institut für Deutsche Sprache
mahlow@ids-mannheim.de

Fred-Jelinek Seminar, Prague, April 4, 2016

Incremental Parsing

• Computer Science
  - ongoing process during coding
  - parsing of the structure so far
  - update the parse according to changes
• NLP
  - batch process
  - processing of text
  - parse a sentence word-by-word

Mahlow: Growing Trees. Fred-Jelinek Seminar, Prague, April 4, 2016 2/19


Writing

• non-linear creation of text
• differs from:
  - speaking
  - hearing
  - reading


Challenges in Live Processing of the Writing Process

• non-linear
• revisions
• big data
• media discontinuity
• author-dependent


Media Discontinuity

[Image slides; examples: James Joyce, Max Frisch, Barack Obama]

Big Data

• Recording of the writing process by keystroke logging
• Keystroke logs grow as we type
• Size of keystroke logs
  - Example: 2,130 data points for a 23-minute writing session (1.3 MB of XML data, 160 words of final text)
  - Snapshot: data for producing the German determiner “Die”
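To put the example figures in proportion, the following back-of-the-envelope calculation derives a few ratios from the numbers on the slide. It is only a sketch: it assumes 1 MB = 10^6 bytes, and the variable names are illustrative rather than taken from any logging tool.

```python
# Figures taken from the slide; 1 MB = 10**6 bytes is an assumption.
events = 2130            # keystroke-log data points
minutes = 23             # length of the writing session
log_bytes = 1.3 * 10**6  # size of the XML log
final_words = 160        # words in the final text

bytes_per_event = log_bytes / events
events_per_word = events / final_words
events_per_second = events / (minutes * 60)

print(f"{bytes_per_event:.0f} bytes of XML per logged event")
print(f"{events_per_word:.1f} logged events per word of final text")
print(f"{events_per_second:.2f} logged events per second")
```

Under these assumptions the log holds roughly 610 bytes of XML per event and about 13 events for every word that survives into the final text.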


Big Data (Producing “Die”)

[Figure: keystroke-log data for producing “Die”]

Example Data

• Writing session from a seminar
• German
• Students
• Argumentative essay:
  - 150 to 160 words
  - 25 minutes
  - free topic (here: “Should assisted suicide be legalized?”)
• Original research hypothesis:
  - Restrictions on time and size trigger extensive revisions.
  - (confirmed only for size)
• Keystroke logging with Inputlog (www.inputlog.net)


Revisions and Non-Linearity

• Deletions and insertions at any time at any location.
• S-notation for marking revisions.
• Snapshot: first part of the example text produced.

Die Legalisierun[h]1|1g [[akriver]2|2der aktiven]67|68{aktiver}68|69 Sterbehilfe ist ein sehr zwiespältiges [thema]3|3Themas. [Für eine Legalisierung]69|70{Dafür}70|71 [w[ur]4|4ürde [S]5|5sprechen]71|72{spr[öche]73|74{äche}74|75}72|73, dass es totkran[kenmen]6|6ken Menschen ein sanftes Ab[bl ben]7|7leben ermöglich{t}40|41 und [[unötiges]8|8w[ietere]9|9eiteres]76|77{weiteres}77|78 unnö[ä]10|10tiges Leid verhindert.
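As an illustration of how such a record can be read, here is a minimal sketch that recovers the final text from an S-notation string. It assumes the usual reading of the notation (`[text]i` is a deletion made at break i, `{text}i` an insertion made at break i, `|i` the position where writing was interrupted), that indices are plain digits directly after the marker, and that the text itself contains no brackets, braces or bars. The function name `final_text` is hypothetical.

```python
import re

# Break markers such as "|12", and revision indices after a closing bracket
# such as "]12" or "}12".
_BREAK = re.compile(r"\|\d+")
_INDEX = re.compile(r"([\]}])\d+")

def final_text(s_notation: str) -> str:
    """Return the text that remains once all revisions are applied (sketch)."""
    s = _BREAK.sub("", s_notation)   # drop break positions
    s = _INDEX.sub(r"\1", s)         # drop revision numbers, keep the bracket
    out = []
    deletion_depth = 0               # > 0 while inside at least one [...] span
    for ch in s:
        if ch == "[":
            deletion_depth += 1
        elif ch == "]":
            deletion_depth -= 1
        elif ch in "{}":
            continue                 # insertion markers go, their content stays
        elif deletion_depth == 0:
            out.append(ch)
    return "".join(out)

snippet = "Die Legalisierun[h]1|1g [akriver]2|2der aktiven Sterbehilfe"
print(final_text(snippet))  # Die Legalisierung der aktiven Sterbehilfe
```

Anything inside a deletion is discarded, including insertions nested in it, which matches the way later revisions in the snapshot remove whole earlier formulations.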


Approaches for Processing Writing Data

• During writing, a text passes through various states until the final text:
  - One state per keystroke-log event. (2,130)
  - One state per saved draft. (3)
  - One state per revision. (97)
  - One state per change in production mode. (140)
• A “state” is a “version”.
• Apply NLP to growing text/versions.
  - Show information on current word.
  - Show information on current sentence.
  - Show diffs of versions.
• Visualize the growing text.


Change in Production Mode

• Switch from:
  - normal text production to deletion or insertion
  - deletion to insertion or normal text production
  - insertion to deletion or normal text production
• “Normal text production” is typing at the edge of the text produced so far.

⇒ Operationalization of “version”:
  - The previous version can be accessed by executing undo.
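The definition above can be sketched as a small segmentation routine: classify every edit as normal production, insertion or deletion, and record a version whenever the class changes. The event format `(kind, position, character)` is a simplified, hypothetical stand-in and not the Inputlog format; the function name `versions` is likewise illustrative.

```python
from typing import Iterable, List, Tuple

Event = Tuple[str, int, str]  # (kind, position, character); hypothetical format

def versions(events: Iterable[Event]) -> List[str]:
    """Snapshot the text at every change in production mode (sketch)."""
    text = ""
    mode = None
    snapshots: List[str] = []
    for kind, pos, ch in events:
        if kind == "delete":
            new_mode = "deletion"
        elif pos == len(text):
            new_mode = "production"   # typing at the edge of the text
        else:
            new_mode = "insertion"    # typing inside existing text
        if mode is not None and new_mode != mode:
            snapshots.append(text)    # state just before the mode switch
        mode = new_mode
        if kind == "delete":
            text = text[:pos] + text[pos + 1:]
        else:
            text = text[:pos] + ch + text[pos:]
    snapshots.append(text)            # state after the last event
    return snapshots

typed = "Die Legalisierunh"
events = [("insert", i, c) for i, c in enumerate(typed)]
events += [("delete", len(typed) - 1, ""), ("insert", len(typed) - 1, "g")]
print(versions(events))
```

For this toy input the routine yields “Die Legalisierunh”, “Die Legalisierun” and “Die Legalisierung”: one version before the deletion, one after it, and the current text.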


Example Text (first 7 versions)

(explicit versions)

Die Legalisierunh
Die Legalisierun
Die Legalisierung akriver
Die Legalisierung
Die Legalisierung der aktiven Sterbehilfe ist ein sehr zwiespältiges thema
Die Legalisierung der aktiven Sterbehilfe ist ein sehr zwiespältiges
Die Legalisierung der aktiven Sterbehilfe ist ein sehr zwiespältiges Themas. Für eine Legalisierung wur

(S-notation)

Die Legalisierun[h]1|1g [akriver]2|2der aktiven Sterbehilfe ist ein sehr zwiespältiges [thema]3|3Themas. Für eine Legalisierung wur|4
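Once versions are available as plain strings, a word-level diff between any two of them is straightforward. The sketch below uses Python's standard `difflib.SequenceMatcher` on two of the explicit versions shown above (the one ending in “thema” and the last one). Splitting on whitespace is a simplifying assumption, and `word_diff` is a hypothetical helper name.

```python
from difflib import SequenceMatcher

def word_diff(old: str, new: str):
    """List the word-level changes between two versions (sketch)."""
    a, b = old.split(), new.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

earlier = "Die Legalisierung der aktiven Sterbehilfe ist ein sehr zwiespältiges thema"
later = ("Die Legalisierung der aktiven Sterbehilfe ist ein sehr zwiespältiges "
         "Themas. Für eine Legalisierung wur")
for change in word_diff(earlier, later):
    print(change)
```

The diff reports a single replacement of “thema” by the five new tokens. It says nothing about syntactic structure, which is the gap that diffing parse trees would have to fill.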

Visualize growing text

• Use versioning/diffing algorithms from document engineering.
  - rely on explicit structure.
  - focus on documents, not on text.
• Use NLP parsing.
  - implicit structure of natural language sentences.
  - sentences during writing are ill-formed, incomplete, inconsistent.
  - when to parse?
  - how to diff parse trees?
  - we need interactive parsing!
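One possible starting point for the question of diffing parse trees, offered only as a sketch and not as the approach taken in the talk, is to treat a dependency parse as a set of arcs and compare the sets of two versions. The arcs below are hand-written for illustration and are not the output of any parser; the labels and the names `Arc` and `diff_trees` are assumptions.

```python
from typing import NamedTuple, Set, Tuple

class Arc(NamedTuple):
    head: str       # word form of the head
    dependent: str  # word form of the dependent
    label: str      # dependency relation (illustrative labels)

def diff_trees(old: Set[Arc], new: Set[Arc]) -> Tuple[Set[Arc], Set[Arc], Set[Arc]]:
    """Return (kept, removed, added) arcs between two parses (sketch)."""
    return old & new, old - new, new - old

# Hand-made arcs for "Die Legalisierung" and
# "Die Legalisierung der aktiven Sterbehilfe".
old_parse = {Arc("Legalisierung", "Die", "det")}
new_parse = {
    Arc("Legalisierung", "Die", "det"),
    Arc("Legalisierung", "Sterbehilfe", "nmod"),
    Arc("Sterbehilfe", "der", "det"),
    Arc("Sterbehilfe", "aktiven", "amod"),
}

kept, removed, added = diff_trees(old_parse, new_parse)
print(len(kept), len(removed), len(added))
```

Identifying tokens by word form keeps arcs stable when text is inserted before them, but it breaks down as soon as a word occurs twice in a sentence; a fuller solution would need token identities that survive edits.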


Automatic Re-Parsing

• Avoid re-parsing untouched sentences.
• Avoid constant re-parsing (e.g., after each key pressed or every 10 seconds).
• Re-use parsed clauses/sentences that are moved around.
• Possibilities:
  - parse touched sentences when a version is saved (manually or automatically).
  - parse touched sentence after a change in production mode.
  - parse current sentence after a change in production mode.
  - parse everything after a change in production mode.
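The first and third goals above can be approximated with a cache keyed by sentence text, as in the following sketch. It assumes that sentence splitting happens elsewhere and that `parse_version` is called at whichever trigger is chosen (for example, a change in production mode). `SentenceParseCache` is a hypothetical class, and `str.split` merely stands in for a real parser.

```python
from typing import Callable, Dict, List

class SentenceParseCache:
    """Re-parse only sentences whose text changed (sketch)."""

    def __init__(self, parse: Callable[[str], object]) -> None:
        self._parse = parse
        self._cache: Dict[str, object] = {}
        self.parse_calls = 0

    def parse_version(self, sentences: List[str]) -> List[object]:
        results = []
        fresh: Dict[str, object] = {}
        for sentence in sentences:
            if sentence in self._cache:
                tree = self._cache[sentence]   # untouched or moved: reuse
            elif sentence in fresh:
                tree = fresh[sentence]         # repeated within this version
            else:
                tree = self._parse(sentence)   # touched: parse again
                self.parse_calls += 1
            fresh[sentence] = tree
            results.append(tree)
        self._cache = fresh  # forget sentences that no longer exist
        return results

cache = SentenceParseCache(parse=str.split)    # stand-in parser
cache.parse_version(["A b c.", "D e f."])
second = cache.parse_version(["D e f.", "A b x."])  # swapped, one edited
print(cache.parse_calls)
```

Because the key is the sentence text rather than its position, a sentence that is moved without being edited is served from the cache, while any edit, however small, triggers a fresh parse of that sentence only.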


Interactive NLP in Text Editors

• Emacs as test bed.
• Information on current word. (Stripey Zebra implementation in Malaga for German morphology)
• Information on POS structures (aka “syntax highlighting”). (MBT for German)
• Information on syntax structure of current version. (Mate for German)


Parsing Versions

[Figure slides; images not preserved]

Summary

• Non-linear incremental parsing as parsing of text during production.
  ⇒ Application of CS-style incremental parsing to natural language text “coding”.
• Static interactive NLP available on word and sentence level.


Open Challenges

• How to display parse trees to be useful for a writer?
• How to diff parse trees and display the diff?
• Apply structured editing.
• Use our version definition for scholarly editions, too.
