Growing Trees:
Non-Linear Incremental Parsing during Writing
Cerstin Mahlow
Institut für Deutsche Sprache mahlow@ids-mannheim.de
Fred-Jelinek Seminar, Prague, April 4, 2016
Incremental Parsing
- Computer Science
  - on-going process during coding
  - parsing of the structure so far
  - update the parse according to changes
- NLP
  - batch process
  - processing of text
  - parse a sentence word-by-word
Mahlow: Growing Trees. Fred-Jelinek Seminar, Prague, April 4, 2016 2/19
Writing
- non-linear creation of text
- differs from:
  - speaking
  - hearing
  - reading
Challenges in Live Processing of the Writing Process
- non-linear
- revisions
- big data
- media discontinuity
- author-dependent
Media Discontinuity
[Images: examples captioned (James Joyce), (Max Frisch), and (Barack Obama)]
Big Data
- Recording of the writing process by keystroke logging
- Keystroke logs grow as we type
- Size of keystroke logs
  - Example: 2,130 data points for a 23-minute writing session (1.3 MB of XML data, 160 words of final text)
  - Snapshot: data for producing the German determiner “Die”
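To put the example figures in proportion, the following minimal sketch derives a few ratios from the numbers on the slide. It assumes 1 MB = 10^6 bytes; the variable names are illustrative only and are not taken from Inputlog.

```python
# Figures quoted on the slide for one writing session.
events = 2130            # logged data points
session_minutes = 23     # length of the session
log_bytes = 1.3e6        # assumption: 1.3 MB read as 1.3 * 10**6 bytes
final_words = 160        # words in the final text

# Derived ratios: how much log data stands behind the final text.
bytes_per_event = log_bytes / events
events_per_word = events / final_words
events_per_minute = events / session_minutes
bytes_per_word = log_bytes / final_words

print(f"{bytes_per_event:.0f} bytes of XML per logged event")
print(f"{events_per_word:.1f} logged events per word of final text")
print(f"{events_per_minute:.1f} logged events per minute")
print(f"{bytes_per_word:.0f} bytes of XML per word of final text")
```

Under these assumptions, each word of the final text is backed by roughly a dozen logged events and several kilobytes of XML, which is why the log is treated as a data-volume problem even for a short essay.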
Big Data
(Producing “Die”)
Example Data
- Writing session from a seminar
- German
- Students
- Argumentative essay:
  - 150 to 160 words
  - 25 minutes
  - free topic (here: “Should assisted suicide be legalized?”)
- Original research hypothesis:
  - Restrictions on time and size trigger extensive revisions.
  - (confirmed only for size)
- Keystroke logging with Inputlog (www.inputlog.net)
Revisions and Non-Linearity
- Deletions and insertions at any time at any location.
- S-notation for marking revisions.
- Snapshot: first part of the example text produced.
Die Legalisierun[h]1|1g [[akriver]2|2der aktiven]67|68{aktiver}68|69 Sterbehilfe ist ein sehr zwiespältiges [thema]3|3Themas. [Für eine Legalisierung]69|70{Dafür}70|71 [w[ur]4|4ürde [S]5|5sprechen]71|72{spr[öche]73|74{äche}74|75}72|73,
dass es totkran[kenmen]6|6ken Menschen ein sanftes Ab[bl ben]7|7le ben ermöglich{t}40|41 und [[unötiges]8|8w[ietere]9|9eiteres]76|77{weiteres}77|78 unnö[ä]10|10tiges
Leid verhindert.
Approaches for Processing Writing Data
- During writing, a text passes through various states before the final text:
  - One state per keystroke-log event (2,130)
  - One state per saved draft (3)
  - One state per revision (97)
  - One state per change in production mode (140)
- A “state” is a “version”.
- Apply NLP to the growing text/versions.
  - Show information on the current word.
  - Show information on the current sentence.
  - Show diffs of versions.
- Visualize the growth of the text.
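As an illustration of "show diffs of versions", here is a minimal character-level sketch using Python's standard `difflib` module. The function name `describe_diff` is hypothetical, and this is only one possible way to compare two versions, not the method used in the talk.

```python
from difflib import SequenceMatcher


def describe_diff(old: str, new: str) -> list:
    """Return the non-equal edit operations that turn `old` into `new`.

    Each entry is (tag, removed_text, added_text), where tag is one of
    "delete", "insert", or "replace" as reported by SequenceMatcher.
    """
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, old[i1:i2], new[j1:j2]))
    return changes


# Consecutive versions taken from the example text in this talk.
v1 = "Die Legalisierunh"
v2 = "Die Legalisierun"
v3 = "Die Legalisierung akriver"

for before, after in [(v1, v2), (v2, v3)]:
    print(describe_diff(before, after))
```

A character diff like this works on the surface string only; it says nothing about syntactic structure, which is why diffing parse trees is treated as a separate question.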
Change in Production Mode
- Switch from:
  - normal text production to deletion or insertion
  - deletion to insertion or normal text production
  - insertion to deletion or normal text production
- “Normal text production” is typing at the edge of the text produced so far.
→ Operationalization of “version”:
  - The previous version can be accessed by executing undo.
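The definition above can be sketched as a small replay loop: classify each logged event as normal production, insertion, or deletion, and record a version whenever the mode changes. The `Event` model, the helper names, and the single-character granularity are assumptions made for illustration; they do not reflect Inputlog's actual log format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str       # "insert" or "delete" (hypothetical event model)
    pos: int        # character offset in the text before the event
    char: str = ""  # inserted character; unused for deletions


def mode_of(event: Event, text: str) -> str:
    """Classify an event relative to the text produced so far."""
    if event.kind == "delete":
        return "deletion"
    # Typing at the edge of the text is normal production.
    return "normal" if event.pos == len(text) else "insertion"


def apply(event: Event, text: str) -> str:
    if event.kind == "delete":
        return text[:event.pos] + text[event.pos + 1:]
    return text[:event.pos] + event.char + text[event.pos:]


def versions(events: list) -> list:
    """Record the text each time the production mode changes."""
    text, mode, recorded = "", None, []
    for event in events:
        new_mode = mode_of(event, text)
        if mode is not None and new_mode != mode:
            recorded.append(text)  # state just before the switch
        mode = new_mode
        text = apply(event, text)
    recorded.append(text)  # current, still growing state
    return recorded


def typed(start: int, s: str) -> list:
    """Events for typing `s` character by character from offset `start`."""
    return [Event("insert", start + i, c) for i, c in enumerate(s)]


def backspaces(end: int, n: int) -> list:
    """Events for deleting `n` characters leftwards from offset `end`."""
    return [Event("delete", end - 1 - i) for i in range(n)]


# Replay of the beginning of the example text.
first = "Die Legalisierunh"
events = typed(0, first)
events += backspaces(len(first), 1)                    # remove "h"
events += typed(len(first) - 1, "g akriver")
events += backspaces(len("Die Legalisierung akriver"), len("akriver"))
events += typed(len("Die Legalisierung "), "der")

for number, version in enumerate(versions(events), start=1):
    print(number, repr(version))
```

In this sketch a version boundary coincides with the point an undo command would return to, which mirrors the operationalization given on the slide.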
Example Text (first 7 versions)
(explicit versions)
1. Die Legalisierunh
2. Die Legalisierun
3. Die Legalisierung akriver
4. Die Legalisierung
5. Die Legalisierung der aktiven Sterbehilfe ist ein sehr zwiespältiges thema
6. Die Legalisierung der aktiven Sterbehilfe ist ein sehr zwiespältiges
7. Die Legalisierung der aktiven Sterbehilfe ist ein sehr zwiespältiges Themas. Für eine Legalisierung wur
(S-notation)
Die Legalisierun[h]1|1g [akriver]2|2der aktiven Sterbehilfe ist ein sehr zwiespältiges [thema]3|3Themas. Für eine Legalisierung wur|4
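The relation between the S-notation and the resulting text can be made concrete with a small sketch that strips the markup: deleted spans `[...]n` are dropped, inserted spans `{...}n` are kept, and break markers `|n` are removed. It assumes the flattened plain-text form used here, in which each index is a run of digits directly after `]`, `}`, or `|`; the function name `final_text` is hypothetical, and text that itself contains digits or brackets would need a richer encoding.

```python
def final_text(s_notation: str) -> str:
    """Return the text that remains after applying all marked revisions."""
    out = []
    stack = []          # currently open brackets, innermost last
    deleted_depth = 0   # > 0 while inside at least one [...] deletion
    i = 0
    while i < len(s_notation):
        c = s_notation[i]
        if c in "[{":
            stack.append(c)
            if c == "[":
                deleted_depth += 1
            i += 1
        elif c in "]}":
            expected = "[" if c == "]" else "{"
            if not stack or stack.pop() != expected:
                raise ValueError(f"unbalanced {c!r} at offset {i}")
            if c == "]":
                deleted_depth -= 1
            i += 1
            while i < len(s_notation) and s_notation[i].isdigit():
                i += 1  # skip the revision index
        elif c == "|":
            i += 1
            while i < len(s_notation) and s_notation[i].isdigit():
                i += 1  # skip the break index
        else:
            if deleted_depth == 0:
                out.append(c)  # keep text outside deletions
            i += 1
    if stack:
        raise ValueError("unclosed bracket in S-notation")
    return "".join(out)


snapshot = (
    "Die Legalisierun[h]1|1g [akriver]2|2der aktiven Sterbehilfe ist ein "
    "sehr zwiespältiges [thema]3|3Themas. Für eine Legalisierung wur|4"
)
print(final_text(snapshot))
```

Applied to the snapshot, the result is the text of version 7; nested revisions such as an insertion that itself contains a deletion are handled by the bracket stack.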
Visualize growing text
- Use versioning/diffing algorithms from document engineering.
  - They rely on explicit structure.
  - They focus on documents, not on text.
- Use NLP parsing.
  - Natural-language sentences have implicit structure.
  - Sentences during writing are ill-formed, incomplete, inconsistent.
  - When to parse?
  - How to diff parse trees?
  - We need interactive parsing!
Automatic Re-Parsing
- Avoid re-parsing untouched sentences.
- Avoid constant re-parsing (e.g., after each key pressed or every 10 seconds).
- Re-use parsed clauses/sentences that are moved around.
- Possibilities:
  - parse touched sentences when a version is saved (manually or automatically).
  - parse touched sentence after a change in production mode.
  - parse current sentence after a change in production mode.
  - parse everything after a change in production mode.
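A minimal sketch of the first and third goals: cache parses by sentence string, so that untouched sentences and sentences that were merely moved are not parsed again when a new version arrives. The class `ParseCache`, the naive regex sentence splitter, and the stand-in `toy_parse` function are all assumptions for illustration; a real setup would call an actual parser in place of `toy_parse`.

```python
import re
from typing import Callable


def split_sentences(text: str) -> list:
    """Naive splitter: a sentence ends at '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


class ParseCache:
    """Parse each distinct sentence once and re-use the result afterwards."""

    def __init__(self, parse: Callable):
        self.parse = parse
        self.cache = {}
        self.calls = 0  # number of real parser invocations

    def parse_version(self, text: str) -> list:
        parses = []
        for sentence in split_sentences(text):
            if sentence not in self.cache:   # touched or new sentence
                self.cache[sentence] = self.parse(sentence)
                self.calls += 1
            parses.append(self.cache[sentence])
        return parses


def toy_parse(sentence: str) -> tuple:
    """Stand-in for a real parser: returns the whitespace tokens."""
    return tuple(sentence.split())


cache = ParseCache(toy_parse)
v1 = "Das Thema ist zwiespältig. Dafür spricht einiges."
v2 = "Das Thema ist sehr zwiespältig. Dafür spricht einiges."  # one sentence touched
v3 = "Dafür spricht einiges. Das Thema ist sehr zwiespältig."  # sentences moved

for version in (v1, v2, v3):
    cache.parse_version(version)
    print(cache.calls)
```

The trigger for calling `parse_version` is left open here; it could be a saved version or a change in production mode, as listed among the possibilities above.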
Interactive NLP in Text Editors
- Emacs as test bed.
- Information on the current word (Stripey Zebra implementation in Malaga for German morphology).
- Information on POS structures, aka “syntax highlighting” (MBT for German).
- Information on the syntax structure of the current version (Mate for German).
Parsing Versions
[Figure slides; images not preserved]
Summary
- Non-linear incremental parsing as parsing of text during production.
→ Application of CS-style incremental parsing to natural language text “coding”.
- Static interactive NLP available on word and sentence level.
Open Challenges
- How to display parse trees so that they are useful for a writer?
- How to diff parse trees and display the diff?
- Apply structured editing.
- Use our version definition for scholarly editions, too.