Functional Arabic Morphology
Principles of Design
Otakar Smrž
Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics
Charles University in Prague
Prague, November 6, 2006
Introduction
He will notify them about that through SMS messages, the Internet, and other means.
سيخبرهم بذلك عن طريق الرسائل القصيرة والإنترنت وغيرها.
String      Token Tag    Buckwalter’s M-Tags                   Token Form       Token Gloss
سيخبرهم     F---         FUT                                   sa-              will
            VIIA-3MS--   IV3MS+IV+IVSUFF_MOOD:I                yu-ḫbir-u        he-notify
            S----3MP4-   IVSUFF_DO:3MP                         -hum             them
بذلك        P---         PREP                                  bi-              about/by
            SD----MS--   DEM_PRON_MS                           ḏālika           that
عن          P---         PREP                                  ʿan              by/about
طريق        N---2R       NOUN+CASE_DEF_GEN                     ṭarīq-i          way-of
الرسائل     N---2D       DET+NOUN+CASE_DEF_GEN                 ar-rasāʾil-i     the-messages
القصيرة     A---FS2D     DET+ADJ+NSUFF_FEM_SG+CASE_DEF_GEN     al-qaṣīr-at-i    the-short
والإنترنت   C---         CONJ                                  wa-              and
            Z---2D       DET+NOUN_PROP+CASE_DEF_GEN            al-ʾinternet-i   the-internet
وغيرها      C---         CONJ                                  wa-              and
Outline
1 Introduction
2 Morphological Theory
    Incremental vs. Realizational
    Lexical vs. Inferential
    Functional Arabic Morphology
3 Implementation Design
    ElixirFM
    Paradigms, parameters, ...
    Elixir Lexicon
    FM Generic
4 Extensions
    Encode Arabic
    MorphoTrees
Morphological Theory
Inflectional Morphology
Morphological theories can be classified along two dimensions (Stump 2001).
lexical: association of a word’s morphosyntactic properties with affixes
inferential: inflection is a result of operations on lexemes; morphosyntactic properties are expressed by the rules that relate the form in a given paradigm to the lexeme
incremental: words acquire morphosyntactic properties only in connection with acquiring the inflectional exponents of those properties
realizational: association of a set of properties with a word licenses the introduction of the exponents into the word’s morphology
Morphological Theory Incremental vs. Realizational
Extended Exponence
The morphosyntactic properties associated with an inflected word may exhibit extended exponence in that word’s morphology. (Stump 2001:4)

String      Token Tag    Buckwalter’s M-Tags                   Token Form       Token Gloss
سيخبرهم     F---         FUT                                   sa-              will
            VIIA-3MS--   IV3MS+IV+IVSUFF_MOOD:I                yu-ḫbir-u        he-notify
            S----3MP4-   IVSUFF_DO:3MP                         -hum             them
Underdetermination
The morphosyntactic properties associated with an inflected word’s individual inflectional markings may underdetermine the properties associated with the word as a whole. (Stump 2001:7)

String      Token Tag    Buckwalter’s M-Tags                   Token Form       Token Gloss
طريق        N---2R       NOUN+CASE_DEF_GEN                     ṭarīq-i          way-of
الرسائل     N---2D       DET+NOUN+CASE_DEF_GEN                 ar-rasāʾil-i     the-messages
القصيرة     A---FS2D     DET+ADJ+NSUFF_FEM_SG+CASE_DEF_GEN     al-qaṣīr-at-i    the-short

Alternative token tags for the two nouns, with gender and number filled in:

                 ṭarīq-i      ar-rasāʾil-i
Variant 1        N---FS2R     N---FS2D
Variant 2        N---2R       N---FP2D
Morphological Theory Lexical vs. Inferential
Nonconcatenative Inflection
There is no theoretically significant difference between concatenative and nonconcatenative inflection. (Stump 2001:9)

String      Token Tag    Buckwalter’s M-Tags                   Token Form       Token Gloss
أخبر                                                           ʾaḫbar-a         to notify
سيخبرهم     F---         FUT                                   sa-              will
            VIIA-3MS--   IV3MS+IV+IVSUFF_MOOD:I                yu-ḫbir-u        he-notify
            S----3MP4-   IVSUFF_DO:3MP                         -hum             them
رسالة                                                          risāl-at-un      a message
الرسائل     N---2D       DET+NOUN+CASE_DEF_GEN                 ar-rasāʾil-i     the-messages
Unmotivated Choice
Exponence is the only association between inflectional markings and morphosyntactic properties. (Stump 2001:11)
    IV3MS+IV+IVSUFF_MOOD:I ?? IV3MS+IV+IVSUFF_MOOD:I    yu-ḫbir-u
An uncompounded word’s morphological form is not distinct from its phonological form. (Stump 2001:12)
    DET+ADJ+NSUFF_FEM_SG+CASE_DEF_GEN    (al-(qaṣīr-at))-i ?? ((al-qaṣīr)-at)-i
Morphological Theory Functional Arabic Morphology
Functional Arabic Morphology
Most computational models of Arabic morphology are lexical in nature. As they are not designed in connection with any syntax–morphology interface, their interpretation is destined to be incremental.
Functional Arabic Morphology endorses the inferential–realizational views.
It re-establishes the system of inflectional and inherent morphosyntactic properties and distinguishes precisely the senses of their use in the grammar.
Definition of lexemes can include the derivational root and pattern information if appropriate. Modeling of the written language as well as spoken dialects is expected to be methodologically identical.
Implementation Design ElixirFM
ElixirFM
ElixirFM is a high-level implementation of Functional Arabic Morphology.
ElixirFM uses the Functional Morphology library for Haskell and extends it.
Morphology is modeled in terms of paradigms, grammatical categories, lexemes and word classes. The computation of analysis or generation is conceptually distinguished from the general-purpose linguistic model.
The lexicon of ElixirFM is derived from the open-source Buckwalter lexicon and from the PADT annotations. It is redesigned in important respects.
Implementation Design Paradigms, parameters, . . .
Paradigms
A paradigm function is a function which, when applied to the root of a lexeme L paired with a set of morphosyntactic properties appropriate to L, determines the word form occupying the corresponding cell in L’s paradigm. (Stump 2001:32)

    -- uncurried: the lexeme and its properties are passed as a pair
    paradigm :: (Lexeme, Properties) -> WordForm
    paradigm (l, ps) = ...

    -- curried: three equivalent ways of writing the same definition
    paradigm' :: Lexeme -> Properties -> WordForm
    paradigm' l ps = paradigm (l, ps)
    paradigm' l ps = (curry paradigm) l ps
    paradigm' = curry paradigm

    -- type of the Prelude function used above
    curry :: ((a, b) -> c) -> a -> b -> c
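The relation between the two forms of the paradigm function can be tried out on a toy example. The sketch below is only an illustration under stated assumptions: the types `Case` and `Definiteness`, the stem `qalam`, and the simple affix rules are hypothetical and far smaller than ElixirFM's actual model; only `curry` comes from the Prelude.

```haskell
-- Hypothetical toy types for illustration; not ElixirFM's actual model.
data Case = Nominative | Genitive | Accusative deriving (Eq, Show)
data Definiteness = Indefinite | Definite deriving (Eq, Show)

newtype Lexeme = Lexeme String deriving (Eq, Show)  -- a bare stem
type Properties = (Case, Definiteness)
type WordForm = String

-- Uncurried form: a lexeme paired with a property set determines a cell.
paradigm :: (Lexeme, Properties) -> WordForm
paradigm (Lexeme stem, (c, d)) =
    article d ++ stem ++ "-" ++ vowel c ++ nunation d
  where
    article Definite   = "al-"  -- assimilation of the article is ignored here
    article Indefinite = ""
    vowel Nominative = "u"
    vowel Genitive   = "i"
    vowel Accusative = "a"
    nunation Definite   = ""
    nunation Indefinite = "n"

-- Curried form, obtained with the Prelude's curry.
paradigm' :: Lexeme -> Properties -> WordForm
paradigm' = curry paradigm

main :: IO ()
main = do
  let qalam = Lexeme "qalam"
  putStrLn (paradigm (qalam, (Nominative, Indefinite)))
  putStrLn (paradigm' qalam (Genitive, Definite))
  -- Partial application gives the paradigm of one lexeme as a function.
  mapM_ (putStrLn . paradigm' qalam)
        [(c, Definite) | c <- [Nominative, Genitive, Accusative]]
```

The curried form is convenient because `paradigm' qalam` is itself a function from property sets to word forms, that is, the paradigm of a single lexeme.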
Parameters
Instead of feature–value pairs for encoding the morphosyntactic properties (Stump 2001), we use enumerated values of distinct types. The use of data types is essential in the system (Forsberg and Ranta 2004).

    data Person = First | Second | Third
        deriving (Eq, Enum)

    data Mood = Indicative | Subjunctive | Jussive | Energetic
        deriving (Eq, Show, Enum)

    data ParaVerb = VerbP Voice Person Gender Number
                  | VerbI Mood Voice Person Gender Number
                  | VerbC Gender Number
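One benefit of enumerated types is that the cells of a paradigm can be generated mechanically. The following sketch assumes definitions that the slides do not show: the types `Voice`, `Gender`, and `Number` with the constructors given here, the extra `Show` and `Bounded` instances, and the helper `values` are all hypothetical. It also reads `VerbP`, `VerbI`, and `VerbC` as perfective, imperfective, and imperative parameters, which is an assumption.

```haskell
-- Person and Mood follow the slides; Show and Bounded are added here.
data Person = First | Second | Third
  deriving (Eq, Show, Enum, Bounded)
data Mood = Indicative | Subjunctive | Jussive | Energetic
  deriving (Eq, Show, Enum, Bounded)

-- Hypothetical definitions of the types the slides leave out.
data Voice  = Active | Passive           deriving (Eq, Show, Enum, Bounded)
data Gender = Masculine | Feminine       deriving (Eq, Show, Enum, Bounded)
data Number = Singular | Dual | Plural   deriving (Eq, Show, Enum, Bounded)

data ParaVerb = VerbP Voice Person Gender Number
              | VerbI Mood Voice Person Gender Number
              | VerbC Gender Number
  deriving (Eq, Show)

-- All values of an enumerated, bounded type.
values :: (Enum a, Bounded a) => [a]
values = [minBound .. maxBound]

-- Every combination of parameter values, one list per constructor.
perfectCells, imperfectCells, imperativeCells :: [ParaVerb]
perfectCells    = VerbP <$> values <*> values <*> values <*> values
imperfectCells  = VerbI <$> values <*> values <*> values <*> values <*> values
imperativeCells = VerbC <$> values <*> values

main :: IO ()
main = do
  print (length perfectCells)
  print (length imperfectCells)
  print (length imperativeCells)
  print (head imperfectCells)
```

The plain cross product overgenerates with respect to the real Arabic verbal paradigm (for instance, first-person forms do not distinguish gender), so a realistic model would filter or restructure these combinations.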
Implementation Design Elixir Lexicon
Elixir Lexicon
(a) representation of the linguistic data in an abstract and extensible notation that encodes both orthography and phonology, and whose interpretation is customizable
(b) organization of the lexicon so that there is preferably no duplication of information and so that the lexicon can possibly be divided into separate units, as well as be interlinked with external modules
(c) definition of such a format of the lexicon that editing and understanding the data is not inappropriately difficult, and use of such data markup whose syntax is either lightweight, or can be edited/verified with some automatic tools, or both
Implementation Design FM Generic
FM Generic
The linguistic model and the data of the lexicon can be compiled into run-time applications or used as standalone libraries and resources.
FM Generic implements the compilation of morphological analyzers and generators (Forsberg and Ranta 2004). The method used for analysis is deterministic parsing with tries (Ljunglöf 2002).
FM Generic also provides functions for exporting and pretty-printing the linguistic model into XFST, Lexc, SQL, XML, LaTeX, ...
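To make the idea of analysis with tries concrete, here is a minimal sketch of a character trie that maps an input string to its stored analyses. It is not FM Generic's implementation: the `Trie` type, the functions `insertForm`, `analyze`, and `fromForms`, and the three sample entries (unvocalized strings paired with a vocalized form and a tag) are assumptions made for illustration. `Data.Map` comes from the containers package shipped with GHC.

```haskell
import qualified Data.Map as Map

-- A trie node: analyses for the string ending here, plus children by character.
data Trie a = Trie [a] (Map.Map Char (Trie a))

emptyTrie :: Trie a
emptyTrie = Trie [] Map.empty

-- Store one analysis under the given string.
insertForm :: String -> a -> Trie a -> Trie a
insertForm []       x (Trie xs kids) = Trie (x : xs) kids
insertForm (c : cs) x (Trie xs kids) =
    Trie xs (Map.insert c (insertForm cs x child) kids)
  where child = Map.findWithDefault emptyTrie c kids

-- Follow the input one character at a time; each step takes at most one
-- edge, so the traversal is deterministic.
analyze :: String -> Trie a -> [a]
analyze []       (Trie xs _)   = xs
analyze (c : cs) (Trie _ kids) = maybe [] (analyze cs) (Map.lookup c kids)

fromForms :: [(String, a)] -> Trie a
fromForms = foldr (uncurry insertForm) emptyTrie

-- Hypothetical sample lexicon: unvocalized string -> (vocalized form, tag).
lexicon :: Trie (String, String)
lexicon = fromForms
  [ ("|ly", ("|liy~+u", "A---1R"))
  , ("|ly", ("|liy~+i", "A---2R"))
  , ("IlY", ("IilaY",   "P---"))
  ]

main :: IO ()
main = do
  print (analyze "|ly" lexicon)
  print (analyze "IlY" lexicon)
  print (analyze "xyz" lexicon)
```

Lookup time depends on the length of the input string rather than on the size of the lexicon, which is one reason tries suit full-form analyzers.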
Extensions Encode Arabic
Buckwalter Transliteration
يُولَدُ جَمِيعُ ٱلنَّاسِ أَحْرَارًا مُتَسَاوِينَ فِي ٱلْكَرَامَةِ وَٱلْحُقُوقِ. وَقَدْ وُهِبُوا عَقْلًا وَضَمِيرًا وَعَلَيْهِمْ أَنْ يُعَامِلَ بَعْضُهُمْ بَعْضًا بِرُوحِ ٱلْإِخَاءِ.
yuwladu jamiyEu {ln~aAsi OaHoraArFA mutasaAwiyna fiy {lokaraAmapi wa{loHuquwqi. waqado wuhibuwA EaqolAF waDamiyrFA waEalayohimo Oano yuEaAmila baEoDuhumo baEoDFA biruwHi {loIixaA'i.
يولد جميع الناس أحرارا متساوين في الكرامة والحقوق. وقد وهبوا عقلا وضميرا وعليهم أن يعامل بعضهم بعضا بروح الإخاء.
ywld jmyE AlnAs OHrArA mtsAwyn fy AlkrAmp wAlHqwq. wqd
Notation of ArabTeX
يُولَدُ جَمِيعُ ٱلنَّاسِ أَحْرَارًا مُتَسَاوِينَ فِي ٱلْكَرَامَةِ وَٱلْحُقُوقِ. وَقَدْ وُهِبُوا عَقْلًا وَضَمِيرًا وَعَلَيْهِمْ أَنْ يُعَامِلَ بَعْضُهُمْ بَعْضًا بِرُوحِ ٱلْإِخَاءِ.
يولد جميع الناس أحرارا متساوين في الكرامة والحقوق. وقد وهبوا عقلا وضميرا وعليهم أن يعامل بعضهم بعضا بروح الإخاء.
Yūladu ǧamīʿu ’n-nāsi ʾaḥrāran mutasāwīna fī ’l-karāmati wa-’l-ḥuqūqi. Wa-qad wuhibū ʿaqlan wa-ḍamīran wa-ʿalayhim ʾan yuʿāmila baʿḍuhum baʿḍan bi-rūḥi ’l-ʾiḫāʾi.
\cap yUladu ^gamI`u an-nAsi 'a.hrAraN mutasAwIna fI al-karAmaTi wa-al-.huqUqi.
Encode Arabic
biruwHi {loIixaA'i ← بِرُوحِ ٱلْإِخَاءِ ← bi-rU.hi al-'i_hA'i

Implemented in Perl and available on CPAN as Encode-Arabic:

    use Encode::Arabic;
    # two equivalent ways of writing the same call
    $encoded = encode "buckwalter", decode "arabtex", $decoded;
    $encoded = encode("buckwalter", decode("arabtex", $decoded));

Implemented in Haskell and available along with ElixirFM:

    -- three equivalent ways of writing the same definition
    encoded = encode Buckwalter $ decode ArabTeX decoded
    encoded = encode Buckwalter (decode ArabTeX decoded)
    encoded = (encode Buckwalter . decode ArabTeX) decoded
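The decode-then-encode pattern above can be illustrated with a self-contained sketch of a character-level mapping. This is not the Encode Arabic library: the names `buckwalterTable`, `decodeBW`, and `encodeBW` are hypothetical, and the table covers only the handful of Buckwalter symbols needed for the example phrase. ArabTeX notation is not handled, since it requires multi-character parsing.

```haskell
import qualified Data.Map as Map

-- A small, hypothetical subset of the Buckwalter table: ASCII -> Unicode.
buckwalterTable :: [(Char, Char)]
buckwalterTable =
  [ ('b', '\x0628'), ('r', '\x0631'), ('w', '\x0648'), ('H', '\x062D')
  , ('l', '\x0644'), ('x', '\x062E'), ('A', '\x0627'), ('I', '\x0625')
  , ('\'', '\x0621'), ('{', '\x0671')
  , ('a', '\x064E'), ('u', '\x064F'), ('i', '\x0650'), ('o', '\x0652')
  ]

toArabic, toAscii :: Map.Map Char Char
toArabic = Map.fromList buckwalterTable
toAscii  = Map.fromList [(u, a) | (a, u) <- buckwalterTable]

-- Decode Buckwalter ASCII into Unicode; unknown characters pass through.
decodeBW :: String -> String
decodeBW = map (\c -> Map.findWithDefault c c toArabic)

-- Encode Unicode back into Buckwalter ASCII; unknown characters pass through.
encodeBW :: String -> String
encodeBW = map (\c -> Map.findWithDefault c c toAscii)

main :: IO ()
main = do
  let phrase = "biruwHi {loIixaA'i"
  print (decodeBW phrase)                       -- shown with escaped code points
  print ((encodeBW . decodeBW) phrase == phrase)
```

Because Buckwalter transliteration is a one-to-one character mapping, encoding and decoding are simple inverses of each other, as the round trip in `main` checks for this phrase.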
Extensions MorphoTrees
Morphology Disambiguation
Arabic is a language of rich morphology, both derivational and inflectional, with highly ambiguous orthography.
Boundaries of syntactic units, tokens, are obscure in writing: orthographical words, strings, consist of up to four lexemes.
Disambiguation encompasses subproblems like tokenization, full morphological tagging or its simplified ‘part-of-speech’ versions, lemmatization, diacritization or restoration of the structural components of words, plus combinations thereof.
Linear Lists
Suppose you can list morphological analyses for a given input string...

Morphs        Form       Token Tag    Lemma    Glosses per Morph
|laY+(null)   ʾālā       VP-A-3MS--   ʾālā     promise/take an oath + he/it
|liy~         ʾālīy      A---         ʾālīy    mechanical/automatic
|liy~+u       ʾālīy-u    A---1R       ʾālīy    mechanical ... + [def.nom.]
|liy~+i       ʾālīy-i    A---2R       ʾālīy    mechanical ... + [def.gen.]
|liy~+a       ʾālīy-a    A---4R       ʾālīy    mechanical ... + [def.acc.]
|liy~+N       ʾālīy-un   A---1I       ʾālīy    mechanical ... + [indef.nom.]
|liy~+K       ʾālīy-in   A---2I       ʾālīy    mechanical ... + [indef.gen.]
|l+           ʾāl        N---R        ʾāl      family/clan +
+iy           -ī         S----1-S2-   ī        + my
IilaY         ʾilā       P---         ʾilā     to/towards
Iilay+        ʾilay      P---         ʾilā     to/towards +
+ya           -ya        S----1-S2-   ya       + me
MorphoTrees
... organize the analyses into a hierarchy with the string as its root and the full tokens as the leaves, grouped by their lemmas.
[Figure: MorphoTree for the input string AlY. Node labels recoverable from the figure: |lY, |ly, |l + y, IlY, Ily + y, and Oly, with tokens grouped under the lemmas ʾālā, ʾālīy, ʾāl, ī, ʾilā, ya, and waliya.]
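The step from a linear list of analyses to a hierarchy can be sketched as a grouping operation. The code below is a simplified illustration, not the MorphoTrees implementation: the `Token` and `MorphoTree` types and the functions `build`, `lemmas`, and `leaves` are hypothetical, the tree has only two levels (string, lemma, tokens), and lemma identifiers are written in Buckwalter notation purely for ASCII convenience.

```haskell
import qualified Data.Map as Map

-- A full token: its morphs, its tag, and the lemma it belongs to.
data Token = Token { morphs :: String, tag :: String, lemma :: String }
  deriving (Eq, Show)

-- The input string at the root, lemmas below it, full tokens as leaves.
data MorphoTree = MorphoTree String (Map.Map String [Token])
  deriving (Eq, Show)

-- Group a linear list of analyses by lemma, keeping the original order.
build :: String -> [Token] -> MorphoTree
build s ts =
  MorphoTree s (Map.fromListWith (flip (++)) [(lemma t, [t]) | t <- ts])

lemmas :: MorphoTree -> [String]
lemmas (MorphoTree _ m) = Map.keys m

leaves :: String -> MorphoTree -> [Token]
leaves l (MorphoTree _ m) = Map.findWithDefault [] l m

-- A few analyses of the string AlY, taken from the linear list.
analyses :: [Token]
analyses =
  [ Token "|laY+(null)" "VP-A-3MS--" "|laY"
  , Token "|liy~+u"     "A---1R"     "|liy~"
  , Token "|liy~+i"     "A---2R"     "|liy~"
  , Token "IilaY"       "P---"       "IilaY"
  ]

main :: IO ()
main = do
  let tree = build "AlY" analyses
  print (lemmas tree)
  mapM_ (\l -> print (l, map morphs (leaves l tree))) (lemmas tree)
```

With the analyses grouped this way, choosing a lemma first narrows the set of candidate tokens, so disambiguation can proceed level by level instead of scanning the whole list.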