
Functional Arabic Morphology

Principles of Design

Otakar Smrž

Institute of Formal and Applied Linguistics Faculty of Mathematics and Physics

Charles University in Prague

Prague, November 6, 2006

Introduction

He will notify them about that through SMS messages, the Internet, and other means.

سيخبرهم بذلك عن طريق الرسائل القصيرة والإنترنت وغيرها.

String      Token Tag    Buckwalter's M-Tags                 Token Form      Token Gloss
سيخبرهم    F---------   FUT                                 sa-             will
            VIIA-3MS--   IV3MS+IV+IVSUFF_MOOD:I              yu-ḫbir-u       he-notify
            S----3MP4-   IVSUFF_DO:3MP                       -hum            them
بذلك       P---------   PREP                                bi-             about/by
            SD----MS--   DEM_PRON_MS                         ḏālika          that
عن         P---------   PREP                                ʿan             by/about
طريق       N-------2R   NOUN+CASE_DEF_GEN                   ṭarīq-i         way-of
الرسائل    N-------2D   DET+NOUN+CASE_DEF_GEN               ar-rasāʾil-i    the-messages
القصيرة    A-----FS2D   DET+ADJ+NSUFF_FEM_SG+CASE_DEF_GEN   al-qaṣīr-at-i   the-short
والإنترنت  C---------   CONJ                                wa-             and
            Z-------2D   DET+NOUN_PROP+CASE_DEF_GEN          al-ʾinternet-i  the-internet
وغيرها     C---------   CONJ                                wa-             and
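The token tags in the second column are positional: each character slot encodes one category, and - marks a category that does not apply. The sketch below splits such a tag into named slots. The slot layout (part of speech, subcategory, mood, voice, an unidentified fifth slot, person, gender, number, case, state) is an assumption read off tags such as VIIA-3MS-- and S----3MP4- above, not taken from ElixirFM's sources; slotNames and decodeTag are hypothetical names.

```haskell
-- Hypothetical slot layout for a 10-character positional tag, inferred
-- from tags such as "VIIA-3MS--" and "S----3MP4-"; not ElixirFM's code.
slotNames :: [String]
slotNames =
  [ "part of speech", "subcategory", "mood", "voice", "slot 5"
  , "person", "gender", "number", "case", "state" ]

-- Pair each filled slot with its name, skipping '-' (not applicable).
-- Returns Nothing unless the tag has exactly one character per slot.
decodeTag :: String -> Maybe [(String, Char)]
decodeTag tag
  | length tag /= length slotNames = Nothing
  | otherwise = Just [ (name, c) | (name, c) <- zip slotNames tag, c /= '-' ]
```

In GHCi, decodeTag "S----3MP4-" pairs the pronoun's person 3, gender M, number P and case 4 (accusative, matching IVSUFF_DO) with their slot names, while a malformed tag such as "F---" yields Nothing.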


Outline

1 Introduction
2 Morphological Theory
    Incremental vs. Realizational
    Lexical vs. Inferential
    Functional Arabic Morphology
3 Implementation Design
    ElixirFM
    Paradigms, parameters, . . .
    Elixir Lexicon
    FM Generic
4 Extensions
    Encode Arabic
    MorphoTrees

Morphological Theory

Inflectional Morphology

Morphological theories can be classified along two dimensions (Stump 2001).

lexical: the association of a word's morphosyntactic properties with affixes
inferential: inflection is a result of operations on lexemes; morphosyntactic properties are expressed by the rules that relate the form in a given paradigm to the lexeme

incremental: words acquire morphosyntactic properties only in connection with acquiring the inflectional exponents of those properties
realizational: the association of a set of properties with a word licenses the introduction of the exponents into the word's morphology

Incremental vs. Realizational

Extended Exponence

The morphosyntactic properties associated with an inflected word may exhibit extended exponence in that word's morphology. (Stump 2001:4)

سيخبرهم    F---------   FUT                      sa-         will
            VIIA-3MS--   IV3MS+IV+IVSUFF_MOOD:I   yu-ḫbir-u   he-notify
            S----3MP4-   IVSUFF_DO:3MP            -hum        them

Underdetermination

The morphosyntactic properties associated with an inflected word's individual inflectional markings may underdetermine the properties associated with the word as a whole. (Stump 2001:7)

طريق      N-------2R   NOUN+CASE_DEF_GEN                   ṭarīq-i         way-of
الرسائل   N-------2D   DET+NOUN+CASE_DEF_GEN               ar-rasāʾil-i    the-messages
القصيرة   A-----FS2D   DET+ADJ+NSUFF_FEM_SG+CASE_DEF_GEN   al-qaṣīr-at-i   the-short

طريق      N-----FS2R   NOUN+CASE_DEF_GEN                   ṭarīq-i         way-of
الرسائل   N-----FS2D   DET+NOUN+CASE_DEF_GEN               ar-rasāʾil-i    the-messages
القصيرة   A-----FS2D   DET+ADJ+NSUFF_FEM_SG+CASE_DEF_GEN   al-qaṣīr-at-i   the-short

طريق      N-------2R   NOUN+CASE_DEF_GEN                   ṭarīq-i         way-of
الرسائل   N-----FP2D   DET+NOUN+CASE_DEF_GEN               ar-rasāʾil-i    the-messages
القصيرة   A-----FS2D   DET+ADJ+NSUFF_FEM_SG+CASE_DEF_GEN   al-qaṣīr-at-i   the-short

Lexical vs. Inferential

Nonconcatenative Inflection

There is no theoretically significant difference between concatenative and nonconcatenative inflection. (Stump 2001:9)

أخبر                                               ʾaḫbar-a       to notify
سيخبرهم    F---------   FUT                      sa-            will
            VIIA-3MS--   IV3MS+IV+IVSUFF_MOOD:I   yu-ḫbir-u      he-notify
            S----3MP4-   IVSUFF_DO:3MP            -hum           them
رسالة                                              risāl-at-un    a message
الرسائل    N-------2D   DET+NOUN+CASE_DEF_GEN    ar-rasāʾil-i   the-messages
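In the example above, risāl-at-un and its broken plural ar-rasāʾil-i share the root r-s-l and differ in the vowel pattern laid over it rather than in an added affix. An inferential treatment can model such a pattern as just another operation on the lexeme. The following is a minimal sketch under assumed conventions: patterns are hypothetical string templates in ArabTeX-like transliteration, with C marking a root-consonant slot; this is not ElixirFM's own pattern representation.

```haskell
-- Hypothetical root-and-pattern interdigitation. Each 'C' in the pattern
-- is filled by the next radical; all other characters are copied as-is.
-- Radicals are strings so that multi-letter ArabTeX symbols such as "_h"
-- occupy a single slot. Returns Nothing if slots and radicals mismatch.
interdigitate :: String -> [String] -> Maybe String
interdigitate ('C' : ps) (r : rs) = (r ++) <$> interdigitate ps rs
interdigitate ('C' : _)  []       = Nothing   -- too few radicals
interdigitate (p : ps)   rs       = (p :) <$> interdigitate ps rs
interdigitate []         []       = Just ""
interdigitate []         _        = Nothing   -- too many radicals
```

With the root ["r","s","l"], the pattern "CiCACaT" gives Just "risAlaT" and "CaCA'iC" gives Just "rasA'il"; with ["_h","b","r"], "'aCCaC" gives Just "'a_hbar". The singular, the plural and an affixed form are thus all produced by functions of the same kind.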

Unmotivated Choice

Exponence is the only association between inflectional markings and morphosyntactic properties. (Stump 2001:11)

IV3MS+IV+IVSUFF_MOOD:I ?? IV3MS+IV+IVSUFF_MOOD:I yu-ḫbir-u

An uncompounded word's morphological form is not distinct from its phonological form. (Stump 2001:12)

DET+ADJ+NSUFF_FEM_SG+CASE_DEF_GEN   (al-(qaṣīr-at))-i ?? ((al-qaṣīr)-at)-i
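The bracketing question can be made concrete by treating each exponent as a rule, that is, a function on stems. In the hypothetical sketch below (rule names and the hyphenated ArabTeX-like spelling are illustrative), two different orders of rule application yield the same string, so the phonological form alone does not reveal which morphological structure was built.

```haskell
-- Hypothetical realizational rules, each a function on a stem.
addDet, addFem, addGen :: String -> String
addDet s = "al-" ++ s   -- definite article
addFem s = s ++ "-at"   -- feminine singular suffix
addGen s = s ++ "-i"    -- definite genitive ending

-- (al-(qa.sIr-at))-i : feminine suffix first, then the article
bracketingA :: String
bracketingA = (addGen . addDet . addFem) "qa.sIr"

-- ((al-qa.sIr)-at)-i : article first, then the feminine suffix
bracketingB :: String
bracketingB = (addGen . addFem . addDet) "qa.sIr"
```

Both bracketingA and bracketingB evaluate to "al-qa.sIr-at-i"; only the composition order, which is part of the morphological description rather than of the string, tells them apart.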

Functional Arabic Morphology

Most computational models of Arabic morphology are lexical in nature. As they are not designed in connection with any syntax–morphology interface, their interpretation is destined to be incremental.

Functional Arabic Morphology endorses the inferential–realizational views.

It re-establishes the system of inflectional and inherent morphosyntactic properties and distinguishes precisely the senses of their use in the grammar.

The definition of lexemes can include derivational root and pattern information where appropriate. The written language and the spoken dialects are expected to be modeled with the same methodology.


Implementation Design

ElixirFM

ElixirFM is a high-level implementation of Functional Arabic Morphology.

ElixirFM uses the Functional Morphology library for Haskell and extends it.

Morphology is modeled in terms of paradigms, grammatical categories, lexemes and word classes. The computation of analysis or generation is conceptually distinguished from the general-purpose linguistic model.

The lexicon of ElixirFM is derived from the open-source Buckwalter lexicon and from the PADT annotations, but it is redesigned in important respects.

Paradigms

A paradigm function is a function which, when applied to the root of a lexeme L paired with a set of morphosyntactic properties appropriate to L, determines the word form occupying the corresponding cell in L's paradigm. (Stump 2001:32)

paradigm :: (Lexeme, Properties) -> WordForm
paradigm (l, ps) = ...

paradigm' :: Lexeme -> Properties -> WordForm
paradigm' l ps = paradigm (l, ps)

-- equivalently, via curry
paradigm' l ps = (curry paradigm) l ps
-- or, eta-reduced
paradigm' = curry paradigm

curry :: ((a, b) -> c) -> a -> b -> c
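To make the types concrete, here is a toy paradigm function under strong simplifying assumptions: Lexeme is just an imperfective stem, Properties is reduced to Mood, and only the active third person masculine singular forms of ʾaḫbara are produced. All type definitions here are hypothetical stand-ins for ElixirFM's richer types.

```haskell
-- Hypothetical stand-ins for the types on the slide.
newtype Lexeme = Lexeme String   -- here: only an imperfective stem
type WordForm = String

data Mood = Indicative | Subjunctive | Jussive | Energetic
  deriving (Eq, Show, Enum, Bounded)

type Properties = Mood           -- drastically simplified

-- A paradigm function restricted to active imperfective 3rd masc. sg.;
-- the mood selects the ending. ArabTeX-like transliteration.
paradigm :: (Lexeme, Properties) -> WordForm
paradigm (Lexeme stem, mood) = "yu" ++ stem ++ ending mood
  where
    ending :: Mood -> String
    ending Indicative  = "u"
    ending Subjunctive = "a"
    ending Jussive     = ""
    ending Energetic   = "anna"

-- The curried variant, defined exactly as on the slide.
paradigm' :: Lexeme -> Properties -> WordForm
paradigm' = curry paradigm
```

Because paradigm' is curry paradigm, partially applying it to a lexeme yields a function from properties to forms: map (paradigm' (Lexeme "_hbir")) [minBound .. maxBound] gives ["yu_hbiru","yu_hbira","yu_hbir","yu_hbiranna"].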

Parameters

Instead of feature–value pairs for encoding the morphosyntactic properties (Stump 2001), we use enumerated values of distinct types. The use of data types is essential in the system (Forsberg and Ranta 2004).

data Person = First | Second | Third
    deriving (Eq, Enum)

data Mood = Indicative | Subjunctive
          | Jussive | Energetic
    deriving (Eq, Show, Enum)

data ParaVerb = VerbP      Voice Person Gender Number
              | VerbI Mood Voice Person Gender Number
              | VerbC            Gender Number
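The slide leaves Voice, Gender and Number undefined. The sketch below completes the declarations under the assumption that they take the usual Arabic values, and adds Show and Bounded instances for the demonstration. VerbP, VerbI and VerbC presumably stand for the perfective, imperfective and command (imperative) paradigms, which would explain why VerbC carries neither mood nor person. The helper names every and imperfectiveCells are hypothetical.

```haskell
-- The slide's types, completed so that they compile. Voice, Gender and
-- Number are assumed definitions; Show and Bounded are added here.
data Person = First | Second | Third
  deriving (Eq, Show, Enum, Bounded)

data Mood = Indicative | Subjunctive | Jussive | Energetic
  deriving (Eq, Show, Enum, Bounded)

data Voice  = Active | Passive         deriving (Eq, Show, Enum, Bounded)
data Gender = Masculine | Feminine     deriving (Eq, Show, Enum, Bounded)
data Number = Singular | Dual | Plural deriving (Eq, Show, Enum, Bounded)

data ParaVerb = VerbP      Voice Person Gender Number
              | VerbI Mood Voice Person Gender Number
              | VerbC            Gender Number
  deriving (Eq, Show)

-- All values of an enumerated parameter type.
every :: (Enum a, Bounded a) => [a]
every = [minBound .. maxBound]

-- Every combination of parameter values for the imperfective.
imperfectiveCells :: [ParaVerb]
imperfectiveCells =
  [ VerbI m v p g n | m <- every, v <- every, p <- every, g <- every, n <- every ]
```

Here length imperfectiveCells is 144 (4 moods × 2 voices × 3 persons × 2 genders × 3 numbers). The naive product overgenerates: Arabic does not distinguish gender in the first person, for instance, so several of these cells share one form.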

Elixir Lexicon

(a) representation of the linguistic data in an abstract and extensible notation that encodes both orthography and phonology, and whose interpretation is customizable

(b) organization of the lexicon so that there is preferably no duplication of information, so that the lexicon can be divided into separate units, and so that it can be interlinked with external modules

(c) a definition of the lexicon format such that editing and understanding the data is not unduly difficult, using data markup whose syntax is either lightweight, or can be edited and verified with automatic tools, or both
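Goal (b) can be illustrated with a hypothetical root-nest layout in which the root is written once and shared by every lexeme derived from it. The types, field names and template notation below are assumptions made for this sketch; they are not ElixirFM's lexicon format.

```haskell
-- Hypothetical lexicon entry: a pattern template plus a gloss.
data Entry = Entry
  { template :: String   -- each 'C' is a slot for the next radical
  , gloss    :: String
  } deriving Show

-- A nest stores the root once for all of its entries.
data Nest = Nest
  { root    :: [String]  -- radicals in ArabTeX-like notation
  , entries :: [Entry]
  } deriving Show

-- Fill the template with the radicals. Assumes the template has exactly
-- as many 'C' slots as the root has radicals.
interdigitate :: String -> [String] -> String
interdigitate ('C' : ps) (r : rs) = r ++ interdigitate ps rs
interdigitate (p : ps)   rs       = p : interdigitate ps rs
interdigitate []         _        = ""

-- Expand a nest into (form, gloss) pairs, supplying the shared root.
expand :: Nest -> [(String, String)]
expand nest = [ (interdigitate (template e) (root nest), gloss e) | e <- entries nest ]

rsl :: Nest
rsl = Nest ["r", "s", "l"]
  [ Entry "CiCACaT" "message"
  , Entry "CaCUC"   "messenger"
  , Entry "'aCCaCa" "send"
  ]
```

Here expand rsl yields [("risAlaT","message"),("rasUl","messenger"),("'arsala","send")]; changing the root in one place would update all three lexemes.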

FM Generic

The linguistic model and the data of the lexicon can be compiled into runtime applications or used as standalone libraries and resources.

FM Generic implements the compilation of morphological analyzers and generators (Forsberg and Ranta 2004). The method used for analysis is deterministic parsing with tries (Ljunglöf 2002).

FM Generic also provides functions for exporting and pretty-printing the linguistic model into XFST, Lexc, SQL, XML, LaTeX, . . .
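The idea of compiling an analyzer from the generator's output can be sketched as a trie: every generated form is inserted character by character, and analysis follows one edge per input character, so lookup is deterministic and linear in the length of the input. This is a simplified illustration under assumed types, not FM's actual implementation; the (form, analysis) pairs in the usage note are hypothetical.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (foldl')

-- Each node holds the analyses of the form spelled by the path to it.
data Trie = Trie [String] (Map.Map Char Trie)

emptyTrie :: Trie
emptyTrie = Trie [] Map.empty

-- Insert one (form, analysis) pair produced by the generator.
insertForm :: String -> String -> Trie -> Trie
insertForm []       a (Trie found kids) = Trie (a : found) kids
insertForm (c : cs) a (Trie found kids) =
  let child = Map.findWithDefault emptyTrie c kids
  in  Trie found (Map.insert c (insertForm cs a child) kids)

-- Compile an analyzer from the generator's (form, analysis) pairs.
compile :: [(String, String)] -> Trie
compile = foldl' (\t (form, a) -> insertForm form a t) emptyTrie

-- Deterministic lookup: one edge per input character.
analyze :: Trie -> String -> [String]
analyze (Trie found _) []       = found
analyze (Trie _ kids)  (c : cs) = maybe [] (\t -> analyze t cs) (Map.lookup c kids)
```

With t = compile [("AlY", "|laY"), ("AlY", "IilaY"), ("Al", "|l")], analyze t "AlY" returns ["IilaY","|laY"] (most recently inserted first), and analyze t "A" returns [] because no generated form ends there.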


Extensions

Encode Arabic

Buckwalter Transliteration

يولد جميع الناس أحرارا متساوين في الكرامة والحقوق. وقد وهبوا عقلا وضميرا وعليهم أن يعامل بعضهم بعضا بروح الإخاء.

yuwladu jamiyEu {ln~aAsi OaHoraArFA mutasaAwiyna fiy {lokaraAmapi wa {loHuquwqi. waqado wuhibuwA EaqolAF waDamiyrFA waEalayohimo Oano yuEaAmila baEoDuhumo baEoDFA biruwHi {loIixaA'i.

ywld jmyE AlnAs OHrArA mtsAwyn fy AlkrAmp wAlHqwq. wqd

Notation of ArabTeX

يولد جميع الناس أحرارا متساوين في الكرامة والحقوق. وقد وهبوا عقلا وضميرا وعليهم أن يعامل بعضهم بعضا بروح الإخاء.

Yūladu ǧamīʿu ’n-nāsi ʾaḥrāran mutasāwīna fī ’l-karāmati wa-’l-ḥuqūqi. Wa-qad wuhibū ʿaqlan wa-ḍamīran wa-ʿalayhim ʾan yuʿāmila baʿḍuhum baʿḍan bi-rūḥi ’l-ʾiḫāʾi.

\cap yUladu ^gamI`u an-nAsi 'a.hrAraN mutasAwIna fI al-karAmaTi wa-al-.huqUqi.

Encode Arabic

biruwHi {loIixaA'i ← بروح الإخاء ← bi-rU.hi al-'i_hA'i

Implemented in Perl and available on CPAN as Encode-Arabic:

use Encode::Arabic;    # imports encode and decode, like 'use Encode'

$encoded = encode "buckwalter", decode "arabtex", $decoded;
$encoded = encode("buckwalter", decode("arabtex", $decoded));

Implemented in Haskell and available along with ElixirFM:

-- three equivalent ways of writing the same conversion
encoded = encode Buckwalter $ decode ArabTeX decoded
encoded = encode Buckwalter (decode ArabTeX decoded)
encoded = (encode Buckwalter . decode ArabTeX) decoded
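Buckwalter transliteration maps each Arabic letter or diacritic to one ASCII character, so its decoding can be sketched as a character table. The partial table below is an illustration only, not the Encode Arabic implementation; it includes O and I, the variants for hamza above and below alif that appear in the Buckwalter transliteration shown earlier. ArabTeX notation, with multi-letter symbols such as _h and context-dependent rules, would need a real parser and is not covered; buckwalter and decodeBuckwalter are hypothetical names.

```haskell
import qualified Data.Map as Map
import Data.Maybe (fromMaybe)

-- A partial Buckwalter-to-Unicode table, for illustration only.
buckwalter :: Map.Map Char Char
buckwalter = Map.fromList
  [ ('A', '\x0627'), ('b', '\x0628'), ('t', '\x062A'), ('H', '\x062D')
  , ('x', '\x062E'), ('r', '\x0631'), ('s', '\x0633'), ('l', '\x0644')
  , ('w', '\x0648'), ('y', '\x064A'), ('O', '\x0623'), ('I', '\x0625')
  , ('\'', '\x0621'), ('{', '\x0671')                -- hamza, alif wasla
  , ('a', '\x064E'), ('u', '\x064F'), ('i', '\x0650'), ('o', '\x0652')
  ]

-- Decode one character at a time; characters outside the table
-- (spaces, punctuation) pass through unchanged.
decodeBuckwalter :: String -> String
decodeBuckwalter = map (\c -> fromMaybe c (Map.lookup c buckwalter))
```

For example, decodeBuckwalter "biruwHi {loIixaA'i" produces the vocalized Arabic of bi-rūḥi ’l-ʾiḫāʾi, letter and diacritic for letter and diacritic.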


MorphoTrees

Morphology Disambiguation

Arabic is a language of rich morphology, both derivational and inflectional, with highly ambiguous orthography.

Boundaries of syntactic units (tokens) are obscured in writing: orthographic words (strings) can consist of up to four lexemes.

Disambiguation encompasses subproblems like tokenization, full morphological tagging or its simplified 'part-of-speech' versions, lemmatization, diacritization or restoration of the structural components of words, plus combinations thereof.

Linear Lists

Suppose you can list morphological analyses for a given input string. . .

Morphs        Form      Token Tag    Lemma    Glosses per Morph
|laY+(null)   ʾālā      VP-A-3MS--   ʾālā     promise/take an oath + he/it
|liy~         ʾālīy     A---------   ʾālīy    mechanical/automatic
|liy~+u       ʾālīy-u   A-------1R   ʾālīy    mechanical . . . + [def.nom.]
|liy~+i       ʾālīy-i   A-------2R   ʾālīy    mechanical . . . + [def.gen.]
|liy~+a       ʾālīy-a   A-------4R   ʾālīy    mechanical . . . + [def.acc.]
|liy~+N       ʾālīy-un  A-------1I   ʾālīy    mechanical . . . + [indef.nom.]
|liy~+K       ʾālīy-in  A-------2I   ʾālīy    mechanical . . . + [indef.gen.]
|l+           ʾāl       N--------R   ʾāl      family/clan +
+iy           -ī        S----1-S2-   ī        + my
IilaY         ʾilā      P---------   ʾilā     to/towards
Iilay+        ʾilay     P---------   ʾilā     to/towards +
+ya           -ya       S----1-S2-   ya       + me

MorphoTrees

. . . organize the analyses into a hierarchy with the string as its root and the full tokens as the leaves, grouped by their lemmas

[Figure: MorphoTree for the string AlY, with branches for the readings |lY, |ly, |l + y, IlY, Ily + y and Oly, and lemma nodes including ʾālā, ʾālīy, ʾāl and ʾilā]
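The grouping described above can be sketched with Data.Tree from the containers package: the input string becomes the root, each lemma a child, and the full tokens the leaves. This simplified version, fed with a hypothetical subset of the analyses from the linear list, omits the intermediate levels of real MorphoTrees (such as alternative token segmentations); Analysis, morphoTree and sample are illustrative names.

```haskell
import Data.Tree (Tree (..), drawTree)
import qualified Data.Map as Map

-- (lemma, full token) pairs, as in the linear list of analyses.
type Analysis = (String, String)

-- Root = input string, children = lemmas, leaves = full tokens.
-- Tokens keep their input order within each lemma group.
morphoTree :: String -> [Analysis] -> Tree String
morphoTree string analyses =
  Node string
    [ Node lemma [ Node token [] | token <- tokens ]
    | (lemma, tokens) <- Map.toList byLemma ]
  where
    byLemma = Map.fromListWith (flip (++)) [ (l, [t]) | (l, t) <- analyses ]

-- A subset of the analyses of the string AlY (Buckwalter notation).
sample :: [Analysis]
sample =
  [ ("|laY",  "|laY+(null)")
  , ("|liy~", "|liy~")
  , ("|liy~", "|liy~+u")
  , ("|liy~", "|liy~+i")
  , ("|l",    "|l+ +iy")
  , ("IilaY", "IilaY")
  , ("IilaY", "Iilay+ +ya")
  ]
```

Running putStr (drawTree (morphoTree "AlY" sample)) prints the hierarchy; the lemma nodes appear in the key order of the Map rather than in the order of the input list.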
