UndertheSupervisionof_第1頁
UndertheSupervisionof_第2頁
UndertheSupervisionof_第3頁
UndertheSupervisionof_第4頁
UndertheSupervisionof_第5頁
已閱讀5頁,還剩8頁未讀, 繼續(xù)免費(fèi)閱讀

下載本文檔

版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進(jìn)行舉報(bào)或認(rèn)領(lǐng)

文檔簡介

1、thesis onartificial intelligence for universal networking language (unl)(perspective bengali language)bydeen islam muslimid: 200720851ariful hoque tuhinid: 200710698shohanur rahmanid: 200720100 under the supervision ofmd. ahsan arif, sr. lecturerdept. of computer science and engineeringasian univers

2、ity of bangladesh artificial intelligence for universal networking language (unl)(perspective bengali language)deen islam muslim, ariful hoque tuhin, shohanur rahmamdepartment of computer science and engineeringasian university of bangladeshabstract:in this paper we present the computational analysi

3、s of the complex case structure of bengali- a member of the indo aryan family of languages- with a view toward interlingua based mt. bengali is ranked 4th in the list of languages ordered according to the size of the population that speaks the language. extremely interesting language phenomena invol

4、ving morphology, case structure, word order and word senses make the processing of bengali a worthwhile and challenging proposition. a recently proposed scheme called the universal networking language has been used as the interlingua. the approach is adaptable to other members of the vast indo aryan

5、 language family. the parallel development of both the analyzer and the generator system leads to an insightful intra-system verification process in place. our approach is rule based and makes use of authoritative treatises on bengali grammar and develop rules for certain bengali to unl conversion p

6、rocess. introduction:about 189 million people speak bengali and is ranked 4th in the world in terms of the number of people speaking the language (ref:/mhealy/g101ilec/intro/clt/cltclt/top100.html). like most languages in the indo aryan family, descende

7、d from sanskrit, bengali has the sov structure with some typical characteristics. a motivating factor for creating a system for processing bengali is the possibility of laying the framework for processing many other bengal languages too.work on indian language processing abounds. project anubaad 1 f

8、or machine translation from english to bengali in the newspaper domain uses the direct translation approach. angalabharati 2 system for english bengal machine translation is based on pattern directed rules for english, which generates a pseudo-target-language applicable to a group of indian language

9、s. in matra 3, a web based mt system for english to bengal in the newspaper domain, the input text is transformed into case-frame like structures and parameterized templates generate the target language. the mantra mt system for official documents uses tree adjoining grammar (tag) to achieve english

10、 bengal mt (ref: project anusaaraka 4 is a language accessor system rather than an mt system and addresses multiple indian languages. interlingua based mt for english, bengal and marathi 5 6, that uses the unl, transforms the source text into the unl representation and generates target text from thi

11、s intermediate representation. references to most of these works can also be found at .in/mat/ach-mat.htm. other famous mt systems are pivot 7, atlas 8, kant 9, aries 10, geta 11, systran 12 etc. the universal networking language (unl) () has been defin

12、ed as a digital meta language for describing, summarizing, refining, storing and disseminating information in a machine independent and human language neutral form. the information in a document is represented sentence by sentence. each sentence is converted into a directed hyper graph having concep

13、ts as nodes and relations as arcs. knowledge within a document is expressed in three dimensions: 1. word knowledge is expressed by universal words (uws), which are language independent. these uws are tagged using restrictions describing the sense of the word in the current context. for example, drin

14、k(icl > liquor) denotes the noun sense of drink restricting the sense to a type of liquor. here, icl stands for inclusion and forms an is-a relationship like in semantic nets.2. conceptual knowledge is captured by relating uws through a set of unl relations 14. for example, humans affect the envi

15、ronment is described in the unl asagt(affect(icl>do).present.entry, human(icl>animal).pl)obj(affect(icl>do).present.entry, environment(icl>abstract thing).pl)agt means the agent and obj the object. affect(icl > do), human(icl >animal) and environment(icl > abstract thing) are th

16、e uws denotingconcepts.3. speakers view, aspect, time of event, etc. are captured by unl attributes.for instance, in the above example, the attribute entry denotes the mainpredicate of the sentence, present the present tense and pl the pluralnumber.the above discussion can be summarized using the ex

17、ample below john, who is the chairman of the company, has arranged a meeting at his residencethe unl for the sentence is;= unl =mod(chairman(icl>post).present.def,company(icl>institution).def)aoj(chairman(icl>post).present.def, john(icl>person)agt(arrange(icl>do).plet

18、e, john(icl>person)pos(residence(icl>shelter), john(icl>person)obj(arrange(icl>do).plete, meeting(icl>event).indef)plc(arrange(icl>do).plete, residence(icl>shelter);=in the expressions above, agt denotes the agent relation, obj the object relati

19、on, plc the place relation, pos is the possessor relation, mod is the modifier relation and aoj is the attribute-of-the-object (used to express constructs like a is b) relation. the detailed specification of the universal networking language can be found at /unlsys.our work

20、is based on an authoritative treatise on bengali grammar. the strategies of analysis and generation of linguistic phenomena have been guided by rigorous grammatical principles.universal networking language based analysis and generation for bengalibangla-english dictionarybangla to english dictionary

21、 is the source of building a bangla to unl dictionary asuniversal words are english words mandated by unl. such dictionaries also provide allattributes along with the meaning of a word. any entry in the dictionary is put in thefollowing format:hw id “uw” (attribute1, attribute2 . . .) <flg, fre,

22、pri>here,hw < head word (bangla word)id < identification of head word (omitable)uw < universal wordattribute < attribute of the hwflg < language flagfre < frequency of head wordpri < priority of head wordsome example entries of dictionary for bangla are given below:shohor “ci

23、ty(icl>region)” (n, place) <b,0,0>prochur “huge(icl>big)” (adj) <b,0,0>here the attributes,n stands for nounplace stands for placeadj stands for adjectiveflg field entry is b which stands for banglaa universal knowledge base is defined in unl specification. this knowledge base isla

24、nguage independent and each native language word should be referenced to this knowledgebase. the knowledge base of universal words is a hierarch of concepts.en-converter and de-converter machinesthe en-converter (henceforth called enco) is a language-independent parser, a multi-headed turing machine

25、 providing a framework for morphological, syntactic and semantic analysis synchronously using the uw dictionary and analysis rules. the structure of the machine is shown in the figure 1.fig. 1. the enco machinethe machine has two types of heads- processing heads and context heads.the processing head

26、s (2 nos.) are called analysis windows (aw) and the context heads are called condition windows (cw). the machine traverses the sentence back and forth, retrieves the relevant universal words from the lexicon and, depending on the attributes of the nodes under the aws and those under the surrounding

27、cws, generates semantic relations between the uws and/or attaches speech act attributes to them. the final output is a set of unl expressions equivalent to a unl graph. the de-converter (henceforth called the deco) 18 is a language-independent generator that produces sentences from unl graphs (figur

28、e 2).fig. 2. the deco machinelike enco, deco too is a multi-headed turing machine. it does syntactic and morphological generation synchronously using the lexicon and the set of generation rules.existing problems:1) spell checking:by somehow, spell checker is not included with the current unl system.

29、 for that, there is a possibility to arising wrong output during en-conversion or de-conversion process. an example of such situation is given below: a simple english sentence in a right spelling form:i live in bangladeshaccording to 13, the final unl expression is as follows:aoj(live(icl>inhabit

30、>be,aoj>living_thing,plc>place).entry.present,i(icl>person)plc(live(icl>inhabit>be,aoj>living_thing,plc>place).entry.present,bangladesh(iof>asian_country>thing)where “bangladesh” is assigned to the uw of “asian_country”, which tells the de-converter to search “banglades

31、h” word in “asian_country” category. but if we if we type the “bangladesh” word in wrong spelling like “banladesh” then it convert that word in such form:aoj(live(icl>inhabit>be,aoj>living_thing,plc>place).entry.present,i(icl>person)plc(live(icl>inhabit>be,aoj>living_thing,pl

32、c>place).entry.present,banladesh)it does not define any uw for “bangladesh”. there for wrong conversion can be occurred. 2) maintaining exact grammatical pattern:according to 13, for a single sentence or like some compound sentences it works fine. but we have found some crucial problems when en-c

33、onverting and de-converting some multiple sentences.for an example, using this sentence:i like rice and i play football.en-converting this sentence in unl forms: aoj:01(like(icl>please>be,equ>enjoy,obj>uw,aoj>person).entry.present,i(icl>person):01)obj:01(like(icl>please>be,eq

34、u>enjoy,obj>uw,aoj>person).entry.present,rice(icl>grain>thing)agt:02(play(icl>compete>do,agt>thing,obj>uw,ptn>thing).entry.present,i(icl>person):02)obj:02(play(icl>compete>do,agt>thing,obj>uw,ptn>thing).entry.present,football(icl>field_game>thing

35、)and(:02,:01)after de-converting this unl form in english:i like rice and i play football.but, problem occurs when using some multiple sentences like:we play football. we try to win every match.en-converting this sentence in unl forms:agt(try(icl>attempt>do,agt>person,obj>uw).entry.prese

36、nt,we(icl>group):01.pl)pos:01(match(icl>contest>thing),we(icl>group):02)fictit(try(icl>attempt>do,agt>person,obj>uw).entry.present,every(icl>quantity,per>thing)obj:01(win(icl>prize>do,agt>thing,obj>thing,scn>thing).entry,match(icl>contest>thing)obj(

37、try(icl>attempt>do,agt>person,obj>uw).entry.present,:01)after de-converting this unl form back to english:we try win we match.more examplei am rahim. i am a student of asian university of bangladesh.en-converting this sentence in unl forms:aoj(rahim.entry.present,i(icl>person):01)fict

38、it(rahim.entry.present,i(icl>person):02)aoj(student(icl>university_student>person,obj>knowledge_domain).indef.present,i(icl>person):02)mod(university(icl>body>thing),asian(icl>adj,com>asia)obj(student(icl>university_student>person,obj>knowledge_domain).indef.prese

39、nt,university(icl>body>thing)obj(university(icl>body>thing),bangladesh(iof>asian_country>thing)after de-converting this unl form back to english:i be rahim. i a student of an asian university of bangladesh3) lack of exact rules and absence of ai: lets try some tricks with unl:my na

40、me is casper and i don't like to play football.according to 13, en-conversion takes this sentence in unl form like this way:pos(name(icl>language_unit>thing,pos>thing),i(icl>person):01)aoj(casper(iof>city>thing).entry.present,name(icl>language_unit>thing,pos>thing)and(

41、:01,casper(iof>city>thing).entry.present)aoj:01(like(icl>please>be,equ>enjoy,obj>uw,aoj>person).entry.not.present,i(icl>person):02)obj:01(like(icl>please>be,equ>enjoy,obj>uw,aoj>person).entry.not.present,play(icl>compete>do,agt>thing,obj>uw,ptn>t

42、hing)obj:01(play(icl>compete>do,agt>thing,obj>uw,ptn>thing),football(icl>field_game>thing)where name “casper” categorized under “city”, not as a person name. similarly for bangla to unl conversion we can give an example like:golapi ekhon aar train-e uthena.during the en-conversi

43、on, we may face the problem to identify whether golapi is a person name or a color name.4) verb representation problem:let take a look at this example:i am a good boy.en-converted unl form:aoj(boy(icl>child>person,ant>girl).entry.indef.present,i(icl>person)mod(boy(icl>child>person,

44、ant>girl).entry.indef.present,good(icl>adj,ant>bad)de-converted english:i be a good boy.another example:my name is karim and i like flowers.en-converted unl form:pos(name(icl>language_unit>thing,pos>thing),i(icl>person):01)aoj(kerim(icl>name,iof>person,com>male).entry.p

45、resent,name(icl>language_unit>thing,pos>thing)and(i(icl>person):02,karim(icl>name,iof>person,com>male).entry.present)man(kerim(icl>name,iof>person,com>male).entry.present,like(icl>how,obj>thing)man(i(icl>person):02,like(icl>how,obj>thing)obj(like(icl>h

46、ow,obj>thing),flower(icl>angiosperm>thing).pl)de-converted english:the name of i like is karim and i this like the flowers.recommended solution: 1) implementation of a good spell checker 2) rearrange the sentence in exact grammatical pattern3) implementation of ai and exact rules4) determin

47、e the exact form of verb representationconclusionssystematic analysis of the case structure forms the foundation for any natural language processing system. in this paper, we have described a system for the computational analysis of the bengali case structure for the purpose of interlingua based mt

48、using unl. the complementary generator system too has been implemented, which provides the platform for intra system verification. verification via cross system generation is being done using the bengal generation system (also under development.) apart from the case structure, computational analysis

49、 based on authoritative grammatical treatise, addressing complex phenomena involving verbs, adjectives and adverbs is under way.references:1 dey, k.: project anubaad: an english-bengali mt system. jadavpur university, kolkata (2001)2 sinha, r.: machine translation: the indian context. akshara94, new

50、 delhi (1994)3 rao, d., mohanraj, k., hedge, j., mehta, v., mahadane, p.: a practical framework for syntactic transfer of compound-complex sentences for english-bengal machine translation. (2000)4 bharati, a., chaitanya, v., sanyal, r.: natural language processing: a paninian perspective. prentice h

51、all india private limited (1996)5 dave, s., bhattacharya, p., girishbhai, p.j.: interlingua based english-bengal machine translation and language divergence. journal of machine translation, volume 17 (2002)6 monju, m., sachi, d., bhattacharyya, p.: knowledge extraction from bengal texts. knowledge b

52、ased computer systems, proceedings of the international conference kbcs2000 (2000)7 muraki, k.: pivot: two-phase machine translation system. mt summit manuscripts and program, pp. 81-83 (1987)8 uchida, h.: atlas. mt summit ii, pp. 152-157 (1989)9 lonsdale, d.w., franz, a.m., leavitt, j.r.r.: large-s

53、cale machine translation: an interlingua approach. (/research/kant/pdf/aei94.pdf)10 gonzlez, j.c., go, j.m., nieto, a.f.: aries: a ready for use platform for engineering spanish-processing tools. digest of the second language engineering convention, pages 219-226 (1995) 11 vauquois

54、, b., boitet, c.: automated translation at grenoble university. (/j/j85/j85-1003.pdf)12 w.john, h., l., s.h.: an introduction to machine translation. london: acamedic press (1992)13 russian and english language server, the russian unl language center, www.unl.ru.我的大學(xué)愛情觀1、什么是大學(xué)愛情:大學(xué)是一個(gè)相對(duì)寬松,時(shí)間自由,自己支配的環(huán)境,也正因?yàn)檫@樣,培植愛情之花最肥沃的土地。大學(xué)生戀愛一直是大學(xué)校園的熱門話題,戀愛和學(xué)業(yè)也就自然成為了大學(xué)生在校期間面對(duì)的兩個(gè)主要問題。戀愛關(guān)系處理得好、正確,健康,可以成為學(xué)習(xí)和事業(yè)的催化劑,使人學(xué)習(xí)努力、成績上升;戀愛關(guān)系處理的不當(dāng),不健康,可能分散精力、浪費(fèi)時(shí)間、情緒波動(dòng)、成績下

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 人人文庫網(wǎng)僅提供信息存儲(chǔ)空間,僅對(duì)用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對(duì)用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
  • 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。

評(píng)論

0/150

提交評(píng)論