Under the Supervision of_第1頁
Under the Supervision of_第2頁
Under the Supervision of_第3頁
Under the Supervision of_第4頁
Under the Supervision of_第5頁
已閱讀5頁,還剩8頁未讀, 繼續(xù)免費(fèi)閱讀

下載本文檔

版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進(jìn)行舉報或認(rèn)領(lǐng)

文檔簡介

1、thesis onartificial intelligence for universal networking language (unl)(perspective bengali language)bydeen islam muslimid: 200720851ariful hoque tuhinid: 200710698shohanur rahmanid: 200720100 under the supervision ofmd. ahsan arif, sr. lecturerdept. of computer science and engineeringasian univers

2、ity of bangladesh artificial intelligence for universal networking language (unl)(perspective bengali language)deen islam muslim, ariful hoque tuhin, shohanur rahmamdepartment of computer science and engineeringasian university of bangladeshabstract:in this paper we present the computational analysi

3、s of the complex case structure of bengali- a member of the indo aryan family of languages- with a view toward interlingua based mt. bengali is ranked 4th in the list of languages ordered according to the size of the population that speaks the language. extremely interesting language phenomena invol

4、ving morphology, case structure, word order and word senses make the processing of bengali a worthwhile and challenging proposition. a recently proposed scheme called the universal networking language has been used as the interlingua. the approach is adaptable to other members of the vast indo aryan

5、 language family. the parallel development of both the analyzer and the generator system leads to an insightful intra-system verification process in place. our approach is rule based and makes use of authoritative treatises on bengali grammar and develop rules for certain bengali to unl conversion p

6、rocess. introduction:about 189 million people speak bengali and is ranked 4th in the world in terms of the number of peoplespeakingthelanguage(ref:/mhealy/g101ilec/intro/clt/cltclt/top100.html). like most languages in the indo aryan family, descended from sanskrit, bengali

7、 has the sov structure with some typical characteristics. a motivating factor for creating a system for processing bengali is the possibility of laying the framework for processing many other bengal languages too.work on indian language processing abounds. project anubaad 1 for machine translation f

8、rom english to bengali in the newspaper domain uses the direct translation approach. angalabharati 2 system for english bengal machine translation is based on pattern directed rules for english, which generates a pseudo-target-language applicable to a group of indian languages. in matra 3, a web bas

9、ed mt system for english to bengal in the newspaper domain, the input text is transformed into case-frame like structures and parameterized templates generate the target language. the mantra mt system for official documents uses tree adjoining grammar (tag) to achieve english bengal mt (ref: project

10、 anusaaraka 4 is a language accessor system rather than an mt system and addresses multiple indian languages. interlingua based mt for english, bengal and marathi 5 6, that uses the unl, transforms the source text into the unl representation and generates target text from this intermediate represent

11、ation. references to most of these works can also be found at .in/mat/ach-mat.htm. other famous mt systems are pivot 7, atlas 8, kant 9, aries 10, geta 11, systran 12 etc. the universal networking language (unl) () has been defined as a digital meta lan

12、guage for describing, summarizing, refining, storing and disseminating information in a machine independent and human language neutral form. the information in a document is represented sentence by sentence. each sentence is converted into a directed hyper graph having concepts as nodes and relation

13、s as arcs. knowledge within a document is expressed in three dimensions: 1. word knowledge is expressed by universal words (uws), which are language independent. these uws are tagged using restrictions describing the sense of the word in the current context. for example, drink(icl liquor) denotes th

14、e noun sense of drink restricting the sense to a type of liquor. here, icl stands for inclusion and forms an is-a relationship like in semantic nets.2. conceptual knowledge is captured by relating uws through a set of unl relations 14. for example, humans affect the environment is described in the u

15、nl asagt(affect(icldo).present.entry, human(iclanimal).pl)obj(affect(icldo).present.entry, environment(iclabstract thing).pl)agt means the agent and obj the object. affect(icl do), human(icl animal) and environment(icl abstract thing) are the uws denotingconcepts.3. speakers view, aspect, time of ev

16、ent, etc. are captured by unl attributes.for instance, in the above example, the attribute entry denotes the mainpredicate of the sentence, present the present tense and pl the pluralnumber.the above discussion can be summarized using the example below john, who is the chairman of the company, has a

17、rranged a meeting at his residencethe unl for the sentence is;= unl =mod(chairman(iclpost).present.def,company(iclinstitution).def)aoj(chairman(iclpost).present.def, john(iclperson)agt(arrange(icldo).plete, john(iclperson)pos(residence(iclshelter), john(iclperson)obj(arrange(icldo).

18、plete, meeting(iclevent).indef)plc(arrange(icldo).plete, residence(iclshelter);=in the expressions above, agt denotes the agent relation, obj the object relation, plc the place relation, pos is the possessor relation, mod is the modifier relation and aoj is the attr

19、ibute-of-the-object (used to express constructs like a is b) relation. the detailed specification of the universal networking language can be found at /unlsys.our work is based on an authoritative treatise on bengali grammar. the strategies of analysis and generation of ling

20、uistic phenomena have been guided by rigorous grammatical principles.universal networking language based analysis and generation for bengalibangla-english dictionarybangla to english dictionary is the source of building a bangla to unl dictionary asuniversal words are english words mandated by unl.

21、such dictionaries also provide allattributes along with the meaning of a word. any entry in the dictionary is put in thefollowing format:hw id “uw” (attribute1, attribute2 . . .) here,hw head word (bangla word)id identification of head word (omitable)uw universal wordattribute attribute of the hwflg

22、 language flagfre frequency of head wordpri region)” (n, place) prochur “huge(iclbig)” (adj) here the attributes,n stands for nounplace stands for placeadj stands for adjectiveflg field entry is b which stands for banglaa universal knowledge base is defined in unl specification. this knowledge base

23、islanguage independent and each native language word should be referenced to this knowledgebase. the knowledge base of universal words is a hierarch of concepts.en-converter and de-converter machinesthe en-converter (henceforth called enco) is a language-independent parser, a multi-headed turing mac

24、hine providing a framework for morphological, syntactic and semantic analysis synchronously using the uw dictionary and analysis rules. the structure of the machine is shown in the figure 1.fig. 1. the enco machinethe machine has two types of heads- processing heads and context heads.the processing

25、heads (2 nos.) are called analysis windows (aw) and the context heads are called condition windows (cw). the machine traverses the sentence back and forth, retrieves the relevant universal words from the lexicon and, depending on the attributes of the nodes under the aws and those under the surround

26、ing cws, generates semantic relations between the uws and/or attaches speech act attributes to them. the final output is a set of unl expressions equivalent to a unl graph. the de-converter (henceforth called the deco) 18 is a language-independent generator that produces sentences from unl graphs (f

27、igure 2).fig. 2. the deco machinelike enco, deco too is a multi-headed turing machine. it does syntactic and morphological generation synchronously using the lexicon and the set of generation rules.existing problems:1) spell checking:by somehow, spell checker is not included with the current unl sys

28、tem. for that, there is a possibility to arising wrong output during en-conversion or de-conversion process. an example of such situation is given below: a simple english sentence in a right spelling form:i live in bangladeshaccording to 13, the final unl expression is as follows:aoj(live(iclinhabit

29、be,aojliving_thing,plcplace).entry.present,i(iclperson)plc(live(iclinhabitbe,aojliving_thing,plcplace).entry.present,bangladesh(iofasian_countrything)where “bangladesh” is assigned to the uw of “asian_country”, which tells the de-converter to search “bangladesh” word in “asian_country” category. but

30、 if we if we type the “bangladesh” word in wrong spelling like “banladesh” then it convert that word in such form:aoj(live(iclinhabitbe,aojliving_thing,plcplace).entry.present,i(iclperson)plc(live(iclinhabitbe,aojliving_thing,plcplace).entry.present,banladesh)it does not define any uw for “banglades

31、h”. there for wrong conversion can be occurred. 2) maintaining exact grammatical pattern:according to 13, for a single sentence or like some compound sentences it works fine. but we have found some crucial problems when en-converting and de-converting some multiple sentences.for an example, using th

32、is sentence:i like rice and i play football.en-converting this sentence in unl forms: aoj:01(like(iclpleasebe,equenjoy,objuw,aojperson).entry.present,i(iclperson):01)obj:01(like(iclpleasebe,equenjoy,objuw,aojperson).entry.present,rice(iclgrainthing)agt:02(play(iclcompetedo,agtthing,objuw,ptnthing).e

33、ntry.present,i(iclperson):02)obj:02(play(iclcompetedo,agtthing,objuw,ptnthing).entry.present,football(iclfield_gamething)and(:02,:01)after de-converting this unl form in english:i like rice and i play football.but, problem occurs when using some multiple sentences like:we play football. we try to wi

34、n every match.en-converting this sentence in unl forms:agt(try(iclattemptdo,agtperson,objuw).entry.present,we(iclgroup):01.pl)pos:01(match(iclcontestthing),we(iclgroup):02)fictit(try(iclattemptdo,agtperson,objuw).entry.present,every(iclquantity,perthing)obj:01(win(iclprizedo,agtthing,objthing,scnthi

35、ng).entry,match(iclcontestthing)obj(try(iclattemptdo,agtperson,objuw).entry.present,:01)after de-converting this unl form back to english:we try win we match.more examplei am rahim. i am a student of asian university of bangladesh.en-converting this sentence in unl forms:aoj(rahim.entry.present,i(ic

36、lperson):01)fictit(rahim.entry.present,i(iclperson):02)aoj(student(icluniversity_studentperson,objknowledge_domain).indef.present,i(iclperson):02)mod(university(iclbodything),asian(icladj,comasia)obj(student(icluniversity_studentperson,objknowledge_domain).indef.present,university(iclbodything)obj(u

37、niversity(iclbodything),bangladesh(iofasian_countrything)after de-converting this unl form back to english:i be rahim. i a student of an asian university of bangladesh3) lack of exact rules and absence of ai: lets try some tricks with unl:my name is casper and i dont like to play football.according

38、to 13, en-conversion takes this sentence in unl form like this way:pos(name(icllanguage_unitthing,posthing),i(iclperson):01)aoj(casper(iofcitything).entry.present,name(icllanguage_unitthing,posthing)and(:01,casper(iofcitything).entry.present)aoj:01(like(iclpleasebe,equenjoy,objuw,aojperson).entry.no

39、t.present,i(iclperson):02)obj:01(like(iclpleasebe,equenjoy,objuw,aojperson).entry.not.present,play(iclcompetedo,agtthing,objuw,ptnthing)obj:01(play(iclcompetedo,agtthing,objuw,ptnthing),football(iclfield_gamething)where name “casper” categorized under “city”, not as a person name. similarly for bang

40、la to unl conversion we can give an example like:golapi ekhon aar train-e uthena.during the en-conversion, we may face the problem to identify whether golapi is a person name or a color name.4) verb representation problem:let take a look at this example:i am a good boy.en-converted unl form:aoj(boy(

41、iclchildperson,antgirl).entry.indef.present,i(iclperson)mod(boy(iclchildperson,antgirl).entry.indef.present,good(icladj,antbad)de-converted english:i be a good boy.another example:my name is karim and i like flowers.en-converted unl form:pos(name(icllanguage_unitthing,posthing),i(iclperson):01)aoj(k

42、erim(iclname,iofperson,commale).entry.present,name(icllanguage_unitthing,posthing)and(i(iclperson):02,karim(iclname,iofperson,commale).entry.present)man(kerim(iclname,iofperson,commale).entry.present,like(iclhow,objthing)man(i(iclperson):02,like(iclhow,objthing)obj(like(iclhow,objthing),flower(iclan

43、giospermthing).pl)de-converted english:the name of i like is karim and i this like the flowers.recommended solution: 1) implementation of a good spell checker 2) rearrange the sentence in exact grammatical pattern3) implementation of ai and exact rules4) determine the exact form of verb representati

44、onconclusionssystematic analysis of the case structure forms the foundation for any natural language processing system. in this paper, we have described a system for the computational analysis of the bengali case structure for the purpose of interlingua based mt using unl. the complementary generato

45、r system too has been implemented, which provides the platform for intra system verification. verification via cross system generation is being done using the bengal generation system (also under development.) apart from the case structure, computational analysis based on authoritative grammatical t

46、reatise, addressing complex phenomena involving verbs, adjectives and adverbs is under way.references:1 dey, k.: project anubaad: an english-bengali mt system. jadavpur university, kolkata (2001)2 sinha, r.: machine translation: the indian context. akshara94, new delhi (1994)3 rao, d., mohanraj, k.,

47、 hedge, j., mehta, v., mahadane, p.: a practical framework for syntactic transfer of compound-complex sentences for english-bengal machine translation. (2000)4 bharati, a., chaitanya, v., sanyal, r.: natural language processing: a paninian perspective. prentice hall india private limited (1996)5 dav

48、e, s., bhattacharya, p., girishbhai, p.j.: interlingua based english-bengal machine translation and language divergence. journal of machine translation, volume 17 (2002)6 monju, m., sachi, d., bhattacharyya, p.: knowledge extraction from bengal texts. knowledge based computer systems, proceedings of

49、 the international conference kbcs2000 (2000)7 muraki, k.: pivot: two-phase machine translation system. mt summit manuscripts and program, pp. 81-83 (1987)8 uchida, h.: atlas. mt summit ii, pp. 152-157 (1989)9 lonsdale, d.w., franz, a.m., leavitt, j.r.r.: large-scale machine translation: an interlin

50、gua approach. (/research/kant/pdf/aei94.pdf)10 gonzlez, j.c., go, j.m., nieto, a.f.: aries: a ready for use platform for engineering spanish-processing tools. digest of the second language engineering convention, pages 219-226 (1995) 11 vauquois, b., boitet, c.: automated translati

51、on at grenoble university. (/j/j85/j85-1003.pdf)12 w.john, h., l., s.h.: an introduction to machine translation. london: acamedic press (1992)13 russian and english language server, the russian unl language center, www.unl.ru.我的大學(xué)愛情觀1、什么是大學(xué)愛情:大學(xué)是一個相對寬松,時間自由,自己支配的環(huán)境,也正因?yàn)檫@樣,培植愛情之花最肥沃的土地。大學(xué)生戀愛一直是大學(xué)校園的熱門話題,戀愛和學(xué)業(yè)也就自然成為了大學(xué)生在校期間面對的兩個主要問題。戀愛關(guān)系處理得好、正確,健康,可以成為學(xué)習(xí)和事業(yè)的催化劑,使人學(xué)習(xí)努力、成績上升;戀愛關(guān)系處理的不當(dāng),不健康,可能分散精力、浪費(fèi)時間、情緒波動、成績下降。因此,大學(xué)生的戀愛觀必須樹立在健康之上,并且樹立正確的戀愛觀是十分有必要的。因此我從下面幾方面談?wù)勛约旱膶Υ髮W(xué)愛情觀。2、什么是健康的愛情:1) 尊重對方,不顯示對愛情的占有欲,不把

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負(fù)責(zé)。
  • 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時也不承擔(dān)用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。

評論

0/150

提交評論