CS 224S/LING 281 Speech Recognition, Synthesis, and Dialogue
Dan Jurafsky
Lecture 12: Dialog Part I: Human conversation, frame-based dialogue systems, and VoiceXML

Outline
  The Linguistics of Conversation
  Basic Conversational Agents
    ASR
    NLU
    Generation
    Dialogue Manager
  Dialogue Manager Design
    Finite State
    Frame-based
    Initiative: User, System, Mixed
  VoiceXML

Conversational Agents
  AKA:
    Spoken Language Systems
    Dialogue Systems
    Speech Dialogue Systems
  Applications:
    Travel arrangements (Amtrak, United Airlines)
    Telephone call routing
    Tutoring
    Communicating with robots
    Anything with limited screen/keyboard

A travel dialog: Communicator
  Xu and Rudnicky (2000)

Call routing: AT&T HMIHY
  Gorin et al. (1997)

A tutorial dialogue: ITSPOKE
  Litman and Silliman (2004)

Linguistics of Human Conversation
  Turn-taking
  Speech Acts
  Grounding
  Conversational Structure
  Implicature

Turn-taking
  Dialogue is characterized by turn-taking.
    A:
    B:
    A:
    B:
  Resource allocation problem:
    How do speakers know when to take the floor?

Turn-taking rules
  Sacks et al. (1974)
  At each transition-relevance place of each turn:
    a. If during this turn the current speaker has selected B as the next speaker then B must speak next.
    b. If the current speaker does not select the next speaker, any other speaker may take the next turn.
    c. If no one else takes the next turn, the current speaker may take the next turn.

Implications of subrule a
  For some utterances the current speaker selects the next speaker
  Adjacency pairs
    Question/answer
    Greeting/greeting
    Compliment/downplayer
    Request/grant
  Silence between 2 parts of adjacency pair is different than silence after
    A: Is there something bothering you or not?
    (1.0)
    A: Yes or no?
    (1.5)
    A: Eh
    B: No.

Speech Acts
  Austin (1962): An utterance is a kind of action
  Clear case: performatives
    I name this ship the Titanic
    I second that motion
    I bet you five dollars it will snow tomorrow
  Performative verbs (name, second)
  Austin's idea: not just these verbs

Each utterance is 3 acts
  Locutionary act: the utterance of a sentence with a particular meaning
  Illocutionary act: the act of asking, answering, promising, etc., in uttering a sentence.
  Perlocutionary act: the (often intentional) production of certain effects upon the thoughts, feelings, or actions of addressee in uttering a sentence.

Locutionary and illocutionary
  “You can't do that!”
  Illocutionary force:
    Protesting
  Perlocutionary force:
    Effect of annoying addressee
    Effect of stopping addressee from doing something

The 3 levels of act revisited
  Utterance | Locutionary Force | Illocutionary Force | Perlocutionary Force
  Can I have the rest of your sandwich? Or: Are you going to finish that? | Question | Request | Effect: You give me sandwich (or you are amused by my quoting from “Diner”) (or etc)
  I want the rest of your sandwich | Declarative | Request | Effect: as above
  Give me your sandwich! | Imperative | Request | Effect: as above.

Illocutionary Acts
  What are they?

5 classes of speech acts: Searle (1975)
  Assertives: committing the speaker to something's being the case (suggesting, putting forward, swearing, boasting, concluding)
  Directives: attempts by the speaker to get the addressee to do something (asking, ordering, requesting, inviting, advising, begging)
  Commissives: Committing the speaker to some future course of action (promising, planning, vowing, betting, opposing).
  Expressives: expressing the psychological state of the speaker about a state of affairs (thanking, apologizing, welcoming, deploring).
  Declarations: bringing about a different state of the world via the utterance (I resign; You're fired)

Grounding
  Why do elevator buttons light up?
  Clark (1996) (after Norman 1988)
    Principle of closure. Agents performing an action require evidence, sufficient for current purposes, that they have succeeded in performing it
  What is the linguistic correlate of this?

Grounding
  Need to know whether an action succeeded or failed
  Dialogue is also an action
    a collective action performed by speaker and hearer
  Common ground: set of things mutually believed by both speaker and hearer
  Need to achieve common ground, so hearer must ground or acknowledge speaker's utterance.

How do speakers ground? Clark and Schaefer
  Continued attention: B continues attending to A
  Relevant next contribution: B starts in on next relevant contribution
  Acknowledgement: B nods or says continuer like uh-huh, yeah, assessment (great!)
  Demonstration: B demonstrates understanding A by paraphrasing or reformulating A's contribution, or by collaboratively completing A's utterance
  Display: B displays verbatim all or part of A's presentation

A human-human conversation

Grounding examples
  Display:
    C: I need to travel in May
    A: And, what day in May did you want to travel?
  Acknowledgement
    C: He wants to fly from Boston
    A: mm-hmm
    C: to Baltimore Washington International
  Mm-hmm (usually transcribed “uh-huh”) is a backchannel, continuer, or acknowledgement token

Grounding Examples (2)
  Acknowledgement + next relevant contribution
    And, what day in May did you want to travel?
    And you're flying into what city?
    And what time would you like to leave?
  The “and” indicates to the client that agent has successfully understood answer to the last question.

Grounding negative responses
  From Cohen et al. (2004)
  Good! (the “Okay” grounds the caller's “No”)
    System: Did you want to review some more of your personal profile?
    Caller: No.
    System: Okay, what's next?
  Bad! (no acknowledgement)
    System: Did you want to review some more of your personal profile?
    Caller: No.
    System: What's next?

Grounding and Dialogue Systems
  Grounding is not just a tidbit about humans
  Is key to design of conversational agent
  Why?
    HCI researchers find users of speech-based interfaces are confused when system doesn't give them an explicit acknowledgement signal
    Stifelman et al. (1993), Yankelovich et al. (1995)

Why is this customer confused?
  Customer: (rings)
  Operator: Directory Enquiries, for which town please?
  Customer: Could you give me the phone number of um: Mrs. um: Smithson?
  Operator: Yes, which town is this at please?
  Customer: Huddleston.
  Operator: Yes. And the name again?
  Customer: Mrs. Smithson

Conversational Structure
  Telephone conversations
    Stage 1: Enter a conversation
    Stage 2: Identification
    Stage 3: Establish joint willingness to converse
    Stage 4: First topic is raised, usually by caller

Conversational Implicature
  A: And, what day in May did you want to travel?
  C: OK, uh, I need to be there for a meeting that's from the 12th to the 15th.
  Note that client did not answer question.
  Meaning of client's sentence:
    Meeting
      Start-of-meeting: 12th
      End-of-meeting: 15th
    Doesn't say anything about flying!
  What is it that licenses agent to infer that client is mentioning this meeting so as to inform the agent of the travel dates?

Conversational Implicature (2)
  A: there's 3 non-stops today.
    This would still be true if there were 7 non-stops today.
    But no, the agent means: 3 and only 3.
    How can client infer that agent means: only 3?

Grice: conversational implicature
  Implicature means a particular class of licensed inferences.
  Grice (1975) proposed that what enables hearers to draw correct inferences is:
  Cooperative Principle
    This is a tacit agreement by speakers and listeners to cooperate in communication

4 Gricean Maxims
  Relevance: Be relevant
  Quantity: Do not make your contribution more or less informative than required
  Quality: try to make your contribution one that is true (don't say things that are false or for which you lack adequate evidence)
  Manner: Avoid ambiguity and obscurity; be brief and orderly

Relevance
  A: Is Regina here?
  B: Her car is outside.
  Implication: yes
    Hearer thinks: Why mention the car? It must be relevant. How could it be relevant? It could since: if her car is here she is probably here.
  Client: I need to be there for a meeting that's from the 12th to the 15th
    Hearer thinks: Speaker is following maxims, would only have mentioned meeting if it was relevant. How could meeting be relevant? If client meant me to understand that he had to depart in time for the mtg.

Quantity
  A: How much money do you have on you?
  B: I have 5 dollars
  Implication: not 6 dollars
  Similarly, 3 non-stops can't mean 7 non-stops
    Hearer thinks: If speaker meant 7 non-stops she would have said 7 non-stops
  A: Did you do the reading for today's class?
  B: I intended to
  Implication: No
    B's answer would be true if B intended to do the reading AND did the reading, but would then violate maxim

Dialogue System Architecture

Speech recognition
  ASR issues in Dialogue Systems:
    Language models are different
    The speaker is talking to us for a while
    It's probably telephone speech

Language Model
  Language models for dialogue are often based on hand-written Context-Free or finite-state grammars rather than N-grams
  Why? Because of need for understanding; we need to constrain user to say things that we know what to do with.

Language Models for Dialogue (2)
  We can have LM specific to a dialogue state
  If system just asked “What city are you departing from?”
  LM can be
    City names only
    FSA: (I want to (leave|depart)) (from) CITYNAME
    N-grams trained on answers to “Cityname” questions from labeled data
  A LM that is constrained in this way is technically called a “restricted grammar” or “restricted LM”

Talking to the same human over the whole conversation.
  Same speaker
  So can adapt to speaker
    Acoustic Adaptation
      Vocal Tract Length Normalization (VTLN)
      Maximum Likelihood Linear Regression (MLLR)
    Language Model adaptation
    Pronunciation adaptation

Barge-in
  Speakers barge-in
  Need to deal properly with this via speech-detection, etc.

Natural Language Understanding
  Or “NLU”
  Or “Computational semantics”
  There are many ways to represent the meaning of sentences
  For speech dialogue systems, most common is “Frame and slot semantics”.

An example of a frame
  Show me morning flights from Boston to SF on Tuesday.
    SHOW:
      FLIGHTS:
        ORIGIN:
          CITY: Boston
          DATE: Tuesday
          TIME: morning
        DEST:
          CITY: San Francisco

Generation and TTS
  Generation component
    Chooses concepts to express to user
    Plans out how to express these concepts in words
    Assigns any necessary prosody to the words
  TTS component
    What we've seen
  In practice both often based on canned sentences
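
To make the frame example and the canned-sentence remark concrete, here is a minimal Python sketch. It maps the lecture's example request to the nested frame with one hand-written pattern, then produces a reply from a fixed template. The pattern, the alias table, the reply wording, and the names `parse` and `generate` are all assumptions made for illustration; they are not taken from the lecture or from any particular system.

```python
import re

# Hypothetical alias table and pattern covering only requests shaped like the
# lecture's example; a deployed grammar would be much larger.
CITY_ALIASES = {"sf": "San Francisco"}

REQUEST = re.compile(
    r"show me (?:(?P<time>morning|afternoon|evening) )?flights "
    r"from (?P<origin>[a-z ]+?) to (?P<dest>[a-z ]+?) on (?P<date>[a-z]+)"
)


def city_name(raw):
    """Normalize a recognized city string, expanding known abbreviations."""
    return CITY_ALIASES.get(raw, raw.title())


def parse(utterance):
    """Map an utterance to a nested frame, or None if the pattern does not match."""
    text = utterance.lower().strip().rstrip(".")
    match = REQUEST.fullmatch(text)
    if match is None:
        return None
    return {
        "SHOW": {
            "FLIGHTS": {
                "ORIGIN": {
                    "CITY": city_name(match.group("origin")),
                    "DATE": match.group("date").title(),
                    "TIME": match.group("time"),
                },
                "DEST": {"CITY": city_name(match.group("dest"))},
            }
        }
    }


def generate(frame):
    """Canned-sentence generation: fill one fixed template from the frame."""
    flights = frame["SHOW"]["FLIGHTS"]
    origin, dest = flights["ORIGIN"], flights["DEST"]
    time = origin["TIME"] + " " if origin["TIME"] else ""
    return (
        f"Here are {time}flights from {origin['CITY']} "
        f"to {dest['CITY']} on {origin['DATE']}."
    )


frame = parse("Show me morning flights from Boston to SF on Tuesday.")
print(frame)
print(generate(frame))
```

Anything outside the pattern returns `None`, which is the price of a hand-written grammar: the system only understands what it was built to act on.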

Dialogue Manager
  Controls the architecture and structure of dialogue
    Takes input from ASR/NLU components
    Maintains some sort of state
    Interfaces with Task Manager
    Passes output to NLG/TTS modules

Four architectures for dialogue management
  Finite State
  Frame-based
  Information State
    Markov Decision Processes
  AI Planning

Finite-State Dialogue Mgmt
  Consider a trivial airline travel system
    Ask the user for a departure city
    For a destination city
    For a time
    Whether the trip is round-trip or not

Finite State Dialogue Manager

Finite-state dialogue managers
  System completely controls the conversation with the user.
  It asks the user a series of questions
  Ignoring (or misinterpreting) anything the user says that is not a direct answer to the system's questions

Dialogue Initiative
  Systems that control conversation like this are system initiative or single initiative.
  “Initiative”: who has control of conversation
  In normal human-human dialogue, initiative shifts back and forth between participants.

System Initiative
  Systems which completely control the conversation at all times are called system initiative.
  Advantages:
    Simple to build
    User always knows what they can say next
    System always knows what user can say next
      Known words: Better performance from ASR
      Known topic: Better performance from NLU
    Ok for VERY simple tasks (entering a credit card, or login name and password)
  Disadvantage:
    Too limited

User Initiative
  User directs the system
  Generally, user asks a single question, system answers
  System can't ask questions back, engage in clarification dialogue, confirmation dialogue
  Used for simple database queries
  User asks question, system gives answer
  Web search is user initiative dialogue.

Problems with System Initiative
  Real dialogue involves give and take!
  In travel planning, users might want to say something that is not the direct answer to the question.
  For example answering more than one question in a sentence:
    Hi, I'd like to fly from Seattle Tuesday morning
    I want a flight from Milwaukee to Orlando one way leaving after 5 p.m. on Wednesday.
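
The finite-state, system-initiative manager for the trivial airline system can be sketched in Python as below. The four states follow the lecture's list (departure city, destination city, time, round trip). The prompt wording, the tiny vocabularies, and the names `understand` and `run_dialogue` are assumptions for illustration only. The sketch shows the limitation discussed above: an utterance that is not a direct answer to the current question is ignored, even when it contains useful information.

```python
import re

# States in the fixed order the system asks them (prompt wording is assumed).
STATES = [
    ("origin", "What city are you leaving from?"),
    ("destination", "Where are you going?"),
    ("time", "What time would you like to leave?"),
    ("round_trip", "Is this a round trip?"),
]

CITIES = {"seattle", "boston", "milwaukee", "orlando"}  # toy vocabulary


def understand(state, utterance):
    """State-specific understanding: accept only a direct answer to the current question."""
    text = utterance.lower().strip().rstrip(".")
    if state in ("origin", "destination"):
        return text.title() if text in CITIES else None
    if state == "time":
        return text if re.fullmatch(r"morning|afternoon|evening", text) else None
    if state == "round_trip":
        return {"yes": True, "no": False}.get(text)
    return None


def run_dialogue(user_turns):
    """Walk the states in order, re-asking each question until it gets a direct answer."""
    turns = iter(user_turns)
    answers, transcript = {}, []
    for state, prompt in STATES:
        while state not in answers:
            transcript.append(("System", prompt))
            utterance = next(turns, None)
            if utterance is None:  # no more user input
                return answers, transcript
            transcript.append(("User", utterance))
            value = understand(state, utterance)
            if value is not None:
                answers[state] = value
    return answers, transcript


answers, transcript = run_dialogue([
    "Hi, I'd like to fly from Seattle Tuesday morning",  # not a direct answer: ignored
    "Seattle",
    "Orlando",
    "morning",
    "no",
])
for speaker, text in transcript:
    print(f"{speaker}: {text}")
print(answers)
```

The first user turn already names the origin and the time, yet the system asks both questions again, because each state accepts only the answer it expects.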

Single initiative + universals
  We can give users a little more flexibility by adding universal commands
  Universals: commands you can say anywhere
  As if we augmented every state of FSA with these
    Help
    Start over
    Correct
  This describes many implemented systems
  But still doesn't allow user to say what they want to say

Mixed Initiative
  Conversational initiative can shift between system and user
  Simplest kind of mixed initiative: use the structure of the frame itself to guide dialogue

  Slot      | Question
  ORIGIN    | What city are you leaving from?
  DEST      | Where are you going?
  DEPT DATE | What day would you like to leave?
  DEPT TIME | What time would you like to leave?
  AIRLINE   | What is your preferred airline?
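
Using the frame itself to guide the dialogue can be sketched in Python as follows. The slot names and questions come from the table; everything else is an assumption for illustration: the toy vocabularies, the cue words "from" and "to", the class name `FrameDialogue`, and the wording of the help and completion messages. The sketch fills every slot it can find in each utterance, asks only about slots that are still empty, and treats "help" and "start over" as universal commands available at any point.

```python
import re

# Slot/question table from the lecture; dicts keep insertion order, so the
# system asks about empty slots in this order.
QUESTIONS = {
    "ORIGIN": "What city are you leaving from?",
    "DEST": "Where are you going?",
    "DEPT DATE": "What day would you like to leave?",
    "DEPT TIME": "What time would you like to leave?",
    "AIRLINE": "What is your preferred airline?",
}

# Hypothetical toy vocabularies standing in for a real NLU component.
CITY = "seattle|boston|milwaukee|orlando"
VALUES = {
    "ORIGIN": CITY,
    "DEST": CITY,
    "DEPT DATE": "monday|tuesday|wednesday|thursday|friday|saturday|sunday",
    "DEPT TIME": "morning|afternoon|evening",
    "AIRLINE": "united|delta",
}
CUES = {"ORIGIN": "from ", "DEST": "to "}  # cue words that tell the city slots apart


def extract(utterance, pending=None):
    """Return every slot value found in the utterance."""
    text = utterance.lower()
    found = {}
    for slot, values in VALUES.items():
        match = re.search(rf"\b{CUES.get(slot, '')}({values})\b", text)
        if match is None and slot == pending:
            # A bare value counts as an answer to the question just asked.
            match = re.search(rf"\b({values})\b", text)
        if match:
            found[slot] = match.group(1)
    return found


class FrameDialogue:
    def __init__(self):
        self.frame = {}

    def next_question(self):
        """Return (slot, question) for the first empty slot, or (None, None)."""
        for slot, question in QUESTIONS.items():
            if slot not in self.frame:
                return slot, question
        return None, None

    def handle(self, utterance):
        """Process one user turn and return the system's next prompt."""
        pending, question = self.next_question()
        command = utterance.lower().strip()
        if command == "help":  # universal: available in every state
            return "Answer the question or say 'start over'. " + (question or "")
        if command == "start over":  # universal: available in every state
            self.frame.clear()
            return self.next_question()[1]
        self.frame.update(extract(utterance, pending))
        slot, question = self.next_question()
        return question if slot else "Frame filled; querying the flight database."


dialogue = FrameDialogue()
for turn in ["Hi, I'd like to fly from Seattle Tuesday morning", "Orlando", "help", "United"]:
    print("User:  ", turn)
    print("System:", dialogue.handle(turn))
```

Because the next question is chosen by scanning for the first empty slot, the order of the user's answers no longer matters, which is the main difference from a finite-state design.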

Frames are mixed-initiative
  User can answer multiple questions at once.
  System asks questions of user, filling any slots that user specifies
  When frame is filled, do database query
  If user answers 3 questions at once, system has to fill slots and not ask these questions again!
  Anyhow, we avoid the strict constraints on order of the finite-state architecture.

Multiple frames
  flights, hotels, rental cars
  Flight legs: Each flight can have multiple legs, which might need to be discussed separately
  Presenting the flights (If there are multiple flights meeting user's constraints)
    It has slots like 1ST_FLIGHT or 2ND_FLIGHT so user can ask “how much is the second one”
  General route information:
    Which airlines fly from Boston to San Francisco
  Airfare practices:
    Do I have to stay over Saturday to get a decent airfare?

Multiple Frames
  Need to be able to switch from frame to frame
  Based on what user says.
  Disambiguate which slot of which frame an input is supposed to fill, then switch dialogue control to that frame.
  Main implementation: production rules
    Different types of inputs cause different productions to fire
    Each of which can flexibly fill in different frames
    Can also switch control to different frame

VoiceXML
  Voice eXtensible Markup Language
  An XML-based dialogue design language
  Makes use of ASR and TTS
  Deals well with simple, frame-based mixed initiative dialogue.
  Most common in commercial world (too limited for research systems)
  But useful to get a handle on the concepts.
