A-Hierarchical-Approach-to-POMDP-Planning-and-ExecutionPOMDP規(guī)劃與執(zhí)行一個分層的方法課件_第1頁
A-Hierarchical-Approach-to-POMDP-Planning-and-ExecutionPOMDP規(guī)劃與執(zhí)行一個分層的方法課件_第2頁
A-Hierarchical-Approach-to-POMDP-Planning-and-ExecutionPOMDP規(guī)劃與執(zhí)行一個分層的方法課件_第3頁
A-Hierarchical-Approach-to-POMDP-Planning-and-ExecutionPOMDP規(guī)劃與執(zhí)行一個分層的方法課件_第4頁
A-Hierarchical-Approach-to-POMDP-Planning-and-ExecutionPOMDP規(guī)劃與執(zhí)行一個分層的方法課件_第5頁
已閱讀5頁,還剩117頁未讀, 繼續(xù)免費(fèi)閱讀

下載本文檔

版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進(jìn)行舉報(bào)或認(rèn)領(lǐng)

文檔簡介

HierarchicalMethodsfor

PlanningunderUncertaintyThesisProposalJoellePineauThesisCommittee:SebastianThrun,ChairMatthewMasonAndrewMooreCraigBoutilier,U.ofTorontoHierarchicalMethodsfor

PlannIntegratingrobotsinlivingenvironmentsTherobot’srole: -Socialinteraction -Mobilemanipulation -Intelligentreminding -Remote-operation -Datacollection/monitoringThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyIntegratingrobotsinlivingeAbroadperspectiveGOAL=SelectingappropriateactionsUSER+WORLD+ROBOTACTIONSOBSERVATIONSBeliefstateSTATEThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyAbroadperspectiveGOAL=SeleCause#1:Non-deterministiceffectsofactionsCause#2:PartialandnoisysensorinformationCause#3:InaccuratemodeloftheworldandtheuserWhyisthisadifficultproblem?UNCERTAINTYThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyCause#1:Non-deterministiceCause#1:Non-deterministiceffectsofactionsCause#2:PartialandnoisysensorinformationCause#3:InaccuratemodeloftheworldandtheuserWhyisthisadifficultproblem?UNCERTAINTYAsolution:PartiallyObservableMarkovDecisionProcesses(POMDPs)S3o1,o2S1o1,o2S2o1,o2a1a2ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyCause#1:Non-deterministiceThetruthaboutPOMDPsBadnews:FindinganoptimalPOMDPactionselectionpolicyiscomputationallyintractableforcomplexproblems.ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyThetruthaboutPOMDPsBadnewsThetruthaboutPOMDPsBadnews:FindinganoptimalPOMDPactionselectionpolicyiscomputationallyintractableforcomplexproblems.Goodnews:Manyreal-worlddecision-makingproblemsexhibitstructureinherenttotheproblemdomain.Byleveragingstructureintheproblemdomain,IproposeanalgorithmthatmakesPOMDPstractable,evenforlargedomains.ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyThetruthaboutPOMDPsBadnewsHowisitdone?Usea“Divide-and-conquer”approach:Wedecomposealargemonolithicproblemintoacollectionofloosely-relatedsmallerproblems.DialoguemanagerHealthmanagerSocialmanagerRemindingmanagerThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyHowisitdone?Usea“Divide-ThesisstatementDecision-makingunderuncertaintycanbemadetractableforcomplexproblemsbyexploitinghierarchicalstructureintheproblemdomain.ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyThesisstatementDecision-makinOutlineProblemmotivationPartiallyobservableMarkovdecisionprocessesThehierarchicalPOMDPalgorithmProposedresearchThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyOutlineThesisProposal:HierarPOMDPswithinthefamilyofMarkovmodelsMarkovChainHiddenMarkovModel(HMM)MarkovDecisionProcess(MDP)PartiallyObservableMDP(POMDP)Uncertaintyinsensorinput?nonoControlproblem?yesyesThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyPOMDPswithinthefamilyofMaPOMDPparameters:Initialbelief:b0(s)=Pr(so=s)Observationprobabilities:O(s,a,o)=Pr(o|s,a)Transitionprobabilities:T(s,a,s’)=Pr(s’|s,a)Rewards:R(s,a)HMM

WhatarePOMDPs?Components: Setofstates:sS Setofactions:aA Setofobservations:oO

0.50.51MDPS2Pr(o1)=0.9Pr(o2)=0.1S1Pr(o1)=0.5Pr(o2)=0.5a1a2S3Pr(o1)=0.2Pr(o2)=0.8ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyPOMDPparameters:HMM WhatareAPOMDPexample:ThetigerproblemS1“tiger-left”Pr(o=growl-left)=0.85Pr(o=growl-right)=0.15S2“tiger-right”Pr(o=growl-left)=0.15Pr(o=growl-right)=0.85Actions={listen, open-left, open-right}RewardFunction: R(a=listen) =-1 R(a=open-right,s=tiger-left) =10 R(a=open-left,s=tiger-left) =-100ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyAPOMDPexample:ThetigerproWhatcanwedowithPOMDPs?1)Statetracking:Afteranaction,whatisthestateoftheworld,st?2)Computingapolicy:Whichaction,aj,shouldthecontrollerapplynext?Veryhard!Notsohard.bt-1??at-1otRobot:St-1stWorld:Controllayer:......??ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyWhatcanwedowithPOMDPs?VerThetigerproblem:StatetrackingS1“tiger-left”S2“tiger-right”Beliefvectorb0BeliefThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyThetigerproblem:StatetrackThetigerproblem:StatetrackingS1“tiger-left”S2“tiger-right”Beliefvectorb0Beliefobs=growl-leftaction=listenThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyThetigerproblem:StatetrackThetigerproblem:Statetrackingb1obs=growl-leftS1“tiger-left”S2“tiger-right”BeliefvectorBeliefb0action=listenThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyThetigerproblem:StatetrackPolicyOptimizationWhichaction,aj,shouldthecontrollerapplynext?InMDPs:Policyisamappingfromstatetoaction,:siajInPOMDPs:Policyisamappingfrombelieftoaction,:bajRecursivelycalculateexpectedlong-termrewardforeachstate/belief:Findtheactionthatmaximizestheexpectedreward:ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyPolicyOptimizationWhichactioThetigerproblem:OptimalpolicyBeliefvector:open-leftopen-rightlistenS1“tiger-left”S2“tiger-right”O(jiān)ptimalpolicy:ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyThetigerproblem:OptimalpolFinite-horizonPOMDPsareinworse-casedoublyexponential:Infinite-horizonundiscountedstochasticPOMDPsareEXPTIME-hard,andmaynotbedecidable(|n|).ComplexityofpolicyoptimizationThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyComplexityofpolicyoptimizatTheessenceoftheproblemHowcanwefindgoodpoliciesforcomplexPOMDPs?Isthereaprincipledwaytoprovidenear-optimalpoliciesinreasonabletime?ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyTheessenceoftheproblemThesOutlineProblemmotivationPartiallyobservableMarkovdecisionprocessesThehierarchicalPOMDPalgorithmProposedresearchThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyOutlineThesisProposal:HierarAhierarchicalapproachtoPOMDPplanningKeyIdea:Exploithierarchicalstructureintheproblemdomaintobreakaproblemintomany“related”POMDPs.Whattypeofstructure? ActionsetpartitioningActInvestigateHealthMoveNavigateCheckPulseAskWhereLeftRightForwardBackwardCheckMedssubtaskabstractactionThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyAhierarchicalapproachtoPOMAssumptionsEachPOMDPcontrollerhasasubsetofAo.EachPOMDPcontrollerhasfullstatesetS0,observationsetO0.Eachcontrollerincludesdiscriminativerewardinformation.Wearegiventheactionsetpartitioninggraph.WearegivenafullPOMDPmodeloftheproblem:{So,Ao,Oo,Mo}.ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyAssumptionsThesisProposal:HiThetigerproblem:AnactionhierarchyPinvestigate={S0,Ainvestigate,O0,Minvestigate}Ainvestigate={listen,open-right}actopen-leftinvestigateopen-rightlistenThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyThetigerproblem:AnactionhOptimizingthe“investigate”controllerS1“tiger-left”S2“tiger-right”Locallyoptimalpolicy:Beliefvector:open-rightlistenThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyOptimizingthe“investigate”cThetigerproblem:AnactionhierarchyPact={S0,Aact,O0,Mact}Aact={open-left,investigate}actopen-leftinvestigateopen-rightlistenBut...R(s,a=investigate) isnotdefined!ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyThetigerproblem:AnactionhModelingabstractactionsInsight:Usethelocalpolicyofcorrespondinglow-levelcontroller.Generalform:R(si,ak)=R(si,Policy(controllerk,si))Example:R(s=tiger-left,ak=investigate)=

open-right listen open-lefttiger-left 10 -1 -100

tiger-right -100 -1 10

Policy

(investigate,s=tiger-left)=open-rightThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyModelingabstractactionsInsigOptimizingthe“act”controllerS1“tiger-left”S2“tiger-right”Locallyoptimalpolicy:investigateBeliefvector:open-leftThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyOptimizingthe“act”controlleThecompletehierarchicalpolicyS1“tiger-left”S2“tiger-right”Hierarchicalpolicy:Beliefvector:open-leftopen-rightlistenThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyThecompletehierarchicalpoliThecompletehierarchicalpolicyS1“tiger-left”S2“tiger-right”Hierarchicalpolicy:open-leftopen-rightlistenOptimalpolicy:Beliefvector:ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyThecompletehierarchicalpoliResultsforlargersimulationdomainsThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyResultsforlargersimulationRelatedworkonhierarchicalmethodsHierarchicalHMMsFineetal.,2019HierarchicalMDPsDayan&Hinton,1993;Dietterich,2019;McGovernetal.,2019;Parr&Russell,2019;Singh,1992.Loosely-coupledMDPsBoutilieretal.,2019;Dean&Lin,2019;Meuleauetal.2019;Singh&Cohn,2019;Wang&Mahadevan,2019.FactoredstatePOMDPsBoutilieretal.,2019;Boutilier&Poole,2019;Hansen&Feng,2000.HierarchicalPOMDPsCastanon,2019;Hernandez-Gardiol&Mahadevan,2019;Theocharousetal.,2019;Wiering&Schmidhuber,2019.ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyRelatedworkonhierarchicalmOutlineProblemmotivationPartiallyobservableMarkovdecisionprocessesThehierarchicalPOMDPalgorithmProposedresearchThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyOutlineThesisProposal:HierarProposedresearch1)Algorithmicdesign2)Algorithmicanalysis3)Modellearning4)SystemdevelopmentandapplicationThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyProposedresearchThesisProposResearchblock#1:AlgorithmicdesignGoal1.1:Developing/implementinghierarchicalPOMDPalgorithm.Goal1.2:ExtendingH-POMDPforfactorizedstaterepresentation.Goal1.3:Usingstate/observationabstraction.Goal1.4:Planningforcontrollerswithnolocalrewardinformation.ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyResearchblock#1:AlgorithmicAssumption#2:“EachPOMDPcontrollerhasfullstatesetS0,andobservationsetO0.”Canwereducethenumberofstates/observations,|S|and|O|?Goal1.3:State/observationabstractionThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyAssumption#2:Goal1.3:State/Assumption#2:“EachPOMDPcontrollerhasfullstatesetS0,andobservationsetO0.”Canwereducethenumberofstates/observations,|S|and|O|?Yes!Eachcontrolleronlyneedssubsetofstate/observationfeatures.Whatisthecomputationalspeed-up?Goal1.3:State/observationabstractionNavigateLeftRightForwardBackwardInvestigateHealthCheckPulseCheckMeds POMDP recursive upper-boundTimecomplexity:ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyAssumption#2:Goal1.3:State/Goal1.4:LocalcontrollerrewardinformationAssumption#3:“Eachcontrollerincludessomeamountofdiscriminativerewardinformation.”Canwerelaxthisassumption?ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyGoal1.4:LocalcontrollerrewGoal1.4:LocalcontrollerrewardinformationAssumption#3:“Eachcontrollerincludessomeamountofdiscriminativerewardinformation.”Canwerelaxthisassumption?Possibly.Userewardshapingtoselectpolicy-invariantrewardfunction.Whatisthebenefit?H-POMDPcouldsolveproblemswithsparserewardfunctions.ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyGoal1.4:LocalcontrollerrewResearchblock#2:AlgorithmicanalysisGoal2.1:EvaluatingperformanceoftheH-POMDPalgorithm.Goal2.2:Quantifyingthelossduetothehierarchy.Goal2.3:Comparingdifferentpossibledecompositionsofaproblem.ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyResearchblock#2:AlgorithmicGoal2.1:PerformanceevaluationHowdoesthehierarchicalPOMDPalgorithmcompareto:ExactvaluefunctionmethodsSondik,1971;Monahan,1982;Littman,2019;Cassandraetal,2019.PolicysearchmethodsHansen,2019;Kearnsetal.,2019;Ng&Jordan,2000;Baxter&Bartlett,2000.ValueapproximationmethodsParr&Russell,2019;Thrun,2000.BeliefapproximationmethodsNourbakhsh,2019;Koenig&Simmons,2019;Hauskrecht,2000;Roy&Thrun,2000.Memory-basedmethodsMcCallum,2019.ConsiderproblemsfromPOMDPliteratureanddialoguemanagementdomain.ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyGoal2.1:PerformanceevaluatiGoal2.2:QuantifyingthelossThehierarchicalPOMDPplanningalgorithmprovidesanapproximately-optimalpolicy.How“near-optimal”isthepolicy?Subjecttosome(veryrestrictive)conditions:“Thevaluefunctionoftop-levelcontrollerisanupper-boundonthevalueoftheapproximation.”Canweloosentherestrictions?Tightenthebound?Findalower-bound?AtopA1......Vtop(b)Vactual(b)ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyGoal2.2:QuantifyingthelossGoal2.3:ComparingdifferentdecompositionAssumption#4: “Wearegivenanactionsetpartitioninggraph.”Whatmakesagoodhierarchicalactiondecomposition?Comparingdecompositionsisthefirststeptowardsautomaticdecomposition.ManufactureExamineInspectReplacea1a2ManufactureReplaceExamineInspecta1a2a3ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyGoal2.3:ComparingdifferentResearchblock#3:ModellearningGoal3.1:Automaticallygeneratinggoodactionhierarchies.Assumption#4:“Wearegivenanactionsetpartitioninggraph.”Canweautomaticallygenerateagoodhierarchicaldecomposition?Maybe.

ItisbeingdoneforhierarchicalMDPs.Goal3.2:Includingparameterlearning.Assumption#5:“WearegivenafullPOMDPmodeloftheproblem.”Canweintroduceparameterlearning?Yes!Maximum-likelihoodparameteroptimization(Baum-Welch)canbeusedforPOMDPs.ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyResearchblock#3:ModellearnTouchscreeninputSpeechutteranceResearchblock#4:SystemdevelopmentandapplicationGoal4.1:BuildinganextensivedialoguemanagerTouchscreenmessageSpeechutteranceDialogueManagerRemindermessageRobotsensorreadingsMotioncommandStatusinformationFacemailoperationsRobotmoduleRemindingmoduleTeleoperationmoduleUserRemote-controlcommandThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyTouchscreeninputResearchblocAnimplementedscenarioPhysiotherapyPatientroomRobothomeProblemsize:|S|=288,|A|=14,|O|=15StateFeatures:{RobotLocation,UserLocation, UserStatus,ReminderGoal, UserMotionGoal,UserSpeechGoal}Testsubjects:3elderlyresidentsinassistedlivingfacilityThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyAnimplementedscenarioPhysiotContributionsAlgorithmiccontribution:AnovelPOMDPalgorithmbasedonhierarchicalstructure.EnablesuseofPOMDPsformuchlargerproblems.Applicationcontribution:ApplicationofPOMDPstodialoguemanagementisnovel.Allowsdesignofrobustrobotbehaviouralmanagers.ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyContributionsThesisProposal:Researchschedule1)Algorithmicdesign/implementation2)Algorithmicanalysis3)Modellearning4)Systemdevelopmentandapplication5)Thesiswritingfall01spring/summer02spring/summer/fall02ongoingfall02/spring03ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyResearchscheduleThesisProposQuestions?ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyQuestions?ThesisProposal:HieAsimulatedrobotnavigationexampleDomainsize:|S|=11,|A|=6,|O|=6GetReward(t)ReadMapActNavigate(t)ReadOpenDoorGoLeftGoRightGoBackGoForward($$)($$)ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyAsimulatedrobotnavigationeAdialoguemanagementexample-AskGoWhere-GoToRoom-GoToKitchen-GoToFollow-VerifyRoom-VerifyKitchen-VerifyFollow-GreetGeneral-GreetMorning-GreetNight-RespondThanks-AskWeatherTime-SayCurrent-SayToday-SayTomorrow-StartMeds-NextMeds-ForceMeds-QuitMeds-AskCallWho-Call911-CallNurse-CallRelative-Verify911-VerifyNurse-VerifyRelative-AskHealth-OfferHelp-SayTimeActCheckHealthPhoneDoMedsCheckWeatherMoveGreetDomainsize:|S|=20,|A|=30,|O|=27ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyAdialoguemanagementexample-ActionhierarchyforimplementedscenarioActRemindAssistRestMoveContactInformBringtoPhysioCheckUserPresentDeliverUserSayWeatherVerifyRequestSayTimeRemindPhysioPublishStatusRingBellGotoRoomVerifyBringVerifyReleaseRechargeGotoHomeThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyActionhierarchyforimplementSondik’spartsmanufacturingproblemManufactureExamineInspectReplacea1a2a3ManufactureExamineInspectReplacea1a2Decomposition1:Decomposition2:+5moredecompositionsThesisProposal:HierarchicalMethodsforPlanningunderUncertaintySondik’spartsmanufacturingpManufacturingtaskresultsThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyManufacturingtaskresultsThesReminderGoal={none,medsX}CommunicationGoal={none,personX}UserHealth={good,poor,emergency}Usingstate/observationabstractionActionSet:StateSet:CommunicationGoal={none,nurse,911,relative}-AskHealth-OfferHelpCheckHealthPhoneDoMeds-AskCallWho-CallHelp-CallNurse-CallRelative-VerifyHelp-VerifyNurse-VerifyRelativePhoneThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyReminderGoal={none,medsX}UsinRelatedworkonrobotplanningandcontrolManually-scripteddialoguestrategies:Denecke&Waibel,2019;Walkeretal.,2019.Markovdecisionprocesses(MDPs)fordialoguemanagementLevinetal.,2019;Fromer,2019;Walkeretal.,2019;Goddeau&Pineau,2000;Singhetal.,2000;Walker,2000.

Robotinterface:Torrance,2019;Asohetal.,2019.ClassicalplanningFikes&Nilsson,1971;Simmons,1987;McAllester&Rosenblitt,1991;Penberthy&Weld,1992;Kushmerick,2019;Veloso&al.,2019;Smith&Weld,2019.ExecutionarchitecturesFirby,1987;Musliner,1993;Simmons,1994;Bonasso&Kortenkamp,2019;ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyRelatedworkonrobotplanningDecision-theoreticplanningmodelsThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyDecision-theoreticplanningmoThetigerproblem:ValuefunctionsolutionVbeliefopen-rightopen-leftlistenS=tiger-leftS=tiger-rightThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyThetigerproblem:ValuefunctOptimizingthe“investigate”controllerVopen-rightlistenbeliefS=tiger-leftS=tiger-rightThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyOptimizingthe“investigate”cOptimizingthe“act”controllerVbeliefopen-leftinvestigateS=tiger-leftS=tiger-rightThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyOptimizingthe“act”controlleHierarchicalMethodsfor

PlanningunderUncertaintyThesisProposalJoellePineauThesisCommittee:SebastianThrun,ChairMatthewMasonAndrewMooreCraigBoutilier,U.ofTorontoHierarchicalMethodsfor

PlannIntegratingrobotsinlivingenvironmentsTherobot’srole: -Socialinteraction -Mobilemanipulation -Intelligentreminding -Remote-operation -Datacollection/monitoringThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyIntegratingrobotsinlivingeAbroadperspectiveGOAL=SelectingappropriateactionsUSER+WORLD+ROBOTACTIONSOBSERVATIONSBeliefstateSTATEThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyAbroadperspectiveGOAL=SeleCause#1:Non-deterministiceffectsofactionsCause#2:PartialandnoisysensorinformationCause#3:InaccuratemodeloftheworldandtheuserWhyisthisadifficultproblem?UNCERTAINTYThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyCause#1:Non-deterministiceCause#1:Non-deterministiceffectsofactionsCause#2:PartialandnoisysensorinformationCause#3:InaccuratemodeloftheworldandtheuserWhyisthisadifficultproblem?UNCERTAINTYAsolution:PartiallyObservableMarkovDecisionProcesses(POMDPs)S3o1,o2S1o1,o2S2o1,o2a1a2ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyCause#1:Non-deterministiceThetruthaboutPOMDPsBadnews:FindinganoptimalPOMDPactionselectionpolicyiscomputationallyintractableforcomplexproblems.ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyThetruthaboutPOMDPsBadnewsThetruthaboutPOMDPsBadnews:FindinganoptimalPOMDPactionselectionpolicyiscomputationallyintractableforcomplexproblems.Goodnews:Manyreal-worlddecision-makingproblemsexhibitstructureinherenttotheproblemdomain.Byleveragingstructureintheproblemdomain,IproposeanalgorithmthatmakesPOMDPstractable,evenforlargedomains.ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyThetruthaboutPOMDPsBadnewsHowisitdone?Usea“Divide-and-conquer”approach:Wedecomposealargemonolithicproblemintoacollectionofloosely-relatedsmallerproblems.DialoguemanagerHealthmanagerSocialmanagerRemindingmanagerThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyHowisitdone?Usea“Divide-ThesisstatementDecision-makingunderuncertaintycanbemadetractableforcomplexproblemsbyexploitinghierarchicalstructureintheproblemdomain.ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyThesisstatementDecision-makinOutlineProblemmotivationPartiallyobservableMarkovdecisionprocessesThehierarchicalPOMDPalgorithmProposedresearchThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyOutlineThesisProposal:HierarPOMDPswithinthefamilyofMarkovmodelsMarkovChainHiddenMarkovModel(HMM)MarkovDecisionProcess(MDP)PartiallyObservableMDP(POMDP)Uncertaintyinsensorinput?nonoControlproblem?yesyesThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyPOMDPswithinthefamilyofMaPOMDPparameters:Initialbelief:b0(s)=Pr(so=s)Observationprobabilities:O(s,a,o)=Pr(o|s,a)Transitionprobabilities:T(s,a,s’)=Pr(s’|s,a)Rewards:R(s,a)HMM

WhatarePOMDPs?Components: Setofstates:sS Setofactions:aA Setofobservations:oO

0.50.51MDPS2Pr(o1)=0.9Pr(o2)=0.1S1Pr(o1)=0.5Pr(o2)=0.5a1a2S3Pr(o1)=0.2Pr(o2)=0.8ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyPOMDPparameters:HMM WhatareAPOMDPexample:ThetigerproblemS1“tiger-left”Pr(o=growl-left)=0.85Pr(o=growl-right)=0.15S2“tiger-right”Pr(o=growl-left)=0.15Pr(o=growl-right)=0.85Actions={listen, open-left, open-right}RewardFunction: R(a=listen) =-1 R(a=open-right,s=tiger-left) =10 R(a=open-left,s=tiger-left) =-100ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyAPOMDPexample:ThetigerproWhatcanwedowithPOMDPs?1)Statetracking:Afteranaction,whatisthestateoftheworld,st?2)Computingapolicy:Whichaction,aj,shouldthecontrollerapplynext?Veryhard!Notsohard.bt-1??at-1otRobot:St-1stWorld:Controllayer:......??ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyWhatcanwedowithPOMDPs?VerThetigerproblem:StatetrackingS1“tiger-left”S2“tiger-right”Beliefvectorb0BeliefThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyThetigerproblem:StatetrackThetigerproblem:StatetrackingS1“tiger-left”S2“tiger-right”Beliefvectorb0Beliefobs=growl-leftaction=listenThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyThetigerproblem:StatetrackThetigerproblem:Statetrackingb1obs=growl-leftS1“tiger-left”S2“tiger-right”BeliefvectorBeliefb0action=listenThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyThetigerproblem:StatetrackPolicyOptimizationWhichaction,aj,shouldthecontrollerapplynext?InMDPs:Policyisamappingfromstatetoaction,:siajInPOMDPs:Policyisamappingfrombelieftoaction,:bajRecursivelycalculateexpectedlong-termrewardforeachstate/belief:Findtheactionthatmaximizestheexpectedreward:ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyPolicyOptimizationWhichactioThetigerproblem:OptimalpolicyBeliefvector:open-leftopen-rightlistenS1“tiger-left”S2“tiger-right”O(jiān)ptimalpolicy:ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyThetigerproblem:OptimalpolFinite-horizonPOMDPsareinworse-casedoublyexponential:Infinite-horizonundiscountedstochasticPOMDPsareEXPTIME-hard,andmaynotbedecidable(|n|).ComplexityofpolicyoptimizationThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyComplexityofpolicyoptimizatTheessenceoftheproblemHowcanwefindgoodpoliciesforcomplexPOMDPs?Isthereaprincipledwaytoprovidenear-optimalpoliciesinreasonabletime?ThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyTheessenceoftheproblemThesOutlineProblemmotivationPartiallyobservableMarkovdecisionprocessesThehierarchicalPOMDPalgorithmProposedresearchThesisProposal:HierarchicalMethodsforPlanningunderUncertaintyOutlineThesisProposal:HierarAhierarchicalapproachtoPOMDPplanningKeyIdea:Exploithierarchicalstructureintheproblemdomaintobreakaproblemintomany“related”POMDPs.Whattypeof

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負(fù)責(zé)。
  • 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時也不承擔(dān)用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。

評論

0/150

提交評論