AIGCs Confuse AI Too: Investigating and Explaining Synthetic Image-induced Hallucinations in Large Vision-Language Models

Yifei Gao¹, Jiaqi Wang¹, Zhiyu Lin¹, Jitao Sang¹,²
¹Beijing Jiaotong University, China  ²Peng Cheng Lab, Shenzhen 518066, China
{yifeigao, jiaqiwang, zyllin, jtsang}@

Abstract

The evolution of Artificial Intelligence Generated Contents (AIGCs) is advancing towards higher quality. The growing interactions with AIGCs present a new challenge to the data-driven AI community: while AI-generated contents have played a crucial role in a wide range of AI models, the potential hidden risks they introduce have not been thoroughly examined. Beyond human-oriented forgery detection, AI-generated content poses potential issues for AI models originally designed to process natural data. In this study, we underscore the exacerbated hallucination phenomena in Large Vision-Language Models (LVLMs) caused by AI-synthetic images. Remarkably, our findings shed light on a consistent AIGC hallucination bias: the object hallucinations induced by synthetic images are characterized by a greater quantity and a more uniform position distribution, even though these synthetic images do not manifest unrealistic or additional relevant visual features compared to natural images. Moreover, our investigations on the Q-former and Linear projectors reveal that synthetic images may present token deviations after visual projection, thereby amplifying the hallucination bias.

1. Introduction

With the rapid evolution of generative model techniques, Artificial Intelligence Generated Contents (AIGCs) have ushered in a new era of prosperity [21, 26]. AIGCs are no longer mere outputs of generative models; rather, they encompass the information generated during human-model or model-model interactions [3]. This results in a large amount of synthetic content rapidly flooding into the Internet, and people may have interacted with synthetic content unconsciously. Nevertheless, the pervasive AI-generated content may give rise to several challenges.

[Figure 1. Hallucination examples on both synthetic (right) and natural images (left), where the highlighted fonts indicate the hallucinated content. Evaluation results across various vision-language tasks, such as semantic descriptions and factual judgments, consistently illustrate the existence of a synthetic image-induced hallucination bias.]

The first challenge to gain widespread attention is forgery detection [20]. This field aims to assist humans in distinguishing between natural and synthetic content and is considered a crucial aspect of AI safety. Notably, a recent study [17] has indicated that recognizing synthetic images yields an approximate 40% human error rate, solidifying the fact that humans are easily confused by AIGCs. Another under-examined challenge is the impact of AI-generated content on AI models themselves. As synthetic data plays an increasingly common role in training and reasoning processes [28], the hidden risks of AIGCs to AI models are urgent yet largely unexplored. Taking Figure 1 as an example, synthetic images with identical semantics are more likely to induce hallucinations in LVLMs than natural images. This is largely because these models' training data, architecture, and training processes are inherently designed for natural data. Applying models trained on natural data to synthetic datasets without adaptation could lead to unexpected outcomes.

We are motivated to explore how synthetic data may impact AI models. In particular, this study focuses on the hallucination issues that synthetic images may cause in Large Vision-Language Models (LVLMs). Before addressing the core research question, we first establish a synthetic image-involved hallucination evaluation environment for LVLMs. Many current generative models adopt the text-to-image synthesis approach [30, 21]. However, the generation process leads to two primary issues: 1) semantic distortion [31], where synthetic images lack authenticity (e.g., the finger problem [17]); and
2) semantic ambiguity [18], where synthetic images lack consistency and struggle to respond to text prompts. Given the absence of an available synthetic image dataset for hallucination evaluation, mitigating the impact of the aforementioned issues is necessary. To this end, we introduce a Semantics Translation (ST) method, which begins with a natural image and employs 1) caption generation and revision, and 2) semantic filtering strategies to control the authenticity and consistency of the synthetic image, ensuring that the evaluation is not affected by the quality of the synthetic image.

We translate two widely used hallucination evaluation datasets, POPE [12] and AMBER [22], delve into the hallucinations induced by synthetic images, and compare them with those induced by the corresponding natural images. Surprisingly, our findings indicate that LVLMs have a bias towards synthetic images, as shown in Figure 1. We refer to this phenomenon as synthetic image-induced hallucination bias (shortened as hallucination bias). Our further experiments reveal that the hallucination bias mainly exhibits 1) a greater quantity and 2) a more uniform position distribution of hallucinated content. Notably, these phenomena are corroborated across different LVLMs and evaluation datasets. In other words, these LVLMs appear to adopt some inherent non-semantic shortcuts in synthetic images, which lead to a continuous impact on the extrapolation process.

We then study how synthetic images confuse LVLMs. Drawing inspiration from the visual projection process of LVLMs, we examine two prevalent visual projection modules: Q-former and Linear. Specifically, our investigations shed light on the fact that 1) turning off the Q-former projection or 2) deepening the layers of the Linear projection can both effectively mitigate the token deviation of synthetic images and narrow the synthetic image-induced hallucination bias. That is to say, LVLMs tuned in this way may generate less hallucinated content in response to synthetic images.

Our core contributions are as follows: (1) In the context of the rapid development of AIGC, we explore the impact of synthetic images on the hallucination problem of LVLMs for the first time. To achieve this, we introduce a semantics translation method to establish a synthetic image-involved hallucination evaluation environment. (2) Extensive experiments uncover the synthetic image-induced hallucination bias of LVLMs, mainly manifesting as (i) a greater quantity and (ii) a more uniform position distribution. (3) We provide an in-depth analysis of the synthetic image-induced hallucination bias from the perspective of visual-text alignment. Experimental results reveal that the design of the projection module may cause the token deviation of synthetic images, thus contributing to the hallucination bias.
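To make the projector terminology concrete, below is a minimal PyTorch sketch (an illustration, not the evaluated models' actual configurations) of a Linear projector whose depth can be increased, which is what "deepening the layers of the Linear projection" refers to. A Q-former, by contrast, compresses the patch tokens into a fixed set of learned queries via cross-attention.

```python
import torch
import torch.nn as nn

class LinearProjector(nn.Module):
    """Maps vision-encoder patch embeddings into the LLM token space.

    depth=1 is a single linear map (as in LLaVA-v1); depth>=2 inserts
    GELU-separated hidden layers (LLaVA-v1.5 uses a 2-layer MLP).
    The dimensions below are illustrative defaults, not the paper's settings.
    """

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096, depth: int = 2):
        super().__init__()
        layers: list[nn.Module] = [nn.Linear(vision_dim, llm_dim)]
        for _ in range(depth - 1):
            layers += [nn.GELU(), nn.Linear(llm_dim, llm_dim)]
        self.proj = nn.Sequential(*layers)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (batch, num_patches, vision_dim) from the vision encoder
        return self.proj(patch_tokens)  # (batch, num_patches, llm_dim)
```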
2. Related Works

AIGC and its Challenges: Recent advances in Artificial Intelligence Generated Contents (AIGCs) have profoundly transformed the approach to content generation and offered numerous benefits for humans in various aspects of life and work [25]. For instance, generative models such as Stable Diffusion [18] or ChatGPT-4 [1] can generate high-quality image or text information by adhering to the textual descriptions prompted by users. However, as AIGCs are progressively introduced into the online world and applied to society, it is essential to take note of certain potential hidden risks they may introduce. The first challenge to gain widespread attention is fake detection, which has been seen as the boundary of AI safety [20]. Another prevailing challenge stems from the insufficient examination of the impact of AI-generated content on AI models themselves. Specifically, several studies have identified multifaceted risks associated with AIGCs, including issues related to security, bias, and privacy breaches [3, 8, 24, 4]. Furthermore, recent studies [6, 27] have demonstrated that synthetic data can introduce source bias in both text and cross-modal retrieval for web search, leading to an elevated ranking of AI-generated content. Therefore, within the context of the widespread application of AIGC, it is crucial to thoroughly investigate the potential risks it may pose.

Hallucinations of LVLMs: Large Vision-Language Models (LVLMs) [2, 15, 29, 34] are regarded as a natural extension of Large Language Models (LLMs) [32]. Through rich visual-text instruction tuning, LVLMs have demonstrated remarkable progress in tackling complex multi-modal tasks, such as Visual Grounding and Visual Question Answering (VQA) [2, 15, 29, 34]. However, LVLMs are also plagued by the hallucination problem [16, 22, 12], in which the model either 1) depicts inaccurate objects or 2) entirely fabricates content from associated images. These phenomena represent a major bottleneck in their deployment and thus limit their practicality in many scenarios [33, 5]. Recent research has delved into hallucination problems from the perspectives of evaluation and mitigation [16]. In terms of evaluation, POPE [12] defines the hallucination problem as a binary classification task, aiming to explore the model's perceptual ability with respect to specific objects present in the image. Meanwhile, AMBER [22] has contributed the most comprehensive hallucination evaluation dataset by extending the binary evaluation approach of POPE and introducing generative task evaluation. The mitigation of hallucination generally involves three aspects: (1) pre-processing, commonly achieved by providing high-quality image-text pairs and well-designed instruction tuning [13]; (2) in-processing, where the model is enhanced by strengthening visual-textual feature learning [9]; and (3) post-hoc processing, known for its superior scalability, which usually alleviates model hallucinations in the decoding stage [10].

3. Semantics Translation

We focus on ensuring accurate hallucination evaluation on synthetic images, with the precondition of excluding the influence of image quality. Specifically, a synthetic image should meet two criteria: 1) authenticity, where synthetic semantics should align with human cognition; and 2) consistency, where the synthetic image should accurately respond to the text prompt.

In this section, we introduce a semantics translation method to synthesize high-quality images. As shown in Figure 2, the synthesis process of the semantics translation method is constrained under natural image supervision through the following two steps: 1) Caption Generation and Revision, transforming visual semantics into detailed textual semantics (Sec. 3.1); and 2) Image Synthesis and Filtering, involving (i) image over-sampling and (ii) image filtering based on similarity (Sec. 3.2).

[Figure 2. The pipeline of the semantics translation method. On the left side, we introduce the caption generation and revision method to synthesize a correct description of the given natural image. Red represents the redundant or incorrect information within the initial caption. On the right side, we utilize an image synthesis and filtering strategy to sample the final synthetic image, ensuring a strict correspondence to the revised caption and the input natural image. Highlighting marks the redundant object in the image synthesis process. The final synthetic image satisfies the criteria of authenticity and consistency.]

3.1. Caption Generation and Revision

In this subsection, we translate the visual semantics of the given natural image into textual semantics. As shown in Figure 2 (left), we employ GPT-4V(ision) to capture the key semantics. To maintain accurate textual semantics, we revise the extracted information through GPT-3.5.

[Figure 3. The comparison of object positions before and after the caption revision. Taking Stable Diffusion as an example, where the accepted character limit is 77, the distribution of key semantics in the revised caption generally satisfies the character limits.]

Caption Generation: In order to ensure that the generative model receives text prompts closely aligned with the semantics of the given natural image, we employ GPT-4V to obtain coarse-grained captions. However, two notable issues persist in the generated captions: 1) redundant or missing information, where the generated semantics do not exist in the image or fail to include objects that should be present; and 2) excessive caption length, where the generated captions often exceed the word limit accepted by most generative models. This results in the loss of semantic information beyond the word limit, thereby disrupting the consistency of semantics.

Caption Revision: To mitigate the aforementioned issues, the generated captions are revised by GPT-3.5. Specifically, we provide manual annotations to assist GPT-3.5 in comprehending and revising the existence, quantity, and relation semantics in the scene. The detailed instruction is available in the Appendix. As shown in Figure 3, the length of the revised caption is generally in line with the word limit set by the generative model, ensuring that all key semantic information can effectively prompt the generative model.
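As an illustration of the revision step, the sketch below assumes the OpenAI Python client; the model name, instruction text, and annotation format are placeholders, since the paper's actual revision instruction is only provided in its Appendix.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Placeholder instruction: the paper's real instruction (with manual
# annotations covering existence, quantity, and relation semantics)
# is given in its Appendix.
REVISION_INSTRUCTION = (
    "Revise the caption so that it only mentions objects present in the "
    "annotations, keeps object counts and relations correct, and stays "
    "within the prompt length limit of the generative model."
)

def revise_caption(caption: str, annotations: str) -> str:
    """Revise a coarse GPT-4V caption against manual annotations via GPT-3.5."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": REVISION_INSTRUCTION},
            {"role": "user", "content": f"Annotations: {annotations}\nCaption: {caption}"},
        ],
    )
    return response.choices[0].message.content
```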
3.2. Image Synthesis and Filtering

Given the revised caption, we utilize Stable Diffusion v1.5 to synthesize a set of candidate images. As shown in Figure 2 (right), we apply a filtering strategy to the candidate set, resulting in the final synthetic image with the most authentic and consistent semantics.

Image Synthesis: The revised captions are employed as input prompts for image synthesis. We can more easily sample a synthetic image similar to the natural image by increasing the number of sampling runs. Therefore, we adopt an over-sampling strategy, conducting multiple generations with different random seeds to obtain a set of candidate images.

Image Filtering: The filtering process includes two stages. (1) Ensuring authentic semantics in synthetic images, with a focus on avoiding (i) the depiction of objects not existing in natural images or (ii) the introduction of objects that contradict human cognition. To achieve this, we initially extract objects using automated segmentation tools [35]. Subsequently, we eliminate images displaying an excess or absence of objects in their annotation results when compared to the corresponding natural image. (2) Maintaining consistent semantics with natural images, with a focus on the similarity between synthetic and natural images. Specifically, we filter the candidate set along two dimensions: (i) Image perceptual similarity, referring to the perceptual system's understanding of the similarity between two images (e.g., high-level semantics in terms of attributes and relations of objects). We use DreamSim [7], which corresponds well to human perception, to measure the perceptual similarity between synthetic and natural images. (ii) Image semantic faithfulness, referring to the alignment with textual annotations (e.g., existence-level semantics). Specifically, we compute the cosine similarity between the synthetic image and textual annotations through the CLIP [19] model. Detailed settings are available in the Appendix. Finally, we select the images with lower DreamSim scores and higher CLIP scores as the final synthetic images.
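To make the over-sampling and similarity filtering concrete, here is a minimal sketch under stated assumptions: it loads the public Stable Diffusion v1.5 checkpoint via diffusers, the dreamsim package, and an off-the-shelf CLIP from transformers. The candidate count and the final single-score ranking are illustrative simplifications of the paper's two-stage procedure (the segmentation-based object check of stage (1) is omitted here).

```python
import torch
from diffusers import StableDiffusionPipeline
from dreamsim import dreamsim
from transformers import CLIPModel, CLIPProcessor

device = "cuda"

# Stable Diffusion v1.5; this checkpoint id is the commonly used public
# mirror and may differ from the authors' exact copy.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to(device)

# DreamSim measures perceptual similarity (lower distance = more similar).
ds_model, ds_preprocess = dreamsim(pretrained=True, device=device)

# CLIP measures image-text semantic faithfulness.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def oversample(prompt: str, n_samples: int = 8):
    """Generate candidate images with different random seeds (over-sampling)."""
    images = []
    for seed in range(n_samples):
        gen = torch.Generator(device).manual_seed(seed)
        images.append(pipe(prompt, generator=gen).images[0])
    return images

@torch.no_grad()
def clip_score(image, caption: str) -> float:
    """Cosine similarity between CLIP image and text embeddings."""
    inputs = clip_proc(
        text=[caption], images=image, return_tensors="pt",
        padding=True, truncation=True,
    ).to(device)
    img_emb = clip.get_image_features(pixel_values=inputs["pixel_values"])
    txt_emb = clip.get_text_features(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )
    return torch.nn.functional.cosine_similarity(img_emb, txt_emb).item()

@torch.no_grad()
def pick_final_image(candidates, natural_image, caption: str):
    """Select the candidate with low DreamSim distance and high CLIP score.

    Combining both criteria into one score is a simplification for brevity;
    the paper filters the two dimensions separately.
    """
    nat = ds_preprocess(natural_image).to(device)
    best, best_score = None, float("inf")
    for img in candidates:
        d = ds_model(nat, ds_preprocess(img).to(device)).item()  # lower = better
        c = clip_score(img, caption)                             # higher = better
        if d - c < best_score:
            best, best_score = img, d - c
    return best
```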
4. Hallucinations on Synthetic Images

After excluding the influence of synthetic image quality issues on hallucination evaluation, this section delves into synthetic image-induced hallucination. To more intuitively reflect the impact of synthetic images, we also report the hallucination results on the corresponding natural images and conduct a fair comparison in the context of consistent semantics. Specifically, we first introduce the evaluation datasets and metrics to ensure a comprehensive examination (Sec. 4.1). Subsequently, we quantify the synthetic image-induced hallucinations across different LVLMs and datasets from the perspectives of hallucination quantity and position distribution (Sec. 4.2). Finally, we further investigate the effects on the hallucination bias through experiments involving 1) prompt templates and 2) generation temperatures (Sec. 4.3).

4.1. Experiment Setup

Dataset: We selected two widely used datasets, POPE [12] and AMBER [22], as benchmarks for hallucination evaluation. Synthetic images corresponding to these datasets are obtained through the semantics translation method. POPE focuses on existence-type hallucination, comprising 500 images with 9,000 corresponding annotations. AMBER offers a richer setting with diverse dataset scaling, reasoning tasks, and hallucination types. Specifically, AMBER 1) includes 1,004 images with 15,200 corresponding annotations; 2) assesses both generative and discriminative task reasoning; and 3) encompasses three types of model hallucination, namely existence, attribute, and relation.

Metrics on Generative Task Reasoning: We follow the settings in AMBER, where CHAIR and Cover are used to evaluate hallucination. CHAIR measures the frequency of hallucinated objects in the responses, while Cover refers to the coverage of objects occurring in the natural images. Generally, an ideal response should maintain a low hallucination level without sacrificing too much response quality, which means a lower CHAIR and a higher Cover.

Metrics on Discriminative Task Reasoning: The hallucination evaluation for the discriminative task is usually defined as binary classification. Considering the imbalanced distribution of yes and no answers in the question annotations, we followed POPE and adopted various metrics, including Accuracy, Precision, Recall, and F1-score. Additionally, we report the 'Yes' ratio on POPE to reveal the confidence behavior of LVLMs.

Models to be Evaluated: We conduct hallucination evaluation on the current mainstream LVLMs, including MiniGPT-4-13B [34], LLaVA-v1-7B [15], LLaVA-v1.5-7B [14], mPLUG-Owl-7B [29], and Qwen-VL-13B [2].
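As a reference for how these reported numbers can be computed, the sketch below follows the textual definitions above (not the official POPE/AMBER implementations) and assumes responses have already been parsed into mentioned-object sets and binary yes/no predictions.

```python
def chair_and_cover(mentioned: set[str], annotated: set[str]) -> tuple[float, float]:
    """CHAIR: fraction of mentioned objects that are hallucinated (lower is better).
    Cover: fraction of annotated objects that the response mentions (higher is better).
    Both expect normalized object names for one image-response pair."""
    chair = len(mentioned - annotated) / max(len(mentioned), 1)
    cover = len(mentioned & annotated) / max(len(annotated), 1)
    return chair, cover

def discriminative_metrics(preds: list[bool], labels: list[bool]) -> dict[str, float]:
    """Accuracy, Precision, Recall, F1, and the 'Yes' ratio for yes/no QA.

    True stands for a 'yes' answer; 'yes' is treated as the positive class,
    as in POPE."""
    tp = sum(p and l for p, l in zip(preds, labels))
    fp = sum(p and not l for p, l in zip(preds, labels))
    fn = sum(not p and l for p, l in zip(preds, labels))
    tn = sum(not p and not l for p, l in zip(preds, labels))
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / max(precision + recall, 1e-9),
        "yes_ratio": sum(preds) / len(preds),
    }
```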
4.2. Overall Evaluation Results on Synthetic Image-induced Hallucination

Observation on Hallucination Quantity: Tables 1 and 2 present the evaluation results of the mainstream open-source LVLMs on both the AMBER and POPE datasets. A consistent observation emerges that hallucinations induced by synthetic images are generally more pronounced than those observed on natural images.

Table 1. The overall evaluation results of POPE on both synthetic and natural images. Each split reports Accuracy / F1 / 'Yes' ratio (%). Δ indicates the hallucination gap between natural and synthetic images; we use Δ to represent the synthetic image-induced hallucination bias.

Model | Image | Random (Acc / F1 / Yes) | Popular (Acc / F1 / Yes) | Adversarial (Acc / F1 / Yes)
MiniGPT-4 (13B) | Natural | 70.00 / 72.38 / 58.60 | 62.50 / 67.31 / 64.70 | 63.43 / 68.21 / 65.03
MiniGPT-4 (13B) | Synthetic | 66.70 / 71.27 / 56.70 | 58.43 / 66.63 / 64.57 | 57.87 / 66.19 / 64.60
MiniGPT-4 (13B) | Δ | 3.30 / 1.11 / 1.90 | 4.07 / 0.68 / 0.13 | 5.56 / 2.02 / 0.43
mPLUG-Owl (7B) | Natural | 60.20 / 70.99 / 87.20 | 53.23 / 67.39 / 93.43 | 53.50 / 67.75 / 94.17
mPLUG-Owl (7B) | Synthetic | 58.90 / 70.04 / 87.17 | 52.43 / 67.41 / 53.43 | 52.87 / 67.25 / 93.93
mPLUG-Owl (7B) | Δ | 1.30 / 0.95 / 0.03 | 0.80 / -0.02 / 40.00 | 0.63 / 0.50 / 0.24
LLaVA-v1 (7B) | Natural | 62.47 / 72.55 / 86.73 | 55.53 / 69.02 / 93.53 | 53.33 / 68.02 / 95.93
LLaVA-v1 (7B) | Synthetic | 60.10 / 71.38 / 83.43 | 53.20 / 68.05 / 92.47 | 52.40 / 67.66 / 93.70
LLaVA-v1 (7B) | Δ | 2.37 / 1.17 / 3.30 | 2.33 / 0.97 / 1.06 | 0.93 / 0.36 / 2.23
LLaVA-v1.5 (7B) | Natural | 90.00 / 90.12 / 51.20 | 86.40 / 87.01 / 54.67 | 79.70 / 81.74 / 61.17
LLaVA-v1.5 (7B) | Synthetic | 84.37 / 83.57 / 45.17 | 81.37 / 81.01 / 48.10 | 74.73 / 75.84 / 54.60
LLaVA-v1.5 (7B) | Δ | 5.63 / 6.55 / 6.03 | 5.03 / 6.00 / 6.57 | 4.97 / 5.90 / 6.57
Qwen-VL (13B) | Natural | 86.07 / 84.18 / 38.07 | 84.90 / 83.27 / 40.23 | 82.73 / 81.29 / 42.27
Qwen-VL (13B) | Synthetic | 78.37 / 73.33 / 31.10 | 76.83 / 72.23 / 33.43 | 74.97 / 70.70 / 35.43
Qwen-VL (13B) | Δ | 7.70 / 10.85 / 6.97 | 8.07 / 11.04 / 6.80 | 7.76 / 10.59 / 6.84

[Table 2. The overall evaluation results of AMBER on both synthetic and natural images. We consider one generative and three discriminative tasks, including the understanding of existence, attribute, and relation semantics of objects. The table values were not preserved in this copy.]

Additionally, we have the following noteworthy observations. (1) LVLMs are easily confused by synthetic images in generative task reasoning. This manifests particularly in i) an increased frequency of hallucinated objects and ii) the limited coverage of objects occurring in the image. Moreover, given the consistency constraint on global semantics between natural and synthetic images, it is counter-intuitive for the LVLMs to generate more hallucinations in response to synthetic images. This observation suggests that LVLMs may capture some non-semantic shortcuts beyond the capabilities of the human vision system. (2) The primary source of synthetic image-induced hallucinations in discriminative task reasoning stems from the attribute semantics. Table 2 provides a detailed comparison among the three discriminative tasks, revealing that synthetic images induce higher hallucination results in attribute semantics. Our further analysis includes a comparison of hallucination numbers within each pair of synthetic and natural images across diverse attribute semantics, involving the number, action, and state semantics of objects. As shown in Figure 4, synthetic images induce higher hallucinations across all three attributes. (3) The synthetic image-induced model reasoning behavior appears to be under-confident. As shown in Table 1, a surprising observation emerges that the 'Yes' ratio of LVLMs reasoning about synthetic images is much lower than that for natural images. This implies that the non-semantic shortcuts in synthetic images weaken the confidence of the reasoning process. In other words, synthetic images are more likely to induce the model to say 'No'.

[Figure 4. Hallucination statistics on the different discriminative reasoning tasks within each pair of synthetic and natural images. The discriminative task considers reasoning on attribute, existence, and relation semantics separately; the attribute semantics cover the action, number, and state information of the annotated objects.]

However, on the AMBER results for existence task reasoning, Qwen-VL exhibits higher accuracy on synthetic images, yielding inconsistent behavior. We attribute this disparity to the nature of existence-type hallucination annotations in the AMBER dataset, which consist entirely of counterexamples (i.e., questions that ask about objects not present in the image). The results on POPE indicate that Qwen-VL has the lowest 'Yes' ratio among all evaluated models, suggesting low confidence in reasoning about discriminative tasks. Given that this discrepancy takes advantage of AMBER's annotation to some extent, it does not impact the overall findings.

Observation on Hallucination Position Distribution: We mainly focus on generative task reasoning and perform Kernel Density Estimation (KDE) to examine the relative position distribution of both synthetic and natural image-induced hallucinated objects. As shown in Figure 5, we observe that in LVLMs' responses to natural images, hallucinated objects tend to appear more at the front of the response, corresponding to the peak of the density curve (i.e., the blue distribution) located at the beginning of the relative position axis. In contrast, in responses to synthetic images, hallucinated objects are relatively uniformly distributed across various locations (i.e., the purple distribution). This observation directly indicates that LVLMs usually generate more "security contents" at the end of the response to natural images. In contrast, synthetic images exhibit a continuous impact on the extrapolation process of LVLMs, thus resulting in higher hallucination results.

[Figure 5. The relative position distribution of hallucinated objects between synthetic and natural images.]
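The KDE comparison can be reproduced with a short script; the sketch below assumes each hallucinated object's position has already been normalized by the total response length, and the bandwidth and plotting choices are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

def plot_position_kde(natural_positions, synthetic_positions):
    """Compare relative positions (0 = start of response, 1 = end) of
    hallucinated objects under natural vs. synthetic images.

    A position is, e.g., the hallucinated object's token index divided by
    the number of tokens in the whole response."""
    xs = np.linspace(0.0, 1.0, 200)
    for positions, label, color in [
        (natural_positions, "natural", "tab:blue"),
        (synthetic_positions, "synthetic", "tab:purple"),
    ]:
        density = gaussian_kde(np.asarray(positions, dtype=float))
        plt.plot(xs, density(xs), label=label, color=color)
    plt.xlabel("relative position of hallucinated object")
    plt.ylabel("density")
    plt.legend()
    plt.show()
```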
4.3. Ablation Study

Previous analyses have demonstrated that the hallucinations induced by synthetic images differ from those induced by natural images, characterized by a greater quantity and a more uniform distribution. We define the above phenomenon as the synthetic image-induced hallucination bias. In this subsection, we further investigate the effects on the hallucination bias, specifically from the perspectives of prompt templates and generation temperature.

Observation on Prompt Templates: For the generative task, AMBER uses the most concise and commonly used generative prompt, "Describe this image", to obtain descriptions of images from LVLMs. Drawing inspiration from POPE's design, we adopt "Generate a brief/detailed caption of the image", respectively, and explore the influence of prompt templates on the hallucination bias. Table 3 presents the evaluation results of different LVLMs under the two prompt templates. A noticeable hallucination bias persists for synthetic images, regardless of whether the prompt template is designed for obtaining detailed or brief descriptions.

Table 3. Hallucination evaluation on the generative task under different templates, where brief-desc and detailed-desc refer to "Generate a brief/detailed caption of the image", respectively. In the original, red indicates a more severe hallucination bias. Cells marked "—" were not recoverable from this copy; a few missing cells are back-computed from the Δ rows.

Model | Image | brief-desc CHAIR(↓) / Cover(↑) | detailed-desc CHAIR(↓) / Cover(↑)
MiniGPT-4 (13B) | Natural | 5.30 / 32.20 | — / 58.00
MiniGPT-4 (13B) | Synthetic | 7.70 / 28.50 | — / 48.50
MiniGPT-4 (13B) | Δ | -2.40 / 3.70 | -3.40 / 9.50
mPLUG-Owl (7B) | Natural | 9.70 / 39.70 | 20.60 / 48.60
mPLUG-Owl (7B) | Synthetic | 12.50 / 33.20 | 21.00 / 44.30
mPLUG-Owl (7B) | Δ | -2.80 / 6.50 | -0.40 / 4.30
LLaVA-v1.5 (7B) | Natural | 2.80 / 36.30 | 6.20 / 49.80
LLaVA-v1.5 (7B) | Synthetic | 6.70 / 32.10 | 10.30 / 43.10
LLaVA-v1.5 (7B) | Δ | -3.90 / 4.20 | -4.10 / 6.70
Qwen-VL (13B) | Natural | 6.10 / 30.60 | 6.30 / 46.30
Qwen-VL (13B) | Synthetic | 12.80 / 21.90 | 14.80 / 32.20
Qwen-VL (13B) | Δ | -6.70 / 8.70 | -8.50 / 14.10

Intriguingly, we observe that the long-text generation process appears to amplify the hallucination bias, indicating that the growth in the quantity of hallucinated content for synthetic images surpasses that for natural images. Building upon this finding, we seek insights into the quantity evolution of hallucinated objects in the two types of images. As hypothesized in Figure 6(a), the hallucination bias in position distribution reveals that synthetic images exert a continuous impact on the extrapolation process, leading to the ongoing generation of hallucinations. In contrast, natural images typically reach hallucination saturation before synthetic images, thus amplifying the hallucination bias. Nevertheless, the extrapolation length is typically predetermined before reasoning (e.g., 'max new tokens' generally does not exceed 512), imposing an upper bound on the hallucination quantity for both types of images. Results on MiniGPT-4 confirm our hypothesis, as shown in Figure 6(b).

[Figure 6. (a) Quantity evolution of hallucinated objects in responses to both synthetic and natural images. (b) The evolution trend of hallucination quantity with relative response length on MiniGPT-4-13B.]

Observation on Temperature: The temperature serves as a crucial hyper-parameter in controlling the randomness and creativity of the text generation process in LVLMs. We investigate