A Survey on Human-Centric LLMs

JING YI WANG*, Tsinghua University, China
NICHOLAS SUKIENNIK*, Tsinghua University, China
TONG LI, Tsinghua University, China
WEIKANG SU, Tsinghua University, China
QIANYUE HAO, Tsinghua University, China
JINGBO XU, Tsinghua University, China
ZIHAN HUANG, Tsinghua University, China
FENGLI XU, Tsinghua University, China
YONG LI, Tsinghua University, China

arXiv:2411.14491v2 [cs.CL] 26 Nov 2024

The rapid evolution of large language models (LLMs) and their capacity to simulate human cognition and behavior have given rise to LLM-based frameworks and tools that are evaluated and applied based on their ability to perform tasks traditionally performed by humans, namely those involving cognition, decision-making, and social interaction. This survey provides a comprehensive examination of such human-centric LLM capabilities, focusing on their performance in both individual tasks (where an LLM acts as a stand-in for a single human) and collective tasks (where multiple LLMs coordinate to mimic group dynamics). We first evaluate LLM competencies across key areas including reasoning, perception, and social cognition, comparing their abilities to human-like skills. Then, we explore real-world applications of LLMs in human-centric domains such as behavioral science, political science, and sociology, assessing their effectiveness in replicating human behaviors and interactions. Finally, we identify challenges and future research directions, such as improving LLM adaptability, emotional intelligence, and cultural sensitivity, while addressing inherent biases and enhancing frameworks for human-AI collaboration. This survey aims to provide a foundational understanding of LLMs from a human-centric perspective, offering insights into their current capabilities and potential for future development.

Additional Key Words and Phrases: Large Language Models, Human-Centered Computing.

1 INTRODUCTION

As large language models (LLMs) [1, 2], such as OpenAI’s GPT family [3, 4] and Meta’s LLaMA [5, 6], continue to evolve, their ability to simulate, analyze, and influence human behavior is growing at an unprecedented rate. These models can now process and generate human-like text and perform cognitive tasks at levels comparable to humans in many situations, providing new tools for understanding human cognition, decision-making, and social dynamics.

As such, this survey aims to provide a comprehensive evaluation of LLMs from a human-centric perspective, focusing on their ability to simulate, complement, and enhance human cognition and behavior, both on an individual and collective level. While LLMs have traditionally been rooted in computer science and engineering [7, 8], their increasing sophistication in replicating human-like reasoning, decision-making, and social interactions has expanded their use into domains where humans are the focal point. This has allowed researchers to address questions that were once too intricate or abstract for computational analysis. For example, in political science, LLMs are used to analyze political discourse, detect biases, and model election outcomes [9]; in sociology, they assist in understanding social media conversations, public sentiment, and group behaviors [10]; and in psychology, they help model human cognition and decision-making [11]. LLMs have also revolutionized linguistics by enabling large-scale analysis of language, from syntax and semantics to pragmatics [12], and in economics, they allow for modeling complex interactions between policies and societal outcomes [13].

Authors’ addresses: Jing Yi Wang*, Tsinghua University, Beijing, China, jy-w22@; Nicholas Sukiennik*, Tsinghua University, Beijing, China, sukiennikn10@; Tong Li, Tsinghua University, Beijing, China, tongli@; Weikang Su, Tsinghua University, Beijing, China; Qianyue Hao, Tsinghua University, Beijing, China; Jingbo Xu, Tsinghua University, Beijing, China; Zihan Huang, Tsinghua University, Beijing, China; Fengli Xu, Tsinghua University, Beijing, China; Yong Li, Tsinghua University, Beijing, China, liyong07@.

J. ACM, Vol. V, No. N, Article. Publication date: November 2024.

To structure this investigation, the survey is divided into two main sections. First, we evaluate human-centric LLMs, focusing on their cognitive, perceptual, social, and cultural competencies. This section examines how LLMs perform tasks commonly associated with human cognition, such as reasoning, perception, emotional awareness, and social understanding. We assess their strengths in structured reasoning, pattern recognition, and creativity, while identifying their limitations in areas such as real-time learning, empathy, and handling complex, multi-step logic. By benchmarking LLM performance against human standards, we highlight areas where LLMs excel and where further improvements are needed.

Second, we explore LLMs in human-centric applied domains, where LLMs are used in real-world scenarios that traditionally require human input. This section is divided into studies focusing on individual and collective applications. Individual-focused studies involve an LLM performing tasks typically done by a single human, such as decision-making, problem-solving, or content creation. Collective-focused studies explore how multiple LLMs can work together to simulate group behaviors, interactions, or collaborative tasks, offering insights into social dynamics, organizational behavior, and multi-agent coordination. In both contexts, we examine the methods employed, such as basic prompting, multi-agent prompting, and fine-tuning, along with the theoretical frameworks that guide these applications, including game theory, social learning theory, and theory of mind.

Ultimately, this survey seeks to provide a detailed understanding of how LLMs can better align with human behaviors and social contexts, identifying both their strengths and areas for improvement. Figure 1 provides an overview of this framework, categorizing LLM capabilities into individual skills, such as cognition, perception, analysis, and executive functioning, and collective skills like social abilities, and highlighting their application to studies across individual domains like behavioral science, psychology, and linguistics, and collective domains including political science, economics, and sociology. By classifying research works with this framework, we offer insights into how LLMs can be made more effective, ethical, and realistic tools for research and practical applications, whether in individual or collective human-centric settings.

The main contributions of this paper can be summarized as follows.

• We provide an in-depth evaluation of LLM capabilities in human-centric tasks, focusing on their cognitive, perceptual, and social competencies, and comparing their performance to human-like reasoning, decision-making, and emotional understanding.

• We explore LLMs’ capabilities in human-centric domains, focusing on real-world applications in individual and collective contexts, and assess their ability to replicate human behaviors in fields such as behavioral science, political science, economics, and sociology, both as single-agent models and in multi-agent systems.

• We identify key challenges and future research directions, including improving LLMs’ real-world adaptability, emotional intelligence, and cultural sensitivity, while addressing biases and developing more advanced frameworks for human-AI collaboration.

The paper is organized as follows: Section 2 provides an overview of AI-empowered human-centric studies and LLMs, while Section 3 evaluates LLM competencies across cognitive, perceptual, analytical, executive, and social skills. Section 4 discusses how LLMs can be applied in a variety of interdisciplinary scenarios to both enhance LLM development and assist in human-centered tasks. Section 5 explores open challenges and outlines future directions for advancing LLMs. Section 6 summarizes key insights and emphasizes the importance of interdisciplinary collaboration to enhance LLMs’ understanding of human behavior.

Fig. 1. Our framework depicts how LLMs are evaluated on foundational human-like skills, divided into individual (e.g., cognition, perception, analysis, executive functioning) and collective (e.g., sociability) levels, and applied within various fields of study similarly categorized as individual (e.g., Behavioral Science, Psychology, Linguistics) and collective (e.g., Political Science, Economics, Sociology) domains.

2 OVERVIEW

2.1 Human-Centric Artificial Intelligence

2.1.1 Traditional AI Approaches in Human-Centric Studies. The application of AI in various human-centered fields has undergone a long progression, now reaching a pinnacle with the rise of generative models, with AI methods being used to investigate various human phenomena. However, despite their relative naivety compared to LLMs, those traditional methods have nonetheless enabled researchers to address complex social phenomena through computation.

For almost as long as it has been investigated, AI has been used in areas that are highly impactful on society [14]. Since then, researchers have evaluated the many ways in which AI could emulate human behavior and thought processes, for example in cognition [15], perception [16], and executive function [17]. More recently, though, with the rise of the web and social media, AI’s uses have come closer to our day-to-day lives. For example, in political communication research, the detection of political bias in news articles has emerged as a critical area of study, particularly given the increasing polarization in media and online spaces. Predicting political ideology with traditional methods, based on statistical modeling and network analysis, has become an urgent task due to the vast amount of content produced daily. For instance, research by [18] employed network analysis to estimate ideological preferences of social media users. Moreover, techniques like topic modeling and content analysis have been widely used to identify bias and misinformation in news articles using data-mining methods [19, 20], highlighting the use of traditional AI techniques in understanding political discourse. Other works tackled the task of stance detection using methods like recursive neural networks [21] and clustering algorithms [22]. Furthermore, Dezfouli et al. [23] explore adversarial vulnerabilities in decision-making models, which is crucial when considering the robustness of traditional bias detection systems under adversarial conditions. In addition, Dafoe et al. [24] emphasize the importance of systems designed to navigate social environments, such as political discourse, using more established multi-agent systems and game theory frameworks. Meanwhile, machine understanding of human preferences has also been used to optimize the learning of reward functions in reinforcement learning [25], showing us that AI methods not only help us explain human behavior, but can also benefit from understanding it, highlighting the co-evolutionary nature of advancements in both AI techniques and human-centric studies.

Overall, the vast body of AI-empowered human-centric studies points to the burgeoning potential of using more advanced computational methods, such as LLMs, to both understand and better simulate human behavior and reasoning processes. LLMs can present new opportunities in the field by simulating human behaviors in areas where real-world data is scarce, as well as facilitate inquiry into laws and dynamics of human behavior based on LLM replicability.

2.1.2 A Paradigm Shift from Traditional AI to LLMs. The rise of LLMs has transformed natural language processing (NLP) and artificial intelligence in general through key breakthroughs in model architecture, scale, and capabilities. Early models like Word2Vec and GloVe used word embeddings, but the introduction of the Transformer in 2017 [26], with its self-attention mechanism, enabled deeper contextual understanding and marked a turning point. OpenAI’s GPT series, beginning in 2018 with GPT [3], capitalized on this, culminating in GPT-3 [27] and GPT-4 [28], which demonstrated unprecedented capabilities in reasoning, text generation, and multimodal tasks. Meanwhile, Google’s PaLM 2 [29] advanced multilingualism and efficiency, and open-source models like Falcon [30] and Baidu’s ERNIE Bot [31] broadened access and specialization. These developments reflect the growing impact of LLMs across diverse domains, from interdisciplinary research to ethical AI applications.
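As a rough illustration of the self-attention mechanism mentioned above, the following minimal sketch computes scaled dot-product attention in plain Python. It assumes the token vectors serve directly as queries, keys, and values; a real Transformer applies learned projection matrices and multiple heads, which are omitted here. The function names and toy vectors are hypothetical and chosen only for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention for a short sequence.

    Each argument is a list of equal-length float vectors. Every output
    vector is a weighted average of `values`, weighted by how strongly the
    corresponding query matches each key.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Mix the value vectors according to the attention weights.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy "embeddings" for three tokens (assumption: no learned projections).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(tokens, tokens, tokens)
```

Because every output vector mixes information from all positions, each token's representation depends on its context, which is the property the paragraph above credits for deeper contextual understanding.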

The rapid adoption of LLMs across academic disciplines has led to varying predictions about whether these systems will eventually match human cognitive abilities. While some experts foresee AI achieving human-like general intelligence in the near future, others remain more cautious, doubting whether AI can fully replicate the complex, abstract reasoning and creativity that define human cognition [32]. Despite these differing viewpoints, AI is already a significant force in everyday life, influencing decision-making and information processing across numerous domains. However, a key distinction remains: human cognition is driven by forward-thinking, theory-based reasoning, while AI operates on patterns derived from vast datasets, often relying on probability and past data [33]. This difference underscores the complementary nature of human and AI systems, with each excelling in distinct aspects of cognitive processing.

Unlike human intelligence, LLMs operate without inherent goals, values, or emotional experiences. Human cognition, driven by survival, social interaction, and creativity, is deeply connected to our physical and social environments. Even embodied AI, while capable of interacting with its surroundings, lacks the nuanced, purpose-driven intelligence that defines human thought. In contrast, LLMs generate responses based on probabilistic models derived from large datasets, without the lived experiences that inform human decision-making. Though LLMs can simulate certain human-like behaviors, they still fall short of the embodied understanding humans possess.
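To make the phrase "responses based on probabilistic models" concrete, the sketch below shows one common way a next token can be drawn from model scores: a softmax over logits with a temperature parameter. The logit values, token strings, and function names are hypothetical, and the sketch assumes a toy three-token vocabulary rather than any particular model's decoding procedure.

```python
import math
import random


def next_token_distribution(logits, temperature=1.0):
    """Turn raw scores into a probability distribution over tokens.

    `logits` maps token -> score. Lower temperatures sharpen the
    distribution; higher temperatures flatten it.
    """
    scaled = {tok: score / temperature for tok, score in logits.items()}
    m = max(scaled.values())
    exps = {tok: math.exp(s - m) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token at random according to the softmax probabilities."""
    probs = next_token_distribution(logits, temperature)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical scores for the word following "The sky is".
toy_logits = {"blue": 3.0, "cloudy": 1.5, "falling": 0.2}
sharp = next_token_distribution(toy_logits, temperature=0.5)
flat = next_token_distribution(toy_logits, temperature=2.0)
choice = sample_token(toy_logits, temperature=1.0, rng=random.Random(0))
```

The output is selected from learned statistics alone; nothing in the procedure encodes goals or lived experience, which is the contrast the paragraph draws.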

These distinctions raise critical questions about the limitations and potentials of AI, especially as we consider the diverse capabilities explored in Section 3, which discusses the capabilities of LLMs including cognitive, perceptual, social, analytical, executive, cultural, moral, and collaborative skills. Section 4 delves into how interdisciplinary fields, such as political science, economics, sociology, behavioral science, psychology, and linguistics, contribute to LLM development, offering insights into how human intelligence informs and shapes the evolution of artificial systems. This exploration emphasizes the importance of leveraging LLM strengths while recognizing the fundamental differences between human and artificial cognition.

3 EVALUATION OF HUMAN-CENTRIC LLMS

To evaluate human-centric LLMs, we showcase a holistic representation of LLM competencies, categorized into two domains: individual (e.g., cognitive, perceptual, analytical, executive functioning skills) and collective (e.g., social skills), as shown in Figure 2. This representation includes various key LLM skills, such as reasoning, pattern recognition, spatial awareness, adaptability, decision-making, interpersonal communication, and cultural competency. Following this, Figure 3 outlines the evaluation approaches used to assess LLMs, including benchmark and dataset testing, human-centric evaluations, interactive and simulation-based evaluations, ethical and bias assessments, and lastly, explainability and interpretability evaluations. Table 1 highlights both the strengths and areas for improvement in these domains. By outlining these abilities, we provide a comprehensive comparison of human-like skills, using benchmarks to assess their strengths and limitations. Additionally, Appendix Tables 2 and 3 provide a comprehensive overview of key papers, highlighting their contributions, the LLMs assessed, and comparisons to human performance. The subsequent section delves into each category, providing an in-depth exploration of the skills and benchmarks that define LLM performance across these domains.

[Figure 2 graphic not recoverable from extraction; surviving labels: Cultural Competence, Pattern Recognition, Information Processing, Individual.]

Fig. 2. Overview of LLM Capabilities Across Individual and Collective Domains.

3.1 Cognitive Skills

LLMs demonstrate cognitive competencies that mirror key elements of human intelligence, primarily through reasoning and learning. While LLMs show remarkable ability in processing vast amounts of information and generating coherent responses, their proficiency varies when it comes to complex cognitive tasks. These models showcase evolved abilities in structured reasoning and generalization but encounter challenges when faced with intricate logic or learning from real-time interactions. This section explores the strengths and limitations of LLMs in reasoning and learning, highlighting their progress and areas that require further advancement.

3.1.1 Reasoning. Logical reasoning, a core element of human cognition and essential for daily functioning, consists of various types of reasoning, including deductive, inductive, and causal reasoning, each contributing to how we process information and make decisions. Deductive reasoning applies general principles to obtain specific conclusions, while inductive reasoning draws generalizations from specific observations [34], and causal reasoning helps to understand cause-and-effect relationships [35, 36].

Several benchmark datasets have been developed to assess these reasoning capabilities in LLMs. For deductive reasoning, the LogiQA 2.0 dataset [37] is a notable resource, focusing on five types of reasoning, including categorical, necessary conditional, sufficient conditional, conjunctive, and disjunctive reasoning. PrOntoQA [38] also evaluates deductive reasoning through first-order logic tasks where LLMs derive specific conclusions from logical premises. For inductive reasoning, CommonsenseQA 2.0 [39] requires generalization from everyday facts and commonsense knowledge, whereas the Creak dataset [40] further tests LLMs’ ability to generalize from commonsense knowledge to identify inconsistencies. In turn, causal reasoning is assessed using CausalBench [41], which evaluates LLMs’ ability to reason about cause-and-effect relationships across diverse domains. ContextHub [42], on the other hand, serves as another benchmark focusing on LLMs’ causal reasoning in both abstract and contextualized tasks. Additional datasets like GSM8K [43] and BIG-Bench-Hard [44] are furthermore employed for mathematical reasoning and evaluating LLM performance across various reasoning domains, respectively.
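To illustrate what a deductive benchmark item of this kind asks of a model, the following sketch derives the gold answer for a small chain of "every X is a Y" premises by forward chaining and then scores a model's true/false answer against it. This is an illustrative toy, not the actual format or generator of PrOntoQA or LogiQA 2.0; the rule encoding, the fictional categories, and the function names are all assumptions.

```python
def forward_chain(facts, rules):
    """Derive everything that follows from `facts` under `rules`.

    `rules` is a list of (premise, conclusion) pairs, each read as
    "every <premise> is a <conclusion>". Rules are applied repeatedly
    (modus ponens) until no new category can be added.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def score_item(model_answer, facts, rules, query):
    """Return True if the model's True/False answer matches the derivation."""
    gold = query in forward_chain(facts, rules)
    return model_answer == gold


# Hypothetical item: "Max is a wumpus. Every wumpus is a mammal.
# Every mammal is an animal. Is Max an animal?"
facts = {"wumpus"}
rules = [("wumpus", "mammal"), ("mammal", "animal"), ("bird", "flyer")]
derived = forward_chain(facts, rules)
is_correct = score_item(True, facts, rules, "animal")
```

Answering the query requires chaining two premises, which is the kind of multi-step structure these deductive benchmarks are designed to probe.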

Analyzing LLM performance with these datasets has revealed significant insights into their reasoning abilities and limitations. For deductive reasoning, although LLMs like GPT-3 have made progress, their accuracy remains at 68.65% in tasks involving logical inference, which is significantly below the 90% human benchmark [37]. This gap indicates ongoing challenges in mastering complex logical structures, especially when multiple logical steps or intricate reasoning processes are required. LLMs like GPT-3.5, PaLM, and LLaMA perform well on simpler deductive reasoning tasks but struggle with more complex scenarios that involve chaining multiple logical premises together [45]. For inductive reasoning, on the other hand, GPT-4 shows improvements in rule application with up to 99.5% partial accuracy [46], yet struggles with larger problems and minimal examples. Even with Chain-of-Thought (CoT) prompting, GPT-4 and Davinci face difficulties in rule validation and integrating complex rules, with Davinci’s accuracy declining to 51% in nuanced tasks [47]. In addition, Han et al. [47] evaluate GPT-3.5 and GPT-4 on property induction tasks, highlighting that while GPT-4 more closely aligns with human reasoning patterns, they still struggle to fully capture premise non-monotonicity, a critical element of human cognitive processing.
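The comparison between model and human scores above reduces to simple arithmetic, sketched below. The `accuracy` helper assumes exact-match scoring over multiple-choice labels, which is a common convention but an assumption here; the answer lists are made up for illustration. Only the 68.65% and 90% figures come from the text.

```python
def accuracy(predictions, gold):
    """Exact-match accuracy in percent over paired answer lists."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("predictions and gold must be non-empty and equal length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


def gap_to_human(model_acc, human_acc):
    """Percentage-point shortfall of the model relative to humans."""
    return human_acc - model_acc


# Toy illustration with made-up answers: three of four labels match.
toy_acc = accuracy(["A", "B", "C", "D"], ["A", "B", "D", "D"])

# Figures reported in the text for logical inference tasks [37].
deductive_gap = gap_to_human(68.65, 90.0)  # roughly 21.35 percentage points
```

Reporting the gap in percentage points rather than as a ratio keeps it directly comparable across benchmarks that use the same accuracy metric.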

Causal reasoning remains a significant challenge for LLMs like GPT-4 and Davinci, as it requires a deep understanding of cause-and-effect across various contexts. Although these models show reasonable proficiency in mathematical causal tasks, the CausalBench benchmark highlights their struggles with more complex text-based and coding-related causal problems [41]. Interpreting causal structures in narratives or code snippets often goes beyond simple data correlations, demanding robust reasoning to avoid producing misleading outputs. Even when GPT-4 initially performs well, its reasoning capabilities frequently weaken when faced with flawed or conflicting arguments, raising concerns about its consistency in complex scenarios [48].

Fig. 3. Overview of LLM evaluations.

• Benchmark & Dataset Testing: Standardized Benchmarks, Custom Benchmarks, Performance Metrics

• Human-Centric Evaluations: Expert Evaluations, Crowdsourced Evaluations, Human-in-the-Loop Testing

• Interactive & Simulation-Based Evaluations: Single-Agent Simulations, Multi-Agent Simulations, Task-Oriented Dialogues

• Ethical & Bias Assessments: Bias Detection, Fairness Metrics, Ethical Compliance

• Explainability & Interpretability: Transparency of Reasoning, User Interpretability, Technical Interpretability

The ContextHub benchmark was developed to assess LLMs like GPT-4, PaLM, and LLaMA in handling both abstract and contextualized logical problems [42]. ContextHub focuses on the challenges these models encounter when transitioning from simple logic tasks to nuanced, real-world reasoning. While models perform well with straightforward problems, they often struggle to generalize in context-rich scenarios requiring deeper interpretative skills. Additional datasets like GSM8K emphasize deductive reasoning, and BIG-Bench-Hard evaluates multi-step reasoning, factual knowledge, and commonsense understanding [43, 44]. Together, these benchmarks reveal critical insights into the strengths and limitations of models like GPT-4 and Davinci, pinpointing areas that need improvement for handling complex, real-world reasoning tasks.

Overall, these benchmark datasets provide a comprehensive evaluation framework for assessing LLMs’ reasoning capabilities, revealing both their advancements and limitations. While LLMs have shown progress in handling specific reasoning tasks, they continue to face significant challenges in multi-step logic, contextual problem-solving, and generalizing their reasoning abilities across diverse domains.

3.1.2 Learning. LLMs’ learning ability encompasses their capacity to adapt, generalize, and improve performance based on pre-existing training data and interactions with users or environments. Unlike traditional learning models, LLMs do not update their parameters during inference. Instead, they rely on pre-trained knowledge to perform few-shot or zero-shot tasks, highlighting their generalization capabilities. However, this comes with significant limitations when faced with evolving, real-world data.
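The difference between zero-shot and few-shot use can be sketched as a difference in the prompt alone, since no parameters change at inference time. The helper below is a minimal illustration under assumed conventions: the "Q:/A:" template, the example questions, and the function name are hypothetical and not tied to any specific model or API.

```python
def build_prompt(question, examples=()):
    """Assemble a plain-text prompt.

    With no `examples` this is a zero-shot prompt; with a few
    (question, answer) pairs prepended it becomes a few-shot prompt.
    The model's weights are untouched in both cases; only the
    conditioning text differs.
    """
    parts = [f"Q: {ex_q}\nA: {ex_a}" for ex_q, ex_a in examples]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# Zero-shot: the model sees only the task itself.
zero_shot = build_prompt("Is a whale a mammal?")

# Few-shot: two worked examples are supplied in the context window.
demos = [("Is a sparrow a bird?", "Yes"), ("Is a shark a mammal?", "No")]
few_shot = build_prompt("Is a whale a mammal?", demos)
```

Because the examples live only in the prompt, any adaptation disappears once the context is discarded, which is one way to read the limitation with evolving data noted above.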

Recent efforts have aimed at improving LLM adaptability through various strategies. For instance, the RL with Guided Feedback (RLGF) framework [49] optimizes learning from feedback, showing that guided strategies can significantly improve text generation in dynamic conditions. Similarly, error-driven learning approaches, like LEMA (Learning from MistAKes) [50], allow models like GPT-4 to refine reasoning by identifying and correcting errors. These approaches highlight the potential of leveraging feedback and error correction to boost adaptability, yet they still rely on static data at inference.

[Table 1: cell layout lost in extraction, and the excerpt ends mid-table. The table covers the skill categories Analysis, Cognition, Perception, Sociability, and Executive Function. Recoverable entries, listed in extraction order without their original row and column assignment:]

• High accuracy in information retrieval
• Structured metadata-based queries
• High volume of ideas in structured tasks
• Nuanced emotional regulation
• Real-world, dynamic challenge adaptation
• Entity-based reasoning with structured datasets
• Abstract logic reasoning in structured contexts
• Contextual cue-based reasoning
• Abstract common-sense reasoning
• Contextual logical reasoning
• Contradictory task handling
• Multi-step reasoning with real-world application
• Structured, predefined task handling
• Context-specific empathy
• Complex mental state understanding
• Dynamic planning
• Real-time adjustments
• Controlled virtual environments
• Social context navigation
• Basic empathy tasks
• False belief and indirect cue recognition
• More original, dive [entry truncated in source]
