Assessing Opportunities for LLMs in Software Engineering and Acquisition

Authors: Stephany Bellomo, Shen Zhang, James Ivers, Julie Cohen, Ipek Ozkaya

November 2023
Large language models (LLMs) are generative artificial intelligence (AI) models that have been trained on massive corpuses of text data and can be prompted to generate new, plausible content. LLMs are seeing rapid advances, and they promise to improve productivity in many fields. OpenAI's GPT-4 and Google's LaMDA are the underlying LLMs of services like ChatGPT, CoPilot, and Bard. These services can perform a range of tasks, including generating human-like text responses to questions, summarizing artifacts, and generating working code. These models and services are the focus of extensive research efforts across industry, government, and academia to improve their capabilities and relevance, and organizations in many domains are rigorously exploring their use to uncover potential applications.
The idea of harnessing LLMs to enhance the efficiency of software engineering and acquisition activities holds special allure for organizations with large software operations, such as the Department of Defense (DoD), as doing so offers the promise of substantial resource optimization. Potential use cases for LLMs are plentiful, but knowing how to assess the benefits and risks associated with their use is nontrivial. Notably, to gain access to the latest advances, organizations may need to share proprietary data (e.g., source code) with service providers. Understanding such implications is central to intentional and responsible use of LLMs, especially for organizations managing sensitive information.
In this document, we examine how decision makers, such as technical leads and program managers, can assess the fitness of LLMs to address software engineering and acquisition needs [Ozkaya 2023]. We first introduce exemplar scenarios in software engineering and software acquisition and identify common archetypes. We describe common concerns involving the use of LLMs and enumerate tactics for mitigating those concerns. Using these common concerns and tactics, we demonstrate how decision makers can assess the fitness of LLMs for their own use cases through two examples.
Capabilities of LLMs, risks concerning their use, and our collective understanding of emerging services and models are evolving rapidly [Brundage et al. 2022]. While this document is not meant to be comprehensive in covering all software engineering and acquisition use cases, their concerns, and mitigation tactics, it demonstrates an approach that decision makers can use to think through their own LLM use cases as this space evolves.
Footnotes:
1. GPT-4: /research/gpt-4
2. LaMDA: https://blog.google/technology/ai/lamda/
4. Copilot: /features/copilot
What Is an LLM?
An LLM is a deep neural network model trained on an extensive corpus of diverse documents (e.g., websites and books) to learn language patterns, grammar rules, facts, and even some reasoning abilities [Wolfram 2023]. LLMs generate responses to inputs ("prompts") by iteratively determining the next word or phrase appearing after others, based on the prompt and on patterns and associations learned from their training corpus, using probabilistic and randomized selection [White et al. 2023]. This capability allows LLMs to generate human-like text that can be surprisingly coherent and contextually relevant, even if it is not always semantically correct.
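The iterative, probabilistic selection described above can be sketched in a few lines of code. This is a minimal illustration, not how production LLMs are implemented: the candidate tokens, their scores, and the temperature value below are invented for the example.

```python
import math
import random

def sample_next_token(scores: dict[str, float], temperature: float = 0.8) -> str:
    """Pick the next token from model scores using temperature-scaled softmax.

    Lower temperatures concentrate probability on the highest-scoring token;
    higher temperatures make the choice more random.
    """
    scaled = {tok: s / temperature for tok, s in scores.items()}
    max_s = max(scaled.values())  # subtract max for numerical stability
    weights = {tok: math.exp(s - max_s) for tok, s in scaled.items()}
    total = sum(weights.values())
    r = random.uniform(0, total)
    cumulative = 0.0
    for tok, w in weights.items():
        cumulative += w
        if r <= cumulative:
            return tok
    return tok  # fallback for floating-point edge cases

# Toy scores a model might assign to continuations of "The unit tests all ..."
candidate_scores = {"passed": 2.1, "failed": 1.3, "compiled": 0.4}
print(sample_next_token(candidate_scores, temperature=0.8))
```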
While LLMs can perform complex tasks using their trained knowledge, they lack true understanding. Rather, they are sophisticated pattern-matching tools. Moreover, due to their probabilistic reasoning, they can generate inaccurate results (often referred to as "hallucinations"), such as citations to non-existent references or method calls to non-existent application programming interfaces (APIs). While LLMs can perform analysis and inferencing on new data they have been prompted with, the data on which LLMs have been trained can limit their accuracy. However, the technology is rapidly advancing, with new models having increasing complexity and parameter counts, and benchmarks have already emerged for comparing their performance [Imsys 2023]. In addition, LLM service providers are working on ways to use more recent data [D'Cruze 2023]. Despite these limitations, there are productive uses of LLMs today.
Choosing an LLM
There are already dozens of LLMs and services built using LLMs, and more emerge every day. These models vary in many dimensions, from technical to contractual, and the details of these differences can be difficult to keep straight. The following distinctions are a good starting point when choosing an LLM for use.
Model or Service. ChatGPT is a chatbot built on OpenAI's GPT family of LLMs [OpenAI 2023]. The difference is important, as services built on LLMs can add additional capabilities (e.g., specialized chatbot features, specialized training beyond the core LLM, or non-LLM features that can improve results from an LLM). A service like ChatGPT is typically hosted by a service provider, meaning that the provider manages the computing resources (and associated costs) and that users are typically required to send their prompts (and potentially sensitive data) to the service provider to use the service. A model, like Meta's Llama 2, can be fine-tuned with domain- or organization-specific data to improve accuracy, but it typically lacks the added features and resources of a commercially supported service.
6. Llama 2: /llama/
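The practical difference can be seen in how each option is invoked. The sketch below is illustrative only: the hosted-service call assumes an OpenAI API key and sends the prompt off-site, while the local path assumes a downloaded Llama 2 checkpoint and sufficient hardware; the model names and prompt text are placeholders.

```python
# Option 1: hosted service -- the provider runs the model; prompts leave your environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize the attached requirements."}],
)
print(response.choices[0].message.content)

# Option 2: self-hosted model -- you supply the compute; prompts stay in-house.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
inputs = tokenizer("Summarize the attached requirements.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```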
General or Specialized. LLMs are pre-trained on a corpus, and the composition of that corpus is a significant factor affecting an LLM's performance. General LLMs are trained on text sources like Wikipedia that are available to the public. Specialized LLMs fine-tune those models by adding training material from specific domains like healthcare and finance [Zhou et al. 2022; Wu et al. 2023]. LLMs like CodeGen have been specialized with large corpuses of source code for use in software engineering.
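Specialization of a general model typically means continued training on domain text. A minimal sketch of that workflow, using the Hugging Face transformers and datasets libraries, is shown below; the base model (gpt2), the file domain_corpus.txt, and the hyperparameters are placeholders chosen for illustration, not recommendations.

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from datasets import load_dataset

base = "gpt2"  # stand-in for any pre-trained causal language model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# domain_corpus.txt would hold organization- or domain-specific documents.
data = load_dataset("text", data_files={"train": "domain_corpus.txt"})
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="specialized-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=data["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("specialized-model")
```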
Open Source or Proprietary. Open source LLMs provide a platform for researchers and developers to freely access, use, and even contribute to the model's development. Proprietary LLMs are subject to varying restrictions on use, making them less open to experimentation or potential deployment. Some providers (e.g., Meta) use a license that is largely, but not completely, open [Hull 2023]. OpenAI offers a different compromise: While the GPT series of LLMs is not open source, OpenAI does permit fine-tuning (for a fee) as a means of specialization and limited experimentation with their proprietary model.
The field of LLMs is a fast-moving space. Moreover, the ethics and regulations surrounding their use are also in a state of flux, as society grapples with the challenges and opportunities these powerful models present. Keeping apprised of these developments is crucial for taking advantage of the potential offered by LLMs.
7. CodeGen: /salesforce/CodeGen
Use Cases
The ability of LLMs to generate plausible content for text and code applications has sparked the imaginations of many. A recent literature review examines 229 research papers written since 2017 on the application of LLMs to software engineering problems [Hou et al. 2023]. Application areas span requirements, design, development, testing, maintenance, and management activities, with development and testing being the most common.
Our team, which works with government organizations daily, took a broader perspective and brainstormed several dozen ideas for using LLMs in common software engineering and acquisition activities (see Table 1 for examples). Two important observations quickly emerged from this activity. First, most use cases represent human-AI partnerships in which an LLM or LLM-based service could be used to help humans (as opposed to replace humans) complete tasks more quickly. Second, deciding which use cases would be most feasible, beneficial, or affordable is not a trivial decision for those just getting started with LLMs.
Table 1: Sample Acquisition and Software Engineering Use Cases

Acquisition use cases:
A1. A new acquisition specialist uses an LLM to generate an overview of relevant federal regulations for an upcoming request for proposal (RFP) review, expecting the summary to save time in background reading.
A2. A chief engineer uses an LLM to generate a comparison of alternatives from multiple proposals, expecting it to use the budget and schedule formulas from previous similar proposal reviews and generate accurate itemized comparisons.
A3. A contract specialist uses an LLM to generate ideas for a request for information (RFI) solicitation given a set of concerns and a vague problem description, expecting it to generate a draft RFI that is at least 75% aligned with their needs.
A4. A CTO uses an LLM to create a report summarizing all uses of digital engineering technologies in the organization based on internal documents, expecting it can quickly produce a clear summary that is at least 90% correct.
A5. A program office lead uses an LLM to evaluate a contractor's code delivery for compliance with required design patterns, expecting that it will identify any instances in which the code fails to use required patterns.
A6. A program manager uses an LLM to summarize a set of historical artifacts from the past six months in preparation for a high-visibility program review and provides specific retrieval criteria (e.g., delivery tempo, status of open defects, and schedule), expecting it to generate an accurate summary of program status that complies with the retrieval criteria.
A7. A program manager uses an LLM to generate a revised draft of a statement of work, given a short starting description and a list of concerns (e.g., cybersecurity, software delivery tempo, and interoperability goals). The program manager expects it to generate a structure that can be quickly refined and that includes topics drawn from best practices they may not think to request explicitly.
A8. A requirements engineer uses an LLM to generate draft requirements statements for a program upgrade based on past similar capabilities, expecting them to be a good starting point.
A9. A contract officer is seeking funding to conduct research on a high-priority topic they are not familiar with. The contract officer uses an LLM to create example project descriptions for their context, expecting it to produce reasonable descriptions.

Software engineering use cases:
SE1. A developer uses an LLM to find vulnerabilities in existing code, hoping that the exercise will catch additional issues not already found by static analysis tools.
SE2. A developer uses an LLM to generate code that parses structured input files and performs specified numerical analysis on its inputs, expecting it to generate code with the desired capabilities.
SE3. A tester uses an LLM to create functional test cases, expecting it to produce a set of text test cases from a provided requirements document.
SE4. A developer uses an LLM to generate software documentation from code to be maintained, expecting it to summarize its functionality and interface.
SE5. A software engineer who is unfamiliar with SQL uses an LLM to generate an SQL query from a natural language description, expecting it to generate a correct query that can be tested immediately.
SE6. A software architect uses an LLM to validate whether code that is ready for deployment is consistent with the system's architecture, expecting that it will reliably catch deviations from the intended architecture.
SE7. A developer uses an LLM to translate several classes from C++ to Rust, expecting that the translated code will pass the same tests and be more secure and memory safe.
SE8. A developer uses an LLM to generate synthetic test data for a new feature being developed, expecting that it will quickly generate syntactically correct and representative data.
SE9. A developer provides an LLM with code that is failing in production and a description of the failures, expecting it to help the developer diagnose the root cause and propose a fix.
Archetypes
Commonalities among the use cases lend themselves to abstracting the set into a manageable number of archetypes. Two dimensions are helpful in this regard: the nature of the activity an LLM is performing and the nature of the data that the LLM is acting on. Taking the cross-product of these dimensions, these use cases fall into the archetypes depicted in Table 2.
Table 2: Use Case Archetypes

Activity Type / Data Type → Text | Code | Model | Images
Retrieve Information: retrieve-text | retrieve-code | retrieve-model | retrieve-images
Generate Artifact: generate-text | generate-code | generate-model | generate-images
Modify Artifact: modify-text | modify-code | modify-model | modify-images
Analyze Artifact: analyze-text | analyze-code | analyze-model | analyze-images
Matching a specific use to an archetype helps identify common concerns among similar use cases and known solutions commonly applied for similar use cases. Archetypes can be a tool that organizations use to group successes, gaps, and lessons learned in a structured way.
Activity Type captures differences in the associations that an LLM would need to make to support a use case, with some asking an LLM to do things that a language model was not designed to do:
- Retrieve Information asks an LLM to construct a response to a question (e.g., what is the Observer pattern?) for which a known answer is likely found in the training corpus, directly or across related elements.
- Generate Artifact asks an LLM to create a new artifact (e.g., a summary of a topic or a Python script that performs a statistical analysis) that likely bears similarity with existing examples in the corpus.
- Modify Artifact asks an LLM to modify an existing artifact to improve it in some way (e.g., translate Python code to Java or remove a described bug) that resembles analogous improvements among artifacts in the training corpus.
- Analyze Artifact asks an LLM to draw a conclusion about provided information (e.g., what vulnerabilities are in this code, or will this architecture scale adequately?) that likely requires semantic reasoning about data.
Data Type captures differences in the kind of data that an LLM operates on or generates, such as the differences in semantic rules that make data (e.g., code) well formed (a short sketch after this list shows how the two dimensions combine into the archetype names in Table 2):
- Text inputs vary widely in formality and structure (e.g., informal chat versus structured text captured in templates).
- Code is text with formal rules for structure and semantics, and a growing number of LLMs are being specialized to take advantage of this structure and semantics.
- Models are abstractions (e.g., from software design or architecture) that often use simple terms (e.g., publisher) that imply deep semantics.
- Images are used to communicate many software artifacts (e.g., class diagrams) and often employ visual conventions that, much like models, imply specific semantics. While LLMs operate on text, multimodal LLMs (e.g., GPT-4) are growing in their ability to ingest and generate image data.
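As a minimal sketch (names chosen to match Table 2), the archetype labels are simply the cross-product of the two dimensions:

```python
from itertools import product

# The two dimensions described above; an archetype is an (activity, data type) pair.
ACTIVITIES = ["retrieve", "generate", "modify", "analyze"]
DATA_TYPES = ["text", "code", "model", "images"]

archetypes = [f"{activity}-{data_type}"
              for activity, data_type in product(ACTIVITIES, DATA_TYPES)]
print(archetypes)  # ['retrieve-text', 'retrieve-code', ..., 'analyze-images']
```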
Figure 1 shows an example of using the archetypes to generate ideas for LLM use cases in a particular domain. This example focuses on independent verification and validation (IV&V), a resource-intensive activity within the DoD that involves many different activities that might benefit from the use of LLMs. More complex use cases for IV&V could also be generated that involve integration of multiple archetypes into a larger workflow.
Figure 1: Using Archetypes to Help Brainstorm Potential Use Cases. The figure overlays numbered example use cases on the archetype grid from Table 2, marking the generate-text, generate-code, generate-images, analyze-text, and analyze-code cells. The example use cases are:
- An IV&V evaluator uses an LLM to create a verification checklist from a set of certification regulations and a system description, expecting it to produce a context-sensitive checklist they can tailor.
- A developer uses an LLM to create a network view for authorization to operate (ATO) certification from a description of the architecture, expecting it to produce a rough network diagram they can refine.
- An IV&V evaluator uses an LLM to analyze software design documents against a specific set of certification criteria and to generate a certification report, expecting it to describe certification violations that they will review to confirm.
- A tester uses an LLM to create integration test descriptions from a set of APIs and integration scenarios, expecting it to produce a set of test case descriptions that can be used to implement tests.
- A developer uses an LLM to find vulnerabilities in existing code, hoping that the exercise will catch additional issues not already found by static analysis tools.
- A new developer uses an LLM as a pair programmer to write code, expecting it to help create vulnerability-free code.
Figure 2: Two Ways to Look at Concerns with the Generation of Incorrect Results (A: Acquisition Use Cases, SE: Software Engineering Use Cases [Table 1]). The figure plots the use cases from Table 1 in a quadrant chart whose axes are the consequences of mistakes (small to large) and how hard mistakes are for users to find (easy to hard).
Concerns and How to Address Them
Recognizing concerns around applications of LLMs to software engineering and acquisition, and deciding how to address each, will help decision makers make more informed choices. There are multiple perspectives one should consider before going forward with an LLM use case. An important reality is that the results generated by LLMs are in fact sometimes wrong.
Figure 2 illustrates this perspective based on two questions:
- How significant would it be to act on an incorrect result in a given use case?
- How easy would it be for a user in the use case to recognize that a result from an LLM is incorrect?

This figure shows a notional placement of the use cases from Table 1 (actual placement would be reliant on refinement of these use cases). The green quadrant is ideal from this perspective: Mistakes are not particularly consequential and are relatively easy to spot. Use cases in this quadrant can be a great place for organizations to start LLM experimentation. The red quadrant, on the other hand, represents the least favorable cases for LLM use: Mistakes create real problems and are hard for users to recognize.
The consequences of mistakes and the ease of spotting them are only one perspective for evaluation. Another perspective is the expected significance of improvements or efficiencies achievable with LLMs. Among many concerns, we discuss five categories in further detail in this document (correctness, disclosure, usability, performance, and trust), as they are relevant to all use cases.
Correctness: The significance of correctness as a concern depends on factors such as how the results will be used, the safeguards used in workflows, and the expertise of users. Correctness refers to the overall accuracy and precision of output relative to some known truth or expectation. Accuracy hinges greatly on whether an LLM was trained or fine-tuned with data that is sufficiently representative to support the specific use case. Even with rich training corpuses, some inaccuracy can be expected [Ouyang et al. 2023]. For example, a recent study on code translation found GPT-4 to perform better than other LLMs, even though more than 80% of its translations on a pair of open source projects contained some errors. Advances are likely to improve, but not eliminate, these error rates [Pan et al. 2023].
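One workflow safeguard for use cases like code translation is to run the generated code against the existing test suite before accepting it. A minimal sketch is shown below; it assumes a Python project whose tests already live under tests/ and that pytest is installed, both of which are illustrative assumptions.

```python
import subprocess
import sys

def generated_code_passes_tests(test_dir: str = "tests/") -> bool:
    """Run the project's existing tests after inserting LLM-generated code;
    treat any failure as a signal to route the output to a human reviewer."""
    result = subprocess.run(
        [sys.executable, "-m", "pytest", test_dir, "-q"],
        capture_output=True, text=True,
    )
    print(result.stdout)
    return result.returncode == 0

if not generated_code_passes_tests():
    print("Generated code failed existing tests; send it back for human review.")
```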
Disclosure: When users interact with LLMs, some use cases may require disclosing proprietary or sensitive information to a service provider to complete a task (e.g., sharing source code to help debug it). The disclosure concern is therefore related to the amount of proprietary information that must be exposed during use. If users share confidential data, trade secrets, or personal information, there is a risk that such data could be stored, misused, or accessed by unauthorized individuals. Moreover, it might become part of the training data corpus and be disseminated without users having any means to track its origin. For example, GSA CIO IL-23-01 (the U.S. General Services Administration instructional letter Security Policy for Generative Artificial Intelligence [AI] Large Language Models [LLMs]) bans disclosure of federal nonpublic information as inputs in prompts to third-party LLM endpoints [GSA 2023].
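One way to reduce what is disclosed is to scrub or abstract identifying details before a prompt leaves the organization. A minimal sketch, assuming simple regular expressions are sufficient for the data in question (real redaction pipelines need far more care); the patterns and the internal identifier format are hypothetical.

```python
import re

# Illustrative patterns only; detecting sensitive data reliably is harder than this.
REDACTIONS = [
    (re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"), "<EMAIL>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IP_ADDRESS>"),
    (re.compile(r"\bPROJ-[A-Z0-9]+\b"), "<PROJECT_ID>"),  # hypothetical internal ID format
]

def redact(prompt: str) -> str:
    """Replace sensitive substrings with neutral placeholders before the
    prompt is sent to a third-party LLM service."""
    for pattern, placeholder in REDACTIONS:
        prompt = pattern.sub(placeholder, prompt)
    return prompt

print(redact("Why does service PROJ-ALPHA42 at 10.1.2.3 reject mail from admin@example.mil?"))
```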
Usability: LLM users have vastly different backgrounds, expectations, and technical abilities. Usability captures the ability of LLM users with different expertise to complete tasks. Users may need expertise on both the input (crafting appropriate prompts) and output (judging the correctness of results) sides of LLM use [Zamfirescu-Pereira et al. 2023]. The significance of usability as a concern depends on the degree to which getting to acceptable results is sensitive to the expertise of users. A study of developers' early experiences using CoPilot reflects that there is a shift from writing code to understanding code when using LLMs on coding tasks [Bird et al. 2023]. This observation hints at the need for different usability techniques for interaction mechanisms, as well as the need to account for expertise.
Performance: While using an LLM requires much less computing power than training an LLM, responsiveness can still be a factor in LLM use, especially if sophisticated prompting approaches are incorporated into an LLM-based service. For the purposes of concerns related to use cases, performance expresses the time required to arrive at an appropriate response. Model size, underlying compute power, and where the model runs and is accessed from are among the factors that influence responsiveness [Patterson et al. 2022]. Services built on LLMs may introduce additional performance overhead due to the way in which other capabilities are integrated with the LLM.
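When responsiveness matters, a quick way to ground the discussion is to measure end-to-end response time with representative prompts. A minimal sketch using a locally hosted model is shown below; the model name, prompt, and token budget are placeholders, and results will vary widely with hardware and model size.

```python
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for whatever model is being evaluated
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "List three risks of acting on an unreviewed, LLM-generated summary."
inputs = tokenizer(prompt, return_tensors="pt")

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=128,
                         pad_token_id=tokenizer.eos_token_id)
elapsed = time.perf_counter() - start

new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
print(f"{elapsed:.2f} s end to end, {new_tokens / elapsed:.1f} tokens/sec")
```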
Trust: To employ the technology with the requisite level of trust, users must grasp the limitations of LLMs. Trust reflects the user's confidence in the output. Overreliance on an LLM without understanding its potential for error or bias can lead to undesirable consequences [Rastogi et al. 2023]. As a result, several other concerns (e.g., explainability, bias, privacy, security, and ethics) are often considered in relationship to trust [Schwartz et al. 2023]. For example, the DoD published ethical AI principles to advance trustworthy AI systems [DoD 2020].
How significant these and other concerns are for each use case will vary by context and use. The questions provided in Table 3 can help organizations assess how relevant each concern is for a specific use case. A starting point could be to categorize the significance of each concern as High, Medium, or Low. This information can help organizations decide whether an LLM is fit for purpose and what concerns need to be mitigated to avoid unacceptable outcomes.
Table 3: Example Questions to Help Determine the Significance of Common Concerns for a Specific Use Case

Correctness:
- What is the risk or impact of using an incorrect result in the use case?
- How difficult is it for the expected user to determine whether a result is correct?
- Are there gaps in the data used to train the LLM that could adversely impact results (e.g., the data is not current with recent technology releases or contains little data for an esoteric programming language)?

Disclosure:
- Can an LLM be prompted without disclosing proprietary information (e.g., using generic questions or abstracting proprietary details)?
- What is the risk or impact of a third party being able to observe your prompts?
- Are there existing data disclosure constraints that strictly need to be observed?

Usability:
- How adept are expected users at prompting an LLM?
- How familiar are expected users with approaches for determining whether results are inaccurate?
- How familiar are expected users with approaches for determining whether results are incomplete?

Performance:
- How quickly must a user or machine be able to act on a result?
- Are there significant computing resource limitations?
- Are there intermediate steps in the interaction with the LLM that may affect end-to-end performance?

Trust:
- Are your expected users predisposed to accept generated results (automation bias) or reject them?
- Is the data the LLM was trained on free of bias and ethical concerns?
- Has the LLM been trained on data that is appropriate for use?
These common concerns, and questions to determine their significance, enable identification of common tactics for addressing each concern. A tactic is a course of action that can be taken to reduce the occurrence or impact of a concern. Table 4 summarizes a collection of tactics that can help mitigate each concern, along with a rough estimate (High [H], Medium [M], or Low [L]) of the relative potential cost of using each tactic. Typically, the more resources (human and computational) a tactic requires, the higher the cost. For example, prompt engineering and model training both address correctness, but prompt engineering is typically much less expensive. Of note, some tactics focus on technical interventions, others focus on human-centered actions, and the rest could employ technical or human-centered interventions.
Table 4: Tactics That Can Be Used to Address Common Concerns with LLM Use

Correctness:
- Prompt engineering (cost: L). Educate users on prompt engineering techniques and patterns to generate better results.
- Validate manually (cost: M). Dedicate time to allow users to carefully validate interim and final results.
- Adjust settings (cost: L). Change settings of exposed model parameters like temperature (randomness of the model's output) and the maximum number of tokens.
- Adopt newer model (cost: M). Use newer models that integrate technical advances or improved training corpuses that can produce better results.
- Fine tune model (cost: M). Tailor a pretrained model using organization- or domain-specific data to improve results.
- Train new model (cost: H). Use a custom training corpus or proprietary data to train a new model.

Disclosure:
- Open disclosure policy. Establish a policy that allows users to share as much deta…