Assessing Opportunities for LLMs in Software Engineering and Acquisition

Authors

Stephany Bellomo
Shen Zhang
James Ivers
Julie Cohen
Ipek Ozkaya

NOVEMBER 2023

Large language models (LLMs) are generative artificial intelligence (AI) models that have been trained on massive corpuses of text data and can be prompted to generate new, plausible content. LLMs are seeing rapid advances, and they promise to improve productivity in many fields. OpenAI’s GPT-4¹ and Google’s LaMDA² are the underlying LLMs of services like ChatGPT³, CoPilot⁴, and Bard⁵. These services can perform a range of tasks, including generating human-like text responses to questions, summarizing artifacts, and generating working code. These models and services are the focus of extensive research efforts across industry, government, and academia to improve their capabilities and relevance, and organizations in many domains are rigorously exploring their use to uncover potential applications.

The idea of harnessing LLMs to enhance the efficiency of software engineering and acquisition activities holds special allure for organizations with large software operations, such as the Department of Defense (DoD), as doing so offers the promise of substantial resource optimization. Potential use cases for LLMs are plentiful, but knowing how to assess the benefits and risks associated with their use is nontrivial. Notably, to gain access to the latest advances, organizations may need to share proprietary data (e.g., source code) with service providers. Understanding such implications is central to intentional and responsible use of LLMs, especially for organizations managing sensitive information.

In this document, we examine how decision makers, such as technical leads and program managers, can assess the fitness of LLMs to address software engineering and acquisition needs [Ozkaya 2023]. We first introduce exemplar scenarios in software engineering and software acquisition and identify common archetypes. We describe common concerns involving the use of LLMs and enumerate tactics for mitigating those concerns. Using these common concerns and tactics, we demonstrate how decision makers can assess the fitness of LLMs for their own use cases through two examples.

Capabilities of LLMs, risks concerning their use, and our collective understanding of emerging services and models are evolving rapidly [Brundage et al. 2022]. While this document is not meant to be comprehensive in covering all software engineering and acquisition use cases, their concerns, and mitigation tactics, it demonstrates an approach that decision makers can use to think through their own LLM use cases as this space evolves.

¹ /research/gpt-4
² https://blog.google/technology/ai/lamda/
³
⁴ /features/copilot
⁵

What Is an LLM?

An LLM is a deep neural network model trained on an extensive corpus of diverse documents (e.g., websites and books) to learn language patterns, grammar rules, facts, and even some reasoning abilities [Wolfram 2023]. LLMs can generate responses to inputs (“prompts”) by iteratively determining the next word or phrase appearing after others based on the prompt and patterns and associations learned from their training corpus using probabilistic and randomized selection [White et al. 2023]. This capability allows LLMs to generate human-like text that can be surprisingly coherent and contextually relevant, even if they may not always be semantically correct.
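The iterative, randomized next-word selection described above can be sketched in a few lines of Python. This is a toy illustration, not how any particular LLM is implemented: the `TOY_SCORES` table is a hypothetical stand-in for a trained model, and the function names, the `<end>` marker, and the handling of a zero temperature are all assumptions made for this example.

```python
import math
import random


def sample_next_token(scores, temperature=1.0, rng=None):
    """Pick one token from a {token: raw score} mapping.

    Higher temperature flattens the distribution (more random picks);
    a temperature of 0 is treated here as "always take the top score".
    """
    rng = rng or random.Random()
    if temperature <= 0:
        return max(scores, key=scores.get)
    # Softmax with temperature; subtracting the peak avoids overflow.
    peak = max(scores.values())
    tokens = list(scores)
    weights = [math.exp((scores[t] - peak) / temperature) for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def generate(prompt_tokens, score_fn, max_new_tokens=5, temperature=1.0, seed=0):
    """Extend the prompt one token at a time until "<end>" or the limit."""
    rng = random.Random(seed)
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = sample_next_token(score_fn(tokens), temperature, rng)
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens


# Hypothetical stand-in for a trained model: scores depend only on the last token.
TOY_SCORES = {
    "the": {"code": 2.0, "test": 1.0},
    "code": {"compiles": 1.5, "fails": 1.0},
    "test": {"passes": 1.0},
}


def toy_score_fn(tokens):
    """Return scores for the next token; unknown contexts end the text."""
    return TOY_SCORES.get(tokens[-1], {"<end>": 0.0})


print(generate(["the"], toy_score_fn, temperature=0))    # top-scoring choice at each step
print(generate(["the"], toy_score_fn, temperature=1.0))  # sampled; depends on the seed
```

Because each step samples from a distribution rather than looking up a verified fact, the output can be fluent and still wrong, which is the behavior the surrounding text describes.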

While LLMs can perform complex tasks using their trained knowledge, they lack true understanding. Rather, they are sophisticated pattern matching tools. Moreover, due to their probabilistic reasoning, they can generate inaccurate results (often referred to as “hallucinations”), such as citations to nonexistent references or method calls to nonexistent application programming interfaces (APIs). While LLMs can perform analysis and inferencing on new data they have been prompted with, data on which LLMs have been trained can limit their accuracy. However, the technology is rapidly advancing with new models having increasing complexity and parameters, and benchmarks have already emerged for comparing their performance [Imsys 2023]. In addition, LLM service providers are working on ways to use more recent data [D’Cruze 2023]. Despite these limitations, there are productive uses of LLMs today.

Choosing an LLM

There are already dozens of LLMs and services built using LLMs, and more emerge every day. These models vary in many dimensions, from technical to contractual, and the details of these differences can be difficult to keep straight. The following distinctions are a good starting point when choosing an LLM for use.

Model or Service. ChatGPT is a chatbot built on OpenAI’s GPT family of LLMs [OpenAI 2023]. The difference is important, as services built on LLMs can add capabilities (e.g., specialized chatbot features, specialized training beyond the core LLM, or non-LLM features that can improve results from an LLM). A service like ChatGPT is typically hosted by a service provider, meaning that the provider manages the computing resources (and associated costs) and that users are typically required to send their prompts (and potentially sensitive data) to the service provider to use the service. A model, like Meta’s Llama 2⁶, can be fine-tuned with domain- or organization-specific data to improve accuracy, but it typically lacks the added features and resources of a commercially supported service.

⁶ /llama/

General or Specialized. LLMs are pre-trained on a corpus, and the composition of that corpus is a significant factor affecting an LLM’s performance. General LLMs are trained on text sources like Wikipedia that are available to the public. Specialized LLMs fine-tune those models by adding training material from specific domains like healthcare and finance [Zhou et al. 2022; Wu et al. 2023]. LLMs like CodeGen⁷ have been specialized with large corpuses of source code for use in software engineering.

Open Source or Proprietary. Open source LLMs provide a platform for researchers and developers to freely access, use, and even contribute to the model’s development. Proprietary LLMs are subject to varying restrictions on use, making them less open to experimentation or potential deployment. Some providers (e.g., Meta) use a license that is largely, but not completely, open [Hull 2023]. OpenAI offers a different compromise: While the GPT series of LLMs is not open source, OpenAI does permit fine-tuning (for a fee) as a means of specialization and limited experimentation with their proprietary model.

The field of LLMs is a fast-moving space. Moreover, the ethics and regulations surrounding their use are also in a state of flux, as society grapples with the challenges and opportunities these powerful models present. Keeping apprised of these developments is crucial for taking advantage of the potential offered by LLMs.

⁷ /salesforce/CodeGen

Use Cases

The ability of LLMs to generate plausible content for text and code applications has sparked the imaginations of many. A recent literature review examines 229 research papers written since 2017 on the application of LLMs to software engineering problems [Hou et al. 2023]. Application areas span requirements, design, development, testing, maintenance, and management activities, with development and testing being the most common.

Our team, which works with government organizations daily, took a broader perspective and brainstormed several dozen ideas for using LLMs in common software engineering and acquisition activities (see Table 1 for examples). Two important observations quickly emerged from this activity. First, most use cases represent human-AI partnerships in which an LLM or LLM-based service could be used to help humans (as opposed to replace humans) complete tasks more quickly. Second, deciding which use cases would be most feasible, beneficial, or affordable is not a trivial decision for those just getting started with LLMs.

Table 1: Sample Acquisition and Software Engineering Use Cases

ACQUISITION USE CASES

A1. A new acquisition specialist uses an LLM to generate an overview of relevant federal regulations for an upcoming request for proposal (RFP) review, expecting the summary to save time in background reading.

A2. A chief engineer uses an LLM to generate a comparison of alternatives from multiple proposals, expecting it to use the budget and schedule formulas from previous similar proposal reviews and generate accurate itemized comparisons.

A3. A contract specialist uses an LLM to generate ideas for a request for information (RFI) solicitation given a set of concerns and vague problem description, expecting it to generate a draft RFI that is at least 75% aligned with their needs.

A4. A CTO uses an LLM to create a report summarizing all uses of digital engineering technologies in the organization based on internal documents, expecting it can quickly produce a clear summary that is at least 90% correct.

A5. A program office lead uses an LLM to evaluate a contractor’s code delivery for compliance with required design patterns, expecting that it will identify any instances in which the code fails to use required patterns.

A6. A program manager uses an LLM to summarize a set of historical artifacts from the past six months in preparation for a high-visibility program review and provides specific retrieval criteria (e.g., delivery tempo, status of open defects, and schedule), expecting it to generate an accurate summary of program status that complies with the retrieval criteria.

A7. A program manager uses an LLM to generate a revised draft of a statement of work, given a short starting description and a list of concerns (e.g., cybersecurity, software delivery tempo, and interoperability goals). The program manager expects it to generate a structure that can be quickly refined and that includes topics drawn from best practices they may not think to request explicitly.

A8. A requirements engineer uses an LLM to generate draft requirements statements for a program upgrade based on past similar capabilities, expecting them to be a good starting point.

A9. A contract officer is seeking funding to conduct research on a high-priority topic they are not familiar with. The contract officer uses an LLM to create example project descriptions for their context, expecting it to produce reasonable descriptions.

SOFTWARE ENGINEERING USE CASES

SE1. A developer uses an LLM to find vulnerabilities in existing code, hoping that the exercise will catch additional issues not already found by static analysis tools.

SE2. A developer uses an LLM to generate code that parses structured input files and performs specified numerical analysis on its inputs, expecting it to generate code with the desired capabilities.

SE3. A tester uses an LLM to create functional test cases, expecting it to produce a set of text test cases from a provided requirements document.

SE4. A developer uses an LLM to generate software documentation from code to be maintained, expecting it to summarize its functionality and interface.

SE5. A software engineer who is unfamiliar with SQL uses an LLM to generate an SQL query from a natural language description, expecting it to generate a correct query that can be tested immediately.

SE6. A software architect uses an LLM to validate whether code that is ready for deployment is consistent with the system’s architecture, expecting that it will reliably catch deviations from the intended architecture.

SE7. A developer uses an LLM to translate several classes from C++ to Rust, expecting that the translated code will pass the same tests and be more secure and memory safe.

SE8. A developer uses an LLM to generate synthetic test data for a new feature being developed, expecting that it will quickly generate syntactically correct and representative data.

SE9. A developer provides an LLM with code that is failing in production and a description of the failures, expecting it to help the developer diagnose the root cause and propose a fix.

Archetypes

Commonalities among the use cases lend themselves to abstracting the set into a manageable number of archetypes. Two dimensions are helpful in this regard: the nature of the activity an LLM is performing and the nature of the data that the LLM is acting on. Taking the cross-product of these dimensions, these use cases fall into the archetypes depicted in Table 2.

Table 2: Use Case Archetypes

ACTIVITY TYPE           DATA TYPE
                        Text             Code             Model             Images
Retrieve Information    retrieve-text    retrieve-code    retrieve-model    retrieve-images
Generate Artifact       generate-text    generate-code    generate-model    generate-images
Modify Artifact         modify-text      modify-code      modify-model      modify-images
Analyze Artifact        analyze-text     analyze-code     analyze-model     analyze-images
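The cross-product that yields these sixteen archetypes is easy to reproduce, which can help when tagging an organization's own use cases. The sketch below is illustrative only: the function names are invented for this example, and the assignment of Table 1 entries to archetypes is this sketch's own reading of those entries, not a mapping given in the text.

```python
from itertools import product

ACTIVITY_TYPES = ("retrieve", "generate", "modify", "analyze")
DATA_TYPES = ("text", "code", "model", "images")


def archetypes():
    """Return every activity-data combination, e.g. 'analyze-code'."""
    return [f"{activity}-{data}" for activity, data in product(ACTIVITY_TYPES, DATA_TYPES)]


def group_by_archetype(assignments):
    """Group use case IDs under the archetype each one was assigned to."""
    valid = set(archetypes())
    groups = {}
    for use_case_id, archetype in assignments.items():
        if archetype not in valid:
            raise ValueError(f"unknown archetype: {archetype}")
        groups.setdefault(archetype, []).append(use_case_id)
    return groups


# Illustrative assignments only (an assumed reading of a few Table 1 entries).
example = {
    "SE1": "analyze-code",
    "SE2": "generate-code",
    "SE6": "analyze-code",
    "SE7": "modify-code",
}

print(len(archetypes()))
print(group_by_archetype(example))
```

Grouping by archetype in this way gives a simple structure for recording which lessons learned apply to which family of use cases.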

Matching a specific use to an archetype helps identify common concerns among similar use cases and known solutions commonly applied for similar use cases. Archetypes can be a tool that organizations use to group successes, gaps, and lessons learned in a structured way.

Activity Type captures differences in associations that an LLM would need to make to support a use case, with some asking an LLM to do things that a language model was not designed to do:

• Retrieve Information asks an LLM to construct a response to a question (e.g., what’s the Observer pattern?) for which a known answer is likely found in the training corpus, directly or across related elements.
• Generate Artifact asks an LLM to create a new artifact (e.g., a summary of a topic or a Python script that performs a statistical analysis) that likely bears similarity with existing examples in the corpus.
• Modify Artifact asks an LLM to modify an existing artifact to improve it in some way (e.g., translate Python code to Java or remove a described bug) that resembles analogous improvements among artifacts in the training corpus.
• Analyze Artifact asks an LLM to draw a conclusion about provided information (e.g., what vulnerabilities are in this code or will this architecture scale adequately?) that likely requires semantic reasoning about data.

Data Type captures differences in the kind of data that an LLM operates on or generates, such as the differences in semantic rules that make data (e.g., code) well-formed:

• Text inputs vary widely in formality and structure (e.g., informal chat versus structured text captured in templates).
• Code is text with formal rules for structure and semantics, and a growing number of LLMs are being specialized to take advantage of this structure and semantics.
• Models are abstractions (e.g., from software design or architecture) that often use simple terms (e.g., publisher) that imply deep semantics.
• Images are used to communicate many software artifacts (e.g., class diagrams) and often employ visual conventions that, much like models, imply specific semantics. While LLMs operate on text, multimodal LLMs (e.g., GPT-4) are growing in their ability to ingest and generate image data.

Figure 1 shows an example of using the archetypes to generate ideas for LLM use cases in a particular domain. This example focuses on independent verification and validation (IV&V), a resource-intensive activity within the DoD that involves many different activities that might benefit from the use of LLMs. More complex use cases for IV&V could also be generated that involve integration of multiple archetypes into a larger workflow.

[Figure 1 shows the archetype grid from Table 2 with numbered markers in the generate-text, generate-code, generate-images, analyze-text, and analyze-code cells; each marker keys to one of the following IV&V use cases.]

• An IV&V evaluator uses an LLM to analyze software design documents against a specific set of certification criteria and to generate a certification report, expecting it to describe certification violations that they will review to confirm.
• A developer uses an LLM to create a network view for authorization to operate (ATO) certification from a description of the architecture, expecting it to produce a rough network diagram they can refine.
• A tester uses an LLM to create integration test descriptions from a set of APIs and integration scenarios, expecting it to produce a set of test case descriptions that can be used to implement tests.
• An IV&V evaluator uses an LLM to create a verification checklist from a set of certification regulations and a system description, expecting it to produce a context-sensitive checklist they can tailor.
• A developer uses an LLM to find vulnerabilities in existing code, hoping that the exercise will catch additional issues not already found by static analysis tools.
• A new developer uses an LLM as a pair programmer to write code, expecting it to help create vulnerability-free code.

Figure 1: Using Archetypes to Help Brainstorm Potential Use Cases

[Figure 2 places the Table 1 use cases on a two-by-two grid. One axis runs from “Mistakes have small consequences” to “Mistakes have large consequences”; the other runs from “Mistakes are hard for users to find” to “Mistakes are easy for users to find.”]

Figure 2: Two Ways to Look at Concerns with the Generation of Incorrect Results (A: Acquisition Use Cases, SE: Software Engineering Use Cases [Table 1])

Concerns and How to Address Them

Recognizing concerns around applications of LLMs to software engineering and acquisition, and deciding how to address each, will help decision makers make more informed choices. There are multiple perspectives one should consider before going forward with an LLM use case. An important reality is that the results generated by LLMs are in fact sometimes wrong. Figure 2 illustrates this perspective based on two questions:

• How significant would it be to act on an incorrect result in a given use case?
• How easy would it be for a user in the use case to recognize that a result from an LLM is incorrect?

This figure shows a notional placement of the use cases from Table 1 (actual placement would be reliant on refinement of these use cases). The green quadrant is ideal from this perspective: Mistakes are not particularly consequential and relatively easy to spot. Use cases in this quadrant can be a great place for organizations to start LLM experimentation. The red quadrant, on the other hand, represents the least favorable cases for LLM use: Mistakes create real problems and are hard for users to recognize.
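One way to make the two questions above concrete is to score each use case on both axes and derive its quadrant. The sketch below is a minimal illustration under stated assumptions: the 0-to-1 scales, the 0.5 threshold, the "mixed" label for the two quadrants the text does not name, and the use case IDs are all hypothetical, and the scores are placeholders rather than the placements shown in Figure 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MistakeProfile:
    use_case_id: str
    consequence: float    # 0.0 = small consequences, 1.0 = large consequences
    detectability: float  # 0.0 = hard for users to find, 1.0 = easy to find


def quadrant(profile, threshold=0.5):
    """Classify a use case by consequence and ease of spotting mistakes."""
    large = profile.consequence >= threshold
    easy = profile.detectability >= threshold
    if not large and easy:
        return "green"  # small consequences, easy to spot: good place to start
    if large and not easy:
        return "red"    # large consequences, hard to spot: least favorable
    return "mixed"      # the two remaining quadrants need case-by-case judgment


# Placeholder scores for hypothetical use cases.
profiles = [
    MistakeProfile("UC-1", consequence=0.2, detectability=0.9),
    MistakeProfile("UC-2", consequence=0.8, detectability=0.1),
    MistakeProfile("UC-3", consequence=0.8, detectability=0.9),
]

for p in profiles:
    print(p.use_case_id, quadrant(p))
```

In practice the scores would come from the people who own the use case, and the threshold would be a matter of organizational risk tolerance.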

The consequences of mistakes and the ease of spotting them are only one perspective of evaluation. Another perspective is the expected significance of improvements or efficiencies achievable with LLMs. Among many concerns, we discuss five categories in further detail in this document (correctness, disclosure, usability, performance, and trust), as they are relevant to all use cases.

Correctness: The significance of correctness as a concern depends on factors such as how the results will be used, the safeguards used in workflows, and the expertise of users. Correctness refers to the overall accuracy and precision of output relative to some known truth or expectation. Accuracy hinges greatly on whether an LLM was trained or fine-tuned with data that is sufficiently representative to support the specific use case. Even with rich training corpuses, some inaccuracy can be expected [Ouyang et al. 2023]. For example, a recent study on code translation found GPT-4 to perform better than other LLMs, even though more than 80% of translations on a pair of open source projects contained some errors. Advances are likely to improve, but not eliminate, these numbers [Pan et al. 2023].

Disclosure: When users interact with LLMs, some use cases may require disclosing proprietary or sensitive information to a service provider to complete a task (e.g., sharing source code to help debug it). The disclosure concern is therefore related to the amount of proprietary information that must be exposed during use. If users share confidential data, trade secrets, or personal information, there is a risk that such data could be stored, misused, or accessed by unauthorized individuals. Moreover, it might become part of the training data corpus and disseminated without users having any means to track its origin. For example, GSA CIO IL-23-01 (the U.S. General Services Administration instructional letter Security Policy for Generative Artificial Intelligence [AI] Large Language Models [LLMs]) bans disclosure of federal nonpublic information as inputs in prompts to third-party LLM endpoints [GSA 2023].

Usability: LLM users have vastly different backgrounds, expectations, and technical abilities. Usability captures the ability of LLM users with different expertise to complete tasks. Users may need expertise on both the input (crafting appropriate prompts) and output (judging the correctness of results) sides of LLM use [Zamfirescu-Pereira et al. 2023]. The significance of usability as a concern depends on the degree to which getting to acceptable results is sensitive to the expertise of users. A study completed with developers’ early experiences using CoPilot reflects that there is a shift from writing code to understanding code when using LLMs on coding tasks [Bird et al. 2023]. This observation hints at the need for different usability techniques for interaction mechanisms, as well as the need to account for expertise.

Performance: While using an LLM requires much less computing power than training an LLM, responsiveness can still be a factor in LLM use, especially if sophisticated prompting approaches are incorporated into an LLM-based service. For the purposes of concerns related to use cases, performance expresses the time required to arrive at an appropriate response. Model size, underlying compute power, and where the model runs and is accessed from are among the factors that influence responsiveness [Patterson et al. 2022]. Services built on LLMs may introduce additional performance overhead due to the way in which other capabilities are integrated with the LLM.

Trust: To employ the technology with the requisite level of trust, users must grasp the limitations of LLMs. Trust reflects the user’s confidence in the output. Overreliance on an LLM without understanding its potential for error or bias can lead to undesirable consequences [Rastogi et al. 2023]. As a result, several other concerns (e.g., explainability, bias, privacy, security, and ethics) are often considered in relationship to trust [Schwartz et al. 2023]. For example, the DoD published ethical AI principles to advance trustworthy AI systems [DoD 2020].

How significant these and other concerns are for each use case will vary by context and use. The questions provided in Table 3 can help organizations assess how relevant each concern is for a specific use case. A starting point could be to categorize the significance of each concern as High, Medium, or Low. This information can help organizations decide whether an LLM is fit for purpose and what concerns need to be mitigated to avoid unacceptable outcomes.

Table 3: Example Questions to Help Determine the Significance of Common Concerns for a Specific Use Case

CONCERN: SIGNIFICANCE QUESTIONS

Correctness
• What is the risk or impact of using an incorrect result in the use case?
• How difficult is it for the expected user to determine whether a result is correct?
• Are there gaps in the data used to train the LLM that could adversely impact results (e.g., the data is not current with recent technology releases or contains little data for an esoteric programming language)?

Disclosure
• Can an LLM be prompted without disclosing proprietary information (e.g., using generic questions or abstracting proprietary details)?
• What is the risk or impact of a third party being able to observe your prompts?
• Are there existing data disclosure constraints that strictly need to be observed?

Usability
• How adept are expected users at prompting an LLM?
• How familiar are expected users with approaches for determining whether results are inaccurate?
• How familiar are expected users with approaches for determining whether results are incomplete?

Performance
• How quickly must a user or machine be able to act on a result?
• Are there significant computing resource limitations?
• Are there intermediate steps in the interaction with the LLM that may affect end-to-end performance?

Trust
• Are your expected users predisposed to accept generated results (automation bias) or reject them?
• Is the data the LLM was trained on free of bias and ethical concerns?
• Has the LLM been trained on data that is appropriate for use?
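Once the questions in Table 3 have been answered for a use case, the resulting High, Medium, or Low ratings can be recorded and ranked so that the most significant concerns are mitigated first. The sketch below shows one possible way to do that; the `Significance` enum, the `rank_concerns` function, and the example ratings are assumptions made for illustration and do not come from the document.

```python
from enum import IntEnum


class Significance(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# The five concern categories discussed in the text.
CONCERNS = ("correctness", "disclosure", "usability", "performance", "trust")


def rank_concerns(ratings):
    """Return the five concerns ordered from most to least significant.

    Ties keep the order in which the concerns are listed in CONCERNS.
    """
    missing = [c for c in CONCERNS if c not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {', '.join(missing)}")
    return sorted(CONCERNS, key=lambda c: -ratings[c])


# Hypothetical ratings for one use case.
ratings = {
    "correctness": Significance.HIGH,
    "disclosure": Significance.HIGH,
    "usability": Significance.MEDIUM,
    "performance": Significance.LOW,
    "trust": Significance.MEDIUM,
}

for concern in rank_concerns(ratings):
    print(f"{concern}: {ratings[concern].name}")
```

Requiring a rating for every concern is a deliberate choice in this sketch: it forces an explicit "Low" rather than letting a concern go unexamined.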

These common concerns, and questions to determine their significance, enable identification of common tactics for addressing each concern. A tactic is a course of action that can be taken to reduce the occurrence or impact of a concern. Table 4 summarizes a collection of tactics that can help mitigate each concern, along with a rough estimate (High [H], Medium [M], or Low [L]) of the relative potential cost of using each tactic. Typically, the more resources (human and computation) a tactic requires, the higher the cost. For example, prompt engineering and model training both address correctness, but prompt engineering is typically much less expensive. Of note, some tactics (purple rows) focus on technical interventions, others (green rows) focus on human-centered actions, and the rest (gray rows) could employ technical or human-centered interventions.

Table 4: Tactics That Can Be Used to Address Common Concerns with LLM Use

CONCERN: TACTIC (COST): DESCRIPTION

Correctness
• Prompt engineering (L): Educate users on prompt engineering techniques and patterns to generate better results.
• Validate manually (M): Dedicate time to allow users to carefully validate interim and final results.
• Adjust settings (L): Change settings of exposed model parameters like temperature (randomness of the model’s output) and the maximum number of tokens.
• Adopt newer model (M): Use newer models that integrate technical advances or improved training corpuses that can produce better results.
• Fine-tune model (M): Tailor a pretrained model using organization- or domain-specific data to improve results.
• Train new model (H): Use a custom training corpus or proprietary data to train a new model.

Disclosure
• Open disclosure policy: Establish a policy that allows users to share as much deta
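The cost estimates for the correctness tactics in Table 4 lend themselves to a simple filter: given a cost ceiling, list the tactics that fit, cheapest first. The sketch below covers only the correctness rows, since those are the ones whose costs are fully listed here; the function name and the numeric ordering of L, M, and H are assumptions made for illustration.

```python
# Relative ordering of the cost labels used in Table 4.
COST_ORDER = {"L": 0, "M": 1, "H": 2}

# Correctness tactics and their rough cost estimates, as listed in Table 4.
CORRECTNESS_TACTICS = [
    ("Prompt engineering", "L"),
    ("Validate manually", "M"),
    ("Adjust settings", "L"),
    ("Adopt newer model", "M"),
    ("Fine-tune model", "M"),
    ("Train new model", "H"),
]


def tactics_within_budget(tactics, max_cost):
    """Return tactics at or below max_cost, cheapest first.

    The sort is stable, so tactics with equal cost keep their table order.
    """
    limit = COST_ORDER[max_cost]
    affordable = [(name, cost) for name, cost in tactics if COST_ORDER[cost] <= limit]
    return sorted(affordable, key=lambda item: COST_ORDER[item[1]])


for name, cost in tactics_within_budget(CORRECTNESS_TACTICS, "M"):
    print(f"{cost}  {name}")
```

Starting with the low-cost tactics and escalating only when they prove insufficient matches the observation in the text that prompt engineering is typically much less expensive than model training.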
