FLI AI Safety Index 2024

Independent experts evaluate safety practices of leading AI companies across critical domains.

11th December 2024
Available online at: futureoflife.org/index
Contact us: policy@futureoflife.org
Contents

Introduction
Scorecard
Key Findings
Independent Review Panel
Index Design
Evidence Base
Grading Process
Results
Conclusions
Appendix A - Grading Sheets
Appendix B - Company Survey
Appendix C - Company Responses
About the Organization: The Future of Life Institute (FLI) is an independent nonprofit organization with the goal of reducing large-scale risks and steering transformative technologies to benefit humanity, with a particular focus on artificial intelligence (AI). Learn more at futureoflife.org.
Introduction

Rapidly improving AI capabilities have increased interest in how companies report, assess and attempt to mitigate associated risks. The Future of Life Institute (FLI) therefore facilitated the AI Safety Index, a tool designed to evaluate and compare safety practices among leading AI companies. At the heart of the Index is an independent review panel, including some of the world's foremost AI experts. Reviewers were tasked with grading companies' safety policies on the basis of a comprehensive evidence base collected by FLI. The Index aims to incentivize responsible AI development by promoting transparency, highlighting commendable efforts, and identifying areas of concern.
Scorecard

Firm      | Overall Grade | Score | Risk Assessment | Current Harms | Safety Frameworks | Existential Safety Strategy | Governance & Accountability | Transparency & Communication
Anthropic | C  | 2.13 | C+ | B- | D+ | D+ | C+ | D+
DeepMind  | D+ | 1.55 | C  | C+ | D- | D  | D+ | D
OpenAI    | D+ | 1.32 | C  | D+ | D- | D- | D+ | D-
Zhipu AI  | D  | 1.11 | D+ | D+ | F  | F  | D  | C
x.AI      | D- | 0.75 | F  | D  | F  | F  | F  | C
Meta      | F  | 0.65 | D+ | D  | F  | F  | D- | F

Grading: Uses the US GPA system for grade boundaries: A+, A, A-, B+, [...], F letter values corresponding to numerical values 4.3, 4.0, 3.7, 3.3, [...], 0.
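To make the grade arithmetic concrete, here is a minimal sketch of the letter-to-number mapping described in the grading note. The letter values follow the US GPA boundaries the report cites; the function name is illustrative, and the score-to-letter direction assumes each score maps to the highest letter whose value it reaches, an assumption that reproduces the published overall grades (e.g., 1.55 yields D+).

```python
# US GPA-style letter values cited in the grading note: A+ = 4.3 ... F = 0.
GRADE_POINTS = {
    "A+": 4.3, "A": 4.0, "A-": 3.7,
    "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7,
    "D+": 1.3, "D": 1.0, "D-": 0.7,
    "F": 0.0,
}

def to_letter(score: float) -> str:
    """Map a numeric score to the highest letter grade whose value it reaches.

    This floor-style mapping is an assumption, but it matches the scorecard:
    to_letter(2.13) == "C", to_letter(1.55) == "D+", to_letter(0.65) == "F".
    """
    for letter, value in sorted(GRADE_POINTS.items(), key=lambda kv: -kv[1]):
        if score >= value:
            return letter
    return "F"
```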
Key Findings

- Large risk management disparities: While some companies have established initial safety frameworks or conducted some serious risk assessment efforts, others have yet to take even the most basic precautions.
- Jailbreaks: All the flagship models were found to be vulnerable to adversarial attacks.
- Control problem: Despite their explicit ambitions to develop artificial general intelligence (AGI), capable of rivaling or exceeding human intelligence, the review panel deemed the current strategies of all companies inadequate for ensuring that these systems remain safe and under human control.
- External oversight: Reviewers consistently highlighted how companies were unable to resist profit-driven incentives to cut corners on safety in the absence of independent oversight. While Anthropic's current and OpenAI's initial governance structures were highlighted as promising, experts called for third-party validation of risk assessment and safety framework compliance across all companies.
Independent Review Panel

The 2024 AI Safety Index was graded by an independent panel of world-renowned AI experts invited by FLI's president, MIT Professor Max Tegmark. The panel was carefully selected to ensure impartiality and a diverse range of expertise, covering both technical and governance aspects of AI. Panel selection prioritized distinguished academics and leaders from the non-profit sector to minimize potential conflicts of interest.

The panel assigned grades based on the gathered evidence base, considering both public and company-submitted information. Their evaluations, combined with actionable recommendations, aim to incentivize safer AI practices within the industry. See the "Grading Process" section for more details.
Atoosa Kasirzadeh

Atoosa Kasirzadeh is a philosopher and AI researcher, serving as an Assistant Professor at Carnegie Mellon University. Previously, she was a visiting faculty researcher at Google, a Chancellor's Fellow and Director of Research at the Centre for Technomoral Futures at the University of Edinburgh, a Research Lead at the Alan Turing Institute, an intern at DeepMind, and a Governance of AI Fellow at Oxford. Her interdisciplinary research addresses questions about the societal impacts, governance, and future of AI.
Tegan Maharaj

Tegan Maharaj is an Assistant Professor in the Department of Decision Sciences at HEC Montréal, where she leads the ERRATA lab on Ecological Risk and Responsible AI. She is also a core academic member at Mila. Her research focuses on advancing the science and techniques of responsible AI development. Previously, she served as an Assistant Professor of Machine Learning at the University of Toronto.
Yoshua Bengio

Yoshua Bengio is a Full Professor in the Department of Computer Science and Operations Research at Université de Montréal, as well as the Founder and Scientific Director of Mila and the Scientific Director of IVADO. He is the recipient of the 2018 A.M. Turing Award, a CIFAR AI Chair, a Fellow of both the Royal Society of London and Canada, an Officer of the Order of Canada, Knight of the Legion of Honor of France, Member of the UN's Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology, and Chair of the International Scientific Report on the Safety of Advanced AI.
Jessica Newman

Jessica Newman is the Director of the AI Security Initiative (AISI), housed at the UC Berkeley Center for Long-Term Cybersecurity. She is also a Co-Director of the UC Berkeley AI Policy Hub. Newman's research focuses on the governance, policy, and politics of AI, with particular attention on comparative analysis of national AI strategies and policies, and on mechanisms for the evaluation and accountability of organizational development and deployment of AI systems.
David Krueger

David Krueger is an Assistant Professor in Robust, Reasoning and Responsible AI in the Department of Computer Science and Operations Research (DIRO) at the University of Montreal, and a Core Academic Member at Mila, UC Berkeley's Center for Human-Compatible AI, and the Center for the Study of Existential Risk. His work focuses on reducing the risk of human extinction from artificial intelligence through technical research as well as education, outreach, governance and advocacy.
Sneha Revanur

Sneha Revanur is the founder and president of Encode Justice, a global youth-led organization advocating for the ethical regulation of AI. Under her leadership, Encode Justice has mobilized thousands of young people to address challenges like algorithmic bias and AI accountability. She was featured on TIME's inaugural list of the 100 most influential people in AI.
Stuart Russell

Stuart Russell is a Professor of Computer Science at the University of California at Berkeley, holder of the Smith-Zadeh Chair in Engineering, and Director of the Center for Human-Compatible AI and the Kavli Center for Ethics, Science, and the Public. He is a recipient of the IJCAI Computers and Thought Award, the IJCAI Research Excellence Award, and the ACM Allen Newell Award. In 2021 he received the OBE from Her Majesty Queen Elizabeth and gave the BBC Reith Lectures. He co-authored the standard textbook for AI, which is used in over 1500 universities in 135 countries.
Method

Index Design

The AI Safety Index evaluates safety practices across six leading general-purpose AI developers: Anthropic, OpenAI, Google DeepMind, Meta, x.AI, and Zhipu AI. The index provides a comprehensive assessment by focusing on six critical domains, with 42 indicators spread across these domains:
1. Risk Assessment
2. Current Harms
3. Safety Frameworks
4. Existential Safety Strategy
5. Governance & Accountability
6. Transparency & Communication
Indicators range from corporate governance policies to external model evaluation practices and empirical results on AI benchmarks focused on safety, fairness and robustness. The full set of indicators can be found in the grading sheets in Appendix A. A quick overview is given in Table 1 below. The key inclusion criteria for these indicators were:
1. Relevance: The list emphasizes aspects of AI safety and responsible conduct that are widely recognized by academic and policy communities. Many indicators were directly incorporated from related projects conducted by leading research organizations, such as Stanford's Center for Research on Foundation Models.

2. Comparability: We selected indicators that highlight meaningful differences in safety practices, which can be identified based on the available evidence. As a result, safety precautions for which conclusive differential evidence was unavailable were omitted.
Companies were selected based on their anticipated capability to build the most powerful models by 2025. Additionally, the inclusion of the Chinese firm Zhipu AI reflects our intention to make the Index representative of leading companies globally. Future iterations may focus on different companies as the competitive landscape evolves.

We acknowledge that the index, while comprehensive, does not capture every aspect of responsible AI development and exclusively focuses on general-purpose AI. We welcome feedback on our indicator selection and strive to incorporate suitable suggestions into the next iteration of the index.
Table 1: Full overview of indicators

Risk Assessment: Dangerous capability evaluations; Uplift trials; Pre-deployment external safety testing; Post-deployment external researcher access; Bug bounties for model vulnerabilities; Pre-development risk assessments.

Current Harms: AIR-Bench 2024; TrustLLM Benchmark; SEAL Leaderboard for adversarial robustness; Gray Swan Jailbreaking Arena Leaderboard; Fine-tuning protections; Carbon offsets; Watermarking; Privacy of user inputs; Data crawling; Military, warfare & intelligence applications; Terms of Service analysis.

Safety Frameworks: Risk domains; Risk thresholds; Model evaluations; Decision making; Risk mitigations; Conditional pauses; Adherence; Assurance.

Existential Safety Strategy: Control/Alignment strategy; Capability goals; Safety research; Supporting external safety research.

Governance & Accountability: Company structure; Board of directors; Leadership; Partnerships; Internal review; Mission statement; Whistle-blower Protection & Non-disparagement Agreements; Compliance to public commitments.

Transparency & Communication: Lobbying on safety regulations; Testimonies to policymakers; Leadership communications on catastrophic risks; Stanford's 2024 Foundation Model Transparency Index 1.1; Safety evaluation transparency.
Evidence Base

The AI Safety Index is underpinned by a comprehensive evidence base to ensure evaluations are well-informed and transparent. This evidence was compiled into detailed grading sheets, which presented company-specific data across all 42 indicators to the review panel. These sheets included hyperlinks to original sources and can be accessed in full in Appendix A. Evidence collection relied on two primary pathways:

- Publicly Available Information: Most data was sourced from publicly accessible materials, including research papers, policy documents, news articles, and industry reports. This approach enhanced transparency and enabled stakeholders to verify the information by tracing it back to its original sources.
- Company Survey: To supplement publicly available data, a targeted questionnaire was distributed to the evaluated companies. The survey aimed to gather additional insights on safety-relevant structures, processes, and strategies, including information not yet publicly disclosed.

Evidence collection spanned from May 14 to November 27, 2024. For empirical results from AI benchmarks, we noted data extraction dates to account for model updates. In line with our commitment to transparency and accountability, all collected evidence, whether public or company-provided, has been documented and made available for scrutiny in the appendix.
Incorporated Research and Related Work

The AI Safety Index is built on a foundation of extensive research and draws inspiration from several notable projects that have advanced transparency and accountability in the field of general-purpose AI.

Two of the most comprehensive related projects are the Risk Management Ratings produced by SaferAI, a non-profit organization with deep expertise in risk management, and AI Lab Watch, a research initiative identifying strategies for mitigating extreme risks from advanced AI and reporting on company implementation of those strategies.

The Safety Index directly integrates findings from Stanford's Center for Research on Foundation Models (CRFM), particularly their Foundation Model Transparency Index, as well as empirical results from AIR-Bench 2024, a state-of-the-art safety benchmark for GPAI systems. Additional empirical data cited includes scores from the 2024 TrustLLM Benchmark, Scale's Adversarial Robustness evaluation, and the Gray Swan Jailbreaking Arena. These sources offer invaluable insights into the trustworthiness, fairness, and robustness of GPAI systems.

To evaluate existential safety strategies, the index leveraged findings from a detailed mapping of technical safety research at leading AI companies by the Institute for AI Policy and Strategy. Indicators on external evaluations were informed by research led by Shayne Longpre at MIT, and the structure of the 'Safety Frameworks' section drew from relevant publications from the Centre for the Governance of AI and the research non-profit METR. Additionally, we express gratitude to the journalists working to keep companies accountable, whose reports are referenced in the grading sheets.
Company Survey

To complement publicly available data, the AI Safety Index incorporated insights from a targeted company survey. This questionnaire was designed to gather detailed information on safety-related structures, processes, and plans, including aspects not disclosed in public domains.

The survey consisted of 85 questions spanning seven categories: Cybersecurity, Governance, Transparency, Risk Assessment, Risk Mitigation, Current Harms, and Existential Safety. Questions included binary, multiple-choice, and open-ended formats, allowing companies to provide nuanced responses. The full survey is attached in Appendix B.
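A minimal sketch of how a questionnaire with this shape could be represented in code (the class and field names are illustrative, not taken from the survey itself; the example question is invented):

```python
from dataclasses import dataclass
from enum import Enum

class QuestionFormat(Enum):
    BINARY = "binary"
    MULTIPLE_CHOICE = "multiple-choice"
    OPEN_ENDED = "open-ended"

# The seven categories named in the report.
CATEGORIES = (
    "Cybersecurity", "Governance", "Transparency", "Risk Assessment",
    "Risk Mitigation", "Current Harms", "Existential Safety",
)

@dataclass
class SurveyQuestion:
    category: str            # one of CATEGORIES
    format: QuestionFormat   # binary, multiple-choice, or open-ended
    text: str                # the question as posed to the company

# Invented example question, for illustration only:
q = SurveyQuestion("Governance", QuestionFormat.BINARY,
                   "Does the company maintain an internal risk review board?")
```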
Survey responses were shared with the reviewers, and relevant information for the indicators was also directly integrated into the grading sheets. Information provided by companies was explicitly identified in the grading sheets. While x.AI and Zhipu AI chose to engage with the targeted questions in the survey, Anthropic, Google DeepMind and Meta only referred us to relevant sources of already publicly shared information. OpenAI decided not to support this project.
Participation incentive

While fewer than half of the companies provided substantial answers, engagement with the survey was recognized in the 'Transparency & Communication' section. Companies that chose not to engage with the survey received a penalty of one grade step. This adjustment incentivizes participation and acknowledges the value of transparency about safety practices. The penalty was communicated to the review panel within the grading sheets, and reviewers were advised not to additionally take survey participation into account when grading the relevant section. FLI remains committed to encouraging higher participation in future iterations to ensure evaluations are as robust and representative as possible.
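One plausible reading of a 'one grade step' penalty, sketched in code (the ladder ordering follows the GPA scale in the scorecard; the function name is illustrative, and how the report applied the step is not specified beyond 'one grade step'):

```python
# Letter grades ordered best to worst; one penalty step moves one position down.
LADDER = ["A+", "A", "A-", "B+", "B", "B-",
          "C+", "C", "C-", "D+", "D", "D-", "F"]

def apply_grade_step_penalty(letter: str, steps: int = 1) -> str:
    """Demote a letter grade by `steps` positions, bottoming out at F."""
    i = min(LADDER.index(letter) + steps, len(LADDER) - 1)
    return LADDER[i]

assert apply_grade_step_penalty("C") == "C-"   # one step down
assert apply_grade_step_penalty("F") == "F"    # cannot fall below F
```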
Grading Process

The grading process was designed to ensure a rigorous and impartial evaluation of safety practices across the assessed companies. Following the conclusion of the evidence-gathering phase on November 27, 2024, grading sheets summarizing company-specific data were shared with an independent panel of leading AI scientists and governance experts. The grading sheets included all indicator-relevant information and instructions for scoring.

Panellists were instructed to assign grades based on an absolute scale rather than just scoring companies relative to each other. FLI included a rough grading rubric for each domain to ensure consistency in evaluations. Besides the letter grades, reviewers were encouraged to support their grades with short justifications and to provide key recommendations for improvement. Experts were encouraged to incorporate additional insights and weigh indicators according to their judgment, ensuring that their evaluations reflected both the evidence base and their specialized expertise. To account for the difference in expertise among the reviewers, FLI selected one subset to score the 'Existential Safety Strategy' section and another to evaluate the section on 'Current Harms'. Otherwise, all experts were invited to score every section, although some preferred to only grade domains they are most familiar with. In the end, every section was graded by four or more reviewers. Grades were aggregated into average scores for each domain, which are presented in the scorecard.
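As a worked illustration of this aggregation (a sketch only; the reviewer grades below are invented, not actual panel data):

```python
from statistics import mean

# GPA values for the letters appearing in this invented example.
GRADE_POINTS = {"C+": 2.3, "C": 2.0, "D+": 1.3}

# Four hypothetical reviewers grading one company on one domain.
reviewer_grades = ["C+", "C", "C", "D+"]

domain_score = mean(GRADE_POINTS[g] for g in reviewer_grades)
print(round(domain_score, 2))  # (2.3 + 2.0 + 2.0 + 1.3) / 4 = 1.9
```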
By adopting this structured yet flexible approach, the grading process not only highlights current safety practices but also identifies actionable areas for improvement, encouraging companies to strive for higher standards in future evaluations.
One can argue that large companies on the frontier should be held to the highest safety standards. Initially, we therefore considered awarding an extra 1/3 grade point to companies with far fewer staff or significantly lower model scores. In the end, we decided against this for the sake of simplicity. This choice did not change the resulting ranking of companies.
Results

This section presents average grades for each domain and summarizes the justifications and improvement recommendations provided by the review panel experts.
Risk Assessment

Firm      | Grade | Score
Anthropic | C+ | 2.67
DeepMind  | C  | 2.10
OpenAI    | C  | 2.10
Zhipu AI  | D+ | 1.55
x.AI      | F  | 0
Meta      | D+ | 1.50
OpenAI, Google DeepMind, and Anthropic were commended for implementing more rigorous tests for identifying potential dangerous capabilities, such as misuse in cyber-attacks or biological weapon creation, compared to their competitors. Yet even these efforts were found to feature notable limitations, leaving the risks associated with GPAI poorly understood. OpenAI's uplift studies and evaluations for deception were notable to reviewers. Anthropic has done the most impressive work in collaborating with national AI Safety Institutes. Meta evaluated its models for dangerous capabilities before deployment, but critical threat models, such as those related to autonomy, scheming, and persuasion, remain unaddressed. Zhipu AI's risk assessment efforts were noted as
less comprehensive, while x.AI failed to publish any substantive pre-deployment evaluations, falling significantly below industry standards. A reviewer suggested that the scope and size of human participant uplift studies should be increased and that standards for acceptable risk thresholds need to be established. Reviewers noted that only Google DeepMind and Anthropic maintain targeted bug-bounty programs for model vulnerabilities, with Meta's initiative narrowly focusing on privacy-related attacks.
Current Harms

Firm      | Grade | Score
Anthropic | B- | 2.83
DeepMind  | C+ | 2.50
OpenAI    | D+ | 1.68
Zhipu AI  | D+ | 1.50
x.AI      | D  | 1.00
Meta      | D  | 1.18
Anthropic's AI systems received the highest scores on leading empirical safety and trustworthiness benchmarks, with Google DeepMind ranking second. Reviewers noted that other companies' systems attained notably lower scores, raising concerns about the adequacy of implemented safety mitigations. Reviewers criticized Meta's policy of publishing the weights of their frontier models, as this enables malicious actors to easily remove the safeguards of those models and use them in harmful ways. Google DeepMind's SynthID watermark system was recognized as a leading practice for mitigating the risks of AI-generated content misuse. In contrast, most other companies lack robust watermarking measures. Zhipu AI reported using watermarks in the survey but does not appear to document this practice on its website.
Additionally, environmental sustainability remains an area of divergence. While Meta actively offsets its carbon footprint, other companies only partially achieve this or even fail to report on their practices publicly. x.AI's reported use of gas turbines to power data centers is particularly concerning from a sustainability standpoint.
Further, reviewers strongly advise companies to ensure their systems are better prepared to withstand adversarial attacks. Empirical results show that models are still vulnerable to jailbreaking, with OpenAI's models being particularly vulnerable (no data for x.AI or Zhipu AI are available). DeepMind's model defences were the most robust in the included benchmarks.
The panel also criticized companies for using user-interaction data to train their AI systems. Only Anthropic and Zhipu AI use default settings which prevent the model from being trained on user interactions (except those flagged for safety review).
Safety Frameworks

Firm      | Grade | Score
Anthropic | D+ | 1.67
DeepMind  | D- | 0.80
OpenAI    | D- | 0.90
Zhipu AI  | F  | 0.35
x.AI      | F  | 0.35
Meta      | F  | 0.35
All six companies signed the Seoul Frontier AI Safety Commitments and pledged to develop safety frameworks with thresholds for unacceptable risks, advanced safeguards for high-risk levels, and conditions for pausing development if risks cannot be managed. As of the publication of this index, only OpenAI, Anthropic and Google DeepMind have published their frameworks. As such, the reviewers could only assess the frameworks of those three companies.
While these frameworks were judged insufficient to protect the public from unacceptable levels of risk, experts still considered the frameworks to be effective to some degree. Anthropic's framework stood out to reviewers as the most comprehensive because it detailed additional implementation guidance. One expert noted the need for a more precise characterization of catastrophic events and clearer thresholds. Other comments noted that the frameworks from OpenAI and Google DeepMind were not detailed enough for their effectiveness to be determined externally. Additionally, no framework sufficiently defined specifics around conditional pauses, and a reviewer suggested trigger conditions should factor in external events and expert opinion. Multiple experts stressed that safety frameworks need to be supported by robust external reviews and oversight mechanisms or they cannot be trusted to accurately report risk levels. Anthropic's efforts toward external oversight were deemed best, if still insufficient.
Existential Safety Strategy

Firm      | Grade | Score
Anthropic | D+ | 1.57
DeepMind  | D  | 1.10
OpenAI    | D- | 0.93
Zhipu AI  | F  | 0
x.AI      | F  | 0.35
Meta      | F  | 0.17
While all assessed companies have declared their intention to build artificial general intelligence or superintelligence, and most have acknowledged the existential risks potentially posed by such systems, only Google DeepMind, OpenAI and Anthropic are seriously researching how humans can remain in control and avoid catastrophic outcomes. The technical reviewers assessing this section underlined that none of the companies have put forth an official strategy for ensuring advanced AI systems remain controllable and aligned with human values. The current state of technical research on control, alignment and interpretability for advanced AI systems was judged to be immature and inadequate.
Anthropic attained the highest scores, but their approach was deemed unlikely to prevent the significant risks of superintelligent AI. Anthropic's 'Core Views on AI Safety' blog post articulates a fairly detailed portrait of their strategy for ensuring safety as systems become more powerful. Experts noted that their strategy indicates a substantial depth of awareness of relevant technical issues, like deception and situational awareness. One reviewer emphasized the need to move toward logical or quantitative guarantees of safety.
OpenAI's blog post on 'Planning for AGI and beyond' shares high-level principles, which reviewers considered reasonable but not a plan. Experts think that OpenAI's work on scalable oversight might work but is underdeveloped and cannot be relied on.
Research updates shared by Google DeepMind's Alignment Team were judged useful but immature and inadequate to ensure safety. Reviewers also stressed that relevant blog posts cannot be taken as a meaningful representation of the strategy, plans, or principles of the organization as a whole.
Neither Meta, x.AI, nor Zhipu AI has put forth plans or technical research addressing the risks posed by artificial general intelligence. Reviewers noted that Meta's open-source approach and x.AI's vision of democratized access to truth-seeking AI may help mitigate some risks from concentration of power and value lock-in.
Governance & Accountability

Firm      | Grade | Score
Anthropic | C+ | 2.42
DeepMind  | D+ | 1.68
OpenAI    | D+ | 1.43
Zhipu AI  | D  | 1.18
x.AI      | F  | 0.57
Meta      | D- | 0.80
Reviewers noted the considerable care Anthropic's founders have invested in building a responsible governance structure, which makes it more likely to prioritize safety. Anthropic's other proactive efforts, like their responsible scaling policy, were also noted positively.
OpenAI was similarly commended for its initial non-profit structure, but recent changes, including the disbandment of safety teams and its shift to a for-profit model, raised concerns about a reduced emphasis on safety.