




Copyright © 2019 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill Education.
CHAPTER 2 LABS - KEY
(Level 1 Header) Lab 2-1 Create a request for data extraction
Q1. Given that you are new and trying to get a grasp on Sláinte’s operations, list three questions related to sales that would help you begin your analysis. For example, how many products were sold in each state?
Open-ended – no key provided.
Possible answers:
What is the highest selling product?
How do quantities sold per product differ across states?
What is the average quantity of each product sold per day/per state/per month?
What is the total quantity of each product sold per day/per state/per month?
Q2. Now hypothesize the answers to each of the questions. Remember, your answers don’t have to be correct at this point. They will help you understand what type of data you are looking for. For example: 500 in Missouri, 6,000 in Pennsylvania, 4,000 in New York, etc.
Open-ended – no key provided.
Q3. Finally, for each question, identify the specific tables and attributes that are needed to answer your questions. For example, to answer the question about state sales, you would need the [State] attribute, which is most likely located in the [Customer] master table, as well as a [Quantity Sold] attribute in a [Sales] table. If you had access to store or distribution center location data, you may also look for a [State] field there as well.
Open-ended – no key provided.
(Level 2 Header) Part 2: Generate a request for data
Now that you’ve identified the data you need for your analysis, complete a Data Request Form.
Open the Data Request Form.
Enter your contact information.
In the description field, identify the tables that you’d like to analyze, along with the time periods (e.g. past month, past year, etc.).
Open-ended – no key provided.
Select a frequency. In this case this is a “One-off request”.
Enter a request date (today) and a required date (one week from today).
Choose a format (spreadsheet).
Finally, complete the “To be used in” box (internal analysis).
TAKE A SCREENSHOT (2-1) of your completed form.
(Level 2 Header) Part 3: Perform an analysis of the data
Q4. Take a moment and identify any attributes that you are missing from your original request that would be necessary to answer your original question of “How many products were sold in each state?”.
Open-ended – no key provided.
Q5. Evaluate your original questions and responses. Can you still answer the original question?
Open-ended – no key provided.
Q6. Is there another question you could answer from the data Rachel provided?
Possible answers:
How many sales orders has each employee created?
How many sales were created in the month of October?
How much money was generated through sales for the entire period?
How much money was generated through sales for the month of October?
END OF LAB
(Level 1 Header) Lab 2-2 Use PivotTables to de-normalize and analyze the data
(Level 2 Header) Part 1: Identify the Questions
Q1. Given Sláinte’s request, identify the data attributes and tables needed to answer the question.
Sales_Subset: Product_Code, Sales_Order_Quantity_Sold
It would also be helpful to have FGI_Product: Product_Description
(Level 2 Header) Part 2: Master the data: Prepare data for analysis in Excel
Q2. When would it be a good idea to use a single table?
Any time all of the data you need are in a single table, there is no need to extract more than one table.
Alternative 2: Use the Excel Internal Data Model
TAKE A SCREENSHOT (2-2a) of the Manage Relationships window with both relationships created.
Q3. How comfortable are you with identifying primary key-foreign key relationships?
KEY: Open-ended question, no key provided.
Alternative 3: Merging the data into a single table using Excel Query Editor
Maximize the Query Editor window, and TAKE A SCREENSHOT (2-2b).
KEY Screenshot:
Q4. Have you used the Query Editor in Excel before? Double-click the [Sales_Subset] query and click through the tabs on the ribbon. Which options do you think will be useful in the future?
KEY: Open-ended question, no key provided.
Alternative 4: Use SQL queries in Access
TAKE A SCREENSHOT (2-2c).
KEY Screenshot:
(Level 2 Header) Part 3: Perform an analysis using PivotTables and Queries
TAKE A SCREENSHOT (2-2d)
KEY Screenshot:
TAKE A SCREENSHOT (2-2e)
KEY Screenshot:
TAKE A SCREENSHOT (2-2f)
KEY Screenshot:
Save your query as Total_Sales_By_Product and close your database.
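The kind of query saved as Total_Sales_By_Product can be sketched outside Access as well. The following is a minimal illustration using Python's built-in sqlite3 module, not the lab's actual Access query. Only the table and column names come from the lab; the sample rows, the column types, and the result alias Total_Sales are assumptions made for the example.

```python
import sqlite3

# Hypothetical sample rows; only the table and column names come from the lab.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FGI_Product (Product_Code INTEGER PRIMARY KEY, Product_Description TEXT);
    CREATE TABLE Sales_Subset (Product_Code INTEGER, Sales_Order_Quantity_Sold INTEGER);
    INSERT INTO FGI_Product VALUES (2001, 'Stout'), (2002, 'Pale Ale');
    INSERT INTO Sales_Subset VALUES (2001, 10), (2002, 4), (2001, 6);
""")

# Join on the primary key-foreign key pair, then total the quantity per product.
total_sales_by_product = conn.execute("""
    SELECT p.Product_Description, SUM(s.Sales_Order_Quantity_Sold) AS Total_Sales
    FROM Sales_Subset AS s
    JOIN FGI_Product AS p ON p.Product_Code = s.Product_Code
    GROUP BY p.Product_Description
    ORDER BY Total_Sales DESC
""").fetchall()
print(total_sales_by_product)
```

Sorting in descending order puts the best-selling product in the first row.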
(Level 2 Header) Part 4: Address and refine your results
Q5. If the owner of Sláinte wishes to identify which product sold the most, how would you make this report more useful?
Several possible answers. Some options include sorting the data, or filtering the data to view only the product associated with the highest total_sales.
Q6. If you wanted to provide more detail, what other attributes would be useful to add as additional rows or columns to your report, or what other reports would you create?
Many possible answers. A good option would be to include Date data from the Sales_Subset table to analyze which product sells more based on months or seasons.
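Adding a date dimension amounts to totaling by product and month rather than by product alone. The sketch below shows the idea in plain Python. The rows are invented, and the lab does not name the date attribute in Sales_Subset, so the tuple layout is an assumption.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (product code, order date, quantity sold).
sales = [
    (2001, date(2019, 10, 3), 10),
    (2001, date(2019, 11, 12), 6),
    (2002, date(2019, 10, 20), 4),
    (2001, date(2019, 10, 28), 5),
]

# Key each total by (product, year, month), like adding a month field to PivotTable rows.
totals = defaultdict(int)
for product_code, order_date, quantity in sales:
    totals[(product_code, order_date.year, order_date.month)] += quantity

for key in sorted(totals):
    print(key, totals[key])
```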
(Level 2 Header) Part 5: Communicate your findings
Let's make this easy for others to understand using visualization and explanations.
Q7. Write a brief paragraph about how you would interpret the results of your analysis in plain English. For example, which data points stand out?
Open-ended question, no solution provided.
Q8. In Chapter 4 we’ll discuss some visualization techniques. Describe a way you could present this data as a chart or graph.
Open-ended question, no solution provided. Possible answers include a PivotChart visualized as a bar chart, or using filters, slicers, or timelines to make the data more interactive.
END OF LAB
(Level 1 Header) Lab 2-3 Resolve common data problems in Excel and Access
Q1. What do you expect will be major data quality issues with LendingClub’s data?
Open-ended question, no key provided. Students should rely on what they learned in Chapter 2 regarding data quality issues to make assumptions about what could cause problems in this file.
(Level 2 Header) Part 2: Master the Data
Q2. Given this list of attributes, what concerns do you have with the data’s ability to predict answers to the questions you identified in Chapter 1?
Open-ended question, no key provided.
Q3. Is there anything in the data that you think will make analysis difficult? For example, are there any special symbols, non-standard data, or numbers that look out of place?
Open-ended question, no key provided. The sheer size of the data may strike some students as being difficult to analyze, as may the number of blank/null values.
Q4. What would you do to clean the data in this file?
Open-ended question, no key provided. The next section of the lab, “Let’s identify some issues with the data…”, introduces several of the items that need to be cleaned (or transformed).
Let’s identify some issues with the data.
There are many attributes without any data, which may not be necessary.
The [int_rate] values are written in ##.##%, but analysis will require #.####.
The [term] values include the word “months”, which should be removed for numerical analysis.
The [emp_length] values include “n/a”, “<”, “+”, “year”, and “years”, which should be removed for numerical analysis.
Dates, including [issue_d], can be more useful if we expand them to show the day, month, and year as separate attributes. Dates cause issues in general because different systems use different date formats (e.g. 1/9/2009, Jan-2009, 9/1/2009 for European dates, etc.), so typically some conversion is necessary.
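The date issue in the last item can be illustrated with a short Python sketch. It assumes the format of each source is known in advance, and the helper name expand_date is hypothetical. The same calendar date parses correctly only when the matching format string is supplied.

```python
from datetime import datetime

def expand_date(text, fmt):
    """Parse a date string with an explicit format and return (year, month, day)."""
    parsed = datetime.strptime(text, fmt)
    return parsed.year, parsed.month, parsed.day

print(expand_date("1/9/2009", "%m/%d/%Y"))  # month first (U.S. style)
print(expand_date("9/1/2009", "%d/%m/%Y"))  # day first (European style)
print(expand_date("Jan-2009", "%b-%Y"))     # no day in the text; strptime defaults it to 1
```

Parsing “9/1/2009” with the U.S. format would silently yield September 1 instead of January 9, which is why the conversion has to be explicit.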
First, remove the unwanted data:
Save your file as “Loans2007-2011.xlsx” to take advantage of some of Excel’s features.
Delete the first row that says “Notes offered by prospectus…”
Delete the last four rows that include “Total amount funded…”
Delete columns that have no values, including [id], [member_id], [url].
Repeat for any other blank columns or unwanted attributes.
The columns with the headers revol_bal_joint, sec_app_earliest_cr_line, sec_app_inq_last_6mths, sec_app_mort_acc, sec_app_open_acc, sec_app_revol_util, sec_app_open_act_il, sec_app_num_rev_accts, sec_app_chargeoff_within_12_mths, sec_app_collections_12_mths_ex_med, and sec_app_mths_since_last_major_derog can also be deleted.
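For a file too large to scan by eye, the blank columns can be found programmatically. This is a rough sketch in plain Python, assuming the data has already been read into a list of rows with the header first. The function name and the sample values are hypothetical, as is the loan_amnt column, which the lab does not mention.

```python
def drop_empty_columns(rows):
    """Remove every column whose values are all blank. rows[0] is the header."""
    header, data = rows[0], rows[1:]
    keep = [i for i in range(len(header))
            if any(row[i].strip() for row in data)]
    return [[row[i] for i in keep] for row in rows]

# Hypothetical sample: [id] and [url] carry no values, as in the lab's file.
rows = [
    ["id", "loan_amnt", "url", "int_rate"],
    ["", "5000", "", "10.65%"],
    ["", "2500", "", "15.27%"],
]
cleaned = drop_empty_columns(rows)
print(cleaned[0])
```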
Next, fix your numbers:
Select the [int_rate] column.
In the Home tab, go to the Number section and change the number type from Percentage to General using the drop-down menu.
Repeat for any other attributes with percentages.
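When the percentages arrive as text rather than as formatted Excel numbers, the same ##.##% to #.#### conversion can be sketched in Python. The helper name is hypothetical, and rounding to four decimal places is an assumption taken from the #.#### target format.

```python
def percent_to_decimal(text):
    """Convert a string such as '10.65%' to the decimal fraction 0.1065."""
    return round(float(text.strip().rstrip("%")) / 100, 4)

print(percent_to_decimal("10.65%"))
```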
TAKE A SCREENSHOT (2-3a) of your partially cleaned data file.
KEY Screenshot:
Then, remove any words from numerical values:
Select the [term] column.
Use Find & Replace (Ctrl+H or Home > Editing > Find & Select > Find & Replace) to find the words “months” and “month” and replace them with a null/blank value “”. Important: Be sure to include the space before the words and go from the longest variation of the word to the shortest. In this case, if you replaced “month” first, you would end up with a lot of values that still had the letter “s” from “months”.
Now select the [emp_length] column and find and replace the following values:
Original value    New value
na or n/a         0
< 1 year          0
1 year            1
2 years           2
3 years           3
4 years           4
5 years           5
6 years           6
7 years           7
8 years           8
9 years           9
10+ years         10
, (comma)         (blank)
This can be done either with Find and Replace or with a VLOOKUP set to exact match (FALSE). The n/a cells have nonprintable characters in them, so the =CLEAN function will be useful for ensuring the n/a values are found in their cells.
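The mapping in the table above can also be expressed as a small function. This is one possible sketch in Python, not the lab's Excel procedure. The function name is hypothetical, and dropping nonprintable characters first stands in for what =CLEAN does in Excel.

```python
import re

def clean_emp_length(value):
    """Map an emp_length string to a whole number of years, following the lab's table."""
    # Drop nonprintable characters (the role =CLEAN plays in Excel) and outer spaces.
    text = "".join(ch for ch in value if ch.isprintable()).strip().lower()
    if text in ("na", "n/a") or text.startswith("<"):
        return 0
    match = re.search(r"\d+", text)
    if match is None:
        raise ValueError(f"unrecognized emp_length value: {value!r}")
    return int(match.group())

for raw in ["n/a\x00", "< 1 year", "1 year", "3 years", "10+ years"]:
    print(repr(raw), clean_emp_length(raw))
```

Raising an error on anything unrecognized, rather than guessing, makes leftover dirty values visible during cleaning.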
TAKE A SCREENSHOT (2-3b) of your partially cleaned data file, showing the [term] column.
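The advice to replace the longest variation first in the [term] column is easy to verify. The short Python sketch below applies the two blank replacements in both orders. The helper name and the sample value “36 months” are assumptions.

```python
def strip_term(value, order):
    """Apply blank replacements in the given order, like repeated Find & Replace passes."""
    for word in order:
        value = value.replace(word, "")
    return value

print(repr(strip_term("36 months", [" months", " month"])))  # longest first
print(repr(strip_term("36 months", [" month", " months"])))  # shortest first leaves a stray "s"
```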
Q5. Why do you think it is useful to reformat and extract parts of the dates before you conduct your analysis? What do you think would happen if you didn’t?
Open-ended question, no key provided. Possible answers include that if you intend to do analysis on any of the values that haven’t first been cleaned, your analysis may not work or may not run on the complete data set.
Q6. Did you run into any major issues when you attempted to clean the data? How would you resolve those?
Open-ended question, no key provided.
END OF LAB
(Level 1 Header) Lab 2-4 Generate summary statistics in Excel
Because every question in this lab is open-ended, there is no key provided.
END OF LAB
(Level 1 Header) Lab 2-5 – College Scorecard Extract and Data Preparation
(Level 2 Header) Part 2: Master the Data
Take a screenshot (1)
Screenshot Key:
Q1. By looking through the data in the text file, what do you think the delimiter is?
Comma
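Spotting the delimiter by eye can be backed up with a quick count of candidate characters in the header line. This is a crude heuristic sketched in Python, not part of the lab. The sample header is hypothetical apart from SAT_AVG and C150_4, and a header with commas inside quoted fields would need a real CSV parser instead.

```python
def guess_delimiter(line, candidates=(",", "\t", ";", "|")):
    """Return the candidate character that occurs most often in the line."""
    return max(candidates, key=line.count)

# Hypothetical header line; only SAT_AVG and C150_4 are named in the lab.
header = "UNITID,INSTNM,CITY,STABBR,SAT_AVG,C150_4"
print(repr(guess_delimiter(header)))
```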
Take a screenshot (2)
Screenshot Key:
To ensure that you captured all of the data through the extraction from the .txt file, we need to validate it. Validate the following checksums:
You should have 7,704 records (rows).
Compare the attribute names (column headers) to the attributes listed in the data dictionary. Are you missing any, or do you have any extras?
The average SAT score should be 1,059.07 (this is leaving NULL values as NULL).
Q2. In the checksums, you validated that the average SAT score for all of the records is 1,059.07. When we work with the data more rigorously, several tests will require us to transform NULL values. If you were to transform the NULL SAT values into 0, what would happen to the average (would it stay the same, decrease, or increase)?
The average would decrease.
How would that change to the average impact the way you would interpret the data?
It would inaccurately represent a very low SAT average across all schools (Correct Answer)
Do you think it’s a good idea to replace NULL values with 0s in this case?
No
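A few invented scores show why the average drops. In this Python sketch, None stands in for NULL and the values are hypothetical, not taken from the College Scorecard file.

```python
from statistics import mean

# Hypothetical SAT averages; None stands in for NULL.
sat_scores = [1100, None, 980, None, 1060]

# Leaving NULLs out divides the total by 3; turning them into 0 divides it by 5.
avg_ignoring_null = mean(s for s in sat_scores if s is not None)
avg_null_as_zero = mean(0 if s is None else s for s in sat_scores)

print(round(avg_ignoring_null, 2), avg_null_as_zero)
```

The total of the scores is unchanged, but the zeros enlarge the count, so the average falls and no longer describes the schools that actually reported a score.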
To avoid the issues with NULL, blanks, and 0s, we will remove all of the records that contain NULL values in either SAT_AVG or C150_4. Do so.
Perform a =COUNT() to verify the number of records that remain after removing all records associated with NULL values in SAT_AVG or C150_4. 1,271 records should remain.
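The remove-then-count step can be sketched in Python with the standard csv module. The four sample records are invented, and treating both the literal text NULL and an empty cell as missing is an assumption about how the extract encodes nulls.

```python
import csv
import io

# Hypothetical extract; the real file has 7,704 records.
raw = io.StringIO(
    "INSTNM,SAT_AVG,C150_4\n"
    "School A,1100,0.62\n"
    "School B,NULL,0.48\n"
    "School C,980,NULL\n"
    "School D,1060,0.71\n"
)

def is_null(value):
    """Treat the literal text NULL and empty cells as missing."""
    return value.strip() in ("", "NULL")

records = list(csv.DictReader(raw))
# Keep a record only if both attributes are present.
complete = [r for r in records
            if not is_null(r["SAT_AVG"]) and not is_null(r["C150_4"])]
print(len(records), len(complete))
```

Comparing the count before and after the filter plays the same role as the =COUNT() check in the lab.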
Take a screenshot (3)
Screenshot Key:
Your data is now ready for the test plan. This lab will continue in Chapter 3.
END OF LAB
(Level 1 Header) Lab 2-6 Comprehensive Case: Dillard’s Store Data: How to Create an E-R Diagram
Questions for Parts 1-3 are all open-ended, no key provided.
(Level 2 Header) Part 4: Address and Refine Results
Q3. What is the primary key for the TRANSACT table? What is the primary key for the SKU table?
ITEM_ID – Correct Answer for SKU table
TRANSACTION_ID – Correct Answer for TRANSACT table
Q4. How do we connect the SKU database to the TRANSACT table? How do we join tables from two different related tables?
Tables are joined by relating the foreign and primary keys. The TRANSACT table has a foreign key from the SKU table, so the relationship between the two tables is the joining of TRANSACT.ITEM_ID and SKU.ITEM_ID.
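The join described in this answer can be tried on a toy database. The sketch below uses Python's built-in sqlite3 module rather than the SQL Server instance the case relies on. The key columns come from the answer above, while the DESCRIPTION column, the column types, and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SKU (ITEM_ID INTEGER PRIMARY KEY, DESCRIPTION TEXT);
    CREATE TABLE TRANSACT (
        TRANSACTION_ID INTEGER PRIMARY KEY,
        ITEM_ID INTEGER REFERENCES SKU (ITEM_ID),  -- foreign key pointing at SKU
        TRAN_AMT REAL
    );
    INSERT INTO SKU VALUES (1, 'Shirt'), (2, 'Shoes');
    INSERT INTO TRANSACT VALUES (100, 1, 25.0), (101, 2, 60.0), (102, 1, 25.0);
""")

# Match each transaction to its item through the shared ITEM_ID.
joined = conn.execute("""
    SELECT t.TRANSACTION_ID, s.DESCRIPTION, t.TRAN_AMT
    FROM TRANSACT AS t
    JOIN SKU AS s ON t.ITEM_ID = s.ITEM_ID
    ORDER BY t.TRANSACTION_ID
""").fetchall()
print(joined)
```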
END OF LAB
(Level 1 Header) Lab 2-7 Comprehensive Case: Dillard’s Store Data: How to Preview Data From Tables in a Query
(Level 2 Header) Part 1: Identify the Questions
Q1. How would a view of the entire database or certain tables out of that database allow us to get a feel for the data?
Open-ended question. Possible answers include that it is necessary to view how a database is structured in order to know what data is available to analyze. Viewing the actual data stored in tables can help explain what certain attributes represent, especially if you don’t have a data dictionary or if the data dictionary isn’t very descriptive.
Q2. What types of data would you guess that Dillard’s, a retail store, gathers that might be useful? How could Dillard’s suppliers use this data to predict future purchases?
Open-ended question, no key provided. Possible answers include: sales data (sales orders, sales order dates, items sold), customer data (what each customer purchases, where they live), inventory (retail price, cost, category), etc.
TAKE A SCREENSHOT OF YOUR RESULTS (2)
KEY Screenshot:
Q3. What do you think ‘P’ and ‘R’ represent in the TRAN_TYPE table? How might transactions differ if they are represented by ‘P’ or ‘R’?
Answers will vary, but P represents Purchase and R represents Return.
Q4. What benefit can you gain from selecting only the top few rows of your data, particularly from a large dataset?
Answers will vary, but some possible solutions include getting a quick glance at the data without having to wait for the full query to run if it’s a large dataset.
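Previewing only the top rows looks like this in a toy setting. The sketch uses Python's built-in sqlite3 module, where the row limit is written with LIMIT; in SQL Server the equivalent is SELECT TOP 5. The table contents are generated for the example and are not Dillard’s data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TRANSACT (TRANSACTION_ID INTEGER PRIMARY KEY, TRAN_TYPE TEXT)")
# 1,000 made-up transactions; every tenth one is marked as a return.
conn.executemany(
    "INSERT INTO TRANSACT VALUES (?, ?)",
    [(i, "R" if i % 10 == 0 else "P") for i in range(1, 1001)],
)

# Fetch only the first five rows instead of the whole table.
preview = conn.execute(
    "SELECT * FROM TRANSACT ORDER BY TRANSACTION_ID LIMIT 5"
).fetchall()
print(preview)
```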
(Level 1 Header) Lab 2-8 Comprehensive Case: Dillard’s Store Data: Connecting Excel to a SQL Database
Q1. What can you do in Excel that is much more difficult to do in other data management programs?
Open-ended question, no key provided. Possible answers might be based around students’ general familiarity with Excel: it’s easier for them to work with than Python, R, or SQL, for example, because of its friendlier interface.
Q2. Because most accountants are familiar with Excel, name three data management functions you can do more easily in Excel than in any other program. How does that familiarity help you with your analysis?
Open-ended question, no key provided. Possible answers include PivotTables, Tables, and visualizing data. (It’s arguable whether these functions are truly easier in Excel, but most of our students will probably think they are easier in Excel at this point because they have had less exposure to other tools. Their answers to this question are very dependent on their previous experience.)
Take a screenshot of the PivotTable.
Q3. Reference your PivotTable and find which state has the highest number of Dillard’s stores. Which states have the fewest? How many stores are there across the country?
Texas has the highest number of stores; New York and Wyoming have the fewest. There are 313 stores across the country.
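The count behind this PivotTable is a simple tally of stores per state. A minimal Python sketch follows, with invented store rows and two-letter state codes, so its totals do not match the 313 stores in the actual data.

```python
from collections import Counter

# Hypothetical (store number, state) rows.
stores = [(1, "TX"), (2, "TX"), (3, "AR"), (4, "NY"), (5, "TX"), (6, "AR")]

# Tally stores by state, like placing State in PivotTable rows and counting Store.
stores_per_state = Counter(state for _, state in stores)

print(stores_per_state.most_common(1))  # state with the most stores
print(sum(stores_per_state.values()))   # total number of stores
```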
Q4. Counting the number of stores per state is one example of how the data that has been loaded from SQL Server into Excel can become useful information through a PivotTable. What are other ways that you could organize the STORE data in a PivotTable to come up with meaningful information?
Open-ended question. Possible answers include drilling down into Division and City. To make the data far more interesting, joining in other tables could provide meaningful analysis (sales per store, sales per state, etc.).
Q5. Joins are made based on their Primary Key – Foreign Key relationship. Looking at the ER Diagram or the dataset, which two columns form the relationship between the TRANSACT and STORE tables?
Transact.Store = Store.Store
Q6. Looking at the first several rows of data, compare the amounts in ORIG_PRICE, SALE_PRICE, TRAN_AMT. What do you think TRAN_AMT represents?
The total transaction amount, taking into account discounts.
Q7. What are the means for each of the attributes?
ORIG_PRICE: 53.99
SALE_PRICE: 35.46
TRAN_AMT: 27.84
Q8. The mean from TRAN_AMT is lower than the means for both ORIG_PRICE and SALE_PRICE. Why do you think that is? (Hint: it is not an error.)
The TRAN_AMT not only takes into account discounts, but also is negative when the transaction is a return.
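The effect of returns on the mean can be reproduced with a handful of made-up transactions. This Python sketch simplifies by assuming TRAN_AMT equals the sale price for a purchase (P) and the negated sale price for a return (R). The values are hypothetical.

```python
from statistics import mean

# Hypothetical (SALE_PRICE, TRAN_TYPE) pairs; one of the four is a return.
transactions = [(40.0, "P"), (30.0, "P"), (40.0, "R"), (50.0, "P")]

sale_prices = [price for price, _ in transactions]
# Assumption: a return is recorded as a negative transaction amount.
tran_amts = [price if tran_type == "P" else -price
             for price, tran_type in transactions]

print(mean(sale_prices), mean(tran_amts))
```

A single return both removes its own amount from the total and cancels an equal purchase, so the mean of TRAN_AMT ends up well below the mean of SALE_PRICE.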
(Level 2 Header) Part 5: Address and Refine Results
Q9. How does doing a query within Excel allow quicker and more efficient access and analysis of the data?
Open-ended question, no key provided. Possible responses include not having to export the query results from the database.
Q10. Is 15 days of data