S1EP3: Playing with the Giant Pandas (Pandas)

API quick reference: pandas builds on NumPy and SciPy and adds a large set of data-manipulation features on top of them.

0. Getting started: Why Pandas?

What does an ordinary programmer do when handed a data set? Start with the imports used throughout this notebook:

    import codecs
    import requests
    import numpy as np
    import scipy as sp
    import pandas as pd
    import datetime
    import json

Then download the data and save it to a local file:

    r = requests.get(...)   # the download URL is not preserved in this copy
    with codecs.open('S1EP3_Iris.txt', 'w', encoding='utf-8') as f:
        f.write(r.text)     # save the raw text to disk

The file contains the 150 records of the Iris data set, one per line. Each line has four comma-separated measurements followed by the class label, e.g. 5.1,3.5,1.4,0.2,Iris-setosa.

Read it back line by line:

    with codecs.open('S1EP3_Iris.txt', 'r', encoding='utf-8') as f:
        lines = f.readlines()
        for line in lines:
            print(line, end='')   # each line already ends with a newline

Output: the same raw records, printed one per line.

Why it matters: Pandas

    import pandas as pd
    irisdata = pd.read_csv('S1EP3_Iris.txt', header=None, encoding='utf-8')
    irisdata

Output: a table of 150 rows × 5 columns. The columns are labelled 0 to 4, and column 4 holds the class label (Iris-...).

    cnames = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
    irisdata.columns = cnames
    irisdata

Output: the same 150 rows × 5 columns, now with named columns.

A later cell loops over the columns. For each column s, it prints the column name in upper case, the label "Statistics:", and four right-aligned values formatted with '{0:>5} {1:>5} {2:>5} {3:>5}'. The first two values are s.max() and s.min(); the remaining arguments are cut off in this copy.
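The sketch below makes the contrast concrete. It first parses a handful of rows by hand with the csv module, then does the same work with pd.read_csv and builds the per-column summary line described above. An inline RAW sample (four rows in the same layout as S1EP3_Iris.txt) stands in for the downloaded file. The last two statistics, mean and standard deviation, are an assumption, because that part of the original cell is cut off.

```python
import csv
import io

import pandas as pd

# Four rows in the same layout as S1EP3_Iris.txt (inline so the sketch runs offline).
RAW = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""
cnames = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']

# The "ordinary programmer" way: parse each line and keep per-class bookkeeping by hand.
max_by_hand = {}
for rec in csv.reader(io.StringIO(RAW)):
    if not rec:                      # tolerate blank lines
        continue
    *measures, label = rec
    sepal_length = float(measures[0])
    max_by_hand[label] = max(max_by_hand.get(label, float('-inf')), sepal_length)

# The pandas way: header=None because there is no header row; names= labels the columns.
irisdata = pd.read_csv(io.StringIO(RAW), header=None, names=cnames)
max_by_pandas = irisdata.groupby('class')['sepal_length'].max().to_dict()

# One summary line per numeric column: upper-cased name, then max, min, mean, std
# (mean/std are assumed for the two statistics the original cell cuts off).
summary = []
for col in cnames[:4]:
    s = irisdata[col]
    summary.append('{0:<12} Statistics: {1:>5} {2:>5} {3:>5} {4:>5}'.format(
        s.name.upper(), s.max(), s.min(), round(s.mean(), 2), round(s.std(), 2)))

print(max_by_hand)
print("\n".join(summary))
```

Both routes give the same per-class maxima. The pandas version stays a single line even as the questions get more complicated.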
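A quick custom statistic, defined with two lambdas:

    slogs = lambda x: ...    # body cut off in this copy
    entpy = lambda x: np.exp((slogs(x.sum()) - x.map(slogs).sum()) / x.sum())

np.exp is used here because SciPy's top-level aliases of NumPy functions, such as sp.exp, are deprecated. Suppose slogs(x) = x·log(x), and let S = Σx_i and p_i = x_i / S. Then entpy(x) = exp(log S − Σ x_i·log(x_i) / S) = exp(−Σ p_i·log p_i), which is the exponential of the Shannon entropy of the values treated as proportions. The surviving output has one row per Iris class.

1. Welcome to the world of Pandas

The important Pandas data types:
    DataFrame: a two-dimensional table
    Series: a one-dimensional sequence
    Index: the row index, i.e. row-level metadata

Series: pandas' long spear (one column or one row of a table, an observation vector, a one-dimensional array)

In the data world, two kinds of observation can be abstracted as a Series: a complete set of observations of one individual, and the observations of one attribute across a group of individuals.

Build a Series from values. It consists of a default index plus the values:

    Series1 = pd.Series(np.random.randn(4))   # assumed: four random values (the original arguments are cut off)
    print(Series1, type(Series1))
    print(Series1.index)
    print(Series1.values)

Output: a float64 Series of four random values with type <class 'pandas.core.series.Series'>. The index is Int64Index([0, 1, 2, 3], dtype='int64'); newer pandas prints RangeIndex(start=0, stop=4, step=1) instead. The values form a NumPy array beginning [0.03048042 0.07274621 -0.18660749 ...].

Filtering a Series with a boolean mask works on the same principle as filtering a NumPy array. Broadcasting is supported too, and so are universal functions, both NumPy's built-in ones and custom ones created with np.frompyfunc.

Putting the row labels on the sequence itself, instead of building a two-column table, makes it easy to tell the data apart from the metadata:

    Series2 = pd.Series(Series1.values, index=['norm_' + str(i) for i in range(4)])
    print(Series2, type(Series2))
    print(Series2.index)
    print(type(Series2.index))
    print(Series2.values)

Output: the same four values, now labelled norm_0 to norm_3. The index is Index(['norm_0', 'norm_1', 'norm_2', 'norm_3'], dtype='object').

The rows are ordered, but data can still be reached through the row-level index, much as with an OrderedDict. The analogy is not exact because row labels may repeat. Repeated labels are not recommended, but they are allowed.

    print('norm_0' in Series2)   # True: `in` checks the index labels
    print('norm_6' in Series2)   # False

The default row index works just like row numbers: print(Series1.index) gives Int64Index([0, 1, 2, 3], dtype='int64').

If you define a Series from a dict, or from an OrderedDict (whose keys are unique), you never need to worry about duplicate row labels:

    Series3_Dict = {"Japan": "Tokyo", "S.Korea": "Seoul", "China": "Beijing"}
    Series3_pdSeries = pd.Series(Series3_Dict)
    print(Series3_pdSeries)
    print(Series3_pdSeries.values)
    print(Series3_pdSeries.index)

Output (pandas on Python 2, which sorted the dict keys): values ['Beijing' 'Tokyo' 'Seoul'] and index Index([u'China', u'Japan', u'S.Korea'], dtype='object'). Since pandas 0.23 on Python 3.6+, the dict's insertion order is kept instead.

Want the Series stored in your own order? Missing values are no problem:

    Series4_IndexList = ["Japan", "China", "Singapore", "S.Korea"]
    Series4_pdSeries = pd.Series(Series3_Dict, index=Series4_IndexList)
    print(Series4_pdSeries)
    print(Series4_pdSeries.values)
    print(Series4_pdSeries.index)
    print(Series4_pdSeries.isnull())

Output: the pairs Japan Tokyo, China Beijing, Singapore NaN and S.Korea Seoul. The values are ['Tokyo' 'Beijing' nan 'Seoul'] and the index is Index(['Japan', 'China', 'Singapore', 'S.Korea'], dtype='object'). The last call returns a boolean Series that is True only for Singapore.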
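The Series behaviour described above fits in one short sketch: boolean filtering, broadcasting, ufuncs, label lookup with `in`, and building from a dict with a custom index. The numbers in s are arbitrary example values. capitals and the index list mirror Series3_Dict and Series4_IndexList.

```python
import numpy as np
import pandas as pd

# A Series with an explicit index; the values are arbitrary examples.
s = pd.Series([0.5, -1.2, 2.0, -0.3],
              index=['norm_' + str(i) for i in range(4)])

positive = s[s > 0]        # boolean filtering, same idea as a NumPy mask
shifted = s * 2 + 1        # scalars are broadcast over every element
expd = np.exp(s)           # a NumPy ufunc returns a Series with the same index

# `in` looks at the index labels, not at the values.
has_norm0 = 'norm_0' in s
has_norm6 = 'norm_6' in s

# Build from a dict, but impose an order with index=; a missing key becomes NaN.
capitals = {"Japan": "Tokyo", "S.Korea": "Seoul", "China": "Beijing"}
s4 = pd.Series(capitals, index=["Japan", "China", "Singapore", "S.Korea"])
missing = s4.isnull()

print(positive)
print(s4)
print(missing)
```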
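Series-level metadata: name

When both the data Series and its index have names, later joins with other data become more convenient:

    Series4_pdSeries.name = "Capital Series"
    Series4_pdSeries.index.name = "Nation"
    print(Series4_pdSeries)

Output: the same Series, with Nation printed above the index labels and Name: Capital Series, dtype: object at the bottom.

A "dictionary"? Not quite. Row labels can repeat, although this is not recommended:

    Series5_IndexList = ['A', 'B', 'B', 'C']
    Series5 = pd.Series(Series1.values, index=Series5_IndexList)
    print(Series5)
    print(Series5[['B', 'A']])

Output: the four values labelled A, B, B, C. Selecting ['B', 'A'] returns three rows (B, B, A) because both rows labelled B match.

DataFrame: an ordered collection of Series, as convenient as R's data.frame

Think about it: the vast majority of data can be represented as a DataFrame.

You can define one from a two-dimensional NumPy array, from a file, or from a database. The data is nice, but don't forget the column names:

    dataNumPy = np.asarray([('Japan', 'Tokyo', 4000), ('S.Korea', 'Seoul', 1300), ('China', 'Beijing', ...)])   # China's GDP is cut off in this copy
    DF1 = pd.DataFrame(dataNumPy, columns=['nation', 'capital', 'GDP'])
    DF1

np.asarray turns mixed tuples of strings and numbers into a single array of strings, so the GDP column of DF1 holds text rather than numbers.

Equal-length columns can also be kept in a dict (JSON style). Unfortunately, dict keys are unordered:

    dataDict = {'nation': ['Japan', 'S.Korea', 'China'], 'capital': ['Tokyo', 'Seoul', 'Beijing'], 'GDP': [...]}   # GDP values cut off in this copy
    DF2 = pd.DataFrame(dataDict)
    DF2

On Python 3.7+ with a recent pandas, the columns follow the dict's insertion order.

You can also define a DataFrame from another DataFrame (my obsessive streak kicked in):

    DF21 = pd.DataFrame(DF2, columns=['nation', 'capital', 'GDP'])
    DF21

    DF22 = pd.DataFrame(DF2, columns=['nation', 'capital', 'GDP'], index=[2, 0, 1])
    DF22

Output: DF22 lists the rows in the order 2, 0, 1. When the input is an existing DataFrame, index= selects and reorders its rows by label.

Getting a column out of a DataFrame? There are two ways, exactly as in JavaScript: '.' and '[]'. The '[]' form is the safest.

    print(DF22.nation)
    print(DF22.capital)
    print(DF22['GDP'])

Getting a row out of a DataFrame? There are (at least) two ways:

    print(DF22[0:1])    # a slice: actually returns a one-row DataFrame
    print(DF22.loc[0])  # the row whose index label is 0

Older pandas used DF22.ix[0] here. .ix was removed in pandas 1.0; use .loc to select by label or .iloc to select by position.

The ultimate move, slicing just like NumPy, is positional .iloc indexing over rows and columns together. DF22.iloc[0, :] returns the row labelled 2, and DF22.iloc[:, 0] returns the nation column.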
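The sketch below runs the DataFrame operations above, using .iloc (position) and .loc (label) in place of the removed .ix. The value column holds made-up placeholder numbers rather than the GDP figures, and population_rank is a hypothetical column. The sketch also shows a subtle difference. With a dict input, index= simply labels the rows in order. With an existing DataFrame, index= reorders its rows by label, which is what produces the 2, 0, 1 order of DF22.

```python
import pandas as pd

# Equal-length columns in a dict; 'value' holds placeholder numbers, not real data.
data = {'nation': ['Japan', 'S.Korea', 'China'],
        'capital': ['Tokyo', 'Seoul', 'Beijing'],
        'value': [10, 20, 30]}

df2 = pd.DataFrame(data, columns=['nation', 'capital', 'value'])   # index 0, 1, 2

# From an existing DataFrame, index= selects/reorders rows by label (like DF22).
df22 = pd.DataFrame(df2, columns=['nation', 'capital', 'value'], index=[2, 0, 1])

# From a dict, index= just attaches labels to the rows in order.
relabelled = pd.DataFrame(data, index=[2, 0, 1])

col_a = df22['nation']       # '[]' always works
col_b = df22.nation          # attribute access only works for valid identifiers

first_row = df22.iloc[0]     # by position: the row labelled 2
row_0 = df22.loc[0]          # by label: the row labelled 0
block = df22.iloc[0:2, 0:2]  # NumPy-style slicing over rows and columns

# New columns must be created with '[]', not with attribute syntax.
df22['population_rank'] = [3, 1, 2]
print(df22)
```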
Coming from ALTER TABLE? The panda smiles. However, you cannot add a column dynamically with '.'; only '[]' works:

    DF22['population'] = [1600, 130, 55]
    DF22

Output: DF22 (rows 2, 0, 1) with a new population column.

Index: pandas' joker card for data (the row-level index)

Row-level index metadata can be swapped with the column names, and it can be stacked and unstacked to get the effect of an Excel pivot table.

Index types:
    pd.Index (plain)
    Int64Index (numeric index; pandas 2.0 replaced it with a plain Index of dtype int64)
    MultiIndex (multi-level index)
    DatetimeIndex (timestamps as the index)
    PeriodIndex (periods, such as months, as the index)

Defining a plain Index directly looks just like defining a plain Series:

    index_names = ['a', 'b', 'c']
    Series_for_Index = pd.Series(index_names)
    print(pd.Index(index_names))
    print(pd.Index(Series_for_Index))

Output (both): Index(['a', 'b', 'c'], dtype='object')

Unfortunately, an Index is immutable. Remember that:

    index_names = ['a', 'b', 'c']
    index0 = pd.Index(index_names)
    print(index0.values)    # older pandas also offered get_values(), removed in 1.0
    index0[2] = 'd'

Output: ['a' 'b' 'c'], after which the assignment fails with
    TypeError: Indexes does not support mutable operations
(recent pandas words it "Index does not support mutable operations").

Pass in a list of tuples and you get a MultiIndex. Unfortunately, if this list comprehension is changed to round brackets (a generator expression), it no longer works:

    multi1 = pd.Index([('Row_' + str(x + 1), 'Col_' + str(y + 1)) for x in range(4) for y in range(4)])
    multi1.names = ['index1', 'index2']
    print(multi1)

Output: MultiIndex(levels=[['Row_1', 'Row_2', 'Row_3', 'Row_4'], ['Col_1', 'Col_2', 'Col_3', 'Col_4']], labels=[[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3], [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]], names=['index1', 'index2']). Since pandas 0.24, labels is called codes.

With a multi-level index, a Series groups its values under the outer level:

    data_for_multi1 = pd.Series(range(16), index=multi1)
    data_for_multi1

Output: Row_1 holds 0 to 3, Row_2 holds 4 to 7, Row_3 holds 8 to 11, and Row_4 holds 12 to 15.

Now an example of unbalanced data, where the Row and Col labels are not fully crossed:

    multi2 = pd.Index([('Row_' + str(x + 1), 'Col_' + str(y + 1)) for x in range(5) for y in range(x)])
    multi2

Output: MultiIndex(levels=[['Row_2', 'Row_3', 'Row_4', 'Row_5'], ['Col_1', 'Col_2', 'Col_3', 'Col_4']], labels=[[0, 1, 1, 2, 2, 2, 3, 3, 3, 3], [0, 0, 1, 0, 1, 2, 0, 1, 2, 3]])

    data_for_multi2 = pd.Series(np.arange(10), index=multi2)
    data_for_multi2

Output: Row_2 holds 0; Row_3 holds 1, 2; Row_4 holds 3, 4, 5; and Row_5 holds 6, 7, 8, 9.

The datetime standard library is this handy; you deserve it:

    dates = [datetime.datetime(2015, 1, 1), datetime.datetime(2015, 1, 8), datetime.datetime(2015, 1, 30)]
    pd.DatetimeIndex(dates)

Output: DatetimeIndex(['2015-01-01', '2015-01-08', '2015-01-30'], dtype='datetime64[ns]', freq=None)

If you need a uniform frequency as well as a uniform time format, use a PeriodIndex:

    periodindex1 = pd.period_range('2015-01', '2015-04', freq='M')
    print(periodindex1)

Output: PeriodIndex(['2015-01', '2015-02', '2015-03', '2015-04'], freq='M')

Converting between month and day precision

Some companies use the 1st of the month to represent the whole month, while others use the last day. Converting between the two is tedious by hand, but asfreq handles both:

    print(periodindex1.asfreq('D', how='start'))
    print(periodindex1.asfreq('D', how='end'))

Output: PeriodIndex(['2015-01-01', '2015-02-01', '2015-03-01', '2015-04-01'], freq='D') and PeriodIndex(['2015-01-31', '2015-02-28', '2015-03-31', '2015-04-30'], freq='D')

Finally, let's actually match up the time precision of the two frequencies:

    periodindex_mon = pd.period_range('2015-01', '2015-03', freq='M').asfreq('D', how='start')
    periodindex_day = pd.period_range('2015-01-01', '2015-03-31', freq='D')
    print(periodindex_mon)
    print(periodindex_day)

Output: PeriodIndex(['2015-01-01', '2015-02-01', '2015-03-01'], freq='D'), and a daily PeriodIndex running from '2015-01-01' to '2015-03-31'.
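A compact sketch can check the index behaviours covered above: immutability, a MultiIndex built from tuples and laid out with unstack(), and month-to-day conversion with asfreq. The comprehensions are the ones used for multi1 and multi2. Names such as table1, table2, month_start and month_end are illustrative choices, not taken from the original.

```python
import datetime

import numpy as np
import pandas as pd

# 1. Index objects are immutable: item assignment raises TypeError.
index0 = pd.Index(['a', 'b', 'c'])
try:
    index0[2] = 'd'
    immutable = False
except TypeError:
    immutable = True

# 2. A list of tuples becomes a MultiIndex; unstack() turns the inner level into columns.
multi1 = pd.Index([('Row_' + str(x + 1), 'Col_' + str(y + 1))
                   for x in range(4) for y in range(4)])
multi1.names = ['index1', 'index2']
table1 = pd.Series(np.arange(16), index=multi1).unstack()   # 4 x 4, no gaps

# Unbalanced: Row_k only has Col_1 .. Col_(k-1), so unstack() fills the gaps with NaN.
multi2 = pd.Index([('Row_' + str(x + 1), 'Col_' + str(y + 1))
                   for x in range(5) for y in range(x)])
table2 = pd.Series(np.arange(10), index=multi2).unstack()

# 3. Timestamps and periods.
dates = pd.DatetimeIndex([datetime.datetime(2015, 1, 1),
                          datetime.datetime(2015, 1, 8),
                          datetime.datetime(2015, 1, 30)])
months = pd.period_range('2015-01', '2015-04', freq='M')
month_start = months.asfreq('D', how='start')   # first day of each month
month_end = months.asfreq('D', how='end')       # last day of each month

print(table2)
print(month_start)
print(month_end)
```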

Coarse-grained data, reindexed with forward filling, becomes fine-grained data:

    full_ts = pd.Series(periodindex_mon, index=periodindex_mon).reindex(periodindex_day, method='ffill')
    full_ts

Output: a daily Series (Freq: D). Every day of January carries the value 2015-01-01, every day of February carries 2015-02-01, and every day of March carries 2015-03-01.

Convenient operations on an Index

As described above, an index is ordered and may contain duplicates, yet to some extent it can also be accessed by key. This means that certain set operations are supported:

    index1 = pd.Index(['A', 'B', 'B', 'C', 'C'])
    index2 = pd.Index(['C', 'D', 'E', 'E', 'F'])
    index3 = pd.Index(['B', 'C', 'A'])
    print(index1.append(index2))
    print(index1.difference(index2))
    print(index1.intersection(index2))
    print(index1.union(index2))    # best suited to indexes with unique values
    print(index1.isin(index2))
    print(index1.delete(2))
    print(index1.insert(0, 'K'))
    print(index3.drop('A'))        # best suited to indexes with unique values
    print(index1.is_monotonic_increasing, index2.is_monotonic_increasing, index3.is_monotonic_increasing)
    print(index1.is_unique, index2.is_unique, index3.is_unique)

Output (as printed by the pandas version used here):
    Index(['A', 'B', 'B', 'C', 'C', 'C', 'D', 'E', 'E', 'F'], dtype='object')
    Index(['A', 'B'], dtype='object')
    Index(['C', 'C'], dtype='object')
    Index(['A', 'B', 'B', 'C', 'C', 'D', 'E', 'E', 'F'], dtype='object')
    [False False False  True  True]
    Index(['A', 'B', 'C', 'C'], dtype='object')
    Index(['K', 'A', 'B', 'B', 'C', 'C'], dtype='object')
    Index(['B', 'C'], dtype='object')
    True True False
    False False True

2. Coming and going freely in the pandas world: pandas I/O

It is an old topic, but starting from the basics, we still care about how pandas exchanges data with the outside world.
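The sketch below covers the two ideas just described: forward-filling a monthly series onto a daily calendar with reindex(method='ffill'), and set-like operations on an Index that contains duplicates. The period ranges match periodindex_mon and periodindex_day. The index1, index2 and index3 values are the ones used above. is_monotonic_increasing is used because the older is_monotonic attribute was removed in pandas 2.0.

```python
import pandas as pd

# Monthly labels (as their first day) spread over every day of Q1 2015.
periodindex_mon = pd.period_range('2015-01', '2015-03', freq='M').asfreq('D', how='start')
periodindex_day = pd.period_range('2015-01-01', '2015-03-31', freq='D')
full_ts = (pd.Series(periodindex_mon, index=periodindex_mon)
             .reindex(periodindex_day, method='ffill'))

# Set-like operations on indexes that may contain duplicates.
index1 = pd.Index(['A', 'B', 'B', 'C', 'C'])
index2 = pd.Index(['C', 'D', 'E', 'E', 'F'])
index3 = pd.Index(['B', 'C', 'A'])

appended = index1.append(index2)      # plain concatenation, duplicates kept
diff = index1.difference(index2)      # unique labels of index1 that are not in index2
member = index1.isin(index2)          # element-wise membership test
deleted = index1.delete(2)            # drop by position
inserted = index1.insert(0, 'K')      # returns a new Index; index1 is unchanged
dropped = index3.drop('A')            # drop by label

print(full_ts.head())
print(appended, diff, member, sep="\n")
```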
    read_csv and to_csv are a pair of input/output tools: read_csv returns a pandas.DataFrame directly, and to_csv writes a file with a single command.
    read_fwf: reads fixed-width files.
    read_excel and to_excel: convenient exchange with Excel files.

Remember the example from the very beginning?
    names: use the given list as the final column names.
    encoding: the character encoding of the data set. Data meant to be transferred easily is usually stored as UTF-8.

    irisdata = pd.read_csv('S1EP3_Iris.txt', header=None, names=cnames, encoding='utf-8')
    print(cnames)
    irisdata

Output: ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'], followed by the 150 rows × 5 columns table with named columns.

For the full list of parameters, see the read_csv documentation. Here are some commonly used ones.

Row handling:
    skiprows: skip a certain number of rows.
    skipfooter: never read a fixed number of rows at the end.

Content handling:
    sep / delimiter: the separator matters; common choices include the comma and whitespace.
    na_values: values that should be treated as NA.
    thousands: numbers sometimes use thousands separators that differ between conventions (1.234.567,89 or 1,234,567.89). Without this option, such strings have to be converted to numbers by hand.

Excel

For data as regular as this, there is really no need to store it in Excel, although pandas does provide a friendly I/O interface for it:

    irisdata.to_excel('S1EP3_irisdata.xlsx', index=False)
    irisdata_from_excel = pd.read_excel('S1EP3_irisdata.xlsx', header=0)
    irisdata_from_excel

Output: the same 150 rows × 5 columns.

pandas 2.0 dropped support for writing .xls files and removed the encoding argument of to_excel, so .xlsx is used here; writing it requires an engine such as openpyxl.

The only important parameter is sheet_name=k (spelled sheetname in old pandas), which selects the k-th sheet of the workbook, counting from 0.

JSON: a data format commonly used for network transfer, in which the keys carry part of the metadata along with the data.

    import io
    json_data = ...   # the employee records are not preserved in this copy
    data_employee = pd.read_json(io.StringIO(json.dumps(json_data)))
    data_employee_ri = data_employee.reindex(columns=['name', 'job', 'sal', 'report'])
    data_employee_ri

Output: a three-row table with the columns name, job, sal and report. Since pandas 2.1, passing a literal JSON string to read_json is deprecated, hence the io.StringIO wrapper.

3. Diving into Pandas data operations

Building on part one, data can be manipulated in several ways:
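Several of the read_csv parameters listed above can be tried on an in-memory file. The RAW text below is an invented example, not part of the original data. It has a junk first line, ';' as the field separator, ',' as the thousands separator and '-' marking a missing note. io.StringIO stands in for a file path.

```python
import io

import pandas as pd

# Invented sample: junk first line, ';' separator, ',' thousands, '-' for missing.
RAW = """exported report, this first line is not part of the table
name;amount;note
a;1,234,567;ok
b;89;-
c;2,000;ok
"""

df = pd.read_csv(io.StringIO(RAW),
                 sep=';',            # field separator
                 skiprows=1,         # skip the junk first line
                 thousands=',',      # '1,234,567' -> 1234567
                 na_values=['-'])    # treat '-' as missing

# With no path, to_csv returns the CSV text instead of writing a file.
out = df.to_csv(index=False)
print(df)
print(out)
```

skipfooter is left out here because it is only supported by the slower engine='python' parser.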
    appending records (like UNION ALL) or joining tables (like JOIN)
    handling missing values
    pivot tables as flexible as Excel's (covered in more detail in part four)

Horizontal concatenation can be done directly.

When the key columns have different names, use left_on and right_on.

To join on the index, use left_index and right_index directly.

TIPS: add the how keyword and choose one of
    how='inner'
    how='left'
    how='right'
Combined with how, merge essentially reproduces what SQL joins offer while keeping the code concise.

Custom functions with map

    dataNumPy32 = np.asarray([('Japan', 'Tokyo', 4000), ('S.Korea', 'Seoul', 1300), ('China', 'Beijing', ...)])   # China's GDP is cut off in this copy
    DF32 = pd.DataFrame(dataNumPy32, columns=['nation', 'capital', 'GDP'])
    DF32

    def GDP_Factorize(v):
        fv = np.float64(v)      # GDP arrives as a string, so convert it first
        if fv > 6000.0:
            return ...          # the level labels and the lower threshold
        elif fv < ...:          # are cut off in this copy
            return ...
        return ...

    DF32['GDP_Level'] = DF32['GDP'].map(GDP_Factorize)
    DF32['NATION'] = DF32.nation.map(str.upper)
    DF32

Sorting
    sort_values (called sort in early pandas): sort rows by the values of one or more columns.
    sort_index: sort by the index labels; axis decides whether rows or columns are reordered.

    dataNumPy33 = np.asarray([('Japan', 'Tokyo', 4000), ('S.Korea', 'Seoul', 1300), ('China', 'Beijing', ...)])   # China's GDP is cut off in this copy
    DF33 = pd.DataFrame(dataNumPy33, columns=['nation', 'capital', 'GDP'])
    DF33

A handy feature: rank. Note how tied data (equal values) is handled: method='average', method='min', method='max' or method='first'.

    DF34 = data_for_multi2.unstack()
    DF34

Output: a 4 × 4 table of the values 0 to 9, with NaN wherever a Row/Col combination does not exist.

By default, computations ignore missing values. If you do not want them ignored, bring out fillna.

A "group" of pandas: groupby

Pandas' groupby works like SQL's GROUP BY keyword and follows the Split-Apply-Combine pattern. Split divides the data into groups according to a rule, Apply runs a function on each group, and Combine assembles the results.

[Figure: the flexibility of pandas' groupby, displayed with IPython.display.Image]

Grouping in practice:

    irisdata_group = irisdata.groupby('class')
    irisdata_group

Output: <pandas.core.groupby.DataFrameGroupBy object at ...>

    for level, subsetDF in irisdata_group:
        print(level)
        print(subsetDF)

Output: for each Iris class, its name followed by the sub-table of its rows (sepal_length, sepal_width, petal_length, petal_width, class).
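The join, map, rank, missing-value and groupby ideas above can all be exercised on two tiny tables. The left and right frames and their values are hypothetical. The rank results show how each tie-breaking method treats the two equal values.

```python
import pandas as pd

# Two small hypothetical tables sharing a 'key' column.
left = pd.DataFrame({'key': ['a', 'b', 'c'], 'lval': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'rval': [20, 30, 40]})

inner = pd.merge(left, right, on='key', how='inner')   # keys present in both
lefty = pd.merge(left, right, on='key', how='left')    # every row of `left`
outer = pd.merge(left, right, on='key', how='outer')   # union of keys, NaN where absent

# map applies a function element by element to build a derived column.
left['KEY'] = left['key'].map(str.upper)

# rank: the two 2.0 values are tied for 2nd/3rd place.
s = pd.Series([1.0, 2.0, 2.0, 5.0])
ranks = {m: s.rank(method=m).tolist() for m in ['average', 'min', 'max', 'first']}

# Reductions skip NaN by default; fillna replaces it explicitly.
mean_skipna = outer['rval'].mean()             # mean of 20, 30, 40
mean_filled = outer['rval'].fillna(0).mean()   # mean of 0, 20, 30, 40

# Split-apply-combine: split by key, sum each group, combine into one Series.
lval_by_key = outer.groupby('key')['lval'].sum()

print(inner)
print(ranks)
```

Without an explicit how, merge defaults to an inner join, so how is worth stating whenever unmatched rows matter.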
