Python Data Analysis and Data Mining, Chapter 9: Data Analysis

Uploaded by 老*** (IP location: Jiangsu) on 2023-08-31. Format: PPTX; 54 pages; size ?.98 MB.


Document summary

Chapter 9 Data Analysis (Python Data Analysis and Data Mining)

9.1 Statistical Analysis
Descriptive statistics; summary statistics; parameter estimation and hypothesis testing; correlation analysis.

9.1.1 Descriptive Statistics
Frequency analysis; central tendency analysis; dispersion analysis; other.

Frequency analysis
pandas.DataFrame.value_counts(subset=None, normalize=False, sort=True, ascending=False)
>>> df  # original data
        num_legs  num_wings
falcon         2          2
dog            4          0
cat            4          0
ant            6          0
bee            6          4
>>> df.value_counts()  # frequency of each combination of values across the columns
num_legs  num_wings
4         0            2
6         4            1
          0            1
2         2            1
dtype: int64

pandas.Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)
>>> df["num_legs"].unique()  # distinct values of "num_legs" (df is the DataFrame above)
array([2, 4, 6], dtype=int64)
>>> df["num_legs"].value_counts(normalize=True)  # share of each value instead of raw counts
6    0.4
4    0.4
2    0.2
Name: num_legs, dtype: float64

Central tendency analysis
Function      Description
.mean()       mean of each series (column) in the data
.median()     median of each series
.mode()       mode(s) of each series
.quantile()   quartiles, or any n-quantiles, of each series
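The slides use random data, so their numbers change on every run. As a reproducible counterpart, here is a minimal sketch of the four functions in the table on a small fixed DataFrame; the column names and values are hypothetical. It also shows that `.mode()` can return several values per column, padding columns with fewer modes with NaN.

```python
import pandas as pd

# Hypothetical fixed values, so the results are reproducible
df = pd.DataFrame({"A": [3, 1, 4, 1, 5, 9],
                   "B": [2, 7, 1, 8, 2, 8]})

means = df.mean()        # A: 23/6, B: 28/6
medians = df.median()    # even count, so the average of the two middle values: A 3.5, B 4.5
modes = df.mode()        # one row per mode: A has only 1 (second row NaN), B has 2 and 8
quartiles = df.quantile([0.25, 0.5, 0.75])  # the 0.5 row equals the medians

print(means, medians, modes, quartiles, sep="\n\n")
```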

>>> df = pd.DataFrame(np.random.randint(1, 100, (50, 3)), columns=list("ABC"))  # random data: results vary per run
>>> df.mean()  # mean of each column
A    48.32
B    47.28
C    48.54
dtype: float64
>>> df.median()  # median of each column
A    47.5
B    47.5
C    55.0
dtype: float64
>>> df["A"].mean()  # mean of a single pandas.Series
48.32
>>> df.quantile(q=[0.0, 0.25, 0.5, 0.75, 1.0])  # quartiles of each column
          A      B      C
0.00   1.00   4.00   1.00
0.25  29.75  27.50  21.75
0.50  47.50  47.50  55.00
0.75  76.75  67.25  73.00
1.00  96.00  99.00  95.00
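Fractional quartiles such as 29.75 arise because pandas interpolates between neighbouring sorted values; its default is `interpolation="linear"`. For n sorted values x_0 ≤ … ≤ x_(n-1), the q-quantile sits at position p = (n − 1)·q and equals x_⌊p⌋ + (p − ⌊p⌋)(x_(⌊p⌋+1) − x_⌊p⌋). A minimal sketch of that rule, checked against `Series.quantile` on a hypothetical five-value sample (`linear_quantile` is an illustrative helper, not a pandas function):

```python
import math
import pandas as pd

def linear_quantile(values, q):
    """q-quantile by linear interpolation between sorted values (0 <= q <= 1)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q          # fractional 0-based position
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)    # stay in range when q == 1
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

data = [7, 1, 3, 10, 4]              # hypothetical sample; sorted: 1, 3, 4, 7, 10
s = pd.Series(data)

for q in (0.0, 0.25, 0.35, 0.5, 0.75, 1.0):
    # Series.quantile uses interpolation="linear" unless told otherwise
    assert math.isclose(linear_quantile(data, q), s.quantile(q))

print(linear_quantile(data, 0.35))   # position 4 * 0.35 = 1.4, so 3 + 0.4 * (4 - 3)
```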

>>> df["A"].mode()  # mode(s) of a pandas.Series
>>> df["A"].quantile(q=0.35)  # a specific quantile

Dispersion analysis
Function          Description
.max(), .min()    maximum and minimum of each series
.std()            standard deviation of each series
.mad()            mean absolute deviation of each series (removed in pandas 2.0)
.cov()            covariance between the series
numpy.ptp()       range (max − min) of each series

Generate the data (same construction as above):
>>> df = pd.DataFrame(np.random.randint(1, 100, (50, 3)), columns=list("ABC"))
>>> df.max()  # maximum of each column
A    96
B    99
C    95
dtype: int32
>>> df.std()  # standard deviation of each column
A    29.143165
B    26.264774
C    28.829697
dtype: float64
>>> df.cov()  # covariance matrix; the diagonal holds the variances
             A           B           C
A   849.324082 -126.866939   63.599184
B  -126.866939  689.838367    9.743673
C    63.599184    9.743673  831.151429
>>> df.mad()  # mean absolute deviation of each column
A    24.6656
B    21.1200
C    25.3184
dtype: float64

Other
>>> df.describe()  # count, mean, std, min, quartiles and max of each column
             A          B          C
count  50.00000  50.000000  50.000000
mean   56.90000  50.900000  45.680000
std    30.62562  30.125757  30.615749
min     1.00000   2.000000   2.000000
25%    39.00000  21.000000  17.250000
50%    57.00000  55.500000  41.000000
75%    84.00000  73.500000  74.750000
max    99.00000  99.000000  99.000000
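The dispersion measures from the table can be cross-checked on fixed data. This minimal sketch uses hypothetical values and shows three relations: the squared standard deviations equal the diagonal of `.cov()`, `(df - df.mean()).abs().mean()` reproduces the mean absolute deviation that `DataFrame.mad()` returned before pandas 2.0 removed it, and `max - min` matches `numpy.ptp`.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data
df = pd.DataFrame({"A": [2, 4, 4, 4, 5, 5, 7, 9],
                   "B": [1, 3, 2, 5, 4, 6, 8, 7]})

std = df.std()      # sample standard deviation (ddof=1), as on the slides
cov = df.cov()      # covariance matrix; its diagonal holds the variances
assert np.allclose(std ** 2, np.diag(cov))

# Mean absolute deviation around the mean. DataFrame.mad() was deprecated in
# pandas 1.5 and removed in 2.0; this expression gives the same result.
mad = (df - df.mean()).abs().mean()

# Range (max - min) of each column, the same as numpy.ptp along axis 0
value_range = df.max() - df.min()
assert (value_range.to_numpy() == np.ptp(df.to_numpy(), axis=0)).all()

print(std, mad, value_range, sep="\n\n")
```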

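>>> df.aggregate(np.max, axis=0)  # apply an aggregation function to each column
A    99
B    99
C    99
dtype: int32
>>> df.apply(np.median, axis=0)  # apply a function to each column
A    57.0
B    55.5
C    41.0
dtype: float64

Summary statistics
Time-series aggregation (resample); cross-tabulation (crosstab); grouped aggregation (groupby); pivot tables (pivot_table).

Time-series aggregation: resample
pandas.DataFrame.resample(rule, axis=0, closed=None, label=None, convention="start", kind=None, loffset=None, base=None, on=None, level=None, origin="start_day", offset=None)
(loffset and base were deprecated in pandas 1.1 and removed in 2.0; origin and offset replace them.)
>>> df.resample("3T").sum()  # 3-minute bins; at 1-minute sampling that is 3 samples per bin
                     A  B
2000-01-01 00:00:00  …  8
2000-01-01 00:03:00  …  8
2000-01-01 00:06:00  …  3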

>>> df.resample("4min").mean()  # 4-minute bins, mean of each bin
                       A         B
2000-01-01 00:00:00  2.0  3.000000
2000-01-01 00:04:00  3.0  2.333333
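Because the slide data are random, here is a minimal sketch of the same two resampling calls on seven one-minute observations with hypothetical fixed values, so each bin can be checked by hand. It uses the `"min"` alias, which means the same as `"T"`; pandas 2.2 deprecated `"T"`.

```python
import pandas as pd

# Seven one-minute observations with hypothetical fixed values
idx = pd.date_range("2000-01-01", periods=7, freq="min")
df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6, 7],
                   "B": [4, 3, 2, 1, 4, 3, 2]}, index=idx)

# Bins start at midnight (origin="start_day") and, for minute rules, are
# closed on and labelled by their left edge by default.
sums = df.resample("3min").sum()    # bins 00:00-00:02, 00:03-00:05, 00:06
means = df.resample("4min").mean()  # bins 00:00-00:03, 00:04-00:06

print(sums, means, sep="\n\n")
```

The last 3-minute bin holds a single observation, so its sum is just that value. This is why the slide's `"3T"` output also ends with a one-sample bin at 00:06.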

>>> df = pd.DataFrame(np.random.randint(1, 5, (7, 2)), pd.date_range("1/1/2000", periods=7, freq="T"), list("AB"))  # 7 one-minute timestamps, random integers 1-4
>>> df
                     A  B
2000-01-01