
Chapter 3  Basic Data Mining Techniques

3.1 Decision Trees (for classification)

Introduction: Classification, a Two-Step Process

1. Model construction: build a model that can describe a set of predetermined classes.
- Preparation: each tuple/sample is assumed to belong to a predefined class, as given by the output attribute (the class label attribute).
- The set of examples used for model construction is the training set.
- The model can be represented as classification rules, decision trees, or mathematical formulae.
- Estimating the accuracy of the model:
  - The known label of each test sample is compared with the classification produced by the model.
  - The accuracy rate is the percentage of test-set samples that are correctly classified by the model.
  - Note: the test set must be independent of the training set; otherwise over-fitting will occur.
2. Model usage: use the model to classify future or unknown objects.
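A minimal sketch of this two-step process, assuming scikit-learn is available. The iris data here is only a stand-in for the lecture's own training table, and DecisionTreeClassifier with criterion="entropy" plays the role of the information-gain-based tree builder described in Section 2.

```python
# Two-step classification: (1) build the model on a training set,
# (2) use it on held-out data and estimate its accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)            # stand-in labeled dataset

# Step 1: model construction on the training set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = DecisionTreeClassifier(criterion="entropy")   # information-gain-style splits
model.fit(X_train, y_train)

# Step 2: model usage; accuracy is estimated on the independent test set
y_pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
```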

Classification Process (1): Model Construction
- The training data are fed to a classification algorithm, which outputs the classifier (model).
- Example of a learned rule:
  IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Classification Process (2): Use the Model in Prediction
- The classifier is applied to the testing data and then to unseen data.
- Example of an unseen case: (Jeff, Professor, 4) -> Tenured?

1. Example (1): Training Dataset
- An example from Quinlan's ID3 (1986).
[Figure: the "buys_computer" training dataset, shown as a table on the original slide]

1. Example (2): Output: A Decision Tree for "buys_computer"
[Figure: the learned decision tree, with root test "age?" (branches <=30, 31..40, >40), a "student?" test under the <=30 branch, a "credit rating?" test under the >40 branch, and yes/no leaves matching the rules listed in Section 3.1]
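The same tree can be written directly as nested conditionals. This is a small illustrative sketch; the leaf labels are taken from the IF-THEN rules listed in Section 3.1 below.

```python
# The Example (2) decision tree expressed as nested conditionals.
def buys_computer(age: str, student: str, credit_rating: str) -> str:
    if age == "<=30":
        return "yes" if student == "yes" else "no"
    elif age == "31..40":
        return "yes"
    else:  # age == ">40"
        return "yes" if credit_rating == "excellent" else "no"

print(buys_computer("<=30", "yes", "fair"))   # -> yes
```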

2. Algorithm for Decision Tree Building
- Basic algorithm (a greedy algorithm); see the sketch after this list:
  - The tree is constructed in a top-down, recursive, divide-and-conquer manner.
  - At the start, all the training examples are at the root.
  - Attributes are categorical (if continuous-valued, they are discretized in advance).
  - Examples are partitioned recursively based on the selected attributes.
  - Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain).
- Conditions for stopping the partitioning:
  - All samples at a given node belong to the same class.
  - There are no remaining attributes for further partitioning; majority voting is then used to label the leaf.
  - There are no samples left.
  - The pre-set accuracy has been reached.
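A minimal sketch of the greedy, top-down, divide-and-conquer procedure just described. The data layout (a list of dicts plus the name of the class-label field) and the toy records in the demo are assumptions for illustration; the stopping conditions "no samples left" and "pre-set accuracy reached" are omitted for brevity.

```python
# Greedy top-down tree building: pick the attribute with the highest
# information gain, partition the examples, and recurse on each subset.
import math
from collections import Counter

def entropy(examples, label):
    counts = Counter(ex[label] for ex in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attr, label):
    total = len(examples)
    remainder = 0.0
    for value in {ex[attr] for ex in examples}:
        subset = [ex for ex in examples if ex[attr] == value]
        remainder += (len(subset) / total) * entropy(subset, label)
    return entropy(examples, label) - remainder

def majority_class(examples, label):
    return Counter(ex[label] for ex in examples).most_common(1)[0][0]

def build_tree(examples, attributes, label):
    classes = {ex[label] for ex in examples}
    if len(classes) == 1:                      # stopping: node is pure
        return classes.pop()
    if not attributes:                         # stopping: no attributes left
        return majority_class(examples, label)
    best = max(attributes, key=lambda a: information_gain(examples, a, label))
    node = {best: {}}
    for value in {ex[best] for ex in examples}:  # partition on the chosen attribute
        subset = [ex for ex in examples if ex[best] == value]
        remaining = [a for a in attributes if a != best]
        node[best][value] = build_tree(subset, remaining, label)
    return node

if __name__ == "__main__":
    # three toy records for illustration only
    data = [{"age": "<=30", "student": "no", "buys_computer": "no"},
            {"age": "<=30", "student": "yes", "buys_computer": "yes"},
            {"age": ">40", "student": "no", "buys_computer": "yes"}]
    print(build_tree(data, ["age", "student"], "buys_computer"))
```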

Information Gain (信息增益) (ID3/C4.5)
- Select the attribute with the highest information gain.
- Assume there are two classes, P and N.
- Let the set of examples S contain p elements of class P and n elements of class N.
- The amount of information needed to decide whether an arbitrary example in S belongs to P or N is defined as
  I(p, n) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))
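A quick numerical check of I(p, n) as defined above (base-2 logarithm, both classes assumed non-empty); it reproduces the value 0.940 used in the worked example below.

```python
# Two-class information measure I(p, n).
import math

def info(p: int, n: int) -> float:
    # assumes p > 0 and n > 0, so both log terms are defined
    total = p + n
    return -(p / total) * math.log2(p / total) - (n / total) * math.log2(n / total)

print(f"{info(9, 5):.3f}")   # 0.940, as in the buys_computer example
```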

Information Gain in Decision Tree Building
- Assume that, using attribute A, the set S will be partitioned into subsets {S1, S2, ..., Sv}.
- If Si contains pi examples of P and ni examples of N, the entropy (熵), or the expected information needed to classify objects in all subsets Si, is
  E(A) = sum over i of ((pi + ni) / (p + n)) * I(pi, ni)
- The encoding information that would be gained by branching on A is
  Gain(A) = I(p, n) - E(A)
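A sketch of E(A) and Gain(A) computed from the per-subset counts (pi, ni); the three-way split passed to the demo call is illustrative only.

```python
# Gain(A) = I(p, n) - E(A), where E(A) is the weighted information of the subsets
# produced by branching on attribute A.
import math

def info(p: int, n: int) -> float:
    total = p + n
    result = 0.0
    for c in (p, n):
        if c:                          # a zero count contributes nothing
            result -= (c / total) * math.log2(c / total)
    return result

def gain(subsets):
    """subsets: list of (pi, ni) pairs, one pair per branch of attribute A."""
    p = sum(pi for pi, _ in subsets)
    n = sum(ni for _, ni in subsets)
    e_a = sum(((pi + ni) / (p + n)) * info(pi, ni) for pi, ni in subsets)
    return info(p, n) - e_a

print(f"{gain([(3, 1), (2, 2)]):.3f}")   # about 0.049 for this illustrative split
```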

Attribute Selection by Information Gain: Computation
- Class P: buys_computer = "yes"
- Class N: buys_computer = "no"
- I(p, n) = I(9, 5) = 0.940
- Compute the expected information E(age) for the attribute age; hence
  Gain(age) = I(p, n) - E(age) = 0.940 - 0.69 = 0.25
- The gains of the remaining attributes are computed similarly, and the attribute with the highest gain is chosen as the split attribute.
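The arithmetic above can be reproduced as follows. The per-age-group class counts (2/3, 4/0, 3/2 for the three age ranges) are an assumption taken from Quinlan's classic example, since the training table itself appears only as a figure in the slides; they are consistent with I(9, 5) = 0.940 and Gain(age) of about 0.25.

```python
# Worked computation of Gain(age) for the buys_computer example.
import math

def info(p, n):
    total = p + n
    return sum(-(c / total) * math.log2(c / total) for c in (p, n) if c)

i_all = info(9, 5)                                      # 0.940
# assumed per-age-group (yes, no) counts: <=30 -> (2, 3), 31..40 -> (4, 0), >40 -> (3, 2)
e_age = sum(((p + n) / 14) * info(p, n) for p, n in [(2, 3), (4, 0), (3, 2)])
print(f"Gain(age) = {i_all:.3f} - {e_age:.2f} = {i_all - e_age:.2f}")
```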

3. Decision Tree Rules
- Automate rule creation.
- Rules simplification and elimination.
- A default rule is chosen.

3.1 Extracting Classification Rules from Trees
- Represent the knowledge in the form of IF-THEN rules.
- One rule is created for each path from the root to a leaf (see the sketch after this list).
- Rules are easier for humans to understand.
- Example:
  IF age = "<=30" AND student = "no" THEN buys_computer = "no"
  IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
  IF age = "31..40" THEN buys_computer = "yes"
  IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "yes"
  IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "no"
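A sketch of "one rule per root-to-leaf path". The nested-dict tree representation ({attribute: {value: subtree-or-leaf}}) is the same assumed shape as in the build_tree sketch of Section 2, not a structure prescribed by the lecture.

```python
# Walk every root-to-leaf path and emit one IF-THEN rule per path.
def extract_rules(tree, conditions=()):
    if not isinstance(tree, dict):                    # reached a leaf: emit one rule
        ants = " AND ".join(f'{a} = "{v}"' for a, v in conditions)
        return [f'IF {ants} THEN buys_computer = "{tree}"']
    (attribute, branches), = tree.items()             # exactly one test per node
    rules = []
    for value, subtree in branches.items():
        rules.extend(extract_rules(subtree, conditions + ((attribute, value),)))
    return rules

tree = {"age": {"<=30": {"student": {"no": "no", "yes": "yes"}},
                "31..40": "yes",
                ">40": {"credit_rating": {"excellent": "yes", "fair": "no"}}}}
for rule in extract_rules(tree):
    print(rule)
```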

3.2 Rules Simplification and Elimination

A rule for the tree in Figure 3.4:
  IF Age <= 43 & Sex = Male & Credit Card Insurance = No
  THEN Life Insurance Promotion = No
  (accuracy = 75%, Figure 3.4)

A simplified rule obtained by removing attribute Age:
  IF Sex = Male & Credit Card Insurance = No
  THEN Life Insurance Promotion = No
  (accuracy = 83.3% (5/6), Figure 3.5)

[Figure 3.4: a three-node decision tree for the credit card database]
[Figure 3.5: a two-node decision tree for the credit card database]

4. Further Discussion
- Attributes with many values: a trade-off between accuracy and the number of splits; use the gain ratio instead of the raw gain (a sketch follows this list):
  GainRatio(A) = Gain(A) / SplitInfo(A)
- Numerical attributes: binary splits, the stopping condition, and splits with more than two values.
- Other methods for building decision trees: ID3, C4.5, CART, CHAID.
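A sketch of the gain-ratio correction: SplitInfo(A) is the entropy of the partition sizes themselves, so an attribute that splits the data into many small subsets is penalized. The gain value and subset sizes in the demo call are illustrative only.

```python
# GainRatio(A) = Gain(A) / SplitInfo(A), where SplitInfo(A) depends only on
# how many examples fall into each branch of A.
import math

def split_info(subset_sizes):
    total = sum(subset_sizes)
    return -sum((s / total) * math.log2(s / total) for s in subset_sizes if s)

def gain_ratio(gain_a, subset_sizes):
    return gain_a / split_info(subset_sizes)

# e.g. a gain of 0.25 achieved with a three-way split into subsets of size 5, 4, 5
print(f"{gain_ratio(0.25, [5, 4, 5]):.3f}")
```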

5. General Consideration: Advantages of Decision Trees
- Easy to understand.
- Map nicely to a set of production rules.
- Have been applied to real problems.
- Make no prior assumptions about the data.
- Able to process both numerical and categorical data.
