Chapter 3  Basic Data Mining Techniques
3.1 Decision Trees (for classification)

Introduction: Classification as a Two-Step Process

1. Model construction: build a model that can describe a set of predetermined classes
- Preparation: each tuple/sample is assumed to belong to a predefined class, as given by the output attribute (class label attribute)
- The set of examples used for model construction is called the training set
- The model can be represented as classification rules, decision trees, or mathematical formulae
- Estimating the accuracy of the model:
  - The known label of each test sample is compared with the result predicted by the model
  - The accuracy rate is the percentage of test-set samples that are correctly classified by the model
  - Note: the test set must be independent of the training set, otherwise over-fitting will occur
2. Model usage: use the model to classify future or unknown objects

Classification Process (1): Model Construction

Training data are fed to a classification algorithm, which outputs the classifier (the model), for example:

IF rank = 'professor' OR years > 6
THEN tenured = 'yes'

Classification Process (2): Use the Model in Prediction

The classifier is first checked against testing data and then applied to unseen data, for example (Jeff, Professor, 4) -> Tenured? (A code sketch of both steps follows.)

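Not part of the original slides: a minimal sketch of the two-step process, assuming scikit-learn as the toolkit and a hypothetical numeric encoding of the rank/years example above.

# Step 1: construct the model on a training set and estimate its accuracy
# on an independent test set. Step 2: apply it to an unseen tuple.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Hypothetical data: (rank, years); rank 0 = assistant, 1 = associate,
# 2 = professor. Labels follow the rule on the previous slide.
X = [[2, 7], [0, 2], [1, 6], [2, 3], [0, 7], [1, 2], [2, 2], [0, 3]]
y = ['yes', 'no', 'no', 'yes', 'yes', 'no', 'yes', 'no']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)   # model construction
print('accuracy:', accuracy_score(y_test, model.predict(X_test)))

# Model usage: the unseen tuple (Jeff, Professor, 4) encodes to [2, 4].
print('Tenured?', model.predict([[2, 4]])[0])
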
1. Example (1): Training Dataset

An example from Quinlan's ID3 (1986): a training set of 14 samples described by attributes such as age, student, and credit_rating, with class label buys_computer (9 "yes", 5 "no"). The table itself is not reproduced here.

1. Example (2): Output: A Decision Tree for "buys_computer"

The learned tree (shown as a figure on the slide) tests age at the root:
- age <= 30: test student (no -> no; yes -> yes)
- age 30..40: yes
- age > 40: test credit_rating (excellent -> yes; fair -> no)

2. Algorithm for Decision Tree Building

Basic algorithm (a greedy algorithm):
- The tree is constructed in a top-down, recursive, divide-and-conquer manner
- At the start, all the training examples are at the root
- Attributes are categorical (continuous-valued attributes are discretized in advance)
- Examples are partitioned recursively based on selected attributes
- Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain), as in the sketch after this list

Conditions for stopping partitioning:
- All samples for a given node belong to the same class
- There are no remaining attributes for further partitioning (majority voting is employed to classify the leaf)
- There are no samples left
- The pre-set accuracy is reached

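Not part of the original slides: a minimal Python sketch of this greedy, top-down, divide-and-conquer algorithm. The list-of-dicts dataset encoding and helper names are illustrative assumptions; information gain, used here as the selection measure, is defined on the slides that follow.

from collections import Counter
from math import log2

def entropy(rows, label):
    # Expected information needed to classify the examples at this node.
    counts = Counter(r[label] for r in rows)
    return -sum(c / len(rows) * log2(c / len(rows)) for c in counts.values())

def information_gain(rows, attr, label):
    # Reduction in entropy obtained by partitioning rows on attr.
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset, label)
    return entropy(rows, label) - remainder

def build_tree(rows, attrs, label):
    classes = {r[label] for r in rows}
    if len(classes) == 1:            # stop: all samples in the same class
        return classes.pop()
    if not attrs:                    # stop: no attributes left -> majority vote
        return Counter(r[label] for r in rows).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, a, label))
    rest = [a for a in attrs if a != best]
    # Partition recursively on the selected attribute's values.
    return {best: {v: build_tree([r for r in rows if r[best] == v], rest, label)
                   for v in {r[best] for r in rows}}}

Calling build_tree(rows, ['age', 'student', 'credit_rating'], 'buys_computer') on the 14-sample training set above would be expected to reproduce the tree shown on the earlier slide.
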
Information Gain (ID3/C4.5)

- Select the attribute with the highest information gain
- Assume there are two classes, P and N, and let the set of examples S contain p elements of class P and n elements of class N
- The amount of information needed to decide whether an arbitrary example in S belongs to P or N is defined as

I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}

Information Gain in Decision Tree Building

- Assume that, using attribute A, the set S will be partitioned into sets {S1, S2, ..., Sv}
- If Si contains pi examples of P and ni examples of N, the entropy, i.e. the expected information needed to classify objects in all subsets Si, is

E(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n} I(p_i, n_i)

- The encoding information that would be gained by branching on A is

Gain(A) = I(p, n) - E(A)

Attribute Selection by Information Gain Computation

- Class P: buys_computer = "yes"; Class N: buys_computer = "no"
- I(p, n) = I(9, 5) = 0.940
- Compute the entropy for age:

E(age) = \frac{5}{14} I(2, 3) + \frac{4}{14} I(4, 0) + \frac{5}{14} I(3, 2) = 0.69

- Hence

Gain(age) = I(9, 5) - E(age) = 0.940 - 0.69 = 0.25

- Similarly, gains are computed for the remaining attributes; age has the highest information gain and is selected as the root split (a numeric check follows).

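Not part of the original slides: a quick numeric check of the figures above. The per-age-group class counts are assumptions taken from the 14-sample training set this example refers to.

from math import log2

def I(p, n):
    # I(p, n) as defined earlier; a pure subset carries no information.
    if p == 0 or n == 0:
        return 0.0
    t = p + n
    return -(p / t) * log2(p / t) - (n / t) * log2(n / t)

print(round(I(9, 5), 2))                 # 0.94

# Assumed partition by age: <=30 -> (2 yes, 3 no), 30..40 -> (4, 0),
# >40 -> (3, 2).
E_age = 5/14 * I(2, 3) + 4/14 * I(4, 0) + 5/14 * I(3, 2)
print(round(E_age, 2))                   # 0.69
print(round(I(9, 5) - E_age, 2))         # 0.25
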
3. Decision Tree Rules

- Automate rule creation
- Rule simplification and elimination
- A default rule is chosen

3.1 Extracting Classification Rules from Trees

- Represent the knowledge in the form of IF-THEN rules
- One rule is created for each path from the root to a leaf (see the sketch after the example below)
- Rules are easier for humans to understand

Example:

IF age = "<=30" AND student = "no" THEN buys_computer = "no"
IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
IF age = "31…40" THEN buys_computer = "yes"
IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "yes"
IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "no"

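Not part of the original slides: a sketch of the one-rule-per-path extraction, assuming the nested-dict tree representation used in the build_tree sketch earlier; the tree literal below is a hypothetical encoding of the slide's tree.

def extract_rules(tree, conditions=()):
    # A leaf ends one root-to-leaf path: emit the finished rule.
    if not isinstance(tree, dict):
        conds = ' AND '.join(f'{a} = "{v}"' for a, v in conditions)
        print(f'IF {conds} THEN buys_computer = "{tree}"')
        return
    attr, branches = next(iter(tree.items()))
    for value, subtree in branches.items():
        extract_rules(subtree, conditions + ((attr, value),))

tree = {'age': {'<=30': {'student': {'no': 'no', 'yes': 'yes'}},
                '31...40': 'yes',
                '>40': {'credit_rating': {'excellent': 'yes', 'fair': 'no'}}}}
extract_rules(tree)    # prints the five IF-THEN rules listed above
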
3.2 Rule Simplification and Elimination

A rule for the tree in Figure 3.4:

IF Age <= 43 & Sex = Male & Credit Card Insurance = No
THEN Life Insurance Promotion = No
(accuracy = 75%, Figure 3.4)

A simplified rule obtained by removing attribute Age:

IF Sex = Male & Credit Card Insurance = No
THEN Life Insurance Promotion = No
(accuracy = 83.3% (5/6), Figure 3.5)

Figure 3.4: A three-node decision tree for the credit card database
Figure 3.5: A two-node decision tree for the credit card database
(The figures themselves are not reproduced here.)

4. Further Discussion

- Attributes with more values: information gain favors them (more splits look more accurate); C4.5 corrects this with GainRatio(A) = Gain(A) / SplitInfo(A), as sketched below
- Numerical attributes: use a binary split
- Stopping condition
- Splits with more than 2 values
- Other methods for building decision trees: ID3, C4.5, CART, CHAID

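Not part of the original slides: a sketch of the gain-ratio correction, self-contained and using the same assumed list-of-dicts encoding as the earlier sketches.

from collections import Counter
from math import log2

def _entropy(labels):
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())

def gain_ratio(rows, attr, label):
    total = len(rows)
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[label])
    # Gain(A): entropy reduction from splitting on attr.
    remainder = sum(len(g) / total * _entropy(g) for g in groups.values())
    gain = _entropy([r[label] for r in rows]) - remainder
    # SplitInfo(A): entropy of the partition sizes themselves; it grows with
    # the number of values, penalizing many-valued attributes.
    split_info = -sum(len(g) / total * log2(len(g) / total)
                      for g in groups.values())
    return gain / split_info if split_info else 0.0
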
5. General Consideration: Advantages of Decision Trees

- Easy to understand
- Map nicely to a set of production rules
- Have been applied to real problems
- Make no prior assumptions about the data
- Able to process both numerical and categorical data