




Zhengyou Zhang: Okay. So it's my pleasure to introduce Professor Guang-Bin Huang from Nanyang Technological University of Singapore, NTU. He is talking about his long-standing research interest, the extreme learning machine. He graduated from Northeastern University in China in applied mathematics and has been with NTU for quite a few years. He is an associate editor of Neurocomputing and also of IEEE Transactions on Systems, Man and Cybernetics. And he is organizing our workshop; you can advertise your conference later. So please.

Guang-Bin Huang: Thanks, Dr. Zhang, for inviting me here. It's my honor to give a talk introducing the extreme learning machine. The idea was actually initiated back in 2003; that was the first time we submitted papers. It has become recognized by more and more researchers recently. We just had a workshop in Australia last December, and we are going to have an international symposium on extreme learning machines this coming December in China. It's Dr. Zhang's hometown, or near his hometown, so we hope you can join.

Okay. So what is the extreme learning machine? This is a technical talk about a kind of learning method. Such a learning method is different from the traditional learning methods: tuning is not required. If you wish to understand the extreme learning machine, I think it's better to go back and review the traditional feedforward neural networks, including support vector machines. I assume many people consider support vector machines not to belong to neural networks, but actually, in my opinion [inaudible], they have the same target and the same architecture in some sense. After we review feedforward neural networks, we can then introduce what is called the extreme learning machine. Actually, the extreme learning machine is very, very simple. During my talk I also wish to give a comparison between the ELM and the least-squares SVM. Finally, I wish to show the linkage between the ELM and the traditional SVM, so we know what the difference is and what the relationship is.

Okay. So feedforward neural networks have several types of frameworks, architectures. One of the popular ones is the multilayer feedforward neural network. But in theory, and also in applications, people found a single hidden layer is enough for us to handle all applications. That means, given any application, we can design a single-hidden-layer feedforward network. Such a single-hidden-layer feedforward network can be used to approximate any continuous target function and can be used to classify any disjoint regions.

For single-hidden-layer feedforward networks, we usually have two popular types of architectures. The first one is the so-called sigmoid type of feedforward network; that means the hidden layer uses sigmoid-type nodes. Sometimes I call them additive hidden nodes, meaning the input of each hidden node is a weighted sum of the network input. Of course, for g here people usually use a sigmoid-type function, but you can also write the output of each hidden node as uppercase G(a_i, b_i, x), where x is the input of the network and, for hidden node i, (a_i, b_i) are the parameters of node i. So this is the sigmoid type of feedforward network. The other very popular one is the RBF network, where the hidden-node output function is an RBF function.
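For reference, the two hidden-node types just described are usually written as follows; this is the standard notation from the ELM literature, supplied here rather than transcribed from the slides:

    G(a_i, b_i, x) = g(a_i \cdot x + b_i)             (additive/sigmoid hidden node)
    G(a_i, b_i, x) = g(b_i \lVert x - a_i \rVert)     (RBF hidden node)

where g is the activation function, a_i is the input weight vector or RBF center, and b_i is the bias or impact factor of hidden node i.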
So if we rewrite the RBF network in this compact format, we actually have the same output function for the single-hidden-layer feedforward network as for the sigmoid-type network, always written with uppercase G. These two types of networks have been very interesting over the past two to three decades. Two groups of researchers worked on these two areas, considered them separate, and used different learning methods for these two types of networks. But generally, for both types of networks, we have the following theorem: given any continuous target function f(x), the output of this single-hidden-layer network can be made as close to the target continuous function as desired, for any given error epsilon. Definitely, in theory, we can find such a network; that is, the output of this network can be as close as desired to the target function f(x).

Of course, in real applications we do not know the target function f(x). We only have sampling; we can only sample discrete samples, the training samples, and we wish to learn from these training samples. So we wish to adjust the parameters of the hidden layer and also the weights between the hidden layer and the output layer, and try to find some algorithm to learn these parameters so that the output of the network approximates the target function.

From a learning point of view: given, say, N training samples (x_i, t_i), we wish the output of the network with respect to the input x_j to equal the target output t_j. Of course, in most cases the output is not exactly the same as the target, so there is some error. Suppose the output of the network is o_j; then we wish to minimize this cost function. In order to do this, many people actually spent time finding different methods for tuning the hidden-layer parameters a_i, b_i and also the weights between the hidden layer and the output layer, that is beta_i, which I call the output weights. So that is the situation for learning; that is for approximation.

But how about classification? In my theory published in the year 2000, we say that as long as this kind of single-hidden-layer feedforward network can approximate any target function, the network in theory can be used to classify any disjoint regions. So this is the classification case. Yeah.

: So there's a very well-known result in neural networks that you probably know, which says that in order for the [inaudible] to be valid, the number of units has to be very, very large, and that if you go to infinity [inaudible] what's called a Gaussian process.

Guang-Bin Huang: Yeah, you're right.

: So that gives [inaudible] indication of processing [inaudible].

Guang-Bin Huang: Ah, okay. That is actually a very useful theory. That one is actually used for our further development; that is why we come to the extreme learning machine. It is a guide. But an infinite number of hidden nodes is usually not required in real applications. In theory, in order to approximate any target function, say with the error epsilon reaching zero, in that sense an infinite number of hidden nodes needs to be given. But in real applications we do not need that; I will mention this later. But that theory is very, very important. Okay. Um-hmm.

: Also, recently many people observe that if you have many layers, actually you can [inaudible] a single-layer system, even if you have a very large number of hidden units in the single-layer system.

Guang-Bin Huang: You're right.
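For reference, the output function and the learning objective mentioned above are usually written as follows; the notation is the standard one from the literature, supplied here rather than transcribed from the slides. With L hidden nodes,

    f_L(x) = \sum_{i=1}^{L} \beta_i G(a_i, b_i, x),    with  \lVert f_L - f \rVert < \epsilon  for any  \epsilon > 0,

and, given N training samples (x_j, t_j) with network outputs o_j = f_L(x_j), the learning goal is to minimize

    E = \sum_{j=1}^{N} \lVert o_j - t_j \rVert^2.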
Guang-Bin Huang: We actually have another paper, which I didn't mention here, in which we prove this for, say, two hidden layers (not even three) compared with a one-hidden-layer architecture: a two-hidden-layer network usually needs far fewer hidden nodes than a single-hidden-layer one.

: [inaudible]

Guang-Bin Huang: Yeah. I proved it theoretically and also showed it in simulations. So that means, from the learning capability point of view, multi-hidden-layer networks look more powerful. But that is actually not shown here; that one we can discuss later.

Okay. So if you want to learn these kinds of networks, in the past two decades most people used gradient-based learning methods. One of the most popular ones is backpropagation, and of course its variants; there are so many variants where people change just some parameter and generate another learning method. Another method is for RBF networks, the least-squares method. But that least-squares method is somewhat different from the ELM, which I will introduce later: a single impact factor is used in all hidden nodes. That means all hidden nodes use the same impact factor [inaudible]; sometimes it is called sigma, right?

Okay. So what are the drawbacks of those gradient-based methods? Usually we find them very difficult in research. Different groups of researchers handle different networks; intuitively speaking, the networks have similar architectures, but usually RBF researchers work on RBF networks and feedforward-network people work on feedforward networks. They consider them different, so we sometimes waste resources. Also, in all these networks there are sometimes so many parameters for the user to tune manually, case by case, which is sometimes inconvenient for non-expert users. We also usually face overfitting issues: if too many hidden nodes are used in the hidden layer, we are afraid of the overfitting problem, right? And also local minima: you cannot really get the optimal solution; usually you get a local minimum solution that is better for that local area but not for the entire application. And of course it is time-consuming, not only in learning [inaudible] but also in human effort: a human has to spend time finding the so-called proper, user-specified parameters. So we wish to overcome all these limitations and constraints of the original learning methods.

Now, let's look at the support vector machine. Is there any relationship between the support vector machine and the traditional feedforward network? Of course, when SVM people talk about SVMs, they never talk about neural networks; they say they are separate. There is a story here. When I joined [inaudible] in 2004: before 2004, SVM papers seldom appeared at neural network conferences. Then in 2004 the organizers of the 2004 conference, at a committee meeting, asked why so many SVM papers came to a neural network conference that year. Okay. So people consider these different. But what I found is that they are actually very close to each other; generally speaking they are the same, with the same network architecture. Let's look at the SVM. For the SVM, of course, we talk a little bit about the optimization objective: the SVM is to minimize this formula, this objective function.
So we minimize the weights, actually the norm of the output weights, plus the training error, subject to these [inaudible] conditions, right? But looking at the final solution, the decision function of the SVM is this one. And what is this? K here is the kernel of the SVM, and [inaudible] are the parameters we want to find. Looking at this formula, it actually is exactly a single-hidden-layer feedforward network. What is the single hidden layer? The single hidden layer is formed by the hidden nodes with these kernels: K(x, x_1), K(x, x_2), ..., K(x, x_i), ..., K(x, x_N). So this is the hidden layer, the kernel hidden layer with this kernel. Then what is the output weight? The output weight is [alpha_1 t_1, ..., alpha_i t_i, ..., alpha_N t_N]. That is the output weight; that is the beta [inaudible] in the feedforward network. Okay. Yeah, please.

: This is [inaudible].

Guang-Bin Huang: Okay. Yeah. This is not the objective of the SVM, but finally it turns out to be in this formula. So I am speaking from the architecture point of view: finally they have a similar architecture.

: [inaudible] was called new ways to train neural networks. So there was a connection from the beginning [inaudible].

Guang-Bin Huang: Yeah, yeah. So actually, you are talking about the [inaudible] paper published in 1995. So Vapnik: what was the so-called inspiration? Vapnik said we actually have to go back to the original neural network. [inaudible] actually, in 1962, [inaudible] studied multilayer feedforward networks. In those days, people had no idea how to tune the hidden layers. So then Vapnik said: how about this; consider the last hidden layer. The last hidden layer's output is a mapping function; look at the output of that mapping function. Can we find some way to just determine the output function of the last hidden layer? That is the feature mapping function. Then you can [inaudible]. So Vapnik said, okay, the output function of the last hidden layer can be considered as phi(x_i). But what is phi(x_i)? We have no idea. So then Vapnik's paper says: how about letting phi(x) [inaudible]. So we have this kind of constraint, because this is very important for classification. Under this constraint they finally get this. So although the hidden-layer output mapping function, that is the SVM feature mapping function phi(x), is [inaudible], we can find the corresponding kernel. So that's why we come to this stage. But I should, I'm [inaudible], I'm talking about it from a structural point of view: if we go to this stage, it finally turns out to be in the same format.

Of course, from here the question is how to find the alpha_i t_i. How to find the alpha_i t_i is how to find this output [weight]. If you consider the single-hidden-layer feedforward network, it also tries to find these parameters. So the SVM and the traditional BP network, in this sense, are just different ways to find the parameters of the single-hidden-layer feedforward network. SVM finds them in the SVM way, BP finds them in the BP way, RBF finds them in the RBF way. So this is why different people find the parameters in different ways. Then the question is: can we unify them? Okay, so this is one of my so-called research works. Actually, in order to show the linkage between the SVM and the single-hidden-layer feedforward network, I think people had better read these two papers, because these two papers also gave me some ideas, inspired my ideas, and they built the linkage between the ELM and SVM first.
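For reference, the SVM solution being read here as a single-hidden-layer network is the standard soft-margin formulation; the formulas below are supplied from the usual SVM literature, not transcribed from the slides. The primal problem minimizes

    \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{N} \xi_i    subject to  t_i ( w \cdot \phi(x_i) + b ) \ge 1 - \xi_i,  \xi_i \ge 0,

and the resulting decision function is

    f(x) = \mathrm{sign}\Big( \sum_{i=1}^{N} \alpha_i t_i K(x, x_i) + b \Big),

that is, a network whose hidden layer is [K(x, x_1), ..., K(x, x_N)] and whose output weight vector is [alpha_1 t_1, ..., alpha_N t_N].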
Guang-Bin Huang: So then I found this linkage also. Okay. So now let's go on to talk about the extreme learning machine. Of course, the extreme learning machine originally started from neural networks; we come from neural networks. In those days we started from the BP network, and we found BP is inefficient. But what was the original expectation for neural networks? We try to emulate human thinking. A human can think and can find a solution very quickly. So the question is: do we have tuning in the human brain? Usually, to me, the neurons there, in most cases, just find a solution directly, or can learn very fast, right? Either very fast or without tuning. But go back to the original learning methods of neural networks: whatever machine you see, tuning is always there, right? There are always parameters to find. Can we have some way to simplify the implementation and the computation involved in these methods?

So then we first found, from all this [inaudible], that it can all be simplified in these single-hidden-layer feedforward networks. The hidden layer has parameters a_i, b_i. But the hidden node here may not be a neuron: it can be a kernel; it can even be another subnetwork. But each hidden node has parameters a_i, b_i; those are the parameters. Rather than writing out the output function of each hidden node in detail, we just write them in this compact format. So the output function of this network is f(x) = sum of beta_i G(a_i, b_i, x), where G(a_i, b_i, x) is actually the output function of the i-th hidden node. It can be an RBF kernel, it can be a so-called sigmoid-type node or something else; it can even be a non-differentiable [inaudible] node.

So then, consider the entire hidden layer: what is the output of the entire hidden layer? Suppose the entire hidden layer has L hidden nodes. Then the output of the hidden layer is h(x), the vector of these L elements, the outputs of these L hidden nodes. Right? It's a vector. h(x) here is the feature mapping, the hidden layer's so-called output mapping.

Okay. So in this case the question is: in the traditional methods, a_1, b_1, ..., a_L, b_L have to be tuned. This is why we have to have the gradient-based methods or the SVM methods. Because, you see, if the parameters of the hidden layer have been tuned, the result you obtained in the earlier stage in the output layer, the beta_i here, may not be optimal; you have to adjust the output layer again. If you adjust the output layer, beta_1 to beta_L, then the parameters in the hidden layer may not be optimal anymore, so you have to adjust them again. So this is why you always have to adjust them iteratively.

So then, luckily, in our theory we find that tuning in the hidden layer is actually not required. All the parameters in the hidden layer can be randomly generated.
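To make the no-tuning idea concrete, here is a minimal sketch of an ELM in Python/NumPy, assuming sigmoid additive hidden nodes; the function names and the toy data are illustrative rather than taken from the talk, and the output weights are solved with the Moore-Penrose pseudoinverse as in the standard ELM recipe.

```python
import numpy as np

def elm_train(X, T, n_hidden=50, seed=0):
    """Randomly generate hidden parameters (a_i, b_i); solve output weights beta."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input weights a_i (random, never tuned)
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                 # biases b_i (random, never tuned)
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))                    # hidden-layer output matrix H (N x n_hidden)
    beta = np.linalg.pinv(H) @ T                              # output weights: beta = pinv(H) @ T
    return A, b, beta

def elm_predict(X, A, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))                    # hidden-layer feature mapping h(x) for each row
    return H @ beta

# Toy usage: learn y = sin(x) from noisy samples.
rng = np.random.default_rng(1)
X = rng.uniform(-np.pi, np.pi, size=(200, 1))
T = np.sin(X) + 0.05 * rng.standard_normal(X.shape)
A, b, beta = elm_train(X, T, n_hidden=50)
print("training MSE:", float(np.mean((elm_predict(X, A, b, beta) - T) ** 2)))
```

The hidden parameters A and b are drawn once at random and never adjusted; only the linear output weights beta are computed, in a single least-squares step, which is what distinguishes this from the iterative tuning described above.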