Quaternion Knowledge Graph Embeddings

Shuai Zhang, Yi Tay, Lina Yao, Qi Liu
University of New South Wales; Nanyang Technological University; University of Oxford

Abstract

In this work, we move beyond the traditional complex-valued representations, introducing more expressive hypercomplex representations to model entities and relations for knowledge graph embeddings. More specifically, quaternion embeddings, hypercomplex-valued embeddings with three imaginary components, are utilized to represent entities. Relations are modelled as rotations in the quaternion space. The advantages of the proposed approach are: (1) latent inter-dependencies (between all components) are aptly captured with the Hamilton product, encouraging a more compact interaction between entities and relations; (2) quaternions enable expressive rotation in four-dimensional space and have more degrees of freedom than rotation in the complex plane; (3) the proposed framework is a generalization of ComplEx on hypercomplex space while offering better geometrical interpretations, concurrently satisfying the key desiderata of relational representation learning (i.e., modeling symmetry, anti-symmetry, and inversion). Experimental results demonstrate that our method achieves state-of-the-art performance on four well-established knowledge graph completion benchmarks.

1 Introduction

Knowledge graphs (KGs) live at the heart of many semantic applications (e.g., question answering, search, and natural language processing). KGs enable not only powerful relational reasoning but also the ability to learn structural representations. Reasoning with KGs has been an extremely productive research direction, with many innovations leading to improvements in many downstream applications. However, real-world KGs are usually incomplete.
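The link-prediction setting sketched above can be made concrete with a toy example. The graph, entity names, and helper function below are hypothetical and only illustrate the data structure involved, not the paper's method:

```python
# Toy illustration (hypothetical data): a knowledge graph is a set of
# (head, relation, tail) triples, and KG completion asks a model to
# rank candidate tails for queries of the form (head, relation, ?).
triples = {
    ("Tokyo", "capital_of", "Japan"),
    ("Japan", "located_in", "Asia"),
    ("Kyoto", "located_in", "Japan"),
}

def tail_queries(kg):
    """Turn each observed triple into a (head, relation) query and its answer."""
    return {(h, r): t for (h, r, t) in kg}

queries = tail_queries(triples)
assert queries[("Tokyo", "capital_of")] == "Japan"
```

An embedding model scores every candidate tail for such a query; missing links are predicted by taking the highest-scoring candidates.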

As such, completing KGs and predicting missing links between entities have gained growing interest. Learning low-dimensional representations of entities and relations for KGs is an effective solution for this task. Learning KG embeddings in the complex space C has been proven to be a highly effective inductive bias, largely owing to its intrinsic asymmetrical properties. This is demonstrated by the ComplEx embedding method, which infers new relational triplets with the asymmetrical Hermitian product.

In this paper, we move beyond complex representations, exploring hypercomplex space for learning KG embeddings. More concretely, quaternion embeddings are utilized to represent entities and relations. Each quaternion embedding is a vector in the hypercomplex space H with three imaginary components i, j, k, as opposed to the standard complex space C with a single real component r and imaginary component i. We propose a new scoring function in which the head entity Q_h is rotated by the relational quaternion embedding through the Hamilton product; this is followed by a quaternion inner product with the tail entity Q_t.

[Equal contribution. 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.]

There are numerous benefits to this formulation. (1) The Hamilton operator provides a greater extent of expressiveness compared to the complex Hermitian operator and the inner product in Euclidean space. The Hamilton operator forges inter-latent interactions between all of r, i, j, k, resulting in a highly expressive model. (2) Quaternion representations are highly desirable for parameterizing smooth rotations and spatial transformations in vector space. They are generally considered robust to shear/scaling noise and perturbations (i.e., numerically stable rotations) and avoid
the problem of gimbal lock. Moreover, quaternion rotations have two planes of rotation, while complex rotations only work on a single plane, giving the model more degrees of freedom. (3) Our QuatE framework subsumes the ComplEx method, concurrently inheriting its attractive properties such as the ability to model symmetry, anti-symmetry, and inversion. (4) Our model can maintain an equal or even smaller parameterization while outperforming previous work. Experimental results demonstrate that our method achieves state-of-the-art performance on four well-established knowledge graph completion benchmarks (WN18, FB15K, WN18RR, and FB15K-237).

2 Related Work

Knowledge graph embeddings have attracted intense research focus in recent years, and a myriad of embedding methodologies have been proposed. We roughly divide previous work into translational models and semantic matching models based on the scoring function, i.e., the composition over head and tail entities and relations.

[...]

[Figure 1: (a) Complex plane; (b) Quaternion units product; (c) stereographically projected hypersphere in 3D space. The purple dot indicates the position of the unit quaternion.]

To score a triplet, we (1) rotate the head entity by the relation quaternion and (2) take the quaternion inner product between the rotated head quaternion and the tail quaternion. If a triplet exists in the KG, the model will rotate the head entity with the relation so that the angle between the head and tail entities becomes smaller and the product is maximized. Otherwise, we can make the head and tail entities orthogonal so that their product becomes zero.

Quaternion Embeddings of Knowledge Graphs. More specifically, we use a quaternion matrix Q ∈ H^{N×k} to denote the entity embeddings and W ∈ H^{M×k} to denote the relation embeddings, where k is
the dimension of embeddings. Given a triplet (h, r, t), the head entity h and the tail entity t correspond to Q_h = a_h + b_h i + c_h j + d_h k (a_h, b_h, c_h, d_h ∈ R^k) and Q_t = a_t + b_t i + c_t j + d_t k (a_t, b_t, c_t, d_t ∈ R^k), respectively, while the relation r is represented by W_r = a_r + b_r i + c_r j + d_r k (a_r, b_r, c_r, d_r ∈ R^k).

Hamilton-Product-Based Relational Rotation. We first normalize the relation quaternion W_r to a unit quaternion W_r^◁ = p + q i + u j + v k, eliminating the scaling effect, by dividing W_r by its norm:

    W_r^◁(p, q, u, v) = W_r / |W_r| = (a_r + b_r i + c_r j + d_r k) / √(a_r² + b_r² + c_r² + d_r²)    (4)

We visualize a unit quaternion in Figure 1(c) by projecting it into 3D space. We keep the unit hypersphere which passes through i, j, k in place. The unit quaternion can be projected in, on, or outside the unit hypersphere depending on the value of its real part.

Secondly, we rotate the head entity Q_h by taking the Hamilton product between it and W_r^◁:

    Q'_h(a'_h, b'_h, c'_h, d'_h) = Q_h ⊗ W_r^◁
        = (a_h ∘ p − b_h ∘ q − c_h ∘ u − d_h ∘ v)
        + (a_h ∘ q + b_h ∘ p + c_h ∘ v − d_h ∘ u) i
        + (a_h ∘ u − b_h ∘ v + c_h ∘ p + d_h ∘ q) j
        + (a_h ∘ v + b_h ∘ u − c_h ∘ q + d_h ∘ p) k    (5)

where ∘ denotes the element-wise multiplication between two vectors. Right-multiplication by a unit quaternion is a right-isoclinic rotation on the quaternion Q_h. We can also swap Q_h and W_r^◁ and perform a left-isoclinic rotation, which does not fundamentally change the geometrical meaning. Isoclinic rotation is a special case of double plane rotation in which the angles for the two planes are equal.

Scoring Function and Loss. We apply the quaternion inner product as the scoring function:

    φ(h, r, t) = Q'_h · Q_t = ⟨a'_h, a_t⟩ + ⟨b'_h, b_t⟩ + ⟨c'_h, c_t⟩ + ⟨d'_h, d_t⟩    (6)

Following Trouillon et al. [2016], we formulate the task as a classification problem, and the model parameters are learned by minimizing the following regularized logistic loss:

    L(Q, W) = Σ_{r(h,t) ∈ Ω ∪ Ω′} log(1 + exp(−Y_hrt φ(h, r, t))) + λ_1 ‖Q‖²₂ + λ_2 ‖W‖²₂    (7)

Here we use the ℓ2 norm with regularization rates λ_1 and λ_2 to regularize Q and W, respectively. Ω′ is sampled from the set of unobserved triplets using negative sampling strategies such as uniform sampling, Bernoulli sampling [Wang et al., 2014], and adversarial sampling [Sun et al., 2019]. Note that the loss function is in Euclidean space, as
we take the summation of all components when computing the scoring function in Equation (6). We utilise Adagrad [Duchi et al., 2011] for optimization.

Table 1: Scoring functions of state-of-the-art knowledge graph embedding models, along with their parameters and time complexity. "⋆" denotes the circular correlation operation; "∘" denotes the Hadamard (or element-wise) product; "⊗" denotes the Hamilton product.

Model     Scoring function                           Parameters                         Time
TransE    ‖Q_h + W_r − Q_t‖                          Q_h, W_r, Q_t ∈ R^k                O(k)
HolE      ⟨W_r, Q_h ⋆ Q_t⟩                           Q_h, W_r, Q_t ∈ R^k                O(k log k)
DistMult  ⟨W_r, Q_h, Q_t⟩                            Q_h, W_r, Q_t ∈ R^k                O(k)
ComplEx   Re(⟨W_r, Q_h, conj(Q_t)⟩)                  Q_h, W_r, Q_t ∈ C^k                O(k)
RotatE    ‖Q_h ∘ W_r − Q_t‖                          Q_h, W_r, Q_t ∈ C^k, |W_ri| = 1    O(k)
TorusE    min_{(x,y)∈([Q_h]+[W_r])×[Q_t]} ‖x − y‖    [Q_h], [W_r], [Q_t] ∈ T^k          O(k)
QuatE     Q_h ⊗ W_r^◁ · Q_t                          Q_h, W_r, Q_t ∈ H^k                O(k)

Initialization. For parameter initialization, we can adopt the initialization algorithm of Parcollet et al. [2018b], tailored for quaternion-valued networks, to speed up model efficiency and convergence [Glorot and Bengio, 2010]. The initialization of entities and relations follows the rule:

    w_real = φ cos(θ),  w_i = φ Q^◁_img,i sin(θ),  w_j = φ Q^◁_img,j sin(θ),  w_k = φ Q^◁_img,k sin(θ)    (8)

where w_real, w_i, w_j, w_k denote the scalar and imaginary coefficients, respectively. θ is randomly generated from the interval [−π, π], and Q^◁_img is a normalized quaternion whose scalar part is zero. φ is randomly generated from the interval [−1/√(2k), 1/√(2k)], reminiscent of the He initialization [He et al., 2015]. This initialization method is optional.

4.2 Discussion

Table 1 summarizes several popular knowledge graph embedding models, including scoring functions,
parameters, and time complexities. TransE, HolE, and DistMult use Euclidean embeddings, while ComplEx and RotatE operate in the complex space. In contrast, our model operates in the quaternion space.

Capability in Modeling Symmetry, Antisymmetry and Inversion. The flexibility and representational power of quaternions enable us to model the major relation patterns with ease. Similar to ComplEx, our model can model both symmetric (r(x,y) ⇒ r(y,x)) and antisymmetric (r(x,y) ⇒ ¬r(y,x)) relations. The symmetry property of QuatE can be proved by setting the imaginary parts of W_r to zero. One can easily check that
the scoring function is antisymmetric when the imaginary parts are nonzero. As for the inversion pattern (r_1(x,y) ⇒ r_2(y,x)), we can utilize the conjugation of quaternions. Conjugation is an involution and is its own inverse. One can easily check that:

    Q_h ⊗ W_r^◁ · Q_t = Q_t ⊗ conj(W_r^◁) · Q_h    (9)

The detailed proofs of antisymmetry and inversion can be found in the appendix.

Composition patterns are commonplace in knowledge graphs [Lao et al., 2011; Neelakantan et al., 2015]. Both TransE and RotatE have fixed composition methods [Sun et al., 2019]: TransE composes two relations using addition (r_1 + r_2), and RotatE uses the Hadamard product (r_1 ∘ r_2). We argue that it is unreasonable to fix the composition patterns, as there may exist multiple composition patterns even within a single knowledge graph. For example, suppose there are three persons x, y, z. If y is the elder sister (denoted as r_1) of x, and z is the elder brother (denoted as r_2) of y, we can easily infer that z is the elder brother of x. The relation between z and x is r_2, not r_1 + r_2 or r_1 ∘ r_2, violating the two composition methods of TransE and RotatE. In QuatE, the composition patterns are not fixed: the relation between z and x is determined not only by the relations r_1 and r_2 but is also simultaneously influenced by the entity embeddings.

Connection to DistMult and ComplEx. Quaternions have more degrees of freedom compared to complex numbers. Here we show that the QuatE framework can be seen as a generalization of ComplEx. If we set the coefficients of the imaginary units j and k to zero, we get complex embeddings as in ComplEx, and the Hamilton product degrades to complex multiplication. If we further remove the normalization of the relational quaternion, we obtain the following equation:

    φ(h, r, t) = Q_h ⊗ W_r · Q_t
               = (a_h + b_h i) ⊗ (a_r + b_r i) · (a_t + b_t i)
               = ((a_h ∘ a_r − b_h ∘ b_r) + (a_h ∘ b_r + b_h ∘ a_r) i) · (a_t + b_t i)
               = ⟨a_r, a_h, a_t⟩ + ⟨a_r, b_h, b_t⟩ + ⟨b_r, a_h, b_t⟩ − ⟨b_r, b_h, a_t⟩    (10)

where ⟨a, b, c⟩ = Σ_k a_k b_k c_k denotes the standard component-wise multi-linear dot product. Equation (10) recovers the form of ComplEx. This framework brings another mathematical interpretation of ComplEx beyond simply taking the real part of the Hermitian product. Another interesting finding is that the Hermitian product is not necessary to formulate the scoring function of ComplEx. If we remove the imaginary parts of all quaternions and remove the normalization step, the scoring function becomes φ(h, r, t) = ⟨a_h, a_r, a_t⟩, degrading to DistMult in this case.

5 Experiments and Results

Table 2: Statistics of the data sets used in this paper.

Dataset     N       M     #training  #validation  #test   avg. #degree
WN18        40943   18    141442     5000         5000    3.45
WN18RR      40943   11    86835      3034         3134    2.19
FB15K       14951   1345  483142     50000        59071   32.31
FB15K-237   14541   237   272115     17535        20466   18.71
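As a quick reference for the evaluation protocol defined in Section 5.1, the three ranking metrics can be computed as follows. The ranks here are hypothetical, purely to illustrate the arithmetic:

```python
# Illustrative computation of the ranking metrics MR, MRR, and Hit@n,
# on hypothetical ranks of the correct entity for five test queries.
ranks = [1, 3, 2, 10, 1]

mr = sum(ranks) / len(ranks)                    # Mean Rank: lower is better
mrr = sum(1.0 / r for r in ranks) / len(ranks)  # Mean Reciprocal Rank: higher is better
hit_at = lambda n: sum(r <= n for r in ranks) / len(ranks)  # Hit@n

assert mr == 3.4
assert hit_at(3) == 0.8
```

In filtered evaluation, candidate entities that form other true triples are removed before ranking, so these ranks are computed against corrupted triples only.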

5.1 Experimental Setup

Datasets Description: We conducted experiments on four widely used benchmarks, WN18, FB15K, WN18RR, and FB15K-237, whose statistics are summarized in Table 2. WN18 [Bordes et al., 2013] is extracted from WordNet, a lexical database for the English language in which words are interlinked by means of conceptual-semantic and lexical relations. WN18RR [Dettmers et al., 2018] is a subset of WN18 with inverse relations removed. FB15K [Bordes et al., 2013] contains relation triples from Freebase, a large tuple database with structured general human knowledge. FB15K-237 [Toutanova and Chen, 2015] is a subset of FB15K with inverse relations removed.

Evaluation Protocol: Three popular evaluation metrics are used: Mean Rank (MR), Mean Reciprocal Rank (MRR), and Hit ratio with cut-off values n = 1, 3, 10. MR measures the average rank of all correct entities, with lower values representing better performance. MRR is the average inverse rank of the correct entities. Hit@n measures the proportion of correct entities ranked in the top n. Following Bordes et al. [2013], filtered results are reported to avoid possibly flawed evaluation.

Baselines: We compared QuatE with a number of
strong baselines. For translational distance models, we report TransE [Bordes et al., 2013] and two recent extensions, TorusE [Ebisu and Ichise, 2018] and RotatE [Sun et al., 2019]. For semantic matching models, we report DistMult [Yang et al., 2014], HolE [Nickel et al., 2016], ComplEx [Trouillon et al., 2016], SimplE [Kazemi and Poole, 2018], ConvE [Dettmers et al., 2018], R-GCN [Schlichtkrull et al., 2018], and NKGE (ConvE based) [Wang et al., 2018].

Implementation Details: We implemented our model using PyTorch and tested it on a single GPU. The hyper-parameters are determined by grid search, and the best models are selected by early stopping on the validation set. In general, the embedding size k is tuned amongst {50, 100, 200, 250, 300}. The regularization rates λ_1 and λ_2 are searched in {0, 0.01, 0.05, 0.1, 0.2}. The learning rate is fixed to 0.1 without further tuning. The number of negatives (#neg) per training sample is selected from {1, 5, 10, 20}. We create 10 batches for all the datasets. For most baselines, we report the results given in the original papers; exceptions are provided with references. For RotatE (without self-adversarial negative sampling), we use the best hyper-parameter settings provided in the paper to reproduce the results. We also report the results of RotatE with self-adversarial negative sampling and denote it a-RotatE. Note that we report three versions of QuatE: QuatE with and without type constraints, and QuatE with N3 regularization and reciprocal learning. Self-adversarial negative sampling [Sun et al., 2019] is not used for QuatE. All hyper-parameters of QuatE are provided in the appendix.

Table 3: Link prediction results on WN18 and FB15K. [†]: Results are taken from Nickel et al. [2016]; [?]: Results are taken from Kadlec et al. [2017]; [‡]: Results are taken from Sun et al. [2019]. a-RotatE denotes RotatE with self-adversarial negative sampling. QuatE1: without type constraints; QuatE2: with N3 regularization and reciprocal learning; QuatE3: with type constraints.

            WN18                                    FB15K
Model       MR    MRR    Hit@10  Hit@3  Hit@1   MR    MRR    Hit@10  Hit@3  Hit@1
TransE      -     0.495  0.943   0.888  0.113   -     0.463  0.749   0.578  0.297
DistMult[?] 655   0.797  0.946   -      -       42.2  0.798  0.893   -      -
HolE        -     0.938  0.949   0.945  0.930   -     0.524  0.739   0.613  0.402
ComplEx     -     0.941  0.947   0.945  0.936   -     0.692  0.840   0.759  0.599
ConvE       374   0.943  0.956   0.946  0.935   51    0.657  0.831   0.723  0.558
R-GCN+      -     0.819  0.964   0.929  0.697   -     0.696  0.842   0.760  0.601
SimplE      -     0.942  0.947   0.944  0.939   -     0.727  0.838   0.773  0.660
NKGE        336   0.947  0.957   0.949  0.942   56    0.730  0.871   0.790  0.650
TorusE      -     0.947  0.954   0.950  0.943   -     0.733  0.832   0.771  0.674
RotatE      184   0.947  0.961   0.953  0.938   32    0.699  0.872   0.788  0.585
a-RotatE    309   0.949  0.959   0.952  0.944   40    0.797  0.884   0.830  0.746
QuatE1      388   0.949  0.960   0.954  0.941   41    0.770  0.878   0.821  0.700
QuatE2      -     0.950  0.962   0.954  0.944   -     0.833  0.900   0.859  0.800
QuatE3      162   0.950  0.959   0.954  0.945   17    0.782  0.900   0.835  0.711

Table 4: Link prediction results on WN18RR and FB15K-237. [†]: Results are taken from Nguyen et al. [2017]; [?]: Results are taken from Dettmers et al. [2018]; [‡]: Results
are taken from Sun et al. [2019].

            WN18RR                                  FB15K-237
Model       MR    MRR    Hit@10  Hit@3  Hit@1   MR    MRR    Hit@10  Hit@3  Hit@1
TransE[†]   3384  0.226  0.501   -      -       357   0.294  0.465   -      -
DistMult[?] 5110  0.43   0.49    0.44   0.39    254   0.241  0.419   0.263  0.155
ComplEx[?]  5261  0.44   0.51    0.46   0.41    339   0.247  0.428   0.275  0.158
ConvE[?]    4187  0.43   0.52    0.44   0.40    244   0.325  0.501   0.356  0.237
R-GCN+      -     -      -       -      -       -     0.249  0.417   0.264  0.151
NKGE        4170  0.45   0.526   0.465  0.421   237   0.33   0.510   0.365  0.241
RotatE      3277  0.470  0.565   0.488  0.422   185   0.297  0.480   0.328  0.205
a-RotatE    3340  0.476  0.571   0.492  0.428   177   0.338  0.533   0.375  0.241
QuatE1      3472  0.481  0.564   0.500  0.436   176   0.311  0.495   0.342  0.221
QuatE2      -     0.482  0.572   0.499  0.436   -     0.366  0.556   0.401  0.271
QuatE3      2314  0.488  0.582   0.508  0.438   87    0.348  0.550   0.382  0.248

5.2 Results

Table 5: MRR for the models tested on each relation of WN18RR.

Relation Name                RotatE  QuatE3
hypernym                     0.148   0.173
derivationally_related_form  0.947   0.953
instance_hypernym            0.318   0.364
also_see                     0.585   0.629
member_meronym               0.232   0.232
synset_domain_topic_of       0.341   0.468
has_part                     0.184   0.233
member_of_domain_usage       0.318   0.441
member_of_domain_region      0.200   0.193
verb_group                   0.943   0.924
similar_to                   1.000   1.000

The empirical results on the four datasets are reported in Table 3 and Table 4. QuatE performs extremely competitively compared to the existing state-of-the-art models across all metrics.
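The scoring machinery of Equations (4) to (6), and the degradation to a ComplEx-style score discussed in Section 4.2, can be sketched in NumPy. This is an illustrative re-implementation derived from the formulas above, not the authors' released code; the array shapes and function names are our own choices:

```python
import numpy as np

def normalize(w):
    """Eq. (4): scale a relation quaternion to unit norm.
    w has shape (4, k): rows hold the real, i, j, k coefficient vectors."""
    return w / np.linalg.norm(w, axis=0, keepdims=True)

def hamilton(q, w):
    """Eq. (5): element-wise Hamilton product q ⊗ w over the k dimensions."""
    a, b, c, d = q  # real, i, j, k parts of the head quaternion
    p, x, y, z = w  # real, i, j, k parts of the relation quaternion
    return np.stack([
        a * p - b * x - c * y - d * z,   # real part
        a * x + b * p + c * z - d * y,   # i part
        a * y - b * z + c * p + d * x,   # j part
        a * z + b * y - c * x + d * p,   # k part
    ])

def score(qh, wr, qt):
    """Eq. (6): quaternion inner product of the rotated head with the tail."""
    return float(np.sum(hamilton(qh, normalize(wr)) * qt))

# Sanity check: the Hamilton product reproduces i ⊗ j = k (per dimension).
i_unit = np.array([[0.0], [1.0], [0.0], [0.0]])
j_unit = np.array([[0.0], [0.0], [1.0], [0.0]])
k_unit = np.array([[0.0], [0.0], [0.0], [1.0]])
assert np.allclose(hamilton(i_unit, j_unit), k_unit)

# Degradation check (Eq. (10)): with the j and k coefficients set to zero and
# no normalization, the score equals the ComplEx-style multilinear form.
rng = np.random.default_rng(0)
k = 8
qh, wr, qt = np.zeros((3, 4, k))
qh[:2], wr[:2], qt[:2] = rng.normal(size=(3, 2, k))
lhs = float(np.sum(hamilton(qh, wr) * qt))
ah, bh = qh[:2]; ar, br = wr[:2]; at, bt = qt[:2]
rhs = float(np.sum(ar*ah*at + ar*bh*bt + br*ah*bt - br*bh*at))
assert abs(lhs - rhs) < 1e-9
```

Note that `score` implements only Equation (6); in training, the logistic loss of Equation (7) and a negative sampling strategy would sit on top of it.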
