
Reconstruction of Multi-view Video Based on GAN

Song Li, Chengdong Lan (✉), and Tiesong Zhao

School of Physics and Information Engineering, Fuzhou University, Fuzhou … lancd…

Abstract. There is a huge amount of data in multi-view video, which brings enormous challenges to data compression, storage, and transmission. Transmitting only part of the viewpoint information and reconstructing the original multi-view information from it is an established solution, but existing methods of this kind all rely on pixel matching to obtain the correlation between adjacent viewpoint images. However, pixels cannot express the invariance of image features and are susceptible to noise. To overcome these problems, the VGG network is used to extract high-dimensional features between images that indicate the correlation of adjacent images, and a GAN is further used to generate virtual viewpoint images more accurately. We extract the rows of pixels at the same positions in all viewpoints as local areas, merge them into local images, and input these local images into the network. For the viewpoints to be reconstructed, the GAN generates local images of dense viewpoints. Experiments on multiple test sequences show that the proposed method achieves a 0.2–0.8 dB PSNR improvement and a 0.15–0.61 MOS improvement over the traditional method.

Keywords: Hybrid resolution · SRGAN · Virtual view reconstruction · EPI · Multi-view

1 Introduction

In recent years, with the rapid development of computing and multimedia technology, immersive technology has also made great progress in order to satisfy users' increasing demand for high-quality visual experience [1]. In 2009, Avatar, the largest and most technologically advanced 3D movie of its time, became widely popular [2]. Later, with the success and popularity of 3D digital movies, 3D TVs began to reach the public.

Virtual reality (VR) has also been applied in education [3], entertainment, health [4], and other fields because of its promising prospects. Currently, Sony and HTC have launched new products for consumers. To meet people's thirst for high-quality visual experience, an effective reconstruction technology is required for multi-view video.

© Springer Nature Switzerland AG 2018
R. Hong et al. (Eds.): PCM 2018, LNCS 11165, pp. 618–629, 2018.

1.1 Related Work

Considering data transmission, the currently promising reconstruction technologies for multi-view video are mainly hybrid resolution and Depth Image Based Rendering (DIBR).

Hybrid Resolution. Due to the limits of data transmission and storage capacity, transmitting high-quality super-resolution viewpoints to users is a major challenge in this field [5]. A super-resolution technique under a multi-view hybrid-resolution framework [6] has therefore been proposed, which mainly uses the high-frequency part of the high-resolution image to improve the quality of adjacent low-resolution viewpoints. In [7], a hybrid-resolution scheme that intercepts low-resolution and full-resolution viewpoints is proposed. In [8], an algorithm based on displacement-compensated high-frequency synthesis was proposed to correct the projection error. In [9], adjacent images are interleaved and complementarily downsampled at the encoder, and the missing pixels are interpolated and restored from the virtual viewpoint at the decoder.

DIBR. DIBR mainly consists of 3D warping, pixel interpolation, and hole filling. On the one hand, the depth map used in DIBR is inaccurate; on the other hand, some hole regions remain in the virtual viewpoint images after pixel interpolation, and these holes need to be filled. To handle inaccurate depth maps, the entire depth map is smoothed with a Gaussian filter and an asymmetric Gaussian low-pass filter [10]. Because this technique causes geometric distortion in regions without holes, adaptive edge-smoothing filters [11] were subsequently proposed. For hole filling in the spatial domain, [12] uses neighborhood interpolation, [13] proposes image fusion, and [14] fills pixels according to the priority of the hole boundary. In the time domain, background updating techniques are used in [15]. We use [16] as a reference algorithm. This technique requires the texture images of adjacent viewpoints and the corresponding depth maps. A large number of Gaussian mixture models are then used to separate foreground pixels from background pixels, and the pixel intensities are modified accordingly; adaptive weighted-mean generation of pixels is used to restore the missing pixels of the background image, correcting errors in warping.

However, these methods use pixel values as the similarity criterion to match the content of adjacent viewpoints. The pixel changes caused by the different positions of adjacent viewpoints make the depth calculation inaccurate, so pixel-level calculation directly degrades the final reconstruction result.

1.2 Contribution

The main contributions are as follows. First, we improve the network input by taking multi-view hybrid-resolution local images as input. Second, we propose a multi-view reconstruction framework based on SRGAN, in which VGG extracts the high-dimensional features of the content and location of local images, and these features are used to generate local images with dense viewpoints.

We present the proposed network architecture in Sect. 2. The experimental process and results are described in Sects. 3 and 4, respectively. Finally, Sect. 5 summarizes this work and discusses future work.

2 Proposed Multi-view Reconstruction

This section describes the proposed multi-view reconstruction method based on the SRGAN network [17].

We take hybrid-resolution local images, which contain information from different viewpoints, as input. To overcome the inaccuracy of pixel-matching similarity, the VGG19 network [18] is used to extract high-dimensional features that represent the correlation. During network training, the mapping between low-resolution and high-resolution local images is learned.

2.1 Adversarial Network Architecture

As shown in Fig. 1, the adversarial network contains two networks, a generator G and a discriminator D. G generates a high-resolution Epipolar Plane Image (HREPI) from a low-resolution Epipolar Plane Image (LREPI), and D determines whether the two distributions are similar. Through its loss, D helps G achieve better performance. Specifically, the objective is

$$
\min_{\theta_G}\max_{\theta_D}\;
\mathbb{E}_{I^{HREPI}\sim p_{\mathrm{train}}(I^{HREPI})}\left[\log D_{\theta_D}\left(I^{HREPI}\right)\right]
+ \mathbb{E}_{I^{LREPI}\sim p_{G}(I^{LREPI})}\left[\log\left(1 - D_{\theta_D}\left(G_{\theta_G}\left(I^{LREPI}\right)\right)\right)\right]
\tag{1}
$$

Fig. 1. GAN in multi-view reconstruction. The upper part of the figure is the generator network and the lower part is the discriminator network.

The generator network contains B residual blocks with the same structure. Each residual block has two convolutional layers; each convolutional layer is followed by batch normalization, and ReLU is used as the activation function. All convolutional layers use 3 × 3 kernels and 64 feature maps. In this network, the extracted viewpoint information is synthesized by a trained sub-pixel convolutional layer, which also increases the resolution. We use a single sub-pixel convolution layer, as shown in the generator. The loss function is the same as in SRGAN.

2.2 Relevance Representation

Inaccurate Representations of Correlation Redundancy Between Adjacent Views. During transmission, the conventional method sends not only the original image and the low-resolution image but also part of the depth map data. The depth map information represents the redundant correlation between viewpoints, and the similarity of information is measured by the error between pixel values. Therefore, only low-dimensional features are used to represent the correlation between viewpoints, resulting in inaccuracy and large redundancy.

The SRGAN Network Uses High-Dimensional Features to Represent Correlations. In the SRGAN network, VGG19 extracts high-dimensional features of the generated image and the high-resolution image, as shown in Fig. 2. If $\mathrm{filter}_i$ is used to extract a certain feature i, and $P_i$ is a receptive field containing the high-dimensional feature of the filter, then

$$
G_{in} = P_{in} * \mathrm{filter}_i
\tag{2}
$$

Fig. 2. The 2nd viewpoint of Lovebird1 is convolved through 4 layers. The information of the "V" collar, $P_{i1}$, is extracted and assumed to be quantized into a 3 × 3 matrix. With a 3 × 3 $\mathrm{filter}_i$ that is specifically used to extract "V" collars, convolving $P_{i1}$ with $\mathrm{filter}_i$ yields a large value $G_{i1}$. Similarly, $G_{i2}$ can be obtained by quantizing and convolving through four layers.

Here $G_{in}$ denotes the convolution result of feature i in the n-th viewpoint. A large value means that a curve in the input content is likely to activate the filter. If there exists e > 0 such that

$$
\left| G_{ij} - G_{ik} \right| < e,
\tag{3}
$$

where e is a small positive number relative to the value of G, then feature i of the j-th viewpoint and feature i of the k-th viewpoint are highly similar. Therefore, a single high-dimensional feature can express the same content information across multiple viewpoints, and the content information of occluded regions can be generated from the high-dimensional features extracted at the corresponding locations in other viewpoints.
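Equations (2) and (3) can be illustrated with a small sketch. It assumes that one filter response is the sum of element-wise products between a 3 × 3 receptive-field patch and a 3 × 3 filter (a single output of a CNN layer, which in practice is a cross-correlation), and it uses a hypothetical "V"-shaped filter and made-up patch values rather than real VGG19 weights or Lovebird1 data. The names `filter_response` and `features_similar` are illustrative only.

```python
from typing import List

Matrix = List[List[float]]


def filter_response(patch: Matrix, kernel: Matrix) -> float:
    """G_in of Eq. (2) at one position: sum of element-wise products of the
    receptive-field patch P_in and filter_i (CNN-style, no kernel flip)."""
    if len(patch) != len(kernel) or any(
        len(p_row) != len(k_row) for p_row, k_row in zip(patch, kernel)
    ):
        raise ValueError("patch and kernel must have the same shape")
    return sum(
        p * k
        for p_row, k_row in zip(patch, kernel)
        for p, k in zip(p_row, k_row)
    )


def features_similar(g_j: float, g_k: float, eps: float) -> bool:
    """Eq. (3): feature i of views j and k is treated as the same content
    when |G_ij - G_ik| < eps for a small positive eps."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    return abs(g_j - g_k) < eps


if __name__ == "__main__":
    # Hypothetical 3x3 "V" detector and patches from two neighbouring views.
    v_filter = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
    patch_view1 = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]        # clean "V"
    patch_view3 = [[0.9, 0, 1], [1, 0.1, 0.9], [0, 1, 0]]  # slightly noisy "V"
    flat_patch = [[0.2] * 3 for _ in range(3)]             # no "V" present

    g1 = filter_response(patch_view1, v_filter)
    g3 = filter_response(patch_view3, v_filter)
    g_flat = filter_response(flat_patch, v_filter)
    print(g1, g3, g_flat)
    print(features_similar(g1, g3, eps=0.5), features_similar(g1, g_flat, eps=0.5))
```

Two views that both strongly activate the filter with a small difference in response are what Eq. (3) treats as shared content; the threshold eps = 0.5 used here is arbitrary.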

To use high-dimensional features to represent the correlation between views, we proceed as follows.

Synthesis EPI. To input multi-view information through a single image, we introduce the Epipolar Plane Image (EPI) for multi-view signals. By the epipolar-plane principle, the same scene object captured by different viewpoints appears on a diagonal line of the EPI. The slope of this line is related to the disparity and directly depends on the depth between the object and the capturing viewpoint. Therefore, information on corresponding objects in different viewpoints can be gathered into the same image by using EPI [19], so the reconstruction can more easily exploit the correlation between viewpoints. Based on this analysis, we choose EPI to represent the input signal of the deep-learning multi-view reconstruction framework. Compared with natural images, EPI has a specific diagonal texture, as shown in Fig. 3. The EPI is constructed as follows.

Fig. 3. From left to right: original image, EPI, and its spectrum.

Let the K view images be $I_1, I_2, \ldots, I_K$. The definition matrix $A_m$ is a matrix whose m-th row is all ones and whose other rows are all zeros; its size equals the image size:

$$
A_m = \begin{bmatrix}
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
1 & 1 & \cdots & 1 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix}
\tag{4}
$$

Then the EPI can be expressed as

$$
E_m = \sum_{i=1}^{K} \left( I_i \circ A_m \right)^{T}
\tag{5}
$$

where ∘ denotes the element-wise product, T denotes the matrix transpose, m denotes the m-th row of the multi-view images, and K is the total number of views.

However, the single-row, pixel-level EPI above is not suitable for the network, because a single row of pixels is not conducive to extracting high-dimensional features. We therefore extend the matrix so that n rows are ones and obtain a local image (an EPI consisting of multiple rows of pixels), as shown in step 1 of Fig. 4. This makes it easier and more efficient to obtain hybrid-resolution local images.

Fig. 4. Local images synthesized from n rows of pixels of the Book_arrival sequence with 16 viewpoints.

Capture Hybrid-Resolution Local Images. As shown in step 2 of Fig. 4, we reduce the resolution to 1/2 for all even-numbered columns of viewpoint information while leaving the odd-numbered columns completely preserved. We thus obtain hybrid-resolution local images in which the even-numbered viewpoint columns are low resolution. Finally, we feed the hybrid-resolution local images into SRGAN for training.

2.3 Multiview Reconstruction

Content Information. As discussed in Sect. 2.2, the content information in the purple rhombus of viewpoint 2 in Fig. 5, which needs to be synthesized, can be expressed directly by the high-dimensional features extracted from viewpoints 1 and 3, because the content information of multiple viewpoints is highly similar. Moreover, the low-resolution information in the even-numbered viewpoints can assist adjacent viewpoints in completing the reconstruction.
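The input pipeline described in Sect. 2.2 (EPI synthesis followed by hybrid-resolution degradation) can be sketched in plain Python. This is one reading of Eqs. (4) and (5), not the authors' code: it assumes 0-based row indices, interprets Eq. (5) as "mask the n-row block starting at row m of every view with $A_m$, transpose it, and place the views side by side in view order", and models the 1/2 resolution of even-numbered views (2nd, 4th, …) as pairwise averaging along the spatial axis followed by repetition, so that the local image keeps its size. All function names are hypothetical.

```python
from typing import List

Image = List[List[float]]  # row-major grayscale image, H x W


def transpose(mat: Image) -> Image:
    return [list(col) for col in zip(*mat)]


def row_mask(height: int, width: int, m: int, n: int = 1) -> Image:
    """A_m of Eq. (4), extended to n rows: rows m..m+n-1 are 1, all others 0."""
    if not 0 <= m <= height - n:
        raise ValueError("rows m..m+n-1 must lie inside the image")
    return [[1.0 if m <= r < m + n else 0.0 for _ in range(width)]
            for r in range(height)]


def local_epi(views: List[Image], m: int, n: int = 1) -> Image:
    """Local EPI from K views: view i contributes the transposed n-row strip
    starting at row m, so the result has shape W x (K * n)."""
    height, width = len(views[0]), len(views[0][0])
    mask = row_mask(height, width, m, n)
    blocks = []
    for img in views:
        # The element-wise product with the mask zeroes every other row ...
        masked = [[p * a for p, a in zip(p_row, a_row)]
                  for p_row, a_row in zip(img, mask)]
        # ... and slicing keeps only the selected strip before transposing.
        blocks.append(transpose(masked[m:m + n]))  # W x n
    # Concatenate the per-view blocks horizontally, in view order.
    return [sum((block[x] for block in blocks), []) for x in range(width)]


def halve_then_repeat(values: List[float]) -> List[float]:
    """Average neighbouring pairs (1/2 resolution), then repeat each average
    so the length is unchanged."""
    out: List[float] = []
    for i in range(0, len(values), 2):
        pair = values[i:i + 2]
        out.extend([sum(pair) / len(pair)] * len(pair))
    return out


def hybrid_resolution(epi: Image, n: int) -> Image:
    """Degrade the columns that belong to even-numbered views (1-based),
    leaving odd-numbered views at full resolution."""
    columns = transpose(epi)
    for c, col in enumerate(columns):
        view_number = c // n + 1  # each view occupies n consecutive columns
        if view_number % 2 == 0:
            columns[c] = halve_then_repeat(col)
    return transpose(columns)
```

For three 2 × 4 views and n = 1, `local_epi` returns a 4 × 3 image with one column per view, and `hybrid_resolution` then averages the middle column (view 2) in pairs while leaving views 1 and 3 untouched.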

Location Information of Content. Because the EPI has a specific slash texture compared with natural images, as shown by the two red slashes in the right panel of Fig. 5, we can reconstruct the location information of content more accurately using these distinct slash features. In Sect. 3, we also design experiments to verify this slash feature.

Fig. 5. Reconstruction of multiple views, where purple represents content information and green represents the location information of content. (Color figure online)

Block Effect. Blocking arises mainly because our input is obtained by splicing EPI blocks. During training, the network is likely to learn the segmentation features at block boundaries, which leads to significant block effects; we therefore post-process the output with overlapping and simple filtering operations.
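The overlap part of this post-processing can be sketched as follows, under the assumption that output tiles are stacked vertically and that the rows shared by neighbouring tiles are simply averaged. The source does not specify the tile layout, the overlap width, or the "simple filtering" operation, so only the averaging is shown; `stitch_with_overlap` is a hypothetical name.

```python
from typing import List

Tile = List[List[float]]


def stitch_with_overlap(tiles: List[Tile], overlap: int) -> Tile:
    """Stack equally sized tiles top to bottom; each pair of consecutive
    tiles shares `overlap` rows, and the shared rows are averaged."""
    if not tiles:
        return []
    height, width = len(tiles[0]), len(tiles[0][0])
    if any(len(t) != height or any(len(row) != width for row in t) for t in tiles):
        raise ValueError("all tiles must have the same shape")
    if not 0 <= overlap < height:
        raise ValueError("overlap must be in [0, tile height)")
    step = height - overlap
    total = step * (len(tiles) - 1) + height
    acc = [[0.0] * width for _ in range(total)]  # running sums per output row
    count = [0] * total                          # number of tiles covering each row
    for t, tile in enumerate(tiles):
        top = t * step
        for r, row in enumerate(tile):
            count[top + r] += 1
            acc[top + r] = [a + v for a, v in zip(acc[top + r], row)]
    return [[a / count[r] for a in acc[r]] for r in range(total)]
```

With `overlap = 0` the function reduces to plain concatenation, which is the butt-joined layout that produces visible seams; any positive overlap blends the boundary rows of neighbouring tiles.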

3 Experiment

3.1 Dataset

In the multi-view transmission task, all of the information is known, but we artificially decide which part of the data is transmitted in order to reduce the amount of data, and then reconstruct high-quality images at the decoder. Therefore, we simply test on different content from the same scenes. Our datasets come from Nagoya University and other sources: Newspaper, Lovebird1, Lovebird2, Book_arrival, and Balloon. We first take images at intervals of 10 frames from the first half of each multi-view sequence and compose them into local images to form the training set. The test set consists of frames with large differences in content information, extracted from the latter half of each sequence.

3.2 Training and Parameters

All of our networks are trained on an Nvidia 1080Ti GPU using 1080 local images from the multi-view datasets of Nagoya University and others. We crop 384 × 384 sub-local images of different HREPI images from the leftmost side. The network is optimized with the Adam algorithm with β1 = 0.9. Only the pixel-wise mean squared error is used when initializing the generator, and this initialization is run 100 times with a learning rate of 1e-4 to avoid undesired local optima. During training, 250 epochs are run with a learning rate of 1e-4; the learning rate is then reduced to 10% of its initial value, and another 250 epochs are run with 1e-5. Our experiments are based on TensorFlow and TensorLayer.
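The data split and learning-rate schedule of Sects. 3.1 and 3.2 can be summarized in a small configuration sketch. The constants mirror the numbers in the text; the frame count passed in is hypothetical (sequence lengths are not given here), the "first half" is assumed to be frames 0 to num_frames // 2 − 1, and reading "100 times" as 100 pre-training runs is an assumption. The function names are illustrative.

```python
from typing import List

ADAM_BETA1 = 0.9        # Adam beta_1 reported in Sect. 3.2
PRETRAIN_RUNS = 100     # MSE-only generator initialization ("100 times")
PRETRAIN_LR = 1e-4
INITIAL_LR = 1e-4       # learning rate of the first 250 adversarial epochs
DECAY_FACTOR = 0.1      # "reduced to the initial 10%"
EPOCHS_PER_STAGE = 250


def training_frame_indices(num_frames: int, interval: int = 10) -> List[int]:
    """Frames sampled every `interval` frames from the first half of a
    sequence (training split); test frames come from the second half."""
    if num_frames <= 0 or interval <= 0:
        raise ValueError("num_frames and interval must be positive")
    return list(range(0, num_frames // 2, interval))


def adversarial_learning_rate(epoch: int) -> float:
    """Learning rate for 0-based adversarial epoch `epoch` (500 epochs total)."""
    if not 0 <= epoch < 2 * EPOCHS_PER_STAGE:
        raise ValueError("epoch out of range")
    return INITIAL_LR if epoch < EPOCHS_PER_STAGE else INITIAL_LR * DECAY_FACTOR
```

With an Adam optimizer, `adversarial_learning_rate(epoch)` would be supplied as the learning rate at the start of each epoch, with β1 set to `ADAM_BETA1`.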

3.3 Slash Features of EPI

In Fig. 6, the slash feature captured by network training is that each subsequent reconstructed viewpoint shifts upward with a fixed slope relative to the previously given viewpoint; therefore, the dense-view EPI with an overall upward slash texture is reconstructed well in the right panel. Interestingly, in the left panel, where we deliberately reversed the order of the viewpoints, the reconstructed views shift downward with a fixed slope. Hence, the trained network effectively learns the slash-like features of EPI maps. Moreover, the two slopes have equal magnitude and opposite sign.

Fig. 6. The green part represents the reconstructed viewpoint. During testing, we deliberately reversed the view order of the local image in the left panel, while the right panel shows the local image reconstructed from the normal viewpoint order. (Color figure online)
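The opposite slopes in Fig. 6 follow from EPI geometry. Assuming rectified, parallel, equally spaced cameras (an assumption not stated in the text), a scene point at depth Z shifts by the disparity d = f·B/Z pixels between adjacent views, where f is the focal length in pixels and B the baseline, so its trace in the EPI is a straight line with slope d in pixels per view; reversing the view order negates that slope. The helper names and numbers below are hypothetical.

```python
from typing import List


def disparity_from_depth(focal_px: float, baseline: float, depth: float) -> float:
    """Disparity d = f * B / Z for rectified, parallel cameras
    (f in pixels; B and Z in the same length unit)."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline / depth


def epi_track(x0: float, disparity: float, num_views: int) -> List[float]:
    """Horizontal position of one scene point in views 0..num_views-1."""
    return [x0 + k * disparity for k in range(num_views)]


def track_slope(track: List[float]) -> float:
    """Least-squares slope of position versus view index."""
    n = len(track)
    if n < 2:
        raise ValueError("need at least two views")
    k_mean = (n - 1) / 2
    x_mean = sum(track) / n
    num = sum((k - k_mean) * (x - x_mean) for k, x in enumerate(track))
    den = sum((k - k_mean) ** 2 for k in range(n))
    return num / den
```

Fitting `track_slope` to a track and to the same track reversed gives values of equal magnitude and opposite sign, which is the behaviour reported for the normal and reversed view orders. Whether "slope" means pixels per view or its reciprocal depends on how the EPI axes are drawn.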

3.4 Experimental Result

Figure 7 illustrates the subjective quality for the Newspaper sequence. Figure 7(a) shows the original image, i.e., the 10th original frame of the virtual view, where green rectangular boxes mark the cropped and zoomed portion shown in Fig. 7(b). Similarly, Fig. 7(c), (e), (g), (i) show the views synthesized by bicubic interpolation, the Gaussian mixture model, the previous hybrid-resolution method, and the proposed technique, and Fig. 7(b), (d), (f), (h), (j) show the corresponding cropped and zoomed images. As Fig. 7 shows, the bicubic result generated from the network has a remarkable block effect, and the proposed hybrid-resolution method renders hair texture more realistically than the Gaussian mixture model.

Fig. 7. Original image (a), synthesized images (c, e, g, i), and cropped and zoomed images (b, d, f, h, j) for the Newspaper sequence by the proposed method and three traditional methods.

Figure 8 demonstrates that the proposed method brings a large improvement in subjective quality when the network has learned a good mapping between the LREPI and the HREPI. Moreover, high-dimensional features express the content information more accurately, without ghosting.

Fig. 8. (a) Original image; (b) image after bicubic interpolation; (c) image after learning the inverse map in the Gaussian mixture model; (d) previous hybrid resolution; (e) proposed hybrid resolution.

Tables 1 and 2 show the average PSNR and MOS comparisons on four sequences: Newspaper, Lovebird1, Book_arrival, and Balloon. The proposed technique improves PSNR by 0.8 dB, 0.2 dB, and 0.2 dB on average for Newspaper, Lovebird1, and Balloon, respectively, but PSNR drops by 0.9 dB on Book_arrival. Similarly, the proposed technique improves MOS by 0.61, 0.15, and 0.32 on average for Newspaper, Lovebird1, and Balloon, respectively, but MOS drops by 0.04 on Book_arrival. We extracted 10 synthesized images from each sequence and then invited 10 … and teachers who major in multi-view …
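The PSNR comparison above uses the standard definition PSNR = 10 · log10(peak² / MSE). A minimal sketch, assuming 8-bit images (peak = 255) and an average gain computed from per-frame PSNR values; the source does not state whether averaging is done per frame or over pooled MSE, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Image = Sequence[Sequence[float]]


def psnr(reference: Image, test: Image, peak: float = 255.0) -> float:
    """PSNR in dB between two equally sized images; infinite if identical."""
    if len(reference) != len(test) or any(
        len(r_row) != len(t_row) for r_row, t_row in zip(reference, test)
    ):
        raise ValueError("images must have the same shape")
    squared = [
        (r - t) ** 2
        for r_row, t_row in zip(reference, test)
        for r, t in zip(r_row, t_row)
    ]
    if not squared:
        raise ValueError("images must not be empty")
    mse = sum(squared) / len(squared)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


def mean_gain_db(proposed: List[float], baseline: List[float]) -> float:
    """Average per-frame PSNR of the proposed method minus that of a baseline."""
    return sum(proposed) / len(proposed) - sum(baseline) / len(baseline)
```

A positive `mean_gain_db` corresponds to the improvements reported for Newspaper, Lovebird1, and Balloon, and a negative value to the drop on Book_arrival.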
