Taiyuan University of Technology
Software Engineering Professional English
College of Computer Science and Technology, College of Computer Software

SOFTWARE ENGINEERING ESSENTIALS

COMPETENCIES
After you have read this chapter, you should be able to:
• Explain what big data is.
• Explain the 4V properties of big data: volume, velocity, variety, and veracity.
• Distinguish the categories of big data.
• Understand the big value of big data.
• Describe Jim Gray's Fourth Paradigm.
• Understand the evolution of big data.
• Discuss the four kinds of challenges for big data.

2022/8/6 Taiyuan University of Technology, College of Computer Science and Technology, College of Computer Software

We are experiencing unprecedented growth in the amount of data available in nearly every area, ranging from the physical world (biology, astronomy, remote sensing, etc.) to human activities (social networks, the Internet, health, finance, economics, transportation, etc.). These data are commonly called big data and are believed to contain great value and to imply new opportunities. This chapter presents an overview of big data, including its definition, properties, and categories, as well as the challenges it brings.

CHAPTER 6 BIG DATA
6.1 BIG DATA AND ITS PROPERTIES
6.2 CATEGORIES OF BIG DATA
6.3 BIG VALUE FOR BIG DATA
6.4 JIM GRAY'S FOURTH PARADIGM
6.5 EVOLUTION OF DATA MANAGEMENT
6.6 BIG DATA CHALLENGES
SUMMARY

6.1 BIG DATA AND ITS PROPERTIES

Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Most professionals in the industry consider multiple terabytes or petabytes to be the current big data benchmark. Others, however, are hesitant to commit to a specific quantity, as the rapid pace of technological development may render today's concept of big as tomorrow's norm. In the meantime, a consensus has been reached on the properties of big data.

1 WHAT IS BIG DATA?
2 FOUR DIMENSIONS OF BIG DATA

WHAT IS BIG DATA?

In a dynamic world, organizations have begun to rely more heavily on insights derived from their data to uncover new facts and opportunities for scientific discovery or revenue growth. In the process of discovering and determining these insights, large, complex datasets are generated that must then be managed, analyzed, and manipulated by skilled professionals. This large collection of data is known collectively as big data. The definition of big data from Wikipedia follows.

“Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process data within a tolerable elapsed time. Big data “size” is a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data. Big data is a set of techniques and technologies that require new forms of integration to uncover large hidden values from large datasets that are diverse, complex, and of a massive scale.”

FOUR DIMENSIONS OF BIG DATA

In a 2001 research report and related lectures, META Group (now Gartner) analyst Doug Laney defined data growth challenges and opportunities as being three-dimensional, i.e., increasing volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources). In 2012, Gartner updated its definition as follows: “Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.” Additionally, a new V, veracity, has been added by some organizations.

Volume
The size of the data determines the value and potential of the data under consideration, and whether it can actually be considered big data or not. Big data doesn't sample. It just observes and tracks what happens.

Velocity
Velocity is an indication of how quickly the data can be made available for analysis. Big data is often available in real time.

Variety
Variety refers to the different types of structured and unstructured data that organizations can collect, such as transaction data, video, audio, text, and log files. Big data draws from all types of data.

Veracity
Veracity is an indication of data integrity and of an organization's ability to trust the data and use it confidently to make crucial decisions.

Different kinds of data show different properties.
For example, social network data places greater demands on volume and velocity than on variety and veracity.

6.2 CATEGORIES OF BIG DATA

1 DATA FROM THE PHYSICAL WORLD
2 DATA FROM HUMAN ACTIVITIES

The amount of data in our world has been exploding as our ability to acquire data has improved. Big data is ubiquitous and can be partitioned into two categories:

DATA FROM THE PHYSICAL WORLD

Scientific experiments and observations produce massive scientific data sets about the physical world. Sensor networks consisting of a large number of cheap sensors have been widely used to obtain natural data. The Ocean Observatories Initiative uses electro-optically cabled observing systems to measure ocean activities in the northeast Pacific Ocean. The Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) produces 2.5 PB of data each year. Over 5000 genome projects in 2010 produced several EB of genomic data. Remote sensing produces even larger natural datasets.

DATA FROM HUMAN ACTIVITIES

Big data sets produced by human activities, such as social networks, the Internet, health, finance, economics, and transportation, have attracted much attention in recent years. To gain a better understanding of how much data is being generated, consider the following noteworthy facts:
• Facebook currently holds more than 45 billion photos in its user database, a number that is growing daily.
• According to IBM, users create 2.5 quintillion bytes of data every day. In practical terms, this means that 90% of the data in the world today has been created in the last two years alone.
• According to FICO, the credit card fraud detection system currently in place helps protect over two billion accounts all over the globe.
• Walmart handles more than 1 million customer transactions every hour, which are then transferred into a database working with over 2.5 petabytes of information.

6.3 BIG VALUE FOR BIG DATA

Ginni Rometty, President and Chief Executive Officer of IBM, has said that “Data is the world's great new natural resource. What steam power was to the 18th century, electromagnetism to the 19th and fossil fuels to the 20th, data will be to the 21st.” Like other essential factors of production such as hard assets and human capital, it is increasingly the case that much of modern economic activity, innovation, and growth simply couldn't take place without data.

While digital data might once have concerned only a few data geeks, big data is now relevant for leaders across every sector, and consumers of products and services stand to benefit from its application.

The combination of deepening investments in big data and managerial innovation to create competitive advantage and boost productivity is very similar to the way IT developed from the 1970s onward. The experience of IT strongly suggests that we could be on the cusp of a new wave of productivity growth enabled by the use of big data. There are many ways that big data can be used to create value. Large companies across the globe have scored early successes in their use of big data, and several are well known for their extensive and effective use of data. For instance, Tesco's loyalty program generates a tremendous amount of customer data that the company mines to inform decisions from promotions to strategic segmentation of customers. Amazon uses customer data to power its recommendation engine (“you may also like ...”), based on a type of predictive modeling technique called collaborative filtering.

By making supply and demand signals visible between retail stores and suppliers, Wal-Mart was an early adopter of vendor-managed inventory to optimize the supply chain. Harrah's, the US hotels and casinos group, compiles detailed holistic profiles of its customers and uses them to tailor marketing in a way that has increased customer loyalty. Progressive Insurance and Capital One are both known for conducting experiments to segment their customers systematically and effectively and to tailor product offers accordingly. Smart, a leading wireless player in the Philippines, analyzes its penetration, retailer coverage, and average revenue per user at the city or town level in order to focus on the micro markets with the most potential.

McKinsey & Company observed how big data created value after in-depth research on U.S. healthcare, EU public sector administration, U.S. retail, global manufacturing, and global personal location data. Through research on these five core industries that represent the global economy, the McKinsey report pointed out that big data may bring the economic function into full play, improve the productivity and competitiveness of enterprises and public sectors, and create huge benefits for consumers.

McKinsey summarized the value that big data could create: if big data could be creatively and effectively utilized to improve efficiency and quality, the potential value gained by the U.S. medical industry through data may surpass USD 300 billion, thus reducing U.S. healthcare expenditure by over 8%; retailers that fully utilize big data may improve their profit by more than 60%; big data may also be utilized to improve the efficiency of government operations, such that the developed economies in Europe could save over EUR 100 billion (which excludes the effect of reduced fraud, errors, and tax differences).

The McKinsey report is regarded as prospective and predictive, while the following facts may validate the value of big data. During the 2009 flu pandemic, Google obtained timely information by analyzing big data, which provided even more valuable information than that provided by disease prevention centers. Nearly all countries required hospitals to inform agencies, such as disease prevention centers, of cases of the new type of influenza. However, patients usually did not see doctors immediately when they got infected. It also took some time to send information from hospitals to disease prevention centers, and for disease prevention centers to analyze and summarize such information. Therefore, by the time the public became aware of the pandemic of a new type of influenza, the disease might already have been spreading for one to two weeks, a serious time lag.

Google found that during the spread of influenza, the search terms frequently entered in its search engine differed from those at ordinary times, and the usage frequencies of those terms were correlated with the spread of influenza in both time and location. Google found 45 groups of search terms that were closely related to the outbreak of influenza and incorporated them into specific mathematical models to forecast the spread of influenza and even to predict the places from which influenza would spread. The related research results have been published in Nature.

In 2008, Microsoft purchased Farecast, a sci-tech venture company in the U.S. Farecast has an airline ticket forecasting system that predicts the trends and rising/dropping ranges of airline ticket prices. The system has been incorporated into Microsoft's Bing search engine. By 2012, the system had saved nearly USD 50 per ticket per passenger, with a forecast accuracy as high as 75%.

6.4 JIM GRAY'S FOURTH PARADIGM

Jim Gray believes that scientific discovery has experienced four paradigms.

First: Experimental Science
Originally, there was just experimental science describing natural phenomena.

Second: Theoretical Science
Then in the last few hundred years, there was theoretical science, with Kepler's Laws, Newton's Laws of Motion, Maxwell's equations, and so on.

Third: Computational Science
In the last few decades, for many problems, the theoretical models grew too complicated to solve analytically, and people had to start simulating. These simulations have carried us through much of the last half of the last millennium. At this point, these simulations are generating a whole lot of data, along with a huge increase in data from the experimental sciences. People now do not actually look through telescopes. Instead, they are “looking” through large-scale, complex instruments which relay data to datacenters, and only then do they look at the information on their computers.

Fourth: Data-intensive Science
The world of science has changed, and there is no question about this. The new model is for the data to be captured by instruments or generated by simulations before being processed by software and for the resulting information or knowledge to be stored in computers. Scientists only get to look at their data fairly late in this pipeline. The techniques and technologies for such data-intensive science are so different that it is worth distinguishing data-intensive science from computational science as a new, fourth paradigm for scientific exploration.

6.5 EVOLUTION OF DATA MANAGEMENT

In the late 1970s
The concept of the “database machine” emerged, a technology used specifically for storing and analyzing data. As data volumes increased, the storage and processing capacity of a single mainframe computer system became inadequate.

In the 1980s
People proposed “share nothing”, a parallel database system architecture, to meet the demand of increasing data volumes. The share-nothing architecture is based on the use of a cluster in which every machine has its own processor, storage, and disk. The Teradata system was the first successful commercial parallel database system. Such databases later became very popular.

On June 2, 1986
A milestone event occurred when Teradata delivered the first parallel database system with a storage capacity of 1 TB to Kmart, to help the large-scale North American retail company expand its data warehouse.

In the late 1990s
The advantages of the parallel database were widely recognized in the database field.
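The share-nothing idea described above can be made concrete with a small sketch. The code below is illustrative only and does not describe how Teradata or any real product is implemented: the `Node` and `Cluster` classes, the `node_for` routing function, and the sample sales rows are all hypothetical names introduced here. It assumes rows are assigned to machines by hashing a partition key, that each machine aggregates only its own rows, and that a coordinator merges the partial results.

```python
import zlib
from collections import defaultdict


class Node:
    """One machine in the cluster: it owns its rows and shares nothing."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.rows = []  # private storage, never read by other nodes

    def insert(self, row):
        self.rows.append(row)

    def local_totals(self, key, value):
        """Aggregate only the rows stored on this node."""
        totals = defaultdict(float)
        for row in self.rows:
            totals[row[key]] += row[value]
        return totals


class Cluster:
    """Hypothetical coordinator that routes rows and merges partial results."""

    def __init__(self, node_count):
        self.nodes = [Node(i) for i in range(node_count)]

    def node_for(self, partition_value):
        # A stable hash sends the same key to the same node every time.
        digest = zlib.crc32(str(partition_value).encode("utf-8"))
        return self.nodes[digest % len(self.nodes)]

    def insert(self, row, partition_key):
        self.node_for(row[partition_key]).insert(row)

    def totals(self, key, value):
        # Scatter: each node aggregates locally. Gather: merge the partials.
        merged = defaultdict(float)
        for node in self.nodes:
            for group, subtotal in node.local_totals(key, value).items():
                merged[group] += subtotal
        return dict(merged)


cluster = Cluster(node_count=4)
sales = [
    {"store": "A", "amount": 10.0},
    {"store": "B", "amount": 5.0},
    {"store": "A", "amount": 2.5},
    {"store": "C", "amount": 7.0},
]
for sale in sales:
    cluster.insert(sale, partition_key="store")

print(cluster.totals("store", "amount"))
```

Because no node reads another node's storage, capacity can in principle grow by adding machines. The trade-off in this sketch is that any query spanning several keys has to visit every node and merge the partial results.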

In January 2007
However, many challenges on big data arose. Content generated by users, sensors, and other ubiquitous data sources drove overwhelming data flows, which required a fundamental change in computing architecture and large-scale data processing mechanisms. Jim Gray, a pioneer of database software, called this transformation “The Fourth Paradigm”. He thought the only way to cope with such a paradigm was to develop a new generation of computing tools to manage, visualize, and analyze massive data.

In June 2011
Another milestone event occurred when EMC/IDC published a research report titled Extracting Values from Chaos, which introduced the concept and potential of big data for the first time. This research report aroused great interest in big data in both industry and academia.

Industry
Over the past few years, nearly all major companies, including EMC, Oracle, IBM, Microsoft, Google, Amazon, and Facebook, have started big data projects. Taking IBM as an example, since 2005 IBM has invested USD 16 billion in 30 acquisitions related to big data.

Academia
In academia, big data was also under the spotlight. In 2008, Nature published a special issue on big data. In 2011, Science also launched a special issue on the key technologies of “data processing” in big data. In 2012, European Research Consortium for Informatics and Mathematics (ERCIM) News published a special issue on big data.
At the beginning of 2012, a report titled Big Data, Big Impact, presented at the Davos Forum in Switzerland, announced that big data had become a new kind of economic asset, just like currency or gold. Gartner, an international research agency, issued Hype Cycles from 2012 to 2013, which listed big data computing, social analysis, and stored data analysis among the 48 emerging technologies that deserve the most attention.

Government
Many national governments, including that of the U.S., also paid great attention to big data. In March 2012, the Obama Administration announced a USD 200 million investment to launch the Big Data Research and Development Initiative, the second major scientific and technological development initiative after the Information Highway Initiative in 1993. In July 2012, Japan's ICT project, issued by the Ministry of Internal Affairs and Communications, indicated that big data development should be a national strategy and that application technologies should be the focus. In July 2012, the United Nations issued the Big Data for Development report, which summarized how governments utilized big data to better serve and protect their people.

6.6 BIG DATA CHALLENGES

Volume
Data volume poses the most noticeable challenge due to limited hardware capacity and software efficiency and effectiveness. Hardware capacity includes size and spee
