Big Data Economics

• Introduction
• What is big data
• Uses of big data
• Economic and policy analysis with big data
• Challenges of big data
• What a data scientist needs

• Since U.S. President Obama made big data part of the national science and technology development strategy, it has drawn intense attention from all sectors of society and from the media.
• "Data scientist", a job title few people had heard of a few years ago, has suddenly become extremely popular. Demand for data scientists has surged, and their pay has risen with it.

• But the supply of data scientists is very limited, so some North American universities have started undergraduate and master's degree programs in data science.
• Big data's popularity also owes much to Obama's presidential campaign, which gained an edge in fundraising and advertising by having data scientists analyze large volumes of data.
• Data scientists successfully predicted Obama's re-election.
• Microsoft's data scientists successfully predicted World Cup results, beating all other forecasts, including those from IBM's data scientists.
• Data scientists are also in great demand in China.
• Economic theory is usually highly simplified and assumes "other things being equal". In practice, the "other things" change. If they change, do the theoretical results still hold? This is the so-called comparative analysis, which economic theory rarely does and finds hard to do.
• In empirical analysis we know full well that the "other things" are all changing, but for lack of data we have had to ignore them.

• In the past, little data on economic activity was recorded, and what existed was mostly aggregate data. Today, computer technology and the Internet have changed everything.
• When you search on Baidu, your search keywords and the sites you visit are recorded. When you browse on Taobao, every page view and every purchase is recorded. When you read, watch videos, chat, or check your personal finances online, your behavior is recorded.

• Text messages, WeChat, Twitter, mobile phones, supermarket cameras and checkout terminals, bank ATMs, cameras at intersections and in other public places, and all kinds of electronic communication devices leave data footprints.
• Big data is data recorded through all of these means; it may be real-time, unstructured, complex, and very large.
• Example 1. Consider the data collected by retail stores. A few decades ago, stores might have collected data on daily sales, and it would have been considered high quality if the data were split by product or product category. Nowadays, scanner data makes it possible to track individual purchases and item sales, capture the exact time at which they occur and the purchase histories of individuals, and use electronic inventory data to link purchases to specific shelf locations.
• Example 1, continued. Internet retailers observe not just this information but can also trace an individual's behavior around the sale, including his or her initial search queries, items viewed and discarded, recommendations and promotions that were shown, and subsequent product or seller reviews.
• In principle, these data could be linked to demographics, advertising exposure, social media activity, offline spending, or credit history.
• Example 2. There has been a parallel evolution in business activity. As firms have moved their day-to-day operations to computers and then online, it has become possible to compile rich datasets of sales contacts, hiring practices, and physical shipments of goods. Increasingly, there are also electronic records of collaborative work efforts, personnel evaluations, and productivity measures.
• The same story can be told about the public sector.

• This is a lot of data. What exactly is new about it?
• Data is now available faster, has greater coverage and scope, and includes new types of observations and measurements that previously were not available.
• A key aspect of such modern datasets is that they have much less structure, or more complex structure, than traditional datasets.
• Data is available in real time. The ability to capture and process large amounts of data in real time is crucial for many business applications, but it has not been used much in economic research and policy analysis. Perhaps this is because many economic questions are retrospective, so that it is more important for data to be detailed and accurate than to be available immediately. This may change in the future.
• Data is available at large scale. A major change for economists is the scale of modern datasets. Before, we worked with data with hundreds or thousands of observations and few variables. With small samples, statistical power was an important issue; omitted variable bias was also a concern. Now datasets with tens of millions of observations and a huge number of variables are common. Statistical power is no longer an issue.
• Data come with less structure. The information available about a consumer may include her entire shopping history. With this information, it is possible to create an almost unlimited set of individual characteristics. While this is very powerful, it is also challenging. We are familiar with the "rectangular" form of data, with N observations and K variables, where K is a lot smaller than N.
• When data arrive in raw form, as a digital recording of a sequence of events with no further structure, there are a huge number of ways to move from that recording to the standard "rectangular" format. Figuring out how to organize unstructured data and reduce its dimensionality, and assessing whether the way we do this matters, is not something we are currently well equipped to do.
• Data is available on novel types of variables. Much of the data now being recorded is on activities that previously were very difficult to observe. Email or geo-location data records where people have been. Social network data captures personal connections. Most economists believe that social connections play an important role in job search, in shaping consumer preferences, and in the transmission of information. The challenge is figuring out how to make effective use of these data, which may have novel structures. Traditional econometrics assumes that observations are cross-sectionally independent, grouped as in panel data, or linked by time. But individuals in a social network may be connected in highly complex ways. Indeed, the point of econometric modeling may be to uncover exactly what the key features of this dependence structure are. Developing methods suited to these settings is an interesting challenge for econometric research.
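
As a hedged illustration of the step from a raw event recording to the "rectangular" N-by-K format discussed above, the sketch below collapses a made-up shopping log into one row per consumer. The event fields (`consumer`, `category`, `amount`) and the three summary variables are assumptions chosen for illustration and do not come from the slides. A different choice of summaries would give a different rectangle, which is the difficulty the text points to.

```python
from collections import defaultdict

# Hypothetical raw event log: one record per purchase, in arrival order.
events = [
    {"consumer": "a", "category": "books", "amount": 12.0},
    {"consumer": "b", "category": "food", "amount": 5.5},
    {"consumer": "a", "category": "food", "amount": 3.0},
    {"consumer": "a", "category": "books", "amount": 20.0},
]


def to_rectangular(events):
    """Collapse an event sequence into one row per consumer (N rows, K columns)."""
    acc = defaultdict(
        lambda: {"n_purchases": 0, "total_spend": 0.0, "categories": set()}
    )
    for e in events:
        row = acc[e["consumer"]]
        row["n_purchases"] += 1
        row["total_spend"] += e["amount"]
        row["categories"].add(e["category"])
    # K = 3 illustrative variables; many other summaries are equally defensible.
    return {
        consumer: {
            "n_purchases": r["n_purchases"],
            "total_spend": r["total_spend"],
            "n_categories": len(r["categories"]),
        }
        for consumer, r in sorted(acc.items())
    }


table = to_rectangular(events)
for consumer, row in table.items():
    print(consumer, row)
```

Here N is the number of distinct consumers and K is fixed at three by construction. With a full shopping history, K could instead be made almost arbitrarily large.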

• The most common uses of big data are tracking business processes and outcomes and building a wide array of predictive models. While business analytics is a big deal and surely has improved the efficiency of many organizations, predictive modeling lies behind many of the information products and services introduced in recent years.
• Amazon and Netflix recommendations rely on predictive models of what book or movie an individual might want to purchase.
• Google (Baidu) search results and news feeds rely on algorithms that predict the relevance of particular web pages or articles.
• Apple's auto-complete tries to predict the rest of one's text or email.
• Online advertising and marketing rely on automated predictive models that attempt to target the individuals who are most likely to respond to offers.
• In health care, it is common for insurers to adjust payments and quality measures based on "risk scores", which are derived from predictive models of individual health costs and outcomes. An individual's risk score is a weighted sum of health indicators that identify whether the individual has different chronic conditions.
• Credit card companies use predictive models of default and repayment to guide their underwriting, pricing, and marketing activities.
• Banks use predictive models of deposits and withdrawals to manage their cash holdings.
• Companies use predictive models of demand to schedule production and manage inventory and supply chains.
• Predictive models can also be used to detect fraudulent activities and to manage risk.
• All these applications rely on converting large amounts of unstructured data into "vertical" or predictive scores, often in a fully automated and scalable way, and sometimes in real time. The scores can be used in various ways. First, they can speed up or automate existing processes (Amazon recommends items that it predicts to be relevant for a given consumer, replacing a recommendation one could have obtained from a librarian).
• Second, they can be used to offer new services (Apple's auto-complete takes the word or sentence with the top score and proposes it as the auto-completion).
• Finally, the scores can be used to support decision making (in credit card fraud detection, the transaction score is reported to the issuing bank, and most banks implement some policy that dictates which transaction scores are approved, which are rejected, and which need further investigation).
• Data that track business processes and outcomes can be used to improve efficiency.
• Targeted pricing.
• Use stock transaction data for arbitrage.
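
Two of the scoring ideas above can be sketched together: a score built as a weighted sum of indicators, and a policy that maps a score to approve, investigate, or reject. The indicator names, weights, and cutoffs below are invented for illustration. Real risk-adjustment and fraud-scoring systems would estimate them from data, and nothing here describes any particular insurer's or bank's rules.

```python
# Hypothetical weights for chronic-condition indicators (1 = condition present).
WEIGHTS = {"diabetes": 0.30, "hypertension": 0.20, "asthma": 0.10}


def risk_score(indicators, weights=WEIGHTS):
    """Weighted sum of 0/1 health indicators; missing indicators count as 0."""
    return sum(w * indicators.get(name, 0) for name, w in weights.items())


def decide(score, approve_below=0.3, reject_above=0.7):
    """Map a transaction score in [0, 1] to a decision using two assumed cutoffs."""
    if score < approve_below:
        return "approve"
    if score > reject_above:
        return "reject"
    return "investigate"


patient = {"diabetes": 1, "asthma": 1}
print(risk_score(patient))
print([decide(s) for s in (0.05, 0.5, 0.9)])
```

Keeping the score and the policy as separate functions mirrors the division of labor in the text: the model produces the score, and the institution chooses the cutoffs.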

• There has been a remarkable amount of work on the statistical and machine learning techniques that underlie these applications: classification models, lasso and ridge regressions, data mining, text mining, etc.
• A conceptual overview of building predictive models: there are N observations and K variables, where K is very large, often larger than N. With these types of data, we often get a perfect fit within sample but poor prediction out of sample. One solution is the lasso, which penalizes the size of the coefficients.
• Machine learning models assume a stable environment, but this assumption may not be satisfied if individuals respond to the change (the Lucas critique).
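
To make the lasso remark above concrete, here is a minimal coordinate-descent sketch for the objective (1/2N)·||y − Xb||² + lam·||b||₁. It is an illustration under stated assumptions, not the method used by any application named in the text. The toy data, the penalty `lam=0.5`, and the function names are made up, and the columns of `X` are not standardized as practical implementations usually require.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2N)*||y - X b||^2 + lam*||b||_1."""
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        for j in range(k):
            rho = 0.0  # inner product of column j with the partial residual
            z = 0.0    # squared norm of column j
            for i in range(n):
                pred_without_j = sum(
                    X[i][m] * beta[m] for m in range(k) if m != j
                )
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return beta


# Toy data (assumed): y depends strongly on column 0 and only weakly on column 1.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2.1, -1.9, 1.9, -2.1]
beta = lasso_cd(X, y, lam=0.5)
print(beta)
```

With this penalty the weak coefficient is set exactly to zero and the strong one is shrunk toward zero. This is how the lasso gives up some in-sample fit in exchange for a sparser model.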

• Governments collect, or could collect, large amounts of detailed micro-level data. These data can be used for tracking economic activity, evaluating policies, fighting fraud, controlling risk, supporting decision making, and developing new information services and products (alerting people to fraud, or informing consumers about the consequences of decisions such as taking out loans, purchasing houses, and retiring).
• Big data provides a detailed snapshot of economic activity (almost) in real time. Big data therefore allows better measurement of economic effects and outcomes, helps pose new sorts of research questions, and enables novel research designs that can inform us about the consequences of different economic policies and events.
• Big data may change the way economists approach empirical questions and the tools they use to answer them.
• For example, economists have not embraced some of the data mining tools. The reason is that economists do not want to shift away from the single-covariate causal effects framework. In the minds of economists, there is a sharp distinction between predictive modeling and causal inference, and as a result statistical learning approaches have little to contribute.
• Big data may change that.
• Big data enables novel research designs.
• Example 1. Chetty et al. (2012) study the long-term effects of better teaching. The study combines data on 2.5 million New York schoolchildren with their earnings as adults 20 years later. The main question is whether the students of teachers who have higher "value-added" in the short run subsequently have higher earnings as adults, where a teacher's value-added is measured by the amount by which test scores improve. The results are striking. The authors find that replacing a teacher in the bottom 5% with an average teacher raises the lifetime earnings of students by a quarter of a million dollars in present value terms.
• Example 2. Internet commerce. Use detailed browsing and purchase data on the universe of eBay customers (100 million in the United States alone) to study the effect of sales taxes on internet commerce.

Aggregated data on state-to-state trade flows provide relatively standard estimates of tax elasticities, but we also use the detailed browsing data to obtain more micro-level evidence on tax responsiveness. Specifically, we find groups of individuals who clicked to view the same item, some of whom were located in the same state as the seller, and hence taxed, and some of whom were not, and hence went untaxed. We compare the purchasing propensities of the two groups, doing this for many thousands of items and millions of browsing sessions. We find significant tax responsiveness, and evidence of sub
