Automated Machine Learning: Let Machines Work for You

Global Quantitative & Derivatives Strategy
05 December 2019

Automated Machine Learning
Make machine learning work for you: we explore the implications of a dividend cut

We write this research note on the back of our Q4 2019 DeepFin Investor Tutorial on automated machine learning. This is an exciting topic which is growing in demand, and we explore its capabilities and practical applications. Specifically, in this research note we aim to find out the implications of a dividend cut.

Automated Machine Learning (AutoML)
AutoML is the process of automating aspects of the data science workflow. AutoML allows you to shift the focus away from how to do machine learning and onto the output generated. Over the past few years, new sources of competitive advantage have been emerging with the availability of alternative data sources and the application of new quantitative machine learning techniques to analyse these data. Further, the demand for machine learning and AI solutions has been growing at an exponential rate. AutoML aims to a) meet the demand for such solutions and b) efficiently and effectively search the growing space of available machine learning models.

AutoML can significantly speed up your machine learning analysis
On the practical application, we asked the question: given a dividend cut, can we understand what drives forward performance? We analysed 53k observations of stocks in GDM from 2008 to 2018. Using AutoML we prepare, build, run and receive results from 71 models in under 30 minutes(!). A typical data scientist coding and running the same models manually would take in excess of 80 hours. We enhanced the simulation through a grid-search and built 4,241 models: running these models took 9 hours. Such a task would be time-consuming and beyond reasonable complexity for a human to process. Executing at scale allows AutoML to run models in a fraction of the time compared to traditional methods, so the focus stays on outcomes rather than on the data science and machine learning processes involved.

AutoML promotes inspection and avoids black-boxes
Machine learning models, particularly those that chain together multiple models, are often labelled as black-boxes, lacking interpretability and explainability. AutoML actively encourages model evaluation and selection, providing a number of tools to review model accuracy and performance, understand how each independent model works, and track out-of-sample performance on a leaderboard. In our analysis we use metadata (meta-ML) to analyse the machine learning models' output. AutoML empowers individuals to focus on the value-add reasoning and rationale and not the mundane aspects of model development and testing.

AutoML suggests these are the drivers of performance following DPS cuts
In answer to our question on what drives stock performance following a dividend cut, AutoML suggests that Price Momentum, Volatility and Price/Book are important factors (see cover charts). Most stocks that have seen their consensus dividend expectation cut have falling LT Price Momentum. The bulk of stocks whose consensus dividend expectations have been cut are also cheap on Book Value.

Global Quantitative Strategy
Ayub Hanif, PhD AC, (44-20) 7742-5620, ayub.hanif, Bloomberg JPMA HANIF, J.P. Morgan Securities plc
Khuram Chaudhry AC, (44-20) 7134-6297, khuram.chaudhry, Bloomberg JPMA CHAUDHRY, J.P. Morgan Securities plc
Berowne Hlavaty AC, (61-2) 9003-8602, berowne.d.hlavaty, J.P. Morgan Securities Australia Limited

Global Head of Quantitative and Derivatives Strategy
Marko Kolanovic, PhD, (1-212) 622-3677, marko.kolanovic, J.P. Morgan Securities LLC

Figure 1: What drives stock performance after a dividend cut?
Source: J.P. Morgan Macro QDS, DataRobot Inc.

Figure 2: Falling momentum stocks more likely to face a dividend cut
Source: J.P. Morgan Macro QDS, DataRobot Inc.

See page 35 for analyst certification and important disclosures, including non-US analyst disclosures. J.P. Morgan does and seeks to do business with companies covered in its research reports.

Table of Contents
Towards Automated Machine Learning: the theory
Automated Machine Learning in practice: AutoML with DataRobot
Problem statement
The data
Hands-on Modelling I: the naive approach
Hands-on Modelling II: an informed approach
The importance of the right question
A sample model from the Grid Search
Limitations and enhancements
Conclusions
Appendix I: Simple Definitions Style & Factor
Appendix II: Feature Association Matrix
Appendix III: Alternate Model Frameworks and Theories Considered
Appendix IV: Model Evaluation and Selection - Leaderboards
Appendix V: Initial run candidate model development workflow
Appendix VI: Meta-ML model development workflow
Appendix VII: Sample grid-search model development workflow

Towards Automated Machine Learning: the theory
Machine learning is fundamentally about extracting and generalising knowledge from information. While many traditional investors don't have a good understanding of the types of data available, and feel uneasy about adopting machine learning methods, we pointed out in our Big Data Primer that these are not new concepts. On a limited basis, many investors already deal with alternative datasets and some form of machine learning. For instance, Sam Walton, founder of Walmart, used airplanes in the 1950s to fly over parking lots and count cars in order to assess real estate investments. The current extensive use of satellite imaging is a more technologically advanced, scalable and broadly available extension of the same idea.

Beyond strategies based on alternative risk premia, a new source of competitive advantage is emerging with the availability of alternative data sources as well as the application of new quantitative machine learning techniques to analyse these data. With the increasing prominence of these techniques, a typical hard stop is found in sourcing machine learning talent. By many estimates the global demand for machine learning and AI solutions exceeds the supply of data scientists in the world (see Figure 3).

The global demand for machine learning & AI solutions greatly exceeds the production capacity of all the data scientists in the world, and this gap is growing exponentially.

Figure 3: The demand for machine learning and AI
Source: J.P. Morgan Quantitative and Derivatives Strategy, DataRobot Inc.

Moreover, a potentially more acute problem arises in exploring and evaluating the space of available machine learning models.

Traditionally machine learning is formed of two areas: supervised learning and unsupervised learning. Supervised learning attempts to learn salient features and focuses on accurate prediction, whilst unsupervised learning aims to parsimoniously explain data. Given the maturity of their respective fields, there is a vast array of techniques underlying each. A top-level, topographical description of these areas and machine learning can be found in Figure 4.

Figure 4: Machine learning topography
Source: 2012 David Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press. Reproduced with permission.

There is no question that machine learning techniques have yielded some spectacular results when applied to problems of image and pattern recognition, natural language processing, and automation of complex tasks such as driving a car. What is the application of machine learning in finance, and how do these methods differ from each other? How do we decide which machine learning model to use? Do we use one model or blend models together?

Typically, to answer these questions you would need to test a suite of models, tune the parameters, rinse and repeat.
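As a rough illustration of what "test a suite of models, tune the parameters, rinse and repeat" involves when done by hand, the sketch below fits a few toy regressors to synthetic data and ranks them by holdout error. Everything in it is an assumption made for illustration: the data-generating function, the candidate models (a mean baseline, a straight-line fit and k-nearest-neighbour regression with three values of k) and the 70/30 split are not taken from the note, and the helper names (`fit_mean`, `fit_linear`, `fit_knn`, `build_leaderboard`) are hypothetical.

```python
import math
import random


def mse(model, xs, ys):
    """Mean squared error of a fitted model (a callable x -> prediction)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Ordinary least squares for a single feature (closed form)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return lambda x: mean_y + slope * (x - mean_x)


def fit_knn(k):
    """Return a fit function for k-nearest-neighbour regression."""
    def fit(xs, ys):
        points = list(zip(xs, ys))

        def predict(x):
            # Average the targets of the k training points closest to x.
            nearest = sorted(points, key=lambda p: abs(p[0] - x))[:k]
            return sum(y for _, y in nearest) / len(nearest)

        return predict
    return fit


def build_leaderboard(seed=0, n=200):
    """Fit every candidate and rank by holdout (out-of-sample) error."""
    rng = random.Random(seed)
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, 0.5) for x in xs]  # synthetic target

    split = int(0.7 * n)  # assumed 70/30 train/holdout split
    train_x, train_y = xs[:split], ys[:split]
    hold_x, hold_y = xs[split:], ys[split:]

    candidates = {
        "mean baseline": fit_mean,
        "linear": fit_linear,
        "kNN (k=1)": fit_knn(1),   # k is the tuning parameter being searched
        "kNN (k=5)": fit_knn(5),
        "kNN (k=25)": fit_knn(25),
    }
    rows = []
    for name, fit in candidates.items():
        model = fit(train_x, train_y)
        rows.append((name, mse(model, train_x, train_y), mse(model, hold_x, hold_y)))
    # Best holdout error first.
    return sorted(rows, key=lambda row: row[2])


leaderboard = build_leaderboard()
for name, in_sample, holdout in leaderboard:
    print(f"{name:<14} in-sample MSE={in_sample:.3f}  holdout MSE={holdout:.3f}")
```

The k=1 model reproduces its training data exactly, so its in-sample error is zero while its holdout error is not. That is why the ranking here uses holdout error rather than in-sample fit. Each extra model family or parameter value adds another pass through this loop, which is the repetitive part of the work.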

While applying a machine learning method to a dataset is a science, choosing and calibrating a specific model has elements of art. Computer scientists often refer to the No Free Lunch Theorem, which states that there is no one machine learning algorithm that gives the best result when applied to different types of data. In particular, some models may overfit the data: they may look good on a backtest but perform poorly on out-of-sample data. Stability of out-of-sample forecasts is a challenge often encountered with Risk Premia strategies. Big data and machine learning strategies are not exempt from this challenge.

Figure 5: Variance-Bias Tradeoff
Source: J.P. Morgan Macro QDS

The science of machine learning is relatively straightforward, especially with the prominence of machine learning toolboxes and packages in popular languages. The art of machine learning is selecting a model that will find the optimal balance between in-sample error and model instability (the tradeoff between model bias and model variance). In almost all cases of financial forecasts, we can model the future only to a certain degree. There are always random, idiosyncratic events that will add to the error of our forecasts. The quality of our forecast will largely be a function of model complexity: more complex models will reduce in-sample error, but will also result in higher instability. This is illustrated in Figure 5.

Enter automated machine learning (AutoML). AutoML is a relatively novel and impactful area of machine learning which aims to empower users to focus on the value-add rather than the minutiae of machine learning model selection. AutoML models take input data, determine the type of problem at hand, run machine learning models on the data, and benchmark the output, providing a list of candidate solutions. These final models can then be analysed, tuned and interpreted to see whether we have meaningful output. The aim is to focus the data scientist's time on, for example, the economic reasoning behind the model rather than on building and testing models.

To meet the demand for machine learning and AI solutions, we need to become more productive, which can be done using automated solutions.

Figure 6: Automated machine learning offers solution to supply / demand gap
Source: J.P. Morgan Quantitative and Derivatives Strategy, DataRobot Inc.

So what can be automated? Taking a look at a typical data science workflow (Figure 7), there are a number of tasks which can be automated. Primarily these are items to do with model development and analysis, insights and validation. Elements which are fully automatable include:
• Pre-processing
• Model selection
• Parameter tuning
• Derive insights
• Out-of-sample testing

Additionally, there is scope for some automation in the data step, though it needs human input / oversight:
• Exploration
• Extraction
• Merging and joining
• Transformation
• Aggregation
• Munging, wrangling, Ninja-ing
• Feature engineering
• Diagnostics
• Communicate results
• Documentation
• Internal + external model validation
• Monitoring

Figure 7: A typical data science workflow
While there are discussions as to the order and recursion, the workflow captures a typical stream.
Source: J.P. Morgan Quantitative and Derivatives Strategy, DataRobot Inc.

We used the DataRobot AutoML platform in our Q4 19 DeepFin Investor Tutorial. More details about DataRobot can be found on the following page.

DataRobot simplifies model development by performing a parallel heuristic search for the best model or ensemble of models, based on both the characteristics of the data and the prediction target. While some machine learning techniques tend to consistently outperform others, it is rarely possible to say in advance which will perform best for a given business problem. Therefore, during the modelling process, DataRobot develops dozens of independent models, exposes the details of how these models were built and how they perform, and enables the user to select the best model for the particular problem being addressed.

The fundamental workflow within DataRobot for model development is as follows:

Figure 8: DataRobot Model Development - Theoretical framework and methodology
Source: J.P. Morgan Quantitative and Derivatives Strategy, DataRobot Inc.

Automated Machine Learning in practice: AutoML with DataRobot
We have hosted a series of global conferences and DeepFin tutorials in different regions to a growing audience of portfolio managers, asset owners and analysts keen to understand the investment implications of ML, AI, Alt-Data and other technologies for their stocks, sectors, portfolios and asset classes. The most recent session in EMEA saw the data science team at DataRobot discuss and walk through their AutoML solutions, with participants building machine learning models hands-on.

The 4 hour DeepFin workshop looked at how recent developments in automated machine learning and interpretability can help quantitative and fundamental investors build, test and understand powerful AI models that support their investment process, and contrasted these approaches with traditional quantitative research.

Using DataRobot, participants built their own machine learning models to address a quantitative finance problem. In the first part of this session, we built a set of initial models, picked a promising model, then analysed and interpreted it. Participants incorporated insights from the first round of modelling into a second round of model building, picked a final model, interpreted it and considered how we could deploy this model and incorporate it into our investment processes.

Problem statement
The problem we tackle in the session is: given a dividend cut, what drives share price performance?

Figure 9: DeepFin problem statement
Source: J.P. Morgan Quantitative and Derivatives Strategy, DataRobot Inc.

To work on the problem, i.e. to feed it into the machine, we need to translate the problem statement into the language of data science. Which inputs are we searching on for "drives"? How are we measuring share price performance? Is it in local currency or a common currency, e.g. USD? Are we using absolute or relative returns? How do we define "after"? What horizon is "after": 1M, 3M or 6M performance? What dividend cut are we looking at? Are we looking at announced cuts or expectations? How much of a cut are we considering? A summary of these considerations is illustrated in Figure 10.

Figure 10: Translating the problem statement
Source: J.P. Morgan Quantitative and Derivatives Strategy, DataRobot Inc.

For the course of the tutorial, we looked at stocks which had a month-on-month cut in their FY0 consensus DPS expectation (I/B/E/S) and their subsequent 3M forward local currency excess return (rel. to local MSCI country benchmark).

The data
We took 53k observations of stocks in GDM from Jan 08 to Jul 18. For each observation we have stock reference data, categorical data (e.g. MSCI GICS classifiers), typical risk factors [1] and the target variables of 3M fwd. absolute return and 3M fwd. relative local currency excess return.

[1] Detailed definitions can be found in Appendix I.

Figure 11: Dividend cut data description - 61 features, 2 potential targets
Source: J.P. Morgan Quantitative and Derivatives Strategy

The first step in the AutoML process is to load the data. Once loaded, we can explore the data in the platform and are ready to run the analysis and build models.

Taking a look at the data, we can see some interesting relationships between features and subsequent performance by inspecting histograms. In Appendix II we detail the Feature Associations matrix, which provides information on association strength between pairs of numeric and categorical features.

Taking a look at the change in consensus mean price target, the data suggests that consensus becoming increasingly bearish on the near-term outlook of a stock is a good predictor of return following a dividend cut.

Weak stock price performance (12M price momentum) into a dividend cut is a good predictor for decent performance after the cut.

A quick glance at the relationship between volatility and performance following a dividend cut would suggest that the higher the vol into a cut, the greater the performance post. However, it is important to note the long right tail of the histogram: the number of observations behind this perceived relationship is small, so our level of conviction is weaker.

Figure 12: 6 month Change in Target Price and Avg. 3M Fwd. Excess Return
Source: J.P. Morgan Quantitative and Derivatives Strategy, DataRobot Inc.

Figure 13: 12M Price Momentum and Avg. 3M Fwd. Excess Return
Source: J.P. Morgan Quantitative and Derivatives Strategy, DataRobot Inc.

Figure 14: Volatility and Avg. 3M Fwd. Excess Return
Source: J.P. Morgan Quantitative and Derivatives Strategy, DataRobot Inc.

Hands-on Modelling I: the naive approach
We run the AutoML process: 71 models in 30 minutes! (In DataRobot this was done by simply pressing the Start button, i.e. one click.) To put that into perspective, it would take a reasonably experienced data scientist approximately an hour per model to code up and hit run (approximately 71 hours), not accounting for the time taken to select the models and, crucially, to wait for the results.

Using AutoML we prepare, build, run and receive results from 71 models in under 30 minutes. A typical data scientist coding and running the same models manually would take in excess of 80 hours.

Once we have completed the Model Development stage, we move on to the Model Evaluation and Selection stage of the AutoML workflow (from Figure 8).
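To make the target definition from the problem statement concrete, the sketch below flags a month-on-month cut in consensus DPS for a single stock and computes the 3-month excess return that follows it. It is a minimal sketch under stated assumptions: excess return is taken as the compounded stock return minus the compounded benchmark return, months are list positions, and the function names and input numbers are hypothetical. The note does not specify how its returns were compounded or aligned, so this should not be read as the authors' data pipeline.

```python
def compound(returns):
    """Compound a sequence of simple periodic returns into one return."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0


def dividend_cut_observations(dps, stock_ret, bench_ret, horizon=3):
    """Return (month_index, forward_excess_return) for each DPS cut.

    dps[t]       consensus DPS expectation at the end of month t
    stock_ret[t] stock return during month t (local currency, assumed)
    bench_ret[t] benchmark return during month t (assumed country index)
    A cut is flagged when dps[t] < dps[t - 1]. The forward window covers
    months t+1 .. t+horizon, so cuts too close to the end are skipped.
    """
    observations = []
    for t in range(1, len(dps) - horizon):
        if dps[t] < dps[t - 1]:
            window = slice(t + 1, t + 1 + horizon)
            fwd_stock = compound(stock_ret[window])
            fwd_bench = compound(bench_ret[window])
            # Assumption: excess return is the simple difference.
            observations.append((t, fwd_stock - fwd_bench))
    return observations


# Hypothetical monthly inputs for one stock (illustrative numbers only).
dps = [1.00, 1.00, 0.90, 0.90, 0.90, 0.90, 0.80, 0.80]
stock_ret = [0.00, 0.01, -0.05, 0.10, 0.00, 0.00, -0.02, 0.01]
bench_ret = [0.00, 0.00, 0.00, 0.02, 0.00, 0.00, 0.00, 0.00]

observations = dividend_cut_observations(dps, stock_ret, bench_ret)
for month, excess in observations:
    print(f"cut at month {month}: 3M forward excess return = {excess:.2%}")
```

In this toy series the cut at month 2 produces one observation, while the later cut at month 6 is dropped because fewer than three months of forward returns exist. Stacking such observations across stocks and months, each joined to its feature values at the time of the cut, would give a table of the kind the note describes loading into the platform.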
