Analysis of Big Data: an Overview and Demonstrations

Speaker: Academician Ruey S. Tsay (蔡瑞胸), H.G.B. Alexander Professor, Booth School of Business, University of Chicago
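Chair: Director Kong-Pin Chen (陳恭平), Distinguished Research Fellow and Director, Research Center for Humanities and Social Sciences, Academia Sinica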
Organizer: Research Center for Humanities and Social Sciences, Academia Sinica
Time: Tuesday, August 16, 2016, 10:00 a.m. to 12:00 p.m.
Related link: http://www.rchss.sinica.edu.tw/app/news.php?Sn=1795
Venue: Conference Room 1, Research Center for Humanities and Social Sciences, Academia Sinica
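About the speaker: Academician Ruey S. Tsay (蔡瑞胸) is the H.G.B. Alexander Professor at the Booth School of Business, University of Chicago, and an internationally renowned econometrician and statistician. He is a Fellow of the American Statistical Association, the Institute of Mathematical Statistics, and the Royal Statistical Society. His research interests are financial econometrics, economic forecasting, and risk management.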

Abstract: Big data are common in many scientific fields and have attracted much research interest in machine learning, computer science, optimization, and statistics. The goal of the analysis is to extract useful information from massive data effectively and in a timely manner. In this talk, we introduce methods available for analyzing big data, especially dependent data, and discuss their pros and cons. We compare the concepts of sparsity and parsimony in modeling and emphasize the role of regularization. Examples are used to demonstrate the analysis. The methods discussed include (a) penalized likelihood methods for statistical modeling, such as LASSO regression; (b) tree-based methods for classification and prediction, such as bagging, boosting, and random forests; (c) methods for discriminant analysis, such as support vector machines; and (d) methods for analyzing big dependent data. The demonstrations are carried out in R. Cross-validation (both leave-one-out and K-fold) is used to select the penalty (or smoothing) parameter. We also discuss some limitations of big data analysis.
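The abstract mentions selecting the LASSO penalty by leave-one-out or K-fold cross-validation. The talk's demonstrations use R; what follows is only a minimal, dependency-free Python sketch of that selection loop, not the speaker's code. It assumes squared-error loss, no intercept (predictors and response treated as centered), a plain cyclic coordinate-descent LASSO solver, and a hypothetical simulated data set; all function names (`lasso_cd`, `cv_error`, `select_lambda`, `simulate`) are illustrative.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, max_sweeps=500, tol=1e-8):
    """LASSO by cyclic coordinate descent (illustrative sketch).

    Minimizes (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 with no intercept,
    so X and y are assumed to be centered beforehand.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - X b for the current b (starts at b = 0)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column keeps coefficient 0
            # (1/n) * x_j' (partial residual that excludes predictor j)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep residual in sync with b
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def kfold_indices(n, k, seed=0):
    """Randomly split 0..n-1 into k folds; k = n gives leave-one-out."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cv_error(X, y, lam, k, seed=0):
    """Mean squared prediction error of the LASSO fit over k held-out folds."""
    n = len(y)
    total = 0.0
    for test in kfold_indices(n, k, seed):
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        b = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
        for i in test:
            pred = sum(xij * bj for xij, bj in zip(X[i], b))
            total += (y[i] - pred) ** 2
    return total / n


def select_lambda(X, y, lambdas, k, seed=0):
    """Return the penalty with the smallest CV error, plus all CV errors."""
    errors = {lam: cv_error(X, y, lam, k, seed) for lam in lambdas}
    return min(errors, key=errors.get), errors


def simulate(n, p, seed=0):
    """Hypothetical sparse design (roughly centered by construction):
    only the first two of p predictors affect y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0.0, 0.5) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = simulate(60, 5, seed=1)
    grid = [0.01, 0.1, 0.5, 1.0, 2.0]
    best5, _ = select_lambda(X, y, grid, k=5)           # 5-fold CV
    best_loo, _ = select_lambda(X, y, grid, k=len(y))   # leave-one-out CV
    print("5-fold choice:", best5, "LOO choice:", best_loo)
    print("coefficients at 5-fold choice:", lasso_cd(X, y, best5))
```

Setting `k` equal to the sample size turns K-fold into leave-one-out, at the cost of one refit per observation for every candidate penalty. In R, `cv.glmnet` from the glmnet package performs a comparable K-fold search over a path of penalty values.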