Speaker: Dr. Haixun Wang, Senior Researcher, Microsoft Research Asia
Haixun Wang, Lead Researcher
Haixun joined Microsoft Research Asia in Beijing, China in 2009, where he leads research in semantic search, graph data processing systems, and distributed query processing. Before joining Microsoft, he was a research staff member at the IBM T. J. Watson Research Center for 9 years. He was Technical Assistant to Stuart Feldman (Vice President of Computer Science of IBM Research) from 2006 to 2007, and Technical Assistant to Mark Wegman (Head of Computer Science of IBM Research) from 2007 to 2009. He received his Ph.D. in computer science from the University of California, Los Angeles in 2000. He has published more than 120 research papers in refereed international journals and conference proceedings. He was PC Vice Chair of KDD’10, ICDM’09, SDM’08, and KDD’08, and he has served as demo/workshop/sponsor Chair of various conferences, including SIGMOD’08, ICDM’08, ICDE’09, and ICDM’11. He serves on the editorial boards of IEEE Transactions on Knowledge and Data Engineering (TKDE) and the Journal of Computer Science and Technology (JCST).
In this talk, I will discuss how to obtain useful information and knowledge from web data and search log data, and how to use such data to build killer applications. Web-scale data and the so-called petabyte age are changing many aspects of business practice and scientific research. On the science frontier, they challenge methodologies in many established fields, including statistics, machine learning, and natural language processing. However, a fundamental challenge we face is that web data is noisy and is only good for “head queries”. I will use some results from a system we recently built to demonstrate the challenges and opportunities in this field.