Big Data vs. Right Data - Institute for Experiential AI

Big Data vs. Right Data

Ricardo Baeza-Yates

Hosted as part of NEOACM’s Remote Distinguished Speaker Lecture

Big data is a fashionable topic nowadays, regardless of what people mean when they use the term. But being big is just a matter of volume, and there is no clear agreement on the size threshold. Moreover, it is easy to capture large amounts of data using a brute-force approach. So the real goal should not be big data; instead, we should ask ourselves, for a given problem, what the right data is and how much of it is needed. For some problems this will imply big data, but for most problems less data is needed. Hence, this presentation covers the opportunities and the challenges behind big data. Regarding the challenges, it explores the trade-offs involved in the main problems that arise with big data: scalability, redundancy, bias, the filter bubble, and privacy.

Biography

Ricardo Baeza-Yates is Director of Research at the Institute for Experiential AI of Northeastern University. He is also a part-time Professor at Universitat Pompeu Fabra in Barcelona and Universidad de Chile in Santiago. Before that, he was the CTO of NTENT, a semantic search technology company based in California, and prior to these roles he was VP of Research at Yahoo Labs, based in Barcelona, Spain, and later in Sunnyvale, California, from 2006 to 2016. He is co-author of the best-selling textbook Modern Information Retrieval, published by Addison-Wesley in 1999 and 2011 (2nd ed.), which won the ASIST 2012 Book of the Year award. From 2002 to 2004 he served as an elected member of the Board of Governors of the IEEE Computer Society, and between 2012 and 2016 as an elected member of the ACM Council. Since 2010 he has been a founding member of the Chilean Academy of Engineering. He was named ACM Fellow in 2009 and IEEE Fellow in 2011, among other awards and distinctions. He obtained a Ph.D. in CS from the University of Waterloo, Canada, in 1989, and his areas of expertise are web search and data mining, information retrieval, bias and ethics in AI, data science, and algorithms in general.