Juliana Freire, Professor of Computer Science and Engineering at NYU-Poly, addressed the practical difficulties that arise in Big Data analysis. “Big Data is not a new concept,” she said. “What is new is its inexpensive, easy accessibility and the growing number of data enthusiasts.” Until recently, data access was limited to experts; today, it is open to everyone.
According to Freire, the problem is not the data volume but “the human in the loop. The line from data to knowledge is not a straight line; it requires human interpretation.” Data integration and exploration are complex and time-consuming processes. Tedious tasks should therefore be automated to empower data enthusiasts, who often lack formal expertise yet want to make sense of the data. To date, the right tools for that kind of usability are still missing. Although data visualization tools such as Google Fusion Tables are powerful instruments for data exploration, they require better integration with data management systems.
Gerhard Weikum, Scientific Director at the Max Planck Institute for Informatics, presented three dimensions of Big Data analysis, length, width, and depth, using sustainable traffic as an example. Length refers to collecting and comparing data; in his example, accumulating and comparing bike traffic data from different cities provides an initial overview. Yet “it is problematic that the sources vary in quality,” said Weikum, so the data expert must carefully select the useful content. The second dimension, width, concerns discovering and integrating the data: structured and unstructured content are matched and linked together. In the traffic example, data scientists search for additional information on specific questions, such as whether cyclists wear helmets, drawing on additional data pools such as news, blogs, or social media. Finally, analytical interpretation of the data along the depth dimension can yield deeper insights, for instance the energy cost per commute in different cities, and long-term guidance.
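A minimal Python sketch of how these three dimensions might play out in practice; all city names, trip counts, text snippets, and the per-trip energy constant below are hypothetical illustrations, not data from the talk:

    # Illustrative sketch of the length/width/depth dimensions described above.
    # All figures, city names, and text snippets are made up for illustration.

    # Length: collect and compare bike-traffic data from different cities.
    daily_bike_trips = {"Amsterdam": 490_000, "Copenhagen": 410_000, "New York": 180_000}

    # Width: link the structured counts with unstructured content (news, blogs,
    # social media) to answer a side question, e.g. helmet use.
    posts = [
        ("Amsterdam", "Most riders here skip the helmet on short trips."),
        ("Copenhagen", "Helmet campaigns are visible all over the city."),
        ("New York", "Commuters increasingly wear a helmet in traffic."),
    ]
    helmet_mentions = {
        city: sum(1 for c, text in posts if c == city and "helmet" in text.lower())
        for city in daily_bike_trips
    }

    # Depth: interpret the integrated data, e.g. a per-commute energy figure.
    KWH_PER_TRIP = 0.05  # assumed constant, purely illustrative
    for city, trips in sorted(daily_bike_trips.items()):
        print(
            f"{city}: {trips:,} trips/day, "
            f"{helmet_mentions[city]} helmet mention(s), "
            f"~{trips * KWH_PER_TRIP:,.0f} kWh/day"
        )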
Raghu Ramakrishnan’s presentation focused on the spectrum of Big Data applications and the ethical and legal issues they raise. “Big Data is best thought of in terms of what it enables,” said the Microsoft Technical Fellow. It transforms business models and scientific disciplines and enables unprecedented applications in every area. Heterogeneous data, real-time analysis, and ‘instant on’ cloud access are innovations resulting from this technology. Used on the web for applications such as targeted marketing, or within companies to monitor user login times, Big Data helps data experts analyze recurring patterns and predict future behavior.
“Data is the new gold; it is a powerful tool that offers many possibilities.” But according to Ramakrishnan, the pace and low cost of its development are not the marks of a normal technological evolution: “it is a technology revolution.” As with anything that develops this quickly, the resulting, nearly unlimited data gathering is difficult to control, and regulation still lags behind the legal and ethical issues it raises.
The panel discussion was moderated by Claudio T. Silva, Head of Disciplines at the Center for Urban Science and Progress (CUSP) and Professor of Computer Science and Engineering at the Polytechnic Institute of New York University (NYU-Poly). Dr. Joann Halpern, Director of the German Center for Research and Innovation, provided introductory remarks. The Max Planck Institute for Informatics, NYU-Poly, and the German Center for Research and Innovation co-hosted the event.