Everyone using the internet nowadays has information literally at their fingertips: they type search terms into one of the many search engines available online. Many of those who are connected find and consume information through the results presented by Google, Bing or some other variant.
This “information-at-your-fingertips” is the result of a search engine company (Google or Microsoft) crawling different information sources (the World Wide Web, books, magazines, periodicals and others), then collating, ranking and presenting these depending on what you are looking for. This processing and presentation of massive amounts of information is the domain of Big Data.
This capability has allowed Google to harness massive amounts of information and present it in new ways to produce what is now known as a data product: insights, or goods and services with added value. Google, for example, was able to predict the onset of flu in the United States by observing and collating search terms for flu symptoms and remedies.
In our research at the National Institute of Physics, I have students studying large real-world data sets such as ten years of hospital admissions, climate records, bird songs, laser signals and even the bills in the House of Representatives since the 8th Congress. They analyze these with tools such as network theory, signal and time series analysis, and the usual methods of statistics and correlation.
This line of research has turned out to be very interesting, and we have developed methods that can successfully predict (within certain limits) future values and trends. My students can also show causal relationships between several data sets, i.e. which causes what, based on the historical data that they collect.
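To give a flavor of this kind of analysis, here is a minimal sketch of lagged correlation, one simple way to check whether one record tends to lead another. It is only an illustration, not necessarily the method my students use: the `rainfall` and `admissions` numbers are made up, the function names are hypothetical, and a strong lagged correlation suggests a lead-lag pattern but does not by itself prove causation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def lagged_correlation(x, y, max_lag):
    """Correlate x[t] with y[t + lag] for every lag from 0 to max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        x_part = x[: len(x) - lag]  # drop the last `lag` points of x
        y_part = y[lag:]            # drop the first `lag` points of y
        result[lag] = pearson(x_part, y_part)
    return result


# Made-up data (assumption): admissions echo rainfall two time steps later.
rainfall = [3, 8, 2, 9, 4, 7, 1, 6, 5, 10, 2, 8]
admissions = [20, 20] + [10 + 2 * r for r in rainfall[:-2]]

corrs = lagged_correlation(rainfall, admissions, max_lag=4)
best_lag = max(corrs, key=corrs.get)  # lag with the strongest correlation
print(best_lag, round(corrs[best_lag], 3))
```

With real records one would also have to deal with trends, seasonality and missing values before reading anything into the best lag.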
This predictive capacity, and the new insights we can gain from it, are what make Big Data (with capital letters) an exciting field to study. Beyond prediction, we can find not only correlations between events but also meaningful relations that are not immediately obvious when you look at things one by one.
One problem that we face is that we don’t have a lot of data online. True, thirty-five percent of Filipinos are connected online, and more than nine out of ten go on social networking sites such as Facebook, but Big Data does not come only from social media. We can gauge the cyber-zeitgeist by following everyone’s posts and tweets, but if we rely on social media alone we ignore a large bulk of information that could enhance our insights.
Data on health, housing and the economy are usually already reported as aggregates. Even agricultural output, climate information and a host of other potentially useful data are not accessible. One of my students had to encode eight years’ worth of hospital admissions by hand in order to analyze long-term climate correlations. It is only in recent years that health institutions have been going digital and putting this information into a computer-usable format.
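A small sketch shows why aggregates alone are limiting. The monthly counts below are invented for illustration (they are not actual hospital data) and the helper names are hypothetical. The two years have the same annual total, so a report that publishes only yearly figures would make them look identical, even though admissions peak in very different months.

```python
from collections import defaultdict

# Invented monthly admission counts (assumption, not real data).
counts_2010 = [50, 40, 30, 30, 40, 60, 90, 120, 90, 60, 50, 40]
counts_2011 = [120, 90, 60, 50, 40, 30, 30, 40, 50, 60, 40, 90]

# Flatten into (year, month, count) records, as raw data might be stored.
monthly = [(2010, m, c) for m, c in enumerate(counts_2010, start=1)]
monthly += [(2011, m, c) for m, c in enumerate(counts_2011, start=1)]


def annual_totals(records):
    """Sum the counts per year: this is all an aggregate report would show."""
    totals = defaultdict(int)
    for year, _month, count in records:
        totals[year] += count
    return dict(totals)


def peak_months(records):
    """Find the busiest month of each year, which needs the raw monthly data."""
    best = {}
    for year, month, count in records:
        if year not in best or count > best[year][1]:
            best[year] = (month, count)
    return {year: month for year, (month, _count) in best.items()}


totals = annual_totals(monthly)
peaks = peak_months(monthly)
print(totals)  # identical yearly totals
print(peaks)   # different peak months
```

Any study of seasonal effects, such as a link between climate and admissions, needs the finer-grained records rather than the yearly sums.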
This is where Open Data comes in. Government should make information available not only because it promotes transparency, as in the proposed Freedom of Information bills in Congress, but also because it would promote a more objective way to look at how government works. Putting the budget online is one step, but we hope that all data, transactions included, will become accessible and usable.
Aside from the FOI bill, we need proposals such as the Open Source Bill (HB 1473) of Bayan Muna Representatives Carlos Isagani Zarate and Neri J. Colmenares, together with ACT Partylist Representative Antonio Tinio, which would require government to use a common data format that can be opened, processed and presented by open source and proprietary programs alike.
Of course, with Big Data we face other important issues such as privacy. In the hospital example above, we had to go through the hospital’s ethics board for approval and had to follow strict privacy protocols in order to protect patient information. Google already has an uncanny way of “learning” what you like and presenting it as you search.
The action in Luneta last August 26 was proof that when people act on information presented to them, it can produce meaningful results. We only have to realize how we can harness this information, Big Data, and make it accessible not only to those online but also to the 65% who are offline, in order to help everyone improve their situation in life. After all, poverty and social change are real-world, offline problems that Big Data should be helping to address.