Data and privacy

    Ma. Lourdes N. Tiquia

    THE Commission on Elections (COMELEC) will have to come clean on the brewing issue of the Voters List leak that happened in March 2016. With the National Privacy Commission now ruling on the breach, “we apologize this happened” will no longer hold water. Why? Because we have laws on the collection of data, privacy and security: the Cybercrime Prevention Act (RA 10175), the Data Privacy Act (RA 10173), and the E-Commerce Act (RA 8792).

    Government agencies collect all sorts of data: the SSS, GSIS, Pag-IBIG, LTO, BIR, SEC, PhilHealth, and the Bureau of Census, among others. And then there is the Credit Bureau, which is also trying to build its database to establish the credit history of every Filipino. The use and protection of the data collected is governed by trust. Once that protection is breached, trust is betrayed. There are also laws that make government agencies answerable for their acts or wanton recklessness, such as the Code of Conduct and Ethical Standards for Public Officials and Employees (RA 6713) and the Anti-Graft and Corrupt Practices Act (RA 3019).

    If government is the data aggregator, what kind of security does it have in order to protect and ensure our privacy? There is already talk of the Credit Bureau selling information to buyers of data. One cannot do data analytics without first having clean data. “Analysis of data is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making.”

    Unlike mature democracies, the Philippines has privacy protocols so strict that one cannot simply mine data. “Data mining is a data analysis technique that focuses on modeling and knowledge discovery for predictive rather than purely descriptive purposes.” Data analysis plays an important role in business intelligence and statistical applications.

    Business intelligence relies heavily on aggregation and focuses on business information, while statistical applications divide data analysis into descriptive statistics, exploratory data analysis (EDA), and confirmatory data analysis (CDA). EDA focuses on discovering new features in the data and CDA on confirming or falsifying existing hypotheses. Predictive analytics focuses on the application of statistical models for predictive forecasting or classification, while text analytics applies statistical, linguistic, and structural techniques to extract and classify information from textual sources (unstructured data). All are varieties of data analysis. Data integration is a precursor to data analysis, and data analysis is closely linked to data visualization and data dissemination. The term data analysis is sometimes used as a synonym for data modeling.
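    To make the point about clean data concrete, here is a minimal sketch of the cleansing and descriptive-statistics steps described above. The records, field layout, and function names are invented for illustration and do not come from any agency's actual database.

```python
from statistics import mean, median

# Hypothetical raw records as (name, age) strings; all values are invented.
raw_records = [
    ("  Ana Cruz ", "34"),
    ("ana cruz", "34"),      # duplicate once whitespace and case are normalized
    ("Ben Reyes", "41"),
    ("Carla Dizon", ""),     # missing age, dropped during cleansing
    ("Dan Lim", "29"),
]


def cleanse(records):
    """Normalize names, drop rows without a numeric age, remove duplicates."""
    seen = set()
    cleaned = []
    for name, age in records:
        name = " ".join(name.split()).title()
        if not age.strip().isdigit():
            continue
        row = (name, int(age))
        if row not in seen:
            seen.add(row)
            cleaned.append(row)
    return cleaned


def describe(rows):
    """Descriptive statistics on the age column of the cleaned rows."""
    ages = [age for _, age in rows]
    return {"count": len(ages), "mean": mean(ages), "median": median(ages)}


clean_rows = cleanse(raw_records)
summary = describe(clean_rows)
print(clean_rows)
print(summary)
```

    Without the cleansing step, the duplicate and the blank entry would either distort the averages or make the calculation fail outright, which is why analysis has to start from clean data.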

    When COMELeak happened on March 27, 2016, we were all surprised to see that our election infrastructure was not robust. Several servers were actually releasing the Voters List, and the debate has been focused on the Voters File and the Voters List since 1992. There are two Books of Voters that the COMELEC maintains. One has basic information and the other a complete profile of each Filipino voter. At the time, a public apology was made, the leaks stopped, and we heard nothing more of it.

    The information of a total of 55 million registered voters was compromised. An investigation by Trend Micro found that the “passport details of 1.3 million overseas Filipino voters and 15.8 million fingerprint records were included in the massive data breach. Also leaked was a list of COMELEC officials that have admin accounts.”

    When COMELEC chair Andy Bautista said he is not to blame, then who is responsible and accountable for the breach? Because surely the hackers were not responsible for having “crucial data in just plain text and accessible to everyone.” So, what Chairman Bautista was suggesting is that we run after Anonymous Philippines and LulzSecPilipinas? Anonymous Philippines hacked the COMELEC system while LulzSecPilipinas “released the poll body’s entire database online. Three more mirror links were later added where the database could be downloaded.”
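    As a rough illustration of what the alternative to “plain text” can look like, the sketch below replaces a sensitive identifier with a keyed hash (HMAC-SHA256) before it is stored. This is an assumption-laden example, not a description of the COMELEC system: the passport number is invented, `pseudonymize` is a hypothetical helper, and a real deployment would load the key from a protected key store. Fields that must be read back in full would need encryption rather than hashing, which is not shown here.

```python
import hashlib
import hmac
import secrets


def pseudonymize(value: str, key: bytes) -> str:
    """Return a keyed SHA-256 digest of the value instead of the raw text."""
    normalized = value.strip().upper().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


# Assumption: the key is generated here only for the demo; in practice it
# would be kept separately from the database it protects.
secret_key = secrets.token_bytes(32)

passport_no = "P1234567A"  # invented value for illustration
token = pseudonymize(passport_no, secret_key)
print(token)
```

    The same input with the same key always yields the same token, so records can still be matched, but someone who copies the database without the key sees only hexadecimal digests rather than readable passport numbers.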

    Mind you, this was the biggest government-related data breach in history, surpassing the 2015 hacking of the US Office of Personnel Management, which revealed the fingerprints and social security numbers of 20 million Americans. And we don’t see heads rolling!

    For purposes of transparency and accountability, Chairman Bautista should make public what they did about the breach from the time it was discovered to the time the mirror sites were removed, if these were really removed. Nine months later, Chairman Bautista must make public what steps they took in terms of security. Were protocols adopted? Is there now a manual in case of another breach? Since March 27, no report has been made public. It’s as if the COMELEC is back to its slumbering ways. They rest on their laurels because of the claim that 2016 was the fastest, most secure and most efficient election in history. And yet, they have not made public their report to Congress on the 2016 elections either. No suggestions have yet been made as to what system we will use in 2019. In fact, the aggregation of results has not been completed at the national level. The COMELEC does not even go through the process of reconciling data sets.

    The election system in our country should be declared a critical infrastructure. During off-election years, COMELEC should not sleep on its job of cleaning the Voters File, putting security protocols on it and on the back end, and learning from automation.

    These days, the word “leak” is front and center. Communication is sacrosanct; it is privileged. But when security settings are not consciously looked into, what is privileged becomes a shout-out for the whole world to see. The other word drilling into our collective thoughts is “fake.” It could be news, electronic entries, etc. The third is “trending.” How would you answer a reality widely observed in social media, that fake news trends? As Bianca Villanueva opened our eyes last Friday on artificial intelligence, machine learning and deep learning, she asked, how do machines learn? It boils down to data, privacy and the ethics of use.

    A shout-out to the National Privacy Commission, you guys rock!



    1. Ma. Lourdes Tiquia, you are the only Ma. Lourdes I am so impressed with…. EDA, CDA, AI, data mining, data warehousing, predictive analytics, machine learning, deep learning…you seem to be very knowledgeable in these exotic topics. WOW ! The other Ma. Lourdes (more famous ? or infamous ?) I read about, I detest !

    2. These incredibly complex cyber crimes could not have been pulled off without the complicity of the gov’t IT computer institutions which should also all be under investigation and purged of their yellow co-conspirators.

      Just as likely the cyber crime hacking of hundreds of millions of dollars from Bangladesh was instigated by the gov’t IT agencies given access to the security codes, supercomputers and expertise to pull off the scam.