A New Way to Protect Personal Survey Data

Organizations are constantly collecting confidential consumer data, but how long does it stay private? Although datasets are supposed to be anonymized or encrypted for confidentiality, proprietary information has a way of getting out. In fact, Verizon confirmed 3,950 data breaches worldwide in its 2020 annual “Data Breach Investigations Report,” with 30 percent of those executed by internal actors such as employees.

“Encryption definitely helps, but it does not prevent a data breach,” says Matthew Schneider, PhD, assistant professor of business analytics in the LeBow College of Business. “It’s similar to safeguarding your email password; an internal actor with access to the encryption key or real data could easily cause a data breach.”

Privacy is also a problem for local governments and other entities that conduct confidential surveys of their constituents and are legally required to share the results with the public. It’s relatively simple for an unethical actor to use public datasets to identify a particular respondent and uncover that person’s sensitive responses.

To solve this, Schneider and his research partner Dawn Iacobucci of Vanderbilt University proposed a new methodology that permanently alters survey datasets to protect consumers’ privacy when the data is shared, whether intentionally or through a breach.

Their methodology, published in the Journal of Marketing Analytics, builds on a technique used in genomic sequencing applications. It disguised the identity of survey respondents and their sensitive responses while keeping the accuracy of insights within 5 percent.

“Our method would essentially ‘shuffle’ the demographic data in a survey dataset,” says Schneider. “But, unlike previous methods, ours only shuffles data when it maintains the correlations between important variables that are essential to analysts. The protected data is generated on a consumer level and still valuable to the end user. This can also be done for employee surveys. If this dataset got out, then only the organization’s insights would be known.”
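The idea of shuffling only when correlations survive can be illustrated with a small sketch. This is not the published algorithm, only a simplified illustration under stated assumptions: the data is made up, there is a single demographic column (age) and a single response column (satisfaction score), and a swap is kept only if the Pearson correlation between the two columns stays within a hypothetical tolerance of 0.05 of its original value. The function names `pearson` and `constrained_shuffle` are invented for this example.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def constrained_shuffle(demo, response, tolerance=0.05, attempts=2000, seed=0):
    """Swap pairs of demographic values between respondents, keeping a swap
    only if the demo/response correlation stays within `tolerance` of the
    original. Illustrative only; the tolerance and attempt count are assumptions."""
    rng = random.Random(seed)
    target = pearson(demo, response)
    shuffled = list(demo)
    accepted = 0
    for _ in range(attempts):
        i, j = rng.sample(range(len(shuffled)), 2)
        if shuffled[i] == shuffled[j]:
            continue  # swapping identical values changes nothing
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
        if abs(pearson(shuffled, response) - target) <= tolerance:
            accepted += 1  # correlation preserved, keep the swap
        else:
            # correlation drifted too far, undo the swap
            shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled, accepted


# Made-up survey data: respondent ages and satisfaction scores.
ages = [23, 35, 41, 29, 52, 60, 33, 47, 38, 26, 55, 44]
scores = [3, 5, 6, 4, 8, 9, 5, 7, 6, 3, 8, 7]

protected_ages, kept_swaps = constrained_shuffle(ages, scores)
print("original correlation: ", round(pearson(ages, scores), 3))
print("protected correlation:", round(pearson(protected_ages, scores), 3))
print("swaps kept:", kept_swaps)
```

In this sketch the protected column contains the same set of ages as the original, so summary statistics for age are unchanged, while individual rows no longer reliably pair a respondent with their true age. A real dataset would involve many variables and a more careful definition of which relationships must be preserved.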

This story was originally published in Drexel’s EXEL Magazine with the title “When Data Gets Loose.”
