Millions of Americans were using telehealth company and prescription drug provider GoodRx—yet probably didn’t know that it was sharing their prescription medications and health conditions with Facebook, Google, and other third parties. The adult and child users of a popular “family safety” app likely didn’t realize that the company was quietly selling their location data. These and many other cases have a common thread: data brokerage, or the multibillion-dollar, virtually unregulated ecosystem of companies collecting, inferring, aggregating, and then selling and sharing Americans’ data.
Data brokers have been around for years. These companies have not received as much attention as the Facebooks and Googles (or TikToks) of the world—but there’s some indication that that may be changing. Last week, I testified in a congressional hearing on the subject, which ended up being a strongly bipartisan discussion of an underexplored privacy problem that affects hundreds of millions of Americans. U.S. data brokers surreptitiously gather and sell personal information ranging from people’s health and mental health conditions to their income and credit score, political affiliation, and smartphone locations. For example, Arkansas-based data broker Acxiom advertises data on 2.5 billion people worldwide. Health insurance companies, financial institutions, marketers, law enforcement agencies, criminal scammers, abusers, and other actors can buy these prepackaged data sets to profile, track, and target the people in them.
Data brokers acquire information about people in three main ways. Many brokers gather information on individuals directly, such as by acquiring companies, apps, and websites that collect information on people, which is then fed into the brokers’ databases. These companies also sometimes pay app developers to include their software development kits, or SDKs (premade software toolkits), in apps, which allows the broker to “sit” within those apps and siphon data on their users. When a user installs an app, they might agree to let the app access their phone’s location or contacts without realizing that a data broker’s SDK is acquiring that data too. A rough sketch of how that mechanism works appears below.
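To make the SDK mechanism concrete, here is a minimal, purely illustrative sketch in Python of how an embedded toolkit can piggyback on a host app’s permissions and package data for a broker. Every name in it (the BrokerSDK class, the collect.example-broker.com endpoint) is hypothetical and does not depict any real broker’s software; the sketch only prints the payload it would transmit rather than sending it.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical endpoint; no real broker's API is depicted here.
BROKER_ENDPOINT = "https://collect.example-broker.com/v1/events"


class BrokerSDK:
    """Illustrative stand-in for an analytics/monetization SDK bundled into an app."""

    def __init__(self, api_key: str):
        self.api_key = api_key
        # A persistent device identifier lets records be linked across apps.
        self.device_id = str(uuid.uuid4())

    def on_app_event(self, event_name: str, location=None, contacts=None):
        # The host app passes in whatever it already has permission to read;
        # the embedded SDK simply rides along on those permissions.
        payload = {
            "api_key": self.api_key,
            "device_id": self.device_id,
            "event": event_name,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "location": location,              # e.g. {"lat": ..., "lon": ...}
            "contact_count": len(contacts or []),
        }
        # A real SDK would POST this to BROKER_ENDPOINT; here we just print it.
        print(f"Would send to {BROKER_ENDPOINT}:")
        print(json.dumps(payload, indent=2))


if __name__ == "__main__":
    sdk = BrokerSDK(api_key="demo-key")
    sdk.on_app_event(
        "app_open",
        location={"lat": 38.8977, "lon": -77.0365},
        contacts=["alice", "bob"],
    )
```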
The second collection vector is indirect. Data brokers will scrape public records, including voting registries and property records, to acquire data points like home address and phone number. People-search or white pages websites are data brokers that scrape these records, extract the data, and post it online for search and sale; a simple sketch of that record-merging step follows this paragraph. Indirect collection also includes data brokers paying app developers to sell data on their own users (without bothering to include data broker SDKs in their apps). For example, Life360, marketed as a family safety app, was selling location data about its parent and child users to data brokers (and in 2020 made about 20 percent of its revenue from that activity, according to reporting by The Markup).
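As a rough illustration of the indirect route, the sketch below uses entirely invented records and field names to show how scraped public-record rows from different sources can be keyed on a name and merged into a single profile of the kind a people-search site lists. Real pipelines parse HTML pages or bulk file downloads; this is only the merge step.

```python
# Illustrative only: the records below are invented.
voter_rolls = [
    {"name": "Jane Doe", "home_address": "12 Elm St", "party": "Independent"},
]
property_records = [
    {"name": "Jane Doe", "parcel": "044-221", "assessed_value": 310000},
]


def merge_public_records(*sources):
    """Fold rows from multiple scraped sources into one profile per person."""
    profiles = {}
    for source in sources:
        for row in source:
            profile = profiles.setdefault(row["name"], {"name": row["name"]})
            profile.update({k: v for k, v in row.items() if k != "name"})
    return profiles


if __name__ == "__main__":
    for person, profile in merge_public_records(voter_rolls, property_records).items():
        print(person, "->", profile)
```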
The third way brokers get data is through inference—using algorithms and other techniques to predict data points they haven’t technically gathered, such as using purchase and ZIP code data to make predictions about household income. The practice of inference means that information that consumers may have never typed into a form, and have no idea is collected by data brokers, is for sale on the open market.
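A minimal sketch of that kind of inference, using invented figures and categories: combine a ZIP-code-level median income with a crude adjustment based on purchase categories to assign a person an income bracket they never disclosed anywhere. The lookup tables and thresholds here are hypothetical, not drawn from any real broker’s model.

```python
# All figures and categories here are invented for illustration.
ZIP_MEDIAN_INCOME = {"10001": 96000, "38109": 31000}

PURCHASE_SIGNALS = {
    "discount_grocer": -0.10,   # nudges the estimate down
    "luxury_travel": +0.25,     # nudges the estimate up
}


def infer_income_bracket(zip_code: str, purchase_categories: list[str]) -> str:
    """Estimate an income bracket from ZIP code plus purchase signals."""
    estimate = ZIP_MEDIAN_INCOME.get(zip_code, 55000)
    for category in purchase_categories:
        estimate *= 1 + PURCHASE_SIGNALS.get(category, 0.0)
    if estimate < 40000:
        return "under $40k"
    if estimate < 90000:
        return "$40k-$90k"
    return "over $90k"


if __name__ == "__main__":
    # Neither person ever reported an income; the bracket is derived.
    print(infer_income_bracket("38109", ["discount_grocer"]))   # under $40k
    print(infer_income_bracket("10001", ["luxury_travel"]))     # over $90k
```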
Sometimes inference is simple, like seeing which apps are installed on a device. If someone has a Christian news app or a Muslim prayer app on their phone, that’s likely a strong indicator of their religion. The presence of LGBTQ+ dating apps on a phone, similarly, is likely a strong indicator that the phone’s owner belongs to the LGBTQ+ community. A data broker could purchase user lists from the app, or acquire data about a device’s app downloads another way, and use that single data point to mark down a person’s religion and sexual orientation, which it can then sell. Other times, “inference” could be more complex: Part of the invasiveness of gathering location data is that it can be used to follow individuals as they visit medical facilities, divorce attorneys, payday loan offices, gay bars, places of worship, their children’s school, and much more. Data brokers with thousands of data points on a single individual can also build models to derive additional information from that data.
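The “simple” case described above amounts to little more than a lookup table. A hypothetical sketch: map a device’s installed-app list against a table of sensitive-category flags and record whatever matches. The app names and category labels are made up; the point is only how few data points the derivation needs.

```python
# Invented app names and category labels, for illustration only.
SENSITIVE_APP_CATEGORIES = {
    "faithdaily": "religion: Christian",
    "prayertimes": "religion: Muslim",
    "rainbowmatch": "sexual orientation: LGBTQ+",
}


def infer_attributes(installed_apps: list[str]) -> list[str]:
    """Derive sensitive attributes from nothing more than an app list."""
    return [
        SENSITIVE_APP_CATEGORIES[app.lower()]
        for app in installed_apps
        if app.lower() in SENSITIVE_APP_CATEGORIES
    ]


if __name__ == "__main__":
    device_apps = ["Maps", "PrayerTimes", "RainbowMatch"]
    print(infer_attributes(device_apps))
    # ['religion: Muslim', 'sexual orientation: LGBTQ+']
```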
The harms of this data collection, inference, and sale are clear. Data brokers have for decades scraped public records and published Americans’ home addresses and other information for search and sale online. Abusive individuals have then bought this data and used it to hunt down and stalk, harass, intimidate, assault, and even murder other people, predominantly women and members of the queer community. These companies have also for years sold data to criminal scammers, who then targeted groups such as World War II veterans and stole millions of dollars from elderly Americans and people with Alzheimer’s.
Health insurance companies have purchased data from data brokers—including data on race, education level, marital status, net worth, social media posts, payments of bills, and more—to profile consumers and predict the costs of providing health care to those people. Selling data on people suffering from depression, anxiety, bipolar disorder, attention disorder, and more threatens to enable incredibly predatory targeting of people who already face stigma and barriers to accessing mental health care. Scammers have bought payday loan applicants’ financial information, which at least one data broker illegally sold, to steal millions of dollars from those people. Law enforcement and security agencies have purchased broker data on U.S. citizens, ranging from home utility data to real-time locations, without warrants, public disclosure, and robust oversight.
Although brokers gather, infer, and sell data about hundreds of millions of people in the United States, their activity is largely unregulated. For example, the Health Insurance Portability and Accountability Act, often referred to as the U.S.’s health privacy law, applies only to certain “covered entities” (including health care providers and health care clearinghouses) and their business associates. If you tell your doctor your throat hurts, they can’t write that up and sell it on the street corner; likewise, if your hospital makes a mobile app to schedule appointments, that app is also HIPAA-covered. But HIPAA does not cover many apps, websites, data brokers, social media companies, advertising technology startups, and other actors that do not have business relationships with covered entities. Hence, it’s entirely legal for a non-HIPAA-covered mental health app to collect data on 14-to-17-year-olds with depression and then sell that information, with names attached, to all kinds of third parties. Some prescription drug providers could even legally sell your prescription information, so long as they are not doing so “deceptively,” and once the data changes hands, there is no way for you to know what happens to it.
The harms and risks affect every American, but they fall hardest on vulnerable populations, including Black and brown communities, the poor, elderly Americans, people with Alzheimer’s and dementia, survivors of gendered violence, pregnant people, kids, people with mental health conditions, and veterans afflicted by trauma or struggling to make ends meet.
Last Congress, the American Data Privacy and Protection Act received strong bipartisan support and, by many accounts, was the closest the U.S. legislature had ever come to passing a comprehensive privacy law. Debates about a newly introduced version are ongoing. On top of putting controls on data brokerage in a comprehensive privacy law, the U.S. needs specific, total bans on the sale of health and location data, which are incredibly intimate to individuals, easily linked to specific people, and highly susceptible to abuse. This should be paired with legislation to stop data brokers from using “inference” to circumvent those controls by drawing semantic distinctions that technically comply with the law while still deriving health or location data anyway.
The shadowy collection, inference, and sale of data on Americans has gone on for far too long. While there are some personal-information deletion services that consumers can use to get their information removed from people-search websites—temporarily and partially at best—those services cannot help individuals remove their information from privately held data broker databases (except, in a few narrow cases, in California and other states with privacy laws). But the real point is simple: The burden should not be on individuals to fight an opaque, multibillion-dollar surveillance industry. Congress needs to act, or the harms to Americans will only persist.
Future Tense is a partnership of Slate, New America, and Arizona State University that examines emerging technologies, public policy, and society.