A data breach at analytics giant Mixpanel leaves a lot of open questions



A cybersecurity incident at analytics provider Mixpanel announced just hours before the U.S. Thanksgiving holiday weekend could set a new standard for how not to announce a data breach. 

To recap: In a bare-bones blog post last Wednesday, Mixpanel chief executive Jen Taylor announced that the company had detected an unspecified security incident on November 8 that affected some of its customers. The post didn’t say how those customers were affected or how many there were, only that Mixpanel had taken a range of security actions to “eradicate unauthorized access.”

Taylor did not respond to multiple emails from TechCrunch, which included more than a dozen questions about the company’s data breach. We asked Taylor if the company had received any communication from the hackers, such as a demand for money, along with other specific questions about the breach, including whether Mixpanel employee accounts were protected with multi-factor authentication.

One of the affected customers is OpenAI, which published its own blog post two days later confirming what Mixpanel had failed to say explicitly in its own post: that customer data had been taken from Mixpanel’s systems.

OpenAI said it was affected by the breach because it relied on software provided by Mixpanel to help understand how OpenAI users interact with certain parts of its website, such as its developer documentation.

OpenAI users affected by the Mixpanel breach are likely to be developers whose own apps or websites rely on OpenAI’s products to work. OpenAI said its stolen data included users’ provided names, email addresses, approximate locations (such as city and state) based on their IP addresses, and some identifiable device data, such as the operating system and browser version. Some of this information is the same kind of data that Mixpanel collects from people’s devices as they use apps and browse websites.

For its part, OpenAI spokesperson Niko Felix told TechCrunch that the breached data taken from Mixpanel “did not contain identifiers such as Android advertising ID or Apple’s IDFA.” Identifiers of that kind could have made it easier to personally identify specific OpenAI users or combine their OpenAI activity with usage from other apps and websites.

OpenAI said in its blog post that the incident did not affect ChatGPT users directly, and that it had terminated its use of Mixpanel as a result of the breach.

While details of the breach remain limited, the incident draws fresh scrutiny to the data analytics industry, which profits from collecting reams of information about how people use websites and apps.

How Mixpanel tracks your taps and clicks, and watches your screen

Mixpanel is one of the largest web and mobile analytics companies that you might have never heard of, unless you work in the app development or marketing space. According to its website, Mixpanel has 8,000 corporate customers (one fewer now, following OpenAI’s early exit).

With each Mixpanel customer having potentially millions of users of their own, the number of ordinary people whose data was taken in the breach could be significant. The type of breached data is likely to vary by each Mixpanel customer, depending on how each customer configured their data collection and how much user data they collected.

Companies like Mixpanel are part of a booming industry providing tracking technologies that allow companies to understand how their customers and users interact with their apps and websites. As such, analytics companies can collect and store vast amounts of information, including billions of data points, about regular consumers.

For example, an app maker or website developer can embed a piece of code from an analytics company like Mixpanel inside their app or website to gain that visibility. For the app user or website visitor, it’s like having someone watch over your shoulder without your knowledge as you browse a website or use an app, while it constantly shares every click or tap, swipe, and link press with the company that develops the app or website.

In Mixpanel’s case, it’s easy to see the types of data that Mixpanel collects from the apps and websites that its code is embedded in. Using network analysis tools like Burp Suite, TechCrunch analyzed the network traffic flowing in and out of several apps with Mixpanel code inside, such as Imgur, Lingvano, Neon, and Park Mobile. In our tests, we saw varying degrees of information about our device and in-app activity uploaded to Mixpanel while using the apps.

This data can include the person’s activity, such as opening the app, tapping a link, swiping a page, or signing in with their username and password. This event logging data is then attached to information about the user and their device, including the device type (such as iPhone or Android), the screen width and height, whether the user is on a cellular network or Wi-Fi, the user’s cell network carrier, the logged-in user’s unique identifier for that service (which can be tied to the app user), and the precise timestamp for that event.
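To make the shape of this data concrete, here is a minimal sketch of what a single logged event might look like once device details are attached. The field names and values are hypothetical, chosen only to mirror the categories described above; they are not Mixpanel’s actual schema, nor data observed in TechCrunch’s tests.

```python
import json
import time


def build_event(name, user_id, device):
    """Bundle one in-app action with user and device details.

    All field names are hypothetical and only illustrate the
    categories of data described in the article.
    """
    return {
        "event": name,                            # e.g. "app_open" or "link_tap"
        "user_id": user_id,                       # the service's own ID for the logged-in user
        "timestamp_ms": int(time.time() * 1000),  # when the event was recorded
        "device": dict(device),                   # copy, so the caller's dict is not shared
    }


device = {
    "type": "iPhone",
    "screen_width": 1179,
    "screen_height": 2556,
    "connection": "wifi",      # or "cellular"
    "carrier": "ExampleCell",  # made-up carrier name
}

event = build_event("link_tap", "user-48213", device)
print(json.dumps(event, indent=2))
```

Even in this toy form, each event ties a specific action to a specific account, device, and moment in time, which is why a large store of such events is sensitive.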

The collected data can sometimes include information that should be off-limits. Mixpanel admitted in 2018 that its analytics code inadvertently collected users’ passwords.

Data collected by analytics companies is meant to be pseudonymized, or essentially scrambled in a way that it doesn’t include identifiable details, such as a person’s name. Instead, the collected information is attributed to a unique but seemingly random identifier that’s used in place of a person’s name, an ostensibly more privacy-preserving way of storing the data. But pseudonymized data can be reversed and used to identify people’s real-world identities. Data collected about a person’s device can also be used to uniquely identify that device, a technique known as “fingerprinting,” which can in turn be used to track that user’s activity across different apps and across the internet.
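Both weaknesses can be illustrated with a short sketch. It assumes, purely for illustration, that the pseudonym is an unsalted SHA-256 hash of an email address and that the fingerprint is a hash of a handful of device attributes; real analytics pipelines may work differently, and nothing here describes Mixpanel’s own systems.

```python
import hashlib


def pseudonymize(email):
    """Replace an email with a random-looking ID (unsalted SHA-256).

    Assumption: chosen for illustration, not any vendor's real scheme.
    """
    return hashlib.sha256(email.lower().encode("utf-8")).hexdigest()


def reidentify(pseudonym, candidate_emails):
    """Reverse a pseudonym by hashing known emails and comparing."""
    for email in candidate_emails:
        if pseudonymize(email) == pseudonym:
            return email
    return None


def fingerprint(device):
    """Derive a stable ID from device attributes alone, with no name or account."""
    canonical = "|".join(f"{key}={device[key]}" for key in sorted(device))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


stored_id = pseudonymize("alice@example.com")
leaked_list = ["bob@example.com", "alice@example.com"]  # e.g. from another breach
print(reidentify(stored_id, leaked_list))  # alice@example.com

phone = {"os": "iOS 18.1", "model": "iPhone15,2", "screen": "1179x2556", "carrier": "ExampleCell"}
# The same attributes reported from two different apps yield the same fingerprint.
print(fingerprint(phone) == fingerprint(dict(phone)))  # True
```

Anyone holding a list of candidate email addresses can rebuild the mapping from pseudonym to person, and the fingerprint lets separate datasets be joined on the device itself.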

By tracking what you do on your device across various apps, analytics companies make it easier for their customers to build up profiles of users and their activity.

Mixpanel also allows its customers to collect “session replays,” which visually reconstruct how the company’s users interact with an app or website so that the developer can identify bugs and problems. Session replays are meant to exclude personally identifiable or sensitive information, such as passwords and credit card numbers, from any collected user session, but this process isn’t perfect, either. 

By Mixpanel’s own admission, session replays can sometimes include sensitive information that should not have been logged but is collected inadvertently. Apple cracked down on apps that use screen recording code after TechCrunch exposed the practice in 2019.
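A simplified sketch shows how this kind of masking can miss things. It assumes a masking rule that hides fields based on their names, which is a hypothetical design used here for illustration and not a description of how Mixpanel’s session replay masking works.

```python
# Hypothetical deny-list of form field names whose contents get masked.
SENSITIVE_FIELD_NAMES = {"password", "card_number", "cvv"}


def mask_replay_frame(fields):
    """Mask the values of fields whose names are on the deny-list.

    `fields` maps a form field's name to what the user typed into it.
    """
    return {
        name: "****" if name in SENSITIVE_FIELD_NAMES else value
        for name, value in fields.items()
    }


frame = {
    "username": "alice",
    "password": "hunter2",
    "support_message": "My card 4111 1111 1111 1111 was declined",
}

masked = mask_replay_frame(frame)
print(masked["password"])         # ****
print(masked["support_message"])  # the card number passes through unmasked
```

The password is hidden because its field is on the list, but the card number typed into a free-text box is recorded as-is, since a name-based rule has no way of knowing what the user will type there.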

To say that Mixpanel has questions to answer about its breach is perhaps an understatement. Without knowing the specific types of data involved, it’s not clear how big a breach this is or how many people might be affected. It may be that Mixpanel doesn’t yet know. 

What is clear is that companies like Mixpanel store huge banks of information about people and how they use their apps, and are increasingly becoming a focus for malicious hackers.

Do you know more about the Mixpanel data breach? Do you work at Mixpanel or a company affected by the breach? We would love to hear from you. To securely contact this reporter, you can reach out using Signal via the username: zackwhittaker.1337



