Hacker News

Thanks, really appreciate the heads up. I put devs who do this on a personal blacklist for life.

I also think this would be better as an MCP tool / resource. Let the model operate and query it as needed.



It's the email/username harvesting that you mean, right? Or do people also have something against anonymised product analytics?


I have something against opt-out analytics over TCP/IP or UDP/IP, period. They aren't anonymized, because they include an IP address by virtue of the protocol.

But my original complaint was only about the email/username harvesting (I'm not the person you initially responded to).


> anonymised product analytics?

They're not anonymous, they're just pseudo-anonymous. It's incredibly easy to collect pieces of data A thru Z that, on their own, are anonymous but, all together, are not. It's also incredibly easy to collect data that you think is generic but is actually not.

Do you query the screen size? I have bad news for you. But all of this is beside the point: when that data is exfiltrated to a third-party service, you have no idea how it's being used. You have a piece of paper, if you're lucky, telling you the privacy policy, which is usually "you have no privacy dumbass".

Even if data appears completely anonymous to humans, it can be ingested by machine learning algorithms that can spot patterns and de-anonymize the data.

I mean, we have companies whose entire business model is "how do we string together bits of data and tie it to real-world identity?": namely Google. Turns out it's remarkably easy when you have your hands in a lot of different pots. Collect a little anonymous data here, a little there, and boom: now you know that Billy Joe who lives on First Street loves to go to Walmart at 1 AM and buy Ben and Jerry's ice cream in a moment of weakness.
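
To make the "A thru Z" point concrete, here is a minimal sketch in Python. The records, field names, and the `unique_fraction` helper are all made up for illustration; nothing here comes from a real analytics product. It just counts how many rows become one-of-a-kind once a few individually harmless fields are combined.

```python
from collections import Counter

# Hypothetical "anonymous" analytics rows: no name, no email, no user ID.
records = [
    {"screen": "1920x1080", "timezone": "UTC-5", "os": "Windows"},
    {"screen": "1920x1080", "timezone": "UTC-5", "os": "macOS"},
    {"screen": "1920x1080", "timezone": "UTC+1", "os": "Windows"},
    {"screen": "2560x1440", "timezone": "UTC-5", "os": "Windows"},
    {"screen": "1920x1080", "timezone": "UTC-5", "os": "Windows"},
    {"screen": "3440x1440", "timezone": "UTC+1", "os": "Linux"},
]


def unique_fraction(rows, fields):
    """Fraction of rows whose combination of `fields` appears exactly once."""
    keys = [tuple(row[f] for f in fields) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(rows)


# Compare a single field against the combination of all three.
print(unique_fraction(records, ["screen"]))
print(unique_fraction(records, ["screen", "timezone", "os"]))
```

In this toy data, screen size alone singles out 2 of the 6 rows, while the three fields together single out 4 of 6. Real datasets have far more fields, so the effect would presumably be stronger, though the exact numbers depend entirely on the data.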


Ad agencies are using the contact tracing algorithms made for covid to track people.


Yes to both.


How do you build a product without analytics? How do you measure the success and failure of every change?


Many users tend to be pretty vocal when changes break things they like; you don't need to spy on them for that. Mail readers > analytics frameworks.


"Not breaking things they like" is a very low bar for building a great product.

To be honest, building things without analytics seems like such a competitive disadvantage that I don't see how it could ever work at scale. Certainly all the big players are using analytics. If we shake our heads at the little players doing the same, we're just going to widen the moat.


Spying on your users does not give better feedback than simply asking your users (surveys, focus groups) and responding to the considered comments you receive. Spying and trying to infer intent is such a low bar to improve upon.


> Spying on your users does not give better feedback than simply asking your users

If that's true, there are many companies paying thousands to hundreds of thousands unnecessarily. Why are they choosing to throw away their money?


Companies blow money on bad ideas all the time. Middle managers love analytics because it lets them win internal arguments, not because it actually solves problems.


It is not an either/or. Surveys are almost always ignored. Micro improvements cannot be done with just surveys and asking users. Often users do not know how to describe a problem. Product analytics, if anonymized and offered with an opt-out, gives a pretty good picture of intent, especially in B2B software.


Analytics cannot be anonymized.


Why?


Any complex dataset has enough revealing information to make deanonymization possible. To truly muddy the waters enough to make such attempts impossible would require injecting so much noise that the analytics would be useless to learn from.

This is a fundamental property derived from information theory, but also confirmed time after time in practice: https://www.theguardian.com/technology/2019/jul/23/anonymise...

Data anonymization is a myth sold to politicians to whitewash data collection.


Sure, but that is broader than product analytics and applies to all data collection. The word I should have used is "pseudonymize". The goal of capturing product analytics is not to deanonymize but to understand usage trends/bottlenecks.


Pseudonymous is not what is wanted here though. For your spying on my usage to be acceptable, it would have to be truly anonymous. Pseudonymous means that instead of you putting "HN user adastra22" in your database for everything I do, you instead use "fffa366bc5d3." So any human being looking at the database record won't immediately see that it is me.

But in any sufficiently complex real-world database, it is a trivial step to map these pseudonymous tags to actual users, and thereby undo the obfuscation. It provides no actual privacy protection.

And the privacy IS the issue here.
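
A minimal sketch of that mapping step, in Python. It assumes the pseudonymous tag is a truncated SHA-256 hash of the username, which is only one possible scheme and not something anyone in this thread specified; `pseudonymize` and `reidentify` are hypothetical names. Random tags would need a different linkage approach (for example, joining on other columns), which is not shown here.

```python
import hashlib


def pseudonymize(username: str) -> str:
    """Assumed scheme: first 12 hex characters of SHA-256(username)."""
    return hashlib.sha256(username.encode("utf-8")).hexdigest()[:12]


def reidentify(pseudonyms, candidate_usernames):
    """Map each stored tag back to a username by hashing known candidates."""
    lookup = {pseudonymize(name): name for name in candidate_usernames}
    # Tags with no matching candidate map to None.
    return {tag: lookup.get(tag) for tag in pseudonyms}


# What the analytics database stores (hypothetical users).
stored_tags = [pseudonymize("alice"), pseudonymize("bob")]

# An attacker only needs a list of plausible usernames, e.g. a public user list.
print(reidentify(stored_tags, ["carol", "bob", "alice"]))
```

Under that assumption, anyone holding a list of candidate usernames can rebuild the tag-to-user table with one hash per candidate. A secret salt or key would block this particular lookup, but it does nothing about linkage through the rest of the record.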


Isn't that an argument against any piece of ethics? Am I missing something or are you arguing that gaining an advantage by being a bad actor means you shouldn't be a good actor because then you'd be at a disadvantage?

I get that I am making a general statement from your original narrow scope, so correct me if I'm wrong and you mean that THIS bad thing is fine but other bad things are still bad.


You know that generations of engineers built and sold products without spying on their users.



