
One difficulty with this kind of output is that it very quickly runs into issues of semantics and model labeling.

We humans each build models kind of like these when we interact with one another, so when I ask somebody, "do you think <name> is an agreeable person?" and they reply "sure I think he is!" they're consulting that model to provide me an answer. Humans can even do a kind of pairwise sorting on that model and tell you if person1 is more or less agreeable than person2.

However, even if our individual models may differ a bit, and the results of these kinds of questions to each other might differ a bit, there's an inherent "humanness" to the results because people generally have a pretty similar semantic understanding of what "agreeableness" means.

But what does Watson think agreeableness means? I have no idea, and nobody really knows. Watson can't really explain it. All we know is that there's a model that produces a scored (and thus rankable) output when asked to score a corpus, and somebody somewhere labelled that model the "agreeableness" model, perhaps based on some heuristics or parameters that were intended to define that notion.

It's thus very hard for humans to trust scoring like this, because when it doesn't make sense, it fails for reasons no human would ever have. For example, I would personally say pg is far more agreeable than I am, yet Watson scores our respective collections of comments exactly the same. I can't explain it, Watson can't explain it, and thus it feels "wrong" and now I can't trust the scores that Watson provides me.



Well, the real question this tool answers is not really "do you think X is Y", but "do you think X's comments show Y".

There's also a lack of documentation from IBM as to what the results mean exactly and how solid they are.


Well, in a sense, most of us only know each other through our comments, and that's all we can ever base an assessment like this on. By proxy we have to assume that when people's inner thoughts leak out onto the Internet on a forum like this (and in a sustained enough way to make them a top-100 karma earner), their aggregate corpus of comments will be a reasonable insight into who they are.

So for all purposes that you, I or Watson can demonstrate, "do you think X is Y" and "do you think X's comments show Y" are functionally the same.

edit

I just checked what Watson thinks are my needs. Apparently I don't have many, and everybody on HN has an extreme need for Challenge.

I almost feel like these results require a lot of interpretation, and that interpretation is about as reliable as a horoscope.


> I almost feel like these results require a lot of interpretation, and that interpretation is about as reliable as a horoscope.

I got a similar feeling from this - that's why I'd love to see some hard data behind the algorithm, or at least bits and pieces about the methodology used to arrive at it.



