I thought so too. But I would rather use docstrings and comments because:
- Readable by LLMs/others in the context of the code
- Searchable
- Version controlled
I develop AI for a living and I don’t understand the internals of it either, just as I don’t understand the internals of Intel architecture. My job is to build, not to fit information into my mind.
This 1000 times. It takes courage to open up about mistakes. As a relatively young industry, we still have a lot to learn before we can move away from the instinctive blaming culture surrounding such failures. In this case, it's only a file being downloaded a couple of times; nobody died or got injured.
For those interested in this topic, and how other industries (e.g. Airline industry) deal with learning from or preventing failure: Sidney Dekker is the authority in this domain. Things like Restorative Just Culture, or Field guide to understanding human error could one day apply to our industry as well: https://sidneydekker.com/books.
Yeah one of the first open-source recommendation engines I ever worked with was called Voogo[1] and I believe it was based on k-means. This was back in 2008 or so?
For someone who had never been exposed to any of the math behind this kind of thing, it was an interesting implementation, and the source code was very readable.
The original website seems to be gone and I couldn't find a Git link so apologies for Sourceforge.
I'm no expert in ML, but beyond the research and emerging work in unsupervised learning, clustering seems to be the most common approach, and there's nothing conceptually new here in the past 10+ years. Don't get me wrong: computationally we can do stuff with a ridiculous number of dimensions that was hard or impossible before, and there are new algorithms. But I was doing k-means, DBSCAN/HDBSCAN, and Gaussian mixtures in grad school 15 years ago in relation to databases, back when I had never heard of "Machine Learning" and we were in the glacial stage of the AI winter. There's some newer work that validates results with human judgement, but clustering still seems to be the mainstream "data validated" approach...
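For anyone who hasn't seen it, the k-means loop mentioned above really is that simple. Here's a minimal NumPy sketch (deterministic farthest-point initialization so it's reproducible; the synthetic two-blob data is made up for illustration):

```python
import numpy as np

def kmeans(X, k, iters=20):
    # deterministic farthest-point init so the sketch is reproducible
    centroids = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(iters):
        # assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute each centroid as the mean of its cluster
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# two well-separated synthetic blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
               rng.normal(5.0, 0.3, (50, 2))])
labels, cents = kmeans(X, 2)
```

In practice you'd reach for scikit-learn's `KMeans` (or `DBSCAN`/`HDBSCAN` for density-based clusters), but the core alternate-assign-and-average loop hasn't changed.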
I have a data set from the 1980s and the programs to apply these techniques to it, written in compiled BASIC (predates Microsoft QuickBASIC). They ran their analyses on run of the mill IBM PCs. Really wish I could get it all into a modern framework…
The authors published a respected book based on the data and used it as the foundation for a bunch of other applied research. Sometimes I wish I could resurrect those researchers, or at least have one of their ghosts stop by and see what we can do with computers today.
Might have been k-nearest-neighbors (kNN) rather than k-means. kNN can be used for "recommended because you bought X" or "users like you also bought X" type recommendations that relate user to user or item to item.
K-means could potentially be helpful to group together common users/items if e.g. you're memory constrained and don't want to give each user a fully unique embedding entry so that's also possible.
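The item-to-item flavor of this is just nearest neighbors under a similarity metric. A toy NumPy sketch, assuming a made-up user-item interaction matrix and cosine similarity (the matrix values and `similar_items` helper are illustrative, not from any real system):

```python
import numpy as np

# toy interaction matrix: rows are users, columns are items
R = np.array([
    [1, 1, 0, 0],   # user 0 bought items 0 and 1
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# item-item cosine similarity: items are the columns of R
norms = np.linalg.norm(R, axis=0)
S = (R.T @ R) / np.outer(norms, norms)

def similar_items(item, k=1):
    """The k items most similar to `item`, excluding itself."""
    order = np.argsort(-S[item])
    return [j for j in order if j != item][:k]
```

So `similar_items(0)` answers "people who bought item 0 also bought...", which is the "users like you also bought X" pattern; a k-means pass over the same matrix would instead bucket users into coarse segments, as the parent comment notes.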
It's still widely used and can be validated without significant human judgement. Implementations are more efficient, and the computational scale we can handle is through the roof, but the approach is still legit.
I was taught coding on Matlab. It had everything built in, and no package management to worry about. As green freshmen, we were making games in week 2 of the course, and running science simulations soon after.
I generally tend to think that complaints about dependency management in Python are way overblown -- it's usually not a problem for me. But lately I've been trying out some off-PyPI projects to investigate time series foundation models, and it would have made my life so much easier if the implementers of these libraries hadn't decided to pin to extremely specific versions of ten or twenty different dependencies. No, your library does not need exactly NumPy 1.22.3. You're just throwing unnecessary obstacles in the way of people using it.
That pinning wouldn't really be a problem ordinarily, where I control the environment, but I'm running this particular code in a managed environment that I can't easily modify.
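To illustrate the complaint (version numbers made up): exact pins only resolve in one very specific environment, while compatible-range specifiers state the actual requirement and let pip reconcile the library with everything else installed.

```
# overly strict: only this exact environment will resolve
numpy==1.22.3
pandas==1.4.2

# friendlier: declare what you actually require
numpy>=1.22,<2.0
pandas>=1.4
```

Exact pins belong in a lockfile for reproducing the authors' environment, not in a library's install requirements.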
I really hope Apple comes out with an AW that has more battery life in lieu of all the outdoorsy/sports features of the AWU. I see lots of wealthy desk warriors wearing them, partly for the better battery life and partly for the status (of having the most expensive AW). I know some people who use some of the sports-related features, but most people I've talked to would prefer more battery than all the ruggedness. I know I certainly would. Get me to a week of battery life and I'm sold.