Having written a slightly more involved version of this recently myself, I think you did a great job of keeping this compact while still readable. This style of library requires some design for sure.
Supporting higher order derivatives was also something I considered, but it’s basically never needed in production models from what I’ve seen.
The issue in my mind is that this doesn’t seem to include any of the critical library functionality that is specific to, e.g., NVIDIA cards: think reduction operations across threads in a warp and similar. Some of those don’t exist in all hardware architectures. We may get to a point where everything could be written in one language, but actually leveraging the hardware correctly would still require a bunch of different implementations, one for each target architecture.
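For anyone who hasn't seen a warp-level reduction, here is a rough sketch of the access pattern, simulated in plain Python rather than written as GPU code. The function name `warp_reduce_sum` is made up for illustration. On NVIDIA hardware this pattern is usually expressed with the `__shfl_down_sync` intrinsic, and the simulation assumes the same behaviour for out-of-range lanes (they read back their own value).

```python
def warp_reduce_sum(lanes):
    """Simulate a shuffle-down tree reduction across the lanes of one warp.

    Hypothetical helper for illustration only; real code would run on the GPU.
    """
    vals = list(lanes)
    width = len(vals)
    # The halving pattern assumes a power-of-two lane count (32 on NVIDIA GPUs).
    assert width > 0 and width & (width - 1) == 0

    offset = width // 2
    while offset > 0:
        # Each lane adds the value held by the lane `offset` positions above it.
        # A lane with no such neighbour reads its own value back, so the upper
        # lanes end up holding garbage; only lane 0 is meaningful at the end.
        vals = [
            vals[i] + (vals[i + offset] if i + offset < width else vals[i])
            for i in range(width)
        ]
        offset //= 2
    return vals[0]  # lane 0 holds the full sum


total = warp_reduce_sum(range(32))
```

The point is that the whole reduction takes log2(32) = 5 steps with no shared memory, which is exactly the kind of hardware-specific primitive a portable library has to either expose or emulate.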
The fact that different hardware has different features is a good thing.
A long time ago I tried a version of this (https://github.com/brandonpelfrey/complex-function-plot). Can you add texture lookup to yours? Escape time could map to one texture dimension, and you can arbitrarily make up another dimension for the texture lookup. Being able to swap in random images can make for a fun demo!
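A minimal sketch of what I mean, in plain Python. Everything here is an assumption for illustration: the function names are made up, the texture is just a nested list, the fractal is the usual z*z + c iteration, and the second texture dimension is taken from the angle of the final z, which is only one of many arbitrary choices.

```python
import cmath
import math


def escape(c, max_iter=64, bailout=2.0):
    """Iterate z -> z*z + c; return (iterations used, final z)."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > bailout:
            return n, z
    return max_iter, z


def texture_lookup(texture, u, v):
    """Nearest-neighbour sample of a row-major texture; u and v are in [0, 1]."""
    height = len(texture)
    width = len(texture[0])
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return texture[y][x]


def shade(c, texture, max_iter=64):
    n, z = escape(c, max_iter)
    u = n / max_iter                                 # escape time -> first axis
    v = (cmath.phase(z) + math.pi) / (2 * math.pi)   # made-up second axis
    return texture_lookup(texture, u, v)


# A 2x2 "image"; in a real demo this would be pixel data loaded from a file.
texture = [["a", "b"],
           ["c", "d"]]
inside = shade(0j, texture)       # never escapes, so u == 1
outside = shade(2 + 2j, texture)  # escapes on the first iteration, so u == 0
```

Swapping in a different image only changes `texture`; the mapping from escape time to coordinates stays the same.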
They also depend on the design space being somewhat friendly in nature, so that it can be modelled by a surrogate and the explore/exploit trade-off can be captured in an acquisition function.
Also, methods such as successive halving build on assumptions about how the learning curve develops.
Bottom line is that there are hyperparams for hyperparam searches again. So one starts building hyperparam heuristics on top of the hyperparam search.
In the end there is no free lunch. But if a hyperparam search strategy somewhat works in a domain, it is a great tool. The good thing is that one can typically encode the design space more easily in black-box optimization algorithms.
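To illustrate the successive halving point, here is a bare-bones sketch in Python. It is not taken from any particular library; `successive_halving`, `evaluate`, `min_budget` and `eta` are names I picked, and the toy loss is invented. Note how the learning-curve assumption is baked in: a config that looks bad at a small budget is dropped and never gets the chance to catch up.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Keep the best 1/eta of the configs each round and multiply the budget by eta.

    `evaluate(config, budget)` must return a loss (lower is better).
    `eta` and `min_budget` are themselves hyperparams of the search.
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Rank the survivors by their loss at the current (cheap) budget.
        ranked = sorted(survivors, key=lambda cfg: evaluate(cfg, budget))
        # Assumption: the ranking at this budget predicts the final ranking.
        survivors = ranked[:max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0]


def toy_loss(learning_rate, budget):
    """Invented stand-in for a training run: best at 0.1, improves with budget."""
    return (learning_rate - 0.1) ** 2 + 1.0 / budget


best = successive_halving([0.001, 0.01, 0.1, 1.0], toy_loss)
```

With a loss whose curves cross (slow starters that win late), the same code would happily throw away the best config, which is the no-free-lunch part.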
Great article. I still feel like very few people are viewing the DeepSeek effects in the right light. If we are 10x more efficient, it's not that we use 1/10th of the resources we did before; we expand to have 10x the usage we did before. All technology products have moved in this direction. Where there is capacity, we will use it. This argument would not work if we were close to AGI or something and didn't need more, but I don't think we're actually close to that at all.
Correct. This effect has been known in economics forever - a new technology that makes something cheaper has
- A "substitution effect." You use the thing more because it's now relatively cheaper - new use cases come up
- An "income effect." The savings leave you with more to spend, including on other things.
I got into this on labor economics here [1] - you have counterintuitive examples with ATMs actually increasing the number of bank branches for several decades.
DeepSeek is bullish for the semiconductor industry as a whole. Whether it is for Nvidia remains to be seen. Intel was in Nvidia's position in 2007, and they didn't want to trade margins for volume in the phone market. And there they are today.
I personally don't have a problem with this, but this really made me feel like I don't understand the community of this forum sometimes. Every day HN has multiple posts that drive so many comments about how privacy is lost and everything needs full E2EE, trust no one, etc. Then there is this post, which is also a breach of privacy (much more so than some of the things people complain about), and yet the reaction is "wow, this is so pure and amazing to view into these candid moments". It feels like some cognitive dissonance. Still, personally I thought this was a cool post.
There are a few things that come to mind that make this different:
1. This isn't really privacy breaching. For someone who taps the "share to youtube" button without knowing what it means, sure, but even that is pretty explicit that you're sharing it. Not sure why the article itself says people didn't know what the button would do before tapping it, so I'd like some further explanation of this point.
2. It's opt in, not opt out. Spending time with most "normal" people has shown me that very few people give a crap about going into settings menus to configure exactly how their data is used or collected, or otherwise switching to a service that gives them that control. When HN complains about privacy being dead, they are complaining about this apathy and how it gets exploited. This feature does not exploit that apathy.
3. This gives us something that we actually want. When most services invade your privacy, it's usually for things like advertising, targeting content, and data brokering. Things that I know I personally have a lot of issues with, and I feel I'm not alone. This button doesn't do those things, it just gives us interesting videos. So much so that most of the fascination with these videos is that you can feel the absence of those issues.
If it's that easy to upload a video from your camera roll to YouTube (two clicks), it's not that hard to imagine that this can happen by mistake, or be done by someone who doesn't know that it uploads as "public" by default.
Maybe they just wanted to send this video to a friend and didn't have the technical understanding that this will then be visible for everyone searching for it on YouTube.
I helped implement production and labor planning software on top of FICO Xpress some years ago. The paradigm of LP/ILP was all new to me, though I was very much into math. Our software was solving hierarchical/nested optimization problems involving millions of variables and constraints every 15 minutes. It felt like magic that this was possible. I would really encourage anyone who has never worked with these tools before to explore them, as it can open up a palette of tools and ideas you may not have thought of before. A great free way to get into this is to read the examples and use PuLP for Python.
Incredible result! This is a tremendous amount of work and does seem like RV is at its limits in some of these cases. The bit gather and scatter instructions should become an extension!