I'm sick and tired of the "No..., no..., (just)..." LLM construction. It's everywhere now; you can't open a social media platform without being bombarded by it. This article is full of it.
I get it, I should focus just on the content and whether or not an LLM was used to write it, but the reaction to it is visceral now.
Yes, there are some exceptions where it clearly states that a thinking model was chosen, as with Kimi, but there is no such indicator for OpenAI's GPT family or for other major models.
I think it is quite reasonable to tell incompetents that they can't just cover their ass by claiming "you can't demand perfection".
These are the same kind of incompetents who want the pay but not the responsibility of the position. Who think that building a giant haystack of all the data is the solution, so they can illogically claim to have prevented something because the needle was in there somewhere! Except they never found it in time, because they were too busy building a tower of Babel out of hay! It is utterly idiotic doublethink. (Cough, cough, NSA!)
It's one thing for your blog post to be full of a faux writing style, but that letter to the organization... oof. I wouldn't enjoy receiving that from someone who attached a script that dumps all users from my database, when the email and my access logs confirm they ran it.
For determining the maximum achievable performance, performance per watt is what matters, since power consumption will always be limited by cooling and by the available power supply.
Even if we interpret the NVIDIA claim as referring to the performance available in a desktop, the GPU cards had at most double the power consumption of CPUs. Even with that extra factor, there remains more than an order of magnitude between reality and NVIDIA's claims.
Moreover, I am not sure whether around 2010 and earlier, when these NVIDIA claims were frequent, the permissible power for PCIe cards had already reached 300 W or was still lower.
In any case, the "100x" factor claimed by NVIDIA was supported by flawed benchmarks, which compared an optimized parallel CUDA implementation of some algorithm with a naive sequential implementation on the CPU, instead of with an optimized multithreaded SIMD implementation on that CPU.
Well, power envelope IS the limit in many applications; anyone can build a LOBOS (Lots Of Boxes On Shelves) supercomputer, but data bandwidth and power will limit its usefulness and size.
Everyone has a power budget. For me, it's my desk outlet capacity (1.5 kW); for a hyperscaler, it's the capacity of the power plant that feeds their datacenter (1.5 GW). Neither of us can exceed Pmax * MIPS/W of computation.
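The point above is just arithmetic, but it's worth making concrete. A rough sketch (the 10,000 MIPS/W efficiency figure is an assumption picked for illustration, not a measured number):

```python
def max_throughput_mips(p_max_watts: float, mips_per_watt: float) -> float:
    """Upper bound on sustained compute: power budget times efficiency."""
    return p_max_watts * mips_per_watt

# Hypothetical efficiency, same for both parties: 10,000 MIPS/W.
EFFICIENCY = 10_000.0

desk = max_throughput_mips(1_500.0, EFFICIENCY)            # 1.5 kW outlet
datacenter = max_throughput_mips(1_500_000_000.0, EFFICIENCY)  # 1.5 GW plant

# The ceiling scales linearly with Pmax: at equal efficiency, the
# hyperscaler's advantage is exactly the ratio of the power budgets.
print(f"desk ceiling:       {desk:.3e} MIPS")
print(f"datacenter ceiling: {datacenter:.3e} MIPS")
print(f"ratio:              {datacenter / desk:.0f}x")
```

At equal MIPS/W, the only lever either party has left is Pmax, which is the commenter's point: the power envelope is the limit at every scale.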
How is caching implemented in this scenario? I find it unlikely that two developers are going to ask the same exact question, so at a minimum some work has to be done to figure out “someone’s asked this before, fetch the response out of the cache.” But then the problem is that most questions are peppered with specific context that has to be represented in the response, so there’s really no way to cache that.
From my understanding (which is poor at best), the cache covers the separate parts of the input context. Once the LLM reads a file, the content of that file is cached (i.e., some representation that the LLM creates for that specific file, though I really have no idea how that works). So the next time you bring that file into the context, directly or indirectly, the LLM doesn't have to do a full pass; it pulls its understanding/representation from the cache and uses that to answer your question or perform the task.
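A simplified sketch of how such prefix caching could work (this is an illustration of the general idea, not any provider's actual implementation; real systems cache the model's per-token KV tensors rather than strings). The key point is that the cache keys on exact token prefixes of the context, not on whole questions, so a shared file at the start of the prompt hits the cache even when the question at the end differs:

```python
import hashlib


class PrefixCache:
    """Toy prefix cache: stores a stand-in 'state' per token prefix."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}  # prefix hash -> precomputed state

    def _key(self, tokens: list[str]) -> str:
        return hashlib.sha256("\x1f".join(tokens).encode()).hexdigest()

    def longest_cached_prefix(self, tokens: list[str]) -> int:
        """Length of the longest prefix of `tokens` already processed."""
        for n in range(len(tokens), 0, -1):
            if self._key(tokens[:n]) in self._store:
                return n
        return 0

    def process(self, tokens: list[str]) -> int:
        """Process a context, reusing cached prefixes.

        Returns the number of tokens actually recomputed.
        """
        n = self.longest_cached_prefix(tokens)
        # Only the suffix tokens[n:] needs a full pass; everything before
        # it is served from the cache.
        for i in range(n, len(tokens)):
            self._store[self._key(tokens[: i + 1])] = f"state@{i + 1}"
        return len(tokens) - n


cache = PrefixCache()
file_tokens = ["<contents", "of", "some", "file>"]

# Two different questions sharing the same file at the front of the context.
print(cache.process(file_tokens + ["question", "one"]))  # full pass: 6
print(cache.process(file_tokens + ["question", "two"]))  # cache hit: 1
```

This also explains the earlier comment's worry: nobody needs to ask the exact same question for caching to pay off, because the expensive shared part (the file contents) sits in the prefix, and only the differing tail is recomputed.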
I wrote a while ago on here that he should stick to his domain.
I was downvoted big time. Ah, I love it when people provide an example so it can finally be exposed without me having to say anything.
Unfortunately this is a huge problem on here: many people step outside their domains, even when the topic seems simple on the surface, and post gibberish and completely mangled stuff. How does this benefit the people who get exposed to crap?
If you don't know you are wrong but have an itch to polish your ego a bit, then what's stopping you (them), right?
People form very strong opinions on topics they barely understand. I'd say that since they know little, those opinions come mostly from emotion, which is hardly a good path to objective and deeper knowledge.
I added the following at the top of the blog post that I wrote yesterday: "All words in this blog post were written by a human being."
I don't particularly care if people question that, but the source repo is on GitHub: they can see all the edits that were made along the way. Most LLMs wouldn't deliberately add a million spelling or grammar mistakes to fake a human being... yet.
As for knowing what I'm talking about: many of my blog posts are about stuff I just learned, so I include many disclaimers that the reader should take everything with a grain of salt. :-) That said, I put a ridiculous amount of time into these things to make sure they're correct. Knowing that your stuff will be out there for others to criticize is a great motivator to do your homework.