This shouldn’t be surprising. Let’s start off with the obvious. What does “real-world fact-check claims” mean? So we’re using the same list of “fact check claims” on each model. The problem is (unless I’m missing it) the authors aren’t exposing the list of 1K questions they used in the experiment. That’s a huge problem. Are the authors assuming the 1K claims they used are “provably true”? If so, that’s a huge bias, and it opens up a philosophical debate about what a fact is. Or what makes something true or false?
As Marc Andreessen puts it: a particular domain is either explicitly “provable” or not “provable”. Provable domains include math, physics, chemistry, biology, engineering, even code. That may not be the whole list, but everything else is essentially “unprovable”. At least as far as a language model is concerned. They are questions that require a human value judgement. Politics are an obvious example. So back to the “1K fact check claims”. How many of these are political or current events questions? How many are STEM questions that can be laid out in a formal proof?
Models can be trained to answer either way on claims that require a value judgement, but that’s obviously not beneficial to anyone except whoever controls the model. If the expectation is that all these frontier models should answer the same way on value judgement questions, then that’s never going to happen. What the models ARE good at though is breaking down the nuances of a topic and arguing both sides. This is how these tools should be used, as a way to analyze the claim and let us humans in the end make our own value judgement. If you’re trusting the model to make the value judgement for you and just accept it as a fact, then you are entering very dangerous territory.
Seems cool, and the motion controls are surprisingly precise and difficult, honestly. You really need a steady hand. I personally don’t mind something with a steep learning curve, but maybe offering an “easy” mode where the controls are dialed down a little in intensity would make sense? I like the UI choice where the play field gets smaller as time elapses, giving a sense of urgency. I also like the “double hits” and “wall hits” concept. Some refinement in the scoring system to account for those would make sense. Meaning, right now the score seems to be just what “level” you achieved, with the minor details underneath. Maybe work out a scoring system that accounts for highest level, double hits, wall hits, etc. and calculates an actual “high score” based off that?
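To make the scoring idea concrete, here is a rough sketch of what I mean. Everything in it is an assumption on my part: the names (`RunStats`, `high_score`) and the point weights are made up for illustration and are not taken from the game.

```python
from dataclasses import dataclass

# Hypothetical weights, chosen only for illustration; tune to taste.
LEVEL_POINTS = 1000       # points per level reached
DOUBLE_HIT_POINTS = 150   # bonus per double hit
WALL_HIT_POINTS = 50      # bonus per wall hit


@dataclass
class RunStats:
    """Stats collected over one play session (hypothetical shape)."""
    highest_level: int
    double_hits: int
    wall_hits: int


def high_score(stats: RunStats) -> int:
    """Fold level and bonus events into one comparable number."""
    return (
        stats.highest_level * LEVEL_POINTS
        + stats.double_hits * DOUBLE_HIT_POINTS
        + stats.wall_hits * WALL_HIT_POINTS
    )
```

With weights like these the level still dominates, so someone who got further always ranks higher unless the other player racked up a lot of bonus hits, and the double hits and wall hits mostly act as tie-breakers within a level.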
I’ve been thinking about exactly this for a while. Non-technical people just don’t understand all the moving pieces, let alone what action they should take. I agree with you, we need more conversation and guidance detailing what easy steps people can take to harden their security posture, and why that matters.
I’ve been a long time vim user, and I honestly never really bought into the efficiency claims. That gets repeated over and over, but if you’re a slow typer then no editor can really make much of a difference, and development in reality is a lot of reading code and thinking about code when it comes down to it.
I’ve never used it because I thought it would make me some lightning fast super developer. I’ve always used it because it’s simply fun. It makes editing into this interesting sort of game. You start out with a simple set of skills from vimtutor, and inevitably brute force your cursor around the screen for a while. Little by little your movements become more complex and efficient, and the journey to figuring that out is fun and interesting.
It makes you think about typing in a totally different way. It makes it into some kind of interesting game where your goal is to accomplish a task in the fewest keystrokes possible. That problem solving aspect scratches an itch inside my brain that has always kept me coming back. It’s just fun, and I don’t think that gets talked about enough.
That's fair, although now and then I have to do some repetitive task and using bufdo or a macro has saved a decent amount of time. And compared to something like notepad, all the little details probably save time. My average time savings has probably increased significantly after I stopped spending a lot of time creating custom vim scripts and syntax files.
If I spend more than a couple minutes in an arrow-keys-and-mouse text editor, I often find myself unconsciously reverting to vi-language and getting confused. "Oh, I want to go change that sentence up the page that starts with 'Looking at...'" so I type "?Looking at" into the text editor and then stare at it for a few seconds before hitting backspace and reaching for the mouse like a caveman.
Vim is definitely more efficient when it comes to navigation and manipulation (esp via macros), which are the two things we do the most as programmers.
The added benefit I found is that Vim’s purely keyboard based design is much, much easier on the wrists. Heck, I pushed myself to learn Vim because I started to feel wrist pain due to KB and mouse switching.
> I’ve been a long time vim user, and I honestly never really bought into the efficiency claims. That gets repeated over and over, but if you’re a slow typer then no editor can really make much of a difference.
> Little by little your movements become more complex and efficient, and the journey to figuring that out is fun and interesting.
The slight contradiction in your comment has a lot of truth in it.
> It’s just fun, and I don’t think that gets talked about enough.
Yes yes yes. Vim can absolutely lead to more efficient text editing, but I agree it has more to do with the fun journey than with typing speed.
vi definitely doesn't scratch that "itch" for everyone in the same way. But for me, it's as though I found a cheat code. Getting better at vi feels like getting better at a game - only practicing this game makes you better at any number of tasks that are relevant to your daily work.
(although if you also want to get better at typing speed, there are surprisingly fun roguelikes on Steam for just this purpose)
Haha. I spent a significant amount of time getting vim keybinds everywhere, and eventually had to make exceptions for certain software. But I’d had my share of making exceptions, I guess, and when I got frustrated that I couldn’t map Win+L in Windows to anything, I decided to solve it once and for all. Got a QMK programmable keyboard and now I use hjkl everywhere I would use arrow keys. Did it save me time? No. After 8 months of usage on a split keyboard I am back at my original speed and don’t need a cheatsheet for my symbols layer, but it made me less frustrated and feel more free. I don’t need an AHK script or key remapping and their restrictions, this is way easier, especially with the live VIA configurator.
Readline supports Emacs bindings by default, and so do many textboxes (e.g. Ctrl-backspace, ctrl-arrow key), so that argument is stronger for Emacs than vi.
I like vi/vim, but it gets me all too frequently because I'm not precise enough of a typist. I'm busy typing away and I hit the wrong modifier key or hit caps lock, and I end up pulling things up or making changes that I never intended to make. Worse, in the split second after it happens, while my muscle memory tries to correct it, my immediate intuition about what mode I was in, which modifier key I'm pressing, or whether caps lock is on is wrong, and I only make the situation worse. Any improvements in performance because of the quick key interface are long gone.
I don't think it makes me more efficient at writing code. That's because the act of coding is like 80% reading existing code, only rarely adding new code. You spend far more time moving around and exploring the code, and in vim the keys to do that are single keypresses, or sequences like `]p`. Every other non-modal editor requires you to hit chords like Ctrl-Shift-F to move around, because the "easy" keys are all taken up by "add this character to the buffer", 100% of the time.
vi was pretty efficient in the (terminal) era it launched in.
specifically, the keys used by vi were there on every (terminal) keyboard.
In comparison, emacs heavily depended on modifier keys, which were pretty non-standard and if found, frequently were in different places on each keyboard.
That said, more complex editing tasks using modes/automation/etc might be more efficient with emacs.
personally, I have "thresholds". quick/dirty can be vi, more involved goes to emacs. same with scripting, quick/dirty shell script, more involved to something like python. ymmv
This article is bringing up a point that I’ve thought about for a long time, even outside of the context of LLMs. It’s very surprising to me that HTML never caught on more as a common document format for sharing information outside of a website/web server. Just something to pass around like we do Word or Excel documents.
#1 and #3 are big ones for sure, and related. They come down to most people not being developers and not coding. I get that’s the biggest reason it’s not common. I’m just saying even among developers, I would think it would be a little more common. You can obviously do so much more with it than PDF, and it opens up a lot of interesting options. I don’t think #2 is a concern at all. The “universal” way to do it is double clicking the file :) every computer in the world will open an HTML document in the default browser (not the case on mobile though).
I’ve recently gotten obsessed with local first app architecture, so I’m really digging into CRDT and trying to get familiar with it. So this looks very interesting to me. Thanks for posting!
To be fair, the recent Axios supply chain attack was North Korea based, and probably cost them very little money. So it illustrates that you don’t have to “spend a lot of money” to get into our systems.
The thing getting overlooked is all of the recent moves by Trump all lead back to China. Venezuela, Cuba, now Iran. These are all tentacles of China. The aggression against these 3 countries is not a coincidence. It’s a concerted and indirect attack on China in an attempt to weaken their subsidiaries. In the eyes of this administration, this is unpleasant, but necessary housekeeping that should have been done decades ago but no one was willing to spend the political capital to do it.
In Iran, Trump was clearly hoping for (and verbally requested) the same thing you say about Saddam. I think we actually do know how unpopular the regime is; the mass protests demonstrated that. But the religious hardliners are the ones with the guns. And they clearly aren’t afraid to use them. So while there was some momentum, after everyone got gunned down in the streets by the IRGC it quickly deflated. So asking unarmed protesters to step up again is kind of a big ask, without any material support.
Iranian protesters were not calling for US interference. Let's be very clear about that. They were doing it for their own regime change, not some US imposition. What they think of the US or whether they are for this war or supposed regime change by the US is a totally different consideration.
> The thing getting overlooked is all of the recent moves by Trump all lead back to China.
Are you trying to frame the twice accidental president as some sort of visionary? He doesn’t even remember what he said 5 mins ago. If he had planned or even had any clue about wars, we’d not be in this mess. He insulted Zelenskyy last year but ended up asking for his help.
Do you recall the orange phenomenon asking for China’s help just last week, wait for it, to act against their friends, which you called their subsidiaries :-)? You can’t script this horror show, even if you wanted to.
And rightfully so. China isn't killing and kidnapping world leaders, supporting genocides in Gaza, launching military operations, threatening its allies with annexation or overtly interfering in their democratic process.