> It is the first model to get partial credit on an LLM image test I have, which is counting the legs of a dog. Specifically, a dog with 5 legs. This is a wild test, because LLMs get really pushy and insistent that the dog only has 4 legs.
I wonder if “How many legs do you see?” is close enough to “How many lights do you see?” that the LLMs are responding based on the memes surrounding the Star Trek episode “Chain of Command”.
I started with desktop applications, so my go-to for GUI has been Qt, especially QML. It works on Windows / macOS / Linux as well as iOS and Android. I think there’s now a way to compile QML to WebAssembly as well. It also has a ton of support classes that are loosely analogous to the various *Kit things supplied on iOS and Android.
The downside is that the core of Qt is in C++, so it’s mostly seen in (or used for?) embedded contexts.
I recently used Slint as well, which isn’t anywhere near as mature, but is at least written in Rust and has some type-safety benefits.
SwiftUI is pretty good too, and I wish I got to work on Apple platforms more.
To me, the simplicity of creating a “Button” when you want a button makes more sense, instead of a React component that’s a div styled by layers of CSS and brought to life by JavaScript.
But I’m kind of bummed that I started with that route (well, and writing partial UI systems for game / media engines a few times) because most people learned web apps and the DOM, and it’s made it harder to get the kind of work I identify with.
So it’s hard for me to recommend Qt due to the career implications…but at the same time, for the projects I’ve worked on, it’s made a smaller amount of work go a longer way, with a more native feel than Electron apps seem to have.
Yes. And everyone is glossing over the benefit of unified memory for LLM applications. Apple may not have the models, but it has customer goodwill, a platform, and the logistical infrastructure to roll them out. It probably even has the cash to buy some AI companies outright; maybe not the big ones (for a reasonable amount, anyway) but small to midsize ones with domain-specific models that could be combined.
Not to mention the “default browser” leverage it has with iPhones, iPods, and watches.
Unified memory, and examples like the M1 Ultra still being able to hold its own years later, might be one of those things that not all Mac users, and non-Mac users alike, have experienced.
It's nice to see 16 GB becoming the minimum; to me it should have been 32 for a long time.
Slint does not use a browser. Instead, it has its own runtime written in Rust and uses a custom DSL to describe the UI.
It has APIs for different programming languages.
For JavaScript, it uses Node or Deno for the application logic, and then spawns the UI with its own runtime, without a browser.
In a way it is the opposite approach: rather than taking the JS runtime out of Electron and replacing it with a Rust API, Slint keeps the JS runtime but swaps the browser for its own runtime (from the JS dev's point of view).
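For anyone curious what that looks like from the Rust side, here is a minimal sketch based on the `slint` crate's inline-macro API (syntax as I remember it from the 1.x docs, so double-check the current signatures): the DSL is compiled into Rust types and rendered by Slint's own runtime, no browser involved.

```rust
// Minimal sketch of Slint's Rust API (assuming the 1.x `slint` crate):
// the DSL below is compiled into a Rust type, and the window is rendered
// by Slint's own runtime rather than a browser.
slint::slint! {
    import { Button, VerticalBox } from "std-widgets.slint";

    export component Demo inherits Window {
        VerticalBox {
            Text { text: "Hello from Slint"; }
            Button {
                text: "A real Button element";
                clicked => { debug("button clicked"); }
            }
        }
    }
}

fn main() {
    // In recent Slint versions, new() and run() return Results.
    let ui = Demo::new().unwrap();
    ui.run().unwrap();
}
```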
Whose interests corporations act in is not arbitrary; it’s tied to how they make money.
Meta and Google make their money primarily from advertisers, Apple makes money from consumers buying iPhones. One of the upsides to paying for something is that the company is incentivized to keep you paying or get you to pay more.
Something I remind people who buy cheaper Android phones and then complain about ads - the OS development is being subsidized by those ads. From Google’s perspective, securing their revenue stream is the justification for Chrome and Android’s existence. It’s not a purely altruistic move to fund their open source development.
Charts of the revenue stream for some major tech companies:
Personally, I’d really like a cross-platform declarative package manager in a mainstream or mainstream-style language, where the nixpkgs equivalent can be JIT- or AOT-compiled, shell scripts included, so it isn’t painful to work with and can switch into an environment almost instantly (something like the rough sketch at the end of this comment).
Though Nix the language isn’t that complex syntactically, it’s really the way that nixpkgs and things like overrides are implemented, the lack of a standard interface between environments on Darwin and NixOS, the need for overlays with multiple levels of depth, etc., that makes things complex.
The infuriating thing about Nix is that it’s functionally capable of doing what I want, but it’s patently obvious that the people at the wheel are not particularly inclined to design things for a casual user who can’t keep a hundred idiosyncrasies memorized just to work on their build scripts.
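To make that concrete, here is a purely hypothetical sketch of what a “nixpkgs entry in a mainstream-style language” could look like; none of these types or functions exist in any real tool, they’re invented only to show the shape of the idea.

```rust
// Purely hypothetical API: `Package` and its fields are invented for
// illustration, not part of any existing package manager.
struct Package {
    name: &'static str,
    version: &'static str,
    deps: Vec<&'static str>,
    // The "build phase" is an ordinary, type-checked function that a
    // JIT/AOT-compiled evaluator could run directly instead of a shell script.
    build: fn() -> Result<(), String>,
}

fn build_ripgrep() -> Result<(), String> {
    println!("cargo build --release");
    Ok(())
}

fn ripgrep() -> Package {
    Package {
        name: "ripgrep",
        version: "14.1.0",
        deps: vec!["rust-toolchain", "pcre2"],
        build: build_ripgrep,
    }
}

fn main() {
    let pkg = ripgrep();
    println!("{} {} (deps: {:?})", pkg.name, pkg.version, pkg.deps);
    (pkg.build)().expect("build failed");
}
```

The appeal is that overrides would just be ordinary function calls or struct updates, checked by the same compiler as everything else, instead of a separate fixpoint-and-overlay mechanism to memorize.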
> It would be very expensive to run such a trial, over a long period of time, and the administrators would feel ethically bound to unblind and then report on every tiny incidentaloma, which completely fucks the training process.
I wonder if our current research process is only considered the gold standard because doing things in a probabilistic way is the only way we can manage the complexity of the human body to date.
It’s like me running an application many, many times with many different configurations and datasets, while scanning some memory addresses at runtime before and after the test runs, to figure out whether a specific bug exists in a specific feature.
Wouldn’t it be a lot easier if I could look at the relevant function in the source code and understand its implementation to determine whether it was logically possible based on the implementation?
We currently don’t have the ability to decompile the human body, or understand the way it’s “implemented”, but tech is rapidly developing tools that could be used for such a thing: either a way to hold more aggregated information about the human body “in mind” than any one person can in a lifetime and reason about it, or a way to simulate it with enough granularity to be meaningful.
Alternatively, the double-blindedness of a study might not be as necessary if you can continually objectively quantify the agreement of the results with the hypothesis.
I.e., if your AI model is reporting low agreement while the researchers are reporting high agreement, that could be a signal that external investigation is warranted, or prompt the researchers to question their own biases where they would’ve previously succumbed to confidence bias.
All of this is fuzzy anyway - we likely will not ever understand everything at 100% or have perfect outcomes, but if you can cut the overhead of each study down by an order of magnitude, you can run more studies to fine-tune the results.
Alternatively, you could have an AI passively running studies to verify reproducibility and flag cases where it fails, whereas right now the way the system values contributions makes it far less worthwhile for a human author to invest the time, effort, and money. I.e., improve how quickly we recover from a bad study, rather than the accuracy of each study.
EDIT: These are probably all ideas other people have had before, so sorry to anyone who reaches the end of my brainstorming and didn’t come out with anything new. :)
I didn't even think about the replication part of the value proposition.
Do a detailed enough study of an entire population and you get very strong hypothesis testing for all sorts of diseases & treatments simultaneously. You don't have to spend tens of millions of dollars and multiple PhD generations running a blinded study to replicate a specific untested first-principles part of modern medicine's treatment for a rare disease; you get that shit for free and call it up in a SQL query.
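As a toy illustration of that last point: the table and column names below are invented, the idea is just that a replication check collapses into an ordinary aggregate query over data you already have.

```rust
// Hypothetical query against an invented population-health schema; the point
// is that "replicating" a treatment effect becomes a grouped aggregate rather
// than a new multi-year trial.
const REPLICATION_QUERY: &str = r#"
    SELECT t.treatment,
           AVG(o.outcome_score) AS mean_outcome,
           COUNT(*)             AS n
    FROM   treatments t
    JOIN   outcomes   o ON o.patient_id = t.patient_id
    WHERE  t.condition = 'rare_disease_x'
    GROUP  BY t.treatment;
"#;

fn main() {
    // A real system would hand this to a database driver; printing it keeps
    // the sketch self-contained.
    println!("{REPLICATION_QUERY}");
}
```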
Yeah, Rust tends to have enough type-safety that you can encode project-specific constraints into it and lower the learning curve to that of Rust itself, rather than learning all the idiosyncratic, unconscious, and undocumented design constraints that you need to abide by to keep from destabilizing things.
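A quick sketch of what that can look like in practice (the names here are made up for illustration): instead of documenting “never hand an unsanitized string to the query layer”, you let the compiler enforce it.

```rust
// Newtype pattern: the only way to obtain a SanitizedInput is via sanitize(),
// so the project rule is enforced by the type system instead of by a
// convention someone has to remember.
struct SanitizedInput(String);

fn sanitize(raw: &str) -> SanitizedInput {
    // Stand-in for real escaping/validation logic.
    SanitizedInput(raw.replace('\'', "''"))
}

fn run_query(input: SanitizedInput) {
    println!("querying with: {}", input.0);
}

fn main() {
    let raw = "O'Brien";
    // run_query(raw);          // won't compile: &str is not SanitizedInput
    run_query(sanitize(raw));   // the type-checked path is the only path
}
```

The learning curve stays at “how Rust works”, while the project-specific rule travels with the types rather than with tribal knowledge.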
> I wonder if “How many legs do you see?” is close enough to “How many lights do you see?” that the LLMs are responding based on the memes surrounding the Star Trek episode “Chain of Command”.
https://youtu.be/S9brF-wlja8