Heape-Johnson, A., McGee, J. B., Wolf, P. J., May, J. F., & Maloney, L. D. (August 2023). Charter School Funding: Little Progress Towards Equity in the City. School Choice Demonstration Project.
In some states and cities the difference is more extreme than in others.
I attended high school in the US in the early 00’s and cell phones were absolutely banned from classrooms. You could keep them in your locker and use them between classes, but that was it.
I attended college in the late 00’s, and I don’t think I took a single digital exam. Quizzes, sure, but for final exams even CS was pencil and paper (or a final project, which admittedly will have issues in the post-LLM era).
I was also in high school in that time period and had a similar experience. As I recall it, pretty much every student had a phone by 2007ish (flip phones back then), and using a phone in class was grounds to have it confiscated for the day and get a detention. This was absolutely enforced.
My college experience was similar to yours as well. All exams were paper (often blue books). Having a phone out would get you kicked out of the exam hall. But by the time I did med school, it was all digital.
I was in high school in the late 2010s. No cell phones allowed during most of class time, and it was somewhat enforced. I definitely recall students being chewed out for having their phones out in class, but I also recall some students having their phones out with no repercussions.
FWIW there is a new-ish kind of intermediate genre between classic LAN/ranked multiplayer and single player, which is the whole “survival” genre. Generally speaking, they can be played as single player games, but also allow for small-scale co-op, synchronously or asynchronously. So even if you and a buddy have different schedules, you can make progress separately but still occasionally play together.
Valheim, Grounded, Ark, Satisfactory are a few among many others.
I keep jumping between Preview and Skim.
Preview for simple tasks and editing, Skim for advanced reading (think huge pdfs, where I want to see two parts of the file at the same time).
I contend it is the only way to move forward on the goal of “automating” mathematics. Although we’ve seen natural language approaches do well at IMO, the human effort required to verify higher level proofs is too great with hallucinations being what they are. With something like Lean, you don’t need a human verifier.
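A minimal sketch of what that claim looks like in practice (a toy example of mine, not from any real formalization effort): once the snippet below compiles, Lean's kernel has mechanically checked every step, so nobody has to re-read the argument itself.

```lean
-- Toy example: the kernel checks the proof term mechanically, so a reviewer
-- only needs to trust the kernel and read the statement, not the proof.
theorem add_comm_toy (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```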
> With something like Lean, you don't need a human verifier.
Human verification can never be ruled out entirely with these sorts of systems: you always have to check that the definitions used in the final statement mean what you think they mean, and that all of the base axioms are acceptable.
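A toy illustration of that caveat (my own example, not from any real development): the kernel happily accepts the theorem below, but only because the definition isn't what its name suggests, and auditing that is exactly the part that stays with a human.

```lean
-- Deliberately wrong definition: "prime" here just means "at least 2".
def IsPrime (n : Nat) : Prop := 2 ≤ n

-- The kernel verifies this without complaint; the error lives in the
-- definition, not the proof, so only a human reading IsPrime catches it.
theorem four_is_prime : IsPrime 4 := by
  unfold IsPrime
  decide
```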
And of course, there's always the possibility of bugs in the kernel. I even recently found a bug [0] in a verifier for Metamath, which is designed to be so simple that its only built-in logic is typed string substitution. But such bugs should hopefully be unlikely in non-adversarial settings.
That’s a fair point. But it greatly limits the scope of human-introduced error. I think that already for FLT, the surface area for error in the kernel and in axiom translation is substantially smaller than that of the entire body of literature that Wiles’s proof recursively depends on.
> With something like Lean, you don’t need a human verifier.
The purpose of a proof is to show yourself and someone else why something is true. I don’t know what it would mean to write proofs only for computers to verify, unless the only thing you are interested in is a yes/no answer.
Humans make mistakes. The more complex our mathematics become, the higher the chance that mistakes creep in. If you want mathematical foundations to be solid you need to minimize the number of wrong theorems we build on.
To elaborate, this is a foundation model. This basically means it can take an arbitrary image and map it to a high dimensional space H in which ~arbitrary characteristics become much easier to solve for.
For example (and this might be oversimplifying a bit, computer vision people please correct me if I’m wrong): if you’re interested in knowing whether or not the image contains a cat, then maybe there is some hyperplane P in H for which images on one side of P do not contain a cat, and images on the other side do contain a cat. And so solving for “Does this image contain a cat?” becomes a much easier problem: all you have to do is figure out what P is. Once you do that, you can pass your image into DINO, take the dot product of the embedding with P, and check whether the answer is negative or positive. The point is that finding P is much easier than training your own computer vision model from scratch.
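Roughly, in code (a sketch of the linear-probe idea; random vectors stand in for real DINOv3 embeddings so it runs without the model):

```python
# Sketch of "finding P": freeze the foundation model, embed a labelled set of
# images, and fit a linear classifier (a linear probe) on the embeddings.
# Random vectors stand in for real DINOv3 embeddings here.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
D = 768  # embedding dimensionality, O(1k)

# Pretend these came out of the frozen model.
cat_embs = rng.normal(loc=+0.1, size=(200, D))     # images with a cat
no_cat_embs = rng.normal(loc=-0.1, size=(200, D))  # images without

X = np.vstack([cat_embs, no_cat_embs])
y = np.array([1] * 200 + [0] * 200)

probe = LogisticRegression(max_iter=1000).fit(X, y)  # this is the hyperplane P

# Classifying a new embedding is just a dot product with P's normal vector
# plus an offset; the sign tells you which side of P you are on.
z = rng.normal(loc=+0.1, size=D)
score = z @ probe.coef_.ravel() + probe.intercept_[0]
print("contains a cat" if score > 0 else "no cat")
```

The expensive part (the model) is already trained; the probe is a few hundred labelled examples and a couple of lines of sklearn.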
Caveat: I am not an expert, so this is a semi-educated guess.
I imagine it would depend on whether DINOv3 captures the information of whether a given person is in the image, which I think is really a question about training data. So naively, I would guess the answer is yes for celebrities and no for non-celebrities. Partially for data/technical reasons, but also maybe due to the murkier legal expectation of privacy for famous people.
Foundation models like DINO learn representations of their inputs. That is, they generate very high-dimensional numerical descriptions of what you put into them. The models aren't trained on labelled data; instead they're trained on some pretext task like "given this image with a cutout, fill in the cutout" (see Masked Auto-Encoders). So the basic output from the model is a vector, often called an embedding: literally a 1D list of numbers, O(1k)-dimensional. The goal is an embedding space in which the things you want to classify are (well) linearly separable.
Vision transformers also output patch tokens, which can be assembled into a low-resolution feature map (w/32, h/32 is common). So what you do with that data depends on the task. Classification can be as simple as linearly classifying the (whole image) embedding. A semantic segmentation task can do the same, but for every pixel. This is why the DINO authors show a PCA representation of a bunch of images, which shows that semantically similar objects are grouped together by colour. Object detectors are more complicated, but the key thing is that once you have these pixel-level features, you can use them as input into existing architectures.
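For what it's worth, that PCA trick is only a few lines (a sketch with fake tokens in place of real DINOv3 output, so the exact model API doesn't matter):

```python
# Project the grid of patch tokens to 3 principal components and view them as
# an RGB image; semantically similar patches end up with similar colours.
# Fake tokens stand in for real DINOv3 patch features.
import numpy as np
from sklearn.decomposition import PCA

h, w, D = 16, 16, 768  # a 16x16 grid of patch tokens, one D-dim vector each
patch_tokens = np.random.default_rng(0).normal(size=(h * w, D))

rgb = PCA(n_components=3).fit_transform(patch_tokens)   # (h*w, 3)
rgb = (rgb - rgb.min()) / (rgb.max() - rgb.min())        # scale to [0, 1]
feature_map = rgb.reshape(h, w, 3)                       # low-res colour map
print(feature_map.shape)
```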
Now to your question: face recognition is a specific application of object re-identification (keyword: Re-ID). The way most of these models work is from the whole-image embedding. Normally you'd run a detector to extract the face region, then compute the embedding, put it in a vector database and then query for nearest neighbours using something like the cosine distance. I've only worked in this space for animals, but humans are far more studied. Whether DINOv3 is good enough out-of-the-box I don't know, but certainly there's a lot of literature looking at these sorts of models for Re-ID.
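A skeleton of that flow, with random vectors in place of real face-crop embeddings (the face detector and the embedding model are the parts you'd swap in):

```python
# Toy re-identification index: one embedding per enrolled identity, queried by
# cosine similarity. Random vectors stand in for real face-crop embeddings.
import numpy as np

rng = np.random.default_rng(0)
D = 512

# "Vector database" (toy in-memory version).
gallery = {name: rng.normal(size=D) for name in ["alice", "bob", "carol"]}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest_identity(query: np.ndarray) -> tuple[str, float]:
    # Rank enrolled identities by cosine similarity to the query embedding.
    # A real system would also apply a similarity threshold to reject unknowns.
    return max(((n, cosine(query, e)) for n, e in gallery.items()),
               key=lambda t: t[1])

query = rng.normal(size=D)  # would come from face detector + embedding model
print(nearest_identity(query))
```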
The challenge with Re-ID is that the model has to produce features which discriminate a specific individual from other, similar-looking individuals. For example, with the vanilla model you probably have a very good tool for visual search. But that's not the same task, because if you give it a picture of someone in a field, you'll get back pictures of other people in fields. Re-ID usually requires re-training on labelled imagery where you have a few examples of each person. The short answer is that there are already very good models for doing this, and they don't necessarily even need ML to do a decent job (though ML might be used for keypoint detection of facial landmarks).