Hacker News | philipkglass's favorites

Don't use all-MiniLM-L6-v2 for new vector embeddings datasets.

Yes, it's the open-weights embedding model used in all the tutorials and it was the most pragmatic model to use in sentence-transformers when vector stores were in their infancy, but it's old and does not implement the newest advances in architectures and data training pipelines, and it has a low context length of 512 when embedding models can do 2k+ with even more efficient tokenizers.

For open-weights, I would recommend EmbeddingGemma (https://huggingface.co/google/embeddinggemma-300m) instead which has incredible benchmarks and a 2k context window: although it's larger/slower to encode, the payoff is worth it. For a compromise, bge-base-en-v1.5 (https://huggingface.co/BAAI/bge-base-en-v1.5) or nomic-embed-text-v1.5 (https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) are also good.


Cuckoo filters can do even better with the small adjustment of using windows instead of buckets. See "3.5-Way Cuckoo Hashing for the Price of 2-and-a-Bit": https://scispace.com/pdf/3-5-way-cuckoo-hashing-for-the-pric.... (This significantly improves load factors rather than changing anything else about the filter, and ends up smaller than the semi-sorted variant for typical configurations, without the rigmarole.)

My fairly niche use case for these kinds of data structures was hardware firewalls running mostly on SRAM, which needed a sub one-in-a-billion false positive rate.
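To give a feel for what a sub one-in-a-billion target costs, here is a rough sizing sketch. It assumes the standard bucketed cuckoo filter (two candidate buckets of b entries each), where the false positive rate is bounded by roughly 2b / 2^f for f-bit fingerprints. It does not model the windowed variant from the paper, and the function names are made up for illustration.

```python
import math


def approx_fpr(fingerprint_bits: int, bucket_size: int = 4) -> float:
    """Approximate upper bound on the false positive rate of a standard
    cuckoo filter: a lookup compares against up to 2*b stored fingerprints,
    each matching by chance with probability 1 / 2**f."""
    return 2 * bucket_size / 2 ** fingerprint_bits


def min_fingerprint_bits(target_fpr: float, bucket_size: int = 4) -> int:
    """Smallest fingerprint width f such that 2*b / 2**f <= target_fpr."""
    if not 0.0 < target_fpr < 1.0:
        raise ValueError("target_fpr must be between 0 and 1")
    return math.ceil(math.log2(2 * bucket_size / target_fpr))


if __name__ == "__main__":
    bits = min_fingerprint_bits(1e-9, bucket_size=4)
    print(bits, approx_fpr(bits, bucket_size=4))
```

With 4-entry buckets this comes out to fingerprints a little over 32 bits wide, which is why load factor (how many of those slots you actually fill) matters so much for total memory.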


Personal anecdote time. Enough time has passed that it can finally be told.

About 30 years ago, a family came down from the mountains near San Luis Obispo to ask whether my mother could teach them piano. They were an unusual family -- a mother and a number of children; apparently their father wouldn't leave his homestead up in the mountains. The children were all homeschooled. They were perhaps a bit raggedy, but all quite brilliant and free-thinking, and quickly became excellent piano players. Our family became friends with theirs, and eventually we were invited to visit their homestead up in the mountains.

The homestead was an off-grid hand-built house and working organic dairy farm, lovingly stuffed to the rafters with various arts and crafts, including a large collection of medieval-style musical instruments which the patriarch of the family, Hal, had built by hand. Hal was an enigma within an enigma: he refused to talk about his past, looked like a Santa Claus mountain man, wouldn't engage with the outside world in person, but was relentlessly curious about it -- able to keep up with conversations about the latest in politics and technology. He also had a keen interest in the archaeology of the upper Colorado plateau, and soon we were making trips to the Cal Poly library to check out the latest archaeology books on his behalf. One day, on a whim, we looked for his name in the index of one of those books, and that's when we found out that we already knew who he was.

Haldon Chase[1] had been at the absolute epicenter of the Beat movement. He was the one who introduced Allen Ginsberg to Jack Kerouac, and most of the other Beats to each other. He'd gone by the pseudonym "Chad King" in "On the Road". At the time he didn't have a Wikipedia entry, and all anybody knew was that he had vanished at some point. Of course my family felt privileged to know the rest of the story.

Thinking now about Hal's life, in the few retrospectives I've seen of it, he's framed as having rejected the whole Beat lifestyle. I'm not sure that's accurate. In many ways the life he managed to carve out for himself was the apotheosis of much of the Beat philosophy: genuinely free-thinking, self-reliant, non-conformist, creative, and in his way, spiritual. All very Beat. What he certainly rejected was the limelight. The publicity, the drama, the ego. He wanted absolutely nothing to do with any of that. So he managed to get away and just live a good (if unconventional) life. His kids have all gone on to live really good, non-messed-up lives as well.

So when reading stories about messed-up Beats and their messed-up kids, it's worth considering that there's a kind of anti-survivor-bias at play: where everything worked out, where the trauma didn't explode dramatically or get passed down the generations, you're probably not going to hear about it.

1: https://en.wikipedia.org/wiki/Haldon_Chase -- mostly but not entirely accurate.


I went down an epic rabbit hole the other day—a rabbit labyrinth really—learning about what happened to the children of the Beats. It started here:

https://www.theparisreview.org/blog/2025/10/24/the-female-pi...

That's an intro to a novel by Jan Kerouac—Jack's daughter—which is newly reprinted. It (the intro) is well written and her (Kerouac's daughter's) story is incredible.

That led me to this classic piece, "Children of the Beats", written in 1995 by the son of one of Kerouac's lovers:

https://web.archive.org/web/20220408162741/https://www.nytim...

He tracked down and interviewed several of his literary 'cousins': other children of Beat writers and scenesters. If, like me, you are fascinated by how the lives of artists intertwine with family dynamics, that article is unputdownable. And profoundly sad. All of this material is tragic.

Through that I started reading about Lucien Carr, the golden boy of the Beats who had been their lead shaman—a few years before Neal Cassady showed up—until he stabbed a man to death under murky circumstances that a Hacker News comment is too short to get into:

https://en.wikipedia.org/wiki/Lucien_Carr

That led me to reading about the children of Lucien Carr, one of whom—Caleb Carr—was a military historian who later became an accidental celebrity by writing "The Alienist", a 90s classic of the historical-serial-killer genre. Caleb Carr became an excellent writer, though as far from a Beat as a writer could be. He talks about the trauma field that he and his peers grew up in with painful eloquence.

https://www.salon.com/1997/10/04/cov_si_04carr/

He said this about his father and his buddies Ginsberg and Burroughs: "The one thing that their lifestyle did not factor in was family." To hear about that milieu from a child who had to deal with it all, decades later, is to me an entirely compelling thing.

He used the money from his bestsellers to buy a small mountain in rural New York and built himself an 18th century manor house refuge:

https://web.archive.org/web/20150529181658/https://www.nytim...

https://www.youtube.com/watch?v=OCrt8Pir7jA

He died last year a month after his last book came out. His publishers thought they were getting another serial killer bestseller. Instead he delivered a memoir about his cat, whom this interviewer pushes him to agree was the love of his life:

https://www.youtube.com/watch?v=9zqGaXl1Zg0#t=173

His mother left Lucien Carr and married a man who had three daughters, who grew up with Lucien's three sons in what Caleb (middle son) called a "dark Brady Bunch".

Lucien lived for 11 years with Alene Lee, another former lover of Kerouac, and her daughter. A few years ago a blogger who is into Beat history did this interview with her (the daughter), which of all these pieces is probably the saddest, and which again I couldn't stop reading. If you can read this without your heart feeling assaulted, you're more resilient than I am:

https://lastbohemians.blogspot.com/2022/04/christina-mitchel...

The last rabbit-subhole I went down was the story of the son of William Burroughs, also named William Burroughs, who also wrote drug-phantasmagoric novels (one called "Speed"), had a liver transplant before he was 30, and died at the side of a road in Florida:

https://en.wikipedia.org/wiki/William_S._Burroughs_Jr.

I was never attracted to the Beats aesthetically, except for Burroughs in a cobra-hypnotized way. But the mythology of the Beats as Bohemian free spirits has carried a lot of sway. There's a principle that the shadow side of the artist works itself out in the family. If you ever wanted to learn how this works, the Beat constellation is quite the case to study.

Here is what the son of Neal Cassady, the icon of beatific spontaneity, said in the 1995 interview I linked to above:

"By the 60's, Dad was so burned out, so bitter," John Allen says. "He told me once that he felt like a dancing bear, that he was just performing. He was wired all the time, talking nonstop. I remember once, after a party, about 2 A.M., he went in the bathroom, turned on the shower and just started screaming and didn't stop. I was about 15 then and I knew he was in deep trouble, that he was really a tortured soul. He died not too long after that."


The Library of Congress has a bunch at

https://guides.loc.gov/travel-posters/sample-images

and this site got some traction here recently:

David Klein's TWA Posters - https://news.ycombinator.com/item?id=44952696 - Aug 2025 (9 comments)


This. I had 12 contractors come out for an estimate. I insisted to each that I would only consider estimates accompanied by a Manual J (aka show your work). I got 4 estimates with a manual J, and one of them the vendor said ‘despite that the math says you need a 4 ton outdoor unit, I’m giving you two,’ and refused to budge on that.

I went with a vendor who did the math and sized accordingly and my system works great - great comfort year round and very low energy usage.




I recommend McGill's Back Mechanic book, which is an end-user focused distillation of his academic work.

It suggests simple tests to discover exactly where your pain is coming from and then appropriate exercises to mechanically strengthen the right area and a few workarounds to avoid stressing that area in regular life e.g. alternate ways to pick up light items from the floor.

McGill's big three are three simple exercises that are generally good for those with no patience for ordering a book, and intros to them can be found all over YouTube.


> His books, many of which are annotated with margin comments,

I'm not saying that he did, but this, along with being the right age to have read How to Read a Book by Mortimer J. Adler, strongly suggests that he used that book to grasp a lot more of his books than most people can.

That book gives you a very good strategy for reading books that are beyond you normally. In the three years since I've read it I've managed to finish books that I couldn't read even when I was doing my PhD and it was my full time job to understand them.

The funny thing is that I only ran into that book when I was trying to figure out how to build knowledge graphs for complex documents using LLMs. Using multiple readings to create a summary of each chunk, then a graph of the connections between the chunks, then a glossary of all the terms and finally a critique of each chunk gave better than sota results for the documents I was working on.
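A minimal sketch of that multi-pass idea, assuming a generic `llm` callable that takes a prompt string and returns text. The prompts, the `DocumentNotes` structure and the yes/no linking step are all hypothetical stand-ins for illustration, not the actual pipeline described above.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class DocumentNotes:
    summaries: List[str]          # pass 1: one summary per chunk
    edges: List[Tuple[int, int]]  # pass 2: pairs of related chunk indices
    glossary: str                 # pass 3: key terms across the document
    critiques: List[str]          # pass 4: one critique per chunk


def read_in_passes(chunks: List[str], llm: Callable[[str], str]) -> DocumentNotes:
    """Run four 'readings' over the chunks. `llm` is any prompt -> text function."""
    summaries = [llm(f"Summarize this passage:\n{chunk}") for chunk in chunks]

    # Compare every pair of summaries; this is O(n^2) calls, fine for a sketch.
    edges = []
    for i in range(len(chunks)):
        for j in range(i + 1, len(chunks)):
            answer = llm(
                "Are these two summaries related? Answer yes or no.\n"
                f"A: {summaries[i]}\nB: {summaries[j]}"
            )
            if answer.strip().lower().startswith("yes"):
                edges.append((i, j))

    glossary = llm("Define the key terms in these summaries:\n" + "\n".join(summaries))
    critiques = [llm(f"Critique the argument of this passage:\n{chunk}") for chunk in chunks]
    return DocumentNotes(summaries, edges, glossary, critiques)
```

Because the model is passed in as a plain function, the passes can be exercised with a fake `llm` before any real model is wired up.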


To me, this feels like part of a cooking change we'll remember in 20 years.

We were all taught to cook dry pasta in a giant pot full of boiling salted water, the more water the better. No! Not optimal!

A trivially simple change fundamentally alters the process for the better: soak the pasta in cold water for a couple hours (as far in advance as you like, for convenience). The pasta rehydrates and takes on the texture (but not flavor) of cooked pasta.

Cook it in any hot liquid, quickly (3-4 minutes). Done.

The Ideas In Food book (which is a-m-a-z-i-n-g and nerdy) plays around with this technique in a bunch of interesting ways. But they didn't manage to turn box pasta into ramen noodles. Turns out: not so difficult if you use the modern technique.

This article gets even cooler than making ramen at home. Read it! Strong recommend! Extremely hacker-y!

Also: Lucky Peach is pretty great.


Consumer Lab does this testing. It's great, you just have to pay for a membership.

It's the same Jevons paradox reason as why LLMs are so big despite massive diminishing returns. If we can output 4096Ds, why not use all the Ds?

Like LLMs, the bottleneck is still training data and the training regimen, but there's still a demand for smaller embedding models due to both storage and compute concerns. EmbeddingGemma (https://huggingface.co/google/embeddinggemma-300m), released just yesterday, beats the 4096D Qwen-3 benchmarks at 768D, and using the 128D equivalent via MRL beats many 768D embedding models.
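For anyone unfamiliar with the MRL trick: with a model trained using Matryoshka Representation Learning, the smaller embedding is just the leading dimensions of the full one, re-normalized. A rough sketch in plain Python follows; `truncate_embedding` is a made-up helper name, and this only gives sensible results for models actually trained with MRL.

```python
import math
from typing import List, Sequence


def truncate_embedding(vec: Sequence[float], dims: int) -> List[float]:
    """Keep the first `dims` components and rescale to unit length."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [x / norm for x in head]


def cosine_unit(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity for vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    full = [3.0, 4.0, 12.0]              # stand-in for a 768D embedding
    small = truncate_embedding(full, 2)  # stand-in for the 128D version
    print(small, cosine_unit(small, small))
```

Storage and dot-product cost both scale linearly with the number of dimensions kept, which is where the savings come from.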


> It has nothing to do with stupidity.

In addition to being rude, it's not a particularly clear word.

So I coined "idiodidact", to specifically describe people who have personal selectivity with regard to being teachable. (Greek/English usage: "idios"/personal choice + "didact"/taught)

Any resemblance, to any other word, would be a coincidence.


Plex/Jellyfin for streaming, the *arr suite for cataloging/downloading [0]

[0] https://wiki.servarr.com/


There are utilities to help, waybackpack comes to mind, but I haven't looked in a while. https://github.com/jsvine/waybackpack

That boots model is fascinating compared to the actual boot market in the USA. I can get an excellent pair of made-in-America, Vibram-soled, Goodyear-welted, extremely sturdy boots for 200 USD - maybe less. Red Wing, Danner, and other PNW brands all exist and sell at this price point.

Compare to popular fashion boots like Timberlands, which are also 200 USD and reasonably sturdy but have no Goodyear welt or proper sole, so they fail in 5 years or less of regular wear.


I now write all of my bots in javascript and run them from the Chrome console with CORS turned off. It seems to defeat even Google's anti-bot stuff. Of course, I need to restart Chrome every few hours because of memory leaks, but it wasn't a fun 3 days the last time I got banned from their ecosystem with my kids asking why they couldn't watch Youtube.

I thoroughly recommend the Dover Books reprint of Henry Mayhew's "London Labour and the London Poor". It is absolutely fascinating. People who had a brass farthing to rub together could enjoy eating 2-day-old leftovers from the posh banquets, sold on the streets. (2 days because much of it was premade and so a day or so old by the time it hit the banquet table.) Trifle and lobster for anyone. Clothes rental was a thing. Dirt collecting had specialities, with "pure" dog dung fetching extra prices for leather tanning.

I think Mayhew may have fed into Sidney and Beatrice Webb, which in turn led much later to Labour Party policy, and ultimately to the Beveridge Report and the birth of the welfare state.

Also, if you enjoyed Doctor Dolittle by Hugh Lofting, you will be familiar with Matthew the Cat's-Meat-Man.


> Wait ... but this is true.

(It's been a while since this has come up, so maybe I'll write a longer reply, in case it's useful to you and/or others.)

There are two responses, both important.

The first is that your comment included things that the site guidelines ask commenters to avoid: internet tropes, snark, shallow dismissals (all of which are in "but hey, the guy wrote a couple of fun tweets") as well as outright flamebait ("most likely criminal behavior"). None of that is about being true or not, and if your comment hadn't included those things, I wouldn't have responded.

The second, deeper issue is that correctness, though good in principle, is neither sufficient nor necessary to make a good HN comment. For example, true statements can be used as weapons; or they can be off-topic; or they can be ammunition for putdowns, and so on. In such cases, a statement being true can make the comment worse, not better.

For example, consider telling a teenager about the acne on his or her face—pretty brutal, no? yet true. Or, to take an old example of pg's (https://news.ycombinator.com/item?id=6539403), consider telling an old person that they're going to die soon. Also true, also not ok in many circumstances.

Context and intention matter, and a good HN comment needs to be in the intended spirit of the site. That's why correctness isn't a sufficient condition for a good comment, and cannot justify a bad one. If you think about it, it isn't a necessary condition either—people are often simply mistaken, and that's part of good conversation (https://news.ycombinator.com/item?id=32697044).

What the "just the facts" or "but it's true" defense misses is that there are infinitely many facts and truths, and they don't select themselves. Humans do that, according to their motives, and a motive is not a fact.

Here are some other links making similar points in case anyone wants further explanation:

https://news.ycombinator.com/item?id=35145770 (March 2023)

https://news.ycombinator.com/item?id=32909407 (Sept 2022)

https://news.ycombinator.com/item?id=32697044 (Sept 2022)

https://news.ycombinator.com/item?id=32628939 (Aug 2022)

https://news.ycombinator.com/item?id=31996470 (July 2022)

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...


Apparently they're getting very good: https://emschwartz.me/new-life-hack-using-llms-to-generate-c...

I try not to use them too much because I want to build the skill of using SMTs directly for now.


I am absolutely not trying to sustain the other commenter's claim that modern fiction is better lol. I also don't tend to stay on top of contemporary fiction, a lot of stuff gets hyped and well reviewed but just isn't that great. Or it had a lot of resonance for the specific time it was written that lessens as time passes. And of course the extreme selection bias, not all good books "survive" but most of the ones that do are good on at least some qualities.

But there is definitely still excellent fiction being written now. The Last Samurai by Helen DeWitt, or The Gray House by Mariam Petrosyan, I would place with the likes of Middlemarch and Anna Karenina.


I know someone who uses FLTK for cross platform development. That one has a dated look too, and its roots go back to that SGI Indigo Magic desktop I wrote about elsewhere in this discussion.

Sometimes it can be hard because people want to see the same GUI. And this is true even when it is input equal, I mean click to keystroke identical!

Most will say "modern" when they mean, "the one I drive daily."

And in many cases, those are one and the same. Modern simply is the daily driver, be it MacOS or Windows, or...

Doing that cross platform is hard!

With FLTK, a solo dev can write once, build for almost anything and it will work great. This is especially true for a C++ developer, which they are.

Over time, we have found building a modern GUI either takes a ton of time tweaking FLTK to look damn close, or it requires essentially different builds and dependencies, one set per platform supported, or...

Don't do it.

The thing is, we really value being able to bring the application to the user on their platform of choice. And today doing that is damn near free.

Basically, we just need to build FLTK on the target once, and the app will build with few to no problems. Easy peasy.

Users currently use Win 7 through 11 (yes, there are more of those out there than we may want to admit), Mac Intel and MX Apple Silicon, and a Linux user or two.

No Droid yet.

I wish it were different, but it isn't.

This tool seems high value because, like the other one mentioned here "dialog", which simply does the GUI in a terminal, command line tools are often the easiest, most portable build.

From there, one needs to adopt GUI foundation tools, which are specific to an OS, or like FLTK, look different.

Or, spend a ton more money, hire more devs and brute force cross platform tools.

I wish it were different. Is it?

Are we missing something?


But your eyeball, retina, all of it, would be producing the same light. Maybe if you had a special eye scrotum of low light producing tissue that hung away from the body.

If we all work on this, I think we can seed the chemtrails-verse with the belief that ancient hunter gatherer men saw nocturnal prey with their testicles, and that you can learn to do it now with a combo of ice baths and bow hunting naked


The USB HID protocol is designed to support basically any device that regularly reports a set of values; those values can represent which keys are pressed, how a mouse has moved, how a joystick is positioned, etc. Now, different devices have different things that they support: joysticks have varying numbers of axes, mice have different sets of buttons, some keyboards have dials on them, etc. So, there's no single format for a report that simultaneously efficiently uses bandwidth and supports all the things a human interface device might do. To solve this, the HID protocol specifies that the host can request a "report descriptor" that specifies the format and meaning of the status reports. This is great for complex hosts running a full OS; there's plenty of memory and processing power to handle those varying formats. However, these HID devices also needed to work in very limited environments: a real mode BIOS, a microcontroller, etc. So, for certain classes of device such as keyboards and mice, there is a standard but limited report format called the "boot protocol". IIRC, the keyboard version has space to list 6 keys that are pressed simultaneously (plus modifiers), all of which must be from the same table of keys in the spec, and the mouse has a dX and a dY field plus a bitfield for up to 8 buttons (four of which are the various ways you can scroll). To implement a more complex device, you'd want to be able to specify your own report format, which the ESP driver doesn't seem to allow you to do.
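To make the keyboard case concrete, here is a small sketch of decoding a boot-protocol keyboard report as I understand the layout: 8 bytes, with a modifier bitfield in byte 0, a reserved byte, then six key usage IDs (0 meaning an empty slot). The function and class names are invented for illustration and are not part of any driver API.

```python
from typing import NamedTuple, Optional, Tuple

# Bit 0 through bit 7 of the modifier byte, in order.
MODIFIER_NAMES = ("LCtrl", "LShift", "LAlt", "LGUI",
                  "RCtrl", "RShift", "RAlt", "RGUI")


class BootKeyboardReport(NamedTuple):
    modifiers: Tuple[str, ...]
    keycodes: Tuple[int, ...]


def parse_boot_keyboard_report(report: bytes) -> BootKeyboardReport:
    """Decode an 8-byte boot keyboard report into modifiers and usage IDs."""
    if len(report) != 8:
        raise ValueError("boot keyboard reports are 8 bytes long")
    modifiers = tuple(
        name for bit, name in enumerate(MODIFIER_NAMES) if report[0] & (1 << bit)
    )
    # Byte 1 is reserved; bytes 2..7 hold up to six simultaneous keys.
    keycodes = tuple(code for code in report[2:8] if code != 0)
    return BootKeyboardReport(modifiers, keycodes)


def usage_to_letter(code: int) -> Optional[str]:
    """Map keyboard usage IDs 0x04..0x1D to 'a'..'z'; None for anything else."""
    if 0x04 <= code <= 0x1D:
        return chr(ord("a") + code - 0x04)
    return None


if __name__ == "__main__":
    # Left Shift held, with the 'a' and 'b' keys down.
    report = bytes([0x02, 0x00, 0x04, 0x05, 0x00, 0x00, 0x00, 0x00])
    parsed = parse_boot_keyboard_report(report)
    print(parsed.modifiers, [usage_to_letter(c) for c in parsed.keycodes])
```

The fixed layout is what lets a BIOS or small microcontroller skip report-descriptor parsing entirely, at the cost of the six-key limit mentioned above.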

Burritos are just monoids in the category of endotacos.

> computer boys are really like "imagine a boot so big that logically we must start licking it now in case it might possibly exist someday"

I think this shitpost is all that needs to be said on this topic.


I'm from a similar bicultural household as rayiner, though from comment history I'm guessing I come down more on the American side. I've got enough of a background in both cultures to parse out and explain the differences though.

It's not perceived as "wasting a life" or "not enjoying it" by the parent, and oftentimes not by the child either. Rather, it's different values, different time preferences, and different conceptions of self. Western cultures have a conception of self that is very rigid and individualistic. There's a hard boundary between your wants and everyone else's wants, and you're responsible only for your own desire. This is encoded in our structures of law, in contemporary business culture, in the concept of individual rights, in the goals of Western psychotherapy, and in the relationships between family members that we view as normal.

In most traditional Asian cultures, there is much more of a soft boundary between members of the same family. You are expected to consider the welfare of everyone in the family. And that leads to a sense of obligation between parent and child, and then between child and parent as they get older, and between sibling to sibling when it comes to dealing with the outside world. There is a comparatively stronger boundary between the family and the state, eg. many Asian cultures feel like it's okay to snub the rules of the wider society for the benefit of the family, while in American society this is considered grift, nepotism, and corruption.

Likewise, there is a difference in time perception. Americans have a hard boundary between the present and the future or past. This shows up in popular culture through lines like in Rent ("No day but today", "How do you feel today? Then why choose fear?", "Forget regret, your life is yours to live") or through popular aphorisms to "Let go of the past", "Live for the present", "The future is yours to write", etc. Asian cultures often consider the past, present, and future as one: the past informs the present, which becomes the future, and the "you" of today will soon become the you of tomorrow. As a result, it is perfectly natural to preference "future you" over "present you". And that shows up through things like savings rates (where Asians are consistently higher than Americans), long-term investments, business continuity, and willingness to invest in family and raise the next generation. Denying present pleasures for future gains is not a lifestyle that they don't enjoy; it's simply being smart, and the enjoyment comes from the anticipation of the future payoff.

There's a good illustration of the difference in the two cultures from two movies that both came out in 2018/2019, Crazy Rich Asians vs. The Farewell. Crazy Rich Asians is foremost a Chinese-American film. When the grandmother (who is considered the villain in the film) smugly says "We know how to build things that last", she's exemplifying the values and time preferences of Old China. And the film's climax and resolution is all about choosing present happiness over an indeterminate future, basically a victory of American values over traditional Chinese ones. The Farewell, however, more closely depicts the web of obligations in a traditional Chinese family, and is comedic to American audiences simply because the farces that the family goes through to preserve the feelings of the matriarch make no sense to Americans. Sure enough, Crazy Rich Asians was a smash hit in the U.S. but an utter flop in China, while The Farewell was a sleeper hit in America but did very well with Chinese audiences.


I'm the editor of Spectrum's "Hands On" DIY column: thank you so much! The general goal is to have projects that can be done in a weekend or three for less than roughly $300 and which point to something interesting beyond just the build itself. A lot of credit has to go to David Schneider who is the author of this piece, and has contributed many of Spectrum's citizen science projects.

BTW, If you want to see just the DIY projects instead of all our DIY-related coverage (which can include e.g. interviews or news articles) another handy link is:

https://spectrum.ieee.org/type/hands-on/


~35 years ago, my mother worked in a clinic in Central America. One of the more-memorable patients was a 15-year-old who arrived without a left index finger. Unlike most people who'd have appeared in that condition, he was jubilant. He'd been out in the jungle cutting chicle (preparing trees to collect sap out of which to make rubber), when he was bitten on the tip of that finger by a fer-de-lance:

https://en.m.wikipedia.org/wiki/Bothrops_asper

(As I understood the story, it had been lying unseen on top of a branch he grabbed.) He immediately, as in within seconds, laid his finger against a tree and whipped it off at the root with his machete. No one questioned the correctness of his decision, because he subsequently survived a 45-minute walk to the trail-head, and a 2-hour drive to the clinic, without any sign of further envenomation. (Indeed, the clinic didn't keep any anti-venom, so he'd have had a further 40 minutes to go to the hospital in the nearest small city.) Otherwise, his chances of living would have been dicey at best, and the loss of at least his entire arm a certainty.

The local (Maya-language) slang for the Fer-de-lance translated to something like "fifteen-steps", because that was supposedly how far you'd walk before you'd die from a bite. Brave, quick-thinking dude.

