fzimmermann89's comments

The switch by Artificial Analysis from per-token cost to per-benchmark cost shows some effect! It's nice that labs are now trying to optimize what I actually have to pay to get an answer. It has always annoyed me to pay for all the senseless rambling of the less-capable reasoning models.


Did they? I'm looking at the Artificial Analysis leaderboard site now and I only see price as USD/1M tokens.


...and for complex-valued tensors, you need to conjugate.


I've only accounted for real numbers. I'm not sure how to cleanly account for conjugates when some of the einsums would need them and others wouldn't.

For example, a matrix product would need a complex conjugate, but a Hadamard product wouldn't.

If there is an elegant way to extend this to complex numbers, let me know!
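To make the matrix-product-vs-Hadamard distinction concrete, here is a tiny NumPy sketch of my reading of it, using an inner product as the contraction case (just an illustration, not a proposed fix):

```python
import numpy as np

x = np.array([1 + 2j, 3 - 1j])
y = np.array([2 - 1j, 0 + 1j])

# Contraction as an inner product: the first argument must be conjugated
# for the result to match the complex inner product.
inner = np.einsum("i,i->", np.conj(x), y)
assert np.isclose(inner, np.vdot(x, y))  # np.vdot conjugates its first argument

# Hadamard (elementwise) product: no conjugation involved.
hadamard = np.einsum("i,i->i", x, y)
assert np.allclose(hadamard, x * y)
```

The awkward part is exactly what the parent says: the einsum string alone ("i,i->" vs "i,i->i") doesn't tell you whether a conjugate belongs on one of the operands.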


How foreign is the language? Was it likely included in pre-training to some degree? Does it use grammar, syllables, and logic similar to one of the large languages? Your approach assumes there is an easy-to-learn mapping between concepts in your target language and concepts in a pretrained LLM.

Can you get more text written in the low-resource language?

Are you ok to share the name of the language?


Thank you!

The language is Hasidic Yiddish (which is by now different enough from YIVO Yiddish to almost be considered a different language). The amount of (all kinds of) Yiddish included in pre-training is probably very little, but not nothing. Also, it's a Germanic language with Hebrew script and roots, and some Slavic roots and suffixes. Most concepts and structure are probably not *very* foreign to a good model.

As I wrote in another comment, I have thought about initializing the new embeddings based on equivalent tokens in the old ones (e.g. by translating a token to English and finding the closest old token), but I'm starting to rethink the feasibility.
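A minimal sketch of that embedding-initialization idea, assuming a hypothetical `translate` helper that maps a new-vocab token to an English word (everything here is illustrative, not a working pipeline):

```python
import numpy as np

def init_new_embeddings(new_vocab, old_vocab, old_emb, translate):
    """Initialize embeddings for a new vocabulary from an old one.

    `translate` is a hypothetical helper mapping a new-vocab token to an
    English word; tokens it cannot map keep a small random initialization.
    """
    old_index = {tok: i for i, tok in enumerate(old_vocab)}
    rng = np.random.default_rng(0)
    new_emb = rng.normal(0.0, 0.02, size=(len(new_vocab), old_emb.shape[1]))
    for j, tok in enumerate(new_vocab):
        gloss = translate(tok)  # e.g. Yiddish token -> English word, or None
        if gloss in old_index:
            new_emb[j] = old_emb[old_index[gloss]]
    return new_emb
```

In practice you would also want to handle many-to-one translations and subword tokens, which is where the feasibility concerns start.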

I will probably get more text sometime in the future, but I have to build the first version now.


Not an answer to your original question, but I think you'd be surprised how much high-quality historical linguistic content was hiding in the dusty old corners of the internet. I've been doing some work recently with LLMs on historical languages (various forms of Latin, Ancient Greek, and medieval European languages), and the out-of-the-box performance of state-of-the-art LLMs is shockingly good. It isn't that surprising when you remember all the archive digitization projects that took place in the early 00s but ended up either as stale links, preserved only by archive.org, or stored in arcane CMSs essentially unusable by humans. I assume the same is especially true for various historical Yiddish corpora.

I ran some tests and, without fine-tuning, GPT can translate medieval German, for example, considerably better than well-known scholars today.


Why would you throw out the original embedding layer? That seems like a step backwards to me. It's likely it was partly trained on Yiddish and without it you're throwing out a lot of information in the rest of the model.


I strongly suspect you’re overvaluing how far Hasidic Yiddish has drifted, and that fine-tuning an existing model as a dialect will work just fine, particularly given that the languages the different loan words are from will be present in such a model, and that you’re going to a dialect with a simpler grammar.

There’s plenty of guides online for fine-tuning for dialects. 2GB still isn’t a huge amount of data, but it seems like it would definitely be worth a concerted try (including fiddling with it a bit) given how expensive training from scratch is.


Perhaps. But I don't think there is an existing (open weights) model that really knows YIVO Yiddish, either, so what should I base this fine-tuning on?


You might be able to start with German, since German-Yiddish cognates tend to have fairly regular spelling correspondences (not exactly one-to-one, but often few-to-one).

So given a Latin-script token from a model that does OK in German (bonus points if it also does Hebrew), generate several candidate Hebrew-script tokens with some regex search-and-replace, then use the resulting vocabulary to tokenize your Yiddish corpus and for each original token keep the candidate replacement that was used most often in the tokenization.

This vocabulary replacement should give you a model that does OK in German-in-Hebrew-script. I think that would be a better base for a Yiddish model than training from scratch, but of course that's just a hunch that might turn out to be wrong.
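The candidate-generation-plus-corpus-voting step could look roughly like this. The spelling rules here are made up for illustration, not a real German-to-Yiddish correspondence table:

```python
import re
from collections import Counter

# Illustrative Latin-to-Hebrew-script spelling rules (assumptions, not a
# linguistically correct rule set).
RULES = [("t", "ט"), ("a", "אַ"), ("e", "ע")]

def candidates(latin_token):
    """Generate Hebrew-script candidate spellings via naive rule application."""
    out = {latin_token}
    for src, dst in RULES:
        out |= {re.sub(src, dst, t) for t in out}
    return out - {latin_token}

def pick_replacements(latin_vocab, corpus_tokens):
    """For each Latin-script token, keep the candidate seen most often
    in the (tokenized) Yiddish corpus."""
    counts = Counter(corpus_tokens)
    mapping = {}
    for tok in latin_vocab:
        cands = candidates(tok)
        if cands:
            best = max(cands, key=lambda c: counts[c])
            if counts[best] > 0:
                mapping[tok] = best
    return mapping
```

For example, with these toy rules, `pick_replacements(["tate"], corpus)` would map "tate" to "טאַטע" if that spelling dominates the corpus. A real version would vote over subword tokenizations rather than whole words.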


Qwen3 lists Eastern Yiddish (presumably YIVO) as one of the 119 training languages. It’s available at various sizes including rather small ones to experiment with cheaply, and has good documentation for suggested fine-tuning pipelines. I’d start with that.


If you’re still looking at it, there’s a new open weights model that is focusing on multi-linguality: https://news.ycombinator.com/item?id=45108401


For a similar project, I worked with GPT to create an extensive dataset of translations from a historical language. I could then use this both to evaluate base capacity of other models in the language, i.e. giving the model the task of translating the various passages and evaluating the results with GPT, as well as for fine-tuning.


Also, for an autocomplete, I think a small LLM trained from scratch should already work well. Have you tried one of the TinyStories (also only ~3 GB) or nanoGPT speedruns, without any fancy loss terms etc., as a baseline?


Contacting support obviously interfered with Apple services. Duh.


If I am not mistaken, this is done by modulation in Fourier space. We have already been using this in optical setups for ages - at the speed of light.

The interesting part imo is the implementation of this idea in their work and the efficiency and physical size.
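For anyone unfamiliar with the optical version: the digital analogue is multiplying the image's spectrum by a mask, which is what a 4f setup does with a physical aperture in the Fourier plane of a lens pair. A toy sketch (sizes and cutoff are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((64, 64))

# Forward transform, with the zero frequency shifted to the center,
# mimicking the Fourier plane of a lens.
spectrum = np.fft.fftshift(np.fft.fft2(img))

# A circular low-pass aperture placed "in the Fourier plane".
yy, xx = np.mgrid[-32:32, -32:32]
aperture = (xx**2 + yy**2) < 8**2

# Inverse transform: the image after the second lens.
filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * aperture)).real
```

The optical setup performs the two transforms and the modulation "for free" as light propagates, which is exactly the speed/efficiency angle mentioned above.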


Sad to hear that they removed most of their content as well.


If only they would fix the memory leak and freeze on resume from hibernate that has been an issue for the last year at least...


I thought the same, until I noticed a really annoying WSL2 bug: on two machines I own, waking up from hibernate or standby causes a WSL-related process (vmmem) to consume 100% CPU, with WSL becoming completely unresponsive (including `wsl --terminate` etc.).

You have to kill all WSL processes, which requires admin rights. So without elevated rights, Ubuntu on Windows is not usable on these laptops.

The issue has been known for years and has hundreds of comments on GitHub without a fix (https://github.com/microsoft/WSL/issues/6982).


I ran into this issue as well. I also spent a few hours debugging problems with a database, just to discover that I could not reach a server because WSL2 does not support IPv6 (https://github.com/microsoft/WSL/issues/4518).


*shoes.


I got access by providing an academic email address without mentioning any relevant publications etc. It took maybe 2-3 days.

