Seedream 5 Lite is honestly extremely disappointing: its text-to-image is way worse than 4.5, and its image editing is fine, but that's it. It's way, way behind NB2.
The OP's comment on the post is clearly Markdown-formatted; real humans don't write like that on HN.
The readme is very obviously Claude-written (or a similar model - certainly not GPT); if you check enough vibecoded projects, you'll easily spot those readmes.
The style of the HTML page, as noted by others.
Useless comments in the source code, which humans also write, but which LLMs produce more often:
I did not. The HTML was generated by DeepSeek. Claude is far too expensive for that. This is only experimental code. I don't think it is worth paying Claude to test code that has already been peer-reviewed theoretically.
I swear, I am starting to feel like these complaints about how "obviously" something is AI-written are the human equivalent of "you are absolutely right" -- it's like some kind of automatic response now.
I don't know how to explain it, but I've interacted with LLMs for multiple years now, and especially a lot of time with the recent-ish frontier models, so I can detect most AI writing quite reliably. Sure, you might disagree, but I'm fairly certain this entire post is an LLM output.
I highly doubt some of those results. GPT 5.2/Codex is incredible for cybersecurity and CTFs, and 5.3 Codex (not on the API yet) even more so. There is absolutely no way it's below DeepSeek or Haiku. Seems like a harness issue, or maybe they tested those models at no/low reasoning effort?
Just for fun, I ran dnsmasq-backdoor-detect-printf (which has a 0% pass rate on your leaderboard with GPT models) using --agent codex instead of terminus-2 with gpt-5.2-codex, and it identified the backdoor successfully on the first try. I honestly think it's a harness issue; could you re-run the benchmarks with Codex for gpt-5.2-codex and gpt-5.2?
Overall, it matches my experience, and it is actually good (as good as the best models at localization, with a still-impressive 0% false positive rate):
https://quesma.com/benchmarks/binaryaudit/
Will rerun it on GPT-5.3-Codex shortly, now that the API is out (though the effort setting does not work correctly yet, and for "medium" the actual effort is very low).
To be honest, it surprised us as well. I used GPT 5.2 Codex in Cursor for decompiling an old game and it worked well (way better than Claude Code with Opus 4.5).
We tested Opus 4.6, but we are waiting for the public API to test GPT 5.3 Codex.
At the same time, tasks can vary, and not everything that works best end-to-end is the same as what works well in a typical, interactive workflow.
We used the Terminus 2 agent, as it is the default used by Harbor (https://harborframework.com/) and we wanted to stay unbiased. Very likely other frameworks would change the results.
Yeah, they all do sometimes, but the agent decides what to allow, and you can choose not to use it. This gives the user full control over the sandbox, and you can run the agent in YOLO mode.
To be honest, while KolibriOS is open-source, I wouldn't call it all that "active". MenuetOS has progressed much further than KolibriOS over the years, both in performance (it has SMP support!) and in being 64-bit.