This article sums it up perfectly. I was interested in building a compiler long before going to college and this was the most accessible body of work.
Building a recursive descent parser from scratch was an eye-opener to 17-year-old me on how a seemingly very complex problem that I had no idea how to approach can be made simple by breaking it down into the right primitives.
From the Publications section of that Wikipedia page:
>The April 1971 Communications of the ACM article "Program Development by Stepwise Refinement",[22][23] concerning the teaching of programming, is considered to be a classic text in software engineering.[24] The paper is considered to be the earliest work to formally outline the top-down method for designing programs.[25][26] The article was discussed by Fred Brooks in his influential book The Mythical Man-Month and was described as "seminal" in the ACM's brief biography of Wirth published in connection to his Turing Award.[27][28]
Wirth also wrote an extremely accessible book on Compiler Construction, using exactly the hand-written recursive descent parsing approach discussed by the OP.
The initial edition was published in 1976, in German, but the latest version is available online:
There are also parser generators like ANTLR (https://en.wikipedia.org/wiki/ANTLR) which take an input not unlike yacc's, but generate an LL parser using explicit code rather than yacc's table-driven LALR parsing.
Thank you. Just to confirm, by "accessible", do you mean easy to understand?
Anyway, I think I had come across that book on the net, but did not check it out at the time. I don't remember the exact reason; maybe it was because I didn't want to go into the subject of compilers yet and was only interested in interpreters, because I prefer to take things one step at a time.
Yes, accessible in the sense of being readable without extensive prior knowledge. If I recall correctly, I read the initial edition while still in high school.
"breaking things down into the right primitives" is the real key to programming. There are many books and web pages about algorithms, but I wish there were more searchable and browsable resources for how to approach problems through primitives.
The process of breaking a complex problem down into the right primitives requires great understanding of the original problem in the first place.
What's blocking me during programming usually are edge cases I had no idea about. It's still hard to find good material on compilers if you're not into reading dry-ass books. That's a me problem though, I simply can't force myself to read boring, facts-only content (one of the reasons why I love Beej's guides).
> The process of breaking a complex problem down into the right primitives requires great understanding of the original problem in the first place.
Yes, but with experience that just becomes a matter of recognizing problem and design patterns. When you see a parsing problem, you know that the simplest/best design pattern is just to define a Token class representing the units of the language (keywords, operators, etc.), write a NextToken() function to turn characters into tokens, then write a recursive descent parser using that (roughly as sketched below).
Any language may have its own gotchas and edge cases, but knowing that recursive descent is pretty much always going to be a viable design pattern (for any language you are likely to care about), you can tackle those when you come to them.
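To make that concrete, here is a minimal sketch of that shape in C++. The Token/NextToken/ParseExpr names and the toy expression grammar are purely illustrative, not anyone's actual code:

    // Minimal sketch: tokenizer + recursive descent parser for
    // expressions like "1 + 2 * 3".  Grammar:
    //   Expr   -> Term   { '+' Term }
    //   Term   -> Factor { '*' Factor }
    //   Factor -> NUMBER | '(' Expr ')'
    #include <cctype>
    #include <cstdlib>
    #include <iostream>
    #include <string>

    struct Token {
        enum Kind { Number, Plus, Star, LParen, RParen, End } kind;
        long value = 0;            // only meaningful for Number
    };

    class Parser {
    public:
        explicit Parser(std::string src) : src_(std::move(src)) { NextToken(); }

        long ParseExpr() {                       // Expr -> Term { '+' Term }
            long result = ParseTerm();
            while (tok_.kind == Token::Plus) { NextToken(); result += ParseTerm(); }
            return result;
        }

    private:
        long ParseTerm() {                       // Term -> Factor { '*' Factor }
            long result = ParseFactor();
            while (tok_.kind == Token::Star) { NextToken(); result *= ParseFactor(); }
            return result;
        }

        long ParseFactor() {                     // Factor -> NUMBER | '(' Expr ')'
            if (tok_.kind == Token::Number) { long v = tok_.value; NextToken(); return v; }
            if (tok_.kind == Token::LParen) {
                NextToken();
                long v = ParseExpr();
                if (tok_.kind != Token::RParen) Fail("expected ')'");
                NextToken();
                return v;
            }
            Fail("expected number or '('");
            return 0;
        }

        void NextToken() {                       // characters -> next Token
            while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
            if (pos_ >= src_.size()) { tok_ = {Token::End}; return; }
            char c = src_[pos_];
            if (std::isdigit(static_cast<unsigned char>(c))) {
                long v = 0;
                while (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_])))
                    v = v * 10 + (src_[pos_++] - '0');
                tok_ = {Token::Number, v};
                return;
            }
            ++pos_;
            switch (c) {
                case '+': tok_ = {Token::Plus};   return;
                case '*': tok_ = {Token::Star};   return;
                case '(': tok_ = {Token::LParen}; return;
                case ')': tok_ = {Token::RParen}; return;
                default:  Fail("unexpected character");
            }
        }

        [[noreturn]] void Fail(const char* msg) {
            std::cerr << "parse error: " << msg << "\n";
            std::exit(1);
        }

        std::string src_;
        size_t pos_ = 0;
        Token tok_;
    };

    int main() {
        std::cout << Parser("2 * (3 + 4)").ParseExpr() << "\n";   // prints 14
    }

Each grammar rule becomes one function, and the call stack does the work that an explicit parse stack would do in a table-driven parser.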
What language do you use parser combinators in, and what kind of grammar do you parse usually? Nom was terribly verbose and unergonomic even by Rust's standards. Haskell's Megaparsec/Parsec is good but yeah, it's Haskell, you need to handle multiple monads (Parser itself is monadic, then your AST state, and maybe some error handling) at once and that's where I got confused. But I appreciated the elegance.
I experimented with parser combinators in Haskell and Rust (nom), then moved on to parser generators in Rust (pest.rs), OCaml (Menhir), and Haskell (Happy), and finally ended up with Python's Lark - the speed of experimenting with different syntaxes/grammars is just insane.
Parser combinators are more of a concept than a library. You could make your own supporting just the stuff you need (a tiny sketch of the idea follows below). I like writing programs in languages I don't know or barely know. I usually just take one of the popular libraries in any given language.
For Rust I used Nom and I didn't mind it all that much, although I noticed it's quite baroque. If I had more to write, I'd probably make some wrappers or macros of my own for the most commonly used Nom snippets.
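For what it's worth, the hand-rolled version really can be tiny. A hedged C++ sketch of the concept, with all names invented for illustration: a parser is just a function from input to an optional (matched text, remaining input) pair, and combinators are functions that glue such parsers together:

    // Sketch of hand-rolled parser combinators: a parser is a function that
    // takes input and either fails or returns (matched text, remaining input).
    #include <functional>
    #include <iostream>
    #include <optional>
    #include <string>
    #include <string_view>
    #include <utility>

    using Result = std::optional<std::pair<std::string, std::string_view>>;
    using Parser = std::function<Result(std::string_view)>;

    // Match a single character satisfying a predicate.
    Parser satisfy(std::function<bool(char)> pred) {
        return [pred](std::string_view in) -> Result {
            if (!in.empty() && pred(in[0]))
                return std::pair{std::string(1, in[0]), in.substr(1)};
            return std::nullopt;
        };
    }

    // Run p then q, concatenating what they matched.
    Parser seq(Parser p, Parser q) {
        return [p, q](std::string_view in) -> Result {
            auto r1 = p(in);
            if (!r1) return std::nullopt;
            auto r2 = q(r1->second);
            if (!r2) return std::nullopt;
            return std::pair{r1->first + r2->first, r2->second};
        };
    }

    // Try p; if it fails, try q on the same input.
    Parser alt(Parser p, Parser q) {
        return [p, q](std::string_view in) { auto r = p(in); return r ? r : q(in); };
    }

    // Zero or more repetitions of p (p must consume input when it succeeds).
    Parser many(Parser p) {
        return [p](std::string_view in) -> Result {
            std::string matched;
            while (auto r = p(in)) { matched += r->first; in = r->second; }
            return std::pair{matched, in};
        };
    }

    int main() {
        Parser digit   = satisfy([](char c) { return c >= '0' && c <= '9'; });
        Parser number  = seq(digit, many(digit));          // one or more digits
        Parser sign    = alt(satisfy([](char c) { return c == '-'; }),
                             satisfy([](char c) { return c == '+'; }));
        Parser integer = alt(seq(sign, number), number);   // optional sign

        if (auto r = integer("-42abc"))
            std::cout << "matched \"" << r->first << "\", rest \"" << r->second << "\"\n";
    }

Real libraries like nom or Parsec add typed results, error reporting and backtracking control on top, but the core idea is no more than this.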
I've used tree-sitter to generate my parsers in Rust, just working with the untyped syntax tree it produces, and it gives you error tolerance for free. It's a bit of setup at first though, requiring an extra crate for the generated parser, but editing it from there saves so much time.
What do you mean exactly by "error-tolerance"? Is it like each node is wrapped in a result type that you have to match against every time you visit it, even though you know for a fact that it is not empty, or something like that?
I suppose that one of the pros of using tree-sitter is its portability? For example, I could define my grammar once and use it both to parse my code and to do proper syntax highlighting in the browser, with the same library and the same grammar? Is that correct? It is also used extensively in Neovim to define syntax for languages? Otherwise I would have had to slightly modify the grammar.
Oh no no, with tree-sitter you get an untyped syntax tree. That means you have a Cursor object to walk the tree, which creates Node objects as you traverse; these have a "kind" (the name of the tree-sitter node), a span, and children. (I recommend using the Rust tree-sitter bindings themselves, not the Rust wrapper rust-sitter.)
Yes, portability like that is a huge benefit, though I personally haven't utilized it for that yet. I just use it as an error-tolerant frontend to my compiler.
As to how errors are reported, tree-sitter creates an ERROR or MISSING node when a particular subtree has invalid syntax. I've found that it never leaves a node in an invalid state (so it would never create a binaryop(LeftNode(...), Op, ERROR) if RightNode is not optional; instead it would create an ERROR for the binaryop too). This allows you to safely unwrap known fields. ERROR nodes only really bunch up in repeat()s and optional()s, where you would implicitly handle them.
Error tolerance in this context means the parser produces a walkable AST even if the input code is syntactically invalid, instead of just throwing/reporting the error. It’s useful for IDEs, where the code is often in an invalid state as the developer is typing, but you still want to be able to report diagnostics on whatever parts of the code are syntactically valid.
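Purely to illustrate that, here is a rough sketch against tree-sitter's C API (which the Rust bindings wrap), not the commenter's actual code. tree_sitter_mylang() is a placeholder; real generated grammars export tree_sitter_<language>():

    // Rough sketch: parse possibly-broken source with tree-sitter's C API and
    // report only the ERROR / missing nodes; the rest of the tree stays walkable.
    #include <tree_sitter/api.h>
    #include <cstdint>
    #include <cstdio>
    #include <cstring>
    #include <string>

    // Placeholder for whatever generated grammar you link against
    // (e.g. tree_sitter_c(), tree_sitter_json(), ...).
    extern "C" const TSLanguage *tree_sitter_mylang();

    static void report_errors(TSNode node) {
        if (std::strcmp(ts_node_type(node), "ERROR") == 0 || ts_node_is_missing(node)) {
            std::printf("syntax problem at bytes %u..%u (%s)\n",
                        ts_node_start_byte(node), ts_node_end_byte(node),
                        ts_node_is_missing(node) ? "missing" : "error");
            // Keep walking: valid children can still exist underneath an ERROR node.
        }
        uint32_t n = ts_node_child_count(node);
        for (uint32_t i = 0; i < n; ++i)
            report_errors(ts_node_child(node, i));
    }

    int main() {
        std::string source = "fn broken( {";          // deliberately invalid input
        TSParser *parser = ts_parser_new();
        ts_parser_set_language(parser, tree_sitter_mylang());

        TSTree *tree = ts_parser_parse_string(parser, nullptr,
                                              source.c_str(),
                                              static_cast<uint32_t>(source.size()));
        TSNode root = ts_tree_root_node(tree);

        // The tree is still fully walkable; invalid regions show up as ERROR
        // or "missing" nodes rather than aborting the whole parse.
        if (ts_node_has_error(root))
            report_errors(root);

        ts_tree_delete(tree);
        ts_parser_delete(parser);
    }

The key point is that, with a language set, parsing always yields a tree: invalid regions show up as ERROR or missing nodes while everything else remains normal, typed-by-name nodes you can traverse.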
That's a good point - recursive descent as a general lesson in program design, in addition to being a good way to write a parser.
Table-driven parsers (using yacc etc.) used to be emphasized in old compiler-writing books such as Aho & Ullman's famous "dragon (front cover) book". I'm not sure why - maybe partly efficiency for the slower computers of the day, and partly because in the infancy of computing a more theoretical/algorithmic approach seemed more sophisticated and preferable (the canonical table-driven parser-building algorithm was one of Knuth's).
Nowadays it seems that recursive descent is the preferred approach for compilers because it's ultimately more practical and flexible. Table-driven parsing can still be a good option for small DSLs and simple parsing tasks, but recursive descent is so easy that it's hard to justify anything else, and LLM code generation now makes that truer than ever!
There is a huge difference in complexity between building a full-blown commercial quality optimizing compiler and a toy one built as a learning exercise. Using something like LLVM as a starting point for a learning exercise doesn't seem very useful (unless your goal is to build real compilers) since it's doing all the heavy lifting for you.
I guess you can argue about how much can be cut out of a toy compiler for it to still be a useful learning exercise in both compilers and tackling complex problems, but I don't see any harm in going straight from parsing to code generation, cutting out AST building and of course any IR and optimization. The problems this direct approach causes for code generation and optimization can themselves be a lesson in why a non-toy compiler uses those stages!
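As a hedged illustration of that "straight from parsing to code generation" point, here is a toy C++ sketch where the grammar functions emit instructions for a made-up stack machine as they parse - no tokens beyond single characters, no AST, no IR:

    // Sketch: recursive descent that emits code directly while parsing.
    // Output is a made-up stack machine: PUSH n, ADD, MUL.  No error handling.
    #include <cstddef>
    #include <iostream>
    #include <string_view>

    struct Emitter {
        std::string_view src;
        std::size_t pos = 0;

        char peek() const { return pos < src.size() ? src[pos] : '\0'; }

        void expr() {                       // Expr -> Term { '+' Term }
            term();
            while (peek() == '+') { ++pos; term(); std::cout << "ADD\n"; }
        }
        void term() {                       // Term -> Factor { '*' Factor }
            factor();
            while (peek() == '*') { ++pos; factor(); std::cout << "MUL\n"; }
        }
        void factor() {                     // Factor -> DIGIT | '(' Expr ')'
            if (peek() == '(') { ++pos; expr(); ++pos; /* skip ')' */ return; }
            std::cout << "PUSH " << peek() << "\n";
            ++pos;
        }
    };

    int main() {
        Emitter{"2*(3+4)"}.expr();
        // Prints: PUSH 2, PUSH 3, PUSH 4, ADD, MUL (one per line)
    }

The immediately visible cost is that the output order is fixed by the parse - there is nothing to reorder or optimize - which is exactly the lesson about why a non-toy compiler keeps an AST and IR.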
A fun approach I used at work once, wanting to support a pretty major C subset as the language for a programmable regression test tool, was even simpler... Rather than having the recursive descent parser generate code, I just had it generate executable data structures - subclasses of Statement and Expression base classes, with virtual Execute() and Value() methods respectively, so that the parsed program could be run by calling program->Execute() on the top-level object. The recursive descent functions just returned these statement or expression objects directly. To give a flavor of it, the ForLoopStatement subclass held pointers to the initialization, test and increment expressions, and its Execute() method could just call testExpression->Value() etc.
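A rough sketch of the shape being described, with class and member names guessed for illustration rather than taken from the original code:

    // Sketch of "executable data structures": the parser returns these objects
    // directly, and running the program is just Execute().
    #include <iostream>
    #include <map>
    #include <memory>
    #include <string>

    using Env = std::map<std::string, int>;   // variable name -> value

    struct Expression {
        virtual ~Expression() = default;
        virtual int Value(Env &env) const = 0;
    };
    struct Statement {
        virtual ~Statement() = default;
        virtual void Execute(Env &env) const = 0;
    };

    using ExprPtr = std::unique_ptr<Expression>;
    using StmtPtr = std::unique_ptr<Statement>;

    struct Number : Expression {
        int n;
        explicit Number(int n) : n(n) {}
        int Value(Env &) const override { return n; }
    };
    struct Variable : Expression {
        std::string name;
        explicit Variable(std::string name) : name(std::move(name)) {}
        int Value(Env &env) const override { return env[name]; }
    };
    struct Add : Expression {
        ExprPtr lhs, rhs;
        Add(ExprPtr l, ExprPtr r) : lhs(std::move(l)), rhs(std::move(r)) {}
        int Value(Env &env) const override { return lhs->Value(env) + rhs->Value(env); }
    };
    struct Less : Expression {
        ExprPtr lhs, rhs;
        Less(ExprPtr l, ExprPtr r) : lhs(std::move(l)), rhs(std::move(r)) {}
        int Value(Env &env) const override { return lhs->Value(env) < rhs->Value(env); }
    };

    struct Assign : Statement {
        std::string name; ExprPtr value;
        Assign(std::string n, ExprPtr v) : name(std::move(n)), value(std::move(v)) {}
        void Execute(Env &env) const override { env[name] = value->Value(env); }
    };
    struct Print : Statement {
        ExprPtr value;
        explicit Print(ExprPtr v) : value(std::move(v)) {}
        void Execute(Env &env) const override { std::cout << value->Value(env) << "\n"; }
    };
    struct ForLoopStatement : Statement {
        StmtPtr init;  ExprPtr test;  StmtPtr increment;  StmtPtr body;
        ForLoopStatement(StmtPtr i, ExprPtr t, StmtPtr inc, StmtPtr b)
            : init(std::move(i)), test(std::move(t)), increment(std::move(inc)), body(std::move(b)) {}
        void Execute(Env &env) const override {
            for (init->Execute(env); test->Value(env); increment->Execute(env))
                body->Execute(env);
        }
    };

    int main() {
        // What the parser would build for:  for (i = 0; i < 3; i = i + 1) print i;
        Env env;
        ForLoopStatement program(
            std::make_unique<Assign>("i", std::make_unique<Number>(0)),
            std::make_unique<Less>(std::make_unique<Variable>("i"), std::make_unique<Number>(3)),
            std::make_unique<Assign>("i", std::make_unique<Add>(std::make_unique<Variable>("i"),
                                                                std::make_unique<Number>(1))),
            std::make_unique<Print>(std::make_unique<Variable>("i")));
        program.Execute(env);    // prints 0, 1, 2 on separate lines
    }

The recursive descent functions would just construct and return these objects, so "compiling" and "loading" the program are effectively the same step.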
You can do this with SES on AWS or Mailgun.
We use SES for this exact use case. Giving agents programmatic access to an email inbox and programmatically creating inboxes.
Yes you can use SES or Mailgun. But you would also need to build inboxes, email threading, attachment parsing, semantic search, structured data extraction, and more.
They advertise in the sidebar, offer discounts for the first month, and run occasional free trials of Pro features. You don't have to try to use a Pro feature to get the sales pitch.
The problem is, it's not necessarily the job of the person who is tasked with doing it.
These are the kinds of things that fall through the cracks in smaller companies, and there's no expert to build this disaster recovery plan because there is no risk or compliance department.
It falls into the lap of whoever is dealing with the audits or whoever has a reputation for getting things done and unblocking people.
Shoe Dog - Phil Knight.
A great read about the struggles of how Nike started. But I was left unsatisfied, as the book ends with the company going public and doesn't cover its rise in the late '80s/'90s and how that went down.