Ligatures in programming fonts (tinyletter.com)
108 points by matiasz on July 23, 2017 | 82 comments


The first problem isn't really a problem, since ligatures are provided by fonts, not character encodings. From the Unicode FAQ on ligatures and digraphs (http://unicode.org/faq/ligature_digraph.html):

"The existing ligatures exist basically for compatibility and round-tripping with non-Unicode character sets. Their use is discouraged. No more will be encoded in any circumstances.

"Ligaturing is a behavior encoded in fonts: if a modern font is asked to display “h” followed by “r”, and the font has an “hr” ligature in it, it can display the ligature. Some fonts have no ligatures, some (especially for non-Latin scripts) have hundreds. It does not make sense to assign Unicode code points to all these font-specific possibilities."

The second problem still stands, though, especially since these sequences of characters can be tokenized differently in different programming languages. IMO, if you're going to have character replacement like this, it should be a configurable editor feature like syntax highlighting.


I don't know what it would do to editor rendering performance, but disabling the `liga` OpenType flag for selections detected as strings would solve the majority of instances of #2

All in all though, this is purely a matter of local dev preference - your editor ligature settings never affect the committed code, so neither is really a problem in practice.
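A minimal sketch of the detection half of that idea, assuming the editor can run a tokenizer over the buffer. It uses Python's standard `tokenize` module on Python source only; the function name `ligature_free_spans` is hypothetical, and actually switching off `liga` for those spans is left to whatever rendering API the editor exposes.

```python
import io
import tokenize


def ligature_free_spans(source: str):
    """Return ((row, col), (row, col)) start/end pairs for every string
    and comment token in a piece of Python source.

    An editor could render these spans with ligatures switched off,
    so that "->" inside a string literal stays two visible characters.
    """
    spans = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.STRING, tokenize.COMMENT):
            spans.append((tok.start, tok.end))
    return spans


sample = 'x = "a -> b"  # note: a => b\ny = x != 1\n'
for start, end in ligature_free_spans(sample):
    print(start, end)
```

Only the `!=` on the second line falls outside the reported spans, so it is the only sequence that would still be ligated. Rows are 1-based and columns 0-based, following the `tokenize` convention.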


I'm a little miffed that the blog author doesn't have a good understanding of how Unicode and OpenType cooperate. Your first point is a great example of that.

As for your second point, I think what I outline here solves most of the problems that the other commenters are arguing about.

Actually, such features can be somewhat properly implemented in OpenType. You can tag these alternate glyphs as stylistic sets, with each stylistic set supporting so-and-so family of programming languages. Then, by default, an unaware editor would not perform the ligature substitution.

However, proper support will still require some standard awareness from the editor through some standard API, so that it selects the right stylistic set for the right text (e.g. comments v. code).

Emacs already does a form of character substitution through prettify; I use it all the time with LaTeX, and have found it delightful to work with. It substitutes commands that stand for mathematical symbols with the corresponding mathematical symbols defined in Unicode. The limitation is that some of the features illustrated in Fira Code, such as the prettified Markdown header, don't have a corresponding Unicode code point, and thus necessarily have to be implemented as ligatures in a stylistic set.

A final note on the productivity of substituting input characters with more semantically representative symbols for display: when done well, it is not obtrusive and shouldn't hinder productivity. After all, the Chinese and Japanese do this all the time with a more clunky system (IME) in their digital input, and they get by well enough with it.


Isn't the choice of font inherently an editor feature like syntax highlighting?


I don't think they're talking about the choice of font, but instead the choice of replacing certain character sequences (e.g. ->) with glyphs (e.g. →).


This is exactly what I was going to say, and that one big issue I had when experimenting with ligatures for Hylang (a dialect of LISP) was that it did not keep the spacing of the original character combinations. While I thought the ligatures were much more concise, when they were disabled it shifted things around which ruined the indentation. It made the code ugly and hard to read for anyone not using them, so I had to give them up.

Hopefully one day we can all have language specific characters that make our code more concise. Until then I'll stick to fonts that keep the spacing of the original intact.


The article lists a bunch of non-issues. Yes, you can create confusion by abusing Unicode.

But that's not new. C++ allows zero width spaces in identifiers. There's a guy on reddit who uses characters from the Canadian Aboriginal Syllabics block to have angle braces in Go identifiers.

Yes, they are guaranteed to be wrong sometimes. The big one is that the <<< ligature makes the merge conflict zipper look strange.

But it's incredibly obvious when they are wrong. So it's not an issue in practise.

The reality is that no one is making anyone use a ligature font, and some people like them. If it's causing a problem then you can spend ten seconds changing your font.


I gifted that guy gold for the sheer audacity of the thing.


How much gold?


Ugh, to answer the last line of the author: I have been using them for that long, and the upsides and downsides are very well known to me. He has a subjective opinion about them and states it as fact, 'and everyone who thinks differently is just stupid'. OK, author...


Wow you weren't kidding. I have no opinion on the matter but that's just arrogant.


This isn't just some blogger though. Matthew Butterick wrote http://practicaltypography.com/ .


A self-published eBook that acknowledges a lack of funding on the main page?

Maybe it is required reading in typography circles, but to someone reading the submission who merely uses a font, this doesn't seem much of a credential.


"A self-published eBook that acknowledges a lack of funding on the main page?" is kind of a low blow.

He's doing something really interesting with the funding model of practicaltypography, that I believe should be praised.


I'm not meaning to deal any blows at all, low or otherwise.

I'm merely explaining that to the reader not a student of typography, there's not really any indication that this should be taken with more authority than 'some blog author' per comments above.


From a brief bit of clicking around, the author has: designed several typefaces[1], written a long text on typography[cited previously], and has done enough programming to have developed the publishing framework he uses in his books[2], and to have written another book on a programming topic[3].

While I don't agree with much of this blog post, I certainly think the author has put in enough work to have their work taken seriously in this particular domain, and generally seems like a pretty interesting person.

1. http://practicaltypography.com/equity.html

2. http://docs.racket-lang.org/pollen/

3. http://beautifulracket.com


I'll state just once more, in order to be absolutely clear that I harbour no ill intent: I have no reason to doubt, having been told, that the author can be regarded an authority on the matter. I'm simply saying that, approaching it fresh, it's not obviously the case, and other commenters' disregard should not be treated as that for a household name.


Sorry, I should have linked a better resource than just the homepage. Here's the about page: http://practicaltypography.com/end-credits.html#bio . He also wrote Typography for Lawyers: http://typographyforlawyers.com/about.html .


I agree with the article, but the arguments the author gives are not quite spot-on. Ligatures render code unreadable: there is no way to see how to enter a particular character sequence that is shown as a ligature. They might beautify code for some individuals, but they should never be used for showing code in a public or explanatory context, like on the web. Some operators are no longer recognisable, a few just look wrong - a clear case where simplicity and functionality are sacrificed for style.


> they should never be used for showing code in a public or explanatory context

This I can definitely agree with.

The readability issue, in my opinion, has both a coarse-grained and a fine-grained side.

If I'm scanning my own code, having ligatures on is quicker for me to understand and is ultimately more readable. But (and this is a large but), when editing my own code, ligatures are useless and in the way. Thus I set my editor to remove all ligatures on active/highlighted lines.


I do this too, and with other 'prettifiers' like indent line characters.

I use vim, so I find it great to have them in normal mode, and then disabled (set conceallevel=0) on entering insert mode.


> they should never be used for showing code in a public or explanatory context, like on the web

Or a book like "The TeXbook"[1]?

[1] I believe it's done as pre-render replacement rather than in the font itself but the principle is the same.


I use ligatures in Atom and they are 100% aware of context. I disable them in contexts where they don't make sense, e.g. comments, and I disable them on the line the cursor is on.

I've not had any issues. The => ligature looking like the right arrow character is just like a Cyrillic A looking like a Latin A - it's a problem that never manifests itself.

The author has a very subjective opinion that they try and present as fact.
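For readers who want to see the look-alike cases mentioned above as data, here is a small sketch using Python's standard `unicodedata` module. It only shows that the look-alikes are distinct code points; a ligated `=>` never appears in the text at all, because it is still the two ASCII characters underneath.

```python
import unicodedata

# Each pair looks alike in many fonts but is stored differently.
latin_a = "A"            # U+0041
cyrillic_a = "\u0410"    # U+0410
double_arrow = "\u21d2"  # U+21D2, the character a "=>" ligature resembles

for ch in (latin_a, cyrillic_a, double_arrow):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# The ligature is purely a rendering effect: the text is still "=" then ">".
print([unicodedata.name(ch) for ch in "=>"])
```

A linter or code review tool could use the same lookup to flag non-ASCII look-alikes, which is a separate concern from how the font draws ASCII sequences.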


The author's point about "dumb" ligatures doesn't really hold up: while the "fi" ligature will always mean "f followed by i", its use is not always correct.

For example, in German compound nouns, you do not set a ligature across the boundary between the two nouns. For instance, "Kaufläche" (Kau: chewing, + Fläche: area) should be written with a ligature, while in "Kaufleute" (Kauf: purchase, + Leute: people, = merchants) the ligature should be avoided.
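One way to express that distinction in plain text, sketched here as an assumption rather than as what any particular typesetting system does, is to put U+200C ZERO WIDTH NON-JOINER at the morpheme boundary, which Unicode-aware renderers treat as a request not to ligate across it. The helper names and the hand-supplied boundary index are hypothetical; finding the boundary automatically needs a dictionary or hyphenation data.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER: asks the renderer not to ligate here


def break_ligature(word: str, boundary: int) -> str:
    """Insert a ZWNJ at a morpheme boundary given as a character index."""
    return word[:boundary] + ZWNJ + word[boundary:]


def strip_zwnj(text: str) -> str:
    """Remove the markers again, e.g. before searching or comparing."""
    return text.replace(ZWNJ, "")


# "Kauf" + "leute": the f and l belong to different parts of the compound.
marked = break_ligature("Kaufleute", 4)
print(repr(marked))
print(strip_zwnj(marked))
```

The trade-off is that the marker is stored in the text itself, so search and comparison code has to ignore it, which is why the sketch includes `strip_zwnj`.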


Wow, that's fascinating -- do you have a reference for that? I'd love to learn more, especially as to why -- it seems like that would just result in ugly typography. Or is it solely about the bar of the initial "f" connecting to the next letter?


Well, typography is often about readability. Removing the word boundary in compound nouns isn't really helpful since it removes a boundary that conveys meaning.

There were some examples of nouns where you should avoid ligatures in the TeX docs iirc. Shelfful and selffullfilling are the only ones I remember.

In German (every single fff on any page that bothers to set their own fancy-pants font) and Swedish text you see it all the time, often coming from self-proclaimed typesetting/font nerds.


Are there any font specifications that allow for this kind of distinction?


Every font with ligatures allows for this, because they also include glyphs for the component parts of the ligatures. It's the responsibility of the typesetting software to enable or disable ligatures as required, e.g. with this TeX package for selective suppression of ligatures:

https://www.ctan.org/pkg/selnolig?lang=en


I use fonts with ligatures while programming because they're more expressive of intent. Many languages use a combination of characters to form a single meaningful token, such as JS with =>. This token is meant to appear similar to an arrow, and has nothing to do with = or >. In this case, I find it preferable to draw a ⇒.


> they're more expressive of intent.

You don't control which font will be used when viewing your source, so nothing extra is being expressed to other people about your intent. Using a font with ligatures in your editor when you program doesn't record anything extra in the source code.

> ⇒

I find it strange that you include the proper way of doing this instead of using font ligatures: use the proper Unicode code point. Some languages already support Unicode operators, such as Perl 6 which, for example, accepts « » as a synonym for << >>.

https://perl6advent.wordpress.com/2015/12/07/day-7-unicode-p...


Unicode sucks for this kind of thing, since it would go on disk. Even if the characters don't go on disk, you have to figure out how to break them up into multiple characters so they can be edited (so ⇒ must be two characters, which is messy). No, Unicode isn't the answer. It is the answer for dealing with someone who isn't using a specific font, like readers on Hacker News.

Ligatures let you have your cake and eat it too. Those who don't care for them simply don't see them. As they are under control of the font, editors don't need special support either.


> Unicode sucks for this kind of thing, since it would go on disk

It's going to be hard to express the intent when programming if you never write that expression onto non-volatile storage.

> break them up into multiple characters so they can be edited

Several different methods exist for editing Unicode. (e.g. [1])

> As they are under control of the font, editors don't need special support either.

That's exactly the problem. Most editors used for programming already understand the full syntax for many programming languages, while font ligatures only match short character sequences. As the article mentions, this will incorrectly replace some things that happen to share the same sequence of characters. There are also problems[2] in editors with storage or drawing boundaries in the middle of a ligature. Mapping the correct characters to a replacement glyph is a lot easier when you understand the surrounding grammar.

[1] https://docs.perl6.org/language/unicode_entry

[2] The bug tracker for FiraCode has several reports of


The fact that they go on disk is a pro, not a con; and the fact that they're one character, not multiple, is also a pro and not a con: what's the point of the individual characters? Those were merely introduced as a necessary workaround to cope with input limitations. Kind of like C trigraphs - and nobody uses those because they want to.


Unicode is a disadvantage for someone who wants to contribute to your code. That person now has to figure out how to write down Unicode arrows, taking her/him out of her/his flow.


Sure, and the Dane has to figure out how to write down ø, taking them out of their flow — until they learn to apply the correct tool for the job.


I'm not sure if you're being sarcastic or not. However, I'm Dutch, and we have similar tokens.

Nobody minds if you write Danish code, but if you like contributors, it is asking a lot of them to change their editor or tools to collaborate with you. I hope you didn't mean your collaborators have to adjust to you like that.


Not at all; the point is that a Dane isn't actually taken out of their flow to type ø in Danish, because they're using a suitable keyboard layout. Likewise a programmer need not be taken out of their flow to type (a→t ≠ 0) or a ← b ∧ ¬c or (2=0+.=T∅.|T)/T←ιN.


Yeah, the APLer in me is strongly pro Unicode. It's not like it would even be that hard for an editor to find di/tri/quadgraphs and replace them with a Unicode glyph, for when you just don't want to remember what incantation causes your keyboard to produce ⌹ (I'd go |:| for the trigraph).
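A rough sketch of what such an editor-side replacement could look like. The mapping table is hypothetical apart from the `|:|` trigraph suggested above, and a real editor would apply it only to code tokens, not blindly to the whole buffer as this does.

```python
import re

# Hypothetical mapping; "|:|" for ⌹ is the trigraph proposed in the comment.
SUBSTITUTIONS = {
    "|:|": "\u2339",  # ⌹
    "->": "\u2192",   # →
    "<-": "\u2190",   # ←
    "!=": "\u2260",   # ≠
}
REVERSE = {glyph: seq for seq, glyph in SUBSTITUTIONS.items()}

# Try longer sequences first so a trigraph is not split into shorter matches.
_TO_UNICODE = re.compile(
    "|".join(re.escape(s) for s in sorted(SUBSTITUTIONS, key=len, reverse=True))
)
_TO_ASCII = re.compile("|".join(re.escape(g) for g in REVERSE))


def to_unicode(text: str) -> str:
    """Replace each known ASCII sequence with its Unicode glyph."""
    return _TO_UNICODE.sub(lambda m: SUBSTITUTIONS[m.group(0)], text)


def to_ascii(text: str) -> str:
    """Inverse of to_unicode for text that contains only mapped glyphs."""
    return _TO_ASCII.sub(lambda m: REVERSE[m.group(0)], text)


print(to_unicode("a -> b != c"))
print(to_ascii(to_unicode("a -> b != c")))
```

Unlike a font ligature, this changes the stored characters, so the inverse function matters if collaborators are expected to keep typing ASCII.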


> APL ... ⌹

I really wanted to learn APL after seeing Conway's Life implemented[1] as one expression with no loops or temporary variables (using ⌹). Then I realized I would never remember how to type all of the {di,tri}graphs.

[1] https://www.youtube.com/watch?v=a9xAKttWgP4


it's... easier than you might think? About half of them are relatively mnemonic and most of the others are close to a mnemonic (∩ is on 'c' but ∪ is on 'v'). If you're on Linux you can, say, set the Windows key to shift over to an APL key binding (it's always present in KDE; in GNOME enter gsettings set org.gnome.libgnomekbd.desktop load-extra-items true at a terminal, then it will show up in the input menu)

and both Dyalog (free now for non-commercial use) and GNU APL (via Emacs) have secondary input methods until you get comfortable.


Honestly, this sounds like the kind of argument you could present about how syntax highlighting is a terrible idea. It might be wrong!

I don't use ligatures, but I really don't see a problem with other people using them. It's fine, it's a style preference thing, that, like fonts and colour schemes, is as much fashion and personal preference as it is anything. But it's fine.


I think this comment gets at the root of the issue. This is syntax highlighting, but it's done by the font rendering engine and it has no context.

Syntax highlighting without context is not really a good experience. The highlighter needs to know what language you're using, and a pretty good idea of how to parse it.

As the comment on ligatures below points out, this is actually true of ligatures for human languages as well, but I doubt the font engines are properly tuned for that either. I'd make an exception for the 'obvious' ligatures like gg, gy, etc., where the descenders would overlap without a ligature. That shouldn't be an issue in a monospaced context though.


If your syntax highlighting is wrong, it's buggy. Proper syntax highlighting is 100% correct.


Depending on your language, proper syntax highlighting (without parsing the entire program) is nearly impossible.

For example, in C, what highlight category do you give to '*c'? A declaration, a dereference, or a multiplication call?

In Lisp, is the first element of a list a macro or a function (or a value)? If there's a reader macro, it gets even harder.


Syntax highlighting doesn't necessarily have to work at the lowest granularity, though; for some uses, merely distinguishing between comments and non-comments is acceptable, and that's still 'syntax highlighting'. Of course, as you point out, true 100% syntax highlighting needs to fully parse the entire program; why not do that, though? I guess it would be too computationally expensive for certain sizes of program, but it could still update in near-realtime, no?


Syntax highlighting doesn't need to precisely classify every single character according to how the language would parse it. So with your `*c` example, I wouldn't actually expect a syntax highlighter to highlight that at all.

But every classification the highlighter does do must be accurate, or it's a buggy highlighter.

And FWIW, it's certainly possible to write a syntax highlighter that does parse the whole program. You'd normally find this in an IDE rather than a programmer's text editor. For example, writing Swift in Xcode, everything gets precisely highlighted, to the point where references to real types are highlighted whereas references to unknown types (e.g. typos) aren't. It's not practical to do this outside of IDEs, which is why most syntax highlighting only tries to highlight that which it can unambiguously determine.


Correct syntax highlighting is subjective, because color selection is a matter of taste.


Nope, you or I may think reserved words should be red or blue, but either the list of words that get highlighted is correct, or it isn't.


I work with someone who uses ligatures. Every time I need to read his screen it's a problem: I can never recognize what exactly the characters mean. Yes, you get used to it if you use it, but if you don't, working with other people becomes a problem.

I thought it was a bad idea the first time I saw it, and after seeing it in real code I still think the same thing.


Yes, I hate seeing code any way except in my precisely tuned Emacs. I like small characters, but most fonts look ugly at small sizes: you need antialiasing to preserve the shapes, but some characters become unreadable with antialiasing. Besides the font choice, I'd like to see a familiar color theme. And when I see code in someone else's editor, it's hard to read.

But, you know, others don't like my Emacs setup: they say it's hard to read at such small sizes, and that my color theme with a dark background is bad for their eyes. Strange people. What on earth makes them think that I should be considering the health of their eyes while choosing a font and color theme for myself?


I think ligatures can be a great feature, but they should not be decided by the font, but by the editor. That way you can distinguish between 'input >> var' and 'vector<vector<int>>' and render the ligature in one case but not the other. This and more creative text decorations can be really helpful when reading code. Other examples are rendering CSS colors inline, rendering of tables and formulas in Emacs, ...


That's a hard one to get right; for a long time, C++ compilers would tokenize 'vector<vector<int>>' incorrectly and throw a syntax error -- the final two angle brackets had to be separated by a space for the code to compile.


They aren't implicitly decided by the font, they are supported (or not supported) by the font and explicitly decided by the editor, so your recommendation is the case in practice.

That is, OpenType fonts optionally accept a range of settings flags to toggle various features. For standard ligatures this is the `liga` flag; this flag is off by default, and an editor sets the flag to display ligatures (if the particular font supports that flag).


Really? That would be nice. Last time I checked, Atom was one of the only editors that supported it anyway, and it just applied ligatures indiscriminately. Same for the terminals that supported ligatures, I don't even know how e.g. vim would signal to a terminal to use a ligature.


Ligatures in programming fonts really seem like something you'd only get into in the midst of a hardcore procrastination bout.


Q: Should random people on the internet dictate what you do and like? A: Hell no

I quite like the ligatures that come with Fira Code, and most of the author's issues are not applicable to it.

Maybe they should spend more than 5 minutes trying things out.


It seems there is some confusion for the author between the display part and the on-disk part. Text will not be saved with the ligature; it will still be pure unadorned text when saved. As for the confusion that could arise between a simple quote and a typographic one, it's not coming from the font but from the editor's rendering engine (Word makes that kind of change, IntelliJ does not, for example, but both can display ligatures correctly).
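The display/on-disk distinction can be checked directly. This small sketch compares the bytes that get saved for the ASCII sequence `=>` (however the font draws it) with the bytes for the separate Unicode character ⇒; the variable names are just for illustration.

```python
# What the programmer typed: two ASCII characters, whatever the font shows.
ascii_arrow = "=>"
# The distinct Unicode character that a ligature may merely resemble.
unicode_arrow = "\u21d2"  # ⇒ RIGHTWARDS DOUBLE ARROW

ascii_bytes = ascii_arrow.encode("utf-8")
unicode_bytes = unicode_arrow.encode("utf-8")

print(len(ascii_arrow), ascii_bytes)      # two code points, two bytes
print(len(unicode_arrow), unicode_bytes)  # one code point, three bytes
print(ascii_bytes == unicode_bytes)       # the saved files differ
```

A compiler or diff tool only ever sees these bytes, which is why a ligature font cannot change what gets committed.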


I'm sure the author is well aware of the difference between encoding and display, since they have apparently designed their own fonts.

However, their argument seems to be that since ligatures are just a hack to make plain text look like something fancy, you end up with confusing interactions due to the lack of context.

Now I don't think this is likely to cause lots of problems in practice, since most people won't use both ligatured => and Unicode ⇒ in the same Haskell code; but in principle the potential for confusion is there. Imagine if some clever font designer made ' look like ` when between a space and a letter, the font would be unusable in any language where that is an important distinction.

Personally, I'm not going to use ligatures for coding, firstly because I'm accustomed to the "normal" look and secondly because fcitx makes it really easy to type arbitrary Unicode when I want it.


I'm aware that Butterick is the author of several fonts, but this article has a surprising amount of vitriol from someone who should know that the choice of a programming font doesn't affect anyone but the programmer who chose it. During the article, he takes several opportunities to point out that his opinions are actually facts.

Typographers usually have a lot of responsibility because their decisions can affect the readability of text for many other people, and this is why sage advice from experienced typographers is usually very handy, but in this case, he's complaining about something that each individual developer has to manually opt into. This is outside the realm of normal typography that he has authority in. The editor settings of an individual developer are extremely personal. Developers don't follow trends just for the sake of it. They set up their environment in the way they believe is most productive for them, because they know only their output matters. If an individual developer believes they are more productive with Fira Code, that's nothing a typographer should lose sleep over. If I believe that I'm ten times more productive when I code in Papyrus, that's no one's business but my own.


> Developers don't follow trends just for the sake of it.

I disagree with this. The problem with lots of programming tools is the difficulty of judging which options are the best for your use case, before you have the experience of using them for a long time. So there is a real temptation to just go along with what everyone else seems to be doing; since they presumably know the benefits. There doesn't even have to be a real trend; it's enough when there's one in your perception.

I have certainly done quite a lot of trend-following myself. I have used relative line numbering in vim ever since I started using it, simply because the first person I ever saw using vim had that option active. Later, when I found myself doing mental arithmetic to get absolute line numbers, I was simply too lazy to make the switch. I have actually only just now gotten around to changing my config.

So someone who is thinking about switching to a ligatured code font might benefit from an article like this, if it turns out that the issues raised would be too annoying for them. If they don't think those will be a problem, however, they can just go right ahead and make the switch.


Am I the only one thinking that "www" using Fira Code is unreadable?


Interestingly, nobody mentioned Mathematica's (Wolfram Language) approach of getting Unicode characters into code. "High level" mathematical symbols can be used in Wolfram Language all over the place, for instance the arrow → indicating a Rule (http://reference.wolfram.com/language/ref/character/Rule.htm...) or ∞ indicating Infinity (http://reference.wolfram.com/language/ref/Infinity.html).

As Mathematica inherits homoiconicity from LISP, i.e. the paradigm "code is data", it abstracts the code representation (called "Forms", http://reference.wolfram.com/language/tutorial/FormsOfInputA...). Every shortened piece of code full of Greek symbols can be written in an equivalent form in ASCII. Thus it is solely up to the Mathematica notebook (a Qt-based GUI) to render the Unicode. The GUI also allows quickly entering any named symbol with an approach like typing "[ESC] alpha [ESC]", upon which an α appears (cf. http://reference.wolfram.com/language/tutorial/SpecialCharac...).

I think this is the right approach: let the beautification be done by the code viewers. The approach of entering named symbols probably stems from (La)TeX, where one writes "$\alpha$" to get α, and is typical of computer algebra systems. For instance, SageMath as well as SymPy allow defining something like a=var("alpha") and rendering it as α.


Been awhile since I used MM, but I do remember this - personally, I like using Fira with my code, but that's because there's a disgusting amount of math in it and it makes the reading a lot easier - your implicit separation between representation and presentation is spot on IMO


Somewhat off-topic, but the explosion of Unicode glyphs can lead to problems more serious than aesthetics:

Punycode exploit: https://www.xudongz.com/blog/2017/idn-phishing/

Greek question mark: https://stackoverflow.com/questions/26965331/javascript-pran...

Shameless plug: https://github.com/BourgeoisBear/A-E-S-T-H-E-T-I-C
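As a small illustration of the Greek question mark case linked above, the following sketch uses Python's standard `unicodedata` module. It shows that U+037E is a different code point from the ASCII semicolon, and that canonical (NFC) normalization maps it onto the semicolon, which is one way a tool could neutralize this particular look-alike.

```python
import unicodedata

greek_question_mark = "\u037e"  # renders like ";" in most fonts
semicolon = ";"

print(unicodedata.name(greek_question_mark))
print(unicodedata.name(semicolon))
print(greek_question_mark == semicolon)

# NFC normalization replaces U+037E with U+003B.
normalized = unicodedata.normalize("NFC", greek_question_mark)
print(normalized == semicolon)
```

Normalization only helps for characters with a canonical equivalent; Cyrillic or other cross-script look-alikes are untouched by it.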


> The problem is that ligature substitution is “dumb” in the sense that it only considers whether certain characters appear in a certain order. It doesn’t have any awareness of the semantic context.

True, which is why Iosevka has language-specific ligatures that a sufficiently smart editor (I think the "JS types", like Atom/VSCode, have CSS for this) can use to ligate (?) intelligently.


Premise: don't make => look like ⇒, because ⇒ is already a separate thing.

Then later: check out this font I made where 0 looks like Ø.


Ignoring the Unicode issues, I installed all of these, switched to the retina version of each in my editor and I found none of them any better than what I had been using (Menlo). I'm certain that YMMV but at least for me, there was no improvement (languages tested with were Ruby, and ERB formatted HTML).


Fortress (a dead JVM programming language by Guy Steele) had an interesting approach for special characters:

https://de.wikipedia.org/wiki/Fortress_(Programmiersprache)#...


Do ligatures really act like one character? I've never encountered not being able to select only the f or the i in a 'fi'-ligature, but I don't really use ligatures outside of word processing.


Some of the ligatures in Fira Code are so radically different from the characters they're substituting, that single-character selection would be incredibly confusing. How would it behave for the characters on the final line of the sample, for example?


Ligatures are for personal usage only. If you are doing a presentation to me and use ligatures, I will judge you.


I would argue it's not a one size problem - I use Fira for machine learning heavy code, which is primarily math, and ligatures for the various multicharacter operators make it more readable. Judge me, fine. But judge me in context please :-)


They call them ligatures, but prettifying --> to a long arrow is quite outside the scope of the historical and practical meaning of a typographic ligature. It's abuse of the typesetter to scratch one's typography / graphic design itch, no less.


I didn't even know those were a thing

But in a monospace font? No. No. No. no. No.

Epically bad plan to use something like that to display code


Have you tried it? It's actually quite nice. You don't have to like it yourself, but you don't need to declare it bad for everyone else


The context of this is a recent article https://news.ycombinator.com/item?id=14821446


What's bad about them? (I find the arguments in the link interesting, but not all that convincing)


I work with someone who uses them and I find it impossible to read his screen. Unless you work completely alone, it's a terrible idea.


*Unless all people that might ever look at my screen are used to them.

If you work with people over the internet, it doesn't matter because they can disable ligatures and still read your code. And obviously it's not a problem if people know them and are looking on your screen.

But yeah, if you're doing pair programming and first dev is used to them and the second dev is not, that's a problem. The question is, is the solution that the second dev learns them or that the first dev stops using them - both will obviously dislike changing the way they like to code...


Can't guarantee 80 characters length easily.


This isn't a problem with ligatures that are designed to take up as much space as the characters they replace. Fira Code seems to have been designed with this in mind, so for example, the === ligature is exactly three times as wide as a normal character.


mnm



