Bug in Lynx's SSL certificate validation – leaks password in clear text via SNI (openwall.com)
202 points by jwilk on Aug 7, 2021 | hide | past | favorite | 65 comments


I happened to go to the Lynx Wikipedia page on seeing this news - I never realised it was based on libwww, the original library written by Tim Berners-Lee and Jean-François Groff for HTTP clients back in 1992. Presumably the fork is pretty much unrecognisable at this point, but what an incredible heritage!


Let me say, I really appreciate OpenSSL, and it's made amazing progress in terms of security over the last few years. I make monthly donations to the developers as I believe it's critical infrastructure. Buutt...

From [0], the issue arises when they send "user:pass@host" to SSL_set_tlsext_host_name, which happily sets the SNI to whatever it's given [1]. As a point of comparison, when you create a Rustls client via ClientSession::new [2], you have to pass it a DNSNameRef, which will validate that there's no auth component in the string it wraps, and return an error if you try to set the server name to something involving auth details.

I'm sure there are reasons why OpenSSL is set up to work like this, but I can't see why anyone would ever want to send those auth details in the clear in the SNI, and I wish it provided an API that would anticipate this misuse, like Rustls does. The OpenSSL docs don't indicate it's an issue you should think about when invoking this function [3].

I realise I'm picking on OpenSSL here, and GnuTLS appears to do the exact same thing. I'm just not certain anyone not wearing a hazmat suit and being watched by multiple other trained professionals should be handling OpenSSL code.

0: https://www.openwall.com/lists/oss-security/2021/08/07/7

1: https://github.com/openssl/openssl/blob/5cbd2ea3f94aa8adec9b...

2: https://docs.rs/rustls/0.19.1/rustls/struct.ClientSession.ht...

3: https://www.openssl.org/docs/man1.1.1/man3/SSL_set_tlsext_ho...


During the development of rustls, we also found a similar bug in Apple's TLS stack. We reported that to Apple and they fixed it.

So it is feasible -- at scale -- for TLS clients to validate that DNS names in the SNI extension really are DNS names, and are not IP addresses or bits of URL.
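As a rough sketch of the kind of check a safe API can enforce (these are not rustls's actual validation rules, just an illustration of rejecting IP addresses and bits of URL):

```python
import ipaddress
import re

# A DNS label: 1-63 chars of letters/digits/hyphens, no leading/trailing hyphen
LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")

def is_valid_sni_name(name: str) -> bool:
    """Reject anything that is not a plausible DNS hostname:
    userinfo ('@'), ports/paths ('/', ':'), and IP address literals."""
    if any(c in name for c in "@:/?#[]"):
        return False
    try:
        ipaddress.ip_address(name)
        return False  # RFC 6066: literal IP addresses must not go in SNI
    except ValueError:
        pass
    labels = name.rstrip(".").split(".")
    return 0 < len(name) <= 253 and all(LABEL.match(l) for l in labels)

assert is_valid_sni_name("news.ycombinator.com")
assert not is_valid_sni_name("user:pass@example.com")  # the Lynx bug's input
assert not is_valid_sni_name("192.0.2.1")
```

A TLS API that takes a validated-hostname type instead of a raw string makes the Lynx class of bug a compile-time or early-runtime error rather than a silent leak.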


> During the development of rustls, we also found a similar bug in Apple's TLS stack.

Was Windows' Schannel also tested? Or is this simply due to the dev team having access to Linux and macOS?


Thank you so much for your work on rustls by the way, such a great library, and the results of the last audit were very impressive :)


OpenSSL isn't an HTTPS library. It's TLS and agnostic to any higher level protocol. It's not their job to filter strings with rules that may not apply for every use case.


SNI is specified to only take DNS hostnames. Not bits of URL. DNS hostnames only. OpenSSL and gnutls are faulty in this respect.


RFC 6066:

Currently, the only server names supported are DNS hostnames; however, this does not imply any dependency of TLS on DNS, and other name types may be added in the future (by an RFC that updates this document).


In practice this functionality is rusted in place. If you invented a new name type, that doesn't mean you could call your existing TLS stack's SNI methods with "@Twitter_handle" or "my@email.address" or "#TopicGoesHere", because those methods always write DNS names. There is a type parameter, but there has only ever been one type (DNS names), so that's what every implementation chooses; I haven't ever seen one that lets you change it, because what would you change it to?


You don't have to invent a new name type.

RFC 5280:

The subject alternative name extension allows identities to be bound to the subject of the certificate. These identities may be included in addition to or in place of the identity in the subject field of the certificate. Defined options include an Internet electronic mail address, a DNS name, an IP address, and a Uniform Resource Identifier (URI). Other options exist, including completely local definitions.


X509's subject alternative name extension has nothing to do with TLS' SNI extension. SNI does not have a mechanism for locally-defined name types akin to X509's OtherName.


SNI matches certificates by looking at the SAN list on a list of certificates and finding the best match.


Does this mean that they've never been parsing URLs correctly? Kind of crazy if that's the case since a web browser would be the last place I'd expect to find such a basic URL processing issue.


I would think that this was introduced at the same time as SNI was enabled for Lynx (2009-04-26 from the changelog, but I haven’t checked if it was actually vulnerable at that time). I don’t know if I’d call any URL processing issue basic though, I can safely say that I’m never even going to attempt to write a URL parser, just because of the number of potential edge cases. You can imagine when you’re enabling a (fairly simple, compared to the rest of the TLS spec) extension, you might not be thinking about all the different quirks of the URL spec which could occur and mess up what seems like a basic change.


URIs have a generic grammar defined in RFC 3986.

However, this doesn't cover DNS name validation for the hostname:

"This specification does not mandate a particular registered name lookup technology and therefore does not restrict the syntax of reg-name beyond what is necessary for interoperability"

It also doesn't constrain the userinfo ("username:password") part described in this vulnerability.

So yeah, it's non-trivial.


I don't see what's "non-trivial" about the decomposition:

  authority   = [ userinfo "@" ] host [ ":" port ]
described in section 3.2 of that specification. Looks pretty trivial to me.
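A sketch of that decomposition in Python (ignoring IPv6 literals like "[::1]:443", which contain ":" and need special-casing in a real parser):

```python
def split_authority(authority: str):
    """Split per RFC 3986 sec. 3.2: authority = [ userinfo "@" ] host [ ":" port ]."""
    # "@" cannot appear unencoded inside userinfo, but splitting on the
    # last "@" is the defensive choice browsers make
    userinfo, sep, hostport = authority.rpartition("@")
    if not sep:
        userinfo = None  # no "@" present, so no userinfo component
    host, sep, port = hostport.partition(":")
    return userinfo, host, (port if sep else None)

assert split_authority("user:pass@example.com:443") == ("user:pass", "example.com", "443")
assert split_authority("example.com") == (None, "example.com", None)
```

The point being argued: finding where the host starts and ends is this simple, even if the userinfo's internal structure is left opaque.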


It's non-trivial because the userinfo part is not structured in the grammar, but is instead just constrained to a particular character set.

The RFC reads:

> Use of the format "user:password" in the userinfo field is deprecated.

In the real world, for HTTP use cases, you can't just ignore this because it's deprecated.


> the userinfo part is not structured in the grammar, but is instead just constrained to a particular character set.

Yes. So, a correct implementation can't use this document to destructure it, but that's OK because it doesn't need to do so. It can however easily distinguish the userinfo from the rest of the authority.

Unless, like this code, it simply doesn't bother to try.

> In the real world, for HTTP use cases, you can't just ignore this because it's deprecated.

Then I guess you'll need to write code to further parse the userinfo. Seems pretty easy, but this document doesn't explain how because it's deprecated. No impact on whether you can find the host name in the authority since that's a separate field from userinfo.

Assuming, of course, that you bother to separate userinfo from the domain name in the authority, which Lynx did not.


You don't need to validate the hostname; you can safely leave that for name resolution libraries. You just need to properly separate the user info from the host name and port. Which is pretty simply defined in the grammar.
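Python's stdlib already does this separation, for instance (a sketch for illustration; Lynx itself is C):

```python
from urllib.parse import urlsplit

parts = urlsplit("https://user:secret@example.com:8443/path")

# .netloc is the raw authority, userinfo and all -- the wrong thing for SNI
assert parts.netloc == "user:secret@example.com:8443"
# .hostname is the separated host, the only piece that belongs in SNI
assert parts.hostname == "example.com"
assert parts.username == "user"
assert parts.password == "secret"
assert parts.port == 8443
```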


That depends on your use case. If you're trying to check the hostname against some sort of whitelist or blacklist you're going to want to normalize it. Then you have all kinds of fun like IDN to deal with.
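For example, Python's built-in "idna" codec (which implements the older IDNA 2003 mapping; the newer UTS-46 rules need the third-party `idna` package) shows why naive string comparison against a blocklist fails:

```python
# Two spellings of the same logical hostname
unicode_name = "bücher.example"
ascii_name = unicode_name.encode("idna").decode("ascii")

assert ascii_name == "xn--bcher-kva.example"

# A blocklist holding only one form silently misses the other
blocklist = {"bücher.example"}
assert ascii_name not in blocklist  # evades a naive exact-match check
```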


Somehow I have never got around to liking Lynx. But I love (the textmode) links and use it every day. Any reasons why Lynx can be a better choice?


My first exposure to the internet was via Edmonton FreeNet. You'd dial in, using a terminal program like a BBS, and it would dump you into Lynx.

They had a menu where you could access email (Pine), manage your small amount of file space (downloads went here, then you'd go to the file manager to transfer them to your computer via ZModem), change your password, etc.

I have fond memories of Lynx, but I haven't used it since I switched to a provider that gave me a "real" connection.


Lynx is more lightweight, has a slightly different keyboard shortcut setup, and is much more explicit about displaying various HTTP messages in a way the user can see them. It also has confirmations for cookies by default.


Not sure. I think it's just older.

Back then, when I still cared about textmode browsers, there was lynx. Then came w3m and links and they were the cool new stuff.


w3m is pretty good at layout and in particular at tables. I use emacs for my mail, and use w3m to render html email and it does about as well as I think is possible within the confines of fixed-font plain text.


For me, muscle memory. I used Lynx back in the day on my 386 running DOS. And it's been useful every time I had to install a Linux system without an installer (last time was Guix, if I remember correctly). First thing I do is put it in expert mode to recover 3 lines at the bottom of the screen (you press "o" for options).


Gopher holes with HTTP links everywhere, such as gopherddit or hngopher.


What kind of Saxon genitive is that? Please fix to "Lynx's".


Lol. Fixed. Thanks!


Lynx has already been patched.


Any mitigation?


I guess if you use lynx to surf to http basic auth URLs you should switch to another browser until a security update is available. But I'm inclined to say while this is an interesting bug I don't think that's an incredibly common scenario.

Update: It seems there's a preliminary patch that will not fix http auth URLs, but will prevent the info leak: https://www.openwall.com/lists/oss-security/2021/08/07/7


Plain auth websites are pretty uncommon these days. That said, if other authentication methods are annoying or inconvenient in Lynx (I don't know, I've never used it) then there may be some correlation between Lynx users and plain auth websites, meaning this could impact a large fraction of the Lynx userbase.


HTTP basic authentication URLs are never secure (that requires HTTPS) and are no longer supported in links by modern browsers.


This bug only affects HTTPS basic auth, given that the issue is with SNI (and, of course, that basic auth over HTTP doesn’t have any encryption to leak around).


This is my understanding from reading the OP. Lynx is a CLI browser and apparently it sends plain-auth credentials in the clear in the SNI/hostname part of the https/tls handshake. I suppose someone forgot that the authority part of a uri has a slot for both userinfo as well as host, and parsed it wrong. [0]

As far as mitigations go, I guess any of these would work: don't use Lynx; don't use SNI (???); don't use plain auth.

[0]: https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Sy...


Don't use urls like https://user:pass@example.com


> I ALWAYS SAID SNI IS A SHIT THING ONLY USED AS BAD EXCUSE FOR NAT [...]

> I FEEL SO VINDICATED RIGHT NOW!

Thorsten Glaser appears to be another from the BSD school of "perfect" programmers who are held back by the inadequacies of literally everybody and everything else in the world. If only the TLS protocol was designed by Thorsten, and everybody else used it the way Thorsten thinks they should, then this program would be correct so it's not really his fault, it's our fault. See?

> Nah, SNI is a rather recent thing. But…

This "rather recent thing" was standardised in 2003 and thus is old enough that if it were human it could vote in a lot of the world. But it's true that in some versions of Lynx it wasn't available until July 2018. They only had three years to identify this gross parsing bug.


"Please don't fulminate."

https://news.ycombinator.com/newsguidelines.html

Also, please don't cross into personal attack. You can make your substantive points without that.


I was confused about the source of this quote because it's not in OP, so for anyone else curious:

https://lists.nongnu.org/archive/html/lynx-dev/2021-08/msg00...


This bug seems to have nothing to do with SNI or TLS, it is all about how Lynx parses the URL and what pieces it incorrectly passes to the underlying TLS library.


Please source your quotes.


The quote is from the next message in the thread, i.e.:

https://www.openwall.com/lists/oss-security/2021/08/07/2


Thanks!


We must overcomplicate everything until there can only be one implementation of each standard


okay, but being able to use only one SSL certificate for each IP fucking sucks


In particular, in the era before SNI support was widespread, your bulk host would charge extra to give you a dedicated IP address so that your HTTPS site worked. It's still an option at some bulk hosts today, you can have free HTTPS that works fine in every browser anybody actually uses or you can pay a few bucks extra so that it also works with the archaic system that one customer never updates.


A dedicated IP so you can use a non-SNI certificate still costs $600/mo on AWS CloudFront.


Wow. IPV4 addresses aren’t free, but they’re not usually that expensive either.


With cloudfront it's not just one IP, it's potentially one at every edge node. But yeah, every time I see that pricing I get a little chuckle because it's so obviously deterrent pricing - it's not how much the IP address costs, it's the minimum amount Amazon wants for dealing with your bullshit.


On most hosts I've used, they're about $2/month.

I wish it were the year 2060 so IPv6 could finally be used reliably.


Beyond that, it was absolutely essential to decouple address from identity in order to move the vast majority of sites and services on the Internet to TLS.

Somehow I’ve avoided gaining an understanding of the details of the SNI protocol, so I can’t comment on its quality, but the achievement it has enabled is fairly profound.


SNI goes like this:

When a client connects to a TLS server it may (must in TLS 1.3 if it knows the name of the service) send a field labelled Server Name Indication that gives a name it intended to reach.

The SNI specification explains one type of name, a Fully Qualified Domain Name e.g. "news.ycombinator.com" (notice not "news.ycombinator.com." if you understand why that might matter) but leaves open the possibility that others could exist. They don't and in practice you likely couldn't add new ones now.

The server should look at this name & use it to decide what the client intended. For example if you're a bulk hosting site you might have fifty customers on a single physical machine and you can match the SNI name against the list of customer sites on that machine, then use this to present the appropriate certificates and use the right keys so the connection works and is trusted by the client for that name.

For HTTPS the server should further reason that if the SNI says news.ycombinator.com but then an HTTP/1.1 Host header says some.other.example that's nonsense and deserves an error. Likewise it should reason that if you send SNI for this.does.not.exist.example and it has no records of a this.does.not.exist.example site, it should just give you the TLS error saying it doesn't recognise the name and never get to HTTP at all.

In practice several popular web server programs (e.g. Apache) treat these two stages as entirely unrelated problems, so you can connect to a bulk host, use SNI to say you want corpA.example, and then in HTTP/1.1 ask for corpB.example and it's common that the web server will give you the corpB.example web site, but served with the corpA.example certificates and encryption... if you send SNI for this.does.not.exist.example you may get a randomly chosen or alphabetically first certificate and then an HTTP 404 error...

The more modern ALPN is similar but for protocols instead of names, this lets clients specify which "next" protocols they want to speak on top of TLS. So for example "h2" means you'd like to use HTTP/2 instead of HTTP/1.1 to talk to a web server. The server can reply to ALPN by specifying which of the list you offered it agrees to e.g. it can say it only speaks HTTP/1.1 -- or it can ignore your request entirely.
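A sketch of both extensions with Python's ssl module (only the offered ALPN protocols are shown here; the negotiated one is only known after a handshake, and the socket lines are commented out because they need a live connection):

```python
import ssl

ctx = ssl.create_default_context()
# ALPN: offer HTTP/2 first, then HTTP/1.1 as a fallback
ctx.set_alpn_protocols(["h2", "http/1.1"])

# SNI is supplied per-connection via server_hostname, e.g.:
#   sock = ctx.wrap_socket(raw_sock, server_hostname="news.ycombinator.com")
# After the handshake, sock.selected_alpn_protocol() reports the server's
# choice ("h2" or "http/1.1"), or None if the server ignored the extension.
```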


Super informative! Thank you!


> Somehow I’ve avoided gaining an understanding of the details of the SNI protocol

You put the requested hostname (e.g. example.com) into the Client Hello message in cleartext so that the server knows which SSL site to direct you to / which SSL cert to give you. And the server has a config that matches up server certs with hostnames (and a default server cert) to return.

That's it. It's why people want to encrypt the client hello message, because that leaks info.

https://en.wikipedia.org/wiki/Server_Name_Indication


Awesome, thank you!


Insert inevitable "this would be fixed if everyone switched to IPv6 already" comment here.


Under IPv6 the original pressure for "virtual hosts" and eventually SNI wouldn't necessarily have existed because there are plenty of addresses.

However I suspect that by now somebody would have spotted that we're smuggling the thing we actually wanted to convey via the IPv6 address: some.specialised.thing.example resolves to an address with a particular combination of low 64 bits, which are then decoded by server software listening on that entire subnet as some.specialised.thing.example. And somebody would have proposed just actually transmitting the text across the wire instead.

So I expect that today SNI would exist or at least, the exact same discussion that led to eSNI and today ECH would have happened for other reasons in the world where everybody has IPv6 and the fix for that would be under development.

If you have plentiful IPv6 addresses the privacy aspect still matters, but maybe it gets pushed out further and we're only talking about it now rather than earlier.


Yes, there is still a 1-1 mapping from IPv6 address to domain, but that mapping is pushed back into the DNS layer where it belongs instead of being smuggled (poorly) through TLS. DNS isn't perfect for privacy either, but at least there's a chance it can be solved with DoH / private DNS servers / etc, instead of the only solution (eSNI) requiring everyone to voluntarily sign up to a giant MITM called "Cloudflare".


The problem with just solving DNS privacy is that Winnie doesn't care whether you "privately" resolved the name or not, blocking the entire server works fine when each server corresponds to one name. The ability of users to privately resolve winnie-the-pooh.china.example to 10.20.30.40 doesn't help when Winnie can just block 10.20.30.40 entirely.

One of the things we see with the Great Firewall is that you can reach some brand new service, and then a few minutes later (after presumably some automation spun up, examined it and didn't like what it found) it's blocked.

In contrast under ECH Winnie can choose to have 10.20.30.40 blocked, and if the only things on it are winnie-the-pooh.china.example and kick-putin-out.russia.example then why not. But if it also features popular-website.example then that's a difficulty.

If it helps: while I expect Cloudflare will continue as before, ECH is actually carefully designed so that intermediaries can be set up to discern that you want winnie-the-pooh.example and make that work without in fact knowing how to answer for that name. In effect you can sign up to have some popular host (e.g. Google, Amazon, or indeed Cloudflare) provide their servers for your names, but not provide your services and not have any ability to MITM you; they're acting as a sort of IP proxy instead. And some of the big names are clearly enthusiastic about enabling this capability, albeit for a price.


It's really easy to make a mapping of domain names to IPs. If you want a chance at privacy, you need load balancer IPs that have a huge number of sites behind them.


^ Found the Google Chrome dev.


I would agree with you if the *BSD people didn’t tend to put out vastly higher-quality software than most of the other stuff out there.


As evidenced by this bug :)


Right, because one bug is evidence that the entire body of work is inferior. As we all know, Linux never has any bugs



