My NFS story: In my first job, we used NFS to maintain the developer desktops. They were all FreeBSD and remote mounted /usr/local. This worked great! Everyone worked in the office with fast local internet, and it made it easy for us to add or update apps and have everyone magically get it. And when the NFS server had a glitch, our devs could usually just reboot and fix it, or wait a bit. Since they were all systems developers they all understood the problems with NFS and the workarounds.
What I learned though was that NFS was great until it wasn't. If the server hung, all work stopped.
When I got to reddit, solving code distribution was one of the first tasks I had to take care of. Steve wanted to use NFS to distribute the app code. He wanted to have all the app servers mount an NFS share, and then just update the code there and have them all automatically pick up the changes.
This sounded great in theory, but I told him about all the gotchas. He didn't believe me, so I pulled up a bunch of papers and blog posts, and actually set up a small cluster to show him what happens when the server goes offline, and how none of the app servers could keep running as soon as they had to get anything off disk.
To his great credit, he trusted me after that when I said something was a bad idea based on my experience. It was an important lesson for me that even with experience, trust must be earned when you work with a new team.
I set up a system where app servers would pull fresh code on boot, and we could also remotely trigger a pull or just push to them. That system was reddit's deployment tool for about a decade (and it was written in Perl!).
I was at Apple around 15 years ago working as a sysadmin in their hardware engineering org, and everything - and I mean everything - was stored on NFS. We ran a ton of hardware simulation, all the tools and code were on NFS as well as the actual designs and results.
At some point a new system came around that was able to make really good use of the hardware we had, and it didn’t use NFS at all. It was more “docker” like, where jobs ran in containers and had to pre-download all the tools they needed before running. It was surprisingly robust, and worked really well.
The designers wanted to support all of our use cases in the new system, and came to us about how to mount our NFS clusters within their containers. My answer was basically: let's not. Our way was the old way, and their way was the new way, and we shouldn't "infect" their system with our legacy NFS baggage. If engineers wanted to use their system they should reformulate their jobs to declare their dependencies up front, use a local cache, and accept all the other reasonable constraints their system had. They were surprised by my answer, but it was the impetus for things to finally move off the legacy infrastructure, and I think it worked out well in the end.
NFS volumes (for home dirs, SCM repos, tools, and data) were a godsend for workstations with not enough disk, and when not everyone had a dedicated workstation (e.g., university), and for diskless workstations (which we used to call something rude, and now call "thin clients"), and for (an ISV) facilitating work on porting systems.
But even when you needed a volume only very infrequently, if there was a server or network problem, then even doing an `ls -l` in the directory where the volume's mount point was would hang the kernel.
Now that we often have 1TB+ of storage locally on a laptop workstation (compare to the 100MB default of an early SPARCstation), I don't currently need NFS for anything. But NFS is still a nice tool to have in your toolbox, for some surprise use case.
> To his great credit, he trusted me after that when I said something was a bad idea based on my experience. It was an important lesson for me that even with experience, trust must be earned when you work with a new team.
True, though, on a risky moving-fast architectural decision, even with two very experienced people, it might be reasonable to get a bit more evidence.
And in that particular case, it might be that one or both of you were fairly early in your career, and couldn't just tell that they could bet on the other person's assessment.
Though there are limits to needing to re-earn trust from scratch with a new team. For example, the standard FAANG-bro interview, where everyone has to start from scratch for credibility, as if they were fresh out of school with zero track record and zero better ways to assess them, is ridiculous. The only thing more ridiculous is when companies that pay vastly less try to mimic that interview style. Every time I see that, I think that this company apparently doesn't have experienced engineers on staff who can get a better idea just by talking with someone, rather than running a fratbro hazing ritual.
> Now that we often have 1TB+ of storage locally on a laptop workstation (compare to the 100MB default of an early SPARCstation), I don't currently need NFS for anything.
While diskless (or very limited disk) workstations were one use case for NFS, that was far from the primary one.
The main use case was to have a massive shared filesystem across the team, or division, or even whole company (as we did at Sun). You wouldn't want to be duplicating these files locally no matter how much local disk, the point was to have the files be shared amongst everyone for collaboration.
NFS was truly awesome, it is sad that everything these days is subpar. We use weak substitutes like having files on shared google drives, but that is so much inferior to having the files of the entire company mounted on the local filesystem through NFS.
(Using the past tense, since it's not used so much anymore, but my home fileserver exports directories over NFS which I mount on all other machines and laptops at home, so very much using it today, personally.)
Other things that changed were the Web, and the popularity of Git.
For example, one of the big uses of NFS we had was for engineering documents, all of which could be accessed from FrameMaker or Interleaf running on your workstation. Nowadays, all the engineering documentation and more would be accessed through a Web browser from a non-NFS server, and no longer on a shared filesystem.
Another use of NFS we had was for collaborating on shared code by some projects, with SCM storing to NFS servers (other projects used DSEE and ClearCase). But nowadays almost everyone in industry uses distributed Git, syncing to non-NFS servers, with cached copies on their local storage.
I suppose a third thing that changed was CSCW-style distributed change syncing becoming popular and moving into other tools, such as live "shared whiteboard" document editing that people can access in their Web browsers. I have mixed feelings about some of the implementations and how they're deployed, but it's pretty wild to have 4 remote people during Covid editing a document in real time at once, and NFS isn't helping with the hard part of that.
Right now, the use case for NFS that first comes to mind is individual humans working with huge files (e.g., for AI training, or other big data), where you want the convenience of being able to access them with any tool from your workstation, and maybe also have big compute servers working with them, without copying things around. You could sorta do these things with big complicated MLops infrastructure, but sometimes that slows you down more than it speeds you up.
Interesting. I self-host Forgejo or GitLab, with SSH or HTTPS access from workstations' local repos, to the "origin" Git server.
The advantage you find to NFS for this is that you share workspaces between the client machines? Or reduce the local storage requirements on the client machines?
Mainly so I don't need to run any source control server, it's all just files.
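For anyone wondering what that looks like in practice, a bare repository sitting on the NFS mount works as the "origin" with no daemon at all; a minimal sketch (paths are made up):

```
# A plain directory on the NFS mount is the shared "server"; no daemon required.
git init --bare /mnt/nfs/repos/project.git

# Clone it like any other remote; pushes and pulls go straight through the filesystem.
git clone /mnt/nfs/repos/project.git ~/src/project
```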
Same for mercurial. Most of my internal use repositories are mercurial since it's so much more pleasant to use than git and for my hobby time I want pleasant tools that don't hate me. But I digress..
It's the model I've used since the 90s in the days of teamware at Sun.
That was still in place at least when I left, and I'd be amazed if it got replaced. It was one of those wonderful pieces of infrastructure that you rarely even notice because it just quietly works the whole time.
NCSA also used it for some data archival and I believe for hosting the website files.
I looked up at one point whatever happened to AFS and it turns out that it has some Amdahl’s Law glass ceiling that ultimately limits the aggregate bandwidth to something around 1 GBps, which was fine when it was young but not fine when 100Mb Ethernet was ubiquitous and gigabit was obtainable with deep enough pockets. If adding more hardware can’t make the filesystem faster you’re dead.
I don’t know if or how openAFS has avoided these issues.
The Amdahl's Law limitations are specific to the implementation and not at all tied to the protocols. The 1990 AFS 3.0 server design was built upon a cooperative threading system ("Light Weight Processes") designed by James Gosling as part of the Andrew Project. Cooperative threading shaped the locking model, since there is no simultaneous execution between tasks. When the AFS fileserver was converted to pthreads for AFS 3.5, the global state of each library was protected by wrapping it with a global mutex. Each mutex was acquired when entering the library and dropped when exiting it. Completing any fileserver RPC required acquiring at least six or seven global mutexes, depending upon the type of vnode being accessed. In practice, the global mutexes restricted the fileserver process to 1.7 cores regardless of how many cores were present in the system.
AuriStor's RX and UBIK protocol and implementation improvements would be worthless if the application services couldn't scale. To accomplish this required converting each subsystem so it could operate with minimal lock contention.
This 2023 presentation by Simon Wilkinson describes the improvements that were made to AuriStor's RX implementation up to that point.
> In practice, the global mutexes restricted the fileserver process to 1.7 cores regardless of how many cores were present in the system.
So in theory the bandwidth could scale with single-CPU and/or point-to-point bandwidth, but it couldn't scale horizontally at all, except in the new implementations.
Correct, and the point-to-point bandwidth is limited by the maximum RX window size because of the bandwidth delay product. As round-trip latency increases, at some point the window size becomes insufficient to keep the pipe full, at which point data transfers stall.
One site which recently lifted and shifted their AFS cell to a cloud made the following observations:
We observed the following performance while copying a 1 GB file from local disk into AFS.
AuriStor Client (2021.05-65) -> OpenAFS server (1.6.24): 3m 11s
AuriStor Client (2021.05-65) -> AuriStor Server (2021.05-65): 1m
AuriStor Client (2025.00.11) -> AuriStor Server (2025.00.11): 30s
All of the above tests were performed from clients located on campus to fileservers located in the cloud.
There are many RX implementation differences between the three versions. It is important to note that the window size grows from 32 -> 128 -> 512.
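To put rough numbers on that, here's a back-of-envelope bandwidth-delay calculation; the ~1.4 KB of data per RX packet and the 20 ms campus-to-cloud RTT are my assumptions, not figures from the site above:

```
# max throughput ~= window_packets * bytes_per_packet / round_trip_time
awk -v w=32  -v pkt=1400 -v rtt=0.020 'BEGIN { printf "window 32:  %5.1f MB/s\n", w*pkt/rtt/1e6 }'
awk -v w=512 -v pkt=1400 -v rtt=0.020 'BEGIN { printf "window 512: %5.1f MB/s\n", w*pkt/rtt/1e6 }'
```

Which is why growing the window from 32 to 512 matters far more than raw link speed once the RTT gets long.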
I may be confusing two systems, but I believe that AFS system also encompassed the first iteration of "AWS Glacier"-style storage I encountered in the wild: a big store that required queuing a job to a tape array, or pinging an undergrad to manually load something for retrieval.
AFS implements weak consistency, which may be a bit surprising. It also seems to share objects, not block devices. Judging by its features, it seems to make most sense when there is a cluster of servers. It looks cool though, a bit more like S3 than like NFS.
The cephfs model of a file system logically constructed from an object store closely mirrors the AFS architecture. The AFS fileserver is horribly misnamed. Whereas AFS 1.0 fileserver exported the contents of local filesystems much as NFS and CIFS do, AFS 2.x/3.x/OpenAFS/AuriStorFS fileservers export objects (aka vnodes) which are stored in an object store. Each AFS vice partition stored zero or more object stores each consisting of the objects belonging to a single volume group. A volume group consists of one or more of the RWVOL, ROVOL and/or BACKVOL instances.
The AFS consistency model is fairly strong. Each client (aka cache manager) is only permitted to access the data/metadata of a vnode if it has been issued a callback promise from the AFS fileserver. File lock transitions, metadata modifications, and data modifications, as well as volume transactions, cause the fileserver to break the promise, at which point the client is required to fetch updated status information before it can decide whether it is safe to reuse the locally cached data.
Unlike optimistic locking models, the AFS model permits cached data to be validated after an extended period of time by requesting up to date metadata and a new callback promise.
An AFS fileserver will not permit a client to perform a state changing operation as long as there exist broken callback promises which have yet to be successfully delivered to the client.
Not everyone ignored it but unlike nfs it didn't come in the box with the operating system, and you had to pay for it. In addition, AFS provided strong cryptographic authentication and wire privacy which meant that it couldn't be licensed in many countries because the U.S. government did not grant appropriate export licenses.
I often wonder how the world would be different if AFS 3.0 could have been freely distributed world wide in 1989 precluding the need for HTTP to be developed at CERN.
There were a few technical obstacles which other people mentioned, but I think timing was the biggest issue (remember, AFS dates to something like 1983-ish).
1) AFS, IIRC, required more than one machine in its original configuration. That meant hardware and sysadmins which were expensive--until, suddenly they weren't.
2) Disk, memory and bandwidth were scarce--and then they weren't. AFS made a bunch of solid architectural decisions and then wasted a bunch of time backing some of them down in deference to the hardware of the day and then all that work was wasted when Moore's Law overran everything, anyhow.
3) Everybody was super happy to be running everything locally to escape the tyranny of the "Mainframe Operator" (meaning no NFS or AFS or the like)--until they weren't. Once enough non-technical people appeared who didn't want to do system administration, like, ever, that flipped.
We lost the VMS filesystem in this timeframe too, which was also a distributed, remote filesystem.
Don't know about FreeBSD but hard hanging on a mounted filesystem is configurable (if it's essential configure it that way, otherwise don't). To this day I see plenty of code written that hangs forever if a remote resource is unavailable.
> Don't know about FreeBSD but hard hanging on a mounted filesystem is configurable (if it's essential configure it that way, otherwise don't).
In theory that should work, but I find that kind of non-default config option tends to be undertested and unreliable. Easier to just switch to Samba where not hanging is default/expected.
It's down to the mount options, use 'soft' and the program trying to access the (inaccessible) server gets an error return after a while, or 'intr' if you want to be able to kill the hung process.
The caveat is a lot of software is written to assume things like fread(), fopen() etc will either quickly fail or work. However, if the file is over a network obviously things can go wrong so the common default behaviour is to wait for the server to come back online. Same issue applies to any other network filesystem, different OS's (and even the same OS with different configs) handle the situation differently.
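For concreteness, a soft mount on a Linux client looks something like this; the server name and paths are placeholders, and note that modern Linux kernels accept `intr` but ignore it:

```
# Return an I/O error after the retries are exhausted instead of hanging forever.
# timeo is in tenths of a second; retrans is the retry count before giving up.
mount -t nfs -o soft,timeo=30,retrans=3,vers=4.2 fileserver:/export/data /mnt/data

# Equivalent /etc/fstab entry:
# fileserver:/export/data  /mnt/data  nfs  soft,timeo=30,retrans=3,vers=4.2  0  0
```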
'After a while' usually requiring the users to wait with an unresponsive desktop environment, because they opened a file manager whilst NFS was huffing. So they'd manage to switch to a virtual terminal and then out of habit type 'ls', locking that up too.
After a few years of messing around with soft mounts and block sizes and all sorts of NFS config nonsense, I switched to SMB and never looked back
I heard rumors at first and later saw it once that the sparc lab at my university occasionally had to be shut down and turned on in a particular order to get the whole thing to spool back up after a server glitch. I think the problem got really nasty once you had NFS mounts from multiple places.
You probably gave bad advice. By the time Reddit existed, you could have just gotten a NetApp filer. They had higher availability than most data centers back then, so “the NFS server hung” wouldn’t be anywhere near the top of your “things that cause outages or interfere with engineering” list.
These days, there are plenty of NFS vendors with similar reliability. (Even as far back as NFSv3, the protocol makes it possible for the server to scale out).
I guess I have to earn your trust too. I was actually intimately familiar with Netapp filers at the time, since that is what we used to drive the NFS mounts for the desktops at the first place I mentioned. They were not as immune as you think and were not suitable.
Also, we were a startup, and a Netapp filer was way outside the realm of possibility.
Also, that would be a great solution if you have one datacenter, but as soon as you have more than one, you still have to solve the problem of syncing between the filers.
Also, you generally don't want all of your app servers to update to new code instantly all the same time, in case there is a bug. You want to slow roll the deploy.
Also, you couldn't get a filer in AWS, once we moved there.
And before we moved to AWS the rack was too full for a filer, I would have had to get a whole extra rack.
FWIW, NetApps were generally pretty solid, and they should have no problem keeping in sync across datacenters. You pay handsomely for the privilege though.
Failover, latency, and so on are something you need to think about independently of what transfer protocol you use. NFS may present its own challenges with all the different extensions and flags, but that's true of any mature technology.
That said, live code updates probably aren't a very good idea anyway, for exactly the reasons you mention. Those are the reasons you were right at the time, not any inherent deficiencies on the NFS protocol.
100% this. Sometimes it's not even the filer itself. `hard` NFS mounts on clients in combination with network issues have led to downtimes where I work. Soft mounts can be a solution for read only workloads that have other means of fault tolerance in front of them, but it's not a panacea.
I haven’t seen these problems at much larger scales than are being discussed here. I’ve heard of people buying crappy NFS filers or trying to use the Linux NFS server in prod (it doesn’t support HA!), but I’ve also heard of people losing data when they install a key-value store or consensus protocol on < 3 machines.
The only counterexample involved a buggy RHEL-backported NFS client that liked to deadlock, and that couldn’t be upgraded for… reasons.
Client bugs that force a single machine/process restart can happen with any network protocol.
> You probably gave bad advice. By the time Reddit existed, you could have just gotten a NetApp filer. They had higher availability than most data centers back then, so “the NFS server hung” wouldn’t be anywhere near the top of your “things that cause outages or interfere with engineering” list.
Or distributed NFS filers like Isilon or Panasas: any particular node can be rebooted and its IPs are re-distributed between the still-live nodes. At my last job we used one for HPC and it stored >11PB with minimal hassle. OS upgrades can be done in a rolling fashion so client service is not interrupted.
Newer NFS vendors like Vast Data have all-NVME backends (Isilon can have a mix if you need both fast and archival storage: tiering can happen on (e.g.) file age).
NetApps were a game changer. Large Windows Server 2003 file servers that ran CIFS, NFS, and AFP simultaneously could take 60-90 minutes to come back online because of the resource fork enumeration scan required by AFP sharing.
I find it fascinating that the fact that NFS mounts hang the process when they don't work is due to the broken I/O model Unix historically employed.
See, unlike some other more advanced, contemporary operating systems like VMS, Unix (and early versions of POSIX) did not support async I/O; only nonblocking I/O. Furthermore, it assumed that disk-based I/O was "fast" (I/O operations could always complete, or fail, in a reasonably brief period of time, because if the disks weren't connected and working you had much bigger problems than the failure of one process) and network-based or piped I/O was "slow" (operations could take arbitrarily long or even fail altogether after a long wait); so nonblocking I/O was not supported for file system access in the general case. Well, when you mount your file system over a network, you get the characteristics of "slow" I/O with the lack of nonblocking support of "fast" I/O.
A sibling comment mentions that FreeBSD has some clever workarounds for this. And of course it's largely not a concern for modern software because Linux has io_uring and even the POSIX standard library has async I/O primitives (which few seem to use) these days.
And this is one of those things that VMS (and Windows NT) got right, right from the jump, with I/O completion ports.
But issues like this, and the unfortunate proliferation of the C programming language, underscore the price we've paid as a result of the Unix developers' decision to build an OS that was easy and fun to hack, rather than one that encouraged correctness of the solutions built on top of it.
It wasn't until relatively recently that approaches like await became commonplace. Imagine all the software that wouldn't have been written if developers had been forced to use async primitives before languages were ready for them.
Yes, it is to synchronous programming's great credit that it is simple, and to its great discredit that it is inefficient. Engineering tradeoffs, and all that.
Quote[0]:
> In Ingo's view, there are only two solutions to any operating system problem which are of interest: (1) the one which is easiest to program with, and (2) the one that performs the best. In the I/O space, he claims, the easiest approach is synchronous I/O calls and user-space processes. The fastest approach will be "a pure, minimal state machine" optimized for the specific task; his Tux web server is given as an example.
Granted, most software is not developed for the Linux kernel, but neither is asynchronous programming black magic. I think the software space has rather been negatively impacted by being slow to adopt asynchronous programming, among other old practices.
Imagine all the software that would've been written, or made much nicer, earlier on had Unix devs not been forced to use synchronous I/O primitives.
Synchronous I/O may be simple, but it falls down hard at the "complex things should be possible" bit. And people have been doing async I/O for decades before they got handholding constructs like 'async' and 'await'. Programming the Amiga, for instance, was done entirely around async I/O to and from the custom chips. The CPU needn't do much at all to blow away the PC at many tasks; just initiate DMA transfers to Paula, Denise, and Agnus.
I use NFS as a keystone of a pretty large multi-million-dollar data center application. I run it on a dedicated 100Gb network with 9k frames and it works fantastically. I'm pretty sure it is still in use in many, many places because... it works!
I don't need to "remember NFS", NFS is a big part of my day!
On a smaller scale, I run multiple PCs in-house diskless with NFS root; it's so easy to just create copies on the server and boot into them as needed that it's almost one image per bloated app these days (the server also boots PCs into Windows using iSCSI/SCST, and old DOS boxes from 386 onwards with etherboot/samba). Probably a bit biased due to doing a lot of hardware hacking, where virtualisation solutions take so much more effort, but got to agree NFS (from V2 through V4) just works.
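If anyone wants to try the diskless setup, the Linux side is mostly a kernel command line plus an export; a rough sketch with made-up addresses and paths (you still need PXE/TFTP to load the kernel, which must have NFS-root support built in):

```
# On the server, /etc/exports -- one root filesystem per machine image:
#   /srv/nfsroot/pc1  192.168.1.0/24(rw,sync,no_root_squash,no_subtree_check)

# Kernel command line handed to the diskless client by the boot loader:
#   root=/dev/nfs nfsroot=192.168.1.10:/srv/nfsroot/pc1,vers=3 ip=dhcp rw
```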
NFS is the backbone of my home network servers, including file sharing (books, movies, music), local backups, source code and development, and large volumes of data for hobby projects. I don't know what I'd do without it. Haven't found anything more suitable in 15+ years.
Same. The latest thing I did was put SNES save-state and save files on NFS so I can resume the same game from my laptop, to the RetroPie on the TV, and even on the road over WireGuard.
As a unix sysadmin in the early 90s, I liked to understand as much as I could about the tech that supported the systems I supported. All my clients used NFS so I dug into the guts of RPC until I could write my own services and publish them via portmap.
Weirdly that nerd snipe landed me two different jobs! People wanted to build network-based services and that was one of the quickest ways to do it.
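For anyone who never poked at that layer: the portmapper side of it is still easy to explore today (the hostname below is a placeholder):

```
# Ask a host's portmapper/rpcbind which ONC RPC programs it has registered,
# and on which ports -- NFS, mountd, lockd, and any homegrown services alike.
rpcinfo -p fileserver
```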
> There is also a site, nfsv4bat.org [...] However, be careful: the site is insecure
I just find this highly ironic considering this is NFS we are talking about.
Also, do they fear their ISPs changing the 40-year-old NFS specs in flight or what? Why even mention this?
We are still using it for some pretty large apps. Still have not found a good and simple alternative. I like the simplicity and performance. Scaling is a challenge though.
Unfortunately there doesn’t seem to be any decent alternative.
SMB is a nightmare to set up if your host isn’t running Windows.
sshfs is actually pretty good but it’s not exactly ubiquitous. Plus it has its own quirks and performs slower. So it really doesn’t feel like an upgrade.
Everything else I know of is either proprietary, or hard to set up. Or both.
These days everything has gone more cloud-oriented. Eg Dropbox et al. And I don’t want to sync with a cloud server just to sync between two local machines.
It's one of those tools that, unless you already know what you're doing, you can expect to sink several hours into trying to get the damn thing working correctly.
It's not the kind of thing you can throw at a junior and expect them to get working in an afternoon.
Whereas NFS and sshfs "just work". Albeit I will concede that NFSv4 was annoying to get working back when that was new too. But that's, thankfully, a distant memory.
Weird. I’ve done both at scale many times and NFS daemons have always been significantly less problematic (bar the brief period when NFSv4 was new, but I just fell back to NFSv3 for a brief period).
Samba can be set up easily enough if you know what you’re doing. But getting the AD controller part working would often throw up annoying edge case problems. Problems that I never had to deal with in NFS.
Though I will admit that NIS/YP could be a pain if you needed it to sync with NT.
> Weird. I’ve done both at scale many times and NFS daemons have always been significantly less problematic (bar the brief period when NFSv4 was new, but I just fell back to NFSv3 for a brief period).
Might just be bad timing then, most of my experience with it was in that v3/v4 transition period. It was bad enough to make me swear off the whole thing.
Anyway, we used it extensively in the UIUC engineering workstation labs (hundreds of computers) 20+ years ago, and it worked excellently. I set up a server farm of Sun SPARCs 20 years ago and used NFS for that as well.
DCE DFS (developed at Transarc) was originally supposed to be AFS 4.0 before it was contributed to DCE. After the contribution it became backward incompatible with AFS 3.x. The RPC layer, the authentication protocol, the protection service (user/group management) were all replaced to leverage technology contributions from other DCE participants.
IMO IBM/Transarc died for two reasons. First, there was significant brand confusion after the release of Windows Active Directory and Windows DFS since no trademarks were obtained for DCE service names. Second, the file system couldn't be deployed without the rest of the DCE infrastructure.
There was an unofficial effort within IBM to create the Advanced Distributed File System (ADFS) which would have decoupled DFS from the DCE Cell Directory Service and Security Service as well as replaced DCE/RPC. However, the project never saw the light of day.
I used to administer AFS/DFS and braved the forest of platform ifdefs to port it to different unix flavors.
plusses were security (kerberos), better administrative controls and global file space.
minuses were generally poor performance, middling small file support and awful large file support. substantial administrative overhead. the wide-area performance was so bad the global namespace thing wasn't really useful.
I guess it didn't cause as many actual multi-hour outages as NFS, but we used it primarily for home/working directories and left the servers alone, whereas the accepted practice at the time was to use NFS for roots and to cross-mount everything, so that it easily got into a 'help I've fallen and can't get up' situation.
SMB is not that terrible to set up (has its quirks definitely), but apple devices don't interoperate well in my experience.
SMB from my samba server performs very well from linux and windows clients alike, but the performance from mac is terrible.
NFS support was lacking on windows when I last tried. I used NFS (v3) a lot in the past, but unless in a highly static high trust environment, it was worse to use than SMB (for me). Especially the user-id mapping story is something I'm not sure is solved properly. That was a PITA in the homelab scale, having to set up NIS was really something I didn't like, a road warrior setup didn't work well for me, I quickly abandoned it.
SMBv1 has a reputation for being an extremely chatty protocol. Novell ipx/spx easily dwarfed it in the early days. Microsoft now disables it by default, but some utilities (curl) do not support more advanced versions.
SMBv2 increases efficiency by bundling multiple messages into a single transmission. It is clear text only.
SMBv3 supports optional encryption.
Apple dropped the Samba project from MacOS due to gplv3, and developed their own SMB implementation that is not used elsewhere AFAIK. If you don't care for Apple's implementation, then perhaps installing Samba is a better option.
NFSv3 relies solely on matching numeric uid/gid by default, while NFSv4 requires idmapd to be running to avoid IDs being squashed. I sometimes use both at the same time.
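As a concrete illustration of the two models (the domain name and subnet here are placeholders): NFSv4 name mapping needs the same idmapd domain on both ends, while the v3-style trust is expressed purely in numeric IDs and export options:

```
# /etc/idmapd.conf on both client and server (NFSv4 name<->ID mapping):
#   [General]
#   Domain = home.lan

# /etc/exports, v3-style numeric uid/gid trust, with root squashed:
#   /srv/media  192.168.1.0/24(rw,sync,root_squash,no_subtree_check)
```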
I'd use the Finder to browse files, and for that it is terribly slow. Also, without extra config in Samba it litters the whole disk with .DS_Store crap. I remember it was very slow that way, but have since set up extra config in Samba (the `fruit` VFS module, I think). Its extreme slowness may also be related to the workarounds to avoid that crap not being fully correctly configured. Also, copying is comically slow: getting 4 files totaling 50 kbytes can take 20 seconds sometimes. The same from a Windows laptop takes sub-second time.
Overall I'm underwhelmed by MacOS/iOS, this being one minor annoyance in the list. Windows and Linux both perform well and work well out of the box with my proven simple setup. (No AD)
Samba on MacOS might speed things up, but I bought that machine to get stuff done more effectively than from Linux, and so far it hasn't proved its value. Right now I'll not bother with that, as I feel that would have even worse OS-level integration.
Thanks for the advice nevertheless, much appreciated.
I mean the decent alternative is object storage if you can tolerate not getting a filesystem. You can get an S3 client running anywhere with little trouble. There are lots of really good S3 compatible servers you can self-host. And you don't get the issue of your system locking up because of an unresponsive server.
I've always thought that NFS makes you choose between two bad alternatives with "stop the world and wait" or "fail in a way that apps are not prepared for."
If you don't need a filesystem, then your options are numerous. The problem is sometimes you do need exactly that.
I do agree that object storage is a nice option. I wonder if a FUSE-like object storage wrapper would work well here. I've seen mixed results for S3 but for local instances, it might be a different story.
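One low-effort way to try that is rclone's FUSE mount against any S3-compatible endpoint; the `minio:` remote name below is an assumption you'd first define with `rclone config`:

```
# Expose an S3-compatible bucket as a local directory via FUSE.
# --vfs-cache-mode writes buffers writes locally so ordinary tools behave sanely.
rclone mount minio:backups /mnt/backups --vfs-cache-mode writes
```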
They do, but POSIX filesystem APIs don’t map well to S3 APIs. So you run the risk of heavily increasing your S3 API costs for any stat()-heavy workflows.
This is why I say there’s mixed opinions about mounting S3 via FUSE.
This isn’t an issue with a self hosted S3 compatible storage server. But you then have potential issues using an AWS tool for non-AWS infra. There be dragons there.
And if you were to use a 3rd-party S3 mounting tool, then you run into all the other read and write performance issues that they had (and why Amazon ended up writing their own tool for S3).
So it’s really not a trivial exercise to selfhost a mountable block storage server. And for something as important as data consistency, you might well be concerned enough about weird edge cases that mature technologies like SMB and NFS just feel safer.
True. But for a home server, for example, I absolutely love the simplicity. I have 6 Lenovo 720q machines, one of them acting as data storage, just running simple NFS for quick daily backups before it pushes them to a NAS.
9P? Significantly simpler, at the protocol level, than NFS (to the point where you can implement a client/server in your language of choice in one afternoon).
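And the client ships in the stock Linux kernel (v9fs), so mounting an existing 9P export is a one-liner; the server address and paths below are placeholders:

```
# Mount a 9P export over TCP with the in-kernel v9fs client (564 is the usual port).
mount -t 9p -o trans=tcp,port=564,version=9p2000.L 192.168.1.10 /mnt/9p

# QEMU/virtio-9p shares inside a VM mount the same way with trans=virtio.
```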
Well, kind of hard to say anything exhaustive in a quick comment, but roughly advantages:
- POSIX compliant, including dotting the i's. As opposed to, say, NFS which isn't cache coherent.
- performance and scalability. 1 TB/s+ sequential IO to a single file is what you'd expect on a large HPC system these days.
- Metadata performance has gotten a lot better over the past decade or so, beating most(all?) other parallel filesystems.
Downsides:
- Lots of pieces in a Lustre cluster (typically nodes are paired in sort-of active/active HA configs). And lots of cables, switches etc. So a fairly decent chance something breaks every now and then.
- When something breaks, Lustre is weird and different compared to many other filesystems. Tools are rudimentary and different.
My introduction to NFS was first at Berkeley, and then at Sun. It more or less just worked. (Some of the early file servers at Berkeley were drastically overcapacity with all the diskless Sun-3/50s connected, but still.)
And of course I still use it every day with Amazon EFS; Happy Birthday, indeed!
NFS v4.2. Easy to set up if you don't need authentication. Very good throughput, at least so long as your network gear isn't the bottleneck. I think it's the best choice if your clients are Linux or similar. The only bummer for me is that mounting NFS shares from Android file managers seems to be difficult or impossible (let alone NFSv4).
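For reference, the no-authentication LAN happy path really is only a few lines; the paths, subnet, and service name below are the usual Fedora/RHEL-ish assumptions:

```
# Server side:
echo '/srv/share 192.168.1.0/24(rw,sync,no_subtree_check)' >> /etc/exports
exportfs -ra                          # (re)export everything in /etc/exports
systemctl enable --now nfs-server     # service name on Fedora/RHEL-style distros

# Client side:
mount -t nfs -o vers=4.2 server:/srv/share /mnt/share
```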
Yes, that's what at least the `nfs-server` service on Fedora does by default. And VLC also supports v3 on Android… maybe they use the same implementation as Kodi behind the scenes? It's weird the v4 support is so spotty still, even though it has been around for two decades. Even NFS v4.2 is almost ten years old at this point.
SMB is great for the LAN, but its performance over the internet is poor. That leaves SFTP and WebDAV in that case. SFTP would be my choice, if there is client support.
I just use sshfs for most things today. It's by far the simplest to set up (just run sshd), has good authentication and encryption (works over the internet), and when I measured performance vs. NFS and Samba some years ago it seemed roughly identical (this is mostly for large files; it's probably slower for lots of small files – measure your own use case if performance is important). I don't know about file locking and that type of thing – it perhaps does poorly there(?) It's not something I care about.
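For anyone who hasn't tried it, the whole client-side setup is one command (host and paths are placeholders):

```
# Mount a remote directory over plain SSH via FUSE; nothing server-side beyond sshd.
sshfs user@server:/home/user /mnt/server-home -o reconnect,ServerAliveInterval=15

# Unmount when done (fusermount3 on newer FUSE 3 setups):
fusermount -u /mnt/server-home
```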
> What are most people using today for file serving?
Google Drive. Or Dropbox, OneDrive, yada yada. I mean, sure, that's not the question you were asking. But for casual per-user storage and sharing of "file" data in the sense we've understood it since the 1980's, cloud services have killed local storage cold dead. It's been buried for years, except in weird enclaves like HN.
The other sense of "casual filesystem mounting" even within our enclave is covered well already by fuse/sshfs at the top level, or 9P for more deeply integrated things like mounting stuff into a VM.
As nice as WebDAV would've been, it's probably a non-starter in many scenarios due to weird limits, like Windows's default size limit of 50 MB.
I'm tinkering on a project where I'd like to project a filesystem from code and added WebDAV support; the 50 MB limit will be fine since it's a corner case for files to be bigger, but it did put a dent in my enthusiasm since I had envisioned using it in more places.
I have really mixed feelings about things like NFS, remote desktop, etc. The idea of having everything remote to save resources (or for other reasons) does sound really appealing in theory, and, when it works, is truly great. However in practice it's really hard to make these things be worth it, because of latency. E.g. for network block storage and for NFS the performance is usually abysmal compared to even a relatively cheap modern SSD in terms of latency, and many applications now expect a low latency file system, and perform really poorly otherwise.
Fairly obviously a 1Gbps network is not going to compete with 5Gbps SATA or 20Gbps NVME. Having said that, for real performance we load stuff over the network into local RAM and then generally run from that (faster than all other options). On the internal network the server also has a large RAM disk shared over NFS/SMB, and the performance PC's have plenty of RAM - so really it's a tradeoff, and the optimum is going to depend on how the system is used.
- It caused me to switch from Linux to FreeBSD in 1994, when Linux didn't have NFS caching but FreeBSD did & Linus told me "nobody cares about NFS" at the Boston USENIX. I was a sysadmin for a small stats department, and they made heavy use of NFS-mounted latex fonts. Xdvi would seek byte-by-byte in the file, so it would render a page in 1s on FreeBSD and over a minute on Linux due to the difference in caching. You could see each character as it rendered on Linux, and the page would just open instantly on FreeBSD.
ZFS is amazing; I've used it since around Solaris 10, and yes, loved it for its NFS capability. I had many terabytes on it at the time, back when a terabyte meant a rack of drives! Now those same systems host petabytes, all upgraded in place. Solaris was pretty amazing too.
I’ve been using NFS in various environments since my first introduction to it in my university’s Solaris and Linux labs. I’ve run it at home, on and off, since 2005.
I’ve recently started using it again after consistent issues with SMB on Apple devices, and the deprecation of AFP. My FreeBSD server, running on a Raspberry Pi, makes terabytes of content available to the web via an NFS connection to a Synology NAS.
For my use case, with a small number of users, the fact that NFS is host-based rather than user-based means I can set it up once on each device, and all users of that host can access the shares. And I've generally found it to be more consistently performant on Apple hardware than their in-house SMB implementation.
I looked into this a while ago and was surprised to find that no file explorer on Android seems to support it[1]. However, I did notice that VLC for Android does support it, though unfortunately only NFSv3. I was at least able to watch some videos from the share with it, but it would be nice to have general access to the share on Android.
[1] Of course, I didn’t test every single app — there’s a bucketload of them on Google Play and elsewhere…
Interesting, the readme for that library says that NFSv4 is supported. So that likely means that VLC is doing something wrong on their side, because only NFSv3 works?
Been a while, but if you root your phone and have access to the kernel source in order to build the NFS modules, would you be able to mount NFS shares then?
I'm considering NFS with RDMA for a handful of CFD workstations plus one file server on a 25 GbE network. Anyone know if this will perform well? I'll be using XFS with some NVMe disks as the base FS on the file server.
Quite some time ago I implemented NFS for a small HPC cluster on a 40 GbE network. A colleague set up RDMA later, since at the start it didn't work with the Ubuntu kernel available. Full NVMe on the file server too. While the raw performance using ZFS was kind of underwhelming (mdadm+XFS was about 2x faster), network performance was fine, I'd argue: serial transfers easily hit ~4 GB/s on a single node, and 4K benchmarking with fio was comparable to a good SATA SSD (IOPS + throughput) on multiple clients in parallel!
Yes, you might want to tune your NFS parameters, stick to NFSv4.2, consider if caching is appropriate for your workloads and at what level, and how much of your NFS + networking you can keep in kernel space if you decide to further upgrade your network's throughput or really expand it.
Also consider what your server and client machines will be running, some NFS clients suck. Linux on both ends works really well.
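A starting point for the client mounts in that kind of setup might look like the following; the server name, export path, and the specific numbers are assumptions to benchmark against your own workload (fio is handy for that):

```
# NFSv4.2 over RDMA; 20049 is the conventional NFS/RDMA port.
mount -t nfs -o vers=4.2,proto=rdma,port=20049,rsize=1048576,wsize=1048576 \
    fileserver:/export/scratch /mnt/scratch

# If you end up falling back to TCP, nconnect=8 (parallel connections, Linux 5.3+)
# is worth measuring before and after.
```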
I still love NFS. It's a cornerstone to how I end up thinking about many problems. In my house it provides a simple NAS mount. In certain development environments, I use sshmount because of it.
But I really loved the lesser-known RFS. Yes, it wasn't as robust, or as elegant... but there's nothing quite like mounting someone else's sound card and blaring music out of it in order to drive a prank. Sigh...
Wait. I still export NFS mounts from my TrueNAS server and make them available to all other machines on my LAN (music, books, documents, photos, etc). The article and comments here give me the feeling that NFS is outdated and shouldn't be used anymore. Am I doing things wrong?
It was amusing to read the comment about the flat UID/GID namespace being a problem, identified as far back as 1984. This is something that DCE addressed by using a larger namespace (UUIDs), and Windows finally got right using a hierarchical one (SIDs).
I think NFS is the most sane storage provider for self hosted Kubernetes. Anything else seems over engineered for a home lab and is normally not a very good experience.
What I don't like is the security model. It's either deploying kerberos infrastructure or "trust me bro I'm UID 1000" so I default to SMB on the file server.
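For reference, the NFS PersistentVolume path really is about as simple as Kubernetes storage gets; a minimal sketch (server IP, path, name, and size are placeholders), paired with a matching PersistentVolumeClaim or an NFS provisioner for dynamic volumes:

```
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: media-nfs
spec:
  capacity:
    storage: 500Gi
  accessModes: ["ReadWriteMany"]
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 192.168.1.10
    path: /srv/media
EOF
```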
Isilon has their own filesystem that stores the data across multi-node clusters, you then export that out over NFS/SMB/S3, and the nodes load balance the I/O across the cluster.