NCSA also used it for some data archival and I believe for hosting the website files.
I looked up at one point whatever happened to AFS and it turns out that it has some Amdahl’s Law glass ceiling that ultimately limits the aggregate bandwidth to something around 1 GBps, which was fine when it was young but not fine when 100Mb Ethernet was ubiquitous and gigabit was obtainable with deep enough pockets. If adding more hardware can’t make the filesystem faster you’re dead.
I don’t know if or how OpenAFS has avoided these issues.
The Amdahl's Law limitations are specific to the implementation and not at all tied to the protocols. The 1990 AFS 3.0 server design was built upon a cooperative threading system ("Light Weight Processes") designed by James Gosling as part of the Andrew Project. Cooperative threading influences the design of the locking model since there is no simultaneous execution between tasks. When the AFS fileserver was converted to pthreads for AFS 3.5, the global state of each library was protected by wrapping it with a global mutex. Each mutex was acquired when entering the library and dropped when exiting it. Completing any fileserver RPC required acquiring at least six or seven global mutexes, depending upon the type of vnode being accessed. In practice, the global mutexes restricted the fileserver process to 1.7 cores regardless of how many cores were present in the system.
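A minimal sketch of the coarse-grained pattern described above, with hypothetical lock and function names rather than actual OpenAFS source: each library wraps all of its state in one global mutex, so every worker thread servicing an RPC serializes on the same handful of locks.

    /* Hypothetical sketch, not actual OpenAFS code: each library protects
     * all of its state with a single global mutex, so an RPC that touches
     * the host, volume, and vnode packages serializes on each of those
     * locks in turn, no matter how many worker threads or cores exist. */
    #include <pthread.h>

    static pthread_mutex_t host_glock  = PTHREAD_MUTEX_INITIALIZER; /* host/callback package */
    static pthread_mutex_t vol_glock   = PTHREAD_MUTEX_INITIALIZER; /* volume package        */
    static pthread_mutex_t vnode_glock = PTHREAD_MUTEX_INITIALIZER; /* vnode package         */

    static void fetch_status_rpc(void)
    {
        pthread_mutex_lock(&host_glock);
        /* ... identify the calling client ... */
        pthread_mutex_unlock(&host_glock);

        pthread_mutex_lock(&vol_glock);
        /* ... attach the volume ... */
        pthread_mutex_unlock(&vol_glock);

        pthread_mutex_lock(&vnode_glock);
        /* ... read the vnode status ... */
        pthread_mutex_unlock(&vnode_glock);
    }

    int main(void)
    {
        fetch_status_rpc();   /* one RPC's worth of global-lock traffic */
        return 0;
    }

Read as an Amdahl's Law limit, a ceiling of about 1.7 cores suggests that well over half of each RPC's wall time was spent holding one of these global locks (asymptotic speedup 1/s ≈ 1.7 gives a serialized fraction s of roughly 0.6).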
AuriStor's RX and UBIK protocol and implementation improvements would be worthless if the application services couldn't scale. Accomplishing this required converting each subsystem to operate with minimal lock contention.
This 2023 presentation by Simon Wilkinson describes the improvements that were made to AuriStor's RX implementation up to that point.
> In practice, the global mutexes restricted the fileserver process to 1.7 cores regardless of how many cores were present in the system.
So in theory the bandwidth could scale with single-CPU speed and/or point-to-point bandwidth, but it cannot scale horizontally at all, except in the new implementations.
Correct, and the point-to-point bandwidth is limited by the maximum RX window size because of the bandwidth-delay product: a sender can have at most one window's worth of data in flight per round trip. As round-trip latency increases, at some point the window size becomes insufficient to keep the pipe full, and data transfers stall.
One site which recently lifted and shifted their AFS cell to a cloud made the following observations:
We observed the following performance while copying a 1 GB file from local disk into AFS.
AuriStor Client (2021.05-65) -> OpenAFS server (1.6.24): 3m 11s
AuriStor Client (2021.05-65) -> AuriStor Server (2021.05-65): 1m
AuriStor Client (2025.00.11) -> AuriStor Server (2025.00.11): 30s
All of the above tests were performed from clients located on campus to fileservers located in the cloud.
There are many RX implementation differences between the three versions. It is important to note that the window size grows from 32 -> 128 -> 512.
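A back-of-the-envelope sketch of how those window sizes translate into throughput ceilings via the bandwidth-delay product. The assumptions are mine, not from the post: the windows are counted in packets, each RX packet carries roughly 1400 bytes of payload, and the campus-to-cloud RTT is whatever is passed on the command line (20 ms by default).

    /* Bandwidth-delay-product ceiling for the three window sizes above.
     * Assumed, not measured: ~1400 bytes of payload per RX packet and an
     * RTT in milliseconds supplied on the command line (default 20 ms). */
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        const double payload  = 1400.0;            /* assumed payload bytes per packet */
        const int windows[]   = { 32, 128, 512 };  /* OpenAFS 1.6 / 2021.05 / 2025.00 */
        const double rtt_s    = (argc > 1 ? atof(argv[1]) : 20.0) / 1000.0;

        for (int i = 0; i < 3; i++) {
            double in_flight = windows[i] * payload;  /* bytes in flight per round trip */
            double ceiling   = in_flight / rtt_s;     /* best-case bytes per second     */
            printf("window %3d: ~%.0f KB in flight, ceiling ~%.1f MB/s, ~%.0f s per 1 GB\n",
                   windows[i], in_flight / 1e3, ceiling / 1e6, 1e9 / ceiling);
        }
        return 0;
    }

If the campus-to-cloud RTT was on the order of 10 ms, the ceilings for the 32- and 128-packet windows land close to the observed 3m 11s and 1m copies; the 512-packet run presumably hits other limits before the window does.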
I may be confusing two systems, but I believe that AFS system also encompassed the first iteration of “AWS Glacier” I encountered in the wild: a big storage system that required queuing a job to a tape array or pinging an undergrad to manually load something for retrieval.