First, the 3-character file extension predates CP/M. It goes at least as far back as various DEC operating systems like RT-11 and RSTS/E; probably there are antecedents further back in time. I've also seen old filesystems with 2-character extensions, but 3 has been common for quite some time.
I think that the original selection of 3 character extensions was made because it felt like a comfortable number of characters to make something meaningful while not being too burdensome to type. The fact that we still tend towards <= 3 character extensions is probably because this is still true, not because of a conscious attempt to maintain DOS compatibility.
I've only ever seen Windows-based systems adhere strictly to the 3-character extension. All 'Nix's I've worked with (Linux, BSD's, etc) usually have native files on the system with extensions longer than 3 characters. (.conf anyone?)
Also, I should probably note that file extensions are largely a Windows-based OS thing as well. On the 'Nix's, an extension isn't really needed (OK, it's not needed on Windows-based OS's either, but the OS prefers it). The "file" command will tell you all you need to know about a file without an extension. Or, you can just open it with a text editor. Or, execute it if it has execute permissions, etc.
So, it seems, the 3-character extension (or having extensions at all) is really a Windows-based OS thing.
(While we're at it, Windows-based OS's are the only OS's I've worked with that actually require something in front of the "."! For example, try to natively create a ".somefile" on Windows -- it will complain and not let you.)
Windows has allowed extensions longer than 3 characters for about 20 years now. I'm not sure what you mean by "natively create" but it also definitely doesn't require anything before the "." -- e.g. I have files called ".gitignore" etc. just as on any other OS.
That is the most annoying error ever. No, I must not! I shall decide for myself what I must and mustn't do! /me opens cmd.exe and uses ren untitled.txt .gitignore
A slightly bigger challenge is a file named CON :)
Yes, UNIX never had a concept of a limited-length file extension. Until the 1980s filenames were limited to 14 characters in total, but you could put a dot anywhere you liked in them.
I was specifically referring to the part of the article where the author claimed that Windows inherited it from MS-DOS, which inherited it from CP/M. The chain goes back farther than that -- by the time CP/M was written the 3-character file extension was well-established. In general CP/M was influenced a lot by the DEC minicomputer OSes (its "PIP" command is one obvious example).
At the filesystem layer? No, never. However even in the earliest of days file extensions were common. They followed the usual UNIX preference for brevity: ".c", ".h", ".o", and ".a" were all early arrivals. So even though they didn't cram extensions into the filesystem, they weren't allergic to it as a convention.
In the 80s you started seeing things like "resolv.conf" appear, so they didn't stay mostly-single-character for long.
In Windows Explorer it will complain if you try and create a file starting with a dot, but I suspect that it's because extensions are hidden by default, so you'd end up creating a file with no name.
Can easily create such files by other means (saving in notepad, sublime, etc).
There's a trick. You can for example name a file .gitignore. (with a starting and trailing dot), it will warn you about changing the extension which you agree to, and you're left with a file named .gitignore
I recall DEC's TOPS10 operating system for the PDP-10 using three character extensions. It was introduced in 1970. I think the older PDP6 and ITS systems at MIT-AI had a different scheme for extensions. On ITS I recall that a file had a name and a version number and you could get the most recent version of the file named FOO by referring to it as "FOO >". Every time you edited FOO, you created a new version whose extension was incremented by one. Periodically you issued a command that deleted all of the old versions.
IIRC I first saw the three-letter extension scheme when I started using TOPS10 in 1974.
Exactly, and nowadays we have fast enough storage that reading the first few bytes of a file isn't a burden either. Might as well do that instead of appending bytes to each filename. Magic mime is more reliable, versatile and arguably safer (.jpg.exe attachments..?).
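For what it's worth, here's a rough Python sketch of what "reading the first few bytes" looks like. The signature table is a tiny hand-picked subset, nowhere near a real magic database, and the names (sniff_type, looks_disguised, EXTENSION_TYPES) and the MIME label for executables are made up for illustration.

```python
import os

# Tiny illustrative signature table; a real magic database is far larger.
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
    (b"MZ", "application/x-dosexec"),  # DOS/Windows executable; label is informal
]

# What a few common extensions claim to be (assumption: illustrative subset).
EXTENSION_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".pdf": "application/pdf",
}


def sniff_type(header: bytes) -> str:
    """Guess a type from the leading bytes of a file."""
    for signature, mime in MAGIC_SIGNATURES:
        if header.startswith(signature):
            return mime
    return "application/octet-stream"


def sniff_file(path: str, probe_size: int = 16) -> str:
    """Read only the first few bytes of the file at `path` and sniff them."""
    with open(path, "rb") as handle:
        return sniff_type(handle.read(probe_size))


def looks_disguised(filename: str, header: bytes) -> bool:
    """True when the extension claims one type but the content says another."""
    claimed = EXTENSION_TYPES.get(os.path.splitext(filename)[1].lower())
    return claimed is not None and claimed != sniff_type(header)
```

With something like this, an executable renamed to photo.jpg gets flagged because the bytes and the name disagree, which is the kind of check the extension alone can't do.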
I wonder if it's really safer. You'd probably find that you'd have different magic databases and a carefully constructed (malicious) document might give different answers. Imagine that your email client decided a document was a PDF, but your file manager thought it was a loadable extension. You're trading one class of vulnerability for another.
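To make that concrete, here's a toy sketch in Python. Both sniffers are hypothetical and deliberately simplistic; the only assumption is that one tool insists on the signature at offset 0 while another tolerates a PDF marker anywhere in the first 1024 bytes.

```python
def mail_client_sniffer(data: bytes, window: int = 1024) -> str:
    """Hypothetical lenient check: a PDF marker anywhere in the first `window` bytes wins."""
    if b"%PDF-" in data[:window]:
        return "pdf"
    if data.startswith(b"MZ"):
        return "executable"
    return "unknown"


def file_manager_sniffer(data: bytes) -> str:
    """Hypothetical strict check: only the bytes at offset 0 count."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"MZ"):
        return "executable"
    return "unknown"


# A crafted blob: executable signature first, PDF marker a little later.
crafted = b"MZ" + b"\x00" * 62 + b"%PDF-1.4\n"

mail_verdict = mail_client_sniffer(crafted)
manager_verdict = file_manager_sniffer(crafted)
```

Same bytes, two verdicts. Neither sniffer is "wrong" by its own rules, which is exactly the problem.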
I'd also argue that the endurance of the file extension is a hint that people may actually like them. Extensions were invented as an affordance to the computer (which it no longer needs) but the information they convey is useful to humans as well. It may be a quirky old convention, but it provides a universally-understood language for describing a file type.
Anyway, I think the overall discussion on this submission is a good example of why 3-character extensions survive: some people here argue for longer extensions, others say we should have no extensions, and the silent majority is happy to keep splitting the difference.
If I'm bringing up a directory listing, and the GUI tool I'm using is showing me different icons for different file types (or some other type-specific visual distinction), then I don't want it to have to read the first few bytes of every file in the directory. That doesn't scale. The file type needs to be part of the directory listing itself. Traditionally on Windows it's been part of the filename (which typically is only part of the directory listing, and not attached to the file) but it could have been stored as an attribute as well. That would have made more sense, but even NTFS was designed at a time when disk space actually cost something, and a filetype attribute probably would have wound up being a GUID registry key, which is a lot of bytes for every file in a big directory.
Fair point, and as you mention it could be optimized by using an attribute for magic mime cache. Another optimization is showing only the icons for the files currently in view, which should scale pretty much forever (I can't imagine so many files being in view that this becomes impractical without the view itself being useless). File managers already do this for thumbnails, though that's because they need to read the entire (often megabytes large) image.
Even if MS-DOS weren't used anymore, the various MS-DOS file systems still are; a lot of media comes from the vendor pre-formatted in FAT16. Phones, cameras, USB hard disks, etc. are all formatted in FAT by default. Even the file system on Spirit and Opportunity, the Martian rovers, is FAT16! (See page 6 of https://www.cs.princeton.edu/courses/archive/fall09/cos109/m... )
By default MacOSX doesn't read Linux file systems (EFS, ReiserFS, etc); by default Linux doesn't read MacOSX file systems (HFS, HFS+). On the other hand, by default, most systems read FAT16 or sometimes FAT32 file systems.
Therefore if you want to use removable media to exchange data across systems, you'd better stick to those MS-DOS file systems. Granted, you could use the MS-Windows long file name extensions, but again, nothing guarantees that the system where you plug in your USB key will understand those extensions. In all probability, it will have FAT16 or FAT32 support, i.e. 8.3 file names.
It's been a very long time since I saw something that doesn't support FAT32, and probably even longer than that since I saw something that didn't support LFN. Windows '95 was 19 years ago. People have had plenty of time to switch, and it seems that they all have. Which is hardly surprising, because 8.3 file names were annoying, and everybody hated them.
(Any stragglers can simply be ignored; people still stuck on DOS are even less important in the grand scheme of things than people still using the Amiga...)
I don't really see an argument in favor of longer extensions except in rare cases (such as with html where you only need one extra letter). `mp4` tells me just as much about a file as `mpeg4` does and is equally easy to remember, plus it saves a few keystrokes.
The same logic applies to most programming languages. I have no problem remembering `.js` is javascript or `.rs` is rust. And I think the vast majority of programmers would agree with me.
Further, filenames don't play a large role in most (nontechnical) users' lives. The tools they use (e.g. Word or Excel) append the extension for them, so they don't worry about it at all. The only people who routinely write filenames by hand are coders, and they're familiar enough with the extensions that shortcuts work fine.
mp4 actually tells you more about a file than mpeg4 does, because mp4 is only one thing, the file. mpeg4 is another thing, but that thing is not a file. So, calling files mpeg4 actually confuses things rather than clarifies them. If you work with these concepts, it's nice to be able to say to your coworker, "hey, send me the mp4". If you say "send me the mpeg4", you'll get back "how do you want me to send it?"
This works for many similar concepts. The idea of not overloading words that already have meanings with other meanings was one of the sensible underpinnings of the Hungarian Naming Convention. Let's not rehash the Hungarian debate, you don't need to like Hungarian to realize the sensibleness of this particular concept.
But seriously, it's a silly limitation from a bygone era. Why should we adhere to it?
Why should we have to remember these things? Some of them are easy to remember, but even less technical users change their workflows from time to time.
Sketch uses ".sketch" as its extension. Should they have gone with something like ".skc" as their extension, just because that's *the way things are done*, even though the application only runs on the latest version of an operating system without that primitive limitation?
Even with Markdown, most people use ".md", but Gruber has expressed ".markdown" as his preference because the limitation doesn't really exist.
.m is just as much a Matlab file, as a module. Speaking of Matlab, .mat extension is reserved by Access somehow deep in Windows guts and up until this day I don't know how to get rid of it.
You should either use TAB-completion or click on the file. There should be no reason to type the extension. ( Though in practice I sort of see your point. )
Heh, no respect for history. Short answer: UNIX had its 'magic' number, which looked at the first 32 bits (four bytes) of a file to figure out what kind it was (hash bang for the win), and Apple had its own awesomesauce of a 'forked' file system, where the file and what it was were in two completely separate but extensible parts of the file system. CP/M (which had emulated DEC) and then DOS (which had emulated CP/M) used a 3-character extension (which is only 24 bits but still plenty) to encode the type of file in the name, since it couldn't necessarily live in the file itself and there was no 'attribute fork' to stick it in.
Interestingly this is a problem that lives on: people still want to identify what a file is, and that requires some sort of identifier. Embedding it in the name is just as good as anything, but the important bit is that you stay consistent. If your web documents are .htm, .html, .www, .web, etc., your configuration gets unwieldy.
I really liked the Mac "forked" file system. File types and bundles were simple and worked really well. They also facilitated "drag and drop" application installs, where the Finder did most of the work keeping track of where applications lived.
The app install process on Windows makes me weep (I think that Visual Studio installs well over 10,000 keys in the registry. What the hell? This number should be zero in a well designed system . . . but don't get me started on the disaster that is COM).
Incidentally, Windows NT has file forks. They're not often used, and lots of utilities don't know about them, but they can be quite useful.
If you were going to go the whole hog, slap the entire MIME type name in the extension :) Otherwise I don't think 3-4 character file extensions have really let anyone down in the history of computing.
> Perhaps the only, quite unusual, exception is the .html file name extension for HTML files
Sounds like he has been missing out on Microsoft Office since at least 2007, though I must say I can only applaud that fact.
More seriously, the article might as well have ranted about files having any extension at all. Magic mime is much more reliable and versatile, not to mention all the viruses using .jpg.exe extensions.
Heh, that too. I keep saying "first thing I do when" because there's so much to do when installing Windows even if it's not for myself (nowadays Windows installs are, luckily, never for myself), but it really is one of the first things.
The "hide known file extensions" default option should have been around less than a month before someone realized, "this is very insecure". The fact that it's been around for over a decade is just lame.
> For the same reason you want shorter domain names
Good one, hadn't thought of the analogy to .commercial, .network and .organization yet (.com, .net and .org, in case anyone doesn't get it). It's 2014, let's use the whole name already!
If it ain't broke, don't fix it. (The reasons for why it is "broken" to limit to 3-char in the article aren't good enough to warrant a paradigm shift for a convention almost as old as computing itself).
One of my biggest disappointments with Gnumeric is the file extension: tab completion only works sometimes, and I'd really like to change
find ~ -name \*.gnumeric to
find ~ -name \*.gnm or
find ~ -name \*.gsc
Also it takes up more space when listing files in a terminal.
Also, the 3 letter convention is a somewhat helpful limitation in the same way that countries defining which side of the road to use is actually helpful. For instance, it is nice to have all of my JPEGs ending in .jpg rather than a mix of .jpg, .jpeg, .JPG .JPEG.
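If you're already stuck with the mix, cleaning it up is a few lines of Python. This is only a sketch: the alias table and the function name canonical_name are my own invention, and which spelling counts as "canonical" is a matter of taste.

```python
import os

# Illustrative alias table: map variant spellings onto one preferred extension.
CANONICAL_EXTENSIONS = {
    ".jpeg": ".jpg",
    ".jpe": ".jpg",
}


def canonical_name(filename: str) -> str:
    """Lowercase the extension and fold known aliases into one spelling."""
    stem, ext = os.path.splitext(filename)
    ext = ext.lower()
    return stem + CANONICAL_EXTENSIONS.get(ext, ext)
```

So IMG_001.JPEG and IMG_001.JPG both come out as IMG_001.jpg, while names without an extension (README, .gitignore) pass through untouched.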
The "htm" extension was a lot more common back when some consumer systems still had the three character limit. The web server limitations don't matter that much on something like a web hotel where users typically FTP their files over from whatever OS they are using themselves.
The only reason to make a file extension longer is to avoid collisions.
NTFS has a file length limit of 255 characters. And if the company you work for does things like this:
\\SERVER1\Data\Region\SubRegion\SubSubRegion\Reports\Weekly\2014-25\WidgetReport\SubWidgetReports\..... you can eat up space pretty quickly. You might need those extra characters.
Just a little nit that I feel compelled to pick: The path length limitation is part of Windows, not NTFS. NTFS has a 255-character limit on individual path components, but it allows for much longer paths than that, just like all modern filesystems do. It's the Windows API that sets MAX_PATH to 260.
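A quick Python sketch of the two separate limits, in case the distinction isn't obvious. The function names are made up, and I'm assuming lengths are counted in UTF-16 code units and that MAX_PATH includes the terminating NUL.

```python
import re

MAX_PATH = 260       # classic Win32 API limit; counts the terminating NUL
MAX_COMPONENT = 255  # NTFS limit on a single path component


def utf16_units(text: str) -> int:
    """Length of `text` in UTF-16 code units, which is what Windows counts."""
    return len(text.encode("utf-16-le")) // 2


def check_windows_path(path: str) -> list[str]:
    """Return a list of limit violations for `path` (empty list means fine)."""
    problems = []
    if utf16_units(path) + 1 > MAX_PATH:  # +1 for the terminating NUL
        problems.append("path exceeds MAX_PATH")
    for component in re.split(r"[\\/]+", path):
        if utf16_units(component) > MAX_COMPONENT:
            problems.append("component exceeds 255 characters")
    return problems
```

A deeply nested path made of short folder names trips only the first check, which is the Windows API limit and not anything NTFS cares about.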
Feel free to nitpick. I remember writing to an NTFS partition from Linux. I finally managed to delete the unresponsive directories in PowerShell except for one straggler which also had special characters.
Before everyone wised up / bad publicity forced them to, it wasn't uncommon for things like FTP servers to be able to create filenames which were either very long or contained reserved DOS names (e.g. CON, AUX, LPT1, etc.) and were thus undeletable using the Win32 APIs used by Explorer or cmd.exe.
The POSIX and NT-native APIs didn't have this limit, so Windows admins either had to find a ported POSIX utility (rm.exe was popular) or learn how to use UNC paths:
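A rough sketch of the idea in Python, assuming the extended-length \\?\ prefix is what's meant here. The function names are made up for illustration, and the reserved-name check reflects the historical Win32 behaviour as I understand it (case-insensitive, extension ignored).

```python
# Device names that the Win32 layer has historically refused to treat as files.
RESERVED_DEVICE_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_reserved_name(component: str) -> bool:
    # "con", "CON.txt" and "con " have all been treated as the CON device.
    stem = component.split(".", 1)[0].rstrip(" ").upper()
    return stem in RESERVED_DEVICE_NAMES


def to_extended_path(path: str) -> str:
    # Assumes `path` is already absolute and uses backslashes; the \\?\
    # prefix tells Win32 to pass the path through without normalizing it.
    if path.startswith("\\\\?\\"):
        return path  # already in extended form
    if path.startswith("\\\\"):
        # \\server\share\... becomes \\?\UNC\server\share\...
        return "\\\\?\\UNC\\" + path[2:]
    return "\\\\?\\" + path
```

The result of to_extended_path is what you'd hand to del or rd to get at a file that Explorer refuses to touch.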
Characteristically, Microsoft doesn't seem to be interested in fixing this so everyone who builds a new file sharing service probably has to add layers of script-kiddy protection for backwards compatibility.
.html isn't treated quite the same way as .htm on a Windows OS. Although both .html and .htm will open up a page in your default browser, the latter will attempt to locate predefined folders that presumably contain images/scripts for your .htm file.
Three letter extensions can also be important for compatibility with poorly-written software. I work with an "enterprise" (hah) document management system which refuses to believe that .docx files are a thing.