Replacing the File System

A colleague asked me if Microsoft’s initiative to replace their aging file system with database technology was anything like what Be did in BeOS. The answer is no, and the reason is that Microsoft’s approach is not a very good idea.

A recent article at CNet, about Microsoft’s futuristic concept of replacing the traditional file system with something more like a database, prompted a co-worker to ask me if this was along the same lines as what Be had done (in BFS, the advanced file system that was part of BeOS).

My answer was no. BeOS provided the 20% of the functionality that gave you 80% of what real people actually want to do.

The file system was fairly Unix-like (i.e., no resource forks or anything like that), with the ability to attach arbitrary attributes to files. Some attributes were system-maintained, others could be maintained by the application software that used the files. The attributes could be simple, like numbers or dates or strings, or complex binary data. There was no practical limit on the size or number of attributes, just as there was no practical limit on the size or number of files (64-bit file system).

Simple data attributes could be indexed, and searched with a basic but complete query language. The operating system provided a search panel that let you do some interesting things, like search for all e-mail you received in the “last week” or “today”, or all graphics files of any type that were edited on a particular day, etc. Applications could make use of the query engine, and provide their own customized functionality for, say, searching through your collection of contacts or appointments.
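To make the idea concrete, here is a toy sketch in Python of files carrying arbitrary attributes, with a few of those attributes indexed so they can be queried. This is not the BFS API or its query language; the class name, method names, paths and attribute names ("received", "thumbnail") are all made up for illustration, and everything lives in memory rather than on disk.

```python
from collections import defaultdict
from datetime import date


class ToyAttributeStore:
    """In-memory stand-in for a file system with indexed file attributes.

    Hypothetical model for illustration only; not how BFS is implemented.
    """

    def __init__(self, indexed):
        self.indexed = set(indexed)   # attribute names that get an index
        self.attrs = {}               # path -> {attribute name: value}
        # attribute name -> value -> set of paths holding that value
        self.index = defaultdict(lambda: defaultdict(set))

    def set_attr(self, path, name, value):
        """Attach (or overwrite) an attribute on a file."""
        file_attrs = self.attrs.setdefault(path, {})
        if name in self.indexed:
            old = file_attrs.get(name)
            if old is not None:
                # Drop the stale index entry before adding the new one.
                self.index[name][old].discard(path)
            self.index[name][value].add(path)
        file_attrs[name] = value

    def query(self, name, predicate):
        """Return sorted paths whose indexed attribute satisfies predicate."""
        if name not in self.indexed:
            raise KeyError(f"attribute {name!r} is not indexed")
        hits = set()
        for value, paths in self.index[name].items():
            if predicate(value):
                hits |= paths
        return sorted(hits)


store = ToyAttributeStore(indexed=["received"])
store.set_attr("/mail/a", "received", date(2003, 1, 1))
store.set_attr("/mail/b", "received", date(2003, 1, 9))
# Complex binary data can be attached too, but it is not indexed here.
store.set_attr("/mail/b", "thumbnail", b"\x89PNG")

# "All e-mail received since January 5th", answered from the index alone.
recent = store.query("received", lambda d: d >= date(2003, 1, 5))
```

The point of the sketch is the split: any attribute can be stored, but only the simple, indexed ones can be searched, and the files themselves stay ordinary files.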

Be experimented for a while with “going all the way” and actually using a real database for the file system. And in our experiments we found that there was tremendous value to the traditional file system, that it does a lot of things very, very well. People discount the value because they take it for granted, and the database thing sure sounds sexier.

And one day we will store stuff in some giant, fully indexed and searchable system, and it’ll look like a database in some ways, and give lots of new power. But it’s so complex to do that right, and well, and completely that it’ll take years of pretty sucky solutions before it’s actually useful in real life. So I hope Microsoft commits to this, because it’ll make Longhorn (the next version of Windows) ship that much later than planned.

Here’s another way to think about it. Take the Internet today and consider it a traditional file system (because in a lot of ways it’s very much like a traditional file system). There’s this notion of the Semantic Web being developed right now, which will be a lot like what Microsoft envisions for their new data storage technology.

But today Google works pretty fuckin’ well, using the traditional approach. It doesn’t depend on a lot of central management of the data, whereas the Semantic Web will require a whole lot of web pages to change. By “a whole lot” I actually mean just about all of them, i.e., billions.

The Semantic Web sounds like a great place to be in 10 years, but it’ll require a lot of ditch digging. And I don’t want to have to do the digging myself, nor do I envy the people who do end up doing it.

On the other hand, those Microsofties probably deserve it.