[Beowulf] distributed file storage solution?
Bill Broadley bill at cse.ucdavis.edu
Mon Dec 11 17:53:58 PST 2006
Eric Thibodeau wrote:
> You can look into OpenAFS but be warned that you have to know
> infrastructure software quite well (LDAP+kerberos). It's cross-platform,
> can be distributed but don't think it's up to multiple writes on
> different mirrors though.

Indeed. There are many tough compromises in distributed filesystems, and many conflicting goals. Coherency vs. performance is a big one: you pretty much get one or the other. Locking is another ugly one. Databases and some applications assume byte-range locking, which is sometimes available and sometimes not. Many unix programs assume POSIX locking, which again is only sometimes available.

So, unfortunately, it's easy to ask for a distributed filesystem which does not exist. I'll provide my current brain dump on the various pieces I've been tracking. I'm sure there are some inaccuracies included, but hopefully they are small ones. As always, comments and corrections are welcome.

A high level overview of OpenAFS:
* Distributed, but not p2p.
* Performs well (assuming cache friendliness, and a single peer accessing the same files/directories).
* Scales well (for reads, because RO volumes can be replicated).
* Has a universal namespace.
* Places little trust in a peer (getting root on a client != ability to read all files).
* Allows for transparent volume migration (the client doesn't complain when a volume is migrated).
* Perfect coherency (via a subscription model).
* Supports Linux, OSX, and Windows (among others).
* Relatively complex.

NFS in contrast:
* Isn't distributed (unless you count automount).
* Has loose coherency (poll based).
* No replication (corrections?).
* Doesn't scale easily.
* Volume migration isn't easy (NFSv4 claims to enable this, but I've yet to see it demonstrated in the real world).
* Is mostly unix specific (Microsoft had an NFS client, but MS EoL'd it?).
* Relatively simple.

Lustre:
* Client server.
* Scales extremely well, seems popular on the largest of clusters.
* Can survive hardware failures, assuming more than 1 block server is connected to each set of disks.
* Unix only.
* Relatively complex.

PVFS2:
* Client server.
* Scales well.
* Cannot survive a block server death.
* Unix only.
* Relatively simple.
* Designed for use within a cluster.

Oceanstore:
* p2p.
* Claims scalability to billions of users.
* Highly available/byzantine fault tolerant.
* Complex.
* Slow.
* In prototype stage.
* Requires use of an API (AFAIK it is not available as a transparently mounted filesystem).

So the end result (from my skewed perspective) is:

* NFS is hugely popular, easy, not very secure (at least by default), and has poor coherency, but for things like sharing /home within a cluster it works reasonably well. It seems most appropriate for LAN usage. To most people, diskless implies NFS (and it works well within a cluster or LAN).

* Lustre and PVFS2 are popular for sharing files in larger clusters where more than a single file server's worth of bandwidth is required. Both, I believe, scale well with bandwidth but only allow for a single metadata server, so they will ultimately scale only as far as a single machine for metadata intensive workloads (such as lock intensive, directory intensive, or file creation/deletion intensive workloads). Granted, this also allows for exotic hardware solutions (like solid state storage) for the metadata server if you really need the performance.

* AFS is popular for internet wide file service. Researchers love the ability to run an application that requires 100 different libraries anywhere in the world. Sysadmins love it because they can migrate volumes without having to notify users or schedule downtime. I believe performance is usually somewhat less than NFS within a cluster (because of higher overhead), and usually significantly better outside a cluster (better caching and coherency).

I'm less familiar with the various commercial filesystems like ibrix. Hopefully others will expand and correct the above.
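
As a rough illustration of the locking point above: since byte-range locking is "sometimes available, sometimes not", one way to find out what a given mount actually provides is to probe it. The sketch below assumes Python 3 on a unix host; the function name probe_range_lock and the byte offsets are made up for illustration and are not part of any filesystem's tooling. It takes a lock on one byte range and then checks whether a second local process is refused an overlapping range. Note that it only tests two processes on the same client, so a filesystem that grants locks locally without coordinating between clients would still pass.

```python
import fcntl
import os
import tempfile


def probe_range_lock(directory):
    """Probe POSIX byte-range locking on a scratch file in `directory`.

    Hypothetical helper for illustration only. Returns (granted, conflict_seen):
      granted       - this process got an exclusive lock on bytes 1024..2047
      conflict_seen - a second local process was refused an overlapping lock
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, b"\0" * 4096)
        try:
            # lockf(fd, cmd, len, start, whence): lock 1024 bytes from offset 1024.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1024, 1024, os.SEEK_SET)
        except OSError:
            # The filesystem (or mount options) refused the lock request.
            return (False, False)

        pid = os.fork()
        if pid == 0:
            # Child: a separate lock owner. Exit 0 if the overlapping lock is
            # refused (the expected outcome), 1 otherwise.
            status = 1
            try:
                cfd = os.open(path, os.O_RDWR)
                try:
                    # Bytes 1536..2047 overlap the parent's locked range.
                    fcntl.lockf(cfd, fcntl.LOCK_EX | fcntl.LOCK_NB, 512, 1536, os.SEEK_SET)
                except OSError:
                    status = 0
            finally:
                os._exit(status)

        _, wait_status = os.waitpid(pid, 0)
        conflict_seen = os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
        return (True, conflict_seen)
    finally:
        os.close(fd)  # closing the descriptor also drops the parent's lock
        os.unlink(path)
```

Calling probe_range_lock() on a directory that lives on the filesystem in question (an NFS-mounted /home, say) and getting (True, False) back would suggest that lock calls succeed but are not actually enforced. A cross-client check would need the same test run from two different machines.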
