In its ten-plus years as an open source project, OpenAFS has established AFS as one of open source's success stories. OpenAFS provides clients for all of the major operating system distributions and servers on all UNIX/Linux variants. Even so, there is still a great deal that must be finished in order for AFS to achieve first class status on MacOS X and Microsoft Windows.
The work to be accomplished on OpenAFS falls into six broad categories:
It is the goal of the OpenAFS Elders to raise resources from the OpenAFS Community and others to successfully implement all of these functions over the next three to five years.
An Implementation and Release Schedule is provided at the end of the page.
A note about the estimates provided on this page: for many of the projects the Gatekeepers have designs, partially completed work, or even fully implemented systems. The estimates provided are the time necessary to complete each project and/or integrate it into a standard release of OpenAFS.
Core client functionality encompasses the AFS cache manager, file system interfaces, pioctl interfaces, and credential management.
The Microsoft Windows client has received significant attention over the last four years. It is a fully functional client that works on all Microsoft Windows releases from Windows 2000 SP4 through Windows Vista and Server 2008. For a summary, see the OpenAFS for Windows Status Report. Still, there are a number of deficiencies that adversely impact the ability of end users to use AFS to its full existing potential.
Read-only or Read-write disconnected mode:
Microsoft Windows users are used to the "Windows Offline Folders" functionality which permits them to synchronize local copies of files or folders from a CIFS server to their local disk for use when disconnected from the network. UMichigan long ago implemented a read-write disconnected mode for the UNIX AFS client which permits users to continue using data within the AFS cache while in an offline mode. Once the client is restored to an online state the modifications made to the cache buffers are written back to the file server provided that there are no conflicts. If there are conflicts a manual conflict resolution process must be initiated. Conflict resolution is hard but AFS users would gain a great deal even if the contents of the AFS cache were available when disconnected from the file servers in a read-only manner.
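A minimal sketch of the write-back step described above, assuming conflict detection via a per-file data version. The class and method names are hypothetical illustrations; the real cache manager tracks data versions and a replay log internally rather than per-file objects like these.

```python
# Sketch of the replay step in a read-write disconnected mode.
# All names here are hypothetical, for illustration only.

class CachedFile:
    def __init__(self, fid, data, data_version):
        self.fid = fid                    # file identifier
        self.data = data                  # locally cached contents
        self.data_version = data_version  # server version seen at fetch time
        self.dirty = False

    def local_write(self, data):
        # While disconnected, writes touch only the cache.
        self.data = data
        self.dirty = True

class InMemoryServer:
    """Stand-in for a file server, for illustration only."""
    def __init__(self):
        self.data, self.versions = {}, {}

    def get_version(self, fid):
        return self.versions.get(fid, 0)

    def store(self, fid, data):
        self.data[fid] = data
        self.versions[fid] = self.get_version(fid) + 1

def reconnect_and_replay(cached, server):
    """Write dirty cache entries back once online; entries whose server
    copy changed while offline are flagged for manual conflict resolution
    instead of silently overwriting the newer data."""
    conflicts = []
    for f in cached:
        if not f.dirty:
            continue
        if server.get_version(f.fid) != f.data_version:
            conflicts.append(f.fid)       # changed on the server: conflict
        else:
            server.store(f.fid, f.data)
            f.data_version = server.get_version(f.fid)
            f.dirty = False
    return conflicts
```

A read-only disconnected mode would simply skip the replay step and never mark cache entries dirty, which is why it avoids the conflict resolution problem entirely.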
Estimate: 3 months
Status: no resource commitments
On-going maintenance necessary to keep up with backward incompatible changes to the Linux kernel and new interfaces provided and used therein.
Update the client to use system inodes instead of a private inode pool.
There is a growing demand for pervasive access to data from handset devices. Clients for Symbian S60, Windows Mobile, Apple's iOS, OpenHandsetAlliance (aka Android), and Nokia/Intel Maemo/Meego devices will be critical in the years to come.
Client Cache Usage Tracking and Tuning:
The current cache manager implements an explicit Least Recently Used (LRU) algorithm for recycling objects. This algorithm does not take into account:
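For reference, strict LRU recycling of the kind described can be sketched in a few lines of Python (an illustration of the policy, not the cache manager's actual code):

```python
from collections import OrderedDict

class LRUCache:
    """Strict LRU recycling: the least recently touched object is always
    the eviction victim, regardless of fetch cost or access frequency."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def access(self, key, value=None):
        if key in self.entries:
            self.entries.move_to_end(key)          # mark most recently used
            if value is not None:
                self.entries[key] = value
            return self.entries[key]
        if value is not None:
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)   # evict the LRU victim
            self.entries[key] = value
        return value
```

The weakness is visible in the structure itself: eviction consults only recency, so an expensive-to-refetch or frequently needed object is recycled as readily as any other.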
There have been many discussions about how hard AFS is to use, how end users don't want AFS and really want a WebDAV solution. What do those statements really mean? First, AFS isn't any harder to use than any other authenticated file system from the perspective of end users. If a user has an "encrypted" local disk she has to authenticate herself by providing her password. With the single sign-on solutions available for OpenAFS there isn't much reason for users today to be running without tokens when they have network access. Second, the statement that end users don't want AFS (as opposed to some other centralized storage solution) really makes no sense. End users don't ask for technologies; they ask for functionality. If a user wants centralized storage then the user wants centralized storage.
Users describe their desires using the technologies that are most familiar to them which today most often means Windows Shares (CIFS) and Browser based services. Why? Because those are the technologies the user is presented on his or her operating system's desktop. The vast majority of daily users are uncomfortable with command-line operations. Improving the ease of use of AFS can be achieved by providing tighter integration with the operating system desktop environment.
The Secure Endpoints Inc. OpenAFS Windows Road Map web page provides a number of mock-ups of Explorer Shell extensions that can be used to not only make AFS more accessible to end users but also significantly improve its ease of use. By making the Explorer Shell AFS aware, users will be more comfortable using it. No longer will users have to use command line techniques to access AFS and manage its contents and metadata.
One of the most important ideas to result from discussions with Stanford University's Help Desk staff is the concept of Custom Name Spaces. On Microsoft Windows a Name Space is a virtual folder that appears as part of the Explorer Shell. The objects "My Computer", "My Documents", "Control Panel", "My Network Places", "My Sharing Folders", etc. are all name spaces. Stanford University has for many years been shipping variations of an application now called "Stanford Desktop Tools". One of the features of SDT is the ability to search for classes, users, departments, and projects and map a drive letter to the associated AFS volume. Another feature is the ability to quickly map a drive letter to "my home directory". A final feature is the most recently used volume list.
With Name Spaces, we can implement all of this functionality. We can define a "recently used volumes list" which is always populated with the volumes the user most recently read or stored data to. We can define a "My Stanford Home Directory" name space that always contains a shortcut to the volume associated with the user's token for the ir.stanford.edu cell. We can also create name spaces for "Stanford Users", "Stanford Classes", "Stanford Departments", etc. Other organizations can distribute their own AFS name spaces that represent important data that is stored in their cell. AFS name spaces from multiple organizations can co-exist on the same system. Since name spaces are built into the Explorer Shell they are always easily accessible to the end user because they become a part of the Desktop.
A detailed proposal describing an AFS Name Spaces implementation is available in PDF.
Users expect to find a Control Panel for Services that support per-user configuration. For OpenAFS users can configure the behavior of the AFS Credential Provider for Network Identity Manager and their Protection Service Groups. For more details ...
System-wide configuration of Services are performed via Microsoft Management Console plug-ins. For more details ...
Microsoft Windows Vista User Account Control Privilege Separation. For more details ...
Apple doesn't permit the same degree of customization of the Finder as Microsoft does for the Explorer Shell. However, the Finder can be customized with an AFS virtual folder and AFS context menus. Likewise, certain other graphical interfaces which will become available in Leopard provide opportunities for customization to ease use of AFS.
Enhance Finder with an OpenAFS Context menu
In order for AFS to be treated as a first class file system for MacOS X and Microsoft Windows it must gain the following functionality:
Removing Directory Limitations:
The current AFS directory format and RPCs suffer from a number of limitations that adversely affect the user experience. A directory has a maximum of 64,000 entries if all file names are 16 or fewer octets. Longer names are implemented by consuming an additional entry for each additional 32 octets of file name. Given the ever increasing length of file names, some cells are filling directories with as few as 10,000 entries. Some scientific research projects require millions of files, perhaps each containing a single byte of data, within a single directory.
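A small calculation, following the slot arithmetic just described, shows how quickly long names erode the entry limit (the helper names are hypothetical):

```python
import math

DIR_MAX_SLOTS = 64000  # maximum entries when every name fits in one slot

def slots_for_name(name_len_octets):
    """Slots consumed by one directory entry, per the scheme described
    above: one slot covers the first 16 octets of the name, and each
    further 32 octets (or part thereof) costs one more slot."""
    if name_len_octets <= 16:
        return 1
    return 1 + math.ceil((name_len_octets - 16) / 32)

def max_entries(name_len_octets):
    """How many files of a given name length fit in one directory."""
    return DIR_MAX_SLOTS // slots_for_name(name_len_octets)
```

Under this arithmetic a 176-octet name consumes 6 slots, capping a directory at 10,666 entries, consistent with the roughly 10,000-entry figure cited above.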
The current AFS directory format is very inefficient for searching when case-insensitivity or Unicode normalization is required. Under these circumstances search time is linear in the number of entries in the directory. Many modern file systems implement the directory as a B+ tree to permit O(log n) searching. The existing format places a heavy burden on each and every cache manager: each client must download a copy of the directory buffers and perform linear searching. This results in heavy CPU use when searching directories with 500 or more entries.
Another issue is the lack of support for internationalization. In the current directory format directory entries are stored as a sequence of octets without any character set hinting. A file that is stored using a name encoded with ISO 8859-5 or CP437 will not be represented correctly to the user on a system that expects UTF-8. Even when file names are stored using UTF-8 it is important to recognize that depending on the input mechanisms it is possible for a user to enter the same semantic string using different octet sequences. Therefore it is crucial that any implementation of Unicode file names support normalized forms for comparison.
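A sketch of normalization-aware name comparison, using Python's unicodedata module and NFC as the normalization form (the choice of form is an assumption here; any consistently applied normalized form would serve):

```python
import unicodedata

def names_equivalent(a, b):
    """Compare file names after NFC normalization, so the same semantic
    string entered as different octet sequences still matches."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

For example, "café" can arrive as a precomposed é or as "e" plus a combining accent; a raw octet comparison treats those as different files, while a normalized comparison does not.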
Finally, Microsoft Windows and MacOS X are now requiring that first class file systems support the concept of multiple data streams per file. These streams are used to store extended attributes, security zone information, resource forks, and other forms of meta data in addition to providing a general purpose storage mechanism for applications. For more details ...
At the 2004 AFS Hackathon in Stockholm there was much discussion of potential methods of extending the existing directory format to support Unicode. http://www.afsig.se/afsig/space/AFS+directory+format+extensions However, these approaches did not address the directory search performance issues, the entry limitations or multiple data streams.
The current direction under consideration is to completely replace the on disk directory format with an entirely new one consisting of data blocks representing nodes in a B+ tree with each block containing a variable number of entries. The new data structure would be Unicode aware and support multiple data streams. Microsoft Windows clients would implement extended attributes in a reserved data stream. MacOS X clients would use a reserved stream for the resource fork.
New versions of all of the directory RPCs would be implemented to support the new data structure. Clients that use the new APIs would be delivered directory buffers which construct a B+ tree which in turn would significantly improve directory search times.
For old clients, new implementations of the old RPCs would deliver directory data translated to the old linear format up to the maximum number of directory entries. It is possible that old clients will not be able to see all the files in a given directory.
Status: Implementation promised by Your File System, 2011.
Extended Attributes are used by MacOS X to store resources and DOS attributes. When they are not supported by the file system, MacOS X is forced to create ._ (AppleDouble) files. Extended Attributes on Microsoft Windows are used to store a variety of meta-data about files and directories. The lack of EA support in AFS damages the Windows user experience. AFS cache managers can implement support for extended attributes and store them in hidden AppleDouble files while waiting for full EA support within AFS volumes.
For more details ... Status: Implementation promised by Your File System, 2011.
AFS supports per-directory ACLs. Per-file ACLs would make it possible to apply a different set of access constraints on a single object within a directory. At the present time storing multiple objects with different access controls requires that they be stored in separate directories. The AFS protocol provides partial support for this from the AFS/DFS translator, and this is supported in clients going back to IBM AFS. Status: Implementation promised by Your File System, 2011.
Mandatory Locking and Byte range locks:
Platforms such as Microsoft Windows and MacOS X require that their first class file systems support mandatory lock semantics and byte range locks. Applications which rely on these capabilities, such as Microsoft Office and databases, risk data corruption if their data files are altered while they are assumed to be under a lock. AFS only provides advisory full-file locks and provides no upgradeable lock type. The existing AFS file server lock implementation doesn't keep track of which clients were issued locks, which results in a number of situations in which lock counts can become incorrect and produce a denial of service on a given file.
The Windows AFS client in the 1.5 series has added a localized implementation of mandatory locking and byte range locks. Each time an application requests a byte range to be locked, the cache manager ensures that it has an appropriate full lock on the object. The cache manager then accepts the responsibility of tracking each of the locks and doling out a range at a time.
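The scheme can be sketched as follows; the structure and names are hypothetical illustrations, not the Windows client's actual lock code:

```python
class ByteRangeLockManager:
    """Sketch of localized byte-range locking: the cache manager holds one
    full-file lock from the server and doles out byte ranges locally."""
    def __init__(self):
        self.has_full_lock = False
        self.ranges = []  # list of (offset, length, owner)

    def lock_range(self, offset, length, owner):
        if not self.has_full_lock:
            # In the real client this would be an RPC to the file server.
            self.has_full_lock = True
        for (o, l, w) in self.ranges:
            if w != owner and offset < o + l and o < offset + length:
                return False  # overlaps a range held by another owner
        self.ranges.append((offset, length, owner))
        return True

    def unlock_range(self, offset, length, owner):
        self.ranges.remove((offset, length, owner))
        if not self.ranges:
            # Last local range released; the server lock could be dropped.
            self.has_full_lock = False
```

The design choice is that only one server round trip is needed per file, after which all byte-range bookkeeping is local to the cache manager.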
Estimate: 2 months.
Status: Implementation promised by Your File System, 2011. Requires Standardization.
Status Data (Callback Registration) Expiration Algorithm Improvements:
Status data and callback registration expiration is currently determined based upon the number of clients that are accessing the data instead of the likelihood that the data is going to change.
Status: Implementation in progress.
OPEN/CLOSE File Server RPCs:
New file server RPCs would provide new audit data
Status: Implementation promised by Your File System, 2011
Additionally, Volker Lendecke began a similar project for use with Samba, which is not known to have been completed.
Luke Howard (PADL Ltd.) developed an AFS Protection Service as part of his Active Directory clone, XAD. Ownership of XAD has since been transferred to Novell. However, it is expected that Luke will assist us in developing a new implementation in the coming months.
Status: Standardization complete. Implementation promised by Your File System, 2011. Blocked on required extensions to the Ubik database format.
Once AFS is capable of being used as a first class file system for Microsoft Windows clients it will make sense to support the AFS servers on the Windows Server platform as there are a large number of Microsoft Windows only IT organizations that do not have the expertise to manage UNIX/Linux systems. The servers are mostly there already. There is work that needs to be done on the NTFS Namei implementation and there needs to be much better integration with power management, plug-n-play networking, and Windows Event Logging.
Of course if you want to host services on Windows, you must provide a Microsoft Management Console plug-in to manage them.
The primary reason that we haven't spent the time and energy to get the AFS servers in tip top shape is that without the protocol feature enhancements, users that attempt to deploy AFS in an all Windows environment are bound to be disappointed.
For more details ...
Estimate: 4 to 6 weeks
Since that time, a large amount of networking research has been done on TCP performance, and modern TCP implementations are capable of good performance on high-speed networks. Also, other protocols such as Infiniband and SCTP have emerged as alternative transports to TCP. The growth of the World Wide Web has pushed operating system vendors to develop interfaces that allow applications to manage thousands of clients simultaneously in a scalable manner. While incremental improvements have been made to the Rx protocol since it was first developed, it has not been able to take advantage of the performance available in modern networks.
Our proposal includes the following work items:
· Develop an Rx protocol that uses TCP as a transport (RxTCP). As part of an earlier project, Kenneth Hornstein has developed a prototype of RxTCP; our goal is to complete this work and integrate it properly into the OpenAFS distribution.
There are many challenges with this approach. Much of the OpenAFS code has an implicit assumption that the transport protocol is the traditional Rx interface. This is further complicated by the lack of any formal API for Rx. As a result, many OpenAFS programs use what would normally be considered internal interfaces, and perform internal operations such as directly manipulating queues of packets. The Rx API has no way to indicate such things as connection types, so new API functions will need to be created. The Rx library makes heavy use of threads, but uses two distinct threading libraries (pthreads and a custom thread library called LWP). Any new transport must not only be thread-safe, but also present the same threading model to applications so that they do not need to be rewritten. To maintain compatibility with existing clients and servers, the original UDP transport must still function simultaneously with the new transport.
Despite these obstacles, much work has been completed. The RxTCP transport has been implemented, and tests have shown excellent performance on Gigabit networks. The remaining challenge is to integrate this transport into the actual OpenAFS clients and servers. This new transport protocol addresses a number of deficiencies in the original Rx protocol and implementation:
o The original Rx protocol was (with one exception) limited in packet size to one Ethernet MTU (1500 bytes). In general, research has shown that larger packet sizes facilitate higher performance due to the overall reduction in per-packet processing time and the need to process fewer packets in order to send the same amount of data. With the current networking APIs available to applications today, programs cannot query the size of the MTU on a networking interface, nor can they determine the MTU of a particular network path (even though the operating system may have determined that already).
With the use of TCP, the operating system can make use of knowledge not available to the application, such as accurate estimates of round trip time, network path MTU, and interface MTU, and as a result can take advantage of the capabilities of modern networks, such as Ethernet jumbo frames. The exception to this 1500 byte packet limit is that Rx has a concept called a "jumbogram", which places multiple Rx packets into one UDP datagram. Unfortunately, in practice this results in no net gain in performance, since the Rx per-packet processing is not reduced.
o Since Rx utilizes a datagram transport but provides a reliable stream interface, it must implement many of the features that are already provided by TCP, such as a windowing algorithm, packet loss detection and retransmission, and congestion control. While incremental improvements have been made to the Rx implementation since it was first developed, it has not received the attention that TCP performance has had during the same time period. The use of TCP as the basis for Rx allows us to leverage the serious engineering work that has been done on TCP, rather than requiring that same level of effort be placed into Rx.
o The Rx implementation is very large and monolithic. Part of its size results from having to perform many of the same functions as TCP. This size and complexity make the code extremely difficult to understand and follow. In addition, most of the Rx work is done by a relatively small number of extremely complicated functions. Besides lacking modularity, this makes profiling extremely difficult: most of the time spent by Rx takes place in a few functions, and since most profiling tools operate on a per-function basis, finer granularity is hard to obtain.
In contrast, RxTCP has a much smaller implementation, and is very modular. Since the overall complexity is reduced, this makes it easier to profile and understand.
o All data transmitted or received by Rx is divided internally into per-packet buffers, which means data that is delivered to an application must undergo two copies to reach its destination (kernel to Rx buffers, Rx buffers to application buffers).
RxTCP eliminates this copy completely on writes; data is sent directly from the application buffer to the network stack. On reads data is delivered directly from the network stack to an application when an application buffer is available; if an application buffer is not available, then data is placed into a large contiguous buffer internal to Rx and copied out completely when an application buffer becomes available.
o Since Rx is a custom protocol, any tool used to analyze Rx performance must be designed specifically for Rx use, or we must make modifications to an existing tool. With the use of TCP, we can use any one of a number of off-the-shelf tools to analyze TCP performance.
· Analyze remaining AFS performance deficiencies. Under an SBIR grant, Sine Nomine has already conducted a formal analysis of AFS performance. Poor Rx behavior was identified as the number one item impacting performance. Our strategy for dealing with Rx performance is to utilize RxTCP, as detailed above.
Aside from Rx, a number of other bottlenecks were identified in the UNIX OpenAFS client. After the integration of RxTCP, we will perform another series of benchmarks to mimic the ones in the Sine Nomine report to analyze the issues reported by Sine Nomine, and to see if any new issues arise. The issues identified by the Sine Nomine research were as follows:
o Write performance is hurt by UNIX OpenAFS's sync-on-close semantics. Currently the UNIX OpenAFS client performs data synchronization when a file is closed by an application, or when the cache's dirty blocks have exceeded a high-water mark threshold. For large files, this results in long periods during writes where the network is quiescent, and the application must wait, either when the file is closed or during a write, for all of the outstanding data to be written to the fileserver.
One proposal offered in the Sine Nomine paper was to relax the sync-on-close semantics as offered by the AFS client today. Multi-client data consistency has never been an area where AFS has excelled, but our long-term experience has shown this is not a necessary function of OpenAFS. Especially for larger files, a more intelligent scheme would be to implement a relaxed consistency model where synchronization could occur between the client and server at any arbitrary time. In the case of HPC or video applications, an adaptive write-behind mechanism would be the most desirable option. We recognize that not all users desire the same semantics, so after a write-behind mechanism has been implemented the next step will be to develop the ability to select the data consistency model on a per-volume basis.
In addition, write-on-close semantics are incompatible with file locks. There is a strong desire in the AFS community to support mandatory file locking, byte range locks, and optimistic locking algorithms. This desire is primarily the result of wishing AFS to be a first class file system on the Microsoft Windows and MacOS X operating systems which require those semantics. When locks are obtained and released, the buffers affected by the locks must be flushed to the file server prior to the completion of the lock release.
The Windows OpenAFS client does not implement write-on-close semantics because of the heavy use of byte range file locks.
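The relaxed-consistency write-behind scheme discussed above can be sketched as follows. This is a minimal Python illustration with hypothetical names and thresholds, not the OpenAFS client's actual cache code:

```python
class WriteBehindCache:
    """Sketch of write-behind: dirty blocks are flushed incrementally once
    a soft threshold is crossed, rather than all at once on close()."""
    def __init__(self, store_block, soft_limit=8):
        self.store_block = store_block  # callable that writes one block
        self.soft_limit = soft_limit
        self.dirty = {}                 # block number -> data

    def write(self, block_no, data):
        self.dirty[block_no] = data
        if len(self.dirty) >= self.soft_limit:
            # Start flushing early so the application never stalls
            # waiting for the entire dirty set to drain at close time.
            self.flush(count=self.soft_limit // 2)

    def flush(self, count=None):
        for block_no in sorted(self.dirty)[:count]:
            self.store_block(block_no, self.dirty.pop(block_no))

    def close(self):
        # Under sync-on-close, everything would be flushed here in one
        # burst; with write-behind, little usually remains.
        self.flush()
```

An adaptive variant would tune `soft_limit` and the flush count from observed write rates, and a per-volume setting would select between this model and strict sync-on-close, as the text proposes.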
o If the client cache fills up during writes, the UNIX OpenAFS client blocks the process performing writes until the cache is completely flushed to the low-water mark. A better strategy would be to dispatch the truncation daemon in advance of the cache reaching a high-water mark threshold, using one of the many I/O prediction algorithms available in the common literature.
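The early-dispatch idea can be illustrated with a simple rate-based heuristic. The thresholds and names here are hypothetical; the prediction algorithms in the literature the text refers to would be more sophisticated:

```python
def should_dispatch_truncation(used, capacity, recent_write_rate,
                               lead_time=5.0, high_water=0.95):
    """Wake the cache truncation daemon early when the recent write rate
    predicts the high-water mark will be reached within `lead_time`
    seconds, so writers are not blocked on a full cache."""
    headroom = capacity * high_water - used   # blocks left before the mark
    if headroom <= 0:
        return True                           # already at/past the mark
    if recent_write_rate <= 0:
        return False                          # no writes in flight
    return headroom / recent_write_rate < lead_time
```

The point of the heuristic is that eviction runs concurrently with the writer instead of after it stalls, trading a little early recycling for the elimination of the blocking flush.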
o Read performance of both OpenAFS client implementations is hampered by an inability to read more than a single chunk per RPC. When flushing dirty buffers to the file server, the client is able to send multiple contiguous chunks at a time, thereby reducing the number of RPCs. On operating systems that provide hints as to the usage patterns of the file, performance can be improved by optimistically reading chunks whose need is anticipated.
o The restriction that no more than four RPCs may be outstanding on any Rx connection is another observed bottleneck. This is particularly troublesome when AFS is being used primarily to serve a single service such as a web server as only one Rx connection is created for any collection of client, server, and authentication ID.
Given the limitations imposed by network processing overhead, available network APIs, and system bus limitations we do not expect to achieve line rate performance at network speeds past 1 Gigabit. We believe that performance faster than 1 Gigabit is achievable, but we do not know what the practical limits are beyond that. One of our out year tasks is to investigate work being done at CERN in using "off-line storage", where an OpenAFS client bypasses the OpenAFS fileserver completely and retrieves files via direct communication with disk hardware.
Estimated: 20 to 25 months.
Implementation promised by Your File System, 2012
Asynchronous RX RPCs:
All Rx calls in the existing implementation are synchronous. The currently executing thread must wait for completion. The maximum number of simultaneous requests that can be processed is limited by the number of threads that can be allocated to the process. By adding an asynchronous Rx call mode, the file server can be redesigned to process requests without blocking threads for callback breaks, whoareyou? probes, and getcps calls. This will permit a significant reduction in client requests waiting for threads.
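The contrast can be illustrated with Python's asyncio standing in for an asynchronous Rx API (Rx itself is a C library, and these names are purely illustrative):

```python
import asyncio

async def callback_break(client_id):
    """Stand-in for one callback-break RPC's network round trip."""
    await asyncio.sleep(0.01)
    return client_id

async def break_callbacks(clients):
    # All callback breaks are in flight at once. No thread is parked
    # waiting on any single client, which is the point of the async
    # call mode: request threads stay available for new client work.
    return await asyncio.gather(*(callback_break(c) for c in clients))

results = asyncio.run(break_callbacks(["c1", "c2", "c3"]))
```

In the synchronous model, the same three breaks would occupy three threads for the duration of their round trips; here one thread of control services all of them.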
Estimated: 6 weeks for Asynchronous Rx and 3 weeks for file server modifications.
Implementation promised by Your File System, 2012
rxgk is designed but has not yet been fully implemented. Love Hörnquist Åstrand, Magnus Ahltorp, Jeffrey Hutzelman, Derrick Brashear and Jeffrey Altman met at KTH the week of 22 Jan 2007 to begin implementation of rxgk and to modify as many of the AFS services as possible. Love presented a status report at the 2007 AFS & Kerberos Best Practice Workshop and did more work with Derrick the following week. Status: Standardization in progress. Implementation substantially complete; promised by Your File System, 2011.
For users that are willing to give up the location independence of the data, there isn't much preventing the construction of a file server back end that reads and writes from the native file system, provided that the native file system has some way of notifying AFS when a file changes. Change notification is required for the file server to be able to call back the clients and report the invalidation of their data.
Another question that needs to be addressed is how to provide for authenticated access and access control lists. Finally, location discovery is a challenge that might be addressed with Apple's Bonjour and/or dyndns; this work can be extended to provide similar ability to discover a local cell for any client.
Estimate: 2 months
Most off-the-shelf backup systems only see file systems from the viewpoint of the user. Backing up AFS so that a given volume can be restored as needed in a location-independent manner, however, is much more akin to backing up a distributed database. Backing up the files that the database writes does not allow for the granularity of restores that is required. In addition, backing up the database files while they are in use results in data inconsistencies.
Teradactyl is one of the few remaining commercial offerings with integrated support for AFS. VERITAS NetBackup and Tivoli Storage Manager have both dropped integrated AFS support. Teradactyl has been a sponsor of the AFS & Kerberos Best Practice Workshops for the last couple of years.
There have also been various efforts to contribute AFS support to Amanda, http://www.amanda.org/, and there have been efforts to provide an AFS wrapper to Legato Networker.
The implementation schedule for these projects is entirely dependent upon resource availability. Please send inquiries, comments, and offers of support to firstname.lastname@example.org. Where external contributors have promised contributions, they are included, as are timelines when those are provided. The following release schedule is subject to change.
The next release in the previous stable series for UNIX is expected before December 2011. This release will correct implementation defects and will most likely be the last release in the 1.4 series.
The 1.6 series is the current stable series for UNIX and the last stable series for Microsoft Windows without a native IFS implementation. 1.6.1 is the next release in the series and is expected before December 2011. The 1.6.1 release will correct implementation defects and add support for new kernel versions as necessary.
The 1.7 series is the development branch for the Windows IFS implementation. The first release on this branch was announced on 15 Sept 2011. Subsequent releases are expected every two to four weeks until the code enters maintenance mode after two or three months.
The 1.8 series will become the first stable release of OpenAFS to include the Windows IFS implementation. No other new features will be added to 1.8. Expected December 2011.
The 1.9 series will replace the 1.5 series as the experimental release series. 1.9 releases will begin shortly after the 1.7 series has the Windows IFS implementation committed. Major new features will be integrated into 1.9 releases in preparation for the 1.10 stable release. Expected October 2011.
The 2.0 series will replace the 1.6 and 1.8 series as the stable release series for UNIX and Microsoft Windows. The 2.0 series are scheduled to include the rxgk security class including Kerberos v5, RxUDP performance improvements, PTS authentication name extensions, and extended callbacks. Pre-release testing for 2.0 is expected to begin in February 2012.