OpenAFS is a 100% open source, globally distributed file system derived from IBM's commercial AFS offering as of 1 November 2000. Since IBM released the source code, OpenAFS has thrived, adding support for new platforms while improving its overall performance, scalability and usability.
The solution is to leverage the MMC API to build a console for configuring OpenAFS. One issue to be addressed is whether to use the MMC 2.0 API or the MMC 3.0 API. The difference is that MMC 3.0 is a managed-code API and provides better error handling. The management console is essentially a user interface for the appropriate settings, so it is not tightly coupled to the existing OpenAFS source code.
From there, the user interface needs to be constructed. Sample screenshots in the OpenAFS for Windows roadmap already provide a starting point. Once there is a user interface, the next step is to read and write the actual settings, leveraging elevation in Windows Vista as necessary.
Assumptions: single filesystem (server) support, and interfacing with existing search tools. An indexer, a search service, documentation and testing will be byproducts of this project. Index data should be mobile with the volume data and stored in a manner that allows ready access from new services. Lucene is being investigated for use in this project.
This project seeks to change the way the servers in the list are ordered, based on network conditions between the client and the candidate servers. Rx peer statistics can provide information about these network conditions. Parts of these statistics can be used to indicate which servers are closest to the client in network terms, and to estimate the throughput capacity of each server. These factors can be analyzed to represent performance between the cache manager client and the servers. This will allow servers to be prioritized by the performance of the server and of its network connection, which in turn can reduce access times. This project will implement this prioritization on both Windows and Unix platforms, taking into account the statistics mechanisms of each. The Unix implementation should port relatively simply to other Unix implementations (AIX, HP-UX, etc.).
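As a rough illustration of the idea, the sketch below ranks servers by an estimated cost computed from per-server statistics. The field names (`rtt_ms`, `throughput_mbps`), the cost formula, the 64 KiB chunk size and the sample values are all assumptions made for this example; they are not taken from Rx or from the OpenAFS cache manager.

```python
from dataclasses import dataclass


@dataclass
class PeerStats:
    """Hypothetical per-server measurements; not the real Rx peer structure."""
    name: str
    rtt_ms: float            # smoothed round-trip time to the server
    throughput_mbps: float   # estimated throughput capacity in megabits/s


def transfer_cost_ms(peer: PeerStats, payload_kib: float = 64.0) -> float:
    """Estimated time to fetch one chunk: one round trip plus transfer time."""
    payload_megabits = payload_kib * 1024 * 8 / 1_000_000
    return peer.rtt_ms + 1000.0 * payload_megabits / peer.throughput_mbps


def rank_servers(peers):
    """Return server names ordered from lowest to highest estimated cost."""
    return [p.name for p in sorted(peers, key=transfer_cost_ms)]


# Made-up sample data: a distant fast server, a nearby medium one, a slow one.
peers = [
    PeerStats("fs1.example.org", rtt_ms=80.0, throughput_mbps=100.0),
    PeerStats("fs2.example.org", rtt_ms=5.0, throughput_mbps=10.0),
    PeerStats("fs3.example.org", rtt_ms=20.0, throughput_mbps=1.0),
]
print(rank_servers(peers))
```

Combining latency and throughput into a single cost means a nearby server with a slow link does not automatically outrank a more distant server with a fast one. A real implementation would also need to handle servers for which no statistics exist yet.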
It is now 2009, the world economy is collapsing, the U.S. dollar has regained much of its value against other currencies, and we suspect that many of you are thinking about how nice it would be to have US$4500.00 in your bank account (or perhaps under your mattress) at the end of the Summer. If you are considering applying to OpenAFS for a Summer of Code project and have not already read the GSoC 2009 FAQ, we suggest that you do so.
As we indicated last year, "OpenAFS is a very large and highly complex software project consisting of close to one million lines of source code that is severely under-documented. As a result it takes a long time for any developers to become proficient contributors to the core systems." We discovered during GSoC 2008 that the time it takes for new developers to come up to speed is even longer than we expected, especially when the developers have little real-world experience working on large system projects. Here are some of the things that make OpenAFS development particularly challenging:
Not all of these challenges will apply to all GSoC projects. In fact, we have selected project proposals that we believe are manageable by many students. If you are unsure about any of the above challenges, OpenAFS has an active and friendly community that will be happy to provide you with help and guidance. Please join us on #openafs on freenode, or openafs@conference.openafs.org on Jabber.
Even though OpenAFS is challenging to get familiar with, working on it can be a very rewarding experience. OpenAFS's functionality offers a chance to learn many features, concepts and tools, nearly any one of which could help you develop your career. If you are interested in working on OpenAFS this Summer (and perhaps for the rest of your life), here are some things that we recommend you do:
If you apply to OpenAFS please be aware of the following:
You can speak to members of the OpenAFS Community using two forums:
You may also engage the community via our mailing lists.
This year's AFS & Kerberos Workshop is being held at Stanford University the week of June 1 to 5. Students working with OpenAFS for Summer of Code are given free admission to the tutorials and workshop sessions. The workshop is an excellent opportunity to meet the community and learn more about how OpenAFS works and how it is used.
The following is a list of project ideas that members of the OpenAFS community believe are not only interesting challenges that can be completed over a Summer, but will also provide a much needed feature or function for OpenAFS end users and administrators.
In GSoC 2008, Dragos Tatulea worked with Simon Wilkinson to implement read/write disconnected operations. With disconnected operations, the contents of the AFS Cache are available for use even when it is impossible to communicate with the AFS file servers. The challenge for this project is to prototype, on Linux or MacOS X, a user interface that lets a user specify which files, directories or volumes must be available for disconnected use at all times, and which other objects are available for disconnected use only as a side effect of having been cached due to recent use.
In order to store arbitrary metadata (aka attributes) with files or directories in file systems that do not have the necessary native support, Apple developed the AppleDouble file format. AFS does not support arbitrary metadata, and on Apple MacOS X systems the operating system will create AppleDouble files without additional support from the AFS Cache Manager. This project is to implement native support for AppleDouble files in one or more of the AFS Cache Managers, exporting the metadata as POSIX extended attributes on Unix/Linux platforms and as Extended Attributes on Microsoft Windows. This will permit arbitrary metadata (for example, icons, thumbnail images, author, copyright info, GPS tags, etc.) to be stored in AFS and shared across multiple operating systems without requiring changes to the AFS file servers.
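To give a feel for what a cache manager would have to read, here is a minimal sketch of parsing an AppleDouble header file. It assumes the published layout: a big-endian header with a 4-byte magic number (0x00051607), a 4-byte version, 16 filler bytes and a 2-byte entry count, followed by 12-byte entry descriptors (entry ID, offset, length). The function name and the sample data are hypothetical, and the sketch does no bounds checking.

```python
import struct

APPLEDOUBLE_MAGIC = 0x00051607
HEADER = struct.Struct(">II16xH")   # magic, version, 16 filler bytes, entry count
ENTRY = struct.Struct(">III")       # entry ID, offset, length
ENTRY_COMMENT = 4                   # entry ID used for a file comment


def parse_appledouble(blob: bytes) -> dict:
    """Return a mapping of entry ID to the raw bytes of that entry."""
    magic, _version, count = HEADER.unpack_from(blob, 0)
    if magic != APPLEDOUBLE_MAGIC:
        raise ValueError("not an AppleDouble file")
    entries = {}
    for i in range(count):
        pos = HEADER.size + i * ENTRY.size
        entry_id, offset, length = ENTRY.unpack_from(blob, pos)
        entries[entry_id] = blob[offset:offset + length]
    return entries


# Build a tiny in-memory sample with a single comment entry.
payload = b"hello"
data_offset = HEADER.size + ENTRY.size
sample = (
    HEADER.pack(APPLEDOUBLE_MAGIC, 0x00020000, 1)
    + ENTRY.pack(ENTRY_COMMENT, data_offset, len(payload))
    + payload
)
print(parse_appledouble(sample))
```

Each entry found this way could then be surfaced to applications as a named attribute; how entry IDs map to attribute names is a design decision for the project.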
A challenge of today's data-driven computer world is the explosion of digital data. Somehow this data must be managed and organized. Many digital document repositories, while good at storing associated metadata about the documents, do not have an efficient metaphor for working with the data. Most filesystems, on the other hand, while good at allowing one to work with the files and the data itself, do not have an efficient metaphor for searching and managing extended metadata attributes. And while extended attribute conventions do exist, the extended attributes are usually not applicable to network filesystems and are usually not easily searchable.
The overall goal of this project is to extend OpenAFS both to store extended metadata attributes and to allow one to easily work with and search these attributes. Corollaries include easy-to-use client-side tools.
Phase 1 involves prototyping how to design such a system. A system might be modeled after existing filesystems like HFS+ and ext2/ext3 by using the POSIX extended attributes API as is. Or, a system might embrace the existing POSIX API while also offering new functionality.
Phase 2 involves implementing the storing of extended attributes in the filesystem. One might begin by working with AppleDouble files, which do not require modifying the file server, the volume headers, or the AFS clients. One can also use existing command-line tools (or easily write new ones) to work with the extended attributes themselves.
Phase 3 assumes all volumes on a single file server. The student will work on a backend to collect the metadata and to make the metadata searchable. This phase includes thinking about keeping track of changes to file locations, names, and attributes/metadata.
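One way to picture such a backend, given the relational database skills the project asks for, is a single table of (path, attribute name, value) rows with an index for lookups by attribute. The sketch below uses Python's standard `sqlite3` module with an in-memory database. The table layout, function names and `/afs/example.org` paths are assumptions for illustration only, not a proposed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_attr ("
    " path TEXT NOT NULL, name TEXT NOT NULL, value TEXT NOT NULL,"
    " PRIMARY KEY (path, name))"
)
# Index so that searches by attribute do not scan every file.
conn.execute("CREATE INDEX attr_lookup ON file_attr (name, value)")


def set_attr(path, name, value):
    """Record (or overwrite) one attribute of one file."""
    conn.execute(
        "INSERT OR REPLACE INTO file_attr VALUES (?, ?, ?)", (path, name, value)
    )


def find_paths(name, value):
    """Return the sorted paths whose attribute `name` equals `value`."""
    rows = conn.execute(
        "SELECT path FROM file_attr WHERE name = ? AND value = ? ORDER BY path",
        (name, value),
    )
    return [row[0] for row in rows]


def rename(old_path, new_path):
    """Keep the index in step when a file is moved or renamed."""
    conn.execute("UPDATE file_attr SET path = ? WHERE path = ?", (new_path, old_path))


set_attr("/afs/example.org/docs/a.txt", "author", "alice")
set_attr("/afs/example.org/docs/b.txt", "author", "bob")
set_attr("/afs/example.org/docs/c.txt", "author", "alice")
rename("/afs/example.org/docs/c.txt", "/afs/example.org/archive/c.txt")
print(find_paths("author", "alice"))
```

The `rename` helper hints at the harder part of this phase: the index only stays correct if every change to a file's location, name or attributes is reported to the backend.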
Additional areas of research would involve expanding to a distributed database scheme for multiple file servers, as well as modifications to AFS itself to store the extended attributes more natively in the filesystem and volumes.
The student should have existing skills in relational database design and in the C programming language. Work will most likely be done on the Linux platform.
OpenAFS on Windows has a very out-of-date Control Panel application that is used to configure the AFS Cache Manager service. In addition to being out of date, the Control Panel is not the appropriate user interface model for configuring a system service. On Microsoft Windows, system-wide services should be configured using the Microsoft Management Console (MMC). This project is to implement an MMC snap-in for the OpenAFS Cache Manager.
Mockups of a proposed MMC user interface can be found at http://www.secure-endpoints.com/openafs-windows-roadmap.html#client service mmc. This project consists of multiple components that can be successfully implemented one at a time. The successful completion of this project does not require that all of the proposed MMC pages be implemented.
This project does not require any prior knowledge of AFS nor any knowledge of OpenAFS internals. By completing this project the developer will become an expert in the Microsoft Management Console interfaces and the Microsoft Component Object Model (COM). This experience can be reapplied to numerous other applications and will prove to be an excellent item on a resume.
The programming language for this project is C/C++. The operating system is Microsoft Windows. The project will aim to support Microsoft Windows versions from XP SP2 to Vista/Server 2008.
OpenAFS currently provides read-only replication of data. Many sites would like to use replication on all their data, including data that is written to. A proposal exists to address this; however, no work has yet been done on implementation.
The goal of this project is to create read-write replication for OpenAFS. The proposal can be read here, and we are available to answer any further questions. Experience with systems and network programming is highly desirable; however, experience with OpenAFS is not necessary.
A student choosing this project will gain, and be able to list on a resume, the skills necessary for developing network-distributed systems, especially with regard to file systems.
Last year's GSoC project offers code and information which should help you understand what's already done and what pieces remain for you to implement to complete this functionality.
For authentication systems other than Kerberos 4 to be first-class authentication systems in the OpenAFS space, extensions need to be added to the OpenAFS protection service.
More detail on this project can be found here.
For successful completion of this project, both the RPC additions and database extensions will need to be completed.
This project does not require any prior knowledge of AFS nor any knowledge of OpenAFS internals. By completing this project the developer will become an expert in the OpenAFS ubik database structure as well as in the OpenAFS Rx RPC system. This experience would prove valuable in dealing with other flat database formats, as well as other RPC systems used in distributed applications.
The programming language for this project is C. The work is in portable code that can be run on any POSIX-compliant OS. While MacOS, Linux or Solaris are suggested as development platforms due to the availability of debugging tools, any supported OpenAFS platform could be used to do this work.
AFS supports advanced file permissions, using Access Control Lists (ACLs) on directories. The traditional AFS permission model is, however, showing signs of age. The AFS model is inflexible for administrators, who must store objects with disjoint permissions in separate directories. The AFS model is also inconsistent with POSIX ACLs (POSIX 1003.1e/1003.2c), DCE DFS, CIFS, and NFSv4 (http://www.ietf.org/rfc/rfc3530.txt), all of which allow ACLs (of varying types) on a per-file basis. Implementation of this feature will improve the usability of OpenAFS for administrators, and improve its interface with modern platforms.
The objective of this SoC project is to more fully specify and then implement a per-file ACL mechanism for OpenAFS. Engaging with the mentor and the community, the student will work out the details of per-file ACEs and the relationship between per-file and existing per-directory ACLs, select an on-disk representation for this design in the OpenAFS namei file server, and, at a minimum, expose the new representation to clients via the AFS file server protocol (its FetchACL RPC and supporting code, e.g., RXFetch_AccessList). The project will close with a validation testing phase.
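The relationship between per-file and per-directory ACLs is left open above. Purely as an illustration of one possible policy, the sketch below assumes that a per-file ACL, when present, replaces the directory ACL for that file. The function name and data layout are hypothetical, rights are written as the usual AFS letters (`rlidwka`), and groups and negative rights are ignored for brevity.

```python
from typing import Dict, Optional

Acl = Dict[str, str]  # principal -> AFS rights string, e.g. "rlidwka"


def effective_rights(user: str, dir_acl: Acl, file_acl: Optional[Acl] = None) -> str:
    """Assumed policy: a per-file ACL, if set, overrides the directory ACL."""
    acl = file_acl if file_acl is not None else dir_acl
    return acl.get(user, "")


# Today's model: every file in the directory shares these rights.
dir_acl = {"alice": "rlidwka", "bob": "rl"}
# Hypothetical per-file ACL granting bob write access to one file only.
report_acl = {"alice": "rlidwka", "bob": "rlw"}

print(effective_rights("bob", dir_acl))              # rights from the directory
print(effective_rights("bob", dir_acl, report_acl))  # rights from the file ACL
```

Other policies are equally plausible, for example merging the two ACLs or letting the per-file ACL only restrict what the directory grants. Choosing among them is part of the specification work.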
This project will provide an interested student with an opportunity to engage with the OpenAFS codebase and development community in a significant way, in both design and detailed implementation. The student will gain exposure to the complete operation (and end-to-end enhancement) of a major file system feature, from on-disk representation, to remote procedure invocation, to manifestation on the client.
The student should have existing skills in C program implementation in a Unix environment, and have minimally completed University courses in data structures and algorithms, preferably in C.
More detail on this project can be found here.
The goal of this project would be to bring the Linux kernel kAFS client as close to feature-parity with the existing OpenAFS port as possible. Examples include:
Last year's GSoC project produced partial documentation of the pioctl functions available through AFS, added a pioctl syscall and VFS entry point for the Linux kernel, and implemented some pioctls for the kAFS call.
Develop a simple Volume and Namespace management system on top of OpenAFS. OpenAFS can be leveraged to create complex namespaces, and it is an effective wide-area file system. However, determining exactly how to configure OpenAFS to scale well across a wide area can be complex. This project will take a few specific scenarios and develop parameterized configurations for them that can then be used by OpenAFS users as blueprints. Specific example: create a single namespace split across two locations, with tools to manage synchronization and replication according to some basic policies on who can invoke replication and how to control the bandwidth utilization.
OpenAFS uses a UDP-based RPC transport called Rx. Rx uses the endpoint maximum transmission units (MTUs) to determine how large a packet may be transmitted without requiring the packet to be broken into fragments on its journey. The prevalence of IP tunnelling, typically with reduced MTUs, in modern internet topologies means this is often not an accurate way to determine the true maximum packet size. This project would develop and integrate into Rx a mechanism to discover the MTU of the path rather than merely that of the endpoints, and to tune packet sizes accordingly.
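One common way to find a path MTU is to probe with packets of different sizes and search for the largest size that gets through. The sketch below shows only that search logic. The `probe` callback is hypothetical (a real one would send an unfragmentable packet and wait for an acknowledgement), the 576 and 1500 byte bounds are assumptions, and none of this reflects how Rx is actually structured.

```python
from typing import Callable


def discover_path_mtu(
    probe: Callable[[int], bool], low: int = 576, high: int = 1500
) -> int:
    """Binary-search the largest packet size for which probe(size) succeeds.

    probe(size) is a hypothetical callback reporting whether a packet of
    `size` bytes crossed the path unfragmented. `low` is assumed to work.
    """
    while low < high:
        mid = (low + high + 1) // 2   # round up so the loop always progresses
        if probe(mid):
            low = mid                 # mid fits; the answer is mid or larger
        else:
            high = mid - 1            # mid is too big; the answer is smaller
    return low


def simulated_probe(size: int) -> bool:
    """Stand-in for the network: pretend a tunnel limits the path to 1400 bytes."""
    return size <= 1400


print(discover_path_mtu(simulated_probe))
```

A binary search needs only about ten probes to cover this range. A production mechanism would also have to cope with lost probes and with paths whose MTU changes over time.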
Network Address Translation, or NAT, is now very common on IPv4 networks due to the scarcity of IP address space. Port mappings to edge clients, especially for UDP, often do not persist, and there are no well-known mappings to clients behind the NAT device. This sometimes causes problems with the AFS protocol's use of reverse RPCs to communicate coherency information. One technology for registering edge usage of ports is Universal Plug and Play, or UPnP. While not all appliances support it, integrating it into OpenAFS clients would offer great potential to improve the client experience for users. A project in this space would develop a new UPnP implementation or use a license-compatible implementation, along with additional work to integrate it into the existing OpenAFS client codebase.
Anything else that you want to submit. Speak with members of the OpenAFS Community and brainstorm. Several students from GSoC 2008 or current Capstone projects have discussed their own ideas, and we are looking forward to their submissions.
In 2008 OpenAFS participated in Google's Summer of Code. Here are the projects that were suggested and what transpired.