Filesystem Fun with FUSE

Do you believe that designing a filesystem is something only kernel wizards can do? If so, this project will show you that even newbies can have plenty of fun writing cool little filesystems - with the help of a wonderful tool called FUSE (Filesystem in Userspace). We will use FUSE to develop an interesting application called `Gdrive'. If you have a Gmail account, Gdrive will let you `mount' it and access your mail just like ordinary files. Won't it be fun to use your Gmail account as a backup?

Getting started with FUSE

FUSE is composed of two parts - a kernel module and a userspace library. Once the module is loaded, an application program can be linked with the userspace library and executed in the background. It is this program which implements the filesystem. As the code is not part of the kernel, it is very easy to do things like network and file I/O, and you can think of creating all kinds of interesting `pseudo' filesystems like the Gmail drive. If you are a Python fan like me, you can even implement your filesystem as a Python application.

FUSE can be downloaded from its home page at http://sourceforge.net/projects/fuse/. The installation is simple and involves invoking `configure', `make' and `make install'. You will also want to download the Python bindings from http://richard.jones.name/google-hacks/gmail-filesystem/fuse-python.tar.gz

Listing 1 is perhaps the simplest FUSE program which does something useful. Let's look at the way the program is executed. The first step is to load the FUSE module by typing:


modprobe fuse

Next, the `fusermount' command (which is part of the FUSE distribution) should be invoked as follows (mnt is the name of a directory and 1.py is the FUSE application written in Python):

fusermount mnt ./1.py &

Doing an `ls -ld mnt' displayed the following lines on my console:

called getattr: /
drwxr-xr-x    2 root     root         2048 Jan  1  1970 mnt/

Invoking `ls -ld' caused the Linux `stat' system call to be executed. The FUSE kernel module cleverly diverted this to the function `getattr', giving it as argument the name of the file upon which `stat' is being invoked. The response from `getattr' is a list containing, among other things, the permissions associated with the file, the number of links and the size (the order of the fields should be the same as that specified for the return value of os.stat).
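As a rough illustration of the kind of value a `getattr' might hand back, here is a minimal Python 3 sketch. It is not Listing 1: the name `fake_getattr' is made up for this example, every path is assumed to be a directory, and the field values are simply chosen to match the `ls -ld' output shown above. Only the field order follows os.stat.

```python
import stat

def fake_getattr(path):
    """Return a 10-field tuple laid out like the result of os.stat."""
    print("called getattr:", path)
    mode = stat.S_IFDIR | 0o755   # a directory with rwxr-xr-x permissions
    return (
        mode,   # st_mode
        0,      # st_ino
        0,      # st_dev
        2,      # st_nlink
        0,      # st_uid (root)
        0,      # st_gid (root)
        2048,   # st_size
        0,      # st_atime (0 is Jan 1 1970)
        0,      # st_mtime
        0,      # st_ctime
    )
```

The zero timestamps are why `ls' reports a date of Jan 1 1970 for the mount point.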

Filling up the directory

Doing an `ls -l mnt' results in the following error message:


ls: mnt/: Function not implemented

We are seeing this error because `ls' is now invoking a read operation on the directory; the FUSE kernel module tries, unsuccessfully, to redirect this to a `getdir' function in the application program. Listing 2 solves the problem by implementing a simple `getdir' which returns a list of file names. The files and their associated data are stored in a Python dictionary.
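The idea of a dictionary-backed `getdir' can be sketched in a few lines of Python 3. This is not Listing 2: the names `FILES' and `fake_getdir' and the sample contents are invented here, and pairing each name with 0 is an assumption about the shape the old Python bindings expect, so check it against the examples shipped with the bindings.

```python
# Hypothetical in-memory store: file name -> file contents.
FILES = {"a": b"hello\n", "b": b"world\n"}

def fake_getdir(path):
    """List the entries of the root directory; any other path is empty."""
    if path != "/":
        return []
    # (name, 0) pairs: assumed return shape, see the note above.
    return [(name, 0) for name in sorted(FILES)]
```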

Implementing read

The next logical step is to allow application programs to read the contents of the files stored in our pseudo filesystem. A command like `cat mnt/a' should display the contents of the file properly. Any program which tries to read from a file will ultimately invoke two Linux system calls, `open' and `read'. The FUSE kernel module will automagically redirect these system calls to similarly named functions in our application program. Listing 3 presents simple `open' and `read' routines. The logic of `read' is simple: it is supposed to return `length' bytes of data starting at the point specified by `offset'.
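The offset/length logic amounts to a slice. The Python 3 sketch below is an illustration rather than Listing 3: `fake_open' and `fake_read' are made-up names, the argument order is an assumption, and returning a negative errno value to signal failure is likewise an assumed convention.

```python
import errno

# Hypothetical in-memory store: file name -> file contents.
FILES = {"a": b"hello from FUSE\n"}

def fake_open(path, flags):
    """Succeed (0) only for files we know about."""
    if path.lstrip("/") in FILES:
        return 0
    return -errno.ENOENT   # assumed error convention

def fake_read(path, length, offset):
    """Return at most `length' bytes starting at `offset'."""
    data = FILES[path.lstrip("/")]
    return data[offset:offset + length]
```

Slicing past the end of the data yields an empty result, which is how a reader such as `cat' learns it has reached end of file.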

The FUSE distribution comes with a few example programs which demonstrate the implementation of all sorts of file operations. Readers should try adding one or two more operations (say, deleting a file) to the toy program which I have presented to get a better feel of the way FUSE works.

Gmail from the command line!

Google's Gmail gives you 1 GB of storage space. A really cool idea would be to access this space just like you would access a partition on your local hard drive:


fusermount mnt ./gmail.py &
cp /home/pce/bigfile.tgz mnt
cp mnt/foo.gz /home/pce

We will use our FUSE skills to make this idea practical!

Gmail has the concept of `labels' rather than `folders'. It is possible to set up filters based on rules such as:


From Address = To Address = xyz@gmail.com

and specify an action like:

Attach a label `gdrive' to all messages which match this rule.

Now it becomes possible to list only those messages which have the label `gdrive' attached to them.

Using libgmail

There is a simple Python library called `libgmail' which lets you interact with Gmail from the command line. You can get it from http://libgmail.sourceforge.net. Listing 4 is a simple program which demonstrates the use of this library. Before running the program, we will create a directory, say `abc', and populate it with a few small files. The program will be executed like this:


./4.py abc

It generates a list of all files under `abc' and sends each file as an attachment to a separate message whose subject line is the same as the attached file's name. We shall send the mail to ourselves! The GmailComposedMessage constructor accepts as arguments the to address, the subject line, the body and a list containing the names of the files to be attached. Before launching this program, you should log on to Gmail and configure a filter which tags all self-mails with the label `gdrive'.
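The file-gathering half of this can be sketched without touching Gmail at all. The Python 3 fragment below is not Listing 4 and makes no libgmail calls; `plan_self_mails' is a hypothetical helper that only works out what each message would contain, assuming a flat (non-recursive) directory and an empty body.

```python
import os

def plan_self_mails(directory, address):
    """Describe one self-addressed message per regular file in directory."""
    plans = []
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        if not os.path.isfile(full_path):
            continue   # skip subdirectories; this sketch is not recursive
        plans.append({
            "to": address,               # we mail ourselves
            "subject": name,             # subject = attached file's name
            "body": "",
            "attachments": [full_path],  # one file per message
        })
    return plans
```

Each dictionary holds the four pieces of information which the article says the GmailComposedMessage constructor expects; actually building and sending the message is left to libgmail.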

Listing 5 demonstrates fetching mail and displaying its properties. Each Gmail `thread' is composed of multiple messages, and each message can have multiple attachments.

Bringing together libgmail and FUSE

Listing 6 is a program which helps you `mount' Gmail and gives read-only access to all files tagged with the `gdrive' label. Interested readers can convert it into a full-fledged application allowing both read and write access (or follow an easier path and just download a full-fledged GmailFS from http://richard.jones.name/google-hacks/gmail-filesystem/gmail-filesystem.html).

The logic of the program is simple. The dictionary `FILES' stores the names of files as keys. Associated with each key, we have a two-element list which stores the body of the message as well as its size. The first time we do an `ls' on the directory on which Gmail is mounted, the FILES dictionary will be empty; the gmail_readdir function is called to fill it up. The first time we try to read a message, the gmail_getmessage function is invoked to extract its contents, which are then cached in FILES.
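The fill-on-first-use pattern described here can be shown in isolation. In the Python 3 sketch below, gmail_readdir and gmail_getmessage are stand-ins which return canned data instead of talking to Gmail; their signatures, the helpers `list_files' and `read_file', and the `FETCHED' list are all assumptions made for this illustration, not the code of Listing 6.

```python
FILES = {}     # name -> [body, size]; body stays None until first read
FETCHED = []   # names downloaded so far, kept only to show the caching

def gmail_readdir():
    """Stand-in: pretend two attachments carry the `gdrive' label."""
    return {"notes.txt": 12, "todo.txt": 5}

def gmail_getmessage(name):
    """Stand-in: pretend to download one attachment's contents."""
    FETCHED.append(name)
    return {"notes.txt": b"hello gdrive", "todo.txt": b"sleep"}[name]

def list_files():
    if not FILES:   # first `ls': fill the dictionary
        for name, size in gmail_readdir().items():
            FILES[name] = [None, size]
    return sorted(FILES)

def read_file(name, length, offset):
    list_files()    # make sure the dictionary has been filled
    entry = FILES[name]
    if entry[0] is None:   # first read: fetch the body and cache it
        entry[0] = gmail_getmessage(name)
    return entry[0][offset:offset + length]
```

Keeping the size separate from the body is what lets a directory listing report file sizes without downloading any attachment.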

Conclusion

FUSE can be used for doing any number of small filesystem projects. If you have some hardware skills, you can think of getting a serial EEPROM and interfacing it with the parallel port; you can use FUSE to `mount' this device and store files on it - you have a home-brew thumb drive! If you have a digital camera (like the cheap and commonly available Kodak EasyShare CX6200) which can be accessed using the libgphoto2 library, you can think of writing a simple filesystem to `mount' your camera and copy images from it. If you don't like messing with hardware, and still haven't got the Gmail invitation from your friends, you can write a simple FTP filesystem which lets you `mount' an FTP server.

Source code/errata concerning this article can be downloaded from http://pramode.net/lfy-dec/