Virtual Filesystem: Building A Linux Filesystem From An Ordinary File

You can take an ordinary disk file, format it as an ext2, ext3, or Reiser filesystem, and then mount it just like a physical drive. You can then read and write files on the newly mounted device. Since the complete filesystem is just a file, you can also copy it to another computer. If security is an issue, read on: this article shows you how to encrypt the filesystem and mount it with ACL (Access Control Lists) support, which gives you rights beyond the traditional read (r), write (w), and execute (x) permissions for the three classes "owner", "group", and "other".

This is an excellent way to investigate different filesystems, such as reiserfs, ext3, or ext2, without having to reformat or purchase a physical drive, which means you avoid the hassle of moving all your data. The method is also quick, very quick compared to preparing a physical device. Since the same file can be mounted on more than one mount point, you can investigate sync rates.

Creating a filesystem in this manner allows you to set a hard limit on the amount of space used, which, of course, will be equal to the file size. This can be an advantage if you need to move this information to other servers. Since the contents cannot grow beyond the file, you can easily keep track of how much space is being used.

First, you want to create a 20MB file by executing the following command:

      $ dd if=/dev/zero of=disk-image count=40960
      40960+0 records in
      40960+0 records out

You created a 20 MB file because, by default, dd uses a block size of 512 bytes. That makes the size: 40960*512=20971520.

      $ ls -l disk-image
      -rw-rw-r--    1 chirico  chirico  20971520 Sep  3 14:24 disk-image
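
As a quick check of the arithmetic, the sketch below recreates the image in a scratch directory and compares the resulting size with blocks times block size. The scratch directory and the variable names are assumptions made for illustration; they are not part of the original session.

```shell
# Scratch directory so the example does not touch existing files (assumption).
workdir=$(mktemp -d)

blocks=40960        # the count= value used above
block_size=512      # dd's default block size, stated explicitly here
expected=$((blocks * block_size))

# Same command as above, with the block size spelled out.
dd if=/dev/zero of="$workdir/disk-image" bs="$block_size" count="$blocks" 2>/dev/null

# wc -c reports the file size in bytes.
actual=$(wc -c < "$workdir/disk-image" | tr -d ' ')
echo "expected=$expected actual=$actual"
```

Both values should agree with the size that ls -l reports above.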

Next, to format this as an ext3 filesystem, you just execute the following command:

      $ /sbin/mkfs -t ext3 -q disk-image
      mke2fs 1.32 (09-Nov-2002)
      disk-image is not a block special device.
      Proceed anyway? (y,n) y

You are asked whether to proceed because this is a file, and not a block device. That is OK. We will mount this as a loopback device so that this file will simulate a block device.
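
If you script this step, the prompt gets in the way. Here is a minimal sketch, assuming the e2fsprogs tools are installed: mke2fs accepts -F to proceed on a regular file without asking, and -j adds the ext3 journal. The check afterwards reads the two-byte ext2/ext3 magic number (0xEF53, stored little-endian at byte offset 1080 of the image). The scratch directory and variable names are illustrative.

```shell
# Make sure the sbin directories are searched; mke2fs usually lives there.
PATH=$PATH:/sbin:/usr/sbin
workdir=$(mktemp -d)
image="$workdir/disk-image"

# 20 MB image, as before.
dd if=/dev/zero of="$image" bs=512 count=40960 2>/dev/null

magic=""
if command -v mke2fs >/dev/null 2>&1; then
    # -q quiet, -F do not ask about "not a block special device", -j ext3 journal
    mke2fs -q -F -j "$image"
    # The superblock starts at byte 1024; its magic field sits 56 bytes in.
    magic=$(od -An -tx1 -j1080 -N2 "$image" | tr -d ' \n')
fi
echo "magic=$magic"
```

On a freshly formatted image the two bytes read back as 53 and ef. No root access is needed for this step, only for the mount that follows.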

Next, you need to create a directory that will serve as a mount point for the loopback device.

      $ mkdir fs

You are now one step away from mounting the file. You just need to find out which loopback device number is available next. Normally, loopback devices start at zero (/dev/loop0) and work their way up (/dev/loop1, /dev/loop2, ... /dev/loopn). An easy way to find out which loopback devices are in use is to look in /proc/mounts, since the mount command may not give you what you need.

      $ cat /proc/mounts

      rootfs / rootfs rw 0 0
      /dev/root / ext3 rw 0 0
      /proc /proc proc rw,nodiratime 0 0
      none /sys sysfs rw 0 0
      /dev/sda1 /boot ext3 rw 0 0
      none /dev/pts devpts rw 0 0
      /proc/bus/usb /proc/bus/usb usbdevfs rw 0 0
      none /dev/shm tmpfs rw 0 0

On my computer, I have no loopback devices mounted, so I'm OK to start with zero. You must do the next command as root, or with an account that has superuser privileges.

      # mount -o loop=/dev/loop0 disk-image fs

That's it. You just mounted the file as a device. If you look at /proc/mounts now, you will see that it is using /dev/loop0.

      $ cat /proc/mounts

      rootfs / rootfs rw 0 0
      /dev/root / ext3 rw 0 0
      /proc /proc proc rw,nodiratime 0 0
      none /sys sysfs rw 0 0
      /dev/sda1 /boot ext3 rw 0 0
      none /dev/pts devpts rw 0 0
      /proc/bus/usb /proc/bus/usb usbdevfs rw 0 0
      none /dev/shm tmpfs rw 0 0
      /dev/loop0 /home/chirico/junk/fs ext3 rw 0 0
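
Scanning /proc/mounts by eye works on a small system. As a sketch, the hypothetical helper below (list_loop_mounts is not a standard command) prints the loop devices named in a /proc/mounts-style file, so you can see which numbers are taken. The sample file only mimics the listing above.

```shell
# list_loop_mounts [FILE]: print each loop device that appears as a mounted
# device in FILE (default: /proc/mounts). Hypothetical helper for illustration.
list_loop_mounts() {
    awk 'index($1, "/dev/loop") == 1 { print $1 }' "${1:-/proc/mounts}" | sort -u
}

# Try it on a sample that mimics the listing above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
rootfs / rootfs rw 0 0
/dev/sda1 /boot ext3 rw 0 0
/dev/loop0 /home/chirico/junk/fs ext3 rw 0 0
EOF

in_use=$(list_loop_mounts "$sample")
echo "loop devices in use: $in_use"
```

Keep in mind that /proc/mounts lists only mounted devices. A loop device that was attached with losetup but never mounted will not show up here.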

You can now create new files, write to them, read them, and do everything you normally would do on a disk drive. First, I'll give access to the chirico account.

      # chown -R chirico.chirico /home/chirico/junk/fs

Now, under the chirico account, it is possible to create files.

      $ cd /home/chirico/junk/fs
      $ mkdir one two three
      $ ls -l

      total 15
      drwx------    2 chirico  chirico     12288 Sep  3 14:28 lost+found
      drwxrwxr-x    2 chirico  chirico      1024 Sep  3 14:34 one
      drwxrwxr-x    2 chirico  chirico      1024 Sep  3 14:34 three
      drwxrwxr-x    2 chirico  chirico      1024 Sep  3 14:34 two

      $ df -h

      Filesystem            Size  Used Avail Use% Mounted on
      /dev/sda2              17G   11G  4.6G  71% /
      /dev/sda1              99M   83M   11M  89% /boot
      none                   62M     0   62M   0% /dev/shm
      /home/chirico/junk/disk-image
                             20M  1.1M   18M   6% /home/chirico/junk/fs

If you need to umount the filesystem, as root, just issue the umount command. If you need to free the loopback device, execute the losetup command with the -d option. You can execute both commands as follows:

      # umount /home/chirico/junk/fs
      # losetup -d /dev/loop0

Using RWX -- The Old Way To Collaborate

Before we get started with ACL, how would you set up rights on the filesystem so that users could create and save documents that others could modify? For instance, let's say that users chirico and sporkey are collaborating on a project together.

Well, you have to add everyone to the same group. You would execute commands like these:

      # groupadd sharefs
      # chown -R root.sharefs /home/chirico/junk/fs
      # chmod 2775 /home/chirico/junk/fs
      # usermod -a -G sharefs sporkey
      # usermod -a -G sharefs chirico

Note that if these changes do not take effect for your users (for example, if they were logged in when you executed the commands), they'll have to log out and log in again or execute the "$ newgrp sharefs" command. No big deal, right? Well, keep reading, and see how ACL avoids this step.
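
The mode 2775 deserves a word: the leading 2 is the setgid bit, which on a directory makes new files and subdirectories inherit the directory's group (sharefs above) instead of the creator's primary group. The sketch below shows the bit on a scratch directory, which stands in for /home/chirico/junk/fs; it assumes GNU stat for the -c option.

```shell
# Scratch directory standing in for the shared mount point (assumption).
shared=$(mktemp -d)

# rwxrwxr-x plus the setgid bit, as in the chmod command above.
chmod 2775 "$shared"

# %a prints the octal mode, %A the symbolic form (note the "s" in the group triplet).
mode=$(stat -c %a "$shared")
stat -c '%a %A' "$shared"

# A subdirectory created inside also carries the setgid bit.
mkdir "$shared/docs"
stat -c '%a %A' "$shared/docs"
```

Without the setgid bit, each collaborator's files would belong to that user's own primary group, and the others could not modify them.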

More importantly, even though the old way worked for you, at some point, new users may need to be added to the project. What if some of these users only need a subset of the rights? For instance, you have developers, testers, managers, and a few special people. There are limits to what the rwx type rights can do. ACL solves a lot of these problems.

ACL, Reiserfs, and AES Encryption: The 2.6 Kernel

For the next steps, I will assume that you are running Red Hat Fedora Core 2. If not, reference the 2.6 kernel upgrade section below. Four things will be covered in this section:

  • Create A File With Random Data
  • Set Up An AES Encrypted Loopback Device With Password
  • Build A Reiser Filesystem On The Loopback Device
  • Mount With ACL Capabilities

Your installation of Fedora Core 2, by default, will be configured for loop, cryptoloop, and aes, but it is highly unlikely that you will have all of these modules loaded. So, execute the following commands to load these modules (you will need to do this as root):

      # modprobe loop
      # modprobe cryptoloop
      # modprobe aes

Next, create a directory to store the files. The Reiser filesystem will require more space than the ext3 filesystem.

      # mkdir /home/diskimg
      # cd /home/diskimg

Instead of creating the file zeroed out, like you did with the ext3 filesystem, this one is going to contain random bits, which may add a little extra security.

      # dd if=/dev/urandom of=disk-aes count=102400

To encrypt the loop device, you need to use losetup. You will be prompted for a password, which you will need again when you mount the device.

      # losetup -e aes /dev/loop1 ./disk-aes
        Password:

This step is also new. Instead of formatting the file directly, you will format the loop device. The file stays encrypted. Again, you will be prompted to continue, so just enter "y".

      # mkfs -t reiserfs /dev/loop1

      mkfs.reiserfs 3.6.13 (2003 www.namesys.com)                                                
                                                                                           
      A pair of credits:                                                                   
      Elena Gryaznova performed testing and benchmarking.                                  
                                                                                           
      The  Defense  Advanced  Research  Projects Agency (DARPA, www.darpa.mil) is the      
      primary sponsor of Reiser4.  DARPA  does  not  endorse  this project; it merely      
      sponsors it.                                                                         
                                                                                           
                                                                                           
      Guessing about desired format.. Kernel 2.6.8-1.521 is running.                       
      Format 3.6 with standard journal                                                     
      Count of blocks on the device: 12800                                                 
      Number of blocks consumed by mkreiserfs formatting process: 8212                     
      Blocksize: 4096                                                                      
      Hash function used to sort names: "r5"                                               
      Journal Size 8193 blocks (first block 18)                                            
      Journal Max transaction length 1024                                                  
      inode generation number: 0                                                           
      UUID: 435e3495-5e2e-489d-bf55-1b5f9a44b670                                           
      ATTENTION: YOU SHOULD REBOOT AFTER FDISK!                                            
              ALL DATA WILL BE LOST ON '/dev/loop1'!                                       

      Continue (y/n):y                                                                     
      Initializing journal - 0%....20%....40%....60%....80%....100%                        
      Syncing..ok                                                                          
                                                                                           
      Tell your friends to use a kernel based on 2.4.18 or later, and especially not a     
      kernel based on 2.4.9, when you use reiserFS. Have fun.                              
                                                                                           
      ReiserFS is successfully created on /dev/loop1.                                      
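
The numbers in this output show why the Reiser image has to be larger than the 20 MB ext3 one. Here is a rough calculation using only figures printed above: 102400 blocks of 512 bytes from dd, 4096-byte filesystem blocks, and 8212 blocks consumed by formatting. The variable names are illustrative.

```shell
image_bytes=$((102400 * 512))                 # size of disk-aes
fs_block=4096                                 # "Blocksize: 4096"
total_blocks=$((image_bytes / fs_block))      # matches "Count of blocks on the device"
format_blocks=8212                            # consumed by mkreiserfs, mostly the journal
free_blocks=$((total_blocks - format_blocks))

# Blocks a 20 MB image (40960 * 512 bytes) would provide.
small_blocks=$((40960 * 512 / fs_block))

echo "total=$total_blocks free=$free_blocks small_image=$small_blocks"
```

A 20 MB image would provide fewer blocks than formatting alone consumes here, so the standard journal would not fit.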

Create the mount point /fs, and mount this device. Note that you will be entering the acl option as well. Plus, you will be prompted for a password.

      # mkdir /fs
      # mount -o loop,encryption=aes,acl ./disk-aes /fs
        Password:

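OK, now take a look at the output of the mount command. It should show the Reiser filesystem, encrypted, using ACL. Note that it says loop2: mount -o loop attached the file to a free loop device of its own, /dev/loop2, which is one above what losetup specified, /dev/loop1. The losetup device stays attached until you release it with losetup -d.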

      $ mount
      /home/diskimg/disk-aes on /fs type reiserfs (rw,loop=/dev/loop2,encryption=aes,acl)

Exploring ACL

With ACL (Access Control Lists), you have finer control over access permissions. With the rwx permission scheme, you cannot easily change rights without creating new groups to handle the users. With ACL, you can set user permissions without creating a group, and individual users can add or remove access.

These rights are set with the setfacl command. The command below will give the users donkey, chirico, and bozo2 access to this new filesystem that we mounted. Again, I'm assuming that you are using Fedora Core 2, or some distribution that is set up for ACL.

      # setfacl -R -m d:u:donkey:rwx,d:u:chirico:rwx,d:u:bozo2:rwx /fs

Next, create a few directories as one of the users. The example below was done as the user chirico.

      $ mkdir /fs/one
      $ touch /fs/one/stuff
      $ ls -l /fs/one/stuff
      -rw-rw----+ 1 chirico chirico 0 Sep  3 17:48 /fs/one/stuff

Notice the plus sign in the last line. It tells us a little about who has access. So, as user chirico, the getfacl command can be executed:

      $ getfacl /fs/one/stuff                                    

      getfacl: Removing leading '/' from absolute path names     
      # file: fs/one/stuff                                       
      # owner: chirico                                           
      # group: chirico                                           
      user::rw-                                                  
      user:chirico:rwx                #effective:rw-             
      user:donkey:rwx                 #effective:rw-             
      user:bozo2:rwx                  #effective:rw-             
      group::r-x                      #effective:r--             
      mask::rw-                                                  
      other::---                                                 
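
The #effective comments come from the mask entry: a named user or group entry is limited to the permissions that also appear in the mask. The hypothetical helper below (effective_perms is not part of the ACL tools) reproduces that rule for three-character permission strings.

```shell
# effective_perms ENTRY MASK: keep a permission letter only when both the
# ACL entry and the mask grant it. Hypothetical helper for illustration.
effective_perms() {
    entry=$1
    mask=$2
    out=""
    for i in 1 2 3; do
        e=$(printf '%s' "$entry" | cut -c"$i")
        m=$(printf '%s' "$mask" | cut -c"$i")
        if [ "$e" != "-" ] && [ "$m" != "-" ]; then
            out="$out$e"
        else
            out="$out-"
        fi
    done
    printf '%s\n' "$out"
}

effective_perms rwx rw-   # user:donkey:rwx under mask::rw-
effective_perms r-x rw-   # group::r-x under mask::rw-
```

The two calls print rw- and r--, matching the #effective column in the getfacl output above. The owner entry (user::) and other:: are not subject to the mask.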

We now see that donkey, chirico, and bozo2 have effective rights on this file. Chirico has enough rights to remove bozo2.

      $ setfacl -x u:bozo2 /fs/one/stuff
      $ getfacl /fs/one/stuff
      getfacl: Removing leading '/' from absolute path names
      # file: fs/one/stuff
      # owner: chirico
      # group: chirico
      user::rw-
      user:chirico:rwx
      user:donkey:rwx
      group::r-x
      mask::rwx
      other::---

This is just scratching the surface of what can be done with ACL. For more information, see some of the references below.

2.6 Kernel Upgrade

This article will get you started with the 2.6 kernel if you are currently running Red Hat 8 or 9. You may want to take a look at it to see what is involved. If you decide to upgrade, you will need to configure your kernel for the following:

      CONFIG_BLK_DEV_LOOP
      CONFIG_BLK_DEV_CRYPTOLOOP
      CONFIG_CRYPTO_AES_586

This is done in the .config file, and you can download my config file here. Just look for kernel-2.6.8.1-i686-chirico-reiserfsacl.config in the tar.gz.

In addition to upgrading the kernel, you will need the latest version of the Linux utilities. Currently, there is no need to patch this version. In the past, there was a patch, but this version worked fine for me.

You will also need the Reiser tools.

References

Linux Tips and Tricks
Check out tips 12, 22, and 91, on how to use ssh with rsync. You can create a virtual filesystem on a server, then copy it to your laptop. As you work on the laptop, sync your changes using rsync.
Linux Magazine's article on ACL
This article goes into more depth on adding and removing users.
Access Control Lists in Linux
A PDF from Andreas Grünbacher.
Advanced Linux Programming
by Mark Mitchell, Jeffrey Oldham, and Alex Samuel, of CodeSourcery LLC, published by New Riders Publishing, ISBN 0-7357-1043-0, First Edition, June 2001. This book is free and you can view it online. Chapter 6 describes loopback devices.
Implementing Encrypted Home Directories
W. Michael Petullo, July 23, 2003.
The Loopback Encrypted Filesystem HOWTO
By Ryan T. Rhea.

Other Articles by Mike Chirico

Lemon Parser Generator Tutorial
This is a yacc alternative that is compact and thread safe. It is used in the sqlite project.
Recommended Reading
Read what others suggest. I started with a list of my own, and will add suggestions from other developers, readers, and opinionated people.
README_mysql.txt
Tips on MySQL.
README_COMCAST_EMAIL.txt
Tips on using Comcast Email with a home Linux box.

Recent comments

27 Nov 2004 08:34 hattmoward

No need to check used loop devices.
Simply using 'loop' in the options argument for mount will locate the first available loop device and use it.

(Also, ACLs are nice, but you still want to use groups to manage access to files!)

27 Nov 2004 09:06 mchirico

unmount --

If you need to umount a file-system, you may be blocked from umounting, if someone else is on it. fuser will list the culprit users.

# fuser -u /filesystem

To kill all processes accessing the file system /filesystem, run the
following command:

# fuser -km /filesystem

27 Nov 2004 09:53 alienscience

OpenBSD
It's good to know how to do that in Linux. I find this sort of thing useful for keeping sensitive documents/code on my laptop. In OpenBSD the steps are similar except, after creating the diskimage file:

vnconfig -k svnd0 diskimage

# Enter password

newfs /dev/svnd0c

mount /dev/svnd0c /mnt/crypt

Then when finished

umount /mnt/crypt

vnconfig -u svnd0

With OpenBSD you could also partition the file using disklabel and have a different filesystem on each partition (although I've never had to make use of this).

28 Nov 2004 01:28 Tux2000

dd blocksize
dd has options to set the block size to values other than 512 bytes, which reduces the need to fiddle with numbers. For the purpose of creating loop-mountable files, the bs argument is what you want to use. The bs argument accepts common suffixes (k, M, G) for large numbers, at least in GNU's dd.

So instead of

dd if=/dev/zero of=disk-image count=40960

you can use the bs argument with a size of one megabyte, and use a count of 20 to get 20 megabytes:

dd if=/dev/zero of=disk-image count=20 bs=1M

Or, for a floppy image:

dd if=/dev/zero of=dd-floppy count=720 bs=1k

dd if=/dev/zero of=hd-floppy count=1440 bs=1k

dd if=/dev/zero of=ehd-floppy count=2880 bs=1k

Please note:

bs specifies the number of bytes read into memory with a single read() call and written out with a single write() call. The dd command in the article uses 512 bytes, but issues 40960 read() and 40960 write() calls. My dd command uses a block of one megabyte, but issues just 20 read() and 20 write() calls. This trades memory for syscalls (=speed).

Some peripheral devices prefer large block sizes, e.g. my 48X CDROM spins up to maximum speed only when I use large block sizes (20k up to 1M, preferably equivalent to its internal cache) that result in a fast linear read sequence on the cable. The same applies to image copies between harddisks.

For simple copy purposes from device to device (counting /dev/zero, /dev/null and image files as devices), large block sizes are usually faster because you need way less syscalls.

Especially the pseudo devices /dev/null and /dev/zero have virtually no block size limit and no preferred block sizes. /dev/null just ignores your data and returns a "no error" code, /dev/zero fills the requested buffer with zeros in fast assembler code. So there is no need for small block sizes, except in situations with extreme low memory.

28 Nov 2004 09:41 andrewziem

ACL permissions GUI
Now we just need Gnome and Konqueror to manage ACL permissions.

28 Nov 2004 10:27 r0b0

cryptoloop is obsolete
Cryptoloop is obsolete, buggy and has security weaknesses. You should use dm_crypt instead.

01 Dec 2004 12:32 SlimOdds

Re: dd blocksize
How about the even more obvious:

dd if=/dev/zero of=disk-image count=1 bs=20M

02 Dec 2004 23:15 Tux2000

Re: dd blocksize

> How about the even more obvious:
>
> dd if=/dev/zero of=disk-image count=1 bs=20M

If you have the memory, yes.

But what if you want to create a 5 Gigabyte image, e.g. to test a filesystem for large file support or as an image for a DVD-R? dd ... bs=5G needs a system with RAM + SWAP > 5 GBytes, preferably with RAM > 5 GBytes.

It makes no sense to force dd to use the swap, it just slows down everything. The bs argument should be smaller than your available real memory. 1M is a safe bet for old and new systems, and it is fast enough for most cases.

1M is 2048 times the default block size of 512 bytes. To gain the same "boost" again, you would have to use at least 2048 times 1 M, i.e. 2G -- more than many machines can handle without swapping, and already more than many disk caches inside harddisks and optical drives. At this point, increasing block sizes does not gain more performance, but gives a penalty because the drives cannot use their cache properly. You have to wait for the drive mechanics to find the right track. This takes much longer than reading from a cache. Cache access time is measured in nanoseconds, disk access time in milliseconds.
