Changing the capacity of each VDEV in a ZPOOL with ZFS, without losing data and without downtime


Before we start, let's define some acronyms.

FreeBSD: Free Berkeley Software Distribution
ZFS: Zettabyte File System
ZPOOL: ZFS POOL
DEV: DEVice
VDEV: Virtual DEVice
GEOM: disk GEOMetry
BHYVE: BSD HYperVisor
ZVOL: ZFS VOLume

***

I have been tinkering with FreeBSD, ZFS, GEOM, BHYVE, and other interesting technologies.

***

Let's experiment with changing the capacity of each VDEV of a ZPOOL, without losing data and without downtime, using ZFS.

To do so, you need a set of larger disks.
For example, if the ZPOOL has 2 mirror VDEVs with two 10 GB disks each (four 10 GB disks in total), and you want to upgrade to 20 GB disks, then you need four 20 GB disks for the upgrade.


Figure 1: Live upgrade of disks in a ZPOOL

ZPOOL tank                             ZPOOL tank
    VDEV mirror-0                          VDEV mirror-0
        DEV /dev/vtbd1                         DEV /dev/vtbd9
        DEV /dev/vtbd2        --->             DEV /dev/vtbd10
    VDEV mirror-1                          VDEV mirror-1
        DEV /dev/vtbd3                         DEV /dev/vtbd11
        DEV /dev/vtbd4                         DEV /dev/vtbd12
    VDEV mirror-2                          VDEV mirror-2
        DEV /dev/vtbd5                         DEV /dev/vtbd13
        DEV /dev/vtbd6                         DEV /dev/vtbd14
    VDEV mirror-3                          VDEV mirror-3
        DEV /dev/vtbd7                         DEV /dev/vtbd15
        DEV /dev/vtbd8                         DEV /dev/vtbd16
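
In this experiment, the vtbd devices are VIRTIO block disks of a BHYVE guest. One way to provision the new 20 GB disks is to back them with ZVOLs on the host; a minimal sketch, assuming a hypothetical host pool named zroot and a guest named nova:

host# for n in $(seq 9 16); do zfs create -V 20G zroot/vm/nova/disk$n; done

The new ZVOLs then have to be attached to the guest as additional VIRTIO block devices before they appear as /dev/vtbd9 through /dev/vtbd16.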


***

Create a ZPOOL

root@nova:~ # zpool create tank \
    mirror vtbd1 vtbd2 \
    mirror vtbd3 vtbd4 \
    mirror vtbd5 vtbd6 \
    mirror vtbd7 vtbd8


root@nova:~ # zpool status
  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            vtbd1   ONLINE       0     0     0
            vtbd2   ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            vtbd3   ONLINE       0     0     0
            vtbd4   ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            vtbd5   ONLINE       0     0     0
            vtbd6   ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            vtbd7   ONLINE       0     0     0
            vtbd8   ONLINE       0     0     0

errors: No known data errors

***

Add data to it.

root@nova:~ # zfs create -o mountpoint=/tank/dataset-x tank/dataset-x

root@nova:~ # rsync -av /boot /tank/dataset-x/


***

Now, we see that the ZPOOL tank is using 155M, with data on each VDEV.


root@nova:~ # df -h /tank/dataset-x/
Filesystem        Size    Used   Avail Capacity  Mounted on
tank/dataset-x     37G    153M     37G     0%    /tank/dataset-x


root@nova:~ # zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
tank             154M  36.7G   176K  /tank
tank/dataset-x   153M  36.7G   153M  /tank/dataset-x
 

root@nova:~ # zpool list -v
NAME         SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank          38G   155M  37.8G        -         -     0%     0%  1.00x  ONLINE  -
  mirror    9.50G  41.0M  9.46G        -         -     0%     0%
    vtbd1       -      -      -        -         -      -      -
    vtbd2       -      -      -        -         -      -      -
  mirror    9.50G  41.9M  9.46G        -         -     0%     0%
    vtbd3       -      -      -        -         -      -      -
    vtbd4       -      -      -        -         -      -      -
  mirror    9.50G  36.4M  9.46G        -         -     0%     0%
    vtbd5       -      -      -        -         -      -      -
    vtbd6       -      -      -        -         -      -      -
  mirror    9.50G  35.4M  9.47G        -         -     0%     0%
    vtbd7       -      -      -        -         -      -      -
    vtbd8       -      -      -        -         -      -      -

***

Let's upgrade the 10 GB disks to 20 GB disks to double the capacity.
We will do it in place, without taking the service offline, and without creating a temporary pool.
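
This relies on top-level VDEV removal (the device_removal feature), which works for mirror and single-disk VDEVs but not for pools containing raidz VDEVs. A quick way to check that the feature is enabled on the pool (the exact value depends on your pool version):

root@nova:~ # zpool get feature@device_removal tank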

***

Replace the disks of mirror-0.

root@nova:~ # zpool status
  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            vtbd1   ONLINE       0     0     0
            vtbd2   ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            vtbd3   ONLINE       0     0     0
            vtbd4   ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            vtbd5   ONLINE       0     0     0
            vtbd6   ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            vtbd7   ONLINE       0     0     0
            vtbd8   ONLINE       0     0     0


root@nova:~ # zpool remove tank mirror-0
 

root@nova:~ # zpool status
  pool: tank
 state: ONLINE
  scan: none requested
remove: Removal of vdev 0 copied 41.0M in 0h0m, completed on Thu Jun 11 20:46:12 2020
    2.06K memory used for removed device mappings
config:

        NAME          STATE     READ WRITE CKSUM
        tank          ONLINE       0     0     0
          mirror-1    ONLINE       0     0     0
            vtbd3     ONLINE       0     0     0
            vtbd4     ONLINE       0     0     0
          mirror-2    ONLINE       0     0     0
            vtbd5     ONLINE       0     0     0
            vtbd6     ONLINE       0     0     0
          mirror-3    ONLINE       0     0     0
            vtbd7     ONLINE       0     0     0
            vtbd8     ONLINE       0     0     0

errors: No known data errors


***

We can see that ZFS is smart: the removal evacuated the data that was on mirror-0 to the other ONLINE VDEVs of the ZPOOL tank. The "2.06K memory used for removed device mappings" line refers to the small in-memory table that tracks where the relocated blocks now live.


root@nova:~ # zpool list -v
NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank          28.5G   156M  28.3G        -         -     0%     0%  1.00x  ONLINE  -
  mirror      9.50G  57.2M  9.44G        -         -     0%     0%
    vtbd3         -      -      -        -         -      -      -
    vtbd4         -      -      -        -         -      -      -
  mirror      9.50G  50.2M  9.45G        -         -     0%     0%
    vtbd5         -      -      -        -         -      -      -
    vtbd6         -      -      -        -         -      -      -
  mirror      9.50G  48.7M  9.45G        -         -     0%     0%
    vtbd7         -      -      -        -         -      -      -
    vtbd8         -      -      -        -         -      -      -


***

Now, add a new mirror VDEV with the 20 GB disks. Note that it shows up as mirror-4 rather than mirror-0: VDEV IDs are not reused after a removal.



root@nova:~ # zpool add tank mirror vtbd9 vtbd10

root@nova:~ # zpool status
  pool: tank
 state: ONLINE
  scan: none requested
remove: Removal of vdev 0 copied 41.0M in 0h0m, completed on Thu Jun 11 20:46:12 2020
    2.06K memory used for removed device mappings
config:

        NAME          STATE     READ WRITE CKSUM
        tank          ONLINE       0     0     0
          mirror-1    ONLINE       0     0     0
            vtbd3     ONLINE       0     0     0
            vtbd4     ONLINE       0     0     0
          mirror-2    ONLINE       0     0     0
            vtbd5     ONLINE       0     0     0
            vtbd6     ONLINE       0     0     0
          mirror-3    ONLINE       0     0     0
            vtbd7     ONLINE       0     0     0
            vtbd8     ONLINE       0     0     0
          mirror-4    ONLINE       0     0     0
            vtbd9     ONLINE       0     0     0
            vtbd10    ONLINE       0     0     0

errors: No known data errors


root@nova:~ # zpool list -v
NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank            48G   157M  47.8G        -         -     0%     0%  1.00x  ONLINE  -
  mirror      9.50G  57.2M  9.44G        -         -     0%     0%
    vtbd3         -      -      -        -         -      -      -
    vtbd4         -      -      -        -         -      -      -
  mirror      9.50G  50.3M  9.45G        -         -     0%     0%
    vtbd5         -      -      -        -         -      -      -
    vtbd6         -      -      -        -         -      -      -
  mirror      9.50G  48.5M  9.45G        -         -     0%     0%
    vtbd7         -      -      -        -         -      -      -
    vtbd8         -      -      -        -         -      -      -
  mirror      19.5G   552K  19.5G        -         -     0%     0%
    vtbd9         -      -      -        -         -      -      -
    vtbd10        -      -      -        -         -      -      -


***

Do the same thing for the other 10-GB disks.

root@nova:~ # zpool remove tank mirror-1
root@nova:~ # zpool add tank mirror vtbd11 vtbd12
root@nova:~ # zpool remove tank mirror-2
root@nova:~ # zpool add tank mirror vtbd13 vtbd14
root@nova:~ # zpool remove tank mirror-3
root@nova:~ # zpool add tank mirror vtbd15 vtbd16
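
Note that only one top-level VDEV removal can be in progress at a time, so wait for each evacuation to finish (zpool status reports its progress) before issuing the next zpool remove. On recent OpenZFS releases (an assumption about the version in use), zpool wait can block until the removal completes:

root@nova:~ # zpool wait -t remove tank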

***

The only problem is that the data is not balanced across the VDEVs of the ZPOOL tank. But hey, at least, we did it in place.

root@nova:~ # zpool status
  pool: tank
 state: ONLINE
  scan: none requested
remove: Removal of vdev 3 copied 88.8M in 0h0m, completed on Thu Jun 11 20:55:49 2020
    11.7K memory used for removed device mappings
config:

        NAME          STATE     READ WRITE CKSUM
        tank          ONLINE       0     0     0
          mirror-4    ONLINE       0     0     0
            vtbd9     ONLINE       0     0     0
            vtbd10    ONLINE       0     0     0
          mirror-5    ONLINE       0     0     0
            vtbd11    ONLINE       0     0     0
            vtbd12    ONLINE       0     0     0
          mirror-6    ONLINE       0     0     0
            vtbd13    ONLINE       0     0     0
            vtbd14    ONLINE       0     0     0
          mirror-7    ONLINE       0     0     0
            vtbd15    ONLINE       0     0     0
            vtbd16    ONLINE       0     0     0

errors: No known data errors
 

root@nova:~ # zpool list -v
NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank            78G   157M  77.8G        -         -     0%     0%  1.00x  ONLINE  -
  mirror      19.5G  69.9M  19.4G        -         -     0%     0%
    vtbd9         -      -      -        -         -      -      -
    vtbd10        -      -      -        -         -      -      -
  mirror      19.5G  56.2M  19.4G        -         -     0%     0%
    vtbd11        -      -      -        -         -      -      -
    vtbd12        -      -      -        -         -      -      -
  mirror      19.5G    31M  19.5G        -         -     0%     0%
    vtbd13        -      -      -        -         -      -      -
    vtbd14        -      -      -        -         -      -      -
  mirror      19.5G   128K  19.5G        -         -     0%     0%
    vtbd15        -      -      -        -         -      -      -
    vtbd16        -      -      -        -         -      -      -


***

Now, verify the integrity of the data. The only file that differs is /boot/zfs/zpool.cache, which changes whenever the pool configuration changes, so we probably should not have used /boot as test data in the first place...

root@nova:~ # diff -r -u /boot /tank/dataset-x/boot
Binary files /boot/zfs/zpool.cache and /tank/dataset-x/boot/zfs/zpool.cache differ
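
Beyond comparing the copy against its source, ZFS itself can verify every block in the pool against its checksums with a scrub:

root@nova:~ # zpool scrub tank
root@nova:~ # zpool status tank

The scan line of zpool status reports the scrub progress and whether any errors were repaired.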


***

Let's rebalance the ZPOOL, the ZFS way. We simply need to create a recursive snapshot, send it as a replication stream, and receive it into a new dataset on the same pool.

root@nova:~ # zfs snapshot -r tank@2020-06-11-2111-0001
 

root@nova:~ # zfs list -t snapshot
NAME                                  USED  AVAIL  REFER  MOUNTPOINT
tank@2020-06-11-2111-0001                0      -   176K  -
tank/dataset-x@2020-06-11-2111-0001      0      -   153M  -


root@nova:~ # zfs send -R tank@2020-06-11-2111-0001 | zfs recv tank/rebalanced

root@nova:~ # zpool list -v
NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank            78G   314M  77.7G        -         -     0%     0%  1.00x  ONLINE  -
  mirror      19.5G   108M  19.4G        -         -     0%     0%
    vtbd9         -      -      -        -         -      -      -
    vtbd10        -      -      -        -         -      -      -
  mirror      19.5G  96.1M  19.4G        -         -     0%     0%
    vtbd11        -      -      -        -         -      -      -
    vtbd12        -      -      -        -         -      -      -
  mirror      19.5G  69.3M  19.4G        -         -     0%     0%
    vtbd13        -      -      -        -         -      -      -
    vtbd14        -      -      -        -         -      -      -
  mirror      19.5G  41.0M  19.5G        -         -     0%     0%
    vtbd15        -      -      -        -         -      -      -
    vtbd16        -      -      -        -         -      -      -
 

root@nova:~ # zfs list
NAME                        USED  AVAIL  REFER  MOUNTPOINT
tank                        310M  75.3G   176K  /tank
tank/dataset-x              153M  75.3G   153M  /tank/dataset-x
tank/rebalanced             153M  75.3G   176K  /tank/rebalanced
tank/rebalanced/dataset-x   153M  75.3G   153M  /tank/dataset-x
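
Note that tank/rebalanced/dataset-x reports /tank/dataset-x as its mountpoint: the replication stream (-R) carries the explicitly set mountpoint property along with the data. To make sure the received copy never tries to mount over the original, the stream could have been received unmounted, for example:

root@nova:~ # zfs send -R tank@2020-06-11-2111-0001 | zfs recv -u tank/rebalanced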


The only thing left to do is to destroy the old dataset, and to rename the rebalanced one.

root@nova:~ # zfs destroy -r tank/dataset-x

root@nova:~ # zfs rename tank/rebalanced/dataset-x tank/dataset-x
 

root@nova:~ # zfs destroy -r tank/rebalanced

root@nova:~ # zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
tank             156M  75.4G   176K  /tank
tank/dataset-x   153M  75.4G   153M  /tank/dataset-x


***

Now, let's check whether the pool is balanced. It looks good: each mirror VDEV now holds roughly 40M.

root@nova:~ # zpool list -v
NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank            78G   157M  77.8G        -         -     0%     0%  1.00x  ONLINE  -
  mirror      19.5G  38.6M  19.5G        -         -     0%     0%
    vtbd9         -      -      -        -         -      -      -
    vtbd10        -      -      -        -         -      -      -
  mirror      19.5G  40.3M  19.5G        -         -     0%     0%
    vtbd11        -      -      -        -         -      -      -
    vtbd12        -      -      -        -         -      -      -
  mirror      19.5G  38.7M  19.5G        -         -     0%     0%
    vtbd13        -      -      -        -         -      -      -
    vtbd14        -      -      -        -         -      -      -
  mirror      19.5G  39.6M  19.5G        -         -     0%     0%
    vtbd15        -      -      -        -         -      -      -
    vtbd16        -      -      -        -         -      -      -


***


Finally, verify the integrity of the files again.

Strangely, it looks like we may have hit a ZFS bug! Oh no.

root@nova:~ # ls /tank/dataset-x/ | wc -l
0

But have no fear, ZFS guarantees integrity. Most likely the dataset is simply not mounted: the received copy could not be mounted while the original still occupied /tank/dataset-x, and zfs rename does not mount a dataset that was not already mounted.
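
A quick look at the mounted property should confirm this before remounting:

root@nova:~ # zfs get mounted,mountpoint tank/dataset-x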

root@nova:~ # zfs mount tank/dataset-x

root@nova:~ # diff -r -u /boot /tank/dataset-x/boot
Binary files /boot/zfs/zpool.cache and /tank/dataset-x/boot/zfs/zpool.cache differ


Everything seems in order.



***

We now have double the capacity.

root@nova:~ # df -h /tank/dataset-x/
Filesystem        Size    Used   Avail Capacity  Mounted on
tank/dataset-x     76G    153M     75G     0%    /tank/dataset-x



***

While preparing this post, I found a bug:

Bug 247188 - "camcontrol devlist" does not show zvol-backed virtio devices
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=247188
