Changing the capacity of each VDEV in a ZPOOL with ZFS, without losing data and without downtime


Before we start, let's define some acronyms.

FreeBSD: Free Berkeley Software Distribution
ZFS: Zettabyte File System
ZPOOL: ZFS POOL
DEV: DEVice
VDEV: Virtual DEVice
GEOM: disk GEOMetry
BHYVE: BSD HYperVisor
ZVOL: ZFS VOLume

***

I have been tinkering with FreeBSD, ZFS, GEOM, BHYVE, and other interesting technologies.

***

Let's experiment with changing the capacity of each VDEV of a ZPOOL, without losing data and without downtime, using ZFS.

To do so, you need a set of bigger disks, one for each disk being replaced.
For example, if the ZPOOL has 2 mirror VDEVs with 2 10-GB disks each (total: 4 10-GB disks), and you want to upgrade to 20-GB disks, then you need 4 20-GB disks for the upgrade.


Figure 1: Live upgrade of disks in a ZPOOL

ZPOOL tank                          ZPOOL tank
    VDEV mirror-0                       VDEV mirror-0
        DEV /dev/vtbd1                      DEV /dev/vtbd9
        DEV /dev/vtbd2      --->            DEV /dev/vtbd10
    VDEV mirror-1                       VDEV mirror-1
        DEV /dev/vtbd3                      DEV /dev/vtbd11
        DEV /dev/vtbd4                      DEV /dev/vtbd12
    VDEV mirror-2                       VDEV mirror-2
        DEV /dev/vtbd5                      DEV /dev/vtbd13
        DEV /dev/vtbd6                      DEV /dev/vtbd14
    VDEV mirror-3                       VDEV mirror-3
        DEV /dev/vtbd7                      DEV /dev/vtbd15
        DEV /dev/vtbd8                      DEV /dev/vtbd16
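
The vtbdN devices above are VirtIO block devices seen from inside a bhyve guest, typically backed by ZVOLs on the host. If you would rather reproduce the experiment without a VM, a minimal sketch using file-backed md(4) devices (the image paths are hypothetical) could look like this; mdconfig prints the name of the md device it attaches:

root@nova:~ # truncate -s 10G /var/tmp/disk1.img
root@nova:~ # mdconfig -a -t vnode -f /var/tmp/disk1.img
md0

Repeat for as many test disks as needed, and substitute the mdN names for the vtbdN names below.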


***

Create a ZPOOL named tank with four mirror VDEVs.

root@nova:~ # zpool create tank \
    mirror vtbd1 vtbd2 \
    mirror vtbd3 vtbd4 \
    mirror vtbd5 vtbd6 \
    mirror vtbd7 vtbd8


root@nova:~ # zpool status
  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            vtbd1   ONLINE       0     0     0
            vtbd2   ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            vtbd3   ONLINE       0     0     0
            vtbd4   ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            vtbd5   ONLINE       0     0     0
            vtbd6   ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            vtbd7   ONLINE       0     0     0
            vtbd8   ONLINE       0     0     0

errors: No known data errors
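
Before going further, it can be worth confirming the size of each disk with diskinfo(8). A sketch, with the output trimmed to the relevant lines:

root@nova:~ # diskinfo -v /dev/vtbd1
/dev/vtbd1
        512             # sectorsize
        10737418240     # mediasize in bytes (10G)
        ...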

***

Add data to it.

root@nova:~ # zfs create -o mountpoint=/tank/dataset-x tank/dataset-x

root@nova:~ # rsync -av /boot /tank/dataset-x/
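
The diff against the live /boot later in this post is one way to verify the data; a checksum manifest taken now makes the later comparison independent of the live files. A sketch, with a hypothetical manifest path:

root@nova:~ # (cd /tank/dataset-x && find boot -type f | sort | xargs sha256 -r) > /root/before.sha256

After the upgrade, regenerate the manifest and diff the two files.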


***

Now, we see that the ZPOOL tank is using 155M, with data on each VDEV.


root@nova:~ # df -h /tank/dataset-x/
Filesystem        Size    Used   Avail Capacity  Mounted on
tank/dataset-x     37G    153M     37G     0%    /tank/dataset-x


root@nova:~ # zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
tank             154M  36.7G   176K  /tank
tank/dataset-x   153M  36.7G   153M  /tank/dataset-x
 

root@nova:~ # zpool list -v
NAME         SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank          38G   155M  37.8G        -         -     0%     0%  1.00x  ONLINE  -
  mirror    9.50G  41.0M  9.46G        -         -     0%     0%
    vtbd1       -      -      -        -         -      -      -
    vtbd2       -      -      -        -         -      -      -
  mirror    9.50G  41.9M  9.46G        -         -     0%     0%
    vtbd3       -      -      -        -         -      -      -
    vtbd4       -      -      -        -         -      -      -
  mirror    9.50G  36.4M  9.46G        -         -     0%     0%
    vtbd5       -      -      -        -         -      -      -
    vtbd6       -      -      -        -         -      -      -
  mirror    9.50G  35.4M  9.47G        -         -     0%     0%
    vtbd7       -      -      -        -         -      -      -
    vtbd8       -      -      -        -         -      -      -

***

Let's upgrade the 10-GB disks to 20-GB disks to double the capacity.
We will do it in-place, without putting the service offline, and without creating a new temporary pool.
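
One caveat: removing a top-level VDEV requires the device_removal pool feature, and it only works when the pool contains no raidz top-level VDEV. A sketch of checking the feature (the VALUE column should read enabled, or active once a removal has happened; the exact output may differ):

root@nova:~ # zpool get feature@device_removal tank
NAME  PROPERTY                VALUE    SOURCE
tank  feature@device_removal  enabled  local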

***

Start with mirror-0: remove it from the pool. The data it holds will be evacuated to the remaining VDEVs.

root@nova:~ # zpool status
  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            vtbd1   ONLINE       0     0     0
            vtbd2   ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            vtbd3   ONLINE       0     0     0
            vtbd4   ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            vtbd5   ONLINE       0     0     0
            vtbd6   ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            vtbd7   ONLINE       0     0     0
            vtbd8   ONLINE       0     0     0


root@nova:~ # zpool remove tank mirror-0
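
On a pool with more data, the evacuation takes a while; you can watch the data move off the old mirror with zpool iostat (a sketch; interrupt it with Ctrl+C):

root@nova:~ # zpool iostat -v tank 1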
 

root@nova:~ # zpool status
  pool: tank
 state: ONLINE
  scan: none requested
remove: Removal of vdev 0 copied 41.0M in 0h0m, completed on Thu Jun 11 20:46:12 2020
    2.06K memory used for removed device mappings
config:

        NAME          STATE     READ WRITE CKSUM
        tank          ONLINE       0     0     0
          mirror-1    ONLINE       0     0     0
            vtbd3     ONLINE       0     0     0
            vtbd4     ONLINE       0     0     0
          mirror-2    ONLINE       0     0     0
            vtbd5     ONLINE       0     0     0
            vtbd6     ONLINE       0     0     0
          mirror-3    ONLINE       0     0     0
            vtbd7     ONLINE       0     0     0
            vtbd8     ONLINE       0     0     0

errors: No known data errors


***

We can see that ZFS is smart: it evacuated the data that was on mirror-0 to the other ONLINE VDEVs of the ZPOOL tank. The "memory used for removed device mappings" line refers to the indirect mapping table that ZFS keeps so that blocks evacuated from the removed VDEV can still be located at their new addresses.


root@nova:~ # zpool list -v
NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank          28.5G   156M  28.3G        -         -     0%     0%  1.00x  ONLINE  -
  mirror      9.50G  57.2M  9.44G        -         -     0%     0%
    vtbd3         -      -      -        -         -      -      -
    vtbd4         -      -      -        -         -      -      -
  mirror      9.50G  50.2M  9.45G        -         -     0%     0%
    vtbd5         -      -      -        -         -      -      -
    vtbd6         -      -      -        -         -      -      -
  mirror      9.50G  48.7M  9.45G        -         -     0%     0%
    vtbd7         -      -      -        -         -      -      -
    vtbd8         -      -      -        -         -      -      -


***

Now, add a new mirror VDEV with the 20-GB disks to replace mirror-0. Note that the new VDEV comes up as mirror-4: VDEV ids are not reused.
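
If you want to double-check the resulting layout first, zpool add accepts -n for a dry run that prints the would-be configuration without modifying the pool. A sketch:

root@nova:~ # zpool add -n tank mirror vtbd9 vtbd10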



root@nova:~ # zpool add tank mirror vtbd9 vtbd10

root@nova:~ # zpool status
  pool: tank
 state: ONLINE
  scan: none requested
remove: Removal of vdev 0 copied 41.0M in 0h0m, completed on Thu Jun 11 20:46:12 2020
    2.06K memory used for removed device mappings
config:

        NAME          STATE     READ WRITE CKSUM
        tank          ONLINE       0     0     0
          mirror-1    ONLINE       0     0     0
            vtbd3     ONLINE       0     0     0
            vtbd4     ONLINE       0     0     0
          mirror-2    ONLINE       0     0     0
            vtbd5     ONLINE       0     0     0
            vtbd6     ONLINE       0     0     0
          mirror-3    ONLINE       0     0     0
            vtbd7     ONLINE       0     0     0
            vtbd8     ONLINE       0     0     0
          mirror-4    ONLINE       0     0     0
            vtbd9     ONLINE       0     0     0
            vtbd10    ONLINE       0     0     0

errors: No known data errors


root@nova:~ # zpool list -v
NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank            48G   157M  47.8G        -         -     0%     0%  1.00x  ONLINE  -
  mirror      9.50G  57.2M  9.44G        -         -     0%     0%
    vtbd3         -      -      -        -         -      -      -
    vtbd4         -      -      -        -         -      -      -
  mirror      9.50G  50.3M  9.45G        -         -     0%     0%
    vtbd5         -      -      -        -         -      -      -
    vtbd6         -      -      -        -         -      -      -
  mirror      9.50G  48.5M  9.45G        -         -     0%     0%
    vtbd7         -      -      -        -         -      -      -
    vtbd8         -      -      -        -         -      -      -
  mirror      19.5G   552K  19.5G        -         -     0%     0%
    vtbd9         -      -      -        -         -      -      -
    vtbd10        -      -      -        -         -      -      -


***

Do the same thing for the remaining 10-GB mirrors, waiting for each removal to complete before starting the next; a scripted version of these steps is sketched after the commands below.

root@nova:~ # zpool remove tank mirror-1
root@nova:~ # zpool add tank mirror vtbd11 vtbd12
root@nova:~ # zpool remove tank mirror-2
root@nova:~ # zpool add tank mirror vtbd13 vtbd14
root@nova:~ # zpool remove tank mirror-3
root@nova:~ # zpool add tank mirror vtbd15 vtbd16
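
For a pool with many VDEVs, these steps could be scripted. A minimal sh(1) sketch, assuming the vtbdN numbering used in this post; only one removal can be active at a time, so the script polls zpool status until each evacuation finishes:

#!/bin/sh
# Hypothetical helper: evacuate each remaining 10-GB mirror, then add
# its 20-GB replacement. Disk names assume this post's numbering.
set -e
vdev=1
for pair in "vtbd11 vtbd12" "vtbd13 vtbd14" "vtbd15 vtbd16"; do
    zpool remove tank "mirror-${vdev}"
    # zpool remove returns immediately; wait until zpool status no
    # longer reports an evacuation in progress.
    while zpool status tank | grep -q "in progress"; do
        sleep 5
    done
    zpool add tank mirror ${pair}
    vdev=$((vdev + 1))
done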

***

The only problem is that the existing data is not balanced across the VDEVs of the ZPOOL tank: ZFS directs new writes toward the emptier VDEVs, but it does not move data that has already been written. But hey, at least we did it in place.

root@nova:~ # zpool status
  pool: tank
 state: ONLINE
  scan: none requested
remove: Removal of vdev 3 copied 88.8M in 0h0m, completed on Thu Jun 11 20:55:49 2020
    11.7K memory used for removed device mappings
config:

        NAME          STATE     READ WRITE CKSUM
        tank          ONLINE       0     0     0
          mirror-4    ONLINE       0     0     0
            vtbd9     ONLINE       0     0     0
            vtbd10    ONLINE       0     0     0
          mirror-5    ONLINE       0     0     0
            vtbd11    ONLINE       0     0     0
            vtbd12    ONLINE       0     0     0
          mirror-6    ONLINE       0     0     0
            vtbd13    ONLINE       0     0     0
            vtbd14    ONLINE       0     0     0
          mirror-7    ONLINE       0     0     0
            vtbd15    ONLINE       0     0     0
            vtbd16    ONLINE       0     0     0

errors: No known data errors
 

root@nova:~ # zpool list -v
NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank            78G   157M  77.8G        -         -     0%     0%  1.00x  ONLINE  -
  mirror      19.5G  69.9M  19.4G        -         -     0%     0%
    vtbd9         -      -      -        -         -      -      -
    vtbd10        -      -      -        -         -      -      -
  mirror      19.5G  56.2M  19.4G        -         -     0%     0%
    vtbd11        -      -      -        -         -      -      -
    vtbd12        -      -      -        -         -      -      -
  mirror      19.5G    31M  19.5G        -         -     0%     0%
    vtbd13        -      -      -        -         -      -      -
    vtbd14        -      -      -        -         -      -      -
  mirror      19.5G   128K  19.5G        -         -     0%     0%
    vtbd15        -      -      -        -         -      -      -
    vtbd16        -      -      -        -         -      -      -


***

Now, verify the integrity of the data. The only file that changed is /boot/zfs/zpool.cache, which is expected: the live file is rewritten whenever the pool configuration changes. It also means that we should not have used the files in /boot as test data in the first place...

root@nova:~ # diff -r -u /boot /tank/dataset-x/boot
Binary files /boot/zfs/zpool.cache and /tank/dataset-x/boot/zfs/zpool.cache differ


***

Let's rebalance the ZPOOL, the ZFS way. We simply need to create a recursive snapshot, send it as a replication stream (-R), and receive it into a new dataset on the same pool; the received copy gets written across all of the current VDEVs.

root@nova:~ # zfs snapshot -r tank@2020-06-11-2111-0001
 

root@nova:~ # zfs list -t snapshot
NAME                                  USED  AVAIL  REFER  MOUNTPOINT
tank@2020-06-11-2111-0001                0      -   176K  -
tank/dataset-x@2020-06-11-2111-0001      0      -   153M  -
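
On a larger pool, it may be worth estimating the size of the replication stream before sending it; zfs send accepts -n (dry run) and -v (verbose) for that. A sketch:

root@nova:~ # zfs send -R -nv tank@2020-06-11-2111-0001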


root@nova:~ # zfs send -R tank@2020-06-11-2111-0001 | zfs recv tank/rebalanced

root@nova:~ # zpool list -v
NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank            78G   314M  77.7G        -         -     0%     0%  1.00x  ONLINE  -
  mirror      19.5G   108M  19.4G        -         -     0%     0%
    vtbd9         -      -      -        -         -      -      -
    vtbd10        -      -      -        -         -      -      -
  mirror      19.5G  96.1M  19.4G        -         -     0%     0%
    vtbd11        -      -      -        -         -      -      -
    vtbd12        -      -      -        -         -      -      -
  mirror      19.5G  69.3M  19.4G        -         -     0%     0%
    vtbd13        -      -      -        -         -      -      -
    vtbd14        -      -      -        -         -      -      -
  mirror      19.5G  41.0M  19.5G        -         -     0%     0%
    vtbd15        -      -      -        -         -      -      -
    vtbd16        -      -      -        -         -      -      -
 

root@nova:~ # zfs list
NAME                        USED  AVAIL  REFER  MOUNTPOINT
tank                        310M  75.3G   176K  /tank
tank/dataset-x              153M  75.3G   153M  /tank/dataset-x
tank/rebalanced             153M  75.3G   176K  /tank/rebalanced
tank/rebalanced/dataset-x   153M  75.3G   153M  /tank/dataset-x


The only thing left to do is to destroy the old dataset and rename the rebalanced one into its place.

root@nova:~ # zfs destroy -r tank/dataset-x

root@nova:~ # zfs rename tank/rebalanced/dataset-x tank/dataset-x
 

root@nova:~ # zfs destroy -r tank/rebalanced

root@nova:~ # zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
tank             156M  75.4G   176K  /tank
tank/dataset-x   153M  75.4G   153M  /tank/dataset-x
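
Note that the replication snapshots taken earlier still exist (zfs list does not show snapshots by default). Once everything checks out, they can be removed recursively:

root@nova:~ # zfs destroy -r tank@2020-06-11-2111-0001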


***

Now, let's check whether the pool is balanced. It looks good: the allocated space is spread almost evenly across the four VDEVs.

root@nova:~ # zpool list -v
NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank            78G   157M  77.8G        -         -     0%     0%  1.00x  ONLINE  -
  mirror      19.5G  38.6M  19.5G        -         -     0%     0%
    vtbd9         -      -      -        -         -      -      -
    vtbd10        -      -      -        -         -      -      -
  mirror      19.5G  40.3M  19.5G        -         -     0%     0%
    vtbd11        -      -      -        -         -      -      -
    vtbd12        -      -      -        -         -      -      -
  mirror      19.5G  38.7M  19.5G        -         -     0%     0%
    vtbd13        -      -      -        -         -      -      -
    vtbd14        -      -      -        -         -      -      -
  mirror      19.5G  39.6M  19.5G        -         -     0%     0%
    vtbd15        -      -      -        -         -      -      -
    vtbd16        -      -      -        -         -      -      -


***


Finally, verify the integrity of the files again.

Strangely, it looks like we may have hit a ZFS bug! Oh no.

root@nova:~ # ls /tank/dataset-x/ | wc -l
0

But have no fear: this is not a bug, and ZFS guarantees integrity. The renamed dataset was simply left unmounted; mounting it brings the files back.

root@nova:~ # zfs mount tank/dataset-x

root@nova:~ # diff -r -u /boot /tank/dataset-x/boot
Binary files /boot/zfs/zpool.cache and /tank/dataset-x/boot/zfs/zpool.cache differ


Everything seems in order.
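
To avoid the unmounted-dataset surprise in the first place, one option (a sketch) is to receive the stream with -u, which skips mounting, and then mount explicitly after the rename:

root@nova:~ # zfs send -R tank@2020-06-11-2111-0001 | zfs recv -u tank/rebalanced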



***

We now have more capacity.

root@nova:~ # df -h /tank/dataset-x/
Filesystem        Size    Used   Avail Capacity  Mounted on
tank/dataset-x     76G    153M     75G     0%    /tank/dataset-x



***

While preparing this post, I found a bug.

See bug 247188: "camcontrol devlist" does not show zvol-backed virtio devices
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=247188
