Changing the capacity of each VDEV in a ZPOOL with ZFS, without losing data and without downtime
Before we start, let's define some acronyms.
FreeBSD: Free Berkeley Software Distribution
ZFS: Zettabyte File System
ZPOOL: ZFS POOL
DEV: DEVice
VDEV: Virtual DEVice
GEOM: disk GEOMetry
BHYVE: BSD HYperVisor
ZVOL: ZFS VOLume
***
I have been tinkering with FreeBSD, ZFS, GEOM, BHYVE, and other interesting technologies.
***
Let's experiment on how to change the capacity of each VDEV of a ZPOOL without losing data and no downtime with ZFS.
To do so, you need a set of bigger disks.
For example, if the ZPOOL has 2 mirror VDEVs with 2 10-GB disks each (4 10-GB disks in total) and you want to upgrade to 20-GB disks, then you need 4 20-GB disks for the upgrade.
Figure 1: Live upgrade of disks in a ZPOOL

ZPOOL tank                      ZPOOL tank
  VDEV mirror-0                   VDEV mirror-0
    DEV /dev/vtbd1                  DEV /dev/vtbd9
    DEV /dev/vtbd2      --->        DEV /dev/vtbd10
  VDEV mirror-1                   VDEV mirror-1
    DEV /dev/vtbd3                  DEV /dev/vtbd11
    DEV /dev/vtbd4                  DEV /dev/vtbd12
  VDEV mirror-2                   VDEV mirror-2
    DEV /dev/vtbd5                  DEV /dev/vtbd13
    DEV /dev/vtbd6                  DEV /dev/vtbd14
  VDEV mirror-3                   VDEV mirror-3
    DEV /dev/vtbd7                  DEV /dev/vtbd15
    DEV /dev/vtbd8                  DEV /dev/vtbd16
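This experiment runs inside a bhyve guest, which is why the disks show up as virtio block devices (/dev/vtbd*). If you want to reproduce it, one approach, sketched here with made-up host pool and dataset names, is to back each guest disk with a ZVOL on the host and attach it to the VM as a virtio-blk device:
root@host:~ # zfs create -V 10G zroot/vm/nova/disk1
root@host:~ # zfs create -V 10G zroot/vm/nova/disk2
root@host:~ # zfs create -V 20G zroot/vm/nova/disk9
Repeat for the other 10-GB and 20-GB disks; inside the guest, the volumes appear as /dev/vtbd1, /dev/vtbd2, and so on, in the order they are attached.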
***
Create a ZPOOL.
root@nova:~ # zpool create tank \
mirror vtbd1 vtbd2 \
mirror vtbd3 vtbd4 \
mirror vtbd5 vtbd6 \
mirror vtbd7 vtbd8
root@nova:~ # zpool status
pool: tank
state: ONLINE
scan: none requested
config:
NAME        STATE     READ WRITE CKSUM
tank        ONLINE       0     0     0
  mirror-0  ONLINE       0     0     0
    vtbd1   ONLINE       0     0     0
    vtbd2   ONLINE       0     0     0
  mirror-1  ONLINE       0     0     0
    vtbd3   ONLINE       0     0     0
    vtbd4   ONLINE       0     0     0
  mirror-2  ONLINE       0     0     0
    vtbd5   ONLINE       0     0     0
    vtbd6   ONLINE       0     0     0
  mirror-3  ONLINE       0     0     0
    vtbd7   ONLINE       0     0     0
    vtbd8   ONLINE       0     0     0
errors: No known data errors
***
Add data to it.
root@nova:~ # zfs create -o mountpoint=/tank/dataset-x tank/dataset-x
root@nova:~ # rsync -av /boot /tank/dataset-x/
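Before touching the pool, it does not hurt to record a checksum manifest of the source data so we can verify the copy later. This is not part of the original procedure, just a sketch using mtree(8) from the FreeBSD base system, with /tmp/boot.mtree as an arbitrary output file:
root@nova:~ # mtree -c -K sha256digest -p /boot > /tmp/boot.mtree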
***
Now, we see that the ZPOOL tank is using 155M, with data on each VDEV.
root@nova:~ # df -h /tank/dataset-x/
Filesystem Size Used Avail Capacity Mounted on
tank/dataset-x 37G 153M 37G 0% /tank/dataset-x
root@nova:~ # zfs list
NAME USED AVAIL REFER MOUNTPOINT
tank 154M 36.7G 176K /tank
tank/dataset-x 153M 36.7G 153M /tank/dataset-x
root@nova:~ # zpool list -v
NAME         SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG   CAP  DEDUP  HEALTH  ALTROOT
tank          38G   155M  37.8G        -         -    0%    0%  1.00x  ONLINE  -
  mirror    9.50G  41.0M  9.46G        -         -    0%    0%
    vtbd1       -      -      -        -         -     -     -
    vtbd2       -      -      -        -         -     -     -
  mirror    9.50G  41.9M  9.46G        -         -    0%    0%
    vtbd3       -      -      -        -         -     -     -
    vtbd4       -      -      -        -         -     -     -
  mirror    9.50G  36.4M  9.46G        -         -    0%    0%
    vtbd5       -      -      -        -         -     -     -
    vtbd6       -      -      -        -         -     -     -
  mirror    9.50G  35.4M  9.47G        -         -    0%    0%
    vtbd7       -      -      -        -         -     -     -
    vtbd8       -      -      -        -         -     -     -
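Another way to get the same per-VDEV view, and to watch the traffic while the upgrade below is running, is zpool iostat; the trailing 5 is just a refresh interval in seconds:
root@nova:~ # zpool iostat -v tank 5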
***
Let's upgrade the 10-GB disks to 20-GB disks to double the capacity.
We will do it in-place, without putting the service offline, and without creating a new temporary pool.
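The trick relies on top-level VDEV removal (zpool remove of a whole mirror), which needs the device_removal pool feature and, as far as I know, only works when the top-level VDEVs are mirrors or single disks, not RAIDZ. A quick check before starting (the feature name is the one used by FreeBSD 12.x and OpenZFS):
root@nova:~ # zpool get feature@device_removal tank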
***
Replace the disks of mirror-0.
root@nova:~ # zpool status
pool: tank
state: ONLINE
scan: none requested
config:
NAME        STATE     READ WRITE CKSUM
tank        ONLINE       0     0     0
  mirror-0  ONLINE       0     0     0
    vtbd1   ONLINE       0     0     0
    vtbd2   ONLINE       0     0     0
  mirror-1  ONLINE       0     0     0
    vtbd3   ONLINE       0     0     0
    vtbd4   ONLINE       0     0     0
  mirror-2  ONLINE       0     0     0
    vtbd5   ONLINE       0     0     0
    vtbd6   ONLINE       0     0     0
  mirror-3  ONLINE       0     0     0
    vtbd7   ONLINE       0     0     0
    vtbd8   ONLINE       0     0     0
root@nova:~ # zpool remove tank mirror-0
root@nova:~ # zpool status
pool: tank
state: ONLINE
scan: none requested
remove: Removal of vdev 0 copied 41.0M in 0h0m, completed on Thu Jun 11 20:46:12 2020
2.06K memory used for removed device mappings
config:
NAME        STATE     READ WRITE CKSUM
tank        ONLINE       0     0     0
  mirror-1  ONLINE       0     0     0
    vtbd3   ONLINE       0     0     0
    vtbd4   ONLINE       0     0     0
  mirror-2  ONLINE       0     0     0
    vtbd5   ONLINE       0     0     0
    vtbd6   ONLINE       0     0     0
  mirror-3  ONLINE       0     0     0
    vtbd7   ONLINE       0     0     0
    vtbd8   ONLINE       0     0     0
errors: No known data errors
***
We can see that ZFS is smart: it copied the data that was on mirror-0 to the remaining ONLINE VDEVs of the ZPOOL tank.
root@nova:~ # zpool list -v
NAME         SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG   CAP  DEDUP  HEALTH  ALTROOT
tank        28.5G   156M  28.3G        -         -    0%    0%  1.00x  ONLINE  -
  mirror    9.50G  57.2M  9.44G        -         -    0%    0%
    vtbd3       -      -      -        -         -     -     -
    vtbd4       -      -      -        -         -     -     -
  mirror    9.50G  50.2M  9.45G        -         -    0%    0%
    vtbd5       -      -      -        -         -     -     -
    vtbd6       -      -      -        -         -     -     -
  mirror    9.50G  48.7M  9.45G        -         -    0%    0%
    vtbd7       -      -      -        -         -     -     -
    vtbd8       -      -      -        -         -     -     -
***
Now, add the new 20-GB disks for mirror-0.
root@nova:~ # zpool add tank mirror vtbd9 vtbd10
root@nova:~ # zpool status
pool: tank
state: ONLINE
scan: none requested
remove: Removal of vdev 0 copied 41.0M in 0h0m, completed on Thu Jun 11 20:46:12 2020
2.06K memory used for removed device mappings
config:
NAME        STATE     READ WRITE CKSUM
tank        ONLINE       0     0     0
  mirror-1  ONLINE       0     0     0
    vtbd3   ONLINE       0     0     0
    vtbd4   ONLINE       0     0     0
  mirror-2  ONLINE       0     0     0
    vtbd5   ONLINE       0     0     0
    vtbd6   ONLINE       0     0     0
  mirror-3  ONLINE       0     0     0
    vtbd7   ONLINE       0     0     0
    vtbd8   ONLINE       0     0     0
  mirror-4  ONLINE       0     0     0
    vtbd9   ONLINE       0     0     0
    vtbd10  ONLINE       0     0     0
errors: No known data errors
root@nova:~ # zpool list -v
NAME         SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG   CAP  DEDUP  HEALTH  ALTROOT
tank          48G   157M  47.8G        -         -    0%    0%  1.00x  ONLINE  -
  mirror    9.50G  57.2M  9.44G        -         -    0%    0%
    vtbd3       -      -      -        -         -     -     -
    vtbd4       -      -      -        -         -     -     -
  mirror    9.50G  50.3M  9.45G        -         -    0%    0%
    vtbd5       -      -      -        -         -     -     -
    vtbd6       -      -      -        -         -     -     -
  mirror    9.50G  48.5M  9.45G        -         -    0%    0%
    vtbd7       -      -      -        -         -     -     -
    vtbd8       -      -      -        -         -     -     -
  mirror    19.5G   552K  19.5G        -         -    0%    0%
    vtbd9       -      -      -        -         -     -     -
    vtbd10      -      -      -        -         -     -     -
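A small aside that is not in the original run: zpool add is easy to get wrong (for example adding a single disk instead of a mirror), and a mistaken top-level VDEV is annoying to undo. It can be worth double-checking the size of the new disks with diskinfo(8) and doing a dry run with the -n flag of zpool add, which prints the resulting configuration without modifying the pool:
root@nova:~ # diskinfo -v /dev/vtbd11
root@nova:~ # zpool add -n tank mirror vtbd11 vtbd12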
***
Do the same thing for the other 10-GB disks.
root@nova:~ # zpool remove tank mirror-1
root@nova:~ # zpool add tank mirror vtbd11 vtbd12
root@nova:~ # zpool remove tank mirror-2
root@nova:~ # zpool add tank mirror vtbd13 vtbd14
root@nova:~ # zpool remove tank mirror-3
root@nova:~ # zpool add tank mirror vtbd15 vtbd16
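On a pool this small each evacuation finishes instantly, but on a real pool zpool remove returns while the copy continues in the background, so each removal should be allowed to finish before the next mirror is removed. A rough way to wait, assuming zpool status prints an "in progress" line while the removal is running (newer OpenZFS also has zpool wait -t remove tank for this):
root@nova:~ # while zpool status tank | grep -q 'in progress'; do sleep 10; done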
***
The only problem is that the data is not balanced across the VDEVs of the ZPOOL tank. But hey, at least we did it in place.
root@nova:~ # zpool status
pool: tank
state: ONLINE
scan: none requested
remove: Removal of vdev 3 copied 88.8M in 0h0m, completed on Thu Jun 11 20:55:49 2020
11.7K memory used for removed device mappings
config:
NAME        STATE     READ WRITE CKSUM
tank        ONLINE       0     0     0
  mirror-4  ONLINE       0     0     0
    vtbd9   ONLINE       0     0     0
    vtbd10  ONLINE       0     0     0
  mirror-5  ONLINE       0     0     0
    vtbd11  ONLINE       0     0     0
    vtbd12  ONLINE       0     0     0
  mirror-6  ONLINE       0     0     0
    vtbd13  ONLINE       0     0     0
    vtbd14  ONLINE       0     0     0
  mirror-7  ONLINE       0     0     0
    vtbd15  ONLINE       0     0     0
    vtbd16  ONLINE       0     0     0
errors: No known data errors
root@nova:~ # zpool list -v
NAME         SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG   CAP  DEDUP  HEALTH  ALTROOT
tank          78G   157M  77.8G        -         -    0%    0%  1.00x  ONLINE  -
  mirror    19.5G  69.9M  19.4G        -         -    0%    0%
    vtbd9       -      -      -        -         -     -     -
    vtbd10      -      -      -        -         -     -     -
  mirror    19.5G  56.2M  19.4G        -         -    0%    0%
    vtbd11      -      -      -        -         -     -     -
    vtbd12      -      -      -        -         -     -     -
  mirror    19.5G    31M  19.5G        -         -    0%    0%
    vtbd13      -      -      -        -         -     -     -
    vtbd14      -      -      -        -         -     -     -
  mirror    19.5G   128K  19.5G        -         -    0%    0%
    vtbd15      -      -      -        -         -     -     -
    vtbd16      -      -      -        -         -     -     -
***
Now, verify the integrity of the data. The only file that changed is /boot/zfs/zpool.cache, which means that we should not have used the files in /boot in the first place...
root@nova:~ # diff -r -u /boot /tank/dataset-x/boot
Binary files /boot/zfs/zpool.cache and /tank/dataset-x/boot/zfs/zpool.cache differ
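If you recorded an mtree manifest earlier (see the sketch above), the copy can also be checked against it; mtree prints only the entries that differ, so apart from zpool.cache it should stay quiet:
root@nova:~ # mtree -f /tmp/boot.mtree -p /tank/dataset-x/boot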
***
Let's rebalance the ZPOOL, the ZFS way. We simply create a recursive snapshot, send it as a replication stream, and receive it into a new dataset.
root@nova:~ # zfs snapshot -r tank@2020-06-11-2111-0001
root@nova:~ # zfs list -t snapshot
NAME USED AVAIL REFER MOUNTPOINT
tank@2020-06-11-2111-0001 0 - 176K -
tank/dataset-x@2020-06-11-2111-0001 0 - 153M -
root@nova:~ # zfs send -R tank@2020-06-11-2111-0001 | zfs recv tank/rebalanced
root@nova:~ # zpool list -v
NAME         SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG   CAP  DEDUP  HEALTH  ALTROOT
tank          78G   314M  77.7G        -         -    0%    0%  1.00x  ONLINE  -
  mirror    19.5G   108M  19.4G        -         -    0%    0%
    vtbd9       -      -      -        -         -     -     -
    vtbd10      -      -      -        -         -     -     -
  mirror    19.5G  96.1M  19.4G        -         -    0%    0%
    vtbd11      -      -      -        -         -     -     -
    vtbd12      -      -      -        -         -     -     -
  mirror    19.5G  69.3M  19.4G        -         -    0%    0%
    vtbd13      -      -      -        -         -     -     -
    vtbd14      -      -      -        -         -     -     -
  mirror    19.5G  41.0M  19.5G        -         -    0%    0%
    vtbd15      -      -      -        -         -     -     -
    vtbd16      -      -      -        -         -     -     -
root@nova:~ # zfs list
NAME USED AVAIL REFER MOUNTPOINT
tank 310M 75.3G 176K /tank
tank/dataset-x 153M 75.3G 153M /tank/dataset-x
tank/rebalanced 153M 75.3G 176K /tank/rebalanced
tank/rebalanced/dataset-x 153M 75.3G 153M /tank/dataset-x
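One caveat with this rebalancing step: anything written to tank/dataset-x after the snapshot is not in tank/rebalanced. On a busy dataset you would take a second snapshot and send it incrementally just before switching over, something like this (the second snapshot name is made up, and zfs recv may need -F if the replica was touched in between):
root@nova:~ # zfs snapshot -r tank@2020-06-11-2111-0002
root@nova:~ # zfs send -R -i @2020-06-11-2111-0001 tank@2020-06-11-2111-0002 | zfs recv tank/rebalanced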
The only thing left to do is to destroy the old dataset and rename the rebalanced one.
root@nova:~ # zfs destroy -r tank/dataset-x
root@nova:~ # zfs rename tank/rebalanced/dataset-x tank/dataset-x
root@nova:~ # zfs destroy -r tank/rebalanced
root@nova:~ # zfs list
NAME USED AVAIL REFER MOUNTPOINT
tank 156M 75.4G 176K /tank
tank/dataset-x 153M 75.4G 153M /tank/dataset-x
***
Now, let's check if the pool is balanced. It looks good.
root@nova:~ # zpool list -v
NAME         SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG   CAP  DEDUP  HEALTH  ALTROOT
tank          78G   157M  77.8G        -         -    0%    0%  1.00x  ONLINE  -
  mirror    19.5G  38.6M  19.5G        -         -    0%    0%
    vtbd9       -      -      -        -         -     -     -
    vtbd10      -      -      -        -         -     -     -
  mirror    19.5G  40.3M  19.5G        -         -    0%    0%
    vtbd11      -      -      -        -         -     -     -
    vtbd12      -      -      -        -         -     -     -
  mirror    19.5G  38.7M  19.5G        -         -    0%    0%
    vtbd13      -      -      -        -         -     -     -
    vtbd14      -      -      -        -         -     -     -
  mirror    19.5G  39.6M  19.5G        -         -    0%    0%
    vtbd15      -      -      -        -         -     -     -
    vtbd16      -      -      -        -         -     -     -
***
Finally, verify the integrity of the files again.
Strangely, it looks like we may have hit a ZFS bug! Oh no.
root@nova:~ # ls /tank/dataset-x/ | wc -l
0
But have no fear, ZFS guarantees integrity.
root@nova:~ # zfs mount tank/dataset-x
root@nova:~ # diff -r -u /boot /tank/dataset-x/boot
Binary files /boot/zfs/zpool.cache and /tank/dataset-x/boot/zfs/zpool.cache differ
Everything seems in order.
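For the record, the empty directory was not corruption: the renamed dataset was simply left unmounted, most likely because the replica could not be mounted at /tank/dataset-x while the original dataset still lived there. Something like this would have shown it (these are standard ZFS properties):
root@nova:~ # zfs get mounted,canmount,mountpoint tank/dataset-x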
***
We now have more capacity.
root@nova:~ # df -h /tank/dataset-x/
Filesystem Size Used Avail Capacity Mounted on
tank/dataset-x 76G 153M 75G 0% /tank/dataset-x
***
While preparing this post, I found a bug.
See #247188
Bug 247188 - "camcontrol devlist" does not show zvol-backed virtio devices
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=247188