This page documents my experiences in converting a 3-way RAIDZ1 pool into a 6-way RAIDZ2 pool (adding 3 new disks to the 3 original disks), with minimal downtime. The system boots off a UFS gmirror and this remains unchanged (though I have allocated equivalent space on the new disks to allow potential future conversion to ZFS).
In my case, I was expanding a pool named tank, using partition 5 of disks ada0, ada1 and ada2, by adding 3 new disks ada3, da0 and da1. (Despite the names, the latter are SATA disks, attached to a 3ware 9650SE-2LP since I'd run out of motherboard ports.)
The starting configuration was:

server# gpart show
=>        34  1953522988  ada0  GPT  (932G)
          34          94     1  freebsd-boot  (47K)
         128     6291456     2  freebsd-ufs  (3.0G)
     6291584     6291456     3  freebsd-swap  (3.0G)
    12583040     4194304     4  freebsd-ufs  (2.0G)
    16777344  1936745678     5  freebsd-zfs  (924G)

=>        34  1953522988  ada1  GPT  (932G)
          34          94     1  freebsd-boot  (47K)
         128     6291456     2  freebsd-ufs  (3.0G)
     6291584     6291456     3  freebsd-swap  (3.0G)
    12583040     4194304     4  freebsd-ufs  (2.0G)
    16777344  1936745678     5  freebsd-zfs  (924G)

=>        34  1953525101  ada2  GPT  (932G)
          34          94     1  freebsd-boot  (47K)
         128     6291456     2  freebsd-ufs  (3.0G)
     6291584     6291456     3  freebsd-swap  (3.0G)
    12583040     4194304     4  freebsd-ufs  (2.0G)
    16777344  1936747791     5  freebsd-zfs  (924G)

server# zpool status -v
  pool: tank
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: scrub completed after 14h22m with 0 errors on Thu Oct 28 18:22:28 2010
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada0p5  ONLINE       0     0     0
            ada1p5  ONLINE       0     0     0
            ada2p5  ONLINE       0     0     0

errors: No known data errors
And the final configuration, for comparison:

server% gpart show
=>        34  1953525101  da0  GPT  (932G)
          34          94    1  freebsd-boot  (47K)
         128    10485760    2  freebsd-zfs  (5.0G)
    10485888     6291456    3  freebsd-swap  (3.0G)
    16777344  1936747791    5  freebsd-zfs  (924G)

=>        34  1953525101  da1  GPT  (932G)
          34          94    1  freebsd-boot  (47K)
         128    10485760    2  freebsd-zfs  (5.0G)
    10485888     6291456    3  freebsd-swap  (3.0G)
    16777344  1936747791    5  freebsd-zfs  (924G)

=>        34  1953522988  ada0  GPT  (932G)
          34          94     1  freebsd-boot  (47K)
         128     6291456     2  freebsd-ufs  (3.0G)
     6291584     6291456     3  freebsd-swap  (3.0G)
    12583040     4194304     4  freebsd-ufs  (2.0G)
    16777344  1936745678     5  freebsd-zfs  (924G)

=>        34  1953522988  ada1  GPT  (932G)
          34          94     1  freebsd-boot  (47K)
         128     6291456     2  freebsd-ufs  (3.0G)
     6291584     6291456     3  freebsd-swap  (3.0G)
    12583040     4194304     4  freebsd-ufs  (2.0G)
    16777344  1936745678     5  freebsd-zfs  (924G)

=>        34  1953525101  ada2  GPT  (932G)
          34          94     1  freebsd-boot  (47K)
         128     6291456     2  freebsd-ufs  (3.0G)
     6291584     6291456     3  freebsd-swap  (3.0G)
    12583040     4194304     4  freebsd-ufs  (2.0G)
    16777344  1936747791     5  freebsd-zfs  (924G)

=>        34  1953525101  ada3  GPT  (932G)
          34          94     1  freebsd-boot  (47K)
         128    10485760     2  freebsd-zfs  (5.0G)
    10485888     6291456     3  freebsd-swap  (3.0G)
    16777344  1936747791     5  freebsd-zfs  (924G)

server% zpool status
  pool: tank
 state: ONLINE
 scrub: scrub completed after 3h54m with 0 errors on Sat Nov 6 16:35:22 2010
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            ada0p5  ONLINE       0     0     0
            ada1p5  ONLINE       0     0     0
            ada2p5  ONLINE       0     0     0
            ada3p5  ONLINE       0     0     0
            da0p5   ONLINE       0     0     0
            da1p5   ONLINE       0     0     0

errors: No known data errors
The overall process is:

1. Partition the new disks, splitting the space intended for ZFS on each into two equal halves.
2. Create the new 6-way RAIDZ2 pool ('tank2') from the six half-sized partitions.
3. Copy the bulk of the data from 'tank' to 'tank2' using a recursive snapshot and 'zfs send | zfs recv', while the system remains in normal use.
4. Drop to single-user mode, send the final incremental changes, and swap the pool names so that 'tank2' becomes 'tank'.
5. Replace the three temporary half-sized vdevs with the ZFS partitions on the original disks, one at a time, letting each resilver complete.
6. Delete the temporary partitions, grow the remaining ZFS partitions on the new disks to full size and let the pool expand.
In detail:
In my case, I have root and swap on the same disks, so I needed to carve out space for those. Even if you use the disks solely for ZFS, it's probably a good idea to partition them and leave a couple of MB unused, in case a replacement disk turns out to be slightly smaller. As shown above, the old disks have 5 partitions (boot, UFS root, swap, UFS /var and ZFS). My long-term plan is to switch to a ZFS root, so on the new disks I combined the space allocated to the two UFS partitions into one, but skipped p4 so that the ZFS partition remained at p5 for consistency.
Apart from the boot partition, all partitions are aligned on 8-sector boundaries to simplify a possible future migration to 4KiB-sector disks.
Initially, the space set aside for ZFS on each new disk is split into two equal pieces:
for i in da0 da1 ada3; do
	gpart add -b 34 -s 94 -t freebsd-boot $i                    # boot code
	gpart add -b 128 -s 10485760 -i 2 -t freebsd-zfs $i         # future ZFS root
	gpart add -b 10485888 -s 6291456 -i 3 -t freebsd-swap $i    # swap
	gpart add -b 16777344 -s 968373895 -i 5 -t freebsd-zfs $i   # first half of the ZFS area
	gpart add -b 985151239 -s 968373895 -i 6 -t freebsd-zfs $i  # second (temporary) half
	gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $i
done
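If a brand-new disk has no partition table at all, the gpart add commands will fail; in that case a GPT scheme needs to be created on each disk first, along the lines of:

for i in da0 da1 ada3; do gpart create -s gpt $i; done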
At this stage, my disk layout is:
# gpart show
=>        34  1953522988  ada0  GPT  (932G)
          34          94     1  freebsd-boot  (47K)
         128     6291456     2  freebsd-ufs  (3.0G)
     6291584     6291456     3  freebsd-swap  (3.0G)
    12583040     4194304     4  freebsd-ufs  (2.0G)
    16777344  1936745678     5  freebsd-zfs  (924G)

=>        34  1953522988  ada1  GPT  (932G)
          34          94     1  freebsd-boot  (47K)
         128     6291456     2  freebsd-ufs  (3.0G)
     6291584     6291456     3  freebsd-swap  (3.0G)
    12583040     4194304     4  freebsd-ufs  (2.0G)
    16777344  1936745678     5  freebsd-zfs  (924G)

=>        34  1953525101  ada2  GPT  (932G)
          34          94     1  freebsd-boot  (47K)
         128     6291456     2  freebsd-ufs  (3.0G)
     6291584     6291456     3  freebsd-swap  (3.0G)
    12583040     4194304     4  freebsd-ufs  (2.0G)
    16777344  1936747791     5  freebsd-zfs  (924G)

=>        34  1953525101  ada3  GPT  (932G)
          34          94     1  freebsd-boot  (47K)
         128    10485760     2  freebsd-zfs  (5.0G)
    10485888     6291456     3  freebsd-swap  (3.0G)
    16777344   968373895     5  freebsd-zfs  (462G)
   985151239   968373895     6  freebsd-zfs  (462G)
  1953525134           1        - free -  (512B)

=>        34  1953525101  da0  GPT  (932G)
          34          94    1  freebsd-boot  (47K)
         128    10485760    2  freebsd-zfs  (5.0G)
    10485888     6291456    3  freebsd-swap  (3.0G)
    16777344   968373895    5  freebsd-zfs  (462G)
   985151239   968373895    6  freebsd-zfs  (462G)
  1953525134           1       - free -  (512B)

=>        34  1953525101  da1  GPT  (932G)
          34          94    1  freebsd-boot  (47K)
         128    10485760    2  freebsd-zfs  (5.0G)
    10485888     6291456    3  freebsd-swap  (3.0G)
    16777344   968373895    5  freebsd-zfs  (462G)
   985151239   968373895    6  freebsd-zfs  (462G)
  1953525134           1       - free -  (512B)
If the new disks have previously been used, particularly for ZFS, it's a good idea to zero out the first and last 512KiB or more - which is where ZFS stores its vdev labels.
for i in da0 da1 ada3; do
	dd if=/dev/zero of=/dev/${i}p5 count=1024
	dd if=/dev/zero of=/dev/${i}p5 seek=968372000
	dd if=/dev/zero of=/dev/${i}p6 count=1024
	dd if=/dev/zero of=/dev/${i}p6 seek=968372000
done
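If you want to double-check that no stale labels survived, zdb can be asked to dump whatever labels it finds on a partition; after the zeroing above it should report that it fails to unpack all four labels, for example:

zdb -l /dev/da0p5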
I wanted my final pool configuration to have all the vdevs in alphabetical order, so I allocated the temporary vdevs first.
zpool create tank2 raidz2 da0p6 da1p6 ada3p6 ada3p5 da0p5 da1p5
zpool list
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
tank   2.70T  2.34T   369G    86%  ONLINE  -
tank2  2.70T   202K  2.70T     0%  ONLINE  -
By using ZFS snapshots, I can transfer the majority of the pool contents over to the new pool without impacting normal system operation. This significantly reduces the necessary system outage.
I recommend the use of ports/misc/mbuffer or similar between the “send” and “recv” to improve throughput (I used my own equivalent tool but have left it out of the following commands).
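For illustration, with mbuffer installed the full send below would become something like the following (the block size and buffer size here are arbitrary examples, not tuned values):

zfs send -R tank@20101104bu | mbuffer -s 128k -m 1G | zfs recv -vuF -d tank2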
Note that the '-u' option to 'zfs recv' is crucial - otherwise the filesystems copied to 'tank2' will be automatically mounted. Where any filesystems have 'mountpoint' specified, this would result in the 'tank2' filesystem being mounted over the equivalent 'tank' filesystem.
zfs snapshot -r tank@20101104bu
zfs send -R tank@20101104bu | zfs recv -vuF -d tank2
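To confirm that '-u' did its job and nothing from 'tank2' has been mounted over the live filesystems, the mounted property can be listed:

zfs list -r -o name,mountpoint,mounted tank2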
If you are paranoid, you can then do a scrub on 'tank2'. This will not be especially quick because having multiple vdevs per physical disk causes additional seeking between vdevs.
zpool scrub tank2
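The scrub runs in the background, so its progress (and the eventual result) can be checked with:

zpool status tank2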
This step entails a system outage but should be relatively quick because the bulk of the data was copied in the previous step and this step just needs to copy changes since the snapshot was taken. In my case, this took approximately 25 minutes but that included a second send/recv onto my external backup disk as well as a couple of mistakes.
In order to prevent any updates, the system should be brought down to single-user mode:
shutdown now
Once nothing is writing to ZFS, a second snapshot can be taken and transferred to the new pool. The rollback is needed if tank2 has been altered since the previous 'zfs recv' (this includes atime updates).
zfs snapshot -r tank@20101105bu
zfs rollback -R tank2@20101104bu
zfs send -R -I tank@20101104bu tank@20101105bu | zfs recv -vu -d tank2
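As a quick check that the incremental stream applied cleanly, the snapshots now on the new pool can be listed; every filesystem should show the 20101105bu snapshot:

zfs list -r -t snapshot tank2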
The original pool is now renamed by exporting it and re-importing it under a new name, then exporting it again so that it is unmounted and out of the way.
zpool export tank
zpool import tank tanko
zpool export tanko
And the new pool is renamed to the desired name, again via export/import.
zpool export tank2
zpool import tank2 tank
The system can now be returned to multiuser mode and any required testing performed.
exit
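The kind of checks worth doing at this point would be along these lines (adapt to your own filesystem layout):

zpool status tank
zfs list -r -o name,used,mountpoint tank
df -h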
I didn't explicitly destroy the old pool, but just wiped the vdev labels as I reused each disk. This gave me slightly more scope for recovery: since the old pool was RAIDZ1, I could (in theory) still have recreated it even after reusing the first disk.
Note that the resilver appears to be achieved by regenerating the new member's contents from the other members of the RAIDZ vdev, rather than simply copying the device being replaced (though normal filesystem writes do appear to be directed to it).
First disk:
dd if=/dev/zero of=/dev/ada0p5 count=1024
dd if=/dev/zero of=/dev/ada0p5 seek=1936744000
zpool replace tank da0p6 ada0p5
In my case, this took 7h20m to resilver 342G.
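The resilver progress (and the temporary 'replacing' vdev that appears while it runs) can be watched with:

zpool status tank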
Second disk:
dd if=/dev/zero of=/dev/ada1p5 count=1024
dd if=/dev/zero of=/dev/ada1p5 seek=1936744000
zpool replace tank da1p6 ada1p5
In my case, this took 4h53m to resilver 341G.
Third disk:
dd if=/dev/zero of=/dev/ada2p5 count=1024
dd if=/dev/zero of=/dev/ada2p5 seek=1936744000
zpool replace tank ada3p6 ada2p5
In my case, this took 5h41m to resilver 342G.
At this point the pool is spread across all 6 disks but is still limited to ~500GB per vdev member.
In order to expand the pool, the vdevs on the 3 new disks need to be resized. It was not possible to expand a gpart partition in place on this FreeBSD release, so the partitions have to be deleted and recreated, which requires another (short) outage.
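As an aside, more recent FreeBSD releases have a 'gpart resize' verb, so after deleting the p6 partitions the p5 partitions could probably be grown in place rather than deleted and recreated (untested here), something like:

gpart resize -i 5 da0
gpart resize -i 5 da1
gpart resize -i 5 ada3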
For safety (to avoid confusing ZFS), the vdev labels at the end of the temporary p6 vdevs were destroyed first, since they would otherwise appear at the end of the expanded p5 vdevs.
dd if=/dev/zero of=/dev/da0p6 seek=968372000
dd if=/dev/zero of=/dev/da1p6 seek=968372000
dd if=/dev/zero of=/dev/ada3p6 seek=968372000
The system needs to be placed in single-user mode to allow the partitions and pool to be manipulated:
shutdown now
Once in single-user mode, all three p6 partitions can be deleted and the p5 partitions expanded (by deleting them and recreating them at the larger size):
zpool export tank
for i in da0 da1 ada3; do
	gpart delete -i 6 $i
	gpart delete -i 5 $i
	gpart add -b 16777344 -i 5 -t freebsd-zfs -s 1936747791 $i
done
zpool import tank
The pool has now expanded to its full size, around 4TB of usable space (the SIZE reported by 'zpool list' is the raw capacity, including parity):
zpool list
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
back2  1.81T  1.57T   243G    86%  ONLINE  -
tank   5.41T  2.02T  3.39T    37%  ONLINE  -
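One caveat for anyone repeating this on a newer ZFS version: later pool versions do not pick up the extra space automatically on import unless the autoexpand property is set (or the devices are expanded explicitly), along the lines of:

zpool set autoexpand=on tank
zpool online -e tank da0p5 da1p5 ada3p5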
And the system can be restarted:
exit
Remember to add the new disks to any per-disk monitoring, e.g. daily_status_smart_devices in /etc/periodic.conf.
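Assuming the smartmontools periodic script, the relevant /etc/periodic.conf entry would end up looking something like:

daily_status_smart_devices="/dev/ada0 /dev/ada1 /dev/ada2 /dev/ada3 /dev/da0 /dev/da1"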