====== Expanding a ZFS RAID System ======

This page documents my experiences in converting a 3-way RAIDZ1 pool into a 6-way RAIDZ2 pool (adding 3 new disks to the 3 original disks), with minimal downtime.
In my case, I was expanding a pool named ''tank'' by adding 3 new disks: ''da0'', ''da1'' and ''ada3'' (the two ''da'' disks are attached to a 3ware 9650SE-2LP, since I'd run out of motherboard ports).

===== Original Configuration =====

<code>
</code>

===== Final Configuration =====

<code>
</code>

===== Procedure =====

The overall process is:

  - Create a 6-way RAIDZ2 pool across the 3 new disks (i.e. each disk provides two vdevs).
  - Copy the existing pool onto the new disks.
  - Switch the system to use the new 6-way pool.
  - Destroy the original pool.
  - Replace the second vdev in each disk with one of the original disks.
  - Re-partition the new disks to expand the remaining vdev to occupy the now unused space.

In detail:

==== Partition up new disks ====

In my case, I have root and swap on the same disks, so I needed to carve out space for that. Even if you use the disks solely for ZFS, it's probably a good idea to partition a couple of MB off the disks in case a replacement disk is slightly smaller. As shown above, the old disks have 5 partitions (boot, UFS root, UFS /var, swap and ZFS). My long-term plans are to switch to ZFS root, so I combined the space allocated to both UFS partitions into one, but skipped p4 so that the ZFS partition remained at p5 for consistency.

Apart from the boot partition, all partitions are aligned on 8-sector boundaries to simplify possible future migration to 4KiB disks.

Initially, split the ZFS partition into two equal pieces:

<code>
for i in da0 da1 ada3; do
    gpart add -b 34 -s 94 -t freebsd-boot $i
    gpart add -b 128 -s 10485760 -i 2 -t freebsd-zfs $i
    gpart add -b 10485888 -s 6291456 -i 3 -t freebsd-swap $i
    gpart add -b 16777344 -s 968373895 -i 5 -t freebsd-zfs $i
    gpart add -b 985151239 -s 968373895 -i 6 -t freebsd-zfs $i
    gpart bootcode -b /boot/pmbr -p /
done
</code>
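
The ''gpart bootcode'' step installs the protective MBR (''/boot/pmbr'') and a GPT bootstrap into partition 1. The bootstrap to use depends on how you intend to boot; a typical invocation (illustrative only - ''gptzfsboot'' is an assumption here, in line with the eventual move to a ZFS root) looks like:

<code>
# illustrative: install the protective MBR and a ZFS-aware GPT bootstrap in p1
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0
</code>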
+ | |||
+ | At this stage, my disk layout is: | ||
+ | < | ||
+ | # gpart show | ||
+ | => 34 1953522988 | ||
+ | 34 94 | ||
+ | | ||
+ | | ||
+ | 12583040 | ||
+ | 16777344 | ||
+ | |||
+ | => 34 1953522988 | ||
+ | 34 94 | ||
+ | | ||
+ | | ||
+ | 12583040 | ||
+ | 16777344 | ||
+ | |||
+ | => 34 1953525101 | ||
+ | 34 94 | ||
+ | | ||
+ | | ||
+ | 12583040 | ||
+ | 16777344 | ||
+ | |||
+ | => 34 1953525101 | ||
+ | 34 94 | ||
+ | | ||
+ | 10485888 | ||
+ | 16777344 | ||
+ | | ||
+ | 1953525134 | ||
+ | |||
+ | => 34 1953525101 | ||
+ | 34 94 1 freebsd-boot | ||
+ | | ||
+ | 10485888 | ||
+ | 16777344 | ||
+ | | ||
+ | 1953525134 | ||
+ | |||
+ | => 34 1953525101 | ||
+ | 34 94 1 freebsd-boot | ||
+ | | ||
+ | 10485888 | ||
+ | 16777344 | ||
+ | | ||
+ | 1953525134 | ||
+ | |||
+ | </ | ||
+ | |||
+ | ==== Create 6-way RAIDZ2 zpool ==== | ||
+ | |||
If the new disks have previously been used, particularly for ZFS, it's a good idea to zero out the first and last 512KiB or more of each partition - this is where ZFS stores its vdev labels.

<code>
for i in da0 da1 ada3; do
    # zero the first and last 1MiB of both new ZFS partitions to clear any old
    # vdev labels (968371847 = 968373895 - 2048: 1MiB before the end of each partition)
    dd if=/dev/zero of=/dev/${i}p5 bs=1m count=1
    dd if=/dev/zero of=/dev/${i}p5 bs=512 count=2048 seek=968371847
    dd if=/dev/zero of=/dev/${i}p6 bs=1m count=1
    dd if=/dev/zero of=/dev/${i}p6 bs=512 count=2048 seek=968371847
done
</code>
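
As an optional check, ''zdb'' can dump whatever vdev labels are still present on a partition:

<code>
# dump any ZFS vdev labels present on the partition
zdb -l /dev/da0p5
</code>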
+ | |||
+ | I wanted my final pool configuration to have all the vdevs in alphabetical | ||
+ | order, so I allocated the temporary vdevs first. | ||
+ | |||
<code>
zpool create tank2 raidz2 da0p6 da1p6 ada3p6 ada3p5 da0p5 da1p5
zpool list
NAME    SIZE   ...
tank    ...
tank2   2.70T  ...
</code>
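
At this point ''zpool status tank2'' should show a single 6-way raidz2 vdev built from the six partitions, roughly along these lines (output abbreviated):

<code>
# zpool status tank2
  pool: tank2
 state: ONLINE
config:
        NAME        STATE     READ WRITE CKSUM
        tank2       ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            da0p6   ONLINE       0     0     0
            da1p6   ONLINE       0     0     0
            ada3p6  ONLINE       0     0     0
            ada3p5  ONLINE       0     0     0
            da0p5   ONLINE       0     0     0
            da1p5   ONLINE       0     0     0
</code>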
+ | |||
+ | ==== Initial data copy to new pool ==== | ||
+ | |||
By using ZFS snapshots, I can transfer the majority of the pool contents over to the new pool without impacting normal system operation. This significantly reduces the necessary system outage.

I recommend inserting a buffering tool from the ports tree between the ''zfs send'' and ''zfs recv'' (I used one but have left it out of the following commands).

Note that the ''-u'' option to ''zfs recv'' prevents the filesystems copied to ''tank2'' from being mounted; without it, any filesystems that have ''mountpoint'' set would be mounted over the corresponding ''tank'' filesystems.

<code>
zfs snapshot -r tank@20101104bu
zfs send -R tank@20101104bu | zfs recv -vuF -d tank2
</code>
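
As an example of the buffering mentioned above (illustrative only - ''misc/mbuffer'' is one choice and may not be the tool actually used here), the buffer simply sits in the middle of the same pipeline:

<code>
# illustrative: smooth out send/recv bursts with a 1GB memory buffer
zfs send -R tank@20101104bu | mbuffer -s 128k -m 1G | zfs recv -vuF -d tank2
</code>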
+ | |||
+ | If you are paranoid, you can then do a scrub on ' | ||
+ | be especially quick because having multiple vdevs per physical disk | ||
+ | causes additional seeking between vdevs. | ||
+ | |||
<code>
zpool scrub tank2
</code>

==== Switch to new pool ====

This step entails a system outage but should be relatively quick because the bulk of the data was copied in the previous step and this step just needs to copy changes since the snapshot was taken. In my case, it took approximately 25 minutes, but that included a second send/recv onto my external backup disk as well as a couple of mistakes.

In order to prevent any updates, the system should be brought down to single-user mode:

<code>
shutdown now
</code>

Once nothing is writing to ZFS, a second snapshot can be taken and transferred to the new pool. The rollback is needed if ''tank2'' has been altered since the previous ''zfs recv'' (this includes atime updates).

<code>
zfs snapshot -r tank@20101105bu
zfs rollback -R tank2@20101104bu
zfs send -R -I tank@20101104bu tank@20101105bu | zfs recv -vu -d tank2
</code>

The original pool is now renamed by exporting it and importing it under a new name, and then exporting it again to unmount it.

<code>
zpool export tank
zpool import tank tanko
zpool export tanko
</code>

And the new pool is renamed to the wanted name via export/import:

<code>
zpool export tank2
zpool import tank2 tank
</code>

The system can now be returned to multiuser mode and any required testing performed.

<code>
exit
</code>
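
Useful checks at this point include confirming that the renamed pool reports the expected layout and that the filesystems are mounted where expected, e.g.:

<code>
zpool status tank
zfs list -r tank
df -h
</code>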
+ | |||
+ | ==== Replace vdevs ==== | ||
+ | |||
I didn't explicitly destroy the old pool but just wiped the vdev labels as I reused the disks. That way, I could (in theory) recreate the old pool even after I'd reused the first disk (since it was RAIDZ1).

Note that the resilver appears to be achieved by regenerating the disk contents from the remaining vdevs, rather than just copying the disk being replaced (though normal filesystem writes appear to be directed to it).

First disk:
<code>
# wipe any old ZFS vdev labels from ada0p5 before reusing it
# (sizes here are illustrative - zero at least the first and last 512KiB)
dd if=/dev/zero of=/dev/ada0p5 bs=1m count=1
dd if=/dev/zero of=/dev/ada0p5 bs=512 count=2048 seek=$(( $(diskinfo /dev/ada0p5 | awk '{print $4}') - 2048 ))
zpool replace tank da0p6 ada0p5
</code>
In my case, this took 7h20m to resilver 342G.
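
Resilver progress and the estimated completion time can be followed with:

<code>
zpool status tank
</code>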
+ | |||
+ | Second disk: | ||
+ | < | ||
+ | dd if=/ | ||
+ | dd if=/ | ||
+ | zpool replace tank da1p6 ada1p5 | ||
+ | </ | ||
+ | In my case, this took 4h53m to resilver 341G. | ||
+ | |||
+ | Third disk: | ||
+ | < | ||
+ | dd if=/ | ||
+ | dd if=/ | ||
+ | zpool replace tank ada3p6 ada2p5 | ||
+ | </ | ||
+ | In my case, this took 5h41m to resilver 342G. | ||
+ | |||
+ | At this point the pool is spread across all 6 disks but it still limited | ||
+ | to ~500GB per vdev. | ||
+ | |||
==== Expand pool ====

In order to expand the pool, the vdevs on the 3 new disks need to be resized, which requires a (short) outage.

For safety (to prevent ZFS confusion), the vdev metadata at the end of the temporary vdevs was destroyed, since this would otherwise appear at the end of the expanded vdevs.

<code>
# zero the last 1MiB of each temporary p6 so that its trailing vdev labels
# cannot reappear near the end of the expanded p5 (968371847 = 968373895 - 2048)
dd if=/dev/zero of=/dev/da0p6 bs=512 count=2048 seek=968371847
dd if=/dev/zero of=/dev/da1p6 bs=512 count=2048 seek=968371847
dd if=/dev/zero of=/dev/ada3p6 bs=512 count=2048 seek=968371847
</code>
+ | |||
+ | The system needs to be placed in single-user mode to allow the partitions | ||
+ | and pool to be manipulated: | ||
+ | |||
+ | < | ||
+ | shutdown now | ||
+ | </ | ||
+ | |||
Once in single-user mode, all 3 partition 6's can be deleted and the partition 5's expanded (by deleting them and recreating them with the larger size):

<code>
zpool export tank
for i in da0 da1 ada3; do
    gpart delete -i 6 $i
    gpart delete -i 5 $i
    gpart add -b 16777344 -i 5 -t freebsd-zfs -s 1936747791 $i
done
zpool import tank
</code>
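
After the re-partitioning, each of the new disks should again have a single large ZFS partition; the layout should look something like:

<code>
# gpart show da0
=>        34  1953525101  da0  GPT  (932G)
          34          94    1  freebsd-boot  (47K)
         128    10485760    2  freebsd-zfs  (5.0G)
    10485888     6291456    3  freebsd-swap  (3.0G)
    16777344  1936747791    5  freebsd-zfs  (924G)
</code>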
+ | |||
+ | The pool has now expanded to 4TB: | ||
+ | < | ||
+ | zpool list | ||
+ | NAME SIZE | ||
+ | back2 1.81T 1.57T | ||
+ | tank | ||
+ | </ | ||
+ | |||
+ | And the system can be restarted: | ||
+ | < | ||
+ | exit | ||
+ | </ | ||
+ | |||
Remember to add the new disks to (e.g.) ''daily_status_smart_devices'' in ''/etc/periodic.conf''.
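
For example (the device list is illustrative and will depend on how your controller exposes the disks):

<code>
# /etc/periodic.conf: report SMART status for these disks in the daily periodic run
daily_status_smart_devices="/dev/ada0 /dev/ada1 /dev/ada2 /dev/ada3"
</code>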