====== Expanding a ZFS RAID System ======
  
This page documents my experiences in converting a 3-way RAIDZ1 pool into a 6-way RAIDZ2 pool (adding 3 new disks to the 3 original disks), with minimal downtime.  The system boots off a UFS gmirror and this remains unchanged (though I have allocated equivalent space on the new disks to allow potential future conversion to ZFS).
  
In my case, I was expanding a pool named ''tank'', using partition ''5'' of disks ''ada0'', ''ada1'' and ''ada2'', by adding 3 new disks ''ada3'', ''da0'' and ''da1'' (despite the names, the latter are SATA disks, attached to a 3ware 9650SE-2LP since I'd run out of motherboard ports).
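
The configuration listings in the next two sections can be captured with commands along these lines (a sketch; none of these commands modify anything):

<code>
# Pool layout, health and space usage
zpool status tank
zpool list
zfs list -r tank

# Current GPT partitioning of every disk
gpart show
</code>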
  
===== Original Configuration =====
  
<code>
</code>
  
===== Final Configuration =====
  
<code>
</code>
  
===== Procedure =====
  
The overall process is:
  - Create a 6-way RAIDZ2 across the 3 new disks (i.e. each disk provides two vdevs).
  - Copy the existing pool onto the new disks.
  - Switch the system to use the new 6-way pool.
  - Destroy the original pool.
  - Replace the second vdev in each disk with one of the original disks.
  - Re-partition the new disks to expand the remaining vdev to occupy the now unused space.
  
In detail:

==== Partition up new disks ====

In my case, I have root and swap on the same disks, so I needed to carve out space for that.  Even if you use the disks solely for ZFS, it's probably a good idea to partition a couple of MB off the disks in case a replacement disk is slightly smaller.  As shown above, the old disks have 5 partitions (boot, UFS root, UFS /var, swap and ZFS).  My long-term plans are to switch to ZFS root, so I combined the space allocated to both UFS partitions into one, but skipped p4 so that the ZFS partition remained at p5 for consistency.

Apart from the boot partition, all partitions are aligned on 8-sector boundaries to simplify possible future migration to 4KiB disks.

Initially, split the ZFS partition into two equal pieces:

<code>
for i in da0 da1 ada3; do
  gpart add -b 34 -s 94 -t freebsd-boot $i
  gpart add -b 128 -s 10485760 -i 2 -t freebsd-zfs $i
  gpart add -b 10485888 -s 6291456 -i 3 -t freebsd-swap $i
  gpart add -b 16777344 -s 968373895 -i 5 -t freebsd-zfs $i
  gpart add -b 985151239 -s 968373895 -i 6 -t freebsd-zfs $i
  gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $i
done
</code>
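
These offsets can be cross-checked with a little arithmetic (the full-size ZFS area of 1936747791 sectors is the figure reused later when the partitions are re-expanded):

<code>
# Sanity-check the offsets used above (pure arithmetic, changes nothing)
echo $(( 1953525134 - 16777344 + 1 ))   # 1936747791 sectors of ZFS space per new disk
echo $(( 1936747791 / 2 ))              # 968373895 sectors per half
echo $(( 16777344 + 968373895 ))        # 985151239 = start of p6
echo $(( 985151239 + 968373895 ))       # 1953525134 = first sector after p6 (the 512B left free)
</code>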

At this stage, my disk layout is:
<code>
# gpart show
=>        34  1953522988  ada0  GPT  (932G)
          34          94     1  freebsd-boot  (47K)
         128     6291456     2  freebsd-ufs  (3.0G)
     6291584     6291456     3  freebsd-swap  (3.0G)
    12583040     4194304     4  freebsd-ufs  (2.0G)
    16777344  1936745678     5  freebsd-zfs  (924G)

=>        34  1953522988  ada1  GPT  (932G)
          34          94     1  freebsd-boot  (47K)
         128     6291456     2  freebsd-ufs  (3.0G)
     6291584     6291456     3  freebsd-swap  (3.0G)
    12583040     4194304     4  freebsd-ufs  (2.0G)
    16777344  1936745678     5  freebsd-zfs  (924G)

=>        34  1953525101  ada2  GPT  (932G)
          34          94     1  freebsd-boot  (47K)
         128     6291456     2  freebsd-ufs  (3.0G)
     6291584     6291456     3  freebsd-swap  (3.0G)
    12583040     4194304     4  freebsd-ufs  (2.0G)
    16777344  1936747791     5  freebsd-zfs  (924G)

=>        34  1953525101  ada3  GPT  (932G)
          34          94     1  freebsd-boot  (47K)
         128    10485760     2  freebsd-zfs  (5.0G)
    10485888     6291456     3  freebsd-swap  (3.0G)
    16777344   968373895     5  freebsd-zfs  (462G)
   985151239   968373895     6  freebsd-zfs  (462G)
  1953525134           1        - free -  (512B)

=>        34  1953525101  da0  GPT  (932G)
          34          94    1  freebsd-boot  (47K)
         128    10485760    2  freebsd-zfs  (5.0G)
    10485888     6291456    3  freebsd-swap  (3.0G)
    16777344   968373895    5  freebsd-zfs  (462G)
   985151239   968373895    6  freebsd-zfs  (462G)
  1953525134           1       - free -  (512B)

=>        34  1953525101  da1  GPT  (932G)
          34          94    1  freebsd-boot  (47K)
         128    10485760    2  freebsd-zfs  (5.0G)
    10485888     6291456    3  freebsd-swap  (3.0G)
    16777344   968373895    5  freebsd-zfs  (462G)
   985151239   968373895    6  freebsd-zfs  (462G)
  1953525134           1       - free -  (512B)
</code>

==== Create 6-way RAIDZ2 zpool ====

If the new disks have previously been used, particularly for ZFS, it's a good idea to zero out the first and last 512KiB or more of each ZFS partition - which is where ZFS stores its vdev labels.

<code>
for i in da0 da1 ada3; do
  dd if=/dev/zero of=/dev/${i}p5 count=1024
  dd if=/dev/zero of=/dev/${i}p5 seek=968372000
  dd if=/dev/zero of=/dev/${i}p6 count=1024
  dd if=/dev/zero of=/dev/${i}p6 seek=968372000
done
</code>

I wanted my final pool configuration to have all the vdevs in alphabetical order, so I allocated the temporary vdevs first.

<code>
zpool create tank2 raidz2 da0p6 da1p6 ada3p6 ada3p5 da0p5 da1p5
zpool list
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
tank   2.70T  2.34T   369G    86%  ONLINE
tank2  2.70T   202K  2.70T     0%  ONLINE
</code>
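
Before copying any data across, it's worth confirming that the vdev ordering came out as intended:

<code>
# The raidz2 vdev should list da0p6 da1p6 ada3p6 ada3p5 da0p5 da1p5, so that
# after the later replacements the members end up as
# ada0p5 ada1p5 ada2p5 ada3p5 da0p5 da1p5.
zpool status tank2
</code>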

==== Initial data copy to new pool ====

By using ZFS snapshots, I can transfer the majority of the pool contents over to the new pool without impacting normal system operation.  This significantly reduces the necessary system outage.

I recommend the use of ports/misc/mbuffer or similar between the "send" and "recv" to improve throughput (I used my own equivalent tool but have left it out of the following commands).

Note that the '-u' option to 'zfs recv' is crucial - otherwise the filesystems copied to 'tank2' will be automatically mounted.  Where any filesystems have 'mountpoint' specified, this would result in the 'tank2' filesystem being mounted over the equivalent 'tank' filesystem.

<code>
zfs snapshot -r tank@20101104bu
zfs send -R tank@20101104bu | zfs recv -vuF -d tank2
</code>
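
For reference, with misc/mbuffer installed, the same transfer could be piped through it something like this (a sketch only - the block and buffer sizes are arbitrary examples, not values from my setup):

<code>
zfs send -R tank@20101104bu | mbuffer -s 128k -m 1G | zfs recv -vuF -d tank2
</code>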

If you are paranoid, you can then do a scrub on 'tank2'.  This will not be especially quick because having multiple vdevs per physical disk causes additional seeking between vdevs.

<code>
  zpool scrub tank2
</code>

==== Switch to new pool ====

This step entails a system outage but should be relatively quick because the bulk of the data was copied in the previous step and this step only needs to copy the changes made since the snapshot was taken.  In my case, it took approximately 25 minutes, but that included a second send/recv onto my external backup disk as well as a couple of mistakes.

In order to prevent any updates, the system should be brought down to single-user mode:
<code>
  shutdown now
</code>

Once nothing is writing to ZFS, a second snapshot can be taken and transferred to the new pool.  The rollback is needed if tank2 has been altered since the previous 'zfs recv' (this includes atime updates).

<code>
  zfs snapshot -r tank@20101105bu
  zfs rollback -R tank2@20101104bu
  zfs send -R -I tank@20101104bu tank@20101105bu | zfs recv -vu -d tank2
</code>
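
Before switching over, it may be worth confirming that both pools now hold the same snapshots and a similar amount of data (a rough consistency check, nothing more):

<code>
zfs list -r -t snapshot tank | tail
zfs list -r -t snapshot tank2 | tail
zpool list tank tank2
</code>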

The original pool is now renamed by exporting it and re-importing it under a new name, then exporting it again to unmount it.

<code>
  zpool export tank
  zpool import tank tanko
  zpool export tanko
</code>

And the new pool is renamed to the desired name via export/import.

<code>
  zpool export tank2
  zpool import tank2 tank
</code>

The system can now be returned to multi-user mode and any required testing performed.

<code>
  exit
</code>
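
Typical checks at this point might include (a non-exhaustive sketch):

<code>
# Confirm the renamed pool is healthy and its filesystems are mounted where expected
zpool status tank
zfs list -r tank
df -h
</code>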

==== Replace vdevs ====

I didn't explicitly destroy the old pool but just wiped the vdev labels as I reused the disks.  This gave me slightly more recovery scope as I could (in theory) recreate the old pool even after I'd reused the first disk (since it was RAIDZ1).

Note that the resilver appears to be achieved by regenerating the disk contents from the remaining vdevs, rather than just copying the disk being replaced (though normal FS writes appear to be directed to it).

First disk:
<code>
dd if=/dev/zero of=/dev/ada0p5 count=1024
dd if=/dev/zero of=/dev/ada0p5 seek=1936744000
zpool replace tank da0p6 ada0p5
</code>
In my case, this took 7h20m to resilver 342G.
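
Resilver progress (and an estimate of the remaining time) can be checked at any point with:

<code>
zpool status tank
</code>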

Second disk:
<code>
dd if=/dev/zero of=/dev/ada1p5 count=1024
dd if=/dev/zero of=/dev/ada1p5 seek=1936744000
zpool replace tank da1p6 ada1p5
</code>
In my case, this took 4h53m to resilver 341G.

Third disk:
<code>
dd if=/dev/zero of=/dev/ada2p5 count=1024
dd if=/dev/zero of=/dev/ada2p5 seek=1936744000
zpool replace tank ada3p6 ada2p5
</code>
In my case, this took 5h41m to resilver 342G.

At this point the pool is spread across all 6 disks but it is still limited to ~500GB per vdev.

==== Expand pool ====

In order to expand the pool, the vdevs on the 3 new disks need to be resized.  It's not possible to expand the gpart partitions in place, so this also requires a (short) outage.

For safety (to prevent ZFS confusion), the vdev metadata at the end of the temporary vdevs was destroyed, since this would otherwise appear at the end of the expanded vdevs.

<code>
dd if=/dev/zero of=/dev/da0p6 seek=968372000
dd if=/dev/zero of=/dev/da1p6 seek=968372000
dd if=/dev/zero of=/dev/ada3p6 seek=968372000
</code>

The system needs to be placed in single-user mode to allow the partitions and pool to be manipulated:

<code>
  shutdown now
</code>

Once in single-user mode, all three partition 6s can be deleted and the partition 5s expanded (by deleting them and recreating them with the larger size):

<code>
zpool export tank
for i in da0 da1 ada3; do
  gpart delete -i 6 $i
  gpart delete -i 5 $i
  gpart add -b 16777344 -i 5 -t freebsd-zfs -s 1936747791 $i
done
zpool import tank
</code>
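
Re-importing the pool was sufficient here for ZFS to pick up the larger partitions (as the ''zpool list'' output below shows).  On ZFS versions with the ''autoexpand'' pool property, the expansion may need to be requested explicitly; an untested sketch:

<code>
# Only needed if the pool does not grow automatically on import
zpool set autoexpand=on tank
zpool online -e tank da0p5 da1p5 ada3p5
</code>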

The pool has now expanded, to roughly 4TB of usable space:
<code>
zpool list
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
back2  1.81T  1.57T   243G    86%  ONLINE
tank   5.41T  2.02T  3.39T    37%  ONLINE
</code>

And the system can be restarted:
<code>
  exit
</code>

Remember to add the new disks to (e.g.) ''daily_status_smart_devices''.
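
For example, if the SMART status checks come from the smartmontools periodic script, the /etc/periodic.conf entry might look something like this (a sketch - the exact device syntax depends on the script providing the variable):

<code>
# Hypothetical /etc/periodic.conf entry - check the periodic smart script
# on your system for the device syntax it expects.
daily_status_smart_devices="ada0 ada1 ada2 ada3 da0 da1"
</code>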
  