====== Expanding a ZFS RAID System ======
  
This page documents my experiences in converting a 3-way RAIDZ1 pool into a 6-way RAIDZ2 pool (adding 3 new disks to the 3 original disks), with minimal downtime.  The system boots off a UFS gmirror and this remains unchanged (though I have allocated equivalent space on the new disks to allow potential future conversion to ZFS).
  
In my case, I was expanding a pool named ''tank'', using partition ''5'' of disks ''ada0'', ''ada1'' and ''ada2'', by adding 3 new disks ''ada3'', ''da0'' and ''da1'' (despite the names, the latter are SATA disks, attached to a 3ware 9650SE-2LP since I'd run out of motherboard ports).
  
Note that this procedure will not defragment your pool; you should do a send|recv if possible.

===== Original Configuration =====
  
<code>
</code>
  
===== Final Configuration =====
  
<code>
</code>
  
===== Procedure =====
  
The overall process is:
  - Create a 6-way RAIDZ2 across the 3 new disks (i.e. each disk provides two vdevs).
  - Copy the existing pool onto the new disks.
  - Switch the system to use the new 6-way pool.
  - Destroy the original pool.
  - Replace the second vdev on each new disk with one of the original disks.
  - Re-partition the new disks to expand the remaining vdev to occupy the now unused space.

In detail:

==== Partition up new disks ====
 + 
In my case, I have root and swap on the same disks, so I needed to carve out space for that.  Even if you use the disks solely for ZFS, it's probably a good idea to leave a couple of MB of each disk unallocated in case a replacement disk is slightly smaller.  As shown above, the old disks have 5 partitions (boot, UFS root, UFS /var, swap and ZFS).  My long-term plan is to switch to ZFS root, so I combined the space allocated to both UFS partitions into one, but skipped p4 so that the ZFS partition remained at p5 for consistency.

Apart from the boot partition, all partitions are aligned on 8-sector boundaries to simplify possible future migration to 4KiB disks.

Initially, split the ZFS partition into two equal pieces:

<code>
for i in da0 da1 ada3; do
  gpart add -b 34 -s 94 -t freebsd-boot $i
  gpart add -b 128 -s 10485760 -i 2 -t freebsd-zfs $i
  gpart add -b 10485888 -s 6291456 -i 3 -t freebsd-swap $i
  gpart add -b 16777344 -s 968373895 -i 5 -t freebsd-zfs $i
  gpart add -b 985151239 -s 968373895 -i 6 -t freebsd-zfs $i
  gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $i
done
</code>
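
If the new disks are completely blank, they will first need a GPT partition table before ''gpart add'' will accept them; a minimal sketch, assuming there is nothing on the disks worth keeping:

<code>
# Create an empty GPT scheme on each new disk; skip this if a GPT already exists.
for i in da0 da1 ada3; do
  gpart create -s gpt $i
done
</code>
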
 + 
 +At this stage, my disk layout is: 
 +<code> 
 +# gpart show 
 +=>        34  1953522988  ada0  GPT  (932G) 
 +          34          94      freebsd-boot  (47K) 
 +         128     6291456      freebsd-ufs  (3.0G) 
 +     6291584     6291456      freebsd-swap  (3.0G) 
 +    12583040     4194304      freebsd-ufs  (2.0G) 
 +    16777344  1936745678      freebsd-zfs  (924G) 
 + 
 +=>        34  1953522988  ada1  GPT  (932G) 
 +          34          94      freebsd-boot  (47K) 
 +         128     6291456      freebsd-ufs  (3.0G) 
 +     6291584     6291456      freebsd-swap  (3.0G) 
 +    12583040     4194304      freebsd-ufs  (2.0G) 
 +    16777344  1936745678      freebsd-zfs  (924G) 
 + 
 +=>        34  1953525101  ada2  GPT  (932G) 
 +          34          94      freebsd-boot  (47K) 
 +         128     6291456      freebsd-ufs  (3.0G) 
 +     6291584     6291456      freebsd-swap  (3.0G) 
 +    12583040     4194304      freebsd-ufs  (2.0G) 
 +    16777344  1936747791      freebsd-zfs  (924G) 
 + 
 +=>        34  1953525101  ada3  GPT  (932G) 
 +          34          94      freebsd-boot  (47K) 
 +         128    10485760      freebsd-zfs  (5.0G) 
 +    10485888     6291456      freebsd-swap  (3.0G) 
 +    16777344   968373895      freebsd-zfs  (462G) 
 +   985151239   968373895      freebsd-zfs  (462G) 
 +  1953525134                  - free -  (512B) 
 + 
 +=>        34  1953525101  da0  GPT  (932G) 
 +          34          94    1  freebsd-boot  (47K) 
 +         128    10485760    2  freebsd-zfs  (5.0G) 
 +    10485888     6291456    3  freebsd-swap  (3.0G) 
 +    16777344   968373895    5  freebsd-zfs  (462G) 
 +   985151239   968373895    6  freebsd-zfs  (462G) 
 +  1953525134                 - free -  (512B) 
 + 
 +=>        34  1953525101  da1  GPT  (932G) 
 +          34          94    1  freebsd-boot  (47K) 
 +         128    10485760    2  freebsd-zfs  (5.0G) 
 +    10485888     6291456    3  freebsd-swap  (3.0G) 
 +    16777344   968373895    5  freebsd-zfs  (462G) 
 +   985151239   968373895    6  freebsd-zfs  (462G) 
 +  1953525134                 - free -  (512B) 
 + 
 +</code> 

==== Create 6-way RAIDZ2 zpool ====

If the new disks have previously been used, particularly for ZFS, it's a good idea to zero out the first and last 512KiB or more of each partition, which is where ZFS stores its vdev labels.

<code>
for i in da0 da1 ada3; do
  dd if=/dev/zero of=/dev/${i}p5 count=1024
  dd if=/dev/zero of=/dev/${i}p5 seek=968372000
  dd if=/dev/zero of=/dev/${i}p6 count=1024
  dd if=/dev/zero of=/dev/${i}p6 seek=968372000
done
</code>
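
If your ZFS version provides it, ''zpool labelclear'' can be used instead of ''dd'' to remove stale vdev labels; a sketch:

<code>
# Wipes the ZFS label areas on the given partitions.
# -f forces the operation even if the device looks like part of a pool.
for i in da0 da1 ada3; do
  zpool labelclear -f /dev/${i}p5
  zpool labelclear -f /dev/${i}p6
done
</code>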

I wanted my final pool configuration to have all the vdevs in alphabetical order, so I allocated the temporary vdevs first.

<code>
zpool create tank2 raidz2 da0p6 da1p6 ada3p6 ada3p5 da0p5 da1p5
zpool list
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
tank   2.70T  2.34T   369G    86%  ONLINE
tank2  2.70T   202K  2.70T     0%  ONLINE
</code>
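
Since the vdev ordering matters for the later replacement steps, it is worth double-checking the layout before copying any data, for example:

<code>
# Confirm the raidz2 vdev lists the six partitions in the intended order
# and that no partition appears twice.
zpool status tank2
</code>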

==== Initial data copy to new pool ====

By using ZFS snapshots, I can transfer the majority of the pool contents over to the new pool without impacting normal system operation.  This significantly reduces the necessary system outage.

I recommend the use of ports/misc/mbuffer or similar between the "send" and "recv" to improve throughput (I used my own equivalent tool but have left it out of the following commands); a sketch using mbuffer is shown after the commands below.

Note that the '-u' option to 'zfs recv' is crucial - otherwise the filesystems copied to 'tank2' will be automatically mounted.  Where any filesystems have 'mountpoint' specified, this would result in the 'tank2' filesystem being mounted over the equivalent 'tank' filesystem.

<code>
zfs snapshot -r tank@20101104bu
zfs send -R tank@20101104bu | zfs recv -vuF -d tank2
</code>
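
A possible mbuffer invocation for the transfer above (the buffer sizes are only illustrative, tune them to your hardware):

<code>
# mbuffer smooths out the bursty send/recv pipeline:
# -s is the block size, -m the total buffer memory.
zfs send -R tank@20101104bu | mbuffer -s 128k -m 1G | zfs recv -vuF -d tank2
</code>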

If you are paranoid, you can then do a scrub on 'tank2'.  This will not be especially quick because having multiple vdevs per physical disk causes additional seeking between vdevs.

<code>
  zpool scrub tank2
</code>

==== Switch to new pool ====

This step entails a system outage but should be relatively quick because the bulk of the data was copied in the previous step and this step just needs to copy the changes made since the snapshot was taken.  In my case, this took approximately 25 minutes, but that included a second send/recv onto my external backup disk as well as a couple of mistakes.

In order to prevent any updates, the system should be brought down to single-user mode:
<code>
  shutdown now
</code>

Once nothing is writing to ZFS, a second snapshot can be taken and transferred to the new pool.  The rollback is needed if tank2 has been altered since the previous 'zfs recv' (this includes atime updates).

<code>
  zfs snapshot -r tank@20101105bu
  zfs rollback -R tank2@20101104bu
  zfs send -R -I tank@20101104bu tank@20101105bu | zfs recv -vu -d tank2
</code>
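
If you want a quick sanity check before switching over, the newest snapshot should now be present on both pools, for example:

<code>
# Each filesystem on tank2 should now have a @20101105bu snapshot.
zfs list -r -t snapshot tank2 | tail
</code>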

The original pool is now renamed by exporting it and importing it under a new name, and then exported again to unmount it.

<code>
  zpool export tank
  zpool import tank tanko
  zpool export tanko
</code>
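
If you want to confirm that the renamed pool is still visible, ''zpool import'' with no arguments simply lists the pools available for import without importing anything:

<code>
# tanko should be listed as available for import.
zpool import
</code>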

And the new pool is renamed to the wanted name via export/import.

<code>
  zpool export tank2
  zpool import tank2 tank
</code>

The system can now be returned to multi-user mode and any required testing performed.

<code>
  exit
</code>

==== Replace vdevs ====

I didn't explicitly destroy the old pool but just wiped the vdev labels as I reused the disks.  This gave me slightly more recovery scope as I could (in theory) recreate the old pool even after I'd reused the first disk (since it was RAIDZ1).

Note that the resilver appears to be achieved by regenerating the disk contents from the remaining vdevs, rather than just copying the disk being replaced (though normal FS writes appear to be addressed to it).

First disk:
<code>
dd if=/dev/zero of=/dev/ada0p5 count=1024
dd if=/dev/zero of=/dev/ada0p5 seek=1936744000
zpool replace tank da0p6 ada0p5
</code>
In my case, this took 7h20m to resilver 342G.
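
The resilver runs in the background; its progress and an estimated completion time can be watched with:

<code>
# Shows the replacement in progress and the resilver status.
zpool status tank
</code>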

Second disk:
<code>
dd if=/dev/zero of=/dev/ada1p5 count=1024
dd if=/dev/zero of=/dev/ada1p5 seek=1936744000
zpool replace tank da1p6 ada1p5
</code>
In my case, this took 4h53m to resilver 341G.

Third disk:
<code>
dd if=/dev/zero of=/dev/ada2p5 count=1024
dd if=/dev/zero of=/dev/ada2p5 seek=1936744000
zpool replace tank ada3p6 ada2p5
</code>
In my case, this took 5h41m to resilver 342G.

At this point the pool is spread across all 6 disks, but it is still limited to ~500GB per vdev.
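
A quick check at this point should still show the original pool size, since the extra space only becomes available once the partitions are grown in the next step:

<code>
# SIZE should still be the original ~2.7T until the vdevs are expanded.
zpool list tank
</code>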

==== Expand pool ====

In order to expand the pool, the vdevs on the 3 new disks need to be resized.  It's not possible to expand the gpart partition, so this also requires a (short) outage.

For safety (to prevent ZFS confusion), the vdev metadata at the end of the temporary vdevs was destroyed, since this would otherwise appear at the end of the expanded vdevs.

<code>
dd if=/dev/zero of=/dev/da0p6 seek=968372000
dd if=/dev/zero of=/dev/da1p6 seek=968372000
dd if=/dev/zero of=/dev/ada3p6 seek=968372000
</code>

The system needs to be placed in single-user mode to allow the partitions and pool to be manipulated:

<code>
  shutdown now
</code>

Once in single-user mode, all 3 partition 6's can be deleted and the partition 5's expanded (by deleting them and recreating them with the larger size):

<code>
zpool export tank
for i in da0 da1 ada3; do
  gpart delete -i 6 $i
  gpart delete -i 5 $i
  gpart add -b 16777344 -i 5 -t freebsd-zfs -s 1936747791 $i
done
zpool import tank
</code>
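
On later FreeBSD versions, ''gpart'' can resize a partition in place and ZFS can expand a vdev online, which may simplify this step; a sketch, assuming a ''gpart'' with the ''resize'' verb and a ZFS that supports online expansion:

<code>
# Delete the temporary partition, grow partition 5 into the freed space,
# then ask ZFS to expand onto the larger providers.
for i in da0 da1 ada3; do
  gpart delete -i 6 $i
  gpart resize -i 5 -s 1936747791 $i
done
zpool online -e tank da0p5 da1p5 ada3p5
</code>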

The pool has now expanded to roughly 4TB of usable space (''zpool list'' reports the raw size, which includes the RAIDZ2 parity):
<code>
zpool list
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
back2  1.81T  1.57T   243G    86%  ONLINE
tank   5.41T  2.02T  3.39T    37%  ONLINE
</code>

And the system can be restarted:
<code>
  exit
</code>
  
Remember to add the new disks to (eg) ''daily_status_smart_devices''.
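
For example, with the smartmontools periodic script this is just a matter of listing the devices in ''/etc/periodic.conf'' (the exact device list depends on your system):

<code>
# /etc/periodic.conf (example only)
daily_status_smart_devices="/dev/ada0 /dev/ada1 /dev/ada2 /dev/ada3 /dev/da0 /dev/da1"
</code>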
  