Differences

This shows you the differences between two versions of the page.

ulevs4bsd [2013/06/29 06:36] – Insert corrected graphs peterjeremy
ulevs4bsd [2013/06/30 11:53] (current) – Interim comments peterjeremy
Line 32: Line 32:
 {{xcpu_1.png|Xeon CPU Time with 1KiB Working Set}}
  
-shows
+is far less clean and shows the impact of hyperthreading, rather than real cores, with CPU time for >10 processes stabilising at nearly twice the CPU time for a single process.  The reason for the wide distribution of times for 2-6 processes is unclear but, at least for the 4BSD scheduler, is probably due to incorrect allocation of processes to hardware threads.
  
 {{wall_1.png|V890 Elapsed Time with 1KiB Working Set}}
-{{xwall_1.png|Xeon Elapsed Time with 1KiB Working Set}} 
  
 shows that both schedulers are well-behaved until there are more processes than cores.
-Once there are more processes than cores, 4BSD remains well behaved, whilst ULE has significant jumps in wallclock time - ie the same set of tasks take longer to run with ULE.  The following graph shows this more obviously.
+Once there are more processes than cores, 4BSD remains well behaved, whilst ULE has significant jumps in wallclock time - ie the same set of tasks take longer to run with ULE.  The scheduler efficiency graph below shows this more clearly.
 + 
 +{{xwall_1.png|Xeon Elapsed Time with 1KiB Working Set}} 
 + 
 +Again, between 2 and 7 processes, the Xeon is not well-behaved.
  
 {{eff_1.png|V890 Scheduler efficiency with 1KiB Working set}}
-{{yeff_1.png|Xeon Scheduler efficiency with 1KiB Working set}} 
  
 This graph shows scheduler efficiency as a ratio of wallclock time to CPU time.
 A perfect scheduler would have a ratio of 1 and both schedulers do a good job up to 16 processes.
 Beyond this, 4BSD has a small jump whilst ULE has a significant jump - forming roughly a sawtooth that peaks at about 18 processes and slopes back to 1 at 32 processes before jumping up again and sloping back to 1 at 48 processes.
-This suggests that ULE does not do a good job of timesharing where there are more CPU-bound processes than cores - instead it preferentially schedules already running processes.
+
 +{{yeff_1.png|Xeon Scheduler efficiency with 1KiB Working set}} 
 + 
 +As with the V890 results, there is a sawtooth pattern with a number-of-threads periodicity, though it is far less pronounced than on the V890.  On the downside, neither scheduler does a good job between 2 and 7 processes. 
 + 
 +Both these results suggest that ULE does not do a good job of timesharing where there are more CPU-bound processes than cores - instead it preferentially schedules already running processes.
  
 ==== 4MiB Working Set ====
Line 53: Line 60:
  
 {{cpu_4.png|V890 CPU Time with 4MiB Working Set}}
-{{xcpu_4.png|Xeon CPU Time with 4MiB Working Set}} 
  
 shows that both schedulers behave similarly for fewer than 4 processes and between 10 and 16 processes.
Line 59: Line 65:
 Between 17 and about 40 processes, ULE uses significantly less CPU than 4BSD.
 Beyond about 48 processes, 4BSD again takes the lead.
 +
 +{{xcpu_4.png|Xeon CPU Time with 4MiB Working Set}}
  
  
ulevs4bsd.txt · Last modified: 2013/06/30 11:53 by peterjeremy
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International