Differences

This shows you the differences between two versions of the page.

--- ulevs4bsd [2013/06/29 06:36] – Insert corrected graphs peterjeremy
+++ ulevs4bsd [2013/06/30 11:53] (current) – Interim comments peterjeremy
@@ Line 32: / Line 32: @@
 {{xcpu_1.png|Xeon CPU Time with 1KiB Working Set}}
-shows
+is far less clean and shows the impact of hyperthreading, rather than real cores, with CPU time for >10 processes stabilising at nearly twice the CPU time for a single process.  The reason for the wide distribution of times for 2-6 processes is unclear but, at least for the 4BSD scheduler, is probably due to incorrect allocation of processes to hardware threads.
 {{wall_1.png|V890 Elapsed Time with 1KiB Working Set}}
-{{xwall_1.png|Xeon Elapsed Time with 1KiB Working Set}}
 shows that both schedulers are well-behaved until there are more processes than cores.
-Once there are more processes than cores, 4BSD remains well behaved, whilst ULE has significant jumps in wallclock time - ie the same set of tasks take longer to run with ULE.  The following graph shows this more obviously.
+Once there are more processes than cores, 4BSD remains well behaved, whilst ULE has significant jumps in wallclock time - i.e. the same set of tasks takes longer to run with ULE.  The scheduler efficiency graph below shows this more clearly.
+{{xwall_1.png|Xeon Elapsed Time with 1KiB Working Set}}
+Again, between 2 and 7 processes, the Xeon is not well-behaved.
 {{eff_1.png|V890 Scheduler efficiency with 1KiB Working set}}
-{{yeff_1.png|Xeon Scheduler efficiency with 1KiB Working set}}
 This graph shows scheduler efficiency as a ratio of wallclock time to CPU time.
 A perfect scheduler would have a ratio of 1 and both schedulers do a good job up to 16 processes.
 Beyond this, 4BSD has a small jump whilst ULE has a significant jump - forming roughly a sawtooth that peaks at about 18 processes and slopes back to 1 at 32 processes before jumping up again and sloping back to 1 at 48 processes.
-This suggests that ULE does not do a good job of timesharing where there are more CPU-bound processes than cores - instead it preferentially schedules already running processes.
+{{yeff_1.png|Xeon Scheduler efficiency with 1KiB Working set}}
+As with the V890 results, there is a sawtooth pattern with a number-of-threads periodicity, though it is far less pronounced than on the V890.  On the downside, neither scheduler does a good job between 2 and 7 processes.
+Both these results suggest that ULE does not do a good job of timesharing where there are more CPU-bound processes than cores - instead it preferentially schedules already running processes.
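As a rough illustration of the efficiency metric used in these graphs (wallclock time divided by CPU time, with 1 as the ideal), a minimal sketch follows. The function name and the sample timings are hypothetical and are not taken from the measurements above. The page does not state how the two times are normalised, so this assumes both are already expressed in comparable units such that a perfect scheduler yields 1.

```python
def scheduler_efficiency(wallclock_s, cpu_s):
    """Return the wallclock/CPU ratio; 1.0 is ideal, larger is worse.

    Assumes both inputs are already normalised to comparable units
    (an assumption; the normalisation is not given in the text).
    """
    if cpu_s <= 0:
        raise ValueError("cpu_s must be positive")
    return wallclock_s / cpu_s


# Hypothetical (wallclock, CPU) timings keyed by process count.
# These numbers are made up for illustration only.
samples = {
    16: (100.0, 100.0),
    18: (150.0, 100.0),
    32: (101.0, 100.0),
}

# Compute the ratio for each process count.
ratios = {n: scheduler_efficiency(w, c) for n, (w, c) in samples.items()}

# Process count with the worst (largest) ratio in the sample data.
worst = max(ratios, key=ratios.get)
```

Plotting such ratios against process count would make any sawtooth visible as repeated jumps above 1 followed by slopes back towards 1.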
 ==== 4MiB Working Set ====
@@ Line 53: / Line 60: @@
 {{cpu_4.png|V890 CPU Time with 4MiB Working Set}}
-{{xcpu_4.png|Xeon CPU Time with 4MiB Working Set}}
 shows that both schedulers behave similarly for fewer than 4 processes and between 10 and 16 processes.
@@ Line 59: / Line 65: @@
 Between 17 and about 40 processes, ULE uses significantly less CPU than 4BSD.
 Beyond about 48 processes, 4BSD again takes the lead.
+{{xcpu_4.png|Xeon CPU Time with 4MiB Working Set}}