Differences

This shows you the differences between two versions of the page.

--- ulevs4bsd [2013/06/28 21:42] – Temporarily wipe images peterjeremy
+++ ulevs4bsd [2013/06/30 11:53] (current) – Interim comments peterjeremy
@@ Line 27: / Line 27: @@
 {{cpu_1.png|V890 CPU Time with 1KiB Working Set}}
-{{xcpu_1.png|Xeon CPU Time with 1KiB Working Set}}
 shows that ULE is very slightly more efficient than 4BSD and (pleasingly) that the amount of CPU time taken to perform a task is independent of the number of active processes for either scheduler.
+{{xcpu_1.png|Xeon CPU Time with 1KiB Working Set}}
+is far less clean and shows the impact of hyperthreading, rather than real cores, with CPU time for >10 processes stabilising at nearly twice the CPU time for a single process.  The reason for the wide distribution of times for 2-6 processes is unclear but, at least for the 4BSD scheduler, is probably due to incorrect allocation of processes to hardware threads.
 {{wall_1.png|V890 Elapsed Time with 1KiB Working Set}}
-{{xwall_1.png|Xeon Elapsed Time with 1KiB Working Set}}
 shows that both schedulers are well-behaved until there are more processes than cores.
-Once there are more processes than cores, 4BSD remains well behaved, whilst ULE has significant jumps in wallclock time - ie the same set of tasks take longer to run with ULE.  The following graph shows this more obviously.
+Once there are more processes than cores, 4BSD remains well behaved, whilst ULE has significant jumps in wallclock time - i.e. the same set of tasks takes longer to run with ULE.  The scheduler efficiency graph below shows this more clearly.
+{{xwall_1.png|Xeon Elapsed Time with 1KiB Working Set}}
+Again, between 2 and 7 processes, the Xeon is not well-behaved.
 {{eff_1.png|V890 Scheduler efficiency with 1KiB Working set}}
-{{eff_1.png|Xeon Scheduler efficiency with 1KiB Working set}}
 This graph shows scheduler efficiency as a ratio of wallclock time to CPU time.
 A perfect scheduler would have a ratio of 1 and both schedulers do a good job up to 16 processes.
 Beyond this, 4BSD has a small jump whilst ULE has a significant jump - forming roughly a sawtooth that peaks at about 18 processes and slopes back to 1 at 32 processes before jumping up again and sloping back to 1 at 48 processes.
-This suggests that ULE does not do a good job of timesharing where there are more CPU-bound processes than cores - instead it preferentially schedules already running processes.
+{{yeff_1.png|Xeon Scheduler efficiency with 1KiB Working set}}
+As with the V890 results, there is a sawtooth pattern with a number-of-threads periodicity, though it is far less pronounced than on the V890.  On the downside, neither scheduler does a good job between 2 and 7 processes.
+Both sets of results suggest that ULE does not do a good job of timesharing where there are more CPU-bound processes than cores - instead it preferentially schedules already running processes.
 ==== 4MiB Working Set ====
@@ Line 50: / Line 60: @@
 {{cpu_4.png|V890 CPU Time with 4MiB Working Set}}
-{{xcpu_4.png|Xeon CPU Time with 4MiB Working Set}}
 shows that both schedulers behave similarly for less than 4 processes and between 10 and 16 processes.
@@ Line 56: / Line 65: @@
 Between 17 and about 40 processes, ULE uses significantly less CPU than 4BSD.
 Beyond about 48 processes, 4BSD again takes the lead.
+{{xcpu_4.png|Xeon CPU Time with 4MiB Working Set}}
 {{wall_4.png|V890 Elapsed Time with 4MiB Working Set}}
@@ Line 64: / Line 77: @@
 {{eff_4.png|V890 Scheduler efficiency with 4MiB Working set}}
-{{eff_4.png|Xeon Scheduler efficiency with 4MiB Working set}}
+{{yeff_4.png|Xeon Scheduler efficiency with 4MiB Working set}}
 The 4BSD scheduler maintains a fairly constant efficiency, with only a slight bump between 4 and 12 processes.
@@ Line 86: / Line 99: @@
 {{eff_32.png|V890 Scheduler efficiency with 32MiB Working set}}
-{{eff_32.png|Xeon Scheduler efficiency with 32MiB Working set}}
+{{yeff_32.png|Xeon Scheduler efficiency with 32MiB Working set}}
 The 4BSD scheduler maintains a fairly constant efficiency, with a slight bump between about 16 and 32 processes.
@@ Line 105: / Line 118: @@
 {{xwall.png|Xeon Elapsed Times}}
-{{eff.png|Xeon Scheduler efficiencies}}
+{{yeff.png|Xeon Scheduler efficiencies}}