This shows you the differences between two versions of the page.
ulevs4bsd [2013/06/28 21:42] – Temporarily wipe images peterjeremy | ulevs4bsd [2013/06/30 11:53] (current) – Interim comments peterjeremy | ||
---|---|---|---|
Line 27: | Line 27: | ||
{{cpu_1.png|V890 CPU Time with 1KiB Working Set}} | {{cpu_1.png|V890 CPU Time with 1KiB Working Set}} | ||
- | {{xcpu_1.png|Xeon CPU Time with 1KiB Working Set}} | ||
shows that ULE is very slightly more efficient than 4BSD and (pleasingly) that the amount of CPU time taken to perform a task is independent of the number of active processes for either scheduler. | shows that ULE is very slightly more efficient than 4BSD and (pleasingly) that the amount of CPU time taken to perform a task is independent of the number of active processes for either scheduler. | ||
+ | |||
+ | {{xcpu_1.png|Xeon CPU Time with 1KiB Working Set}} | ||
+ | |||
+ | is far less clean and shows the impact of hyperthreading, | ||
{{wall_1.png|V890 Elapsed Time with 1KiB Working Set}} | {{wall_1.png|V890 Elapsed Time with 1KiB Working Set}} | ||
- | {{xwall_1.png|Xeon Elapsed Time with 1KiB Working Set}} | ||
shows that both schedulers are well-behaved until there are more processes than cores. | shows that both schedulers are well-behaved until there are more processes than cores. | ||
- | Once there are more processes than cores, 4BSD remains well behaved, whilst ULE has significant jumps in wallclock time, i.e. the same set of tasks takes longer to run with ULE. The following | + | Once there are more processes than cores, 4BSD remains well behaved, whilst ULE has significant jumps in wallclock time, i.e. the same set of tasks takes longer to run with ULE. The scheduler efficiency |
+ | |||
+ | {{xwall_1.png|Xeon Elapsed Time with 1KiB Working Set}} | ||
+ | |||
+ | Again, between 2 and 7 processes, the Xeon is not well-behaved. | ||
{{eff_1.png|V890 Scheduler efficiency with 1KiB Working set}} | {{eff_1.png|V890 Scheduler efficiency with 1KiB Working set}} | ||
- | {{eff_1.png|Xeon Scheduler efficiency with 1KiB Working set}} | ||
This graph shows scheduler efficiency as a ratio of wallclock time to CPU time. | This graph shows scheduler efficiency as a ratio of wallclock time to CPU time. | ||
A perfect scheduler would have a ratio of 1 and both schedulers do a good job up to 16 processes. | A perfect scheduler would have a ratio of 1 and both schedulers do a good job up to 16 processes. | ||
Beyond this, 4BSD has a small jump whilst ULE has a significant jump - forming roughly a sawtooth that peaks at about 18 processes and slopes back to 1 at 32 processes before jumping up again and sloping back to 1 at 48 processes. | Beyond this, 4BSD has a small jump whilst ULE has a significant jump - forming roughly a sawtooth that peaks at about 18 processes and slopes back to 1 at 32 processes before jumping up again and sloping back to 1 at 48 processes. | ||
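The efficiency ratio and the sawtooth shape described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the method used to produce the graphs: the function names are hypothetical, the core count of 16 is inferred from the text ("up to 16 processes"), and the "batch" model (identical tasks run to completion in groups of at most one per core) is an assumed worst case rather than a description of how ULE or 4BSD actually schedule.

```python
def efficiency_ratio(wall_seconds: float, cpu_seconds: float) -> float:
    """Ratio of wallclock time to CPU time; 1.0 is the ideal value.

    Assumes both inputs are already normalised to the same basis
    (how the original measurements were normalised is not stated).
    """
    if cpu_seconds <= 0:
        raise ValueError("cpu_seconds must be positive")
    return wall_seconds / cpu_seconds


def batch_model_ratio(processes: int, cores: int = 16) -> float:
    """Toy model: identical tasks run in batches of at most `cores`.

    Wallclock time is ceil(processes / cores) task lengths, whereas a
    perfectly balanced schedule needs processes / cores task lengths.
    The quotient is 1.0 at every multiple of `cores` and jumps just after.
    """
    if processes <= 0 or cores <= 0:
        raise ValueError("processes and cores must be positive")
    if processes <= cores:
        return 1.0  # every task has its own core
    batches = -(-processes // cores)  # integer ceiling division
    return batches * cores / processes


if __name__ == "__main__":
    for n in (8, 16, 17, 18, 24, 32, 33, 48):
        print(n, round(batch_model_ratio(n), 3))
```

In this toy model the ratio jumps just past 16 processes, slopes back to 1 at 32, jumps again and returns to 1 at 48, which is the same general shape as the sawtooth described for ULE; the measured peak position and height differ.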
- | This suggests | + | |
+ | {{yeff_1.png|Xeon Scheduler efficiency with 1KiB Working set}} | ||
+ | |||
+ | As with the V890 results, there is a sawtooth pattern with a number-of-threads periodicity, | ||
+ | |||
+ | Both these results suggest | ||
==== 4MiB Working Set ==== | ==== 4MiB Working Set ==== | ||
Line 50: | Line 60: | ||
{{cpu_4.png|V890 CPU Time with 4MiB Working Set}} | {{cpu_4.png|V890 CPU Time with 4MiB Working Set}} | ||
- | {{xcpu_4.png|Xeon CPU Time with 4MiB Working Set}} | ||
shows that both schedulers behave similarly for fewer than 4 processes and between 10 and 16 processes. | shows that both schedulers behave similarly for fewer than 4 processes and between 10 and 16 processes. | ||
Line 56: | Line 65: | ||
Between 17 and about 40 processes, ULE uses significantly less CPU than 4BSD. | Between 17 and about 40 processes, ULE uses significantly less CPU than 4BSD. | ||
Beyond about 48 processes, 4BSD again takes the lead. | Beyond about 48 processes, 4BSD again takes the lead. | ||
+ | |||
+ | {{xcpu_4.png|Xeon CPU Time with 4MiB Working Set}} | ||
+ | |||
+ | |||
{{wall_4.png|V890 Elapsed Time with 4MiB Working Set}} | {{wall_4.png|V890 Elapsed Time with 4MiB Working Set}} | ||
Line 64: | Line 77: | ||
{{eff_4.png|V890 Scheduler efficiency with 4MiB Working set}} | {{eff_4.png|V890 Scheduler efficiency with 4MiB Working set}} | ||
- | {{eff_4.png|Xeon Scheduler efficiency with 4MiB Working set}} | + | {{yeff_4.png|Xeon Scheduler efficiency with 4MiB Working set}} |
The 4BSD scheduler maintains a fairly constant efficiency, with only a slight bump between 4 and 12 processes. | The 4BSD scheduler maintains a fairly constant efficiency, with only a slight bump between 4 and 12 processes. | ||
Line 86: | Line 99: | ||
{{eff_32.png|V890 Scheduler efficiency with 32MiB Working set}} | {{eff_32.png|V890 Scheduler efficiency with 32MiB Working set}} | ||
- | {{eff_32.png|Xeon Scheduler efficiency with 32MiB Working set}} | + | {{yeff_32.png|Xeon Scheduler efficiency with 32MiB Working set}} |
The 4BSD scheduler maintains a fairly constant efficiency, with a slight bump between about 16 and 32 processes. | The 4BSD scheduler maintains a fairly constant efficiency, with a slight bump between about 16 and 32 processes. | ||
Line 105: | Line 118: | ||
{{xwall.png|Xeon Elapsed Times}} | {{xwall.png|Xeon Elapsed Times}} | ||
- | {{eff.png|Xeon Scheduler efficiencies}} | + | {{yeff.png|Xeon Scheduler efficiencies}}