This shows you the differences between two versions of the page.
ulevs4bsd [2013/06/29 06:36] – Insert corrected graphs peterjeremy | ulevs4bsd [2013/06/30 11:53] (current) – Interim comments peterjeremy | ||
---|---|---|---|
Line 32: | Line 32: | ||
{{xcpu_1.png|Xeon CPU Time with 1KiB Working Set}} | {{xcpu_1.png|Xeon CPU Time with 1KiB Working Set}} | ||
- | shows | + | is far less clean and shows the impact of hyperthreading, |
{{wall_1.png|V890 Elapsed Time with 1KiB Working Set}} | {{wall_1.png|V890 Elapsed Time with 1KiB Working Set}} | ||
- | {{xwall_1.png|Xeon Elapsed Time with 1KiB Working Set}} | ||
shows that both schedulers are well-behaved until there are more processes than cores. | shows that both schedulers are well-behaved until there are more processes than cores. | ||
- | Once there are more processes than cores, 4BSD remains well-behaved, whilst ULE has significant jumps in wallclock time, i.e. the same set of tasks takes longer to run with ULE. The following | + | Once there are more processes than cores, 4BSD remains well-behaved, whilst ULE has significant jumps in wallclock time, i.e. the same set of tasks takes longer to run with ULE. The scheduler efficiency |
+ | |||
+ | {{xwall_1.png|Xeon Elapsed Time with 1KiB Working Set}} | ||
+ | |||
+ | Again, between 2 and 7 processes, the Xeon is not well-behaved. | ||
{{eff_1.png|V890 Scheduler efficiency with 1KiB Working set}} | {{eff_1.png|V890 Scheduler efficiency with 1KiB Working set}} | ||
- | {{yeff_1.png|Xeon Scheduler efficiency with 1KiB Working set}} | ||
This graph shows scheduler efficiency as a ratio of wallclock time to CPU time. | This graph shows scheduler efficiency as a ratio of wallclock time to CPU time. | ||
A perfect scheduler would have a ratio of 1 and both schedulers do a good job up to 16 processes. | A perfect scheduler would have a ratio of 1 and both schedulers do a good job up to 16 processes. | ||
Beyond this, 4BSD has a small jump whilst ULE has a significant jump - forming roughly a sawtooth that peaks at about 18 processes and slopes back to 1 at 32 processes before jumping up again and sloping back to 1 at 48 processes. | Beyond this, 4BSD has a small jump whilst ULE has a significant jump - forming roughly a sawtooth that peaks at about 18 processes and slopes back to 1 at 32 processes before jumping up again and sloping back to 1 at 48 processes. | ||
- | This suggests | + | |
+ | {{yeff_1.png|Xeon Scheduler efficiency with 1KiB Working set}} | ||
+ | |||
+ | As with the V890 results, there is a sawtooth pattern with a number-of-threads periodicity, | ||
+ | |||
+ | Both these results suggest | ||
==== 4MiB Working Set ==== | ==== 4MiB Working Set ==== | ||
Line 53: | Line 60: | ||
{{cpu_4.png|V890 CPU Time with 4MiB Working Set}} | {{cpu_4.png|V890 CPU Time with 4MiB Working Set}} | ||
- | {{xcpu_4.png|Xeon CPU Time with 4MiB Working Set}} | ||
shows that both schedulers behave similarly for less than 4 processes and between 10 and 16 processes. | shows that both schedulers behave similarly for less than 4 processes and between 10 and 16 processes. | ||
Line 59: | Line 65: | ||
Between 17 and about 40 processes, ULE uses significantly less CPU than 4BSD. | Between 17 and about 40 processes, ULE uses significantly less CPU than 4BSD. | ||
Beyond about 48 processes, 4BSD again takes the lead. | Beyond about 48 processes, 4BSD again takes the lead. | ||
+ | |||
+ | {{xcpu_4.png|Xeon CPU Time with 4MiB Working Set}} | ||
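
The scheduler efficiency graphs for the 1KiB working set are described as a ratio of wallclock time to CPU time, with 1 as the ideal. The page does not give the exact normalisation, so the following is only a sketch under one assumption: that wallclock time is scaled by the number of cores that could be kept busy, `min(nproc, ncores)`, before being divided by the total CPU time. The function name and its parameters are hypothetical, and the timings in the examples are invented for illustration, not taken from the measurements.

```python
def scheduler_efficiency(wall_seconds, total_cpu_seconds, nproc, ncores):
    """Normalised wallclock-to-CPU ratio (assumed definition, not the page's).

    Returns 1.0 when every usable core was busy for the whole run and a
    larger value when cores sat idle while runnable work remained.
    """
    if wall_seconds < 0 or total_cpu_seconds <= 0:
        raise ValueError("timings must be positive")
    if nproc <= 0 or ncores <= 0:
        raise ValueError("nproc and ncores must be positive")
    # No more than min(nproc, ncores) cores can be doing useful work at once.
    usable_cores = min(nproc, ncores)
    return wall_seconds * usable_cores / total_cpu_seconds


# Illustrative only: 16 cores, each process needing 10 s of CPU.
# 16 processes run fully in parallel: 10 s wall, 160 s CPU.
full = scheduler_efficiency(10.0, 160.0, nproc=16, ncores=16)
# 18 processes run as a batch of 16 followed by a batch of 2: 20 s wall, 180 s CPU.
uneven = scheduler_efficiency(20.0, 180.0, nproc=18, ncores=16)
# 32 processes run as two full batches: 20 s wall, 320 s CPU.
double = scheduler_efficiency(20.0, 320.0, nproc=32, ncores=16)
```

Under this assumed definition, a scheduler that runs processes in whole batches gives a ratio of 1 at exact multiples of the core count and a higher ratio just above each multiple, which is the shape of sawtooth described for ULE on the V890.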