Managing Contention for Shared Resources on Multicore Processors
Contention for Resources in Multicore Systems
According to Fedorova et al. (2009), current multicore systems are designed so that clusters of cores share hardware components, for instance the memory controllers, last-level caches, and interconnects. Fedorova et al. (2009) refer to such a cluster as a memory domain. Resource sharing in multicore systems is closely tied to the memory hierarchy: because the cores share resources such as caches, memory controllers, and interconnects, they ultimately contend for these resources, which leads to performance degradation. This paper analyzes some of the forms of contention discovered by Fedorova and her two fellow researchers, as well as the techniques the three researchers suggest for dealing with them.
Findings from the Study
In their study, the researchers made several discoveries useful for dealing with cache contention. After examining contention with three applications (i.e., Soplex, Sphinx, and Namd), they found that applications known for high miss rates should not be kept together. In other words, it is unwise to co-schedule them in any single memory domain.
The researchers found, however, that contention for the shared cache was not the principal cause of performance degradation; applications competing on a multicore system suffer mainly for other reasons. It became apparent from their study that there is also contention for other resources shared by applications running on a multicore system, namely the memory controller, the prefetching hardware, and the front-side bus. Contention for these resources was found by Fedorova et al. (2009) to be the major degrader of performance in multicore systems. Moreover, the threads' cache miss rates proved to be good predictors of contention for the front-side bus, the memory controller, and the prefetching hardware. To make other applications generate contention, the researchers co-scheduled them with Milc (another application). It could be seen that applications generating more cache misses occupied the memory controller as well as the front-side bus (Gadde et al., 2001). This means that such an application can hurt neighboring applications using the same hardware, and even itself, when that hardware is monopolized. Additionally, applications that are more aggressive in hardware prefetching tend to show higher LLC miss rates, because prefetch requests for data that are not present in the cache are also counted as misses.
Contention for Cache
Contention for the cache arises when threads (i.e., two or more) run on cores that share the same memory domain and are therefore made to share the same last-level cache (LLC). It is important to have a rough idea of what a cache is: Fedorova et al. (2009) describe it as a collection of lines allocated to hold the memory of the various threads as they issue requests. When a thread misses, that is, requests a line that is not present in the cache, a new cache line must be allocated to that thread. If the other lines are busy (in other words, holding data used by other threads), a different move is called for: some data must be evicted so that a free line remains to hold the data of the thread that missed (Tambat & Vajapeyam, 2002). The new data then finds a storage space.
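The eviction behavior described above can be illustrated with a toy model. The sketch below is a hypothetical simplification (a single LRU-managed cache with made-up addresses, not the authors' implementation): lines are evicted in least-recently-used order regardless of which thread owns them, so one thread's misses can displace another thread's data.

```python
from collections import OrderedDict

class SharedLLC:
    """Toy last-level cache shared by several threads (illustrative sketch)."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()  # address -> owning thread, in LRU order

    def access(self, thread, address):
        """Return True on a hit, False on a miss (which allocates a line)."""
        if address in self.lines:
            self.lines.move_to_end(address)  # refresh LRU position
            return True
        if len(self.lines) == self.num_lines:
            self.lines.popitem(last=False)   # evict the LRU line, any owner
        self.lines[address] = thread
        return False

cache = SharedLLC(num_lines=4)
# Thread A warms the cache with its working set.
for addr in ("a1", "a2", "a3", "a4"):
    cache.access("A", addr)
# An aggressive thread B streams through new addresses,
# evicting A's lines one by one.
for addr in ("b1", "b2", "b3", "b4"):
    cache.access("B", addr)
print(cache.access("A", "a1"))  # False: A's line was evicted by B
```

Because the cache makes no distinction between owners, thread A's subsequent access to its own data misses, which is exactly the cross-thread interference the next paragraph discusses.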
Ideally, the room for the missing thread would come from that thread's own lines. Modern central processing units, however, make no such exception, which means that so-called aggressive threads can evict equally important data belonging to other threads, thus leading to a decline in performance. Despite numerous studies by various researchers on LLC conflicts, current operating systems seem to have given little attention to the findings of these studies. This is what led the researchers to the issue of scheduling. At present, two schools of thought exist with regard to modeling cache contention.
The authors had one chief goal when evaluating new models of cache contention: the efficiency of the models in creating contention-free thread schedules. They aimed to use the new models to find good schedules while taking care to avoid bad ones, and they assessed each model by the schedules it generated. They built a scheduler that uses a pain metric to construct the best schedule. This scheduler predicted the pain for every pair of co-scheduled threads and then averaged the pain values across the pairs, giving the resulting pain for the entire schedule. The schedule with the lowest pain was considered to be probably the best one (Fedorova et al., 2009). The preferred schedule can be obtained either by computing the pain metric from profiles of real memory reuse or by estimating it with the aid of online information. After obtaining the estimated best schedule, the three researchers compared its performance with that of the actual best schedule.
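The selection procedure just described can be sketched in a few lines. The pain values below are invented for illustration (they are not the authors' measurements), and the splitting of four threads into two two-core domains is an assumed configuration: the scheduler enumerates every pairing, averages the predicted pain of the co-scheduled pairs, and keeps the schedule with the lowest average.

```python
# Hypothetical predicted pain for each co-scheduled pair (symmetric).
# Higher means the two threads degrade each other more when sharing a domain.
PAIN = {
    frozenset(("soplex", "sphinx")): 0.9,
    frozenset(("soplex", "namd")):   0.3,
    frozenset(("soplex", "milc")):   0.8,
    frozenset(("sphinx", "namd")):   0.4,
    frozenset(("sphinx", "milc")):   0.7,
    frozenset(("namd",   "milc")):   0.2,
}

def schedule_pain(pairs):
    """Average the pain over all co-scheduled pairs: the schedule's pain."""
    return sum(PAIN[frozenset(p)] for p in pairs) / len(pairs)

def best_schedule(threads):
    """Enumerate every way of splitting four threads into two 2-core
    domains and return the pairing with the lowest average pain."""
    first = threads[0]
    candidates = []
    for partner in threads[1:]:
        rest = tuple(t for t in threads[1:] if t != partner)
        candidates.append([(first, partner), rest])
    return min(candidates, key=schedule_pain)

best = best_schedule(["soplex", "sphinx", "namd", "milc"])
print(best, schedule_pain(best))
```

With these made-up numbers the scheduler pairs Soplex with Namd and Sphinx with Milc, keeping the two highest-pain pairings apart, which mirrors how the authors' scheduler ranks candidate schedules by average pain.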
The researchers found that their new model (the pain model) was the best of those compared, since it helped the scheduler locate the most appropriate thread assignment. The results from the pain model were also more accurate, falling within one percent of the actual best schedule. From the authors' findings, it is not advisable to use a random schedule, since it yields worse performance. Additionally, as the number of cores increases, performance becomes increasingly degraded. This information is vital, especially for today's multicore systems, because the current trend is toward ever more cores. If people fail to appreciate the importance of choosing the most efficient schedule, the performance of multicore systems may be severely undermined (Zwick, 2008).
Contention Avoidance Techniques
After evaluating the findings from their study, the three researchers came up with a scheduler prototype that is highly conscious of contention in multicore systems. They called this new technique Distributed Intensity Online (DIO). The scheduler distributes all of the intense (high-LLC-miss-rate) applications across the memory domains, after measuring the miss rates online. From their findings, it is therefore possible to deduce that contention for all shared memory structures can be greatly reduced by providing numerous memory modules with an interconnection network whose aggregate access throughput matches the number of shared memory modules (Nemirovsky & Tullsen, 2013).
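The core idea of spreading intense applications across domains can be sketched as follows. This is an assumed simplification, not the authors' exact algorithm, and the miss-rate figures are hypothetical: applications are sorted by their online-measured LLC miss rate and dealt out round-robin, so the most cache-intensive ones land in different memory domains.

```python
def dio_assign(miss_rates, num_domains):
    """Distribute applications across memory domains so that the most
    cache-intensive ones (highest LLC miss rates) are kept apart.
    A sketch of DIO's distribution idea, not the published algorithm."""
    ranked = sorted(miss_rates, key=miss_rates.get, reverse=True)
    domains = [[] for _ in range(num_domains)]
    for i, app in enumerate(ranked):
        domains[i % num_domains].append(app)  # round-robin by intensity
    return domains

# Hypothetical online measurements (LLC misses per 1000 instructions).
rates = {"milc": 25.0, "soplex": 18.0, "sphinx": 12.0, "namd": 0.5}
print(dio_assign(rates, num_domains=2))
```

The two most intense applications (Milc and Soplex) end up in separate domains, each paired with a less intense neighbor, which is the co-scheduling pattern the study recommends.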
Moreover, they proposed a second prototype, known as Power Distributed Intensity (Power DI), which schedules applications with regard to the various activities undertaken in data centers. This second prototype saves power by defining how systems are employed without degrading their own performance or that of others.
Conclusion
Sharing of resources in multicore systems leads to contention that can reduce the efficiency of those systems. Through their study, Fedorova et al. (2009) have managed to come up with ways of reducing this contention by means of scheduling procedures. Contrary to past beliefs, performance degradation can be attributed to the sharing of a number of other resources, and not mainly the cache, as was once thought. The memory interconnect and hardware prefetching have been found to be the major degraders. LLC miss rates have proved to be the best predictors of contention. Systems that use these techniques are likely to maintain good performance while also saving power.