The core difficulty in combining software prefetching algorithms with cache coherence protocols is that different prefetching pragmas produce different patterns of cache hits and misses, depending on the order of execution and on the coherence state each piece of data is in at the time of the access. Additionally, simulating a hardware's computation of a program is difficult in itself, which adds another barrier with a steep learning curve: we must understand the simulator's code well enough to inject additional prefetching subroutines and pragmas into the programs we simulate.

If we accomplish our standard goals, we then face the additional problem of developing a machine learning algorithm for the task. This problem is challenging because there are many moving parts and software prefetching is a relatively unexplored area.

The aspect of the learning algorithm that makes it hard to parallelize is that everything must be done without interfering with our cache. In the ideal case, we would run one program in isolation and update the model based on that single run. A further complication is that we have separate cases running concurrently, yet all of them must update a shared resource. The only dependency is that, to build the reach-goal learning model, each run must finish with an updated execution time. Some runs may take longer than others, which can lead to interesting race conditions; we sketch one way to handle this below. The communication-to-computation ratio per finished run is relatively high, but as the test programs grow, the computation per run should come to dominate. Divergent execution is possible, but only in the installation of pragmas for the prefetching algorithms, since these pragmas are inserted depending on the cache coherence protocol.
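To make the shared-update concern concrete, the following is a minimal sketch of how isolated runs could report their execution times back to a shared model without racing, even when runs finish out of order. Everything here is hypothetical: run_simulation stands in for an actual simulator invocation, and pragma_config stands in for a choice of prefetching pragmas; the real interface will depend on the simulator we adopt.

    import random
    import threading
    import time
    from concurrent.futures import ThreadPoolExecutor

    def run_simulation(pragma_config):
        # Hypothetical stand-in for launching the simulator on a test program
        # built with the given prefetching pragmas and parsing the reported
        # execution time; here both the run time and the result are faked.
        time.sleep(random.uniform(0.1, 0.5))   # runs finish out of order
        return random.uniform(1.0, 2.0)        # simulated execution time (s)

    model_lock = threading.Lock()
    model = {"best_time": float("inf"), "best_config": None}

    def run_and_update(pragma_config):
        # The simulation itself runs in isolation; only the model update
        # touches shared state. The lock serializes updates so that runs
        # finishing in an unexpected order cannot race on the shared model.
        exec_time = run_simulation(pragma_config)
        with model_lock:
            if exec_time < model["best_time"]:
                model["best_time"] = exec_time
                model["best_config"] = pragma_config

    configs = [{"prefetch_distance": d} for d in (4, 8, 16, 32)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(run_and_update, configs))

    print(model)

In practice, each simulation would likely run as a separate OS process rather than a thread, which better matches the goal of running one program in isolation without interfering with the simulated cache; the lock-guarded update is the part that addresses the race noted above.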