%0 Conference Paper %B Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13) %D 2013 %T Diagnosis and Optimization of Application Prefetching Performance %A Gabriel Marin %A Colin McCurdy %A Jeffrey Vetter %E Allen D. Malony %E Nemirovsky, Mario %E Midkiff, Sam %X Hardware prefetchers are effective at recognizing streaming memory access patterns and at moving data closer to the processing units to hide memory latency. However, hardware prefetchers can track only a limited number of data streams due to finite hardware resources. In this paper, we introduce the term streaming concurrency to characterize the number of parallel, logical data streams in an application. We present a simulation algorithm for understanding the streaming concurrency at any point in an application, and we show that this metric is a good predictor of the number of memory requests initiated by streaming prefetchers. Next, we try to understand the causes behind poor prefetching performance. We identified four prefetch unfriendly conditions and we show how to classify an application's memory references based on these conditions. We evaluated our analysis using the SPEC CPU2006 benchmark suite. We selected two benchmarks with unfavorable access patterns and transformed them to improve their prefetching effectiveness. Results show that making applications more prefetcher friendly can yield meaningful performance gains. %B Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13) %I ACM Press %C Eugene, Oregon, USA %8 2013-06 %@ 9781450321303 %G eng %U http://dl.acm.org/citation.cfm?doid=2464996.2465014 %R 10.1145/2464996.2465014