Performance analysis is one of the fundamental pillars of software systems. From the responsiveness of a mobile phone UI to the simulations and modeling performed on supercomputers at national labs, efficiency plays a vital role in the wider adoption of modern software systems. Traditionally, the performance of a system is examined from a "hot spot" perspective, which highlights regions of high resource utilization without identifying whether those resources were "well used". In this talk, I introduce an unconventional, yet effective, approach to performance analysis: instead of looking for high resource utilization, it focuses on identifying high resource waste. This approach elevates a performance analysis tool from one that merely highlights the symptoms of problems to one that identifies the root causes of performance loss. One can use different metrics to evaluate under-utilized or poorly utilized resources. In this talk, I will present two techniques built on such metrics. The first is DeadSpy, a tool that pinpoints program inefficiencies arising from dead memory write operations. The second is CPU-GPU-Blame-shifting, which identifies performance inefficiencies arising in heterogeneous multi-core architectures. The techniques proposed here open new avenues for performance tuning. DeadSpy has helped improve the performance of several widely used, hand-tuned applications; for example, it helped improve gcc and hmmer by as much as 28% and 40%, respectively, for certain workloads. Our ongoing work on CPU-GPU-Blame-shifting is intended to prepare HPCToolkit for analyzing CPU-GPU hybrid applications on supercomputers equipped with GPU accelerators.