class: center, middle # Software Caching ## Alan Cox and Scott Rixner --- layout: true --- # Caching One of the core concepts in computer systems - Hardware CPU caches - Virtual memory - Buffer cache - Local files - Network files - Application-level caches - ... --- # Application-level Caching - Applications cache data for many reasons - Data was expensive to compute - Data must be retrieved over the network - ... - Memory is finite - Don't want to cache everything - If the cache is too large, the VM system will swap it out - Need to choose a suitable replacement policy --- # Replacement Policies - LRU (Least Recently Used) - Evict the least recently used item - FIFO (First In, First Out) - Evict the oldest item - Clock - FIFO with a "second chance" - Simpler approximation to LRU - Greedy Dual - Incorporate "cost" into the replacement policy - ... --- # Greedy Dual - A replacement policy that uses a cost function - The replacement cost of an item is a measure of how expensive it is to recompute/retrieve the item - Miss (that requires eviction) - Evict the item with the current lowest cost - Reduce the cost of all other items by the cost of the evicted item - Set the cost of the newly inserted item to its replacement cost - Hit - Reset the cost of the item to its replacement cost --- # Intuition - The initial cost of an item is a measure of how expensive it is to recompute/retrieve the item - Replace the item that is the cheapest to recompute/retrieve - Evict the lowest cost item - Don't replace items that are frequently accessed - *Increase* the cost of an item on a hit - Eventually replace expensive, unaccessed items - *Decrease* the cost of all remaining items on every miss --- # Bias - The cost of all items has to be updated on every eviction - Instead, we can use a *bias* (that starts at 0) to modify the costs - Miss (that requires eviction) - Evict the item with the current lowest cost - Set the bias to the cost of the evicted item - Set the cost of the newly inserted item to the bias plus its replacement cost - Hit - Set the cost of the item to the bias plus its replacement cost --- # Greedy Dual Size - Take the size of the item into account - Cost function becomes replacement cost / size - Instead of just replacement cost - This favors evicting larger items - Everything else remains the same - Perform the same actions on a hit/miss - Track the bias in the same way - Set the replacement cost to 1 for all items to maximize hit rate - Larger items reduce the number of items in the cache - Caching more smaller items increases hit rate --- # Algorithm - `i` is the item to be inserted - `C(x)` is the cost of item `x` in the cache - `r(x)` is the replacement cost of item `x` - `s(x)` is the size of item `x` - `B` is the bias ```plaintext - If i is already in the cache, set C(i) to B + r(i)/s(i). - If i is not in the cache, - While there is not enough room in the cache for i, - Let B be the minimum C(q) over all items q in the cache. - Evict q and set B to be C(q). - Bring i into the cache and set C(i) to be B + r(i)/s(i). ``` --- # Exercise - Modify a web server that caches recently accessed files/pages - Implement a greedy dual cache - Use the greedy dual size replacement policy - Use a replacement cost of 1 for all pages - Use a hash table to store the items in the cache - Use a circular, doubly-linked list to store the items in increasing order of cost - Use a bias to simplify the cost function --- class: center, middle # Get Started!