How TLB misses are handled
17 May, 2010 §
This blog post was co-authored by Jared Wein and Brendan Grebur.
Have you ever wondered how a TLB (Translation Lookaside Buffer) miss is handled? Probably not, but in case you have, or are curious just what I’m talking about, keep on reading.
First, let’s assume that we have a physically-addressed Level 1 cache.
Assume that the desired portion of the page table is resident in memory, but it is not in the cache. For simplicity assume that there is only a L1 cache, no L2.
The dilemma is that to get a new TLB entry one needs correct values in the TLB, but a TLB fault appears to mean that they are not there.
A Hardware-controlled TLB.
A x86(32-bit) machine using 4kB memory pages.
The CR3 register on the x86 chip will be loaded with the General Page Directory physical address for the current running process.
A General Page Directory (GPD) contains 1024, 4-byte entries of physical addresses to Internal Page Tables (IPT). The Internal Page Tables themselves consist of 1024, 4-byte entries, which contain the physical page number.
This two-level Page Table scheme translates:
The upper 10 bits (31-22) in a Virtual Address (VA) as an offset into the General Page Directory.
The next 10 bits (21-12) are translated as an offset into the Internal Page Table pointed at by the General Page Directory’s entry.
The entry in the Internal Page Table contains the physical page number the VA refers to and the lower 12 bits (11-0) serve as the byte offset into this physical page.
When a virtual memory address is referenced by an instruction, for whom a valid page table entry does not exist in the TLB, a fault occurs. The Memory Management Unit (MMU) now bypasses the TLB and attempts to read the address contained in the CR3 register, with the offset contained in the first 10 bits of the faulting VA. The faulting instruction now becomes stalled as the L1 cache is checked for the particular GPD entry. If the location is not found, the L1 cache loads the address from DRAM. Once the GPD entry is read, the MMU checks the valid bit on the entry. Since we can assume these tables are resident in memory, the address contained within the entry is now requested by the MMU, bypassing the TLB again, after the offset from bits 21-12 of the VA have been applied. Again, if this data is not already in the L1 cache, the data must be loaded from DRAM causing further delay. Contained within the IPT entry is the physical page number the VA refers to. Since a new PTE has been found, the TLB must be updated with the value. Assuming the TLB is full, the MMU uses an NRU (Not Recently Used) algorithm to replace the victim PTE with the new PTE contained in the IPT entry. However, before the victim PTE is discarded it must be checked for a set dirty bit. If set, the corresponding PTE must be copied back to main memory.
As the correct PTE has been loaded into the TLB, the faulting instruction is restarted, resulting in a TLB hit.