How TLB misses are handled

17 May, 2010 § Leave a comment

This blog post was co-authored by Jared Wein and Brendan Grebur.

Have you ever wondered how a TLB (Translation Lookaside Buffer) miss is handled? Probably not, but in case you have, or are curious just what I’m talking about, keep on reading.

First, let’s assume that we have a physically-addressed Level 1 cache.

Assume that the desired portion of the page table is resident in memory but not in the cache. For simplicity, assume there is only an L1 cache and no L2.

The dilemma is that filling in a new TLB entry requires reading the page table in memory, yet the TLB is what normally translates memory addresses, and the fault means the translation we need is not there.

Further assumptions:

  • A Hardware-controlled TLB.

  • An x86 (32-bit) machine using 4 kB memory pages.

  • The CR3 register on the x86 chip is loaded with the physical address of the General Page Directory for the currently running process.

  • A General Page Directory (GPD) contains 1024 four-byte entries, each holding the physical address of an Internal Page Table (IPT). Each Internal Page Table in turn consists of 1024 four-byte entries, each containing a physical page number.

  • This two-level Page Table scheme translates:

      • The upper 10 bits (31-22) of a Virtual Address (VA) as an offset into the General Page Directory.

      • The next 10 bits (21-12) as an offset into the Internal Page Table pointed at by the General Page Directory’s entry.

      • The entry in the Internal Page Table contains the physical page number the VA refers to, and the lower 12 bits (11-0) serve as the byte offset into this physical page.

When an instruction references a virtual memory address for which no valid page table entry exists in the TLB, a fault occurs and the faulting instruction stalls. The Memory Management Unit (MMU) now bypasses the TLB and reads the GPD entry located at the address in the CR3 register plus the offset taken from the upper 10 bits of the faulting VA. The L1 cache is checked first for this entry; if it is not found there, it must be loaded from DRAM.

Once the GPD entry is read, the MMU checks its valid bit. Since we are assuming the tables are resident in memory, the entry is valid, so the MMU, again bypassing the TLB, requests the IPT entry at the address contained in the GPD entry plus the offset from bits 21-12 of the VA. As before, if this data is not already in the L1 cache, it must be loaded from DRAM, causing further delay.

The IPT entry contains the physical page number the VA refers to. Now that a new PTE has been found, the TLB must be updated with it. Assuming the TLB is full, the MMU uses an NRU (Not Recently Used) algorithm to choose a victim entry to replace with the new PTE. However, before the victim PTE is discarded, its dirty bit must be checked; if set, the entry must be copied back to the corresponding PTE in main memory.

With the correct PTE now loaded into the TLB, the faulting instruction is restarted, this time resulting in a TLB hit.



You are currently reading How TLB misses are handled at JAWS.
