How TLB misses are handled (part 2)

18 May, 2010

This blog post was co-authored by Brendan Grebur and Jared Wein.

As a follow-up to the previous post on TLB misses, I’d like to cover a special case: TLB misses at boot time.

At boot time there is a “chicken-and-egg” dilemma where the TLB is empty yet TLB values are needed immediately.

How does a computer handle this? First, some assumptions:

  • Linux Kernel
  • Hardware-managed TLB

x86 chips boot in Real Mode, with a very limited address space and with paging (the MMU's address translation) disabled. The boot loader loads the kernel image into low memory, where it is decompressed. Early assembly code then initializes a page directory for the initial kernel process, loads its physical address into the CR3 register, and sets the PG bit in CR0 to turn on paging. (Protected Mode itself is entered by setting the PE bit in CR0, which must be done before paging can be enabled.)

These initial page tables map the low memory holding the kernel so that each virtual address is identical to its physical address. That identity mapping is what lets the code that is already executing keep running once translation is switched on. The kernel then begins running C code and making memory references to initialize the rest of the system. The TLB starts out empty, so these references miss, but the misses resolve themselves as the MMU walks the page directory previously set up for the kernel process.
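To make the identity mapping concrete, here is a small Python model of the idea. It is only a sketch, not the kernel's actual boot code: physical memory is a dictionary, the table addresses `0x101000` and `0x102000` are arbitrary values picked for illustration, and `build_identity_tables` and `translate` are hypothetical helper names. The `CR0_PE` and `CR0_PG` bit positions are the real ones (bits 0 and 31).

```python
PAGE_SIZE = 4096
ENTRIES = 1024          # entries per page directory / page table
PRESENT = 0x1           # bit 0 of a paging entry: entry is present
WRITABLE = 0x2          # bit 1 of a paging entry: page is writable
CR0_PE = 1 << 0         # CR0 bit 0: protection enable
CR0_PG = 1 << 31        # CR0 bit 31: paging enable


def build_identity_tables(pgdir_addr, pgtable_addr):
    """Return simulated physical memory whose tables identity-map 0..4 MB."""
    memory = {}
    # Directory entry 0 points at the single page table.
    memory[pgdir_addr] = pgtable_addr | PRESENT | WRITABLE
    # Page table entry i maps virtual page i onto physical page i.
    for i in range(ENTRIES):
        memory[pgtable_addr + 4 * i] = (i * PAGE_SIZE) | PRESENT | WRITABLE
    return memory


def translate(memory, cr0, cr3, va):
    """Translate a 32-bit address the way the MMU would for this model."""
    if not cr0 & CR0_PG:
        return va  # paging off: the address is used as-is
    pde = memory.get(cr3 + 4 * (va >> 22), 0)
    if not pde & PRESENT:
        raise LookupError("page directory entry not present")
    pte = memory.get((pde & 0xFFFFF000) + 4 * ((va >> 12) & 0x3FF), 0)
    if not pte & PRESENT:
        raise LookupError("page table entry not present")
    return (pte & 0xFFFFF000) | (va & 0xFFF)


memory = build_identity_tables(0x101000, 0x102000)
before = translate(memory, CR0_PE, 0x101000, 0x100ABC)           # paging off
after = translate(memory, CR0_PE | CR0_PG, 0x101000, 0x100ABC)   # paging on
```

In this model `before` and `after` are the same address, which is the point: flipping the PG bit does not change where the running code finds its next instruction.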

Sources:

  1. http://lkml.indiana.edu/hypermail/linux/kernel/0811.2/01602.html
  2. http://beaversource.oregonstate.edu/projects/cspfl/browser/trunk/software/u-boot-omap3/board/sbc8560/tlb.c?rev=376
  3. http://tldp.org/LDP/tlk/tlk.html
  4. http://lxr.linux.no/linux-old+v2.0.33/arch/i386/kernel/head.S#L286

How TLB misses are handled

17 May, 2010

This blog post was co-authored by Jared Wein and Brendan Grebur.

Have you ever wondered how a TLB (Translation Lookaside Buffer) miss is handled? Probably not, but in case you have, or are curious just what I’m talking about, keep on reading.

First, let’s assume that we have a physically-addressed Level 1 cache.

Assume that the desired portion of the page table is resident in memory, but it is not in the cache. For simplicity assume that there is only an L1 cache, no L2.

The apparent dilemma is that loading a new TLB entry means reading the page tables from memory, yet memory references normally need a translation from the TLB, and a TLB miss means that translation is not there.

Further assumptions:

  • A Hardware-controlled TLB.

  • An x86 (32-bit) machine using 4 KB memory pages.

  • The CR3 register on the x86 chip will be loaded with the physical address of the General Page Directory for the currently running process.

  • A General Page Directory (GPD) contains 1024 4-byte entries, each holding the physical address of an Internal Page Table (IPT). Each Internal Page Table in turn consists of 1024 4-byte entries, each containing a physical page number.

  • This two-level Page Table scheme translates a Virtual Address (VA) as follows:

      • The upper 10 bits (31-22) of the VA are used as an index into the General Page Directory.

      • The next 10 bits (21-12) are used as an index into the Internal Page Table pointed at by the General Page Directory’s entry.

      • The entry in the Internal Page Table contains the physical page number the VA refers to, and the lower 12 bits (11-0) serve as the byte offset into this physical page.
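The 10/10/12 split described above can be sketched in a few lines of Python. The function name `split_virtual_address` and the constant names are made up for this illustration; the bit positions are the ones from the list.

```python
PAGE_SIZE = 4096   # 4 KB pages
ENTRIES = 1024     # entries in the GPD and in each IPT
ENTRY_SIZE = 4     # bytes per entry


def split_virtual_address(va):
    """Split a 32-bit virtual address into (GPD index, IPT index, offset)."""
    if not 0 <= va <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit address")
    gpd_index = (va >> 22) & 0x3FF   # bits 31-22
    ipt_index = (va >> 12) & 0x3FF   # bits 21-12
    offset = va & 0xFFF              # bits 11-0
    return gpd_index, ipt_index, offset


gpd_index, ipt_index, offset = split_virtual_address(0xC0101234)
# gpd_index == 0x300, ipt_index == 0x101, offset == 0x234
```

Two pieces of arithmetic follow from these sizes: a table of 1024 4-byte entries fills exactly one 4 KB page, and 1024 × 1024 pages of 4 KB each cover the full 4 GB of a 32-bit address space.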

When an instruction references a virtual memory address for which no valid page table entry exists in the TLB, a TLB miss occurs. With a hardware-managed TLB this does not raise an exception; the Memory Management Unit (MMU) handles it by walking the page tables itself. The walk uses physical addresses, so it bypasses the TLB: the MMU takes the physical address of the GPD from the CR3 register and adds the offset formed from the upper 10 bits (31-22) of the missing VA. The instruction stalls while the L1 cache is checked for that GPD entry. If the entry is not found, the L1 cache loads it from DRAM.

Once the GPD entry is read, the MMU checks the present (valid) bit on the entry. Since we can assume these tables are resident in memory, the MMU takes the IPT's physical address from the entry, applies the offset formed from bits 21-12 of the VA, and requests the resulting address, bypassing the TLB again. If this data is not already in the L1 cache, it must be loaded from DRAM, causing further delay. Contained within the IPT entry is the physical page number the VA refers to.

Since a new PTE has been found, the TLB must be updated with the value. If the TLB is full, the MMU must pick a victim entry to replace. We assume an NRU (Not Recently Used) policy here, although the actual replacement policy of x86 TLBs is not architecturally specified. On x86 the victim entry can simply be discarded: the hardware sets the accessed and dirty bits in the in-memory PTE when the page is first accessed or written, so an evicted TLB entry holds no state that must be copied back to main memory.

With the correct PTE loaded into the TLB, the stalled memory access is retried and now results in a TLB hit.
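The whole sequence (miss, two-level walk, TLB refill, retry) can be simulated in Python. This is a rough model under stated assumptions, not a description of real hardware: `TinyMMU` is a hypothetical class, physical memory is a dictionary, the L1 cache is left out, and plain LRU eviction stands in for NRU. It also assumes, as on x86, that the dirty bit lives in the in-memory PTE, so evicting a TLB entry needs no write-back.

```python
from collections import OrderedDict

PAGE_SHIFT = 12
PRESENT = 0x1  # bit 0 of a paging entry: entry is present


class TinyMMU:
    def __init__(self, memory, cr3, tlb_size=4):
        self.memory = memory       # physical address -> 32-bit entry
        self.cr3 = cr3             # physical address of the GPD
        self.tlb = OrderedDict()   # virtual page number -> physical page number
        self.tlb_size = tlb_size
        self.walks = 0             # number of page table walks performed

    def walk(self, va):
        """Two-level walk using physical addresses only (no TLB involved)."""
        self.walks += 1
        gpd_entry = self.memory.get(self.cr3 + 4 * ((va >> 22) & 0x3FF), 0)
        if not gpd_entry & PRESENT:
            raise LookupError("page fault: GPD entry not present")
        ipt_base = gpd_entry & 0xFFFFF000
        ipt_entry = self.memory.get(ipt_base + 4 * ((va >> 12) & 0x3FF), 0)
        if not ipt_entry & PRESENT:
            raise LookupError("page fault: IPT entry not present")
        return ipt_entry >> PAGE_SHIFT

    def translate(self, va):
        vpn = va >> PAGE_SHIFT
        if vpn in self.tlb:
            self.tlb.move_to_end(vpn)          # hit: mark as recently used
        else:
            ppn = self.walk(va)                # miss: hardware walk
            if len(self.tlb) >= self.tlb_size:
                self.tlb.popitem(last=False)   # evict LRU entry, no write-back
            self.tlb[vpn] = ppn
        return (self.tlb[vpn] << PAGE_SHIFT) | (va & 0xFFF)


memory = {
    0x1000 + 4 * 1: 0x2000 | PRESENT,   # GPD entry 1 -> IPT at 0x2000
    0x2000 + 4 * 2: 0x5000 | PRESENT,   # IPT entry 2 -> physical page 5
}
mmu = TinyMMU(memory, cr3=0x1000)
va = (1 << 22) | (2 << 12) | 0x034
first = mmu.translate(va)    # TLB miss: triggers one walk
second = mmu.translate(va)   # TLB hit: no further walk
```

Both calls return the same physical address, but `mmu.walks` stays at 1 after the second call, which mirrors the retried access hitting in the TLB.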
