|
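I no longer write previews of things I haven't tested; if I write one, it will be after I have the actual hardware in hand.
A few more details: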
Nehalem supports macrofusion in both 32-bit and 64-bit modes
Improved Loop Stream Detector (Nehalem: 28 uops vs. Merom: 18 instructions)
Has an "L2 Branch Predictor"
Adds an L2 TLB:
1st Level Instruction TLBs
- Small Page (4k): 128
- Large Page (2M/4M): 7 per thread
1st Level Data TLBs
- Small Page (4k): 64
- Large Page (2M/4M): 32
New 2nd Level Unified TLB
- Small Page Only: 512
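To put those entry counts in perspective, here is a rough sketch of the "TLB reach" (entries × page size) each structure can cover. The entry counts come from the list above; the helper name `tlb_reach` and the dictionary labels are my own, and I assume 2M rather than 4M for the large-page rows, so treat the large-page figures as a lower bound.

```python
KIB = 1024
MIB = 1024 * KIB

def tlb_reach(entries, page_size):
    """Bytes of address space a TLB can map without a miss (hypothetical helper)."""
    return entries * page_size

# Entry counts are taken from the list above; large pages are assumed to be 2M.
tlbs = {
    "L1 ITLB, 4K pages": (128, 4 * KIB),
    "L1 ITLB, 2M pages (per thread)": (7, 2 * MIB),
    "L1 DTLB, 4K pages": (64, 4 * KIB),
    "L1 DTLB, 2M pages": (32, 2 * MIB),
    "L2 unified TLB, 4K pages": (512, 4 * KIB),
}

# Map each TLB label to its reach in bytes.
reach = {name: tlb_reach(entries, page) for name, (entries, page) in tlbs.items()}

for name, size in reach.items():
    print(f"{name}: {size // KIB} KiB")
```

With 4K pages, the first-level data TLB covers only 256 KiB, while the new second-level TLB covers 2 MiB, which is roughly why the extra level matters for workloads with large working sets.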
"No reason to use aligned instructions on Nehalem!"
SMT Implementation Details
• Multiple policies possible for implementation of SMT
• Replicated – Duplicate state for SMT
- Register state
- Renamed RSB
- Large page ITLB
• Partitioned – Statically allocated between threads
- Key buffers: Load, store, Reorder
- Small page ITLB
• Competitively shared – Depends on thread's dynamic behavior
- Reservation station
- Caches
- Data TLBs, 2nd level TLB
• Unaware
- Execution units |
|