The Larrabee paper is out, and it describes dedicated instructions and instruction modes for explicit cache control:
Larrabee also adds new instructions and instruction modes for
explicit cache control. Examples include instructions to prefetch
data into the L1 or L2 caches and instruction modes to reduce the
priority of a cache line. For example, streaming data typically
sweeps existing data out of a cache. Larrabee is able to mark each
streaming cache line for early eviction after it is accessed. These
cache control instructions also allow the L2 cache to be used
similarly to a scratchpad memory, while remaining fully coherent.
Within a single core, synchronizing access to shared memory by
multiple threads is inexpensive. The threads on a single core share
the same local L1 cache, so a single atomic semaphore read
within the L1 cache is sufficient. Synchronizing access between
multiple cores is more expensive, since it requires inter-processor
locks. This is a well known problem in multi-processor design.
Multi-issue CPU cores often lose performance due to the
difficulty of finding instructions that can execute together.
Larrabee’s dual-issue decoder has a high multi-issue rate in code
that we’ve tested. The pairing rules for the primary and secondary
instruction pipes are deterministic, which allows compilers to
perform offline analysis with a wider scope than a runtime out-of-order
instruction picker can. All instructions can issue on the
primary pipeline, which minimizes the combinatorial problems
for a compiler. The secondary pipeline can execute a large subset
of the scalar x86 instruction set, including loads, stores, simple
ALU operations, branches, cache manipulation instructions, and
vector stores. Because the secondary pipeline is relatively small
and cheap, the area and power wasted by failing to dual-issue on
every cycle is small. In our analysis, it is relatively easy for
compilers to schedule dual-issue instructions.