Knights Corner: Highly programmable while being optimized for power efficiency and highly parallel workloads.
The MIC architecture is specifically designed to provide the programmability of an SMP system while optimizing for power efficiency and highly parallel workloads. Knights Corner is the first product to use this architecture and deliver on this exciting vision. It’s an SMP on-a-chip, with the following key improvements:
We have scale and power benefits from using a small, simple micro-architecture. We extended an Intel® Pentium® processor design with 64-bit support and a few additional instructions (like CPUID), which are documented in Appendix B of the “Knights Corner Instruction Set Reference Manual.”
For high degrees of data parallelism, we added the Knights Corner vector capability not found in any other Intel processor.
We connect the many cores using a high performance on-chip interconnect design.
Knights Corner is designed to be a coprocessor that lives on the PCIe bus, and requires a host processor to boot the system.
This combination of Linux, 64-bit support, and new vector capabilities on an Intel® Pentium® processor-derived core means that Knights Corner is not completely binary compatible with any previous Intel processor. Because of its unique nature, you’ll see statements like this in our code: “Disclaimer: The codes contained in these modules may be specific to the Intel® Software Development Platform codenamed: Knights Ferry, and the Intel® product codenamed: Knights Corner, and are not backward compatible with other Intel® products. Additionally, Intel® makes no commitments for support of the code or instruction set in future products.” This notice speaks to low-level details and primarily affects tool vendors. Tools from Intel, GCC, and other vendors will support Knights Corner by following the Instruction Set Architecture (ISA) and Application Binary Interface (ABI) documents, so most developers can program at a completely portable level in their applications.
http://software.intel.com/en-us/blogs/2012/06/05/knights-corner-micro-architecture-support/
http://www.brightsideofnews.com/news/2012/7/13/xeon-phi-lacks-binary-compatibility2c-breaks-amd64-conventions.aspx
To make matters worse, Intel breaks quite a few conventions related to AMD64. The x87 FPU has been deprecated by both AMD and Intel in favor of SSE2. At one point during the development of the 64-bit kernel of Windows XP, Microsoft even wanted to throw it out altogether, but kept it for backwards-compatibility reasons (it is still considered deprecated, meaning Microsoft does not recommend using it). AMD and Intel encourage developers to use the SSE (and now also AVX) instruction set extensions wherever possible. AMD mentions this in its software optimization guides for the K8 and up (page 83). Intel stresses on numerous occasions, in chapters 5 and 6 of its optimization reference manual, that SSE and AVX should be preferred over x87 whenever possible. Intel even states in its documentation (page 13): "All Intel® 64 architecture processors support Intel® SSE2." Larrabee invalidates that statement.
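For context on what that recommendation looks like in practice, here is a minimal, illustrative C sketch that does scalar math through SSE2 intrinsics rather than the x87 FPU (standard SSE2 header and intrinsics; note that Knights Corner itself does not execute SSE instructions, which is exactly the compatibility break being described):

    /* Minimal sketch: the same arithmetic expressed with SSE2 intrinsics
     * instead of the deprecated x87 FPU. On mainstream x86-64, compilers
     * already emit SSE2 for plain double math by default. */
    #include <emmintrin.h>   /* SSE2 intrinsics */
    #include <stdio.h>

    int main(void)
    {
        __m128d a = _mm_set1_pd(1.5);    /* {1.5, 1.5} in an XMM register */
        __m128d b = _mm_set1_pd(2.25);
        __m128d c = _mm_mul_pd(a, b);    /* SSE2 multiply, no x87 stack */

        double out[2];
        _mm_storeu_pd(out, c);
        printf("%f %f\n", out[0], out[1]);
        return 0;
    }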
Tailoring an application for Larrabee is not really less work than going the GPGPU route with CUDA or OpenCL. In either case, the code needs to be tailored to the specific hardware (even though OpenCL promises to be platform agnostic, in practice you need to optimize for specific architectures to get meaningful performance). Ignoring performance characteristics and availability, these technologies are roughly on an equal footing. If we factor in that CUDA, and to a lesser degree OpenCL, has a head start on the market, things start to look dire for Intel.
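As a small illustration of that point, consider a trivial kernel written in OpenCL C (a C dialect; this example is hypothetical): the source itself is portable, but the performance-critical decisions live outside it and must be re-made per architecture.

    /* Hypothetical OpenCL C kernel: the source compiles unchanged on a GPU,
     * a multicore CPU, or a MIC device, but the work-group size, vector
     * width, and memory layout chosen by the host code must be re-tuned
     * for each architecture to get meaningful performance. */
    __kernel void saxpy(const float a,
                        __global const float *x,
                        __global float *y)
    {
        size_t i = get_global_id(0);
        y[i] = a * x[i] + y[i];
    }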
http://blogs.nvidia.com/2012/04/no-free-lunch-for-intel-mic-or-gpus/
Native Mode Complications
Functionally, a simple recompile may work, but I’m convinced it’s not practical for most HPC applications and doesn’t reflect the approach most people will need to take to get good performance on their MIC systems.
The idea of running flat MPI code (one rank per core) on a multi-node MIC system seems quite problematic. Whatever memory sits on the MIC PCIe card will be shared by more than 50 cores, leading to very small memory per core. From what I know of the MPI communication stack, that won’t leave much memory for the actual data – certainly far below the traditional 1-2 GB/core most HPC apps want. And 50+ cores all trying to send messages through the system interconnect NIC seems like a recipe for a network logjam. The other concern is the Amdahl’s Law bottleneck resulting from executing all the per-rank serial code on a lower-performance, Pentium-class scalar core.
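To put rough numbers on that concern (the card capacity and core count below are assumptions for illustration, not vendor specifications), a quick back-of-the-envelope calculation in C:

    /* Back-of-the-envelope sketch with assumed numbers: divide a coprocessor
     * card's memory evenly across one MPI rank per core and compare it with
     * the 1-2 GB/core that typical HPC applications expect. */
    #include <stdio.h>

    int main(void)
    {
        const double card_memory_gb = 8.0;  /* assumed on-card memory      */
        const int    ranks_per_card = 60;   /* flat MPI: one rank per core */

        double gb_per_rank = card_memory_gb / ranks_per_card;
        printf("~%.2f GB per rank, before the MPI stack takes its share\n",
               gb_per_rank);
        return 0;
    }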
The OpenMP approach seems only slightly better. You’d still have the very small per-core memory and the Amdahl’s Law bottleneck, but at least you’d have fewer threads trying to send messages out the NIC. Perhaps the biggest issue with this approach is that existing OpenMP codes, written for multi-core CPUs, are unlikely to have enough parallelism exposed to profitably occupy over 50 vector cores.
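To make that last point concrete, here is a hypothetical C/OpenMP sketch: a loop nest written with a handful of CPU cores in mind exposes too little parallelism for 50+ vector cores unless it is restructured, for example by collapsing the nest.

    #define NI 16     /* outer trip count sized for a multi-core CPU */
    #define NJ 1024

    /* Hypothetical sketch: parallelizing only the 16-iteration outer loop
     * leaves most of 50+ cores idle; collapse(2) merges the nest into one
     * 16*1024-iteration space so there is enough work to spread around
     * (vectorizing the inner loop also matters, and is not shown here). */
    void add_arrays(const double a[NI][NJ], const double b[NI][NJ],
                    double c[NI][NJ])
    {
        #pragma omp parallel for collapse(2)
        for (int i = 0; i < NI; i++)
            for (int j = 0; j < NJ; j++)
                c[i][j] = a[i][j] + b[i][j];
    }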
No “Magic” Compiler
The reality is that there is no such thing as a “magic” compiler that will automatically parallelize your code. No future processor or system (from Intel, NVIDIA, or anyone else) is going to relieve today’s programmers from the hard work of preparing their applications for the future.
With clock rates stalled, all future performance increases must come from increased parallelism, and power constraints will actually cause us to use simpler processors at lower clock rates for the majority of our work, further exacerbating this issue.
At the high end, an exaflop computer running at about 1 GHz will require approximately one billion-way parallelism, but the same logic will drive up required parallelism at all system scales. This means that all HPC codes will need to be cast as throughput problems with massive numbers of parallel threads. Exploiting locality will also become ever more important as the relative cost of data movement versus computation continues to rise. This will have a significant impact on the algorithms and data layouts used to solve many science problems, and is a fundamental issue not tied to any one processor architecture.