POPPUR爱换

Title: GPGPU showdown: looks like the NVIDIA shills have a new job, and this time the target isn't AMD

Author: Clockwork橘  Time: 2012-8-6 10:15
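Intel officially launched the Xeon Phi brand at the ISC supercomputing conference in June. Its design resembles the cancelled Larrabee in several respects: it integrates dozens of x86 cores and can be regarded as Intel's own general-purpose compute architecture.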

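Actual Xeon Phi products are not due until Q4 of this year. Arriving in the same period is NVIDIA's previously announced 7.1-billion-transistor monster, the Tesla K20. It uses the GK110 architecture. Unlike the current gaming-focused GK104, GK110 concentrates on general-purpose compute, with double-precision floating-point performance as much as three times that of the Fermi architecture.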

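A showdown between Xeon Phi and Tesla K20 is unavoidable. Intel has an advantage in CPUs that NVIDIA cannot match, but in general-purpose compute its performance trails NVIDIA by a step. How much of a chance does the first Xeon Phi product have? 3DCenter combined VR-Zone's reporting with its own earlier coverage to produce an estimated comparison of the two.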

[attach]1975551[/attach]

Note: Xeon Phi's clock is 1.05-1.1GHz. That figure is garbled in the image above because of Google Translate.

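Xeon Phi is based on general-purpose x86, or more precisely the x64 architecture. It will use a 22nm 3D-transistor process, with a single-precision to double-precision throughput ratio of 2:1, 2MB of L1 cache, 31MB of L2 cache and a 512-bit GDDR5 memory interface. Initial products run at 1.05-1.1GHz, with memory at roughly 5-5.5GHz (effective) and 350.7GB/s of bandwidth.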

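Xeon Phi uses Intel's MIC many-core architecture. The A0 stepping comes in 48-, 52- and 60-core versions with memory at about 2.4-4.5GHz. The B0 stepping has 57, 60 and 61 cores, and its memory clock rises to 5-5.5GHz.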

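We have covered Tesla K20's architecture and features before. It uses the GK110 architecture on TSMC's 28nm process, with 2880 CUDA cores, a single-precision to double-precision throughput ratio of 3:1, 2MB of L2 cache and a 384-bit GDDR5 memory interface. The expected core clock is 850MHz, with memory at no less than 6GHz (effective) and 288GB/s of bandwidth.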
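
The bandwidth figures quoted for the two cards can be sanity-checked from bus width and effective memory data rate. The sketch below assumes the usual rule of thumb (bytes per transfer times effective transfers per second); the function name is made up for illustration, and the inputs are the estimated specs from the article, not measured values.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, effective_rate_ghz: float) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    bus_width_bits / 8 gives bytes moved per transfer; multiplying by the
    effective data rate in GHz (billions of transfers per second) gives GB/s.
    """
    return bus_width_bits / 8 * effective_rate_ghz


# Tesla K20 estimate from the article: 384-bit bus, 6GHz effective
k20_bw = peak_bandwidth_gb_s(384, 6.0)

# Xeon Phi estimate from the article: 512-bit bus, 5-5.5GHz effective
phi_bw_low = peak_bandwidth_gb_s(512, 5.0)
phi_bw_high = peak_bandwidth_gb_s(512, 5.5)

# Effective data rate implied by the quoted 350.7GB/s on a 512-bit bus
phi_implied_rate_ghz = 350.7 / (512 / 8)

print(f"K20:      {k20_bw:.1f} GB/s")
print(f"Xeon Phi: {phi_bw_low:.1f}-{phi_bw_high:.1f} GB/s")
print(f"Implied Xeon Phi memory rate: {phi_implied_rate_ghz:.2f} GHz")
```

Under these assumptions the K20 figure reproduces the quoted 288GB/s exactly, and the quoted 350.7GB/s for Xeon Phi sits just under the top of its estimated range.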

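On performance alone, K20's single- and double-precision figures are 4.9 TFLOPS and 1.6 TFLOPS, against 2.2 TFLOPS and 1.1 TFLOPS for Xeon Phi. Xeon Phi is behind on compute but slightly ahead on memory bandwidth.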
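
The TFLOPS estimates above can be roughly reproduced from core count, clock and per-cycle throughput. This is a sketch under assumptions that the article does not state: each CUDA core is taken to retire one fused multiply-add (2 FLOPs) per cycle, and each Xeon Phi core is taken to have a 512-bit vector unit with fused multiply-add (8 double-precision lanes times 2 FLOPs, so 16 per cycle). The core counts, clocks and SP:DP ratios are the article's estimates, and the function name is hypothetical.

```python
def peak_tflops(units: int, flops_per_unit_per_cycle: int, clock_ghz: float) -> float:
    """Theoretical peak throughput in TFLOPS (units * FLOPs/cycle * GHz / 1000)."""
    return units * flops_per_unit_per_cycle * clock_ghz / 1000.0


# Tesla K20 (estimated): 2880 CUDA cores at 850MHz, assumed 2 SP FLOPs/cycle each
k20_sp = peak_tflops(2880, 2, 0.85)
k20_dp = k20_sp / 3  # article's SP:DP ratio of 3:1

# Xeon Phi (estimated): 61 cores at 1.1GHz, assumed 16 DP FLOPs/cycle each
phi_dp = peak_tflops(61, 16, 1.1)
phi_sp = phi_dp * 2  # article's SP:DP ratio of 2:1

print(f"K20:      SP {k20_sp:.2f} TFLOPS, DP {k20_dp:.2f} TFLOPS")
print(f"Xeon Phi: SP {phi_sp:.2f} TFLOPS, DP {phi_dp:.2f} TFLOPS")
```

With the top-end 61-core, 1.1GHz configuration this comes out close to the quoted 2.2 and 1.1 TFLOPS for Xeon Phi, and it matches the 4.9 and 1.6 TFLOPS quoted for K20 after rounding.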

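Intel is still refining Xeon Phi. It is only at the B0 stepping, so there is time to raise clocks further and improve double-precision throughput. After all, TSMC cannot match Intel's advantage in process technology.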

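Also, 3DCenter analysed only the hardware specifications (and even those are just estimates), while the software environment is what really matters. Phi is built on the traditional x86 architecture, which gives it an edge in the programming environment. Add Intel's deep technical strength and financial firepower, and its ability to win over software vendors is surely greater than NVIDIA's. Let time decide the final outcome.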
[attach]1975552[/attach]

Author: CNBETA  Time: 2012-8-6 10:16
牛逼哥 already posted this. You're too late to farm points with it.

Author: NG6  Time: 2012-8-6 10:33
It looks just like a graphics card.

Author: gzpony  Time: 2012-8-6 10:44
This thing is already behind the moment it launches. It hasn't earned a real showdown.

Author: soloparadise  Time: 2012-8-6 10:57
Not a threat for now. But given how Intel throws money around, it will be dangerous down the road.

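Author: 水星思路  Time: 2012-8-6 11:04
This thing's main competitive edge is that x86 cores are far more programmable than a GPU. I expect Intel will support the existing toolchain, so that people with no patience for learning CUDA/OpenCL can port existing code to the accelerator more easily.

The reason GPUs made it into supercomputers is that they are cheap. If Intel is also willing to build cheap accelerators, great, although I seriously doubt Intel really wants to be in the accelerator business. Titan uses a mere 18,000 GK110s, and only a few supercomputers on Titan's scale get built each year. If NVIDIA sold nothing but Tesla, it could never recoup its costs. Xeon Phi can't be sold as a consumer GPU, so I find it hard to imagine how Intel breaks even.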

Author: devildush  Time: 2012-8-6 11:19
This was already reposted last week, wasn't it?

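Author: NORAWITHMYCALL  Time: 2012-8-6 12:30
This post was last edited by NORAWITHMYCALL at 2012-8-6 12:32

水星思路 posted at 2012-8-6 11:04
This thing's main competitive edge is that x86 cores are far more programmable than a GPU. I expect Intel will support the existing toolchain, so that people with no patience ...

This comes down to company size and profitability. Intel can afford to lose this money, whereas a loss like that would put NVIDIA in serious financial trouble. Take Microsoft's Xbox and Sony's PS3: in just two generations and a few short years, Xbox reached a market position that Sony had spent more than a decade building. That is the biggest benefit of deep pockets: I can afford the losses and you cannot. It is also why Sega was forced out of the TV console business back then and had much of its market share taken by Sony. Technology certainly matters, but it needs funding behind it to develop further and reach volume production. Without funding, what good is strong technology? At the level of nations, take the former Soviet Union. It was once a technological power second only to the US, but flaws in its economic system drained its reserves abroad, and it was pushed into the arms race the US had set up (which demanded enormous investment). In the end, pressure from inside and outside brought it down completely, and its scientific talent declined and drained away with it. For example, the Lebedev Institute of Precision Mechanics and Computer Engineering, the centre of Soviet computing, lost many outstanding people. Some of them went to the United States, and they may have influenced Intel's later development.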

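Author: 水星思路  Time: 2012-8-6 12:42

NORAWITHMYCALL posted at 2012-8-6 12:30
This comes down to company size and profitability. Intel can afford to lose this money, whereas a loss like that would put NVIDIA in ...

Right. As I said, if NVIDIA sold only Tesla it certainly could not break even. But NVIDIA would not lose money by dropping supercomputing either, because what it sells in the first place is gaming cards and professional cards. Developing Xeon Phi as a dedicated product will cost Intel no less than developing a GPU, yet Xeon Phi can only be used for supercomputing. However you sell it, the supercomputing market is only so big each year, so one product generation would have to sell for many years to pay for itself. NVIDIA, by contrast, can cut prices or even leave the market temporarily at any time without much effect on its core business.

It's like Xeon. Intel would certainly lose money selling only Xeon, but the same silicon can be sold to businesses and to the consumer market, and that is how it makes money. Unless Xeon Phi gets government funding, I find it hard to imagine how it survives.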

Author: wwwz251  Time: 2012-8-6 13:03
Jensen Huang can spread Tesla's costs across consumer GPUs. Can Intel spread Xeon Phi's costs across consumer CPUs?

Author: Windyson  Time: 2012-8-6 19:15
Windows recognises it directly as 48 CPUs.

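Author: gzpony  Time: 2012-8-6 22:24

水星思路 posted at 2012-8-6 11:04
This thing's main competitive edge is that x86 cores are far more programmable than a GPU. I expect Intel will support the existing toolchain, so that people with no patience ...

Here we go again. Didn't you read the earlier repost of similar content at all?
In the replies to that thread, someone found Intel's official English documentation, which states in black and white that it is not compatible with any current tools!

Anyone with a bit of technical background can see this: the thing is basically a GPU that sits on the PCIe interface. How could tools built for CPUs possibly work on it directly? Besides, parallel programming on a GPU is a completely different matter from programming a CPU. If Intel really could make CPU tools work directly on a GPU, that would be like shooting down a flying saucer. As for the claim that Windows recognises it as 48 CPUs: very imaginative, but impossible.

This thing trails on performance at launch, there is no software ecosystem to be seen so far, and there are no successful deployments yet. Even if Intel throws money at it, for now it is in no position to challenge Tesla!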

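Author: 水星思路  Time: 2012-8-6 23:19

gzpony posted at 2012-8-6 22:24
Here we go again. Didn't you read the earlier repost of similar content at all?
In the replies to that thread, someone found Intel's official English documentation, which ...

Go find that English document. Meanwhile, you can take a look at this: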
http://software.intel.com/en-us/blogs/2012/06/05/knights-corner-open-source-software-stack/

"... the open source software stack consists of an embedded Linux, a minimally modified GCC, plus driver software. There is a package for GDB available separately as well."

"Using GCC to build an application for Knights Corner will most often result in low performance code due its current inability to vectorize for the new Knights Corner vector instructions. Future changes to give full usage of Knights Corner vector instructions would require work on the GCC vectorizer to utilize those instructions’ masking capabilities. This is something that requires a broader discussion in the GCC community than simply changing the code generator."

http://software.intel.com/en-us/blogs/2012/06/05/knights-corner-micro-architecture-support/

"These high-performance, familiar and popular tools simply support Knights Corner as another target without requiring new tools or separate product purchases."

Author: winson_surewin  Time: 2012-8-6 23:24
If Intel gets in, we gamers will have more choices too

I'm just afraid Intel will be too strong and kill off both NVIDIA and AMD

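Author: CNBETA  Time: 2012-8-6 23:46

winson_surewin posted at 2012-8-6 23:24
If Intel gets in, we gamers will have more choices too

I'm just afraid Intel will be too strong and kill off both NVIDIA and AMD

You're wrong there. AMD is invincible and will absolutely never die.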

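Author: winson_surewin  Time: 2012-8-6 23:51

CNBETA posted at 2012-8-6 23:46
You're wrong there. AMD is invincible and will absolutely never die.

That's a relief then. And with NVIDIA having people like you, it won't die either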

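Author: CNBETA  Time: 2012-8-6 23:58
This post was last edited by CNBETA at 2012-8-6 23:59

winson_surewin posted at 2012-8-6 23:51
That's a relief then. And with NVIDIA having people like you, it won't die either

Not at all. With people like you behind AMD, overtaking Intel is only a matter of time. Beat Intel in one year, beat IBM in three: AMD's comeback road.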

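Author: winson_surewin  Time: 2012-8-7 00:05
This post was last edited by winson_surewin at 2012-8-7 00:05

CNBETA posted at 2012-8-6 23:58
Not at all. With people like you behind AMD, overtaking Intel is only a matter of time. Beat Intel in one year, beat IBM in three: AMD's comeback road.

With a high-performance hired gun like you, NVIDIA kicks Intel above and stomps AMD below, and who knows, one day it might take down Microsoft and IBM as well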

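Author: CNBETA  Time: 2012-8-7 00:08
This post was last edited by CNBETA at 2012-8-7 00:10

winson_surewin posted at 2012-8-7 00:05
With a high-performance hired gun like you, NVIDIA kicks Intel above and stomps AMD below, and who knows, one day it might take down Microsoft and IBM as well

With a high-performance hired gun like you, AMD kicks IBM above and stomps Intel below, and who knows, one day it might take down Google and Apple as well

It will surely break out of the solar system and into the universe. Unifying the universe is just around the corner, and the Zentradi all use AMD graphics cards, CPUs, chipsets and memory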

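Author: winson_surewin  Time: 2012-8-7 00:10

CNBETA posted at 2012-8-7 00:08
With a high-performance hired gun like you, AMD kicks IBM above and stomps Intel below, and who knows, one day it might take down Google and Apple as well

Unifying the universe is ...

No creativity at all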

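Author: CNBETA  Time: 2012-8-7 00:11
This post was last edited by CNBETA at 2012-8-7 00:12

winson_surewin posted at 2012-8-7 00:10
No creativity at all

AMD's hired guns are the most creative.

AMD is invincible

Glory to our mighty "farm company" (AMD)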

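Author: winson_surewin  Time: 2012-8-7 00:14

CNBETA posted at 2012-8-7 00:11
AMD's hired guns are the most creative.

AMD is invincible

The guilty party always accuses first

I have nothing to say to a sockpuppet account. If I can't beat you, I'll stay out of your way~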

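Author: CNBETA  Time: 2012-8-7 00:22

winson_surewin posted at 2012-8-7 00:14
The guilty party always accuses first

I have nothing to say to a sockpuppet account. If I can't beat you, I'll stay out of your way~

Oh come on, why so modest? Go another 300 rounds of filler posts with me?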

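Author: winson_surewin  Time: 2012-8-7 00:25

CNBETA posted at 2012-8-7 00:22
Oh come on, why so modest? Go another 300 rounds of filler posts with me?

I'm not paid for this. I have to go to work during the day to earn a living, so I don't have the energy or time to wear down a professional

And using my main account to spar with a sockpuppet? I may not be clever, but I'm not that blind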

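Author: kinno  Time: 2012-8-7 01:19
Intel is probably releasing this as a companion product. It would be hard to make money selling it on its own, so it's mainly meant to go with Intel's complete systems.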

Author: qwased  Time: 2012-8-7 07:31

水星思路 posted at 2012-8-6 23:19
Go find that English document. Meanwhile, you can take a look at this:
http://software.intel.com/en-us/blogs/2012/06/05/knigh ...

Knights Corner: Highly programmable while being optimized for power efficiency and highly parallel workloads.

The MIC architecture is specifically designed to provide the programmability of an SMP system while optimizing for power efficiency and highly parallel workloads. Knights Corner is the first product to use this architecture and deliver on this exciting vision. It’s an SMP on-a-chip, with the following key improvements:

With Knights Corner, we retain the programmability of an SMP system while optimizing for power efficiency and highly parallel workloads. It’s an SMP on-a-chip, with the following key improvements:

We have scale and power benefits from using a small, simple micro-architecture. We extended a Intel® Pentium® processor design with 64-bit support and a few additional instructions (like CPUID) which are documented in Appendix B of the “Knights Corner Instruction Set Reference Manual.”
For high degrees of data parallelism, we added the Knights Corner vector capability not found in any other Intel processor.
We connect the many cores using a high performance on-chip interconnect design.
Knights Corner is designed to be a coprocessor that lives on the PCIe bus, and requires a host processor to boot the system.
This combination of Linux, 64-bits, and new vector capabilities with an Intel® Pentium® processor-derived core, means that Knights Corner is not completely binary compatible with any previous Intel processor. Because of its unique nature, you’ll see statements like this in our code: “Disclaimer: The codes contained in these modules may be specific to the Intel® Software Development Platform codenamed: Knights Ferry, and the Intel® product codenamed: Knights Corner, and are not backward compatible with other Intel® products. Additionally, Intel® makes no commitments for support of the code or instruction set in future products.” This notice speaks to low level details and affects tool vendors primarily. Tools by Intel, GCC and other vendors will support Knights Corner by following the Instruction Set Architecture (ISA) and Application Binary Interface (ABI) documents so most developers can program at a completely portable level in their applications.
http://software.intel.com/en-us/blogs/2012/06/05/knights-corner-micro-architecture-support/

http://www.brightsideofnews.com/news/2012/7/13/xeon-phi-lacks-binary-compatibility2c-breaks-amd64-conventions.aspx

To make matters worse, Intel breaks quite a few conventions related to AMD64. The x87 FPU has been deprecated by both AMD and Intel in favor of SSE2. At one point during the development of the 64-bit kernel of Windows XP, Microsoft even wanted to throw it out altogether, but kept for backwards compatibility reasons (still considered deprecated, meaning Microsoft doesn't recommend using it). AMD and Intel encourage developers to use SSE (and now also AVX) instruction set extensions wherever possible. AMD mentions that in their software optimization guides for the K8 and up (Software Optimization Guide for AMD64 Processors page 83). Intel stresses on numerous occasions in chapter 5 and 6 of their “Intel® 64 and IA-32 Architectures Optimization Reference Manual” to prefer SSE and AVX over x87 whenever possible. Intel even states in the release notes of their compiler on page 13 "All Intel® 64 architecture processors support Intel® SSE2." Larrabee invalidates that statement.

Tailoring an application for Larrabee is not really less work than going the GPGPU route with CUDA or OpenCL. In any case, the code needs to be tailored for the specific hardware (even though OpenCL promises to be platform agnostic, in practice you need to optimize for specific architectures to get meaningful performance). Ignoring performance characteristics and availability, these technologies are roughly on equal grounds. If we factor in that CUDA and to a lesser degree OpenCL have a head start on the market, things start to look dire for Intel.

http://blogs.nvidia.com/2012/04/no-free-lunch-for-intel-mic-or-gpus/

Native Mode Complications

Functionally, a simple recompile may work, but I’m convinced it’s not practical for most HPC applications and doesn’t reflect the approach most people will need to take to get good performance on their MIC systems.

The idea of running flat MPI code (one rank per core) on a multi-node MIC system seems quite problematic. Whatever memory sits on the MIC PCIe card will be shared by more than 50 cores, leading to very small memory per core. From what I know of the MPI communication stack, that won’t leave much memory for the actual data – certainly far below the traditional 1-2 GB/core most HPC apps want. And 50+ cores all trying to send messages through the system interconnect NIC seems like a recipe for a network logjam. The other concern is the Amdahl’s Law bottleneck resulting from executing all the per-rank serial code on a lower-performance, Pentium-class scalar core.

The OpenMP approach seems only slightly better. You’d still have the very small per-core memory and the Amdahl’s Law bottleneck, but at least you’d have fewer threads trying to send messages out the NIC. Perhaps the biggest issue with this approach is that existing OpenMP codes, written for multi-core CPUs, are unlikely to have enough parallelism exposed to profitably occupy over 50 vector cores.
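
The Amdahl's Law concern in the two paragraphs above can be made concrete with a small calculation. This is only an illustrative sketch: the parallel fractions, the 50-core count and the serial slowdown factor are hypothetical inputs chosen to show the shape of the curve, not measurements of any MIC system, and the function name is made up.

```python
def amdahl_speedup(parallel_fraction: float, workers: int,
                   serial_slowdown: float = 1.0) -> float:
    """Speedup over one baseline core under Amdahl's Law.

    parallel_fraction: share of the baseline runtime that parallelizes.
    workers: number of cores sharing the parallel part.
    serial_slowdown: how many times slower the serial part runs on the
        accelerator's scalar core than on the baseline core (assumed).
    """
    serial_time = (1.0 - parallel_fraction) * serial_slowdown
    parallel_time = parallel_fraction / workers
    return 1.0 / (serial_time + parallel_time)


for p in (0.90, 0.99, 0.999):
    ideal = amdahl_speedup(p, 50)
    slowed = amdahl_speedup(p, 50, serial_slowdown=4.0)
    print(f"parallel {p:.1%}: {ideal:5.1f}x on 50 cores, "
          f"{slowed:5.1f}x if serial code runs 4x slower")
```

Even with 99% of the work parallel, 50 cores give well under a 50x speedup in this model, and a slower scalar core for the serial part cuts it further, which is the bottleneck the quoted article describes.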

No “Magic” Compiler

The reality is that there is no such thing as a “magic” compiler that will automatically parallelize your code. No future processor or system (from Intel, NVIDIA, or anyone else) is going to relieve today’s programmers from the hard work of preparing their applications for the future.

With clock rates stalled, all future performance increases must come from increased parallelism, and power constraints will actually cause us to use simpler processors at lower clock rates for the majority of our work, further exacerbating this issue.

At the high end, an exaflop computer running at about 1 GHz will require approximately one billion-way parallelism, but the same logic will drive up required parallelism at all system scales. This means that all HPC codes will need to be cast as throughput problems with massive numbers of parallel threads. Exploiting locality will also become ever more important as the relative cost of data movement versus computation continues to rise. This will have a significant impact on the algorithms and data layouts used to solve many science problems, and is a fundamental issue not tied to any one processor architecture.

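Author: darkangel308  Time: 2012-8-7 09:28
The GPU's biggest disadvantage against the CPU in general-purpose compute is not raw compute power but efficiency. If Intel's thing can reach the efficiency of an ordinary CPU, it has a real future.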

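Author: 水星思路  Time: 2012-8-7 10:05

qwased posted at 2012-8-7 07:31
Knights Corner: Highly programmable while being optimized for power efficiency and highly parallel ...

Good. My point is exactly that native x86 code can run on MIC. Leaving performance aside, at the very least things like system calls and C library calls will be much more convenient, not to mention the many programming languages that already support x86. Even without a magic compiler, don't we still have pthread?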

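Author: gzpony  Time: 2012-8-7 16:56

水星思路 posted at 2012-8-7 10:05
Good. My point is exactly that native x86 code can run on MIC. Leaving performance aside, at the very least things like system calls and C library calls will be much more convenient ...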

“The reality is that there is no such
thing as a “magic” compiler that will
automatically parallelize your code.”

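Author: nom8393  Time: 2012-8-7 18:11

水星思路 posted at 2012-8-6 11:04
This thing's main competitive edge is that x86 cores are far more programmable than a GPU. I expect Intel will support the existing toolchain, so that people with no patience ...

May I ask whether you have ever actually looked at a parallel-computing programming interface (OpenCL, for example)? A basic fact of developing with OpenCL is that the underlying instruction set is kept separate from the parallel-computing architecture. So your claim that x86 is more programmable is pure nonsense.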

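Author: 水星思路  Time: 2012-8-8 10:16

nom8393 posted at 2012-8-7 18:11
May I ask whether you have ever actually looked at a parallel-computing programming interface (OpenCL, for example)? A basic fact of developing with OpenCL is that the underlying instruction set and the parallel ...

Bullshit. Show me memcpy, printf and sbrk on CUDA. Is there a file system? Are there interrupts? Can you fork a new thread?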

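Author: NORAWITHMYCALL  Time: 2012-8-8 12:15

winson_surewin posted at 2012-8-6 23:24
If Intel gets in, we gamers will have more choices too

I'm just afraid Intel will be too strong and kill off both NVIDIA and AMD

For now this field has nothing to do with ordinary consumers. If cloud computing really takes off, they will feel the effect indirectly.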

Author: gzpony  Time: 2012-8-8 14:08

水星思路 posted at 2012-8-8 10:16
Bullshit. Show me memcpy, printf and sbrk on CUDA. Is there a file system? Are there interrupts? Can you fork a new thread?

http://we.pcinlife.com/thread-1958498-1-1.html

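Author: 水星思路  Time: 2012-8-8 15:33

gzpony posted at 2012-8-7 16:56
“The reality is that there is no such
thing as a “magic” compiler that will
automatically par ...

Performance, performance. Performance isn't MIC's home turf; functionality is, my friend! No GPU compiler can compile a single system call, and none can use any existing x86 library. Those are MIC's advantages, my friend!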

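Author: 水星思路  Time: 2012-8-8 15:37

gzpony posted at 2012-8-8 14:08
http://we.pcinlife.com/thread-1958498-1-1.html

Don't let certain cowards pull the wool over your eyes. I'm an NVIDIA fan too, but I know what CUDA is.

If I say MIC will fail, it will be for market reasons, not technical ones. Once MIC ships, supercomputer vendors will certainly buy it, but the supercomputing market is only so big, and even counting the commercial and consumer markets the demand isn't that high. That is different from GPUs, which can sell to both business and consumer markets even though their programming model is still not great.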

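Author: nom8393  Time: 2012-8-8 16:44

水星思路 posted at 2012-8-8 10:16
Bullshit. Show me memcpy, printf and sbrk on CUDA. Is there a file system? Are there interrupts? Can you fork a new thread?

Figure out what the words "general-purpose computing" mean before you reply again. Otherwise you're just embarrassing yourself.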

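Author: 水星思路  Time: 2012-8-9 00:15

nom8393 posted at 2012-8-8 16:44
Figure out what the words "general-purpose computing" mean before you reply again. Otherwise you're just embarrassing yourself.

You're asking that of someone who has actually written CUDA.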

Welcome to POPPUR爱换 (https://we.poppur.com/)