POPPUR爱换

Title: 10 Gigabit Ethernet throughput test (results: 9984 Mbps one-way, 19808 Mbps bidirectional)

Author: intel10k Time: 2011-12-4 00:52
Last edited by intel10k on 2011-12-20 20:25

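As prices fall, 10 Gigabit Ethernet is gradually finding its way into ordinary homes. Its real-world performance is a topic of interest: under ideal conditions, can 10 Gigabit Ethernet actually reach 10000 Mbps? Many people have doubts, and reviews on the subject are still rare online. I recently got hold of two 10 Gigabit NICs, which makes it possible to test this in practice.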

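The test uses two machines connected directly, with no switch in between. The NICs are two Intel X520-T2 10GBASE-T adapters. Each card has two 10 Gigabit ports, but the tests below use only one of them. The two cards are linked by a single Cat 6 twisted-pair cable.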

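The test environment is Linux with kernel version 3.1.1. To reach the highest possible transfer rate, the test software uses direct socket programming, sending and receiving data between the two machines over TCP/IP. Real-time traffic is obtained by reading the Linux /proc/net/dev file. Traffic measured this way includes the TCP/IP packet headers, so it is somewhat more accurate than what the program itself sees.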
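The /proc/net/dev measurement described above can be sketched roughly as follows. This is only an illustration in Python, not the author's actual tool: the function names are hypothetical, and the interface name `eth2` is taken from the ethtool output in this post.

```python
import time


def parse_net_dev(text):
    """Return {iface: (rx_bytes, tx_bytes)} from the contents of /proc/net/dev."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes; field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rate_mb_per_s(before, after, seconds):
    """Byte-counter delta converted to MB/s, with 1 MB = 1,000,000 bytes."""
    return (after - before) / seconds / 1_000_000


def watch(iface="eth2", interval=1.0):
    """Print the rx/tx rate of one interface every `interval` seconds."""
    with open("/proc/net/dev") as f:
        prev = parse_net_dev(f.read())[iface]
    while True:
        time.sleep(interval)
        with open("/proc/net/dev") as f:
            cur = parse_net_dev(f.read())[iface]
        rx = rate_mb_per_s(prev[0], cur[0], interval)
        tx = rate_mb_per_s(prev[1], cur[1], interval)
        print(f"{iface}: rx {rx:.1f} MB/s ({rx * 8:.0f} Mbps), "
              f"tx {tx:.1f} MB/s ({tx * 8:.0f} Mbps)")
        prev = cur
```

Calling `watch("eth2")` on either machine while a transfer is running would print one line per second. The sleep interval is used as the elapsed time, which is a simplification; a more careful version would time each sample with a monotonic clock.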

Before testing, the ethtool utility was used to check the NIC state and confirm that both cards were running at 10000Mb/s:
root@pc2:~# ./ethtool eth2
Settings for eth2:
      Supported ports: [ TP ]
      Supported link modes: 1000baseT/Full
                              10000baseT/Full
      Supported pause frame use: No
      Supports auto-negotiation: Yes
      Advertised link modes:  1000baseT/Full
                              10000baseT/Full
      Advertised pause frame use: No
      Advertised auto-negotiation: Yes
      Speed: 10000Mb/s
      Duplex: Full
      Port: Twisted Pair
      PHYAD: 0
      Transceiver: external
      Auto-negotiation: on
      MDI-X: Unknown
      Supports Wake-on: d
      Wake-on: d
      Current message level: 0x00000007 (7)
                           drv probe link
      Link detected: yes

NIC parameters were essentially left at the system defaults. The only tuning was to raise the NIC's MTU; in this test the MTU is set to 6000.
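To see why a larger MTU helps, here is a rough per-frame overhead calculation. The Ethernet, IPv4 and TCP header sizes are the standard ones, but the `counted_fraction` function rests on an assumption that the post does not confirm: that the byte counters include the 14-byte Ethernet header but not the preamble, FCS or inter-frame gap. All function names are hypothetical.

```python
PREAMBLE_SFD = 8   # bytes on the wire before each frame
ETH_HEADER = 14    # destination MAC, source MAC, EtherType
FCS = 4            # frame check sequence
IFG = 12           # minimum inter-frame gap, expressed in byte times
IP_HEADER = 20     # IPv4 header without options
TCP_HEADER = 20    # TCP header without options (timestamps would add 12 more)


def wire_bytes(mtu):
    """Bytes of line time consumed by one full-size frame."""
    return PREAMBLE_SFD + ETH_HEADER + mtu + FCS + IFG


def counted_fraction(mtu):
    """Share of line rate seen by a counter that sees Ethernet header + MTU.

    ASSUMPTION: which bytes the driver counts is not stated in the post.
    """
    return (ETH_HEADER + mtu) / wire_bytes(mtu)


def tcp_payload_fraction(mtu):
    """Share of line rate left for application data inside TCP segments."""
    return (mtu - IP_HEADER - TCP_HEADER) / wire_bytes(mtu)


for mtu in (1500, 6000, 9000):
    print(f"MTU {mtu}: counted {counted_fraction(mtu):.4%}, "
          f"TCP payload {tcp_payload_fraction(mtu):.4%}")
```

Under these assumptions the fixed 38 bytes of framing overhead per frame matter much less at MTU 6000 than at the default 1500, which is the effect the tuning relies on.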

Below are the specific test results (here 1 MB means 1,000,000 bytes):

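1. One-way, single-thread transfer (pc1 sends to pc2 using one thread)
Result: measured outbound traffic on pc1 is about 1036 MB/s (fluctuation < 2 MB/s).
This corresponds to about 8288 Mbps, still some way short of the theoretical 10 Gigabit bandwidth.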

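2. One-way, two-thread transfer (pc1 sends to pc2 using 2 threads at once)
Result: measured outbound traffic on pc1 is about 1245 MB/s (fluctuation < 1 MB/s).
This rate (9960 Mbps) is very good, already above 99% of the theoretical 10 Gigabit bandwidth.
[Added 2011-12-05: after some tuning, the final measured stable limit is 1248 MB/s (9984 Mbps)]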

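3. One-way, multi-thread transfer (pc1 sends to pc2 using several threads at once)
Result: the transfer rate barely improves, but the fluctuation shrinks from the order of 1 MB/s to the order of 0.1 MB/s.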

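4. Bidirectional, single-thread transfer (pc1 sends to pc2 with one thread while pc2 sends to pc1 with one thread)
Result: the sum of outbound and inbound traffic on pc1 is about 1700 MB/s, with fairly large fluctuation, on the order of 100 MB/s.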

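5. Bidirectional, two-thread transfer (pc1 sends to pc2 with two threads while pc2 sends to pc1 with two threads)
Result: the sum of outbound and inbound traffic on pc1 is about 2330 MB/s, and the fluctuation shrinks to only about 10 MB/s.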

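6. Bidirectional, multi-thread transfer (pc1 sends to pc2 with several threads while pc2 sends to pc1 with several threads)
Result: unlike the one-way multi-thread case, adding more threads continues to raise throughput slowly. With 10 threads in each direction the combined rate reaches 2476 MB/s (19808 Mbps), again above 99% of the theoretical bandwidth, and the fluctuation drops to about 1 MB/s.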
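The multi-threaded transfers in the tests above can be sketched roughly as below. The post does not show its test program or say which language it was written in, so everything here (function names, the 1 MiB buffer size, the zero-filled payload) is an assumption made for illustration. Python is used only for brevity and is unlikely to reach the rates reported.

```python
import socket
import threading

CHUNK = 1 << 20  # 1 MiB per send call; the post does not state its buffer size


def _drain(conn, counter, lock):
    """Read from one connection until the peer closes it, tallying bytes."""
    received = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            received += len(data)
    with lock:
        counter[0] += received


def run_receiver(server_sock, n_conns, counter):
    """Accept n_conns connections and drain each one in its own thread."""
    lock = threading.Lock()
    workers = []
    for _ in range(n_conns):
        conn, _addr = server_sock.accept()
        t = threading.Thread(target=_drain, args=(conn, counter, lock))
        t.start()
        workers.append(t)
    for t in workers:
        t.join()


def send_stream(host, port, total_bytes):
    """Push total_bytes of zero-filled payload over one TCP connection."""
    payload = bytes(CHUNK)
    with socket.create_connection((host, port)) as s:
        sent = 0
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            s.sendall(payload[:n])
            sent += n


def run_senders(host, port, n_threads, bytes_per_thread):
    """Start n_threads parallel senders, one TCP connection per thread."""
    threads = [
        threading.Thread(target=send_stream, args=(host, port, bytes_per_thread))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

For a one-way test, one machine would run `run_receiver` on a listening socket and the other would call `run_senders` with the desired thread count. For a bidirectional test, each machine would run both roles at once on different ports.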

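These results show that, for both one-way and bidirectional transfer, the real-world bandwidth of 10 Gigabit Ethernet can readily reach 99% or more of the theoretical peak. This was somewhat better than expected beforehand. The tests confirm that the theoretical bandwidth of 10 Gigabit Ethernet is not an inflated figure, and that real programs can reach it with some tuning.

Added 2011-12-05: for the dual-port concurrency results see http://we.pcinlife.com/thread-1796630-1-1.html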
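The unit conversions behind the figures above can be checked with a few lines of Python. The measured rates are copied from the post; the helper names are hypothetical, and the bidirectional cases are compared against 2 x 10000 Mbps.

```python
LINK_MBPS = 10_000  # nominal 10 Gigabit line rate per direction

# (label, measured MB/s from the post, number of directions in use)
RESULTS = [
    ("one-way, 1 thread", 1036, 1),
    ("one-way, 2 threads", 1245, 1),
    ("one-way, tuned", 1248, 1),
    ("two-way, 1 thread each", 1700, 2),
    ("two-way, 2 threads each", 2330, 2),
    ("two-way, 10 threads each", 2476, 2),
]


def to_mbps(mb_per_s):
    """Convert MB/s (1 MB = 1,000,000 bytes) to Mbps."""
    return mb_per_s * 8


def percent_of_line_rate(mb_per_s, directions):
    """Measured rate as a percentage of the nominal rate in use."""
    return 100 * to_mbps(mb_per_s) / (LINK_MBPS * directions)


for label, rate, directions in RESULTS:
    print(f"{label:26s} {rate:5d} MB/s = {to_mbps(rate):6d} Mbps "
          f"({percent_of_line_rate(rate, directions):.2f}% of line rate)")
```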

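Author: zhdick Time: 2011-12-4 01:03
10 Gigabit NICs run far too hot; they need a fan. I wish they would become as commonplace as onboard Gigabit is today.

Gigabit is simply not enough any more.

Author: ilbwn Time: 2011-12-4 01:50
Price is the key.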

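Author: fenglu Time: 2011-12-4 08:35

zhdick posted on 2011-12-4 01:03
10 Gigabit NICs run far too hot; they need a fan. I wish they would become as commonplace as onboard Gigabit is today.

Gigabit is simply not enough any more.

What is your use case? It seems wired Gigabit is enough for typical needs, HD video included.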

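Author: andyrave Time: 2011-12-4 17:39
Data streams?

Author: gzdyhq Time: 2011-12-4 19:18
What hard drive can keep up with that? An SSD?

Author: clawhammer Time: 2011-12-4 19:54
10 Gigabit over twisted pair? Forget it, there is no practical point.

Author: myf94 Time: 2011-12-4 20:04
Today's mechanical drives struggle even to saturate Gigabit.

Author: wwwff Time: 2011-12-4 20:28
For home use, Gigabit is enough.
Unless you have nothing better to do than copy files for fun...

Author: girlboy911 Time: 2011-12-4 21:09
Price is the key.

Author: roamball Time: 2011-12-4 22:30
The OP's test matches his ID: 10k.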

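Author: intel10k Time: 2011-12-4 23:12
10 Gigabit NICs are still expensive, but prices have dropped a lot compared with a few years ago, and in another 1-2 years they should come down to a reasonable level. As for performance, there is no end to the pursuit: it is perfectly normal for network speed to outpace disk speed, and ideally I would like it to reach memory transfer speeds.

Preview: tomorrow I will post results for concurrent transfers over both 10 Gigabit ports.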

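Author: lucifersun Time: 2011-12-5 07:53
A NIC on its own is no use; 10 Gigabit switch prices have to come down too.

Author: hermes Time: 2011-12-5 11:19
Just two NICs wired directly to each other is only good for showing off.
It has no practical value at all... Only a full 10 Gigabit network at home would be of real use.
But that kind of investment basically won't go mainstream within 5 years; fiber networking beats it on both price and ease of use.

Author: aliguagua 时间: 2011-12-5 11:28
Could this be used for distributed computing?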

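Author: intel10k Time: 2011-12-5 13:04

10 Gigabit switch prices are falling too, from tens of thousands of US dollars per unit a few years ago to 2000-3000 US dollars now. Of course, dropping to today's Gigabit switch prices is not realistic yet. But if there are only a few machines, you can just about get by without a switch by chaining them together directly, which lowers the cost further. 10 Gigabit NICs are clearly suitable for high-performance computing and are already widely used there.

In addition, port aggregation with multi-port Gigabit NICs is a cheap alternative. Second-hand 6-port Gigabit NICs currently go for a few hundred to 1000 yuan, so four cards (two per machine, 12 ports each) give a theoretical 12000 Mbps link between two machines, and you can even skip the 10 Gigabit switch. It does take a lot of cables, though.

Finally, an addendum to the test results: after some parameter tuning, the one-way limit improved slightly and now holds steady at 1248.0 MB/s, or 99.84% of the theoretical 10 Gigabit bandwidth. This is the highest figure I have seen so far.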

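Author: greney Time: 2011-12-5 13:07
10 Gigabit is just great.

Author: glk17 Time: 2011-12-5 13:22
OP, with dual-port cards, go even harder: bond both 10 Gigabit ports for 2 GB/s one-way, 4 GB/s bidirectional.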

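Author: tcgg1983 Time: 2011-12-5 13:32
Last edited by tcgg1983 on 2011-12-5 13:33

It looks like the hard drive has become the biggest bottleneck in the development of microcomputers... I have 10M fiber at home, and personally I feel even 100M would be enough for me... I don't have that much data to shuffle from one computer to another every day. No time to fiddle with it and no time to watch it either.

Author: 飘然 Time: 2011-12-5 14:13
For home use, Gigabit will be enough for many years to come.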

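Author: intel10k Time: 2011-12-5 15:08
Last edited by intel10k on 2011-12-5 15:09

Re #19: the dual-port concurrency test is under way. So far the results are not great, well short of twice the single-port speed. I will post detailed results once they are organized.

Re #20: I think we should take the long view. When Gigabit NICs first appeared, mainstream hard drives only managed a few tens of MB/s. Storage technology is advancing in step: SSDs that read and write at several hundred MB/s are now normal, and the very fastest SSDs already far exceed 10 Gigabit network speeds. (They are also very expensive, of course.)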

Author: intel10k Time: 2011-12-5 17:30
The dual-port concurrency results have been posted; see this thread:
http://we.pcinlife.com/thread-1795724-1-1.html

Author: killer123 Time: 2011-12-5 22:12
Let's get Gigabit widespread first. Pity there are no Gigabit wireless routers for around a hundred yuan.

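Author: stephenmaxmax Time: 2011-12-6 00:55

lucifersun posted on 2011-12-5 07:53
A NIC on its own is no use; 10 Gigabit switch prices have to come down too.

Indeed ~

Author: darkool Time: 2011-12-6 10:30
At home I currently have 50M down and 20M up; that should be enough...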

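Author: 飞羽 Time: 2011-12-9 12:22
Looking ahead: in the future every NIC will have a fiber port, and the fiber cable will simply be our network cable, as fast as you like, ha.

Author: Romeo Time: 2011-12-11 17:50
What a beast: the NIC alone costs as much as a top-end single-GPU graphics card.

Author: aspireone Time: 2011-12-11 19:03
I still think twisted pair has plenty of advantages. PoE, for one, is something fiber cannot do. Many cities are now pushing "fiber in, copper out", and as a result even the telephone needs mains power; if the power goes out at home, you can't even use the landline.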

Author: jaleofu Time: 2011-12-14 14:33
Notice: the author has been banned or deleted; content automatically hidden.

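Author: intel10k Time: 2011-12-14 17:43
Last edited by intel10k on 2011-12-14 18:15

jaleofu posted on 2011-12-14 14:33
99% is impossible; 10GE uses 64/66b encoding, so the effective bandwidth is 96.96%

The copper-based 10GBASE-T used here does not use 64b/66b encoding, and its usable bandwidth is 10 Gbps. The fiber-based variants such as 10GBASE-LR and 10GBASE-SR do use 64b/66b encoding, but they also raise the raw line rate to 10.3125 Gbps, so the usable bandwidth is still 10.3125*64/66=10 Gbps.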
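The 64b/66b arithmetic in this reply can be verified exactly with rational numbers. This is just a check of the figures quoted above; the function name is hypothetical.

```python
from fractions import Fraction


def payload_rate_gbps(line_rate_gbps, data_bits, coded_bits):
    """Usable rate after a block code that sends coded_bits per data_bits."""
    return Fraction(line_rate_gbps) * data_bits / coded_bits


# 10GBASE-LR/SR: 10.3125 Gbps on the line with 64b/66b coding.
fiber_rate = payload_rate_gbps("10.3125", 64, 66)
print(f"usable rate: {float(fiber_rate)} Gbps")

# Coding efficiency alone, which is where a figure near 97% comes from.
print(f"64b/66b efficiency: {float(Fraction(64, 66)):.4%}")
```

The roughly 3% coding overhead is real, but it is paid for by the higher line rate rather than taken out of the 10 Gbps payload.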

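The results in this post should be credible. Red Hat engineers ran a test similar to this one in 2008 and measured 9888.66 Mbps, close to the result here, although they did not test bidirectional transfer. See: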
http://www.redhat.com/promo/summit/2008/downloads/pdf/Thursday/Mark_Wagner.pdf

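Author: intel10k Time: 2011-12-15 12:53
The physical-layer coding of 10GBASE-T is complex and the description in the IEEE standard is hard going, so here is a brief outline of the principle: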

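10GBASE-T has 4 data lanes, each running at 800 M symbols/s, and each symbol has 16 levels, equivalent to 4 bits. The physical-layer bandwidth of 10GBASE-T therefore works out to 4*800*4=12800 Mbit/s.
(For some reason I can't find this physical-layer bandwidth figure on Google.)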

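When transmitting, 50 blocks of 64 data bits (3200 bits) are first 64b/65b encoded, then one byte of CRC and one extra bit are added, giving 50*65+8+1=3259 bits. Of these, 3*512=1536 bits are sent uncoded, and the remaining 1723 bits gain 325 LDPC check bits to become 2048 bits, for a total of 2048+1536=3584 bits. After some further coding, these are assembled into 512 DSQ128 symbols (that is, 1024 PAM16 symbols) and transmitted.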

The net effect is that 1024 symbols carry 3200 bits of data, so each symbol carries 3200/1024=3.125 bits of payload. The bandwidth seen at the data layer is therefore exactly 4*800*3.125=10000 Mbit/s.
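The bit accounting above can be reproduced step by step. The constants come from this post and the IEEE text quoted below; the function and key names are hypothetical, and this is a check of the arithmetic rather than a model of the actual encoder.

```python
PAIRS = 4                 # wire pairs in the cable
SYMBOL_RATE_MBAUD = 800   # mega-symbols per second on each pair
RAW_BITS_PER_PAM16 = 4    # 16 levels encode 4 raw bits


def phy_frame_budget():
    """Bit accounting for one 10GBASE-T PHY frame."""
    payload = 50 * 64                   # Ethernet data bits per frame
    framed = 50 * 65 + 8 + 1            # 64b/65b blocks + CRC8 + auxiliary bit
    uncoded = 3 * 512                   # bits sent without LDPC protection
    ldpc_input = framed - uncoded       # bits fed to the LDPC encoder
    ldpc_output = ldpc_input + 325      # codeword after adding check bits
    frame_bits = uncoded + ldpc_output  # total bits carried per PHY frame
    dsq128_symbols = frame_bits // 7    # each DSQ128 symbol carries a 7-bit label
    pam16_symbols = dsq128_symbols * 2  # one DSQ128 symbol = two PAM16 symbols
    return {
        "payload": payload,
        "framed": framed,
        "ldpc_input": ldpc_input,
        "ldpc_output": ldpc_output,
        "frame_bits": frame_bits,
        "dsq128_symbols": dsq128_symbols,
        "pam16_symbols": pam16_symbols,
        "payload_bits_per_symbol": payload / pam16_symbols,
    }


budget = phy_frame_budget()
raw_mbps = PAIRS * SYMBOL_RATE_MBAUD * RAW_BITS_PER_PAM16
data_mbps = PAIRS * SYMBOL_RATE_MBAUD * budget["payload_bits_per_symbol"]
print(budget)
print(f"raw PAM16 capacity: {raw_mbps} Mbit/s, payload rate: {data_mbps} Mbit/s")
```

The gap between the raw PAM16 capacity and the payload rate is what pays for the 64b/65b framing, the CRC, the LDPC check bits and the restriction to 128 of the 256 possible 2D symbols.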

The description above may not be entirely accurate; corrections are welcome.

Relevant text from IEEE 802.3an-2006:
The 10GBASE-T PHY employs full duplex baseband transmission over four pairs of balanced cabling. The aggregate data rate of 10 Gb/s is achieved by transmitting 2500 Mb/s in each direction simultaneously on each wire pair, as shown in Figure 55–2. Baseband 16-level PAM signaling with a modulation rate of 800 Megasymbol per second is used on each of the wire pairs. Ethernet data and control characters are encoded at a rate of 3.125 information bits per PAM16 symbol, along with auxiliary channel bits. Two consecutively transmitted PAM16 symbols are considered as one two-dimensional (2D) symbol. The 2D symbols are selected from a constrained constellation of 128 maximally spaced 2D symbols, called DSQ128 (double square 128). After link startup, PHY frames consisting of 512 DSQ128 symbols are continuously transmitted. The DSQ128 symbols are determined by 7-bit labels, each comprising 3 uncoded bits and 4 LDPC-encoded bits. The 512 DSQ128 symbols of one PHY frame are transmitted as 4 × 256 PAM16 symbols over the four wire pairs. Data and Control symbols are embedded in a framing scheme that runs continuously after startup of the link. The modulation symbol rate of 800 Msymbols/s results in a symbol period of 1.25 ns.
The DSQ128 symbols are obtained by concatenating two time-adjacent 1D PAM16 symbols and retaining among the 256 possible Cartesian product combinations, 128 maximally spaced 2D symbols. The resulting checkerboard constellation is based on a lattice called RZ2 in the literature (see Forney [B28A]). DSQ constellations have previously been introduced under the name “AMPM” (see [B28C] for examples of 8 point and 32 point AMPM/DSQ constellations).

In the transmit direction, in normal mode, the PCS receives eight XGMII data octets provided by two consecutive transfers on the XGMII service interface on TXD<31:0> and groups them into 64-bit blocks with the 64-bit block boundaries aligned with the boundary of the two XGMII transfers. Each group of eight octets along with the data/control indications is transcoded into a 65-bit block. The resulting 65-bit blocks are scrambled and assembled in a group of 50 blocks. Adding CRC8 check bits yields a CRC-checked Ethernet payload of 50 × 65 + 8 = 3258 bits. An auxiliary channel bit is added to obtain a block of 3259 bits.
The 3259 bits are divided into 3 × 512 bits and 1723 bits. The 3 × 512 bits, among them the auxiliary channel bit, remain uncoded. The 1723 bits are encoded by a systematic LDPC(1723,2048) encoder, which adds 325 LDPC check bits to form an LDPC codeword of 2048 coded bits. The 3 × 512 uncoded bits and the 2048 = 4 × 512 coded bits are arranged in a frame of 512 7-bit labels. Each 7-bit label comprises 3 uncoded bits and 4 coded bits.
The 512 7-bit labels are mapped into 512 2D modulation symbols selected from a DSQ128 constellation. The DSQ128 symbols are obtained by concatenating two time-adjacent 1D PAM16 symbols and retaining among the 256 possible Cartesian product combinations, 128 maximally spaced 2D symbols. The resulting checkerboard constellation is based on a lattice called RZ2 in the literature (see Forney [B28A]). The DSQ128 constellation is partitioned into 16 subsets, each subset containing 8 maximally spaced 2D symbols. The 4 coded bits of each 7-bit label select one DSQ128 subset, and the 3 uncoded bits of the label select one 2D symbol in this subset.
The obtained PHY frame of 512 DSQ128 symbols is passed on to the PMA as PMA_UNITDATA.request. The PMA transmits the DSQ128 symbols over the four wire pairs in the form of 256 constituent PAM16 symbols per pair. Details of the PCS function are covered in 55.3. In the receive direction, in normal mode, the PCS processes code-groups received from the remote PHY via the PMA in 256 4D symbol blocks and maps them to the XGMII service interface in the receive path. In this receive processing scheme, symbol clock synchronization is done by the PMA Receive function. The PCS functions and state diagrams are specified in 55.3. The signals provided by the PCS at the XGMII conform to the interface requirements of Clause 46. The interface to the PMA is an abstract message-passing interface specified in 55.2.

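Author: qingyu Time: 2011-12-16 01:11
Sigh, a thread like this hardly ever shows up on this board... thanks, OP... These days it's all threads about buying broadband routers, and nobody wants to spend more than 500.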

Welcome to POPPUR爱换 (https://we.poppur.com/)