Performance of Parallel.For()

Method definition

public static ParallelLoopResult For(
	int fromInclusive,
	int toExclusive,
	Action<int> body
)

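Test (C#)

The test below was run on a quad-core CPU with a Release build. The results show that the parallel version (758.54 ticks) is clearly faster than the serial one (5638.98 ticks).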

static void Main(string[] args)
{
    long total = 1_000_000_000;
 
    using (var tc = new TickCounter()) {
        long sum = 0;
        for (long i = 0; i <= total; i++) {
            sum += i;
        }
        Console.WriteLine(sum);
    }
 
    using (var tc = new TickCounter()) {
        const int TASK_COUNT = 10;
        long pageSize = total / TASK_COUNT;
        ConcurrentQueue<long> queue = new ConcurrentQueue<long>();
 
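        // Runs TASK_COUNT + 1 iterations: the last one (index == TASK_COUNT)
        // covers only the final value, i == total.
        Parallel.For(0, TASK_COUNT + 1, index => {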
            long sum = 0;
            long start = pageSize * index;
            long end = Math.Min(pageSize * (index + 1) - 1, total);
            for (long i = start; i <= end; i++) {
                sum += i;
            }
            queue.Enqueue(sum);
        });
 
        Console.WriteLine(queue.Sum());
    }
}
  • Result
500000000500000000
Ticks: 5638.9834
500000000500000000
Ticks: 758.5387
Press any key to continue . . .

Test (C++)

For comparison, here is how long the same serial loop takes in C++.

#include "stdafx.h"
#include <iostream>
#include <Windows.h>
 
using namespace std;
 
int main()
{
    auto v1 = ::GetTickCount64();
    long long sum = 0;
    long long total = 1000000000L;
    for (long long i = 0; i <= total; i++) {
        sum += i;
    }
    auto v2 = ::GetTickCount64();
    cout << sum << endl;
    cout << "Ticks: " << v2 - v1 << endl;
    return 0;
}
  • Result
500000000500000000
Ticks: 860
Press any key to continue . . .

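On the same machine, built as x64 Release, the serial C++ loop finishes in 860 ms. That is far below the serial C# figure (5638.98) and close to the parallel C# figure (758.54), assuming TickCounter also reports milliseconds, so C++ still has a clear efficiency advantage.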
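The C++ test above is serial only, so it is not a like-for-like comparison with the Parallel.For() version. The following is a minimal sketch of how the same partitioned sum could be written with std::thread. The name parallel_sum and its parameters are hypothetical, and unlike the C# listing it rounds the slice size up so that exactly task_count slices cover the whole range. No timings are claimed for it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Sums the integers 0..total (inclusive) using task_count worker threads.
// Each worker sums one contiguous slice into its own slot, so no locking
// is needed; the partial sums are combined after all threads have joined.
std::int64_t parallel_sum(std::int64_t total, int task_count)
{
    assert(total >= 0 && task_count > 0);
    // Round the slice size up so that task_count slices cover 0..total.
    const std::int64_t page_size = total / task_count + 1;
    std::vector<std::int64_t> partial(task_count, 0);
    std::vector<std::thread> workers;
    workers.reserve(task_count);

    for (int index = 0; index < task_count; ++index) {
        workers.emplace_back([&partial, page_size, total, index] {
            const std::int64_t start = page_size * index;
            const std::int64_t end = std::min(page_size * (index + 1) - 1, total);
            std::int64_t sum = 0;
            for (std::int64_t i = start; i <= end; ++i) {
                sum += i;
            }
            partial[index] = sum;  // one write per worker, no shared counter
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return std::accumulate(partial.begin(), partial.end(), std::int64_t{0});
}
```

Calling parallel_sum(1000000000, 10) should return the same 500000000500000000 as the serial loop. Writing each partial result to its own vector slot plays the role that ConcurrentQueue plays in the C# listing.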

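It is worth noting that for this C++ test program, x86 builds are much slower than x64 builds. The run times for each combination are listed in the table below: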

Configuration  Platform  Time (ms)
Debug          x86       6937
Debug          x64       5985
Release        x86       2797
Release        x64       859