Why is unboxing 100 times faster than boxing

Updated: 2024-10-27 08:23:53

Question

Why is there such a large speed difference between boxing and unboxing operations? The difference is a factor of 10. When should we care about this? Last week, Azure support told us there is an issue in the heap memory of our application. I am curious whether it could be related to boxing and unboxing.

using System;
using System.Diagnostics;

namespace ConsoleBoxing
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Program started");
            var elapsed = Boxing();
            Unboxing(elapsed);
            Console.WriteLine("Program ended");
            Console.Read();
        }

        private static void Unboxing(double boxingtime)
        {
            Stopwatch s = new Stopwatch();
            s.Start();
            for (int i = 0; i < 1000000; i++)
            {
                int a = 33;     // data goes to the stack
                object b = a;   // boxing: the heap copy is referenced
                int c = (int)b; // unboxing only here: heap value goes back to the stack
            }
            s.Stop();
            var UnBoxing = s.Elapsed.TotalMilliseconds - boxingtime;
            Console.WriteLine("UnBoxing time : " + UnBoxing);
        }

        private static double Boxing()
        {
            Stopwatch s = new Stopwatch();
            s.Start();
            for (int i = 0; i < 1000000; i++)
            {
                int a = 33;
                object b = a;
            }
            s.Stop();
            var elapsed = s.Elapsed.TotalMilliseconds;
            Console.WriteLine("Boxing time : " + elapsed);
            return elapsed;
        }
    }
}

Recommended answer

People have already offered fantastic explanations for why unboxing is faster than boxing, so I want to say a bit more about the methodology you used to test the performance difference.

Did you get your result (a 10x difference in speed) from the code you posted? When I run that program in release mode, here is the output:

Program started
Boxing time : 0.2741
UnBoxing time : 4.5847
Program ended

Whenever I do a micro performance benchmark, I verify that I am really comparing the operations I intended to compare. The compiler can optimize your code. Open the executable in ILDASM:

Here is the IL for UnBoxing (I only included the portion that matters most):

IL_0000: newobj instance void [System]System.Diagnostics.Stopwatch::.ctor()
IL_0005: stloc.0
IL_0006: ldloc.0
IL_0007: callvirt instance void [System]System.Diagnostics.Stopwatch::Start()
IL_000c: ldc.i4.0
IL_000d: stloc.1
IL_000e: br.s IL_0025
IL_0010: ldc.i4.s 33
IL_0012: stloc.2
IL_0013: ldloc.2
IL_0014: box [mscorlib]System.Int32 //Here is the boxing
IL_0019: stloc.3
IL_001a: ldloc.3
IL_001b: unbox.any [mscorlib]System.Int32 //Here is the unboxing
IL_0020: pop
IL_0021: ldloc.1
IL_0022: ldc.i4.1
IL_0023: add
IL_0024: stloc.1
IL_0025: ldloc.1
IL_0026: ldc.i4 0xf4240
IL_002b: blt.s IL_0010
IL_002d: ldloc.0
IL_002e: callvirt instance void [System]System.Diagnostics.Stopwatch::Stop()

Here is the IL for Boxing:

IL_0000: newobj instance void [System]System.Diagnostics.Stopwatch::.ctor()
IL_0005: stloc.0
IL_0006: ldloc.0
IL_0007: callvirt instance void [System]System.Diagnostics.Stopwatch::Start()
IL_000c: ldc.i4.0
IL_000d: stloc.1
IL_000e: br.s IL_0017
IL_0010: ldc.i4.s 33
IL_0012: stloc.2
IL_0013: ldloc.1
IL_0014: ldc.i4.1
IL_0015: add
IL_0016: stloc.1
IL_0017: ldloc.1
IL_0018: ldc.i4 0xf4240
IL_001d: blt.s IL_0010
IL_001f: ldloc.0
IL_0020: callvirt instance void [System]System.Diagnostics.Stopwatch::Stop()

There is no boxing instruction at all in the Boxing method. It has been completely removed by the compiler, so the Boxing method does nothing but iterate over an empty loop. The time measured in UnBoxing therefore becomes the total time of boxing and unboxing.

Micro-benchmarks are very vulnerable to compiler tricks. I would suggest you look at your own IL as well. It may be different if you are using a different compiler.
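Besides reading the IL, a rough runtime cross-check is to count managed allocations, which also relates to the heap-memory concern in the question: boxing allocates an object, unboxing an existing box does not. The following is only a sketch under assumptions. It assumes a runtime that provides `GC.GetAllocatedBytesForCurrentThread`, and the names `BoxingAllocationCheck`, `Sink`, `UnboxedTotal`, `MeasureBoxingBytes` and `MeasureUnboxingBytes` are hypothetical, not part of the original code. A JIT that can prove a box never escapes may still remove it, which is why the sketch stores each box in a static field.

```csharp
using System;

public static class BoxingAllocationCheck
{
    // Hypothetical sinks: writing results to static fields keeps them
    // observable, so the loops are less likely to be removed as dead code.
    public static object Sink;
    public static int UnboxedTotal;

    // Bytes allocated on the current thread while boxing `value` repeatedly.
    public static long MeasureBoxingBytes(int value, int iterations)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
        {
            Sink = value; // implicit boxing: int -> object
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Bytes allocated on the current thread while unboxing an existing box.
    public static long MeasureUnboxingBytes(object boxed, int iterations)
    {
        int total = 0;
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
        {
            total += (int)boxed; // unboxing: copies the int out of the box
        }
        long allocated = GC.GetAllocatedBytesForCurrentThread() - before;
        UnboxedTotal = total;
        return allocated;
    }

    public static void Main()
    {
        object boxed = 33;
        Console.WriteLine("Boxing allocated bytes   : " + MeasureBoxingBytes(33, 1000000));
        Console.WriteLine("Unboxing allocated bytes : " + MeasureUnboxingBytes(boxed, 1000000));
    }
}
```

If the boxing figure stays near zero, the boxing was probably optimized away and any timing of that loop is not measuring what it claims to measure.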

I modified your test code a little bit:

The Boxing method:

private static object Boxing()
{
    Stopwatch s = new Stopwatch();
    int unboxed = 33;
    object boxed = null;
    s.Start();
    for (int i = 0; i < 1000000; i++)
    {
        boxed = unboxed;
    }
    s.Stop();
    var elapsed = s.Elapsed.TotalMilliseconds;
    Console.WriteLine("Boxing time : " + elapsed);
    return boxed;
}

And the Unboxing method:

private static int Unboxing()
{
    Stopwatch s = new Stopwatch();
    object boxed = 33;
    int unboxed = 0;
    s.Start();
    for (int i = 0; i < 1000000; i++)
    {
        unboxed = (int)boxed;
    }
    s.Stop();
    var time = s.Elapsed.TotalMilliseconds;
    Console.WriteLine("UnBoxing time : " + time);
    return unboxed;
}

So that they are translated into similar IL:

For the Boxing method:

IL_000c: callvirt instance void [System]System.Diagnostics.Stopwatch::Start()
IL_0011: ldc.i4.0
IL_0012: stloc.3
IL_0013: br.s IL_0020
IL_0015: ldloc.1
IL_0016: box [mscorlib]System.Int32 //Here is the boxing
IL_001b: stloc.2
IL_001c: ldloc.3
IL_001d: ldc.i4.1
IL_001e: add
IL_001f: stloc.3
IL_0020: ldloc.3
IL_0021: ldc.i4 0xf4240
IL_0026: blt.s IL_0015
IL_0028: ldloc.0
IL_0029: callvirt instance void [System]System.Diagnostics.Stopwatch::Stop()

For the Unboxing method:

IL_0011: callvirt instance void [System]System.Diagnostics.Stopwatch::Start()
IL_0016: ldc.i4.0
IL_0017: stloc.3
IL_0018: br.s IL_0025
IL_001a: ldloc.1
IL_001b: unbox.any [mscorlib]System.Int32 //Here is the unboxing
IL_0020: stloc.2
IL_0021: ldloc.3
IL_0022: ldc.i4.1
IL_0023: add
IL_0024: stloc.3
IL_0025: ldloc.3
IL_0026: ldc.i4 0xf4240
IL_002b: blt.s IL_001a
IL_002d: ldloc.0
IL_002e: callvirt instance void [System]System.Diagnostics.Stopwatch::Stop()

Run several loops to remove the cold-start effect:

static void Main(string[] args)
{
    Console.WriteLine("Program started");
    for (int i = 0; i < 10; i++)
    {
        Boxing();
        Unboxing();
    }
    Console.WriteLine("Program ended");
    Console.Read();
}

Here is the output:

Program started
Boxing time : 3.4814
UnBoxing time : 0.1712
Boxing time : 2.6294
...
Boxing time : 2.4842
UnBoxing time : 0.1712
Program ended

Does that prove that unboxing is 10x faster than boxing? Let's check out the assembly code with windbg:

0:004> !u 000007fe93b83940
Normal JIT generated code
MicroBenchmarks.Program.Boxing()
...
000007fe`93ca01b3 call System_ni+0x2905e0 (000007fe`f07a05e0) (System.Diagnostics.Stopwatch.GetTimestamp(), mdToken: 00000000060040d2)
...
//This is the for loop
000007fe`93ca01c2 mov eax,21h
000007fe`93ca01c7 mov dword ptr [rsp+20h],eax
000007fe`93ca01cb lea rdx,[rsp+20h]
000007fe`93ca01d0 lea rcx,[mscorlib_ni+0x6e92b0 (000007fe`f18b92b0)]
//here is the boxing
000007fe`93ca01d7 call clr!JIT_BoxFastMP_InlineGetThread (000007fe`f33126d0)
000007fe`93ca01dc mov rsi,rax
//loop unrolling. instead of increment i by 1, we are actually incrementing i by 4
000007fe`93ca01df add edi,4
000007fe`93ca01e2 cmp edi,0F4240h // 0F4240h = 1000000
000007fe`93ca01e8 jl 000007fe`93ca01c2 // jumps to the line "mov eax,21h"
//end of the for loop
000007fe`93ca01ea mov rcx,rbx
000007fe`93ca01ed call System_ni+0x2acb70 (000007fe`f07bcb70) (System.Diagnostics.Stopwatch.Stop(), mdToken: 00000000060040cb)

The assembly for Unboxing:

0:004> !u 000007fe93b83930
Normal JIT generated code
MicroBenchmarks.Program.Unboxing()
Begin 000007fe93ca02c0, size 117
000007fe`93ca02c0 push rbx
...
000007fe`93ca030a call System_ni+0x2905e0 (000007fe`f07a05e0) (System.Diagnostics.Stopwatch.GetTimestamp(), mdToken: 00000000060040d2)
000007fe`93ca030f mov qword ptr [rbx+10h],rax
000007fe`93ca0313 mov byte ptr [rbx+18h],1
000007fe`93ca0317 xor eax,eax
000007fe`93ca0319 mov edi,dword ptr [rdi+8]
000007fe`93ca031c nop dword ptr [rax]
//This is the for loop
//again, loop unrolling
000007fe`93ca0320 add eax,4
000007fe`93ca0323 cmp eax,0F4240h // 0F4240h = 1000000
000007fe`93ca0328 jl 000007fe`93ca0320 //jumps to "add eax,4"
//end of the for loop
000007fe`93ca032a mov rcx,rbx
000007fe`93ca032d call System_ni+0x2acb70 (000007fe`f07bcb70) (System.Diagnostics.Stopwatch.Stop(), mdToken: 00000000060040cb)

You can see that even if the comparison seems reasonable at the IL level, the JIT can still perform another optimization at run time. The UnBoxing method is running an empty loop again. Until you verify that the code executed for the two methods is comparable, it is very hard to simply conclude that "unboxing is 10x faster than boxing".
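One way to make the two loops harder to optimize away is to keep every result observable and to make the unboxed value differ per iteration, so the unboxing cannot simply be hoisted out of the loop. The sketch below is an illustration under assumptions, not a definitive benchmark: the names `BoxingBenchmarkSketch`, `BoxedSink`, `UnboxedSink`, `TimeBoxing` and `TimeUnboxing` are hypothetical, the unboxing loop adds array loads that the boxing loop does not have, and the JIT-generated code would still need to be inspected before trusting any ratio.

```csharp
using System;
using System.Diagnostics;

public static class BoxingBenchmarkSketch
{
    // Hypothetical sinks that keep the loop results observable.
    public static object BoxedSink;
    public static int UnboxedSink;

    // Times `iterations` boxing operations; each box is stored in a static field.
    public static double TimeBoxing(int iterations)
    {
        int value = 33;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            BoxedSink = value; // boxing
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    // Times one unboxing per array element; each element is a different box,
    // so the unboxing depends on the loop index.
    public static double TimeUnboxing(object[] boxes)
    {
        int sum = 0;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < boxes.Length; i++)
        {
            sum += (int)boxes[i]; // unboxing
        }
        sw.Stop();
        UnboxedSink = sum;
        return sw.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        const int iterations = 1000000;
        object[] boxes = new object[iterations];
        for (int i = 0; i < boxes.Length; i++)
        {
            boxes[i] = i % 100; // pre-box small values outside the timed region
        }

        // Several rounds, as in the answer, so the first cold rounds can be ignored.
        for (int round = 0; round < 10; round++)
        {
            Console.WriteLine("Boxing time : " + TimeBoxing(iterations));
            Console.WriteLine("UnBoxing time : " + TimeUnboxing(boxes));
        }
    }
}
```

The timings this prints will vary by runtime and machine, and the boxing figure includes allocation and possibly garbage collection work, which is part of the real cost of boxing.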

Published: 2023-10-18 11:38:06
Article link: https://www.elefans.com/category/jswz/34/1504120.html