Hi, thanks for looking into this! First, let me comment on a few points you’ve mentioned:

  • “The example uses a kind of anti-pattern for async usage in C#” — that’s true, and I’ve mentioned it’s an unfair benchmark for C#. I intentionally took a well-known benchmark for goroutines knowing it’s highly disadvantageous for C# to demonstrate that async/await should be comparable even in this case in terms of performance.
  • This explains why I didn’t try to use other schedulers — frankly, I should, but that’s not what you normally do by default.
  • This also explains why I had to use Task.Yield(). It was clear to me that you can get rid of it w/ your own scheduler, and moreover, it will also improve the speed a lot.
  • I am also 90% sure that getting rid of “await Task.Yield()” is the main driver of performance improvement in your version.
  • As for GC.Collect() after warmup, true, it’s really not that important, all you might expect is a bit better + more consistent timings if your working set is fully fitting into CPU caches. Unfortunately, I’ve forgot to do the same in Go test — clearly, it has to be done the same way at least.
  • As a side note, I never used StaTaskScheduler before — this is also why I was hesitant to invest into researching on what can be achieved with other schedulers. But thanks to you, I’ll take this into account next time, though I still lean to implementing a “perfectly slim” scheduler + probably, tasks as well. Based on your research, this should demonstrate that in this case C# should be light-years ahead, which is actually what I’d love to show: the asynchronous computation model there is way more extendable than in Go, so if it’s also better in terms of performance — even though you need some special tuning — this basically undermines the core promise Go makes.

Finally, performance isn’t the only focus point in this article. I mostly wanted to show how these languages differ in terms of their approach to asynchronous computation, and obviously, you can’t do this w/o such benchmarks.

No matter what, thanks a lot for quite useful feedback!

P.S. If you don’t mind, I’d love to reference your post if I’ll decide to publish another post on async/await vs Go here. I plan to do this closer to .NET Core 2.1 release — likely, with more benchmarks in general. Based on what’s known so far, we should expect pretty dramatic improvements in speed for C# with this release.

P.P.S. The funny part is: I actually spent some time to find out why missing tail / prologue Task.Yield() triggers recursion in this example. And my original plan for that post was to share stack traces to show where exactly it happens, why it happens, and why this optimization in TPL actually makes a lot of sense. But I ended up explaining only the side effect of that and cut the rest — solely because of a lack of time. And now I see why it looks like I didn’t dig deeply enough into why it happens, esp. taking into account the title. So maybe it’s a good lesson for my future posts :)

Creator of https://github.com/servicetitan/Stl.Fusion , ex-CTO @ ServiceTitan.com