Go vs C#, part 1: Goroutines vs Async-Await

Two remaining parts are here:

I am going to write a series of posts comparing some features of Go and C#. The core feature of Go, goroutines, is a very good place to start. C#’s counterpart is the Task Parallel Library (TPL) together with async-await support.

The implementations of these features are quite different:

  • Async-await in C# is implemented as a method body transform performed by the compiler, similar to what C# does for IEnumerable<T> / IEnumerator<T> methods. The compiler generates a method that returns a state machine (an instance of a compiler-generated type) responsible for evaluating the asynchronous computation.
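To make the state-machine idea concrete, here is a hand-written analogy in Go. It is not what the C# compiler emits; the type `sumMachine` and its `Resume` method are hypothetical names invented for this illustration. It only shows the general shape: local variables become fields, and each "await" becomes a state the computation can be resumed from.

```go
package main

import "fmt"

// sumMachine is a hypothetical, hand-written stand-in for a generated state
// machine. It models "await a value, await another value, return their sum".
type sumMachine struct {
	state int // which await point the computation is suspended at
	a     int // a local variable hoisted into a field so it survives suspension
	sum   int // the final result, valid once Resume has returned true
}

// Resume delivers the awaited value and advances the machine by one step.
// It reports whether the whole computation has completed.
func (m *sumMachine) Resume(value int) (done bool) {
	switch m.state {
	case 0: // resumed after the first await
		m.a = value
		m.state = 1
		return false
	case 1: // resumed after the second await
		m.sum = m.a + value
		m.state = 2
		return true
	default: // already finished, nothing left to do
		return true
	}
}

func main() {
	m := &sumMachine{}
	fmt.Println(m.Resume(40)) // first awaited value arrives
	fmt.Println(m.Resume(2))  // second awaited value arrives
	fmt.Println(m.sum)
}
```

The point of the sketch is that nothing here needs its own stack: the whole suspended computation lives in one small heap object.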

In this post I’ll focus on a relatively simple test:

  • Create N goroutines, each of which awaits a number on its input channel, adds 1 to it, and sends the result to its output channel.
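A minimal sketch of such a chain in Go might look like the following. This is an assumption-laden reconstruction of the idea rather than the exact benchmark code: the function name `runChain` is made up, unbuffered channels are assumed, and timing is left out.

```go
package main

import "fmt"

// runChain builds a chain of n goroutines. Each goroutine waits for a number
// on its input channel, adds 1, and sends the result to its output channel.
// It feeds 0 into the first stage and returns what the last stage emits.
func runChain(n int) int {
	if n <= 0 {
		return 0 // no stages: nothing would ever read the input channel
	}
	first := make(chan int)
	in := first
	for i := 0; i < n; i++ {
		out := make(chan int)
		go func(in <-chan int, out chan<- int) {
			out <- (<-in) + 1
		}(in, out)
		in = out // the next stage reads what this stage writes
	}
	first <- 0
	return <-in
}

func main() {
	fmt.Println(runChain(1000))
}
```

Because every stage adds exactly 1, the value returned should equal the number of stages, which is an easy sanity check for the pipeline.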

Go code:

C# code:

Output for Go:

Output for C#:

Before we start discussing the results, some notes on the test itself:

  • This test is “pre-crafted” for Go: in C# you normally never need channels for async tasks to communicate. Tasks there typically call each other and asynchronously await the result. Nevertheless, channels are the idiomatic way for goroutines to communicate in Go, so I’ve decided to design a test that uses them.

Comparison of raw results:

  • The first run of this test takes almost exactly the same time in both Go and C#

So why is the second run on Go so much faster? The explanation is simple: when you start a goroutine, Go needs to allocate an 8KB stack for it. These stacks are reused, i.e. Go doesn’t need to allocate them again on the second run. The proof:

Go allocates ~ 9GB for 1M goroutines and channels. Assuming each goroutine consumes at least 8KB for its stack, 8GB are necessary just for these stacks.

If we increase the number of messages passed in this test to 2M, it already fails on my machine. With 3M messages, it will fail even if no other apps (except some background ones) are running.
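If you want to check the per-goroutine stack cost on your own machine, a rough sketch along these lines could help. It is not the measurement used above: the helper `stackBytesPerGoroutine` is a hypothetical name, it looks only at `runtime.MemStats.StackInuse` (so channels and other heap allocations are not counted), and the number it reports depends on the Go version and platform.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// stackBytesPerGoroutine starts n goroutines that all stay blocked at the
// same time, and returns the growth of StackInuse divided by n.
func stackBytesPerGoroutine(n int) uint64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	release := make(chan struct{})
	var started, finished sync.WaitGroup
	started.Add(n)
	finished.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer finished.Done()
			started.Done()
			<-release // stay alive until the measurement is taken
		}()
	}
	started.Wait()
	runtime.ReadMemStats(&after)

	close(release)
	finished.Wait()

	if after.StackInuse < before.StackInuse {
		return 0 // guard against unsigned underflow
	}
	return (after.StackInuse - before.StackInuse) / uint64(n)
}

func main() {
	fmt.Println("approx. stack bytes per goroutine:", stackBytesPerGoroutine(100000))
}
```

Multiplying the reported value by the number of goroutines gives a back-of-the-envelope estimate of the stack memory a run of that size needs.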

So the difference is more or less clear. Let’s think about why C# is generally slower on this test:

  • System.Threading.Tasks.Channels is in preview state, i.e. its performance is probably far from perfect at this point. E.g. it’s clear that awaiting on a channel is ~ 2x more expensive than awaiting on a task.

Now, let’s modify the test a bit and decrease the number of passed messages to 20K, a number that’s much closer to the maximum we expect to have in real life (20K open sockets on servers, etc.):

As you can see, C# gets closer to Go here:

  • It beats Go during the first pass

And finally, the same test with 5K messages:

We see here that the task-based test in C# outperforms the test on Go, though the channel-based test in C# is still ~ 2x slower than the second pass on Go.

Why does C# benefit from a smaller number of tasks?

  • The 5K test on Go uses ~ 5MB of RAM, which is still less than the L3 cache size of a Core i7, but much more than the L2 cache size; on the other hand, it’s not quite clear why performance isn’t as good as it should be on the second pass, since the CPU caches only the accessed subset of data anyway.

Goroutines vs async-await: conclusions

Let’s highlight the most important differences:

  • Goroutines are clearly faster. In real-life scenarios you can expect a difference of something like 2x to 3x per await. On the other hand, both implementations are quite efficient: you can expect something like 1M “awaits” per second in C#, and maybe 2–3M in Go, which is a fairly large number. E.g. if you handle network messages, this probably translates to 100K messages per second on a Core i7 in C#, and way more on a real server. In other words, this isn’t expected to be a bottleneck anyway.
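To get a feel for numbers of this kind on your own hardware, one option is a tiny ping-pong measurement over channels. The sketch below is an assumption, not the benchmark from this post: `roundTripsPerSecond` is a made-up helper, each round trip consists of one send and one receive on unbuffered channels, and the result will vary a lot with CPU, Go version, and system load.

```go
package main

import (
	"fmt"
	"time"
)

// roundTripsPerSecond bounces n messages between the caller and a helper
// goroutine over unbuffered channels and returns the observed rate.
func roundTripsPerSecond(n int) float64 {
	ping, pong := make(chan int), make(chan int)
	go func() {
		for v := range ping {
			pong <- v + 1 // same "add 1 and pass it on" work as in the test
		}
		close(pong)
	}()

	start := time.Now()
	for i := 0; i < n; i++ {
		ping <- i
		<-pong
	}
	elapsed := time.Since(start)

	close(ping) // lets the helper goroutine exit
	return float64(n) / elapsed.Seconds()
}

func main() {
	fmt.Printf("%.0f round trips per second\n", roundTripsPerSecond(1000000))
}
```

Treat the printed figure as an order-of-magnitude indication only; a single-threaded ping-pong like this says little about a loaded server.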

Overall, the implementation differences are quite significant, as well as the implications.

There is a good chance I’ll write a more robust / real-life test for async-await-goroutines some day and discuss it in another post. But since this is definitely an interesting topic, I feel I have to at least reference someone else’s benchmark here. Unfortunately, there aren’t many of these; here is the best micro-benchmark close to real-life scenarios I’ve found so far:

That’s a simple web server benchmark with a middleware that deserializes JSON and sends an HTTP request to an external service. The description is right there; the final result for .NET Core is in the comments at https://stefanprodan.com/2016/aspnetcore-vs-golang-data-ingestion-benchmark/#comment-3158140604 (check out the whole thread to understand why his original benchmark for .NET was incorrect):

  • Go handles ~ 9K requests / second (concurrency level = 100, mean time per request = 15ms). There are multiple results for Go, so it’s actually unclear which one to take — I picked the best one I saw.

That’s all for today. Note that I am by no means an expert in Go: the Go code shown here is probably 50% of all the Go code I’ve written so far. So if you’re from the Go camp, you’re definitely welcome to comment on this post, and I’ll be happy to extend or edit it based on your feedback.

Go to Part 2: Garbage Collection

P.S. Check out my new project: Stl.Fusion, an open-source library for .NET Core and Blazor striving to be your #1 choice for real-time apps. Its unified state update pipeline is truly unique and mind-blowing.
