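Critical sections are a hybrid of user-mode and kernel-mode synchronization. An uncontended acquire is just an atomic operation in user mode. When the critical section is already owned, the thread first spins for a while (if a spin count is configured), hoping the owner releases it soon, and only then falls back to waiting on a more expensive kernel object. Most real-world locks are held briefly and rarely contended, so this avoids system calls and context switches in the common case. In contrast, a mutex is a kernel object: every acquire and release is a system call, and a contended acquire blocks immediately, which costs a context switch.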
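To make the spin-then-block idea concrete, here is a minimal portable C++ sketch. It is not how Windows implements CRITICAL_SECTION: the class `SpinThenBlockLock`, the helper `run_counter_demo` and the spin count of 4000 are hypothetical, and a `std::mutex`/`std::condition_variable` pair stands in for the kernel object. On Windows itself you would set the spin count with `InitializeCriticalSectionAndSpinCount` or `SetCriticalSectionSpinCount`.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical spin-then-block lock (illustration only, not the Windows
// implementation). A std::mutex/std::condition_variable pair stands in for
// the kernel object a critical section falls back to.
class SpinThenBlockLock {
public:
    explicit SpinThenBlockLock(int spin_count) : spin_count_(spin_count) {}

    void lock() {
        // Fast path: an uncontended acquire is a single atomic CAS.
        if (try_acquire()) return;

        // Spin phase: retry in user space, hoping the owner releases soon.
        for (int i = 0; i < spin_count_; ++i) {
            if (!locked_.load() && try_acquire()) return;
        }

        // Slow path: register as a waiter, then block until the lock is free.
        std::unique_lock<std::mutex> lk(m_);
        ++waiters_;
        cv_.wait(lk, [this] { return try_acquire(); });
        --waiters_;
    }

    void unlock() {
        locked_.store(false);
        // Only touch the blocking machinery if a thread may be asleep.
        // Default (seq_cst) ordering keeps this check from missing a waiter.
        if (waiters_.load() > 0) {
            std::lock_guard<std::mutex> lk(m_);
            cv_.notify_one();
        }
    }

private:
    bool try_acquire() {
        bool expected = false;
        return locked_.compare_exchange_strong(expected, true);
    }

    const int spin_count_;
    std::atomic<bool> locked_{false};
    std::atomic<int> waiters_{0};
    std::mutex m_;
    std::condition_variable cv_;
};

// Hypothetical usage helper: `threads` threads each increment a shared
// counter `iters` times under the lock; returns the final count.
long run_counter_demo(int threads, int iters, int spin_count = 4000) {
    assert(threads > 0 && iters >= 0);
    SpinThenBlockLock lock(spin_count);
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Passing a spin count of 0 turns the sketch into a lock that blocks right away, which is roughly the mutex behavior. Under heavy contention almost every `lock()` call exhausts the spin loop and then blocks anyway, so the spinning is pure overhead.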
With 100% contention among 2000 threads, the critical sections will almost always spin for the full spin count, burning CPU, and then do exactly what the mutex does anyway: wait in kernel mode. You pay for the spinning and get nothing back, so it makes sense that they are slower in this situation.
Also, as japreiss said, thread creation is very slow.