Enhancing the Linux Scheduler

July 25, 2000
J. Corbet

IBM's Mike Kravetz and Herbertus Franke presented their work with the Linux scheduler. They have some concern that the scheduler will not scale well toward the sorts of systems IBM has in mind, and they are looking to help. To that end, they discussed two scheduler variants designed for that scalability but which do not make significant changes to the Linux scheduling policy.

The problems they have identified with the current scheduler are:

There is a single spinlock which controls access to the scheduler. As the number of processors grows, contention for this lock will increase.
The entire run queue is scanned each time the scheduler must make a decision. That scan will get expensive as the number of tasks increases.

The first approach shown was the "Priority Level Scheduler." Here, they have taken the single run queue and split it into multiple lists, each representing a priority level. Since the run queue is now sorted, there is no more need to scan the whole thing. In benchmark runs, this scheduler was shown to perform better when a large number of tasks is running on the system; for lighter loads, it can be worse than what Linux has now.

The second approach is the "Multi-Queue Scheduler." This one maintains a single run queue for each processor; it also tracks the best "goodness" value (the scheduling priority, essentially) present on each processor's run queue. When scheduling time happens, only the local processor's run queue is scanned. If the best process there (adjusted with the local CPU bonus) has a higher priority than the best on all the other queues, it can be run without the need to scan any other processor's queue. Otherwise, it may be necessary to examine the queue for one or more processors. There is a separate spinlock for each processor, which decreases lock contention greatly. And, in fact, this scheduler tends to perform far better than both the default Linux scheduler and the priority-level scheduler, especially as the number of processors increases.

The numbers are clear, but there is still a lot of skepticism among the kernel developers; not everybody feels that the scheduler needs replacing. In particular, nobody wants to create a situation where 16-processor systems go very fast, but uniprocessor desktop systems are slowed down. Schedulers is still very much an open topic.

More information is available on the Linux Scalability Effort page.

Back to OLS 2001 coverage