The GNU/Linux kernel, version 2.6.23, comes with a modular scheduler core and a Completely Fair Scheduler (CFS), which is implemented as a scheduling module. If you’re interested in the workings of this scheduler, be sure to check out the following article at DevWorks. Below you can find an excerpt of the article, which is in my opinion the core of the article.
How CFS works
The CFS scheduler uses an appeasement policy that guarantees fairness. As a task gets into the runqueue, the current time is recorded, and while the process waits for the CPU, its wait_runtime value gets incremented by an amount depending on the number of processes currently in the runqueue. The priority values of different tasks are also considered while doing these calculations. When this task gets scheduled to the CPU, its wait_runtime value starts decrementing and as this value falls to such a level that other tasks become the new left-most task of the red-black tree and the current one gets preempted. This way CFS tries for the ideal situation where wait_runtime is zero!
CFS maintains the runtime of a task relative to a runqueue-wide clock called fair_clock (cfs_rq->fair_clock), which runs at a fraction of real time so that it runs at the ideal pace for a single task.
How are granularity and latency related?
The simple equation correlating granularity and latency is
gran = (lat/nr) – (lat/nr/nr)
where gran = granularity,
lat = latency, and
nr = count of running tasks.
For example, if you have four runnable tasks, then fair_clock will increase at one quarter of the speed of wall time. Each task then tries to catch up with this ideal time. This is a result of the quantized nature of time-shared multitasking. That is, only a single task can run at any one time; therefore, the other processes build up a debt (wait_runtime). So once a task gets scheduled, it will catch up its debt (and a little more because the fair_clock will not stop ticking during the catch up period).
Priorities are introduced by weighting tasks. Suppose we have two tasks: one is to consume twice as much CPU time as the other, yielding a 2:1 ratio. The math gets changed so that a task with weight 0.5 sees time pass by twice as fast.
We enqueue the task in the tree based on fair_clock.
As for time slices, remember, CFS does not use them, at least not in the prior way. Time slices in CFS are of variable length and are decided dynamically.
For the load balancer, scheduling modules implement iterators that are used to walk through all the tasks managed by that scheduling module to do load balancing.