Date:	Mon, 22 Mar 1999 21:39:23 +0100 (MET)
From:	Gerard Roudier <groudier@club-internet.fr>
To:	Linus Torvalds <torvalds@transmeta.com>
Subject: Re: disk head scheduling


Linus,

I agree completely with all your proposals below. I will just try to
detail what I understood and what I would be happy to have:

The per-major mapping function (kdev_t), as you suggest, does indeed seem an
excellent approach. Since ll_rw_blk deals gracefully with block_dev
structures, we may want to provide it only with such objects to handle.
So, just adding some get_blkdev(kdev_t) method to the blk_dev structure is
the mapping as I understand it and, btw, I have also ended up thinking it is
quite nice and will not complicate ll_rw_blk.
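
A minimal sketch of such a mapping, written as a user-space C model rather
than kernel code. Only the name get_blkdev(kdev_t) comes from the text above;
the structure layout, lookup_blkdev(), the toy kdev_t encoding and the two
example majors are assumptions made for illustration. The generic layer makes
no policy decision: it asks the driver registered for the major, and falls
back to the per-major structure when the driver provides no method.

```c
#include <stddef.h>

/* Toy device number: major in the high byte, minor in the low byte. */
typedef unsigned short kdev_t;
#define MINOR_BITS    8
#define MAX_MAJOR     256
#define MAJOR(dev)    ((unsigned)(dev) >> MINOR_BITS)
#define MINOR(dev)    ((unsigned)(dev) & ((1u << MINOR_BITS) - 1))
#define MKDEV(ma, mi) ((kdev_t)(((ma) << MINOR_BITS) | (mi)))

/* Hypothetical layout: one pluggable queue plus the optional mapping hook. */
struct blk_dev_struct {
    int plugged;        /* plug state of this scheduling entity */
    int nr_queued;      /* requests currently queued on it */
    struct blk_dev_struct *(*get_blkdev)(kdev_t dev);
};

static struct blk_dev_struct blk_dev[MAX_MAJOR];

/* Generic layer: map a device number to the structure to queue on. */
struct blk_dev_struct *lookup_blkdev(kdev_t dev)
{
    struct blk_dev_struct *bd = &blk_dev[MAJOR(dev)];

    if (bd->get_blkdev)
        return bd->get_blkdev(dev);
    return bd;          /* no method: one queue for the whole major */
}

/* SCSI-like driver (assumed: 16 minors per disk): one queue per device. */
#define EX_SCSI_MAJOR 8
#define EX_SCSI_DISKS 16
static struct blk_dev_struct scsi_devs[EX_SCSI_DISKS];

struct blk_dev_struct *scsi_get_blkdev(kdev_t dev)
{
    return &scsi_devs[MINOR(dev) >> 4];
}

/* IDE-like driver: devices on a controller share one queue (no method). */
#define EX_IDE_MAJOR 3

void register_example_drivers(void)
{
    blk_dev[EX_SCSI_MAJOR].get_blkdev = scsi_get_blkdev;
    blk_dev[EX_IDE_MAJOR].get_blkdev = NULL;
}
```

With this shape, two disks on the SCSI-like major map to distinct structures,
while every minor of the IDE-like major maps to the same one.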

The second problem to address is to have some reserved pool of requests so
that queues are not starved for too long when we have a lot of I/O to do.
Since using queue depths from ll_rw_blk does not seem simple, and may not be
appropriate, we may simply use the request pool as follows:

Default pool: used until no entry is available.
  Some value from 64 to 256 should be used.
Reserved pool for READ: only used when we want to queue a READ to a queue
  that is currently empty and the default pool is fully busy.
  Some value in the range 8..32 should be fine.
Reserved pool for WRITE: only used when we want to queue a WRITE to a
  queue that is currently empty and the default pool is fully busy.
  Some value in the range 8..32 should suffice.
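
The allocation policy above could be sketched as follows. This is a
user-space C model that only counts free entries; all names (request_pools,
alloc_request(), free_request()) and the chosen sizes are assumptions picked
from within the suggested ranges, not existing ll_rw_blk code.

```c
#include <stdbool.h>

/* Sizes chosen from the ranges suggested above (assumed values). */
#define NR_DEFAULT        128
#define NR_RESERVED_READ   16
#define NR_RESERVED_WRITE  16

enum req_dir  { REQ_READ, REQ_WRITE };
enum req_pool { POOL_NONE, POOL_DEFAULT, POOL_RESERVED_READ, POOL_RESERVED_WRITE };

/* Only the number of free entries in each pool is tracked. */
struct request_pools {
    int free_default;
    int free_read;
    int free_write;
};

void pools_init(struct request_pools *p)
{
    p->free_default = NR_DEFAULT;
    p->free_read    = NR_RESERVED_READ;
    p->free_write   = NR_RESERVED_WRITE;
}

/*
 * Returns the pool the request was taken from, or POOL_NONE when the
 * caller has to wait. The reserved pools are touched only when the
 * default pool is exhausted AND the target queue is currently empty.
 */
enum req_pool alloc_request(struct request_pools *p, enum req_dir dir,
                            bool queue_empty)
{
    if (p->free_default > 0) {
        p->free_default--;
        return POOL_DEFAULT;
    }
    if (!queue_empty)
        return POOL_NONE;
    if (dir == REQ_READ && p->free_read > 0) {
        p->free_read--;
        return POOL_RESERVED_READ;
    }
    if (dir == REQ_WRITE && p->free_write > 0) {
        p->free_write--;
        return POOL_RESERVED_WRITE;
    }
    return POOL_NONE;
}

/* Give an entry back to the pool it was taken from. */
void free_request(struct request_pools *p, enum req_pool from)
{
    switch (from) {
    case POOL_DEFAULT:        p->free_default++; break;
    case POOL_RESERVED_READ:  p->free_read++;    break;
    case POOL_RESERVED_WRITE: p->free_write++;   break;
    case POOL_NONE:           break;
    }
}
```

A busy queue that has eaten the whole default pool gets POOL_NONE and must
wait, while an empty queue can still start one READ or one WRITE from the
reserved entries.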

For the queues (that will be full blk_dev structures with plug semantics),
the easiest approach is probably to export 1 queue per controller. We would
still have the 'one queue problem' per controller, but it seems easy to
ensure we will not hang since we have a 1:1 problem only. If the SCSI
subsystem ensures it is always able to buffer (or dequeue) at least 1
request at any time per queue (controller), this will work fine. There are
some resources that are shared by all controllers in the SCSI code, but it
would not be hard to ensure that no 'both FULL and EMPTY' condition will
hang the dequeuing machine. And since the number of controllers is low,
even scanning all the queues each time we are called from ll_rw_blk
would be acceptable. Btw, I do not like the 'one queue per controller'
solution for SCSI, as you know.

Indeed, the right solution for SCSI is to export a blk_dev structure for
each device. In this case, the fact that the controller has a limited
command depth may somewhat complicate the problem of not hanging the request
dequeuing machine. It would be fine to also be able to accept (or to
buffer) at least 1 request from each queue (device) in that situation so
that the dequeuing would be guaranteed not to hang. A bit more complex
than the 1 queue per controller problem, but it looks feasible to me. I
strongly prefer the per SCSI device queue (full blkdev per device with plug
semantics) solution, as you know.
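
One way to picture the 'at least 1 request per device' guarantee is the
counting model below. It is only a sketch under stated assumptions: the names
(controller, controller_accept(), controller_complete()) and the sizes are
hypothetical, and the real SCSI code would have to deal with actual command
slots rather than counters. Each device owns one private buffer slot, and
only requests beyond the first compete for the controller's shared command
depth, so a device with nothing in flight is never refused.

```c
#include <stdbool.h>
#include <string.h>

#define NR_DEVICES   4   /* devices behind one controller (assumed) */
#define SHARED_DEPTH 2   /* command slots shared by all devices (assumed) */

struct dev_queue {
    int in_flight;       /* requests accepted and not yet completed */
};

struct controller {
    int shared_used;     /* shared slots currently taken */
    struct dev_queue dev[NR_DEVICES];
};

void controller_init(struct controller *c)
{
    memset(c, 0, sizeof(*c));
}

/*
 * Accept one request from device d. The first request in flight for a
 * device uses that device's private slot, so it always succeeds; any
 * further request needs a free shared slot.
 */
bool controller_accept(struct controller *c, int d)
{
    struct dev_queue *q = &c->dev[d];

    if (q->in_flight == 0) {
        q->in_flight = 1;
        return true;
    }
    if (c->shared_used < SHARED_DEPTH) {
        c->shared_used++;
        q->in_flight++;
        return true;
    }
    return false;        /* caller keeps the request on the device queue */
}

/* Complete one request of device d, releasing a shared slot first. */
void controller_complete(struct controller *c, int d)
{
    struct dev_queue *q = &c->dev[d];

    if (q->in_flight == 0)
        return;
    if (q->in_flight > 1)
        c->shared_used--;
    q->in_flight--;
}
```

Even when one device has consumed every shared slot, a first request from
any other device is still accepted, which is the property needed to keep the
dequeuing from hanging.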

I will continue studying how to implement that. Btw, I have no problem
with the ll_rw_blk code, which is still clean code. All the real implementation
problems are in the SCSI code, obviously, since the 'one queue thing' is
probably the feature that allows it not to hang the request dequeuing.

Regards,
   Gérard.


On Sun, 21 Mar 1999, Linus Torvalds wrote:

> On Mon, 22 Mar 1999, Gerard Roudier wrote:
> > 
> > But the code seems to plug the major, thus the controller, but says that
> > it plugs the device. This perhaps did not affect performance with early
> > controllers. Now, it is allowed to have queues based on kdev_t value, but
> > the code still seems to plug the major. This looks to me like some
> > incomplete enhancement, btw. 
> 
> Think of it as not "device" nor "controller" and certainly not as "major" 
> (even though the latter is the thing that is actually closest to the
> actual implementation). 
> 
> Think of it as a "scheduling entity". It then depends a lot on the actual
> hardware what such a scheduling entity is.
> 
> For example, for floppy drives and IDE disks, you really cannot schedule
> across the controller, because while there are multiple devices per
> controller, they are not independent of each other.
> 
> For SCSI, in contrast, the correct scheduling to use is probably
> device-based rather than anything else.
> 
> This is an example of something where the code really shouldn't make any
> policy decisions: there should be a mapping from device number to "request
> queue", and that mapping depends on the device. The obvious way to do this
> is to just have a per-major-number mapping function or similar. 
> 
> Note that it really shouldn't be all that much work to change the actual
> ll_rw_block.c layer - all the work is really in the devices themselves
> (adding the mapping function, and making sure the SCSI layer in particular
> can live with multiple queues). 
> 
> 			Linus
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.rutgers.edu
> Please read the FAQ at http://www.tux.org/lkml/
> 
> 

