From:	 Linus Torvalds <torvalds@transmeta.com>
To:	 Jens Axboe <axboe@suse.de>, Patrick Mochel <mochel@transmeta.com>
Subject: Re: [patch] 32-bit dma memory zone
Date:	 Thu, 7 Jun 2001 14:22:10 -0700 (PDT)
Cc:	 Alan Cox <alan@redhat.com>, "David S. Miller" <davem@redhat.com>,
	 MOLNAR Ingo <mingo@chiara.elte.hu>,
	 Richard Henderson <rth@cygnus.com>,
	 Kanoj Sarcar <kanoj@google.engr.sgi.com>,
	 Kernel Mailing List <linux-kernel@vger.kernel.org>


On Thu, 7 Jun 2001, Jens Axboe wrote:
> 
> I'd like to push this patch from the block highmem patch set, to prune
> it down and make it easier to include it later on :-)
> 
> This patch implements a new memory zone, ZONE_DMA32. It holds highmem
> pages that are below 4GB, as we can do I/O on those directly. Also if we
> do need to bounce a > 4GB page, we can use pages from this zone and not
> always resort to < 960MB pages.
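
For reference, the decision being described boils down to something like
this (an illustrative sketch, not the actual patch: page_to_phys() and
GFP_DMA32 are stand-ins for whatever the real code ends up using):

	static struct page *maybe_bounce(struct page *page)
	{
		/* anything physically below 4GB can be DMA'd directly */
		if (page_to_phys(page) + PAGE_SIZE <= 0x100000000ULL)
			return page;

		/*
		 * Bounce: any sub-4GB page will do, we don't have to
		 * steal one from the old low-memory zone.
		 */
		return alloc_page(GFP_DMA32 | GFP_ATOMIC);
	}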

Patrick Mochel has another patch that adds yet another zone on x86: the
"low memory" zone for the 0-1MB area, which is special for some things,
notably real mode bootstrapping (ie the SMP stuff could use it instead of
the current special-case allocations, and Pat needs it for allocating low
memory pages for suspend/resume).

I'd like to see what these two look like together.

But even more I'd like to see a more dynamic zone setup: we already have
people talking about adding memory dynamically at run-time on some of the
server machines, which implies that we might want to add zones at a later
time, along with binding those zones to different zonelists.

This is also an issue for different architectures: some of these zones do
not make any _sense_ on other architectures. For example, what's the
difference between ZONE_HIGHMEM and ZONE_NORMAL on a sane 64-bit
architecture? (Right now I _think_ the 64-bit architectures actually make
ZONE_NORMAL be what we call ZONE_DMA32 on x86, because they already need
to be able to distinguish between memory that can be PCI-DMA'd to and
memory that needs bounce-buffers. Or maybe it's ZONE_DMA that they use for
the DMA32 stuff?)

Anyway, what I'm saying is that "GFP_HIGHMEM" already traverses three
zones, and with ZONE_1M and ZONE_DMA32 you'd have a list of five of them,
of which only _two_ would actually be meaningful on some architectures.
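
The traversal itself is trivial, of course. A simplified sketch of what
__alloc_pages() does today (minus watermarks, kswapd wakeups and all the
rest; rmqueue() is the existing per-zone allocator):

	static struct page *try_zones(zonelist_t *zonelist, unsigned int order)
	{
		zone_t **zp, *z;

		/* walk the NULL-terminated list of candidate zones */
		for (zp = zonelist->zones; (z = *zp) != NULL; zp++) {
			if (z->free_pages > (1UL << order))
				return rmqueue(z, order);
		}
		return NULL;	/* caller falls back to reclaim */
	}

Note that the walk doesn't care how many zones are in the list or where
they came from, which is what makes a dynamic setup feasible at all.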

So should we not try to have some nicer interface like

	create_zone(&zone, offset, end);

	add_zone(&zone, zonelist);

and then we could on x86 have

	create_zone(zone+0, 0, 1M);
	create_zone(zone+1, 1M, 16M);
	create_zone(zone+2, 16M, 896M);
	create_zone(zone+3, 896M, 4G);
	create_zone(zone+4, 4G, 64G);

	.. populate the zones ..

	add_zone(zone+4, GFP_HIGHMEM);

	add_zone(zone+3, GFP_HIGHMEM);
	add_zone(zone+3, GFP_DMA32);

	add_zone(zone+2, GFP_HIGHMEM);
	add_zone(zone+2, GFP_DMA32);
	add_zone(zone+2, GFP_NORMAL);

	/* the 1M-16M zone is usable for just about everything */
	add_zone(zone+1, GFP_HIGHMEM);
	add_zone(zone+1, GFP_DMA32);
	add_zone(zone+1, GFP_NORMAL);
	add_zone(zone+1, GFP_DMA);

	/* The low 1M can be used for everything */
	add_zone(zone+0, GFP_HIGHMEM);
	add_zone(zone+0, GFP_DMA32);
	add_zone(zone+0, GFP_NORMAL);
	add_zone(zone+0, GFP_DMA);
	add_zone(zone+0, GFP_LOWMEM);
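
Implementation-wise I'd expect something on this order (just a sketch,
all the names are made up and the details obviously need working out):

	/*
	 * A zone is a physical range plus the usual free lists; a
	 * zonelist stays what it is now, a NULL-terminated array of
	 * zone pointers, one list per allocation class.
	 */
	void create_zone(zone_t *zone, u64 start, u64 end)
	{
		memset(zone, 0, sizeof(*zone));
		zone->zone_start_paddr = start;
		zone->size = (end - start) >> PAGE_SHIFT;
		/* buddy bitmap / free-list setup would go here */
	}

	void add_zone(zone_t *zone, int gfp_mask)
	{
		zone_t **zl = zonelists[gfp_mask & GFP_ZONEMASK].zones;
		int i;

		/*
		 * Append: zones get tried in the order they were added,
		 * which is why the code above adds the least versatile
		 * zone to each list first, so that precious low memory
		 * only gets used as a last resort.
		 */
		for (i = 0; zl[i]; i++)
			;
		zl[i] = zone;
		zl[i + 1] = NULL;
	}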

and eventually, when we get hot-plug memory, the hotplug event would be
just something like

	zone = kmalloc(sizeof(struct zone), GFP_KERNEL);
	create_zone(zone, start, end);

	.. populate it with the newly added memory ..

	/*
	 * Add it to all the appropriate zonelists (I suspect hotplug
	 * will only happen in high memory, but who knows?)
	 */
	add_zone(zone, GFP_HIGHMEM);
	...

(Note how this might also be part of the equation of how you add nodes
dynamically in a NUMA environment.)

And see how the above would mean that something like sparc64 wouldn't need
to see five zones when it really only needs two of them.

		Linus
