Date: Wed, 21 Feb 2001 02:42:03 -0500
From: Richard Guy Briggs <rgb@conscoop.ottawa.on.ca>
To: Linux Ipsec mailing list <linux-ipsec@freeswan.org>,
Subject: FreeS/WAN redesign thoughts (KLIPS, IPSEC)

-----BEGIN PGP SIGNED MESSAGE-----

Here is a third edition of the FreeS/WAN redesign plans. Please pick it apart. Some glaring errors have been fixed.

A thousand pardons for the potentially multiple posting, I feel like a complete boob forgetting to include a subject line...

FreeS/WAN IPSEC -- KLIPS2 DESIGN THOUGHTS
=========================================
Wed Feb 21 02:17:58 EST 2001

This document was originally written 2.5 weeks after OLS2000, inspired from a meeting with Rusty and Marc in Montreal in November 1999 and two meetings at OLS2000.

Current kernel version reference is 2.4.0

The idea is to redesign KLIPS (kernel parts of FreeS/WAN) to avoid all the 'stoopid routing tricks' (TM) to which we have had to resort over the last 2+ years by disassociating any ipsec devices from physical devices and to add a proper SPDB to do proper incoming IPSEC policy checks.

We are hoping to use existing pattern-matching tools rather than invent our own. NetFilter appears to have all the pattern matching capabilities, but is limited in other ways.

There is also a significant interest in enabling FreeS/WAN to communicate with routing daemons and be able to do load sharing and failover:

	http://www.quintillion.com/fdis/moat/ipsec+routing/

This is an exploratory document. Please comment, particularly if I have missed or mis-understood something, to the linux-ipsec, netfilter-devel or netdev lists.

The basic architecture of NetFilter is:

--->[1]--->(ROUTE)--->[3]--->[4]--->    where:
              |            ^            [1] NF_IP_PRE_ROUTING
              |            |            [2] NF_IP_LOCAL_IN
              |         (ROUTE)         [3] NF_IP_FORWARD
              v            |            [4] NF_IP_POST_ROUTING
             [2]          [5]           [5] NF_IP_LOCAL_OUT
              |            ^
              |            |
              v            |

The basic path through the kernel as it concerns IPSEC for the three types of packets is as follows:

IN:
	NIC
	sanity check
	NF_IP_PRE_ROUTING
	route-in
	ip-options processing
	defragment
	NF_IP_LOCAL_IN
	layer3demux
	application

FORWARD:
	NIC
	sanity check
	NF_IP_PRE_ROUTING
	routing-in
	ip-options processing
	ttl decrement and check
	NF_IP_FORWARD
	fragment
	NF_IP_POST_ROUTING
	output()
	NIC

OUT:
	application
	layer3mux
	NF_IP_LOCAL_OUT
	route-out
	NF_IP_POST_ROUTING
	output()
	NIC

Destination NAT (port forwarding) gets applied in NF_IP_PRE_ROUTING, NF_IP_LOCAL_OUT and Source NAT (masquerading) gets applied in NF_IP_POST_ROUTING. Filtering is applied in NF_IP_LOCAL_IN, NF_IP_FORWARD and NF_IP_LOCAL_OUT.

Hook processing order would generally be:

	NF_IP_PRI_IPSEC_IN?
	NF_IP_PRI_CONNTRACK
	NF_IP_PRI_IPSEC_IN?
	NF_IP_PRI_MANGLE
	NF_IP_PRI_NAT_DST
	NF_IP_PRI_FILTER
	NF_IP_PRI_NAT_SRC
	NF_IP_PRI_IPSEC_OUT

Not all modules are present at each hook. I am uncertain still if IPSEC_IN should be before or after CONNTRACK. Any comments?

- -----------

There is more than one possible approach. The following is not exhaustive. So far, the first is much better thought out and so far, preferred.

--- 1 ---

Treat incoming IPSEC encapsulation as a layer 3 protocol and decapsulate it at the Layer 3 demultiplexer.

An incoming packet starts off with a sanity check. It then goes through all the NF_IP_PRE_ROUTING hooks starting with the SPDB checking. Since it is a fresh ESP or AH packet, it will not have any nfmarks and unless that outer IP header should have been processed by another SG in between, no policy will have been required, letting it through. The rest of the NF_IP_PRE_ROUTING hooks may cause it to be DNATed and defragmented.
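
As a rough illustration of "all the NF_IP_PRE_ROUTING hooks starting with the SPDB checking", the sketch below sorts the modules at one hook into call order for both candidate positions of IPSEC_IN relative to CONNTRACK. It is only a userspace model: the numeric priorities, the hook_order() helper and the set of modules assumed present at NF_IP_PRE_ROUTING are illustrative assumptions, not values taken from the kernel or from KLIPS.

```python
# Priorities are illustrative placeholders: only their relative order
# matters here. Lower values run earlier at a hook.
BASE_PRIORITY = {
    "CONNTRACK": -200,
    "MANGLE": -150,
    "NAT_DST": -100,
    "FILTER": 0,
    "NAT_SRC": 100,
    "IPSEC_OUT": 200,  # hypothetical: runs after everything else
}


def hook_order(modules, ipsec_in_before_conntrack):
    """Return the modules registered at one hook, sorted into call order."""
    priority = dict(BASE_PRIORITY)
    # Hypothetical values for the two candidate positions of IPSEC_IN.
    priority["IPSEC_IN"] = -300 if ipsec_in_before_conntrack else -175
    return sorted(modules, key=lambda name: priority[name])


# Assumed set of modules present at NF_IP_PRE_ROUTING.
PRE_ROUTING = ["NAT_DST", "MANGLE", "IPSEC_IN", "CONNTRACK"]

if __name__ == "__main__":
    for before in (True, False):
        print(before, hook_order(PRE_ROUTING, before))
```

In both orderings NAT_DST comes last, so the incoming packet is only DNATed after the IPSEC_IN check has seen it.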
The packet then goes through routing, which treats it as a local packet and deals with any outer header IP options, then through defragmentation and the NF_IP_LOCAL_IN filter (allow ESP, AH) before getting to ipsec_rcv(), where the outer bundle is authenticated and decrypted and the packet is nfmarked to indicate what decapsulation happened before being passed back to netif_rx(). The next IP header is now visible.

The packet now gets re-injected at the beginning. It goes through the incoming sanity check again and is checked at NF_IP_PRE_ROUTING for policy, using the nfmark previously set during decryption. It may again be DNATed and defragmented. Routing looks at the now-visible next IP header and routes it locally or via the forward hook.

If it is a local packet, IP options and defragmentation are processed. NF_IP_LOCAL_IN then gets to check filtering policy for other L3 protocols. If this host is the endpoint for multiple bundles, the packet is sent back to netif_rx(), having exposed the next IP header.

If it is not a local packet, routing has selected a route, potentially through an existing virtual IPSEC device, one per connection, not per physical I/F. IP options and TTL are processed before the packet is filtered at NF_IP_FORWARD, fragmented, then sent to NF_IP_POST_ROUTING.

If it is a locally generated packet, it would go through normal filtering at NF_IP_LOCAL_OUT, then go through routing, then go to NF_IP_POST_ROUTING.

At NF_IP_POST_ROUTING, an IPSEC matching module would make a decision about the fate of the packet. It would have several possible targets:

ACCEPT would allow the packet through with no processing.

ENCRYPT would steal the packet. If the SA(s) do not exist, it would send up an ACQUIRE to all listening key management daemons and stash the last copy of the packet, waiting for the appropriate SA(s).
Once the SA(s) are available, ENCRYPT encrypts the packet, sets nfmark to indicate what processing happened, and re-injects the packet at NF_IP_LOCAL_OUT (since the packet now appears to originate from this host). The packet would then be routed and sent back to NF_IP_POST_ROUTING. If no new nfmark is generated, the IPSEC module would ACCEPT it.

DROP would drop the packet if previous attempts to do opportunistic encryption failed and the default policy was to block non-IPSEC packets.

A packet routed through an optional IPSEC virtual I/F simply gets assigned a specific source address and has the nfmark preloaded.

Does this sound correct?

The way that nfmark is used is rather vague. It is presently only 32 bits. Ideally, I would like to be able to indicate exactly which SAs were processed on the way in, which would most easily be represented by as many as 4 SAs (AH, ESP, IPCOMP, IPIP), each having an 8-bit protocol field (absolute minimum of 2 bits), a 32-bit destination address field (for IPv4; IPv6 would be 128) and a 32-bit SPI. This is a potential maximum of 672 bits (4 SAs of 8 + 128 + 32 bits each, using IPv6 addresses). A way of mapping 672 bits on to the 32 bits available would be required to use this. A lookup table could be used to map nfmarks to SAIDs, not the SAs themselves, since the SAs could disappear at any time the tdb table is not locked. It should be able to represent a bundle of SAs where one SA could be used in more than one bundle. There could also be more than one right answer for the incoming SPDB.

I have an idea how to accomplish this: extend nfmark by converting it to a list of nfmark structures, each containing a pointer to the next item on the list, a cookie for the specific netfilter function that owns the data, and a pointer to a data structure. nfmark may not be the right tool for this. Another possible solution is to add a member to the struct sk_buff to point to this information.
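
The lookup-table idea above (a 32-bit nfmark standing in for a bundle of SAIDs) might look roughly like the following userspace model. Everything named here (SAID, MarkTable, mark_for(), bundle_for(), the reserved value 0, the example addresses and SPIs) is a hypothetical illustration and not part of KLIPS or NetFilter. It only shows that one SAID can appear in several bundles and that the table hands back SAIDs rather than the SAs themselves.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

NFMARK_MAX = 0xFFFFFFFF  # nfmark is presently 32 bits


@dataclass(frozen=True)
class SAID:
    proto: int  # IP protocol number: 51 AH, 50 ESP, 108 IPCOMP, 4 IPIP
    dst: str    # destination address
    spi: int    # 32-bit SPI


Bundle = Tuple[SAID, ...]  # outermost SA first


class MarkTable:
    """Hypothetical table handing out one 32-bit mark per distinct bundle."""

    def __init__(self) -> None:
        self._mark_of: Dict[Bundle, int] = {}
        self._bundle_of: Dict[int, Bundle] = {}
        self._next_mark = 1  # assumption: 0 means "no IPSEC processing"

    def mark_for(self, bundle: Bundle) -> int:
        """Return the mark for a bundle, allocating one on first use."""
        mark = self._mark_of.get(bundle)
        if mark is None:
            if self._next_mark > NFMARK_MAX:
                raise OverflowError("out of 32-bit marks")
            mark = self._next_mark
            self._next_mark += 1
            self._mark_of[bundle] = mark
            self._bundle_of[mark] = bundle
        return mark

    def bundle_for(self, mark: int) -> Bundle:
        # Returns SAIDs only; the caller still has to look up the SAs.
        return self._bundle_of.get(mark, ())


if __name__ == "__main__":
    esp = SAID(50, "192.0.2.1", 0x1000)
    ah = SAID(51, "192.0.2.1", 0x2000)
    table = MarkTable()
    # The same ESP SAID is used in two different bundles.
    print(table.mark_for((ah, esp)), table.mark_for((esp,)))
```

Nothing in such a table depends on where the 32-bit key travels; the same information could equally be reached through a new member of struct sk_buff.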
Adding a member to struct sk_buff has the benefit of not depending on anyone else, but the drawback of needing to patch a header file *and recompile the entire kernel*.

The SADB would be managed via the PF_KEYv2 socket I/F. The SPDB would be managed via a combination of PF_KEYv2 socket I/F extensions and iptables. A separate NetFilter table called 'ipsec' (as opposed to 'filter' or 'nat') would have the first hook at NF_IP_PRE_ROUTING and the last hook at NF_IP_POST_ROUTING. iptables uses the AF_NETLINK socket family.

- -----------

--- 2 ---

Treat incoming IPSEC encapsulation as an enhancement of the layer 2 protocol and decapsulate it at the NF_IP_PRE_ROUTING hook. This option is less favourable as it stands since it involves creating our own SPDB engine.

An incoming packet starts off with a sanity check. It then goes through the NF_IP_PRE_ROUTING match hook for IPSEC, which would be the first in priority, matching every single packet to force it through a policy check. If it is an ESP or AH packet with a local destination address, it would then be sent to ipsec_rcv() and the first bundle would be processed, keeping state until that bundle is completely processed. At this point the incoming SPDB would be checked to ensure that the proper policy had been applied to it. If there is another bundle inside with an ESP or AH header, that bundle is processed, storing the new and old state. This SPDB check would not be iptables-based, since we would already have gone through the match and target hooks and would have too much state to store in nfmark. The result of the SPDB check would be ACCEPT or DROP (it could also be STOLEN or QUEUEd at this point for opportunistic encryption). The SADB and SPDB entries would be managed via the extended PF_KEYv2 socket I/F.

The rest of the NF_IP_PRE_ROUTING hooks may cause it to be DNATed and defragmented. It then gets routed. For local packets, inner IP options and defragmentation are processed.
NF_IP_LOCAL_IN then gets to check filtering policy for layer 3 protocols. For non-local packets, IP options and TTL are processed before the packet is filtered at NF_IP_FORWARD and then fragmented.

Packets then go through the NF_IP_POST_ROUTING hooks, potentially for SNAT, after which the last hook would force all packets to go through the IPSEC outgoing processing module. Here outgoing policy would be checked, again not necessarily by iptables. A result could be ACCEPT, DROP or STOLEN. The last would result in encryption and authentication being applied as available; the result would then be re-injected at NF_IP_LOCAL_IN, since it would now have a local address and a potentially different destination address and would need to be re-routed. A mechanism would need to be used here to prevent recursion.

- ------------------

If there are any other directions we should be considering, please suggest...

	slainte mhath, RGB

- -- 
Richard Guy Briggs -- PGP key available       Auto-Free Ottawa! Canada
<www.conscoop.ottawa.on.ca/rgb/>                  <www.flora.org/afo/>
Prevent Internet Wiretapping!    --    FreeS/WAN:<www.freeswan.org>
Thanks for voting Green! -- <green.ca> Marillion:<www.marillion.co.uk>

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQCVAwUBOpNxS9+sBuIhFagtAQE8UAP/WF4OwXopq7HhJPSuK5a8XyiZSUJpQcbC
IHefyFMFzswQAJDAu4JrRIWevwHPWTrm5PZ7zsALkQM0WwbcRCz8uueItcg2sKmS
aMfp1dbbMlmgPk1HTwIDBaeHOEIf8yyyy4S6W0gIyb8x4mdI4nx0zbEbNPXkjG/H
gB9G69Fod+M=
=wdwQ
-----END PGP SIGNATURE-----