OLS 2001 coverage
From: Peter Badovinatz <tabmowzo@yahoo.com>
Subject: Notes from OLS Clustering Working Group

25 July 2001 - Ottawa Linux Symposium
Clustering Working Group
Discussion leaders: Lars Marowsky-Bree and Alan Robertson
Notes by Peter Badovinatz

(Note on typography: I attempted to identify when Lars or Alan was
speaking. Most audience questions/comments are noted with "Comment:" or
"C:". I identified a couple of audience members when I could easily get
the name; otherwise they remain anonymous.)

HA Working Group BOF

Large attendance; hopefully folks will sign up on the charts.

Opening:

Alan: Opening comments about the framework
 - different components
 - different requirements - HPC - HA
 - compare & contrast what's needed

Lars: Many, many different clustering solutions
 - which one to write to?
 - too many to choose from
 - all essentially do the same thing

Alan: Discussing his framework; its purpose is to provide a set of
components that can be mixed and matched.

Flip chart:
 - Create HPC clusters
 - Create HA clusters
 - Viable OSS 'clusters' project
 - Be able to vary cluster characteristics based on component choice
   - people can understand and contribute to individual pieces -
     smaller, more understandable; contributors don't necessarily need
     to know "everything" that's there
   - hopefully this strengthens the overall project

Two APIs:
 - internal - components that provide the clustering solution and need
   to work together to provide an overall system view
   - heartbeating
   - membership services
   - monitoring
 - external
   - monitor scripts
   - control scripts
   - membership/cluster manager interactions

Rename the internal API to "CPI" - Component Programming Interface.
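The mix-and-match component idea behind the internal "CPI" can be
sketched as a small, swappable interface per component. This is a
hypothetical illustration only - the class and method names
(MembershipService, members, subscribe, evict) are invented for the
sketch and come from no actual spec discussed at the BOF:

```python
# Hypothetical sketch of the CPI idea: each component (heartbeat,
# membership, monitor) implements a small contract so implementations
# can be mixed and matched.  All names are illustrative assumptions.
from abc import ABC, abstractmethod


class MembershipService(ABC):
    """CPI-style contract: any membership implementation must answer
    'which nodes are currently members?' and notify on changes."""

    @abstractmethod
    def members(self) -> set:
        ...

    @abstractmethod
    def subscribe(self, callback) -> None:
        ...


class StaticMembership(MembershipService):
    """Trivial reference implementation: a fixed node list."""

    def __init__(self, nodes):
        self._nodes = set(nodes)
        self._callbacks = []

    def members(self):
        return set(self._nodes)

    def subscribe(self, callback):
        self._callbacks.append(callback)

    def evict(self, node):
        # A membership change flows out through the same interface,
        # so consumers don't care which implementation sits below.
        self._nodes.discard(node)
        for cb in self._callbacks:
            cb(self.members())


m = StaticMembership(["node1", "node2", "node3"])
events = []
m.subscribe(events.append)
m.evict("node2")
print(sorted(m.members()))   # ['node1', 'node3']
```

The point of the sketch is that a cluster manager written against the
abstract contract keeps working when a smarter membership
implementation is dropped in underneath.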
C: want to normalise APIs, not start an OSS project
 - Alan: no, we are trying to do both, because one follows from the
   other

Lars: Customers want one clustering view

Still debating a bit with an audience member; he still seems to view
that this isn't quite an OSS project.

Alan: Pieces
 - M Membership
 - C Cluster manager
 - R Resource monitor
 - S Resource scripts
APIs and reference implementations

Comment: programmers are not good at defining APIs
 - Alan: yes, but someone has to do it

Comment: why the reference implementation?
 - Alan: need the implementation to verify that the APIs work, are
   useful, are good - it will allow us to evolve the APIs through
   usage and experience
 - but the implementation is NOT binding
 - hopefully the eventual APIs will be relatively "binding"

C: two working groups, one for APIs and one for components?
 - not really; need to know the APIs to know the components
 - recursive descent: must do each one partially to keep making
   progress on both

C: if this is a standards project, it's wider than Linux; is there a
consortium or group to work with?
 - nope; that's actually a point with clusters: everyone already has
   their own view of clusters and HA and HPC...

C: two working groups, cross-test API & reference implementers
 - wants to exploit all of the current interested parties to get this

C: will this be a standard implementation or a reference
implementation?
 - well, probably reference; not everyone will want all of it
 - no linear process to settle this; must be iterative

Alan's STONITH API is still iterating; 2 or 3 rounds so far and he
still wants to do more.

Alan:
 - Lars is mostly interested in how it manages resources.
 - Compaq is mostly interested in membership, system structure.

Resources
 - resources are 'entities' that are managed - things that the
   customers perceive as providing service
 - every vendor requires that resource scripts be written differently
 - very annoying to customers, admins, etc.
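The complaint about every vendor requiring differently written
resource scripts suggests what a vendor-neutral contract might look
like: every resource answers the same verbs, so one cluster manager
can drive any resource. The resource class, verbs, and return codes
below are illustrative assumptions (loosely init-script flavoured),
not a proposed standard:

```python
# Illustrative sketch, not an actual spec: a uniform resource-script
# contract of the sort being discussed.  Every resource, whatever it
# manages, answers the same start/stop/status verbs.
class IPAddressResource:
    """Hypothetical managed resource: a floating service IP."""

    def __init__(self, addr):
        self.addr = addr
        self.running = False

    def start(self):
        self.running = True     # a real script would plumb the IP
        return 0                # 0 = success, init-script style

    def stop(self):
        self.running = False
        return 0

    def status(self):
        return 0 if self.running else 3   # 3 = stopped (assumed code)


def dispatch(resource, verb):
    """Uniform entry point: the same verbs for every resource type."""
    handler = {"start": resource.start,
               "stop": resource.stop,
               "status": resource.status}.get(verb)
    if handler is None:
        return 2                # unknown verb
    return handler()


res = IPAddressResource("10.0.0.99")
print(dispatch(res, "start"), dispatch(res, "status"))   # 0 0
```

The cluster manager only ever calls dispatch(); admins would write one
script per resource type instead of one per vendor.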
Want/need these APIs/functions to be usable within the kernel as well
as in user space.
 - where the function is implemented should be transparent
 - APIs are hopefully the same - well, as close to the same as they
   can be; there will be differences but these should be
   "insignificant"

C: need to be careful about the kernel implementation; it's a spartan
environment
 - yes, must make tradeoffs

C: are you talking about modifying the kernel? this could be dangerous
and difficult and paint us into a corner.
 - no, we aren't viewing that the kernel needs to be modified
 - think of this as kernel modules

C: buy-in from current implementers?
 - Alan has mostly been talking to OSS folks
 - Lars has talked to closed-source vendors; they are interested in
   standardising at the resource script level - it's boring for them,
   and they would like to have that.

C: what about HPC parties?
 - don't know yet; we haven't necessarily talked to them too much
 - input from linux-cluster has been useful but not deep enough to
   know precisely what to do

C: need to ensure everyone on the working group identifies their IP
background to avoid polluting the definitions or the implementations.
 - Peter identified himself as avoiding and being careful about the
   membership and n-phase stuff due to patents

C: it's not a standard if Sun, e.g., can't implement a closed-source
version
 - yes, but we may need to 'work around' this
 - Alan says that IBM may release patents for OSS but not closed
   source (wish Alan hadn't gotten us into the IP/patent area...)

Alan says that there may be 'shims' or 'impedance matchers' that allow
existing products/implementations to match only a small number of
APIs, e.g., the resource script level.
 - acceptable for this

C: agreement to do anything?
 - Lars/Alan: yes, we need that to make progress

CPI and API are often the same thing, but not always
 - Alan: the difference matters sometimes, but not always
 - closed source probably won't provide them but may use them

C: two kinds of interfaces, programming and binary... maybe we don't
want the split
 - some agreement here
 - must ensure no recompilation at the application level
 - but components may require recompilation

C: anyone disagree with the general idea?
 - no takers :)

Alan wants to minimise the effort for an application to become cluster
aware.

C: policy document about goals, IP, etc.
 - will do; needs to be codified
 - concern about understanding how contributions will be managed -
   licensing, etc.
 - Alan: IP must not be a barrier to implementation

C: group management level, 'who is in the group' - will we specify the
protocols? will it be protocol free?
 - Lars/Alan: no, probably it can't be completely; we'll have a group
   communication service, but applications will define what they do

C: groups must be synchronised; scaling effects
 - ordered messaging, virtual synchrony
 - low-level messages: no ordering, just reliable delivery, up nodes
   receive it
 - build on this to add ordering, synchrony, etc.
 - group delivery

C: an OSI-like (networking definitions) multi-layer definition for
group communication?
 - not sure, maybe
 - Lars is still thinking of the application layer

C: define some components to communicate only horizontally, while
other communication goes vertically (as does OSI)
 - Alan: some go both ways; barrier with applications
 - but it is unclear who the clients of the group components are

C: try to minimise the number of interfaces?
 - Lars: want minimal, but can only be so minimal; need as many as are
   needed

C: wants to avoid any-to-any communication requirements; scaling
problems
 - makes sense, he's right

C: discussions about XML, scripts on the mailing lists
 - scripts are the 'standard' way that resources are managed; just the
   way things are done
 - XML: need some way to intercommunicate among nodes/components

C: heterogeneous clusters?
 - yes, the intent is that everything will work across heterogeneous
   clusters
 - however, many implementations using the framework, code, etc. may
   not work in such a cluster, e.g., a cluster file system
 - some specific implementations may optimise to target specific
   hardware and may not be generally useful
 - APIs and definitions should NOT be exclusive - you can conform to
   the framework without supporting all possible hardware combinations

Coffee break discussion:

C: Drew Streib, 'Free Standards Group'
 - important to work out policy document issues: what is it, how is it
   licensed, etc. Can still move forward on the technical side, but
   need to have this policy side going.
 - how to get the document, how to comply with the standard
 - royalties necessary? compliance, how to get it?
 - certification, costs, etc.
 - licensing the spec?
 - state all of this up front; copyright on the name of the document
 - prevent the certification process, etc. being done by someone else
 - name and copyright on the NAME of the document - a bit different
   from what and how the document itself is done

Reconvene

Alan:
 - make a list of components - not the standard - but want to
   understand disagreements/agreements

Lars is still looking at the application side - what kinds of APIs
does he want?
 - membership
 - cluster communication
 - resource control methodology

Alan: what are the components that we may want?

Low level: communications
 - raw
 - reliable
 - ordered
 - group

Ordered means that all recipients receive all messages in the same
order.
 - avoid 'how' to do it for now, although some comments started to go
   that way...
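The layering above - reliable delivery at the bottom, ordering built
on top - can be made concrete with a toy sequencer scheme: one node
stamps every message with a global sequence number, and receivers
deliver strictly in stamp order. This is one classic way to get total
order, shown purely for illustration; no specific protocol was
proposed at the BOF:

```python
# Toy total ordering built on an assumed reliable-but-unordered
# transport: a single "sequencer" stamps messages, receivers deliver
# in stamp order.  Illustrative only.
import heapq


class Sequencer:
    def __init__(self):
        self._next = 0

    def stamp(self, msg):
        seq = self._next
        self._next += 1
        return (seq, msg)


class OrderedReceiver:
    """Delivers messages in sequence order even if the reliable
    transport hands them over out of order."""

    def __init__(self):
        self._expected = 0
        self._held = []        # min-heap of early arrivals
        self.delivered = []

    def receive(self, stamped):
        heapq.heappush(self._held, stamped)
        # Drain every message that is now in sequence.
        while self._held and self._held[0][0] == self._expected:
            _, msg = heapq.heappop(self._held)
            self.delivered.append(msg)
            self._expected += 1


seq = Sequencer()
stamped = [seq.stamp(m) for m in ["a", "b", "c"]]

r1, r2 = OrderedReceiver(), OrderedReceiver()
for s in stamped:                  # in-order arrival
    r1.receive(s)
for s in reversed(stamped):        # out-of-order arrival
    r2.receive(s)

print(r1.delivered == r2.delivered == ["a", "b", "c"])   # True
```

Both receivers end up with the identical delivery order, which is
exactly the "ordered" property defined above; virtual synchrony adds
further guarantees about where membership changes fall in that order.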
Cluster membership
 - when do nodes join, when do nodes leave, evicting a node

Group membership

Tangent onto security

Alan: a cluster is a set of machines whose backplane is the internet
 - security issues
 - need a single administrative domain for a cluster
 - can't cross admin domains

C: need functions for authenticate/authorise
 - Alan: cluster nodes need to trust each other

C: authentication on cluster joining
 - acceptable in principle

C: more trust issues; geographically separated pieces need security
between the two, and this needs to be part of the framework
definitions
 - initially a piece that always says 'yes'
 - Alan feels every message should be signed/authenticated/etc.

C: with DMA across nodes, authenticating messages is absurd (HPC
clusters have a different view)
 - Lars: clusters that trust the hardware level have different needs
   for security of messages

Summary: security IS an issue, but we can't solve it now.

Back to components...

Barrier services

Resources

Event services
 - confusion here; one C: and Peter understand this
 - Alan/Lars push toward unifying, treating event services as
   listening to groups
 - ordered events / unordered events

User interface

C: are we trying to define too much / too closely?
 - Alan wants to throw a bunch of stuff out

C: a bunch of modules that may be available
 - Alan: the application doesn't care much about what's there

C: wants a nice extensible approach
 - Alan: my paper does that; some criticism that it's too much

C: naming should be a separate standard; let applications worry about
that. Short term, make a basic standard about communications and
interconnection, and let applications worry about naming at their
level.
 - Alan: some naming is needed, e.g., are node numbers needed?

Running out of time here...

Lars: how to proceed? suggests Linux Kongress in Enschede. how to come
to agreement?

Alan: some issues about LSB 'holes', e.g., no specific start/stop
daemon specified.

Which mailing list? sorta undecided...
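Alan's view that every message should be signed/authenticated - with
an initial piece that "always says 'yes'" - can be sketched with a
shared cluster secret and an HMAC. The shared-key scheme and message
layout are assumptions made for illustration, not a design anyone
proposed:

```python
# Illustrative per-message authentication: nodes sign outgoing cluster
# messages with an assumed shared secret; receivers drop anything that
# doesn't verify.  Key management is deliberately ignored here.
import hashlib
import hmac

CLUSTER_KEY = b"example-shared-cluster-secret"   # hypothetical


def sign(payload: bytes) -> bytes:
    """Prepend a SHA-256 HMAC (32 bytes) to the payload."""
    mac = hmac.new(CLUSTER_KEY, payload, hashlib.sha256).digest()
    return mac + payload


def verify(message: bytes):
    """Return the payload if the MAC checks out, else None."""
    mac, payload = message[:32], message[32:]
    expected = hmac.new(CLUSTER_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        return None             # reject: not from a trusted member
    return payload


msg = sign(b"node2: heartbeat 42")
print(verify(msg))                          # b'node2: heartbeat 42'
print(verify(b"\x00" * 32 + b"forged"))     # None
```

The "always says 'yes'" stub mentioned above would simply be a verify()
that returns the payload unconditionally, letting HPC clusters that
trust the hardware skip the per-message cost.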
C: discuss internal kinds of things with the time left, so more on
components

Alan: RPC (using the term generally) based on some sort of guaranteed
communications
 - not sockets; not quite the right model
 - not always using networks (serial, over disks, etc.)
 - delivery is to "cluster members" or "group members" - all or some
   or one
 - marshaling/demarshaling of data

I/O protection (fencing, etc.)

Logging facility

Heartbeating

Initialisation

Quorum mechanisms/policies

Configuration (Lars: a database) "repository"

User interface interaction with the config "repository"

(battery running down rapidly here :)

Alan wants configuration of objects to be uniform, although the data
may be different.

C: versioning information - which version are you talking to?
 - Alan: group membership layer for this
 - C: communication
 - C: versions of the APIs
 - migration of the individual nodes
 - must be able to upgrade the cluster on the fly (a node at a time)

C (Albert Calahan): real-time and resource minimisation issues; avoid
hitting the cpu or network at inopportune times, e.g., don't require a
regular heartbeat
 - Alan: be able to tune components or timing, or use different
   methods to determine cluster membership

C (Albert Calahan): clusterwide shared memory
 - optional component; not all clusters want this

C: capability map
 - Alan: show what functional pieces are available.
 - C: bitmap
 - Alan: not taken with the bitmap idea

C: real time
 - Lars comments about tightly synchronised time across the cluster,
   where subsequent time calls on different nodes always show time
   incrementing.
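Quorum appears in the component list above without elaboration; the
classic policy it usually names is strict majority, which guarantees
that two partitions of a split cluster can never both run services.
A minimal illustration (the one-vote-per-node weighting is an assumed
simplification):

```python
# Strict-majority quorum: a partition may run cluster services only if
# it holds MORE than half of all configured votes.  Since two disjoint
# partitions cannot both exceed half, split brain is impossible under
# this policy.  Illustrative sketch only.

def has_quorum(votes_present: int, votes_total: int) -> bool:
    """True iff the partition holds a strict majority of votes."""
    return votes_present * 2 > votes_total


# 5-node cluster, one vote each: a 3-node partition keeps quorum,
# the 2-node remainder does not.
print(has_quorum(3, 5), has_quorum(2, 5))   # True False
```

Note the even-cluster corner case: in a 4-node cluster a 2-2 split
leaves neither side with quorum, which is why real policies add
tie-breakers such as weighted votes or a quorum disk.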
Copyright 2002 Eklektix, Inc. All rights reserved.
Linux ® is a registered trademark of Linus Torvalds.