From: announce-admin@opennms.org
To: announce@www.opennms.org
Date: Tue, 6 Mar 2001 21:51:58 -0600 (CST)
Subject: [OpenNMS-Announce] OpenNMS Update v2.10

==================
  OpenNMS Update
==================
 Vol. 2, Issue 10
==================
  March 6, 2001
==================
   
   In this week's installment...

     * Project Status
          + Challenging Week
          + New Releases Pending
          + Office Move - Still Pending
          + Coding Projects Underway
     * Upcoming Road Shows
     * Early Adopter Program Status
     * The Wish List
       

==============
Project Status
==============
   
Challenging Week:
     
     So you think you've had router problems...
     
     Last week, our web site/email server/CVS tree were not accessible
     for about 7 hours on Thursday afternoon. Since everything is hosted
     in Kansas City (at our parent company's co-lo site), it makes it
     difficult for us to do much hands on troubleshooting. So, whenever
     things go down, we take the following steps:
     
     * Blame BellSouth. We're on xDSL from our current offices and with
       the stunning reliability of their service offering, this is
       usually a pretty safe bet. It all goes back to Occam's Razor...

     * Wait either 5 minutes or reboot the xDSL modem and/or our NAT
       gateway.

     * If the rest of the world is reachable except for our stuff, ping
       www.atipa.com. Since they are co-lo'ed at the same site, it's a
       good check to see if it is just us or a connectivity issue.

     * If it's not just us, call Jay. Our fearless (and remarkably
       talented) local sys-admin in KC.

     * If it is just us, call Jay. He usually doesn't have to do
       anything, but he calms us down and the problem is generally
       cleared by the time we're off the phone. We chalk it up to "black
       magic". Jay chalks it up to PEBKAC. (Problem Exists Between
       Keyboard And Chair)

     * If all else fails, whine to Ben.
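
     For the scripting-inclined, the reachability part of that checklist
     can be sketched roughly as below. This is only an illustration: the
     function name, the result labels, and the "outside" host are
     assumptions, and the probe is passed in as a function so you can
     plug in ping, a TCP connect, or whatever you trust.

```python
from typing import Callable


def classify_outage(
    reachable: Callable[[str], bool],
    outside_host: str = "www.example.com",   # assumed stand-in for "the rest of the world"
    our_host: str = "www.opennms.org",
    neighbor_host: str = "www.atipa.com",    # co-located at the same site
) -> str:
    """Guess where a problem lives, following the checklist order."""
    if not reachable(outside_host):
        # Nothing is reachable: suspect the local xDSL line or NAT gateway.
        return "local link"
    if reachable(our_host):
        return "no outage"
    if reachable(neighbor_host):
        # The co-lo neighbor answers but we do not.
        return "just us"
    # Neither we nor the neighbor answer: a site-wide connectivity issue.
    return "co-lo site"
```

     Either of the last two results still ends with a call to Jay.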
       
     Well this week, while following these instructions, we hit two
     snags we hadn't encountered before. We couldn't get to our
     equipment or the Atipa gear either. So call Jay. Jay wasn't there.
     He was at another Atipa facility in New Hampshire. Thus, screwed we
     were. But, we didn't yet know to what degree.
     
     So, we then call Jay's back-up in KC. After some serious exhaustion
     of ideas and what could possibly be the problem, our crack team of
     technicians note something peculiar--the router is gone. Not gone
     as in "not working", or even gone as in "smoldering in a heap in
     the corner". This particular "gone" is as in stolen. Snatched.
     Pilfered. An extremely neat and tidy little 1U rack space opened up
     for us involuntarily by a third-party technician. It was gone.
     
     Fortunately, upon noticing this trifling bit of technical detail,
     we were able to call in the local router-jock-for-hire company (who
     did a great job under odd circumstances, but whose name I currently
     have forgotten), and they configured up a spare which was hidden in
     a somewhat less conspicuous spot. Considering the fact that we were
     shy a router and we had to involve a hired-gun router jock at the
     last minute, I figure 7 hours really isn't that bad. Hell, Mike's
     broken the build for longer than that, and more than once!
     
     Anyway, that was the excitement for last week. Well, part of it...
     
     This week's excitement technically started last Friday, when
     BellSouth installed our new phone lines at our new digs. Since none
     of us have actually been there yet, I'm going to give them the
     benefit of the doubt and assume that they are installed, up and
     working. Why give them this benefit? Only because they so
     thoroughly proved their efficiency at disconnecting our current
     phone and xDSL connections in our current office space.
     
     Evidently, every company that ever moves always moves their
     service. They never install new service at a new facility and leave
     their existing service in place to allow for overlap. Evidently.
     
     So we find out that the data service is down late Friday evening.
     Since it is down with such shocking regularity, we didn't think
     much of it. When it was still down all day on Saturday, we called
     in to BellSouth FastAccess support (1-888-321-2375, ask for Sherard
     and tell him Shane sent ya) and were told that BellSouth was
     "experiencing problems in the Raleigh area". So, we thought nothing
     more of it. Until Sunday, when it still wasn't up, at which point
     it was easier to assume we just needed to reboot the modem than to
     actually drive in and verify it. Bad move #1.
     
     When we got in Monday morning, we found out that both data and
     voice lines had been disconnected. The FastAccess billing folks
     told us that what was a T-order (To:) had been completed as a
     T&F-order (To: and From:), meaning that our service now existed at
     the new office. Which is great for the painters and carpet-layers
     who are currently occupying that space. We, in the meantime, were
     officially screwed, with a big ol' BellSouth seal of approval.
     
     So the lady at BellSouth billing who I had to deal with (who was
     very pleasant and was trying to help) actually got our phone line
     repair expedited and said that she'd turn the xDSL over to the xDSL
     support group. At this point, I had no idea how badly I would be in
     need of an xDSL support group...
     
       Shane: Hi, my name is Shane, and I'm an xDSL user.
       Chorus: (in unison) Hi, Shane!
       
     So, 3:30pm Monday rolls around and finally--dial-tone! Sweet, sweet
     dial-tone. And no xDSL.
     
     So, I call BellSouth FastAccess support and talk to someone who
     says that they can't put xDSL service on that line because there is
     a disconnect order on that line. So I ask them how they can take
     that order off, and they tell me they can't, without disconnecting
     the line, of course. Well, if that's not an option, we'll have to
     re-provision xDSL, and that will take a minimum of another 4 days.
     This is despite an earlier rep telling me that it was just a matter
     of "turning it on" once the phone service was in.
     
     Unsettled and not content with that answer, I hang up and call
     right back and get someone who knows what they are doing. Or at
     least that's what I assumed, since they were telling me what I
     wanted to hear. After 2.6 hours on hold (check the logs), and me
     having them call me back on my cell so I could at least go home, my
     man Sherard calls me back and says no sweat, all I've got to do is
     call in the next morning at 8am and tell billing to turn it on.
     Piece of cake. So until 8am, I drink hard and sleep well.
     
     The day is now Tuesday. The time is 7:45am and I'm in my car on the
     way to work and anxiously watching the clock on my dashboard so I
     can call in right at 8am. At 7:59, I start the process, so as to
     allow time to get through the menus. 8am hits, and I'm on hold
     again, being advised that an operator would be with me in less than
     one minute. 2 minutes later (but I'll let it slide), I start my
     shpiel again. Finally, I'm advised that whoever told me that all I
     had to do to get my service back up today was to call in just
     didn't know what they were talking about! I truly hope Sherard is
     reading this to hear just how quickly his co-workers turned on him.
     They tell me that the best they can do is 4 days. So I play that
     ultimate trump card--"Can I speak with your supervisor?"
     
     Mitchell hops on the phone roughly 5 minutes later totally
     unapprised of my situation, so again I roll with the shpiel. He
     confirms to me that Sherard (without mentioning a name) was
     certifiably nutso, but that he'd try to help. So he put me on hold
     for 15 minutes while trying to reach xDSL provisioning, who told
     him we couldn't do anything until we cancelled our service
     cancellation order (the "F" in T&F), which we never placed in the
     first place. So to make everyone happy, we cancelled our unordered
     cancel order. And now I'm getting dizzy.
     
     Finally, they say "hopefully, sometime today". I then get to the
     office to find Ben on the phone with them as well. After trying the
     whole thing from another angle, we end up with the same answer. We
     then focused on getting a dial-up connection from our NAT gateway
     up so we could at least get email. God bless wvdialconf, possibly
     the greatest utility ever written for the Linux platform. We were
     soon up and running, but using the only dial-up line in the office,
     and the line that BellSouth was going to call back on. Nonetheless,
     we went forward and dared them to call us.
     
     After another full day of being down (except we now had a shared
     dial-up connection and an rsync-ed copy of CVS so we could work), I
     left at 5pm (?!?!?) to write this from home.
     
     Anti-climactic Conclusion: I just checked and the connection is now
     up. The only remaining questions are: how much is BellSouth going
     to charge us for a disconnect and an expedited re-connect, and how
     much is Mindspring going to charge me for leaving my dial-up
     account nailed up all day. And of course the ultimate question: Do
     the lines at the new office really work?
     
     Current odds around the office are 8:1 against it, and I'm taking
     as much of that action as I can get. And besides, if they aren't
     up, I might at least get to break the news to Sherard about his
     fickle co-workers...
     
New Releases Pending:
     
     Provided the merging of development branches that began today
     goes well, we anticipate a 0.7.1 release as early as Friday of this
     week, based on tomorrow night's CVS snapshot.
     
     Pending successes there, we'll likely have a 0.6.2 stable release,
     including RPMs, available sometime early the following week.
     
     These new releases will include some bug-fixes and more Web UI
     functionality. They'll be worth the download for the bug-fixes, but
     you'll stay for the functionality.
     
     Watch for these later on. They'll also be announced on Freshmeat,
     as always.
     
Office Move - Still Pending:
     
     Some minor construction still going on, as well as questionable
     data service. The furniture is supposed to be delivered on the 20th
     or so, and we've got some hardware that's supposed to show up not
     too long after that, so we'll probably be in later that week.
     
     The offices are going to be a refreshing change of scenery for us.
     It gets old when the most interesting part of your work environment
     is the fact that you are between the high school and the mall. We
     can't count on power, and we sure can't count on phone service, but
     we can bank on the steady stream of high school kids showing us a)
     their baggy pants, b) their cool tattoos, c) their hair cuts (tres
     chic), or d) just how old we really are.
     
     What will we miss about the current offices? Oh, there's plenty.
     For example, the family of ladybugs that lives in the window track
     (yes, we work in a building where the windows actually open!). And
     of course, the rust-stained ceiling tiles in the second floor of a
     third-story office building. How'd they get stained? Your guess is
     as good as mine. And I would be remiss not to mention old Mr.
     Chalky, the faint chalk outline on the floor of the prior office
     resident.
     
     But enough of my rantings, don't we actually work for a living?
     
Coding Projects Underway:
     
     * Snort Integration -- Initial design work is underway, with some
       pre-alpha functionality demo'd in Perl. Need to do some serious
       nuts-and-bolts analysis of this integration before proceeding.
       Still very early in this effort.

     * Solaris Port Postgres Procedures -- Underway. No update.

     * Postgres for NT -- As far as we know, this will work, but we still
       haven't heard back definitively from someone who has tested it.
       There are some additional hurdles to jump for the Win32 platform,
       now that we have a dependency on a portmap service for NT...

     * Portmap for NT -- There is one that ships with NT/2000 that
       _should_ work, but we haven't tested it. There is another one
       referenced at http://www.plt.rwth-aachen.de/ks/english/oncrpc.html
       which is basically from the same project as the Java RPC libraries
       we are using. This is probably worth a look for those of you
       interested in running on NT.

     * SNMP Poller/Data Collection -- The Web UI is alive, and we are
       talking about some tweaks to the default RRD formats. Thoughts on
       this? Let us know.

     * Event DTD -- Changed yet again.

     * User Interfaces -- Some bug fixes are in. Others pending. Larry's
       still adding features/functionality to the Web UI.

     * SCM UI -- Replaced with "./opennms.sh scm status"

     * LDAP Poller -- We're in the infancy of this one. If you want in,
       let me know.

     * Maji Prelim Work -- Rick is building Perl code that is
       successfully parsing MIB files. Check him out, in all his glory,
       on the "events" list.

     * Notification Configuration -- Actively being moved to the Web UI.

     * Swing Interface -- Fighting random oddities. Proceed with caution.

     * Discovery/CAPSD/Database Review -- Revisiting the way Discovery
       and capsd communicate, verifying that stuff is accurately written
       to the database, and adding some maintenance functionality we
       didn't have previously. Mike's the man!
       
===================
Upcoming Road Shows
===================
   
   Hopefully, we'll have to add a regular section on "Seeing OpenNMS in
   Print"! If you aren't on Network World Fusion's email list on network
   and systems management, you missed their article on ten cool open
   source network management tools, which mentioned us. Kind of a goofy
   article overall, but hey, it's nice to see the name in print.
   
   There's also a nasty rumor about this month's issue of Enterprise
   Linux magazine, but I'll believe it when I see it...
   
   On with the road shows...
   
     * May 5th - Twin Cities LUG, Minneapolis, MN

     * May 10th - Boulder LUG, Boulder, CO

     * June 1st - NOVALUG BBQ!! Fire-eaters Unite!!

     * June 2nd - Northern Virginia LUG (NOVALUG), Alexandria, VA

     * June 11-15 - OpenView Forum 2001, New Orleans, LA

     * July 23-27 - O'Reilly Open Source Convention, San Diego, CA
       
   For additional details on these appearances and others, check out the
   web site at http://www.opennms.org/sections/opennms/events
   
============================
Early Adopter Program Status
============================
   
   Jeff has had some minor successes. At one site, Jeff was fighting with
   notification (which is in the product and works, thank you very much),
   and was having some problems with false outages. A few tweaked
   parameters that he hadn't tweaked before (maybe hadn't SEEN before)
   and suddenly, we fixed that, but also may have exposed a
   misconfiguration that could have been causing other problems. Gotta
   love the minor wins!
   
   We've added another site to the EAP program, and are getting close to
   a saturation point. If you or your company may be interested in
   participating, go to the web and fill out the form. Luke and Jeff will
   be in touch.
   
=============
The Wish List
=============
   
   Later this week, we'll begin working with our first potential
   contributor working under a government grant. That whole deal is not
   yet final, so we're not resting on those laurels yet, but hey, if
   Uncle Sam wants to give us some money, all we need is an ethernet jack
   in Cheney's pacemaker, and we'll do our best to help out where we can.
   
   Otherwise, on with the list...
   
     * In the 0.6.x release (and CVS), check out the TODO file

     * More Data Collection configs wanted for the DataCollection.xml

     * Any interest in more TCP pollers? Let us know (or better yet,
       build one yourself...)

     * LDAP Poller

     * nmap Poller (That idea came in via email this week. Cool!)

     * Documentation and development your game? How about a white paper
       on how to extend OpenNMS with custom pollers, custom configs,
       and/or your own scripts/code.

     * Testing on new, exciting platforms is always appreciated. Somebody
       want to mess with the Cygwin port of our Postgres stored
       procedures and see where we stand?

     * Any additional help we can get proving our documentation either
       right or wrong is appreciated. Thanks.

     * Got any creative applications for OpenNMS that we haven't
       considered? Let us know!

     * A Security analysis of OpenNMS?


=============       
Afterthoughts
=============       
   
   Following the first section, I'm just about all ranted out. So I'll
   take this chance to catch up on the comments we've been receiving
   regarding name resolution and dependencies between an NMS and external
   DNS servers.
   
   We've discussed several options, including running a caching DNS
   locally (which still leaves you with dependencies on an external DNS),
   creating our own local /etc/hosts file for exclusive name resolution,
   built from an nslookup or zone transfer script (kludgy at best), and
   then the solution we're pretty much settled on, which comes pretty
   close to Roger Zenker's description. Somebody buy that man a beer.
   
   In brief, what we're currently thinking about doing is resolving the
   IP address to a name as we do the capabilities check on a node at
   discovery time, and writing that name to the database, then refreshing
   that name during the capabilities re-scan, which happens, by
   default, on a 24-hour interval (but is configurable). Of course, we'll
   also provide a utility to force that change if you need it to happen
   prior to the re-scan.
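
   As a rough sketch of that idea (not the actual OpenNMS code), the
   flow might look something like the following. The class and method
   names are made up for illustration, an in-memory dict stands in for
   the database, and keeping the old name when a lookup fails is an
   assumption on our part.

```python
import socket
import time

RESCAN_INTERVAL_SECONDS = 24 * 60 * 60  # default re-scan interval; configurable


def reverse_lookup(ip):
    """Resolve an IP address to a hostname, or return None if that fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


class NodeNameStore:
    """Hypothetical stand-in for the database table holding resolved names."""

    def __init__(self, resolver=reverse_lookup,
                 interval=RESCAN_INTERVAL_SECONDS, clock=time.time):
        self._resolver = resolver
        self._interval = interval
        self._clock = clock
        self._rows = {}  # ip -> (name, time of last resolution)

    def discover(self, ip):
        """Called at discovery time: resolve once and store the result."""
        return self._resolve_and_store(ip)

    def name_for(self, ip):
        """Read the stored name; this never touches DNS."""
        row = self._rows.get(ip)
        return row[0] if row else None

    def rescan(self):
        """Refresh every name last resolved at least one interval ago."""
        now = self._clock()
        for ip, (_, resolved_at) in list(self._rows.items()):
            if now - resolved_at >= self._interval:
                self._resolve_and_store(ip)

    def force_refresh(self, ip):
        """The 'force that change' utility: refresh one name right now."""
        return self._resolve_and_store(ip)

    def _resolve_and_store(self, ip):
        name = self._resolver(ip)
        if name is None:
            # Assumption: keep the previous name, or fall back to the IP.
            name = self.name_for(ip) or ip
        self._rows[ip] = (name, self._clock())
        return name
```

   The point of the design is that everyday reads come from the stored
   name, so an external DNS outage only matters at discovery or re-scan
   time.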
   
   Your ideas were all helpful (except for the questions about how we,
   Snort, and Samba use different files for syslogging ?!?!), and all
   figured into the direction we're currently pursuing. Thanks again for
   the open discussion and stay tuned--there are plenty more questions to
   come, like this one:
   
   What's the best algorithm for associating a name with a node? DNS only
   associates a name with an IP address, which is associated with an
   interface, and by definition, a node can have more than one interface.
   So which is the right name to associate with it? We're familiar with
   OpenView's algorithm, which seems reasonably good, but there are
   situations that it doesn't address well. (Briefly, its node name
   equals the first of whichever is available: SNMP sysName, hostname for
   software loopback interface, or hostname of interface with
   lowest-numbered IP address.) So, whaddaya think? Please take your
   responses to the [discuss] list.
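
   To make the fallback order concrete, here is a small sketch of the
   rule as summarized above. It is our reading of the one-line summary,
   not OpenView's code: the Node and Interface types are hypothetical,
   and treating "lowest-numbered" as "lowest among interfaces that
   actually have a hostname" is an assumption.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Interface:
    ip: str
    hostname: Optional[str] = None   # name DNS returned for this IP, if any
    is_loopback: bool = False        # True for a software loopback interface


@dataclass
class Node:
    interfaces: List[Interface] = field(default_factory=list)
    sys_name: Optional[str] = None   # SNMP sysName, if the node answered


def _ip_sort_key(iface):
    """Order addresses numerically (and IPv4 before IPv6), not as strings."""
    addr = ipaddress.ip_address(iface.ip)
    return (addr.version, int(addr))


def pick_node_name(node):
    """Return the first available name in the order described in the text."""
    # 1. SNMP sysName
    if node.sys_name:
        return node.sys_name
    # 2. Hostname of a software loopback interface
    for iface in node.interfaces:
        if iface.is_loopback and iface.hostname:
            return iface.hostname
    # 3. Hostname of the interface with the lowest-numbered IP address
    named = [iface for iface in node.interfaces if iface.hostname]
    if named:
        return min(named, key=_ip_sort_key).hostname
    return None  # nothing to go on; the caller decides what to display
```

   Note that step 3 depends entirely on addressing: renumber one
   interface and the node's name can change.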
   
   And as always, thanks for your support. Not you, Sherard.
   
XXXs and OOOs to BellSouth,

Shane O.
========
Shane O'Donnell
OpenNMS.org
shaneo@opennms.org
==================
_______________________________________________
announce mailing list (announce@opennms.org)
To subscribe, unsubscribe, or change your list options, go to:
http://www.opennms.org/mailman/listinfo/announce