From: Evelyn Mitchell
To: frypthoneers@lists.tummy.com
Subject: FRPythoneers Meeting Notes 2001-04-16
Date: Tue, 17 Apr 2001 10:10:27 -0600
Cc: claird@neosoft.com, lwn@lwn.net, webmaster@python.org

Front Range Pythoneers April Meeting

10 in attendance

Mike Olson of Fourthought (www.fourthought.com) presented on Processing
XML in Python. I'd heard a bit about XML before this presentation, but
had never programmed an XML application in any language.

Mike covered the available tools in Python for working with XML,
including parsers, DOM (Document Object Model) tools, XSLT (style sheet
transformation) tools, XPath (a query language) tools, and RDF (Resource
Description Framework, the semantic web) tools.

Fourthought has been developing XML tools for several years, and is
active in core Python development of XML tools through the XML-SIG. All
of the Fourthought tools described are available for download from their
website. They've developed a rather nice server (4Suite Server) which
stores parsed XML and RDF documents, allowing for quick deployment of
XML-based applications.

Most of the people in the audience had done some XML development, so the
questions were on target. Mike was very well prepared, with deep
knowledge of both XML and Python.

While I was taking notes during the presentation, I was reminded once
again why I like Python. I was transcribing the code samples on his
slides, and I realized that in the few moments I had before the
discussion moved to the next topic, I was able to visually proof the code
for correctness. I don't guarantee that any of the code samples in my
notes will actually run, but they might.

Thanks, Mike, for a great presentation!

And thanks to Ken for volunteering to help out with our (non-existent)
website! We'll be putting up a Zope site on www.fr.pythoneers.com within
the next few days.
Our next meeting will be at Abacus Direct in Interlocken in Broomfield,
CO on May 21 at 6:30 (presentation starts at 7:15; arrive early, as the
doors are locked after 7).

Mailing list information and archives are at
http://lists.tummy.com/mailman/listinfo/frpythoneers

Evelyn Mitchell
efm@tummy.com


XML
    eXtensible Markup Language
    tag based
    everything needs a closing tag
    SGML spawned both HTML and XML
    SGML is too difficult to parse
    XML is a more restricted format, easier to use

Included in Python 2.0
    xmllib      simple python parser
    pyexpat     wrapper around expat parser
    minidom     dom implementation

Other:
    xmlProc     validating parser
    4DOM        DOM
    4XSLT       XSLT processor
    Redland and 4RDF    RDF implementation
    4XLink      XLink implementation

Python has strong XML support in the language, and from third parties
    pyxml.sourceforge.net

XML Parsers

xmllib
    100% pure python
    uses regular expressions to parse XML
    user defines call-backs on a per element basis
    namespace support
    not validating
    (Validating: well formed [has closing tags], valid when compared
    against the DTD for that document, the correct character set
    [expand entities])

xmllib-ex.py
-----------------------------------------------------------
import xmllib

class Parser(xmllib.XMLParser):
    def __init__(self):
        xmllib.XMLParser.__init__(self)
        # Map element names to (start handler, end handler) pairs
        self.elements = {'ENTRY': (self.start_entry, None)}

    def start_entry(self, attrs):
        print "Entry with ID=%s" % attrs['ID']

p = Parser()
p.feed(XML_SRC)
p.close()
-----------------------------------------------------------

Pyexpat
    James Clark's expat parser
    very fast
    handlers added as any python callable object

pyexpat-ex.py
-----------------------------------------------------------
import string
from xml.parsers import expat

def start_element(name, attrs):
    print 'Start element:', name, attrs

def end_element(name):
    print 'End element:', name

def char_data(data):
    # Skip whitespace-only text between elements
    data = string.strip(data)
    if data:
        print 'Character data:', data

parser = expat.ParserCreate()
parser.StartElementHandler = start_element
parser.EndElementHandler = end_element
parser.CharacterDataHandler = char_data
parser.Parse(XML_SRC)
-----------------------------------------------------------

xmlProc
    only fully validating python parser
    Lars Marius Garshol

-----------------------------------------------------------
import string
from xml.parsers.xmlproc import xmlproc

class Application(xmlproc.Application):
    def handle_start_tag(self, name, attrs):
        print 'Start element:', name, attrs

    def handle_data(self, data, start, end):
        # Skip whitespace-only text between elements
        data = string.strip(data[start:end])
        if data:
            print 'Character data:', data

    def handle_end_tag(self, name):
        print 'End element:', name

builder = Application()
parser = xmlproc.XMLProcessor()
parser.set_application(builder)
parser.parse_resource('addr_book1.xml')
-----------------------------------------------------------

Simple API for XML 2 (SAX2)
    written in python
    builds on top of pyexpat and xmlproc to provide SAX and SAX2
    interfaces

DOM
    Document Object Model
    miniDOM, included in Python 2.0
    4DOM, included in XML-SIG
    pDomlette/cDomlette, included in 4Suite
    a DOM is like a UML class diagram for the document

miniDOM
    Paul Prescod
    lightweight, 'pythonic' DOM
    DOM XML and Core recommendations only
    children of the document are a python list
    minidom.parse(stream, parser=None)
    minidom.parseString(string, parser=None)
        returns an instance of the document interface
    Node.toxml()
    Node.toprettyxml(indent='\t', newl='\n')

-----------------------------------------------------------
import sys
from xml.dom import minidom, Node

dom = minidom.parseString(XML_SRC)
for c in dom.documentElement.childNodes:
    if c.nodeType == Node.ELEMENT_NODE:
        print "ID = %s" % c.getAttribute('ID')

# Break reference cycles by hand on interpreters older than 2.0
if sys.hexversion < 0x2000000:
    dom.unlink()
-----------------------------------------------------------

4DOM
    written by Fourthought
    Fully compliant DOM Level 2
    some 'pythonic' interfaces
    DOM Core, XML, HTML, Partial Events, Traversal, and Partial Ranges
    xml.dom.ext.reader.Sax
    xml.dom.ext.reader.Sax2
    xml.dom.ext.reader.PyExpat
    xml.dom.ext.reader.HtmlLib
    xml.dom.ext.Print(root, stream=sys.stdout, encoding='UTF-8')
    xml.dom.ext.PrettyPrint(root, stream=sys.stdout, encoding='UTF-8',
                            indent=' ')
    xml.dom.ext.XHtmlPrettyPrint(root, stream=sys.stdout,
                                 encoding='UTF-8', indent=....)

-----------------------------------------------------------
import sys
from xml.dom.ext.reader import PyExpat
from xml.dom import Node

reader = PyExpat.Reader()
dom = reader.fromString(XML_SRC)
for c in dom.documentElement.childNodes:
    if c.nodeType == Node.ELEMENT_NODE:
        # PyExpat Reader is NS aware!
        print "ID = %s" % c.getAttributeNS('', 'ID')

if sys.hexversion < 0x2000000:
    reader.releaseNode(dom)
-----------------------------------------------------------

XPath
    primarily used to select a set of nodes from a single document
    a language which is used to select from a given context
    build complex searches in steps
    powerful and extensible filtering mechanism
    one liners (no comments)

    From Root:
        descendant::ENTRY
        --> All ENTRY elements in the document
    From ADDRBOOK:
        child::ENTRY/ADDRESS[text() = '42 Spam Blvd']
        --> Returns ADDRESS of second ENTRY
    From anywhere:
        //ENTRY[PHONENUM/@DESC = "Work"]
        --> Returns all ENTRY elements with a PHONENUM child that has a
            DESC attribute equal to "Work"

    4XPath, included in 4Suite
    PyPath, included in XML-SIG (based on 4XPath)

4XPath
    written by Fourthought
    fully XPath 1.0 compliant

-----------------------------------------------------------
import sys
from xml.dom.ext.reader import PyExpat
from xml import xpath

reader = PyExpat.Reader()
dom = reader.fromString(XML_SRC)

# XPath expressions can be precompiled
# expr = xpath.Compile("/ADDRBOOK/ENTRY/@ID")
ids = xpath.Evaluate("/ADDRBOOK/ENTRY/@ID", contextNode=dom)
for id in ids:
    print "ID = %s" % id.value

if sys.hexversion < 0x2000000:
    reader.releaseNode(dom)
-----------------------------------------------------------

Extensible Stylesheet Language Transformations (XSLT)
    Used to transform XML into different formats (HTML, XML, PDF, RTF,
    etc)
    Matches nodes from a source document
    Builds from XPath language
    Extremely extensible
    a real programming language

    4XSLT, included in 4Suite
    Sablotron, written in C++ with Python wrapper
    Xalan, python wrapper

4XSLT
    written by Fourthought
    Fully XSLT 1.0 compliant
    Stream, Uri, String, Dom stylesheets and documents

-----------------------------------------------------------
import sys
from xml.xslt import Processor

processor = Processor.Processor()
processor.appendStylesheetUri('addr_book1.xsl')
res = processor.runUri('addr_book1.xml')
print res
-----------------------------------------------------------

Resource Description Framework (RDF)
    A framework for making metadata statements about resources
    A statement is a simple assertion of subject, predicate, object
    RDF schemas allow for validation of statements

    Redland RDF, C based by Dave Beckett, wrapped for Python
    4RDF from Fourthought

4RDF
    Model manipulation and basic searching
    Serialization and deserialization of standard RDF
    Schema support
    Optional persistence through Postgres, Oracle and Shelve
    RDF Inference Language

-----------------------------------------------------------
import sys
from xml.dom.ext.reader import PyExpat
from Ft.Rdf.Serializers.Dom import Serializer
from Ft.Rdf.Drivers import Memory
from xml.dom.ext import PrettyPrint
from Ft.Rdf import Model

mem = Memory.DbAdaptor('')
mem.begin()
-----------------------------------------------------------

4Suite Server
    XML Server
    Optimized XML processing for XSLT, XLink, XPath and XPointer
    Automatic RDF Meta-data Management
    Access to XML data through CORBA, raw HTTP, SOAP, WebDAV, GUI, and
    command line
    User and group based security
    Distributed Transaction system
    Event generation and trigger system
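
The parser and DOM samples above all read from XML_SRC, which the notes never define. What follows is only a minimal sketch for a current Python 3 interpreter, not the code from the slides. It assumes a hypothetical address book whose element and attribute names (ADDRBOOK, ENTRY, ID, ADDRESS, PHONENUM, DESC) follow the XPath examples, with invented values, and it sticks to the standard library (xml.dom.minidom, xml.parsers.expat, xml.etree.ElementTree) rather than the 4Suite tools. The function names are made up for illustration.

```python
from xml.dom import minidom, Node
from xml.parsers import expat
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the address book used on the slides.
# Tag and attribute names come from the notes; the values are invented.
XML_SRC = """<?xml version="1.0"?>
<ADDRBOOK>
  <ENTRY ID="e1">
    <NAME>Alice Example</NAME>
    <ADDRESS>1 Eggs Lane</ADDRESS>
    <PHONENUM DESC="Home">555-0100</PHONENUM>
  </ENTRY>
  <ENTRY ID="e2">
    <NAME>Bob Example</NAME>
    <ADDRESS>42 Spam Blvd</ADDRESS>
    <PHONENUM DESC="Work">555-0101</PHONENUM>
  </ENTRY>
</ADDRBOOK>
"""


def entry_ids_dom(src):
    """Walk the children of the document element, as in the miniDOM sample."""
    dom = minidom.parseString(src)
    try:
        return [child.getAttribute("ID")
                for child in dom.documentElement.childNodes
                if child.nodeType == Node.ELEMENT_NODE]
    finally:
        dom.unlink()  # release the tree once we are done with it


def entry_ids_expat(src):
    """Collect the same IDs with an expat start-element callback."""
    ids = []

    def start_element(name, attrs):
        if name == "ENTRY":
            ids.append(attrs["ID"])

    parser = expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(src, True)  # True marks this as the final chunk of input
    return ids


def entries_at_address(src, address):
    """Plain-Python stand-in for selecting ENTRY elements by ADDRESS text."""
    root = ET.fromstring(src)
    return [entry.get("ID")
            for entry in root.findall("ENTRY")
            if entry.findtext("ADDRESS") == address]


print(entry_ids_dom(XML_SRC))
print(entry_ids_expat(XML_SRC))
print(entries_at_address(XML_SRC, "42 Spam Blvd"))
```

The first two functions do the same job through the tree-based and the callback-based styles described in the notes. The last one filters in ordinary Python because ElementTree only implements a subset of XPath, so it approximates the selection rather than evaluating the expressions shown above.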