

This article was contributed by Giorgio Zoppi, who translated it from the original Italian. For those who prefer it, the Italian text is also available.

Exploring SGML DocBook

April 24, 2000
Giorgio Zoppi

Introduction

Suppose you have to write a handbook for your project and you do not have much time: which editing tool would you use? On Linux several solutions exist; one of these is SGML DocBook.

SGML, an acronym for Standard Generalized Markup Language, is a structured markup language. The essential characteristic of a markup language is that it explicitly distinguishes between a document's semantics and its structure, emphasizing the latter.

Moreover, since a document's final appearance usually depends on its structure and its contents, SGML markup determines in general terms how a given document will appear to the user, which makes it possible to create different formats from a single source. Every SGML document refers to a set of declarations, called the "DTD" (Document Type Definition), which contains a list of the available tags and rules on how they may be combined.

The most famous DTD for SGML is certainly HTML, which you surely used when you built your web site. DocBook is a DTD designed for (and widely used in) creating articles, books, and other technical works, though nothing prevents its use for other document types.

Document Creation Process

The DocBook document creation process is substantially different from simple editing with a word processor, because the document's structure is partially independent of its representation.

For example, when we create a table with a word processor, in practice we "draw" it. In DocBook, instead, we indicate that at a given position there is a table with given characteristics. We can characterize this creation process in this way:


The figure shows the steps needed to obtain a document using SGML DocBook:

  • Edit SGML source.

    This first phase consists of the structured creation of a document, which can be a book, an article or a presentation, describing through the tags defined in the DTD how the document is made up. Essentially, an SGML file is a text file with some markers (tags). These markers structure the document, dividing it if necessary into paragraphs, giving it a title, assigning an author, and supplying all the characteristics necessary to represent it in the best way possible.

  • SGML file verification.

    This part of the job is generally carried out by an SGML parser (two common parsers are SP and OpenSP), which verifies the document's syntactic structure by checking it against the DTD's declarations. This verification is not required in order to run the subsequent phases: if the parser discovers errors, you will still be able to run the DSSSL processor, but errors will also appear in its output when the stylesheets are applied. DSSSL (Document Style Semantics and Specification Language) is a language for creating stylesheets. Stylesheets are used both for creating a printable document and for publishing the document online. Informally speaking, a stylesheet gives form to your resulting document.

  • Converting SGML file to an output format using DSSSL stylesheets.

    To transform the plain SGML by applying DSSSL stylesheets (for further information about this topic see http://www.netfolder.com/DSSSL/index.html) there is a utility called OpenJade. OpenJade can produce several output formats from an SGML file, including RTF, HTML and TeX.

    Creating a stylesheet is not a simple task, because you need to know the Scheme language. This isn't a problem for us, because many stylesheets are already available, such as Norman Walsh's modular stylesheets at http://nwalsh.com/docbook/dsssl

  • Transformation to the TeX Format.

    One of the possible output formats of OpenJade is TeX. From TeX we can obtain several other formats, such as Adobe PDF and PostScript. To obtain PDF or PS as the final result we need a set of TeX macros called JadeTeX. In this case we have the following phases:


    Let's suppose that your document is called miodoc.sgml; the steps in the figure above then become:

    miodoc.sgml --> openjade --> miodoc.tex
    
    # if the format that I want is PDF then:
    
    $ pdfjadetex miodoc.tex
    
    # otherwise if it is PS then:
    
    $ jadetex miodoc.tex
    
    $ dvips -o miodoc.ps miodoc.dvi
    
    # if I want to obtain a PDF file from the PS then:
    
    $ ps2pdf miodoc.ps
    
    

    Now that you know what DocBook is, I will show you how to set up the DocBook tools needed for processing our documents, before creating our first document.
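The conversion steps listed above can be summarized in a small shell function. This is only a sketch: it prints the commands instead of running them, the stylesheet path in STYLESHEET is an assumed location rather than a fixed one, and onsgmls is the name under which OpenSP installs its parser.

```shell
#!/bin/sh
# Sketch: print the commands that take a DocBook SGML file to PDF or PS.
# STYLESHEET is an assumed path to a print stylesheet; adjust it to your system.
STYLESHEET=${STYLESHEET:-/usr/lib/sgml/docbook/print/docbook.dsl}

plan_conversion() {
    src=$1                 # e.g. miodoc.sgml
    format=$2              # pdf or ps
    base=${src%.sgml}      # strip the .sgml suffix

    case $format in
        pdf|ps) ;;
        *) echo "unknown format: $format" >&2; return 1 ;;
    esac

    # Step 1: check the document against the DTD (-s suppresses normal output).
    echo "onsgmls -s $src"
    # Step 2: apply the DSSSL stylesheet and write TeX.
    echo "openjade -t tex -d $STYLESHEET -o $base.tex $src"
    # Step 3: run the JadeTeX macros to get the final format.
    if [ "$format" = pdf ]; then
        echo "pdfjadetex $base.tex"
    else
        echo "jadetex $base.tex"
        echo "dvips -o $base.ps $base.dvi"
    fi
}

plan_conversion miodoc.sgml pdf
```

Printing the plan first makes it easy to check paths before anything is run; piping the output to sh would execute the steps in order.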

Installing the tools for DocBook

Installing all the utilities needed to create a correct DocBook document requires the following steps:

  1. Install the ISO entities
  2. Install the DocBook DTD catalogue
  3. Install OpenJade and OpenSP
  4. Install Norman Walsh's stylesheets
  5. Download the JadeTeX macros and compile them so that they recognize our language
  6. Optionally install sgml2x

Before proceeding with the installations, we should create the directory in which we want to put the DTD, the ISO entities and the stylesheets. We will assume that this directory is /usr/lib/sgml.

Installing ISO entities

DocBook refers to a standard set of entities, which you can download from http://www.oasis-open.org/cover/ISOEnts.zip. These entities are referenced by the docbook.cat file; to install them, download the archive and simply unpack it in the directory /usr/lib/sgml.

Unfortunately, the entity file names in docbook.cat do not match, so we must edit that file as described below for everything to work.

Installing the DocBook catalogue

I do not want to spend more than a few words on this step because the documentation at http://www.docbook.org is really exhaustive. It is sufficient to download the DocBook files, unpack them, and put them in the directory /usr/lib/sgml. At this point you have to edit the catalog file (docbook.cat or catalog) so that it matches the ISO entities, and finally set the environment variable SGML_CATALOG_FILES in your .bash_profile. This variable must point to your catalog file (e.g. SGML_CATALOG_FILES=/usr/lib/sgml/catalog).
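As an illustration, the lines added to .bash_profile might look like the following. This is a minimal sketch that assumes the catalog file is /usr/lib/sgml/catalog, as in the example above; the warning is an optional addition and not something the tools require.

```shell
# Lines for ~/.bash_profile (sketch; the catalog path is the assumed one).
SGML_CATALOG_FILES=/usr/lib/sgml/catalog
export SGML_CATALOG_FILES

# Optional: warn early if the catalog is not where we expect it.
if [ ! -r "$SGML_CATALOG_FILES" ]; then
    echo "warning: $SGML_CATALOG_FILES not found or unreadable" >&2
fi
```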

If you don't have time to spare for these first two steps, I advise you to download a DocBook rpm or deb package.

Installing OpenJade

In this case things are quite simple: once you have downloaded the package from http://openjade.sourceforge.net, unpack it with tar and then do the classic:

$ ./configure 
$ make 
$ make install

There are still no rpm or deb packages of this software; however, for those who are not able to compile the application there is a similar one. It's called jade, and you can find it at http://www.jclark.com.

Installing stylesheets

Download them from http://nwalsh.com/docbook/dsssl, and unpack them in the directory /usr/lib/sgml.

Installing JadeTeX

This is the most boring part of this article, above all for those who do not know TeX well. JadeTeX is in fact nothing more than a set of macros for TeX, so we must compile these macros. Every time you give TeX a file produced by OpenJade, TeX must parse this file and give you an output format such as PDF or DVI.

To make everything work correctly we must extend TeX, because it was designed to use a limited amount of resources while processing documents. For example, there are limits on the size and the number of strings that TeX can allocate.

If a run exceeds these limits, TeX will complain and refuse to continue processing. Unfortunately, the output of the OpenJade backend for TeX exceeds these limits.

To extend TeX it is necessary to modify the texmf.cnf file in your TeX installation, adding the correct parameters for JadeTeX to work. You can find the texmf.cnf file with the following commands:

$ kpsewhich -expand-var '$TEXMFMAIN'
/usr/share/texmf

and then:

$ cd /usr/share/texmf/web2c
$ ls -l texmf.cnf
-rw-r--r--  1 root root 1997 Apr 6 1999 texmf.cnf

Before modifying this file it is necessary to check whether the hugelatex program exists in the current path:

$ which hugelatex

If you use teTeX, this file is not present. hugelatex is a LaTeX version with higher resource limits for processing documents. If it is not present, it is necessary to rebuild the LaTeX format. To do so, go to the directory /usr/share/texmf/tex/latex/config, where you will find all the files needed. Copy these files into /tmp/config and type:

$ cd /tmp/config
$ tex -ini -progname=hugelatex latex.ini

This last command creates a file called latex.fmt; rename it to hugelatex.fmt and then move it to the web2c directory:

$ mv latex.fmt hugelatex.fmt
# mv hugelatex.fmt /usr/share/texmf/web2c

Now we need to update the texmf.cnf file in the web2c directory, but first we save the original file:

# cp texmf.cnf texmf.cnf.orig

Open the texmf.cnf file; in the middle you will find parameters similar to these:

pool_size = 125000
pool_size.context = 750000

Make sure that your parameters match those shown above, then add the following lines:

% hugelatex settings 

extra_mem_top.hugelatex=400000
extra_mem_bot.hugelatex=400000 
hash_extra.hugelatex = 15000
pool_size.hugelatex = 5000000 
string_vacancies.hugelatex = 45000
max_strings.hugelatex = 55000 
pool_free.hugelatex = 47500
nest_size.hugelatex = 500 
param_size.hugelatex = 1500
save_size.hugelatex = 5000 
stack_size.hugelatex = 15000 

% jadetex and pdfjadetex 

hash_extra.jadetex = 20000 
hash_extra.pdfjadetex = 20000
pool_size.jadetex = 300000 
pool_free.pdfjadetex = 300000

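After editing, it can be useful to confirm that the new entries really are in the file. The function below is a rough sketch, not part of JadeTeX or teTeX: the name check_texmf is hypothetical, and the keys it looks for are only a sample of the settings listed above.

```shell
#!/bin/sh
# Sketch: report which of a few expected settings are absent from a texmf.cnf file.
# The key list is a sample taken from the settings above, not a complete check.
check_texmf() {
    cnf=$1
    status=0
    for key in pool_size.hugelatex max_strings.hugelatex \
               hash_extra.jadetex hash_extra.pdfjadetex; do
        # Take the text before "=", strip blanks, and compare it with the key exactly.
        if ! awk -F= -v k="$key" '
                { name = $1; gsub(/[ \t]/, "", name) }
                name == k { found = 1 }
                END { exit found ? 0 : 1 }' "$cnf"; then
            echo "missing: $key"
            status=1
        fi
    done
    return $status
}
```

With the paths used in this article it could be called as check_texmf /usr/share/texmf/web2c/texmf.cnf; it prints nothing and returns 0 when all the sampled keys are present.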
Now we have to create a symbolic link named hugelatex that points to tex.

# ln -s /usr/bin/tex /usr/bin/hugelatex

Creating jadetex and pdfjadetex

At this point we need to compile JadeTeX. Go to the dsssl directory in the OpenJade package, where you will find all the files needed to create jadetex and pdfjadetex. Before running the compilation, if you do not write in English, JadeTeX must be told to recognize your language. To do this, modify the file called jadetex.dtx in the OpenJade dsssl directory. Open that file and find the line:

      \RequirePackage[german, frenchb, english]{babel}[1997/01/23 ] 

Replace that line with:

      \RequirePackage[your_language, german, frenchb, english]{babel}[1997/01/23 ]

Now save the file and you're ready for the compilation; as root, run:

      # make -f Makefile.jadetex install 

This command creates the files jadetex.fmt and pdfjadetex.fmt and puts them in the directory /usr/share/texmf/web2c. Now all that is left to do is create the following symbolic links:


      # ln -s /usr/bin/tex /usr/bin/jadetex
      # ln -s /usr/bin/pdftex /usr/bin/pdfjadetex

and run the command texhash so that your TeX installation sees the newly installed files.

Optionally install sgml2x

sgml2x is a Perl script that I've written to simplify the formatting of an SGML file. It is available at http://sgml2x.sourceforge.net. Its installation is simple: after downloading and unpacking the tarball, you only have to run make. Then edit the file /etc/sgml2x.conf to set the stylesheets directory.

I'd like to spend a few more words on this script. It's a frontend for OpenJade and OpenSP that automatically executes all the phases described in figure 2. Let's see an example:

 
$ sgml2x --format html /home/deneb/articles/docbook.sgml 

/usr/bin/onsgmls:I: OpenSP version "1.3.4"
sgml2x - (c) 2000 Giorgio Zoppi - email: gzoppi@cyberspace.org
Loading init file: /etc/sgml2x.conf
Just a moment: I'm searching SGML Parser... found
Just a moment: I'm searching openjade... found
Start parsing SGML File...
Parsing Ok.
Let's starting coversion in your specified format!
Check out parsing.log, for warnings!
Press any key to continuous... 
Start parsing SGML File by jade... 
Your html web pages are ready in the directory:/home/deneb/articles/docbook 

And you will simply have your files in the directory indicated. Now I believe that we are ready to write our first document in DocBook.

Are you ready? Here is the first document

Now I'd like to show you a hypothetical article containing an image, a list and a formal table. Every line of this example will be explained, so that afterwards you can, with little effort, read the DocBook handbook at http://www.docbook.org and create your own documents. Let's see the example then (the words after # are my comments and are not part of the text):

Example 1. The first document in DocBook

<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V3.1//EN">

The first thing that you should note in an SGML file is the document type declaration. Every SGML document must begin with one. It identifies the DTD that will be used in the document and the root element, in this case article.

<article class="whitepaper" id="first-example" lang="en">

Every article begins with a header, where there are the title, author and the several revisions of the same article.

# tag of beginning header
<artheader>
# document's title definition
<title>First Example</title>
# tag for information about the author
<author>
# author's first name
<firstname>Giorgio</firstname>
# author's last name
<surname>Zoppi</surname>
# other information about the author
<affiliation>
# author's address
<address><email>gzoppi@cyberspace.org</email></address>
# end tag for other information
</affiliation>
# end tag for information about the author
</author>
# tag for end header
</artheader>

This is the classic definition of a header for an article. Now I will make a pair of essential notes. The first is that every article must have a definition for the header. The second is that every tag that has been opened (e.g. <nametag>) must then be closed (e.g. </nametag>). Well, we can continue with the example:

# I open one section. Every section needs a title. Our title is First Para. 
<sect1><title>First para</title>

This creates a first-level section (a subdivision of the document), which needs a title.

# I open a paragraph with the tag para 
<para> In this paragraph I insert a figure:  
# and I insert a figure 
<figure><title>A penguin in action</title>
<graphic fileref="fig/fig2">
</graphic> 
# I close the element figure 
</figure>
# I close the paragraph 
</para> 
# I open a subsection of the previous section and I assign it a title
<sect2><title>First subsection</title> 
# I open a paragraph 
<para> Here a table:  
# I add a table
<table>
# I give the table a caption
<title>Table 1</title>
# I logically divide the table with tgroup, and I give it the number of
# columns (cols) and their names (colname).
<tgroup cols="5"> 
<colspec colname="coln1"> 
<colspec colname="coln2"> 
<colspec colname="coln3"> 
<colspec colnum="5" colname="coln5"> 
# it spans two columns horizontally, specifically the first and the second column
<spanspec namest="coln1" nameend="coln2" spanname="span-horiz" align="center">
# it spans two columns vertically, specifically the second and the third column
<spanspec namest="coln2" nameend="coln3" spanname="span-horiz-vert" align="center">
# defines the headings of the table through the thead section
# beginning thead (table header)
<thead> 
# heading line 
<row> 
# horizontal cells span 
<entry spanname="span-horiz"> 
<tbody> 
# first line 
<row> 
# various entries from first to last column 
<entry>Data11</entry> 
<entry>Data12</entry> 
<entry>Data13</entry> 
<entry>Data14</entry> 
<entry>Data15</entry> 
# end first line 
</row> 
# second line 
<row> 
# various entries from first to last column
<entry>Data21</entry> 
<entry>Data22</entry> 
<entry>Data23</entry>
<entry>Data24</entry> 
# more rows, vertical span; the valign attribute sets the vertical alignment
<entry morerows="1" valign="middle">vertical Span</entry>
# end second line
</row>
# beginning third line
<row>
<entry>Data31</entry>
<entry spanname="span-horiz-vert" morerows="1"  table </tgroup>
# end table
</table>
# end paragraph
</para> 
# end subsection 
</sect2> 
<sect2><title>Second subparagraph</title> 
<para>Here's a list:</para> 
# the itemizedlist tag creates a list,
# while the mark attribute indicates the type of marker used for the list;
# two commonly used values are opencircle and bullet.
# Every element is enclosed in a listitem tag.
<itemizedlist mark=opencircle> 
# first element 
<listitem><para>First item</para>
</listitem> 
# second element
<listitem>
<para>Second item</para>
</listitem> 
#end list
</itemizedlist> 
# end second subsection 
</sect2> 
# end section 
</sect1> 
# end article 
</article>
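
Since the # annotations above are not part of the SGML text, it may help to see a file without them. The following sketch writes a reduced version of the example (header, one section and the list only; the figure and the table are left out) and validates it if the OpenSP parser is installed. The file name first-example.sgml is an arbitrary choice, and validation assumes that the catalog is set up as described earlier.

```shell
#!/bin/sh
# Sketch: write a reduced version of the example, without the "#" annotations.
cat > first-example.sgml <<'EOF'
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
<article class="whitepaper" id="first-example" lang="en">
<artheader>
<title>First Example</title>
<author>
<firstname>Giorgio</firstname>
<surname>Zoppi</surname>
</author>
</artheader>
<sect1><title>First para</title>
<para>Here's a list:</para>
<itemizedlist mark="opencircle">
<listitem><para>First item</para></listitem>
<listitem><para>Second item</para></listitem>
</itemizedlist>
</sect1>
</article>
EOF

# Validate only if the OpenSP parser is available; -s suppresses normal output.
if command -v onsgmls >/dev/null 2>&1; then
    onsgmls -s first-example.sgml || echo "validation reported errors" >&2
fi
```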

Conclusion

The scope of this article was essentially to introduce you to DocBook and to show you how it works, and so the article is now finished. We've learned to install the tools and to write a short document. This article is not a complete description of DocBook; a complete description requires more than one article, and in writing such articles I'm afraid I would duplicate already existing documentation. Do you want to know more? Go to http://www.docbook.org.


