Enforcing XML Development Guidelines using Schematron

In SOA projects, we produce lots of XML artifacts: BPEL processes, WSDLs and XSDs, SCA composite descriptions, and OSB Proxy Service definitions, to name just a few. They are written in special-purpose languages, as opposed to general-purpose third-generation languages (3GLs) like Java. To ensure consistency and quality, we usually define development guidelines: naming conventions, or ways to realize aspects like error handling, monitoring, or reporting in a coherent way. The problem with such conventions is that if you don’t enforce them, that is, automatically check your code against them, developers will not follow them consistently. As manual reviews are cumbersome and error-prone, we need an automated way to do this.

From Checkstyle to Schematron

In the Java world, there is Checkstyle. But what do we have for BPEL, WSDL, and the like? They are based on XML, but is there some kind of “XML style checker” that goes beyond checking for well-formedness, DTD/Schema compliance (validity), and indentation? I am not aware of any such dedicated tool, but there is Schematron, a rule-based validation language whose rules are assertions written in XPath.
Using Schematron rules, we can express constraints that go beyond XML Schema’s structural definitions, which makes them well suited for formalizing our development guidelines.

Formalizing development guidelines

For a general explanation of Schematron rules, I refer to the official website. Let’s start with a simple example. Let’s say you want your XML Schema files to have the attribute elementFormDefault with the value qualified. Here is the respective rule:

<rule context="/xsd:schema">
    <assert test="@elementFormDefault='qualified'">
        [ERROR] elementFormDefault should be "qualified"
    </assert>
</rule>
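
A rule does not stand alone: it lives inside a pattern within a schema document, which also declares the namespace prefixes used in the XPath expressions. The following is a minimal sketch of such a file, assuming ISO Schematron and an XPath 2.0 query binding (needed for functions like matches() used further down). The file name xsd.sch, the title, and the pattern id are placeholders, and whether your Schematron processor expects exactly this namespace depends on its version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- xsd.sch: hypothetical file name -->
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <title>XSD development guidelines</title>
    <!-- Bind the xsd: prefix used in the rule contexts -->
    <ns prefix="xsd" uri="http://www.w3.org/2001/XMLSchema"/>
    <pattern id="xsd-guidelines">
        <rule context="/xsd:schema">
            <assert test="@elementFormDefault='qualified'">
                [ERROR] elementFormDefault should be "qualified"
            </assert>
        </rule>
    </pattern>
</schema>
```

Further rules for the same kind of artifact can be added as siblings inside the pattern element.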

Do you want the names of your schema’s complex types to start with an uppercase letter and end with “Type”? No problem:

<rule context="/xsd:schema//xsd:complexType[@name]">
    <assert test="matches(@name,'^[A-Z]\w*Type$')">
        [ERROR] name of complexType "<value-of select="@name"/>" should be in UpperCamelCase,
        starting with an uppercase letter and ending with "Type"
    </assert>
</rule>
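
To make the effect of this rule concrete, here is a small, made-up schema with the expected outcome noted per type. The type names are invented for illustration only.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified">
    <!-- passes: starts with an uppercase letter, ends with "Type" -->
    <xsd:complexType name="CustomerType">
        <xsd:sequence/>
    </xsd:complexType>
    <!-- fails: starts with a lowercase letter -->
    <xsd:complexType name="customerType">
        <xsd:sequence/>
    </xsd:complexType>
    <!-- fails: the suffix "Type" is missing -->
    <xsd:complexType name="Customer">
        <xsd:sequence/>
    </xsd:complexType>
    <!-- not checked: anonymous type has no name attribute -->
    <xsd:element name="order">
        <xsd:complexType>
            <xsd:sequence/>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
```

The [@name] predicate in the rule context is what keeps anonymous complex types out of the check.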

To provide a more complex example, you could mandate that a BPEL rethrow activity is named Rethrow_<faultName> like this:

<rule context="bpel:process//bpel:catch/bpel:rethrow">
    <assert test="@name = concat('Rethrow_', substring-after(../@faultName, ':'))">
        [ERROR] name of rethrow element "<value-of select = "@name"/>" should be 
        "Rethrow_<value-of select="substring-after(../@faultName, ':')"/>"
    </assert>
</rule>
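
As an illustration, the abridged process below contains one rethrow that satisfies this rule and one that violates it. It is only a sketch: the process name, the fault names, and the flt prefix are made up, and the BPEL 2.0 namespace is an assumption that has to match the bpel prefix declared in your .sch file.

```xml
<bpel:process name="OrderProcess"
              targetNamespace="http://example.com/order"
              xmlns:bpel="http://docs.oasis-open.org/wsbpel/2.0/process/executable"
              xmlns:flt="http://example.com/faults">
    <bpel:faultHandlers>
        <!-- passes: name equals "Rethrow_" plus the local part of faultName -->
        <bpel:catch faultName="flt:OrderNotFound">
            <bpel:rethrow name="Rethrow_OrderNotFound"/>
        </bpel:catch>
        <!-- fails: expected name is "Rethrow_PaymentDeclined" -->
        <bpel:catch faultName="flt:PaymentDeclined">
            <bpel:rethrow name="Rethrow1"/>
        </bpel:catch>
    </bpel:faultHandlers>
    <!-- main activity omitted -->
</bpel:process>
```

Note that the rule context only matches a rethrow that is a direct child of catch. A rethrow nested inside a sequence would not be checked.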

Or you could mandate that every BPEL sequence has a name. While the assertion’s test itself is very simple (boolean(@name)), we have to limit the context where it is applied, because there might be generated code in your BPEL process that you do not want to touch. This example excludes BPEL sequences that sit under a scope marked with the pattern name “bpelx:decide”:

<rule context="bpel:process//bpel:sequence
    [not(ancestor::bpel:scope
        [bpelx:annotation/bpelx:pattern[@patternName='bpelx:decide']]
    )]">
    <assert test="boolean(@name)">
        [ERROR] sequence elements should have a name attribute
    </assert>
    <assert test="not(matches(@name,'^Sequence[0-9]+$'))">
        [ERROR] Sequence "<value-of select="@name"/>" should not use JDeveloper
        default naming ("Sequence1" etc.)
    </assert>
</rule>
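
The fragment below sketches which sequences this rule picks up and which it skips. The element structure under the scope is derived from the XPath in the rule context. The names are made up, and both namespace URIs are assumptions that have to match the prefixes declared in your .sch file.

```xml
<bpel:process name="OrderProcess"
              targetNamespace="http://example.com/order"
              xmlns:bpel="http://docs.oasis-open.org/wsbpel/2.0/process/executable"
              xmlns:bpelx="http://schemas.oracle.com/bpel/extension">
    <!-- checked, passes both assertions -->
    <bpel:sequence name="ProcessOrder">
        <!-- checked, fails: JDeveloper default name -->
        <bpel:sequence name="Sequence1"/>
        <bpel:scope name="DecideShipping">
            <bpelx:annotation>
                <bpelx:pattern patternName="bpelx:decide"/>
            </bpelx:annotation>
            <!-- skipped: generated code below a "bpelx:decide" scope -->
            <bpel:sequence/>
        </bpel:scope>
    </bpel:sequence>
</bpel:process>
```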

Integrating XML checks in your build process

Performing Schematron validations of your SOA artifacts can be integrated into your Ant or Maven based build process quite easily. The following picture illustrates this process in the context of a Jenkins CI server. Of course, developers can also execute builds on their local machines.

[Figure: Schematron validation as part of a Jenkins CI build]

The solution consists of the following parts:

  • Schematron rules formalizing your development guidelines (*.sch)
  • The Schematron Ant task (ant-schematron.jar and saxon9he.jar)
  • The Maven AntRun plugin configuration (inside pom.xml)
  • An XSLT transforming the Schematron output into a more readable form (svrlToSimpleReport.xsl or svrlToHtmlReport.xsl)
  • The configuration instructing Jenkins/Hudson to display validation results in the context of each build.

I assume your .sch and .xsl files are located at a well-defined location and ant-schematron.jar and saxon9he.jar are installed in your Maven repository (you will have to do this manually). Declare them as dependencies of the AntRun plugin like this:

<dependency>
    <groupId>net.sf.saxon</groupId>
    <artifactId>Saxon-HE</artifactId>
    <version>9</version>
</dependency>
<dependency>
    <groupId>com.schematron.ant</groupId>
    <artifactId>schematron</artifactId>
    <version>1.6</version>
</dependency>

The execution definition for the AntRun plugin will look like this:

<execution>
    <id>sca-validate</id>
    <phase>validate</phase>
    <configuration>
        <schematron schema="${schematronDir}/${schematronRuleset}/bpel.sch"
               failOnError="false"
               outputFileName="${basedir}/reports/schematron/bpel.svrl.xml">
            <fileset dir="${basedir}" includes="*.bpel"/>
        </schematron>
        <xslt processor="trax"
               in="${basedir}/reports/schematron/bpel.svrl.xml"
               style="${schematronDir}/${schematronRuleset}/svrlToSimpleReport.xsl"
               out="${basedir}/reports/schematron/bpel.validationReport.xml"
               force="true" failOnError="false"/>
    </configuration>
</execution>
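
For orientation, this is roughly how the dependency and execution snippets fit into a complete plugin declaration. Treat it as a sketch under assumptions rather than the exact configuration used here: the plugin version is just an example, the run goal and the target wrapper follow the usual AntRun conventions, and the taskdef class name com.schematron.ant.SchematronTask is an assumption that should be checked against the ant-schematron documentation for your version.

```xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-antrun-plugin</artifactId>
    <version>1.8</version>
    <dependencies>
        <!-- the Saxon-HE and schematron dependencies go here -->
    </dependencies>
    <executions>
        <execution>
            <id>sca-validate</id>
            <phase>validate</phase>
            <goals>
                <goal>run</goal>
            </goals>
            <configuration>
                <target>
                    <!-- Assumed class name; verify it for your ant-schematron version -->
                    <taskdef name="schematron"
                             classname="com.schematron.ant.SchematronTask"
                             classpathref="maven.plugin.classpath"/>
                    <!-- the <schematron> and <xslt> tasks go here -->
                </target>
            </configuration>
        </execution>
    </executions>
</plugin>
```

The maven.plugin.classpath reference makes the jars declared as plugin dependencies visible to the Ant task definition.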

There will be a schematron/xslt pair for each type of artifact you want to validate (composite.xml, WSDL, etc.). Here’s the code of svrlToSimpleReport.xsl:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="/">
        <validationReportList>
            <xsl:for-each select="fileset/file">
                <validationReport>
                    <xsl:attribute name="filename">
                        <xsl:value-of select="@name"/>
                    </xsl:attribute>
                    <xsl:for-each select="svrl:schematron-output/svrl:failed-assert">
                        <violation>
                            <xsl:value-of select="svrl:text"/>
                        </violation>
                    </xsl:for-each>
                </validationReport>
            </xsl:for-each>
        </validationReportList>
    </xsl:template>
</xsl:stylesheet>
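
Given that stylesheet, a report for a single file with one violation would look roughly like the following. The file name is made up, and the whitespace inside violation is trimmed here for readability. The actual text is whatever the failed assertion emitted.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<validationReportList xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
    <validationReport filename="OrderProcess.bpel">
        <violation>[ERROR] sequence elements should have a name attribute</violation>
    </validationReport>
</validationReportList>
```

Files without violations still produce an empty validationReport element, because the outer loop runs once per validated file.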

svrlToHtmlReport.xsl is similar, but omitted for the sake of brevity. After a local build, developers can check the console output or the generated XML / HTML files for violations.

On the Jenkins CI server, you can instruct the Task Scanner Plugin to take your *validationReport.xml files into account like this:

[Figure: Jenkins Task Scanner Plugin configuration for the Schematron reports]

Alternatively, or in addition, you can use the HTML Publisher Plugin to make the generated HTML reports available via the CI server’s web interface:

[Figure: Jenkins HTML Publisher Plugin configuration for the Schematron reports]

Happy validating!

800 Responses

  1. Sameer says:

    Hi
    Thanks for the amazing article.
    I am able to do xml validations using Jenkins, Maven and schematron as given on this page.
    But I also want to enforce file naming convention rules, for example that an OSB proxy service name should end with _PS.
    How can I get the file name in the .sch file? When I look at the generated .svrl file, I can see the input file names, so I am sure there has to be some way to pass the file name into the .sch file and then check whether it adheres to the standards.

    Any ideas?
    Thanks.