Quality SOA Testing

Enforcing XML Development Guidelines using Schematron

In SOA projects, we produce lots of XML artifacts. There are BPEL processes, WSDLs and XSDs, SCA composite descriptions, and OSB Proxy Service definitions, to name just a few. They are written using special-purpose languages, as opposed to general-purpose third-generation languages (3GLs) like Java. To ensure consistency and quality, we usually define development guidelines. We define naming conventions, or ways to realize certain aspects like error handling, monitoring, or reporting in a coherent way. The problem with such conventions: if you don't enforce them, i.e. automatically check your code against them, developers will not follow them. As manual reviews are cumbersome and error-prone, we need an automated way to do this.

From Checkstyle to Schematron

In the Java world, there is Checkstyle. But what do we have for BPEL, WSDL, and the like? They are based on XML, but is there some kind of “XML style checker” that goes beyond checking for well-formedness, DTD/Schema compliance (validity), and indentation? While I am not aware of any such dedicated tool, there is the Schematron technology.
Using Schematron rules, we can express constraints that go beyond XML Schema’s structural definitions. As such, they are very well suited for formalizing our development guidelines.

Formalizing development guidelines

For a general explanation of Schematron rules, I refer to the official website. Let’s start with a simple example. Let’s say you want your XML Schema files to have the attribute elementFormDefault with the value qualified. Here is the respective rule:

<rule context="/xsd:schema">
    <assert test="@elementFormDefault='qualified'">
        [ERROR] elementFormDefault should be "qualified"
    </assert>
</rule>
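
Note that rule fragments like this one must be embedded in a complete Schematron schema, wrapped in a pattern and accompanied by namespace declarations. A minimal skeleton might look like this (queryBinding="xslt2" is required because some of the rules below use XPath 2.0 functions like matches(), which is also why saxon9he.jar appears as a dependency in the build integration below):

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <ns prefix="xsd" uri="http://www.w3.org/2001/XMLSchema"/>
    <pattern>
        <rule context="/xsd:schema">
            <assert test="@elementFormDefault='qualified'">
                [ERROR] elementFormDefault should be "qualified"
            </assert>
        </rule>
    </pattern>
</schema>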

You want the names of your Schema's complex types to start with an uppercase letter and end with "Type"? No problem:

<rule context="/xsd:schema//xsd:complexType[@name]">
    <assert test="matches(@name,'^([A-Z]|_)+\w*Type$')">
        [ERROR] name of complexType "<value-of select="@name" />" should be in UpperCamelCase, 
        starting with an uppercase letter and ending with "Type"
    </assert>
</rule>

To provide a more complex example, you could mandate that a BPEL rethrow activity is named Rethrow_<faultName> like this:

<rule context="bpel:process//bpel:catch/bpel:rethrow">
    <assert test="@name = concat('Rethrow_', substring-after(../@faultName, ':'))">
        [ERROR] name of rethrow element "<value-of select = "@name"/>" should be 
        "Rethrow_<value-of select="substring-after(../@faultName, ':')"/>"
    </assert>
</rule>

Or you could mandate that every BPEL sequence has a name. While the assertion's test itself is very simple (boolean(@name)), we have to limit the context in which it is applied, because there might be generated code in your BPEL process that you do not want to touch. The following example excludes BPEL sequences under a scope that is marked with the pattern name "bpelx:decide":

<rule context="bpel:process//bpel:sequence 
    [false()=exists(ancestor::bpel:scope
        [exists(bpelx:annotation/bpelx:pattern[@patternName='bpelx:decide'])]
    )]">
    <assert test="boolean(@name)">
        [ERROR] sequence elements should have a name attribute
    </assert>
    <assert test="matches(@name,'^Sequence[0-9]+$')=false()">
        [ERROR] Sequence "<value-of select = "@name"/>" should not use JDeveloper 
        default naming ("Sequence1" etc.)
    </assert>
</rule>

Integrating XML checks in your build process

Schematron validation of your SOA artifacts can be integrated into your Ant- or Maven-based build process quite easily. The following picture illustrates this process in the context of a Jenkins CI server. Of course, developers can also execute builds on their local machines.

[Image: Schematron validation in a Jenkins-based build process]

The solution consists of the following parts:

  • Schematron rules formalizing your development guidelines (*.sch)
  • The Schematron Ant Plugin (ant-schematron.jar and saxon9he.jar)
  • The Maven AntRun plugin configuration (inside pom.xml)
  • An XSLT transforming the Schematron output into a more readable form (svrlToSimpleReport.xsl or svrlToHtmlReport.xsl)
  • The configuration instructing Jenkins/Hudson to display validation results in the context of each build.

I assume your .sch and .xsl files reside at a well-defined location and that ant-schematron.jar and saxon9he.jar are installed in your Maven repository (you will have to install them manually). Declare them as dependencies of the AntRun plugin like this:

<dependency>
    <groupId>net.sf.saxon</groupId>
    <artifactId>Saxon-HE</artifactId>
    <version>9</version>
</dependency>
<dependency>
    <groupId>com.schematron.ant</groupId>
    <artifactId>schematron</artifactId>
    <version>1.6</version>
</dependency>
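
For orientation: the dependencies above and the execution below both sit inside the maven-antrun-plugin declaration in your pom.xml, roughly like this (the version number is an assumption):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-antrun-plugin</artifactId>
    <version>1.7</version>
    <dependencies>
        <!-- the Saxon and Schematron dependencies shown above -->
    </dependencies>
    <executions>
        <!-- the execution definition shown below -->
    </executions>
</plugin>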

The execution definition for the AntRun plugin will look like this:

<execution>
    <id>sca-validate</id>
    <phase>validate</phase>
    <goals>
        <goal>run</goal>
    </goals>
    <configuration>
        <target>
            <!-- assumes the <schematron> task has been defined via
                 taskdef from ant-schematron.jar -->
            <schematron schema="${schematronDir}/${schematronRuleset}/bpel.sch"
                   failOnError="false"
                   outputFileName="${basedir}/reports/schematron/bpel.svrl.xml">
                <fileset dir="${basedir}" includes="*.bpel"/>
            </schematron>
            <xslt processor="trax"
                   in="${basedir}/reports/schematron/bpel.svrl.xml"
                   style="${schematronDir}/${schematronRuleset}/svrlToSimpleReport.xsl"
                   out="${basedir}/reports/schematron/bpel.validationReport.xml"
                   force="true" failOnError="false"/>
        </target>
    </configuration>
</execution>

There will be a schematron/xslt pair for each type of artifact you want to validate (composite.xml, WSDL, etc.). Here's the code of svrlToSimpleReport.xsl:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="/">
        <validationReportList>
            <xsl:for-each select="fileset/file">
                <validationReport>
                    <xsl:attribute name="filename">
                        <xsl:value-of select="@name"/>
                    </xsl:attribute>
                    <xsl:for-each select="svrl:schematron-output/svrl:failed-assert">
                        <violation>
                            <xsl:value-of select="svrl:text"/>
                        </violation>
                    </xsl:for-each>
                </validationReport>
            </xsl:for-each>
        </validationReportList>
    </xsl:template>
</xsl:stylesheet>
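
Applied to the validation output, this stylesheet produces a simple report like the following (the file name and message are just illustrative):

<validationReportList>
    <validationReport filename="MyProcess.bpel">
        <violation>[ERROR] sequence elements should have a name attribute</violation>
    </validationReport>
</validationReportList>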

svrlToHtmlReport.xsl is similar, but omitted for the sake of brevity. After a local build, developers can check the console output or the generated XML / HTML files for violations.

On the Jenkins CI server, you can instruct the Task Scanner Plugin to take into account your *validationReport.xml files like this:

[Image: Task Scanner Plugin configuration in Jenkins]

Alternatively, or in addition, you can use the HTML Publisher Plugin to make the generated HTML reports available via the CI server's web interface:

[Image: HTML Publisher Plugin configuration in Jenkins]

Happy validating!

Architecture Programming Testing

New is the new goto

I really like this catchphrase! While I'm pretty sure I've read it somewhere, Google-searching it only points to Adventures in Software. Why is instantiating a class using the new keyword usually a bad thing? For two reasons:

  • It ties two implementation classes strongly together. You will have a hard time testing one class in isolation or changing its behavior without modifying it.
  • It hides the dependency somewhere in your code. The dependencies of a class are not obvious if there are new statements scattered all over the place.

A simple approach

Without going into much detail (there is a lot of literature about it), here’s a simple strategy to avoid new statements:

  • pass in all the dependencies of a class as constructor arguments and assign them to private final fields
  • make the types of the constructor arguments interfaces, not classes
  • do all the instantiation work in dedicated factory classes (or using a dependency injection framework)

Here’s a synthetic example:

public class Foo {
    private final IFooCollaborator fooCollaborator;

    public Foo(final IFooCollaborator fooCollaborator) {
        this.fooCollaborator = fooCollaborator;
    }
}

The number of constructor arguments will show you the exact number of collaborators. This is a very valuable hint about your design quality. If you have too many (real) collaborators, your class either does too much, or does something at too low a level of abstraction. In this case, consider combining some of the collaborators into a new class. If your class uses a collaborator just to obtain some other class, it is not a real collaborator, and your design is flawed.

Wiring it up

Of course, your new keywords will have to go somewhere; after all, you will need to bind concrete implementations to your interfaces. If you don't use some kind of dependency injection framework, this will be done in factory classes. They are responsible for building object graphs (wiring collaborators), and this responsibility is cleanly separated from your business logic code. See the classic "Where Have All the Singletons Gone?" for more about this topic.
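
As a minimal sketch, assuming a hypothetical FooCollaborator class that implements IFooCollaborator, a factory for the Foo example above could look like this:

public class FooFactory {
    // the factory is the only place that knows which concrete class
    // is bound to the IFooCollaborator interface
    public Foo createFoo() {
        return new Foo(new FooCollaborator());
    }
}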

Some related anti-patterns

Of course, you could avoid using new statements by calling static factory methods or some global (Singleton) getInstance method. We all know this is bad, bad, bad, bad. Using the Service Locator or Service Registry pattern isn't much better. Using some registry object (or even worse, static methods of a registry class) will hide your dependencies. And it will make almost every class dependent on your registry, even though it is not a real collaborator but just used to obtain the actual collaborators.
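
To illustrate, here is a sketch of the registry anti-pattern (ServiceRegistry and its lookup method are hypothetical):

public class Bar {
    public void doWork() {
        // the dependency on IFooCollaborator is invisible in Bar's
        // constructor and API; it is buried in the method body, and
        // Bar now also depends on ServiceRegistry itself
        final IFooCollaborator collaborator = (IFooCollaborator)
                ServiceRegistry.getInstance().lookup("fooCollaborator");
        // ... work with the collaborator
    }
}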

Exceptions to the rule

Of course there are exceptions to the rule. Value objects, exception objects, or data transfer objects are examples of objects whose direct creation you might accept. And you might accept directly instantiating runtime or standard library classes (think Date and String). But reconsider your abstractions when thinking about instantiating something like FileInputStream.

What about Dependency Injection?

I prefer constructor injection over setter injection. The Spring Framework does the opposite. I think it is much more explicit to have all collaborators listed as constructor arguments and assign them to final fields. Again, Miško Hevery explains the advantages of this approach very well.
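
For contrast, here is what a setter-injected variant of the Foo example would look like; note that the collaborator field can no longer be final, and an instance can exist in a half-wired state until the setter is called:

public class FooWithSetter {
    private IFooCollaborator fooCollaborator; // cannot be final

    public void setFooCollaborator(final IFooCollaborator fooCollaborator) {
        this.fooCollaborator = fooCollaborator;
    }
}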

If you use Enterprise Java Beans, you can have collaborators injected by annotating a field with @EJB:

@Stateless 
public class MyClass implements IMyClass {
    @EJB
    private DaoFactoryLocal daoFactory;

    public MyClass() {}

    public MyClass(final DaoFactoryLocal daoFactory) {
        this.daoFactory = daoFactory;
    }
}

Unfortunately, being an EJB itself, MyClass will have to provide a no-argument constructor, and therefore you cannot mark your collaborator fields final. But I still recommend having a second constructor listing all collaborators, for the reasons stated above.

Architecture Programming SOA

Good and bad Dependencies between SOA Artifacts

My post “Towards Architecture-aware Dependency Metrics” was focused on dependencies between classes and modules in object oriented programming languages. But the idea of analyzing dependencies and classifying them as either “good” or “bad” can be applied to SOA artifacts as well.

For example, your target architecture might distinguish between the concepts of "Adapter Services" and "Business Services". You might want to allow Adapter Services to call Business Services and vice versa, but you might not want Adapter Services to call each other. By making rules like these explicit and mapping your implementation artifacts to elements of an architecture definition, you could automatically find architecture violations. But I am not aware of any tool supporting this kind of analysis. Determining and visualizing SOA artifact dependencies can be seen as a first step towards such a tool.

Architecture Programming Quality

Towards Architecture-aware Dependency Metrics

Dependencies are at the heart of software architecture. Getting your dependencies right is the basis of keeping your software maintainable and evolvable. There are “good” and “bad” dependencies. Bad dependencies will make your system hard to understand, change and test. 

Types of Dependencies

A dependency between two artifacts A and B exists if A somehow “needs” B. In OOP, the most important artifacts are classes (and interfaces, which are technically just pure abstract classes). We can further differentiate the type of dependency between two given classes: A might

  • inherit from (or implement) B
  • accept B instances as input
  • retrieve B instances
  • create B instances
  • access members of B (methods, constants, or fields)
  • return B instances

Usually, several of these types apply at once. For example, A somehow needs to get hold of a B instance in order to access or return it. So if A is not a B instance itself, it could either create one, (actively) retrieve one, or (passively) accept one as input. In this article, I do not distinguish between compile-time and runtime dependencies, nor do I explicitly consider resource dependencies (files, local hardware, remote systems etc.).
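
A contrived Java sketch (classes A and B are hypothetical) showing several of these dependency types at once:

class B {
    void doSomething() { }
}

public class A extends B {              // A inherits from B
    private final B accepted;

    public A(final B accepted) {        // A accepts a B instance as input
        this.accepted = accepted;
    }

    public B makeB() {
        final B created = new B();      // A creates a B instance
        created.doSomething();          // A accesses a B member
        return created;                 // A returns a B instance
    }
}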

Dependency Graphs

A set of interdependent classes forms a directed graph, with the classes being the nodes and the dependencies being the edges. Each dependency has a direction and other properties, such as its type from the list above. Using this information, we can derive some quality metrics. We could count the number of incoming or outgoing dependencies, detect cycles, or consider the types of the dependencies. There are some well-known metrics, probably the best known being the Metrics Suite for Object Oriented Design by Chidamber and Kemerer.
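
As a minimal sketch (class and edge names invented), counting incoming and outgoing dependencies from a list of directed edges could look like this:

import java.util.HashMap;
import java.util.Map;

public class DependencyMetrics {
    public static void main(final String[] args) {
        // each pair is one dependency edge: {from, to}
        final String[][] edges = { { "A", "B" }, { "A", "C" }, { "B", "C" } };

        final Map<String, Integer> fanOut = new HashMap<String, Integer>();
        final Map<String, Integer> fanIn = new HashMap<String, Integer>();
        for (final String[] edge : edges) {
            increment(fanOut, edge[0]); // outgoing dependency of edge[0]
            increment(fanIn, edge[1]);  // incoming dependency of edge[1]
        }
        System.out.println("fan-out: " + fanOut); // e.g. {A=2, B=1}
        System.out.println("fan-in: " + fanIn);   // e.g. {B=1, C=2}
    }

    private static void increment(final Map<String, Integer> map, final String key) {
        final Integer value = map.get(key);
        map.put(key, value == null ? 1 : value + 1);
    }
}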

Awareness of the Target Architecture

The common metrics consider all classes equal. But in fact, they are not. For example:

  • Interfaces and API classes are different from implementation classes
  • Infrastructure classes are different from business logic classes
  • GUI classes are different from database access classes

What makes them different? The answer is: their relation to a target architecture. Technically, a Factory is just a regular class. But it is our design/architecture that defines that this class may instantiate classes, while other (business logic) classes may not instantiate any classes. Furthermore, considering the architecture, classes not only belong to packages, but to some higher level “thing” (call it component, module, subsystem…). Usually, we design a system with such high level abstractions in mind. And we define allowed dependencies at this level of abstraction. But our implementation language might not provide mechanisms to ensure our architectural guidelines are followed. For example, Java still does not provide a standard way of organizing our classes in modules and cleanly separating their APIs from their implementations (like OSGi does). And some other architecture principles will never be expressible in the language itself (see the factory example above).

The Vision

My vision is an architecture-aware dependency analysis that enables developers and architects to spot problems before it is too late to fix them. The analysis will classify dependencies as "good" (valid) or "bad" (invalid), just like unit tests either pass or fail. Developers will be reminded that they violated the target architecture if they import a class from a wrong package or try to instantiate a class in business logic code.

What do we need? First, we need to make our target architecture explicit using a model. We then need to map our concrete software artifacts (classes, packages etc.) to the elements of our architecture model. This may be simplified by using some naming conventions for the implementation artifacts. Finally, we need to specify general constraints (like “classes may not create other classes” or “implementation classes of one module may not access implementation classes of other modules”) and exceptions from these constraints (like “Factory classes may create classes from their own package”). The architecture definition will need to be evolved along with the software.

OSB Quality SOA

Clean Code in a SOA world: Don’t Repeat Yourself

Don't repeat yourself – this is one of the oldest principles in information technology. The inventions of subroutines, classes, and modules are all related to reuse, which means avoiding duplicating things over and over again. If you do duplicate programming language code in a 3GL, something might be wrong with your design. If you do duplicate XML fragments or files in your SOA, something might be wrong with your design, too. Unfortunately, this is sometimes exactly the thing you have to do.

Why is that? BPEL processes don't support inheritance or aspect weaving (yet). As a result, similar processes will look very similar, and you will end up with lots of duplicated XML code. The same is true for OSB Proxy Services (until Oracle gives us a template or inheritance mechanism). Another example is configuration files like SOA Suite configuration plans. If you have one such configuration plan per composite, chances are they look very similar. Generally speaking, it is harder to follow the DRY principle in an XML world. There are no generic modularization and reuse concepts, not even a generic "include" mechanism for XML files. You can use an XSLT transformation to combine common XML code with specific XML code, but you would have to edit the separate XML files manually, since graphical editors can only write single files.
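
As a sketch of the XSLT approach: an identity transformation copies the specific file and splices in a shared fragment wherever a placeholder element appears (the common-part element and the common.xml file are hypothetical):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <!-- identity template: copy the specific file unchanged -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- replace the placeholder with the content of the common file -->
    <xsl:template match="common-part">
        <xsl:copy-of select="document('common.xml')/*/node()"/>
    </xsl:template>
</xsl:stylesheet>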

Think again

Whenever you have the feeling you are copying files or XML fragments over and over again, think again. Is it really necessary? What are the implications for maintainability? Can you avoid it? If you can't avoid copying things, at least try to automate the process. This will save you some time and, even more importantly, make the intent of having the content in multiple locations clear.

Quality SOA Testing

Following common Software Engineering practices in Oracle SOA products

When working with "traditional" 3rd Generation Languages, there are lots of tools and frameworks that help you refactor, test, and analyze your code. Following best practices like Continuous Refactoring, Unit Testing/Test Driven Development (TDD), Root Cause Analysis, and Static Code Analysis is supported very well.

[Image: some of my real-world tools]

But when you work with SOA products like Oracle SOA Suite or Oracle Service Bus, following such best practices is harder. You don’t use 3GL code for all the aspects of your software. Instead, you create specialized (XML) artifacts for the different aspects, often using graphical editors. Still, these artifacts have internal structures and dependencies that need care (refactoring). And still, you create software that is supposed to fulfill requirements (so you want to test and analyze it). But the tool support in Oracle SOA Suite and OSB is very limited. In this post, I will analyze the current situation.

Refactoring

Continuous Refactoring means continually restructuring your code and adjusting your design to keep it maintainable (readable, testable, evolvable). Unfortunately, Oracle SOA products have only very limited refactoring capabilities. You can hardly even change the names of variables and services without breaking dependencies, let alone perform more complex modifications like extracting parts of your BPEL process into a separate process.

Unit Testing

SOA Suite features built-in support for unit tests, while OSB doesn't. The SOA Suite unit test framework is helpful, but limited in its assertion capabilities. For example, you cannot assert that a referenced service is not called, and you cannot use dynamically calculated values in your assertions. TDD is hardly possible because you need to deploy your composite application in order to run its unit tests, which is very time-consuming.

Root Cause Analysis

If there is a problem, you should not just fix its symptoms, but analyze and remedy the root cause. Debuggers are a valuable tool to do just that. The SOA Suite doesn’t come with a debugger, while OSB does. Both feature message tracing.

Static Analysis

If you use Static Analysis, your code is automatically checked for potential problems. Such problems can range from simple style violations to real bugs. Both SOA Suite and OSB have only rudimentary support for analyzing your code, not much more than syntax checking. If you want to enforce development guidelines, you need to employ custom solutions (like the Schematron-based approach I explain in a separate blog post).

Safety

While type safety is a built-in feature of strongly-typed languages like Java, you don't have much support when you work with XML messages. Even though interfaces based on WSDL and XML Schema have well-defined input and output parameters, the design-time tool support for ensuring correct parameter values in Oracle SOA products is rather poor. If you change the signature of a method or the package name of a class in a 3GL, the compiler will tell you which dependencies you need to adjust so things don't break. If you change an operation name or a namespace in a WSDL (remember, we don't have refactoring capabilities), you don't have much support. You need to be very careful and manually adjust many places, for example in your OSB proxy flow. If you are not, your XQuery expressions might just stop returning the desired values. Because you don't have anything like a compiler, good runtime test coverage is essential. Using Validate actions, you can ensure that messages comply with a certain XML Schema.

Much room for improvement

Here’s the result of my little non-scientific evaluation:

[Image: evaluation results]

To sum up, my top three wishes for future Oracle SOA products are:

  • Support for fast and more flexible (unit) tests
  • Improved Refactoring capabilities (supporting renaming of things like service operations or namespaces would be a good starting point)
  • Design-time support for type safety

Picture: Some of my real-world Tools.

Architecture SOA

Ground to Cloud – some SaaS integration considerations

Integrating Software as a Service (SaaS) offerings into your IT landscape is similar to integrating local "on premise" applications in many ways, especially if Web Services are your main information exchange technology. But in many other ways, it's different. Obviously, you have the "insecure internet" in between. Extending your enterprise-wide master data and authorization management to external applications is a challenge. And you have less control over the remote packaged application; for example, you cannot directly access its file system or its underlying database (which isn't a good integration approach anyway, but can be useful for things like monitoring).

[Image: Mount Pilatus]

Oracle released a whitepaper about integrating SaaS ("cloud-based") applications. They ask if we need to reinvent the wheel after having moved "from CORBA to Client/Server to Web services, EAI and SOA". Of course, their answer is no, if you use Oracle Fusion Middleware products. I agree that a middleware layer capable of accessing heterogeneous systems, validating, enriching, transforming and routing messages, orchestrating processes, as well as monitoring and securing your whole IT landscape is a key component. The Oracle paper assumes that "most cloud applications support Web Services". While this is true, I recommend carefully examining the extent of this support. Given your functional requirements, look at the APIs very closely. If they enable you to do everything you can do via the UI, plus bulk data import and export, you are probably fine.

For nonfunctional requirements, like monitoring, transactions, and security, things are a bit more complicated. For example, if the cloud application employs its own user management, you might have to replicate the user accounts from your company’s master system to the cloud app and make sure the “cloud copies” are not modified in an inappropriate way. Modern SaaS offerings might support SAML or OAuth, but don’t bet on it. When considering a cloud application vendor, I recommend scrutinizing the following points (this is a short summary of an evaluation catalog I developed for a client a couple of months ago):

  • Protocols, APIs, and data formats (considering versioning strategies and documentation)
  • Authentication Management (users, groups, replication or assertion propagation? Support for Single-Sign-On, SAML, OAuth etc.)
  • Authorization (granularity, support for XACML?)
  • Backup / bulk data exchange
  • Master data management (replication/synchronization)
  • Staging (including data and configuration migration between stages)
  • Supported SLA types and their enforcement, Monitoring capabilities

Picture: Mount Pilatus. Taken March 2012 from a ship on Lake Lucerne.