This short Ruby script is an attempt to reduce the pain of working in XML Schema. The idea is to carve out individual snippets and hammer on them in isolation1. It also makes it easy to verify XML that should be flagged as invalid doesn't sneak past2.

There's no need to use it with simple schemas. It's when working on complicated bits (e.g. trying to build a crazy restriction scheme for an attribute) that it's most useful.

Here's the script:

#!/usr/bin/env ruby

require "minitest"
require "minitest/rg"
require "nokogiri"

class Validator

  attr_reader :errors, :xml, :xsd

  def load_schema path
    @xsd = Nokogiri::XML::Schema(File.read(path).strip)
  end

  def load_xml string
    @xml = Nokogiri::XML(string)
    @errors = xsd.validate(xml).to_a
  end

  def is_valid
    errors.each { |error| puts error } # Debugging output
    errors.empty? 
  end
  
end

class ValidatorTest < MiniTest::Test

  attr_reader :v

  def setup
    @v = Validator.new
    v.load_schema('schema.xsd')
  end

  def test_valid_sample
    v.load_xml('<testNode key1="a"/>')
    assert v.is_valid
  end

  def test_sample_with_invalid_attribute
    v.load_xml('<testNode key1="bad_value"/>')
    assert_match(
      /The value 'bad_value' is not an element of the set/, 
      v.errors[0].to_s 
    )
  end
end

MiniTest.run

And a few notes about it:

  • The Validator class provides the main functionality. It doesn't need to change unless new functionality is desired.

  • ValidatorTest is what gets edited and where tests are defined.

  • v.load_schema expects a path3 to the schema detailing the snippet to test. In this case, it points to a schema.xsd file in the same directory. A copy of the schema used for this example is further below.

  • Individual tests are defined using MiniTest's standard test_ prefix methods.

  • Individual XML snippets to test are sent via v.load_xml4.

  • The assert v.is_valid call is used for examples that should pass.

  • The errors.each { |error| puts error } in is_valid provides debugging output when working on changes that cause a failure. Capturing them makes it easy to pull the messages to use when looking for expected failure cases.

  • Validation errors are stored in an errors array. Using assert_match with part of the validation error string ensures errors occur where expected.

  • The full error string in the test for invalid data is Element 'testNode', attribute 'key1': [facet 'enumeration'] The value 'bad_value' is not an element of the set {'a', 'b'}. To help decouple the test a little, it only matches against the The value 'bad_value' is not an element of the set.

Here's the example schema.xsd:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">

  <xs:element name="testNode" type="testNode"/>

  <xs:complexType name="testNode">
    <xs:attribute name="key1" type="_value_list" use="required"/>
  </xs:complexType>

  <xs:simpleType name="_value_list">
    <xs:restriction base="xs:string">
      <xs:enumeration value="a"/>
      <xs:enumeration value="b"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>

While there's not a lot to the setup, it's greatly reduced the amount of time I spend banging my head against the XML Schema wall.


Footnotes

  1. Of course, the script works fine with full schemas and XML documents too.

  2. Making sure data that should be invalid doesn't pass was the biggest driver for making this script. I use Oxygen XML Editor. It makes verifying valid files easy but doesn't appear to have a good way to check for false negatives (i.e. data you expect to fail that ends up passing validation).

  3. Keeping the actual schema snippet in it's own file is my preferred way to work. Of course, it's possible to modify the script to include the schema directly as a string.

  4. As with the schemas, it's possible to modify the script to use actual XML files instead. It's generally adds more overhead than it's worth.