XML Schema Snippet Tester

November 27, 2016

This short Ruby script is an attempt to reduce the pain of working in XML Schema. The idea is to carve out individual snippets and hammer on them in isolation[1]. It also makes it easy to verify that XML that should be flagged as invalid doesn't sneak past[2].

There's no need to use it with simple schemas. It's when working on complicated bits (e.g. trying to build a crazy restriction scheme for an attribute) that it's most useful.

Here's the script:

#!/usr/bin/env ruby

require "minitest"
require "minitest/rg"
require "nokogiri"

class Validator

  attr_reader :errors, :xml, :xsd

  def load_schema path
    @xsd = Nokogiri::XML::Schema(File.read(path).strip)
  end

  def load_xml string
    @xml = Nokogiri::XML(string)
    @errors = xsd.validate(xml).to_a
  end

  def is_valid
    errors.each { |error| puts error } # Debugging output
    errors.empty? 
  end
  
end

class ValidatorTest < MiniTest::Test

  attr_reader :v

  def setup
    @v = Validator.new
    v.load_schema('schema.xsd')
  end

  def test_valid_sample
    v.load_xml('<testNode key1="a"/>')
    assert v.is_valid
  end

  def test_sample_with_invalid_attribute
    v.load_xml('<testNode key1="bad_value"/>')
    assert_match(
      /The value 'bad_value' is not an element of the set/, 
      v.errors[0].to_s 
    )
  end
end

MiniTest.run

And a few notes about it:

  • The Validator class provides the main functionality. It doesn't need to change unless new functionality is desired.

  • ValidatorTest is what gets edited and where tests are defined.

  • v.load_schema expects a path[3] to the schema detailing the snippet to test. In this case, it points to a schema.xsd file in the same directory. A copy of the schema used for this example is further below.

  • Individual tests are defined using MiniTest's standard test_ prefix methods.

  • Individual XML snippets to test are sent via v.load_xml[4].

  • The assert v.is_valid call is used for examples that should pass.

  • The errors.each { |error| puts error } line in is_valid provides debugging output when working on changes that cause a failure. Printing the errors makes it easy to grab messages to match against when writing expected-failure tests.

  • Validation errors are stored in an errors array. Using assert_match with part of the validation error string ensures errors occur where expected.

  • The full error string in the test for invalid data is: "Element 'testNode', attribute 'key1': [facet 'enumeration'] The value 'bad_value' is not an element of the set {'a', 'b'}". To help decouple the test a little, it only matches against "The value 'bad_value' is not an element of the set".

Here's the example schema.xsd:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">

  <xs:element name="testNode" type="testNode"/>

  <xs:complexType name="testNode">
    <xs:attribute name="key1" type="_value_list" use="required"/>
  </xs:complexType>

  <xs:simpleType name="_value_list">
    <xs:restriction base="xs:string">
      <xs:enumeration value="a"/>
      <xs:enumeration value="b"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>

While there's not a lot to the setup, it's greatly reduced the amount of time I spend banging my head against the XML Schema wall.


Footnotes

  1. Of course, the script works fine with full schemas and XML documents too.

  2. Making sure data that should be invalid doesn't pass was the biggest driver for making this script. I use Oxygen XML Editor. It makes verifying valid files easy but doesn't appear to have a good way to check for false negatives (i.e. data you expect to fail that ends up passing validation).

  3. Keeping the actual schema snippet in its own file is my preferred way to work. Of course, it's possible to modify the script to include the schema directly as a string.

  4. As with the schemas, it's possible to modify the script to use actual XML files instead. It generally adds more overhead than it's worth.


Embedding a Test Suite in a Single-file Ruby App (Part 1)

May 22, 2016

"You only write code because you expect it to get executed. If you expect it to get executed, you ought to know that it works. The only way to know this is to test it."

– Robert "Uncle Bob" Martin[1]


Test Driven Development[2] has become the foundation of my coding practice. Knowing that, under it all, I can have math[3] actively and automatically proving my code works[4] has become so fundamental that I'm reluctant to do anything without it. That reluctance extends all the way to simple, single-file apps[5].

Testing generally involves splitting code into two files:

  1. Code that performs a task
  2. Code to test the Code that performs a task

Most projects contain lots of files with application and testing code, documentation, supporting assets, etc… Separating testing concerns into multiple, separate files not only works, it's desirable. Unfortunately, it's completely at odds with the goal of a self-contained, single-file tool. I've been struggling with this a lot, regularly falling back to manual testing[7] instead of writing automated tests that would lead to a second file.

After some experimentation, I'm happy to present a nice solution for packing a test suite directly into the same file as the main application code.

The key: Don't worry about separating test execution from actual execution. Just run the test suite every time the app is started.

Here's an example [filename: drink-example.rb]:

#!/usr/bin/env ruby

require 'minitest'
require 'minitest/rg'


class Drink                                    # The Code to Test
  attr_reader :type
  
  def initialize
    @type = "water"
  end
  
  def describe_type
    puts "This is a drink of #{type}."
  end
end


class DrinkTest < MiniTest::Test               # The Test Suite
  def test_that_the_drink_is_water
    drink = Drink.new
    assert_equal "water", drink.type
  end
end


if MiniTest.run                                # The Run/Kill Switch
  puts "Tests Passed! Process can proceed."
  drink = Drink.new
  drink.describe_type
else
  puts "Tests Failed! Drink *is not* safe!"
  puts "-- No process run --"
end

The Drink and DrinkTest classes are standard Ruby and MiniTest[8] fare. The MiniTest.run conditional at the end provides the magic. Running the file with ruby drink-example.rb kicks off MiniTest from there. If all the tests pass, the app gets on with its actual business.

Here's what that looks like:

$ ruby drink-example.rb
Run options: --seed 39971

# Running:

.

Finished in 0.000739s, 1352.7535 runs/s, 1352.7535 assertions/s.

1 runs, 1 assertions, 0 failures, 0 errors, 0 skips

Tests Passed! Process can proceed.
This is a drink of water.

If MiniTest finds a problem, it returns false. This triggers the else block, which contains only an error message. The app shuts down gracefully without attempting potentially dangerous operations in its unstable state.

For example, changing @type = "water" to @type = "poison" in the Drink class produces:

$ ruby drink-example.rb
Run options: --seed 44252

# Running:

F

Finished in 0.001335s, 749.1551 runs/s, 749.1551 assertions/s.

  1) Failure:
DrinkTest#test_that_the_drink_is_water [drink-example.rb:16]:
Expected: "water"
  Actual: "poison"

1 runs, 1 assertions, 1 failures, 0 errors, 0 skips

Tests Failed! Drink *is not* safe!
-- No process run --

So, not only does this approach keep everything in one file, it also does a TDD sanity check before each and every run.

I love everything about that.
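One possible refinement I've toyed with (the SKIP_TESTS name is my own invention, not part of MiniTest): gate the suite behind an environment variable so a known-good deployment can skip the startup cost.

```ruby
#!/usr/bin/env ruby

require "minitest"

# A stand-in suite; class and method names here are my own, not from the
# original script.
class SanityTest < Minitest::Test
  def test_addition
    assert_equal 4, 2 + 2
  end
end

# Hypothetical tweak: set SKIP_TESTS=1 to bypass the embedded suite
# (e.g. in a scheduled job where startup time matters).
tests_passed = ENV["SKIP_TESTS"] ? true : Minitest.run

if tests_passed
  puts "Tests Passed! Process can proceed."
else
  puts "Tests Failed!"
  puts "-- No process run --"
end
```

Whether skipping is ever a good idea depends on how expensive the suite is; for fast suites, running every time is the whole point.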


I'll show more detailed examples of how I use this approach in Part 2.

Footnotes

  1. From Clean Code: A Handbook of Agile Software Craftsmanship by Robert C. Martin. A book that'll rank high when I make my list of recommended reads for other coders.

  2. Test Driven Development still feels like a quantum leap in my ability to make things. I recently finished a mad-dash migration project using languages and systems I wasn't really familiar with. While it's some of the least efficient code I've ever written, it has three things going for it. First, we launched on time. Second, everything worked. Neither would have been possible without the test suite I built as my first step and used throughout the migration. And third (saving the best for last), with the test suite as my backstop, I'm now removing all the cruft carried over from the migration while being confident the system still works as expected.

  3. That's right, math. Because it's all ones and zeros inside the machine and every test case boils down to a "1" if everything worked as expected and a "0" if it didn't.

  4. Critical point: "works" in this context means only that the code is responding in a way the test case expects. There are a host of reasons (like testing the wrong thing) that it may not be doing what's actually desired even if all the tests pass.

  5. After building web pages, writing small, self-contained Perl scripts is how I got started coding. While it's been a couple decades and I've moved on to Ruby, the power of small, custom tools that fit in a single file still amazes me. At any given time, I've got 50 or more floating around[6] that get varying degrees of use. Some only last an hour. Some have been around for years.

  6. I use Code Runner to house these apps. Makes it super easy to jump to and run any one of them in the blink of an eye. (It bugs out from time to time, though not enough to warrant looking for a replacement.)

  7. Testing by hand was all I used to know. Trying to imagine going back to that makes me wonder how I got anything done. While I have no real complaints about my coding journey so far, learning how to build automated tests from the start is one thing I'd absolutely change if I could go back in time.

  8. The initial tutorials I went through to learn Ruby used RSpec for testing. While I can see some of the appeal, I was happy when I found MiniTest. It makes more sense to my brain and has less overhead since it doesn't require learning a Domain Specific Language (which slowed my overall learning progress considerably).


An XML Schema (XSD) Definition to Prevent Leading Zeros in Integers

March 10, 2016

The XML Schema specification[1] provides several handy data types. For example, xs:positiveInteger[2] produces "the standard mathematical concept of the positive integer numbers."

Well, mostly.

There's a hidden gotcha in positiveInteger. It allows an anathema. Leading zeros.

For example, given the definition:

<xs:element name="node">
  <xs:complexType>
    <xs:attribute name="number" type="xs:positiveInteger"/>
  </xs:complexType>
</xs:element>

These are all valid[3]:

<node number="1"/>
<node number="100"/>
<node number="8675309"/>
<node number="007"/>

That last one can cause all kinds of havoc.

When leading zeros are involved in data feeds, they have to be treated either as strings (to maintain the zeros) or converted into actual integers. Given a system of any size or longevity, the likelihood of different processes making opposing choices approaches 100%. Subtle, super-annoying bugs are born. Ones that take a surprisingly large amount of time to fix[4].

Thankfully, XML Schema is robust enough that we can define data types that prohibit leading zeros. For example:

<xs:simpleType name="_positive_integer_without_leading_zeros">
  <xs:restriction base="xs:positiveInteger">
    <xs:pattern value="[123456789]\d*"/>
  </xs:restriction>
</xs:simpleType>

<xs:element name="node">
  <xs:complexType>
    <xs:attribute name="number" type="_positive_integer_without_leading_zeros"/>
  </xs:complexType>
</xs:element>

This works by using a Regular Expression[5] to enforce the data format. It's a little easier to understand by breaking the pattern's value into two parts. First:

[123456789]

Anything inside square brackets identifies possible values for a single character. So, [123456789] at the start of the pattern means the first character must be one of: 1, 2, 3, 4, 5, 6, 7, 8, or 9. The lack of a zero means anything starting with "0" won't match and will therefore be rejected as invalid.

The second part of the pattern is:

\d*

A \d (without the *) tells the pattern matcher to look for any single digit. By itself, it would mean there always has to be a second character and that it must be a digit. The * modifies \d to allow "zero or more" digits.

If the data being matched is a single character, the \d* has no real effect. If there are two or more characters, it enforces the restriction that every character from the second until the end must be a digit. Unlike the earlier [123456789], the \d pattern includes all possible digits, including zero.

Combined, the [123456789]\d* pattern produces the desired behavior. These actual integers all pass the validation test:

<node number="1"/>
<node number="100"/>
<node number="8675309"/>

But this one's accurately rejected as invalid:

<node number="007"/>

This little snippet is now safe from those pesky little leading zeros sneaking in.
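The same pattern can be exercised outside a validator too. Here's a quick Ruby sketch (note that XSD patterns implicitly match the entire value, so the Ruby version needs explicit \A and \z anchors to behave the same way):

```ruby
# \A and \z mimic XSD's implicit whole-value anchoring.
PATTERN = /\A[123456789]\d*\z/

%w[1 100 8675309 007].each do |value|
  status = value =~ PATTERN ? "valid" : "invalid"
  puts "#{value}: #{status}"  # "007" (or anything with a leading zero) prints "invalid"
end
```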

Count yourself lucky if you've never had to deal with leading zeros. If you want to avoid them in future XML, use this type of custom data type instead of xs:positiveInteger.


Notes

  1. This technique works equally well in XML Schema 1.0 and 1.1.

  2. The xs:positiveInteger data type allows "+" at the start of the number (e.g. "+8675309"). The definition above doesn't. I built it to deal with unique IDs from a database. None of which contain the "+". Changing the pattern value to \+?[123456789]\d* would accommodate the plus if you need it. Other variations are left as exercises for the reader.

Footnotes

  1. Official XML Schema Documentation

  2. xs:positiveInteger details

  3. While not all validators find the same things, the "007" string was a valid xs:positiveInteger value in Saxon-EE 9.6.0.5, LIBXML, and Xerces running in oXygen XML Editor. Speaking of which, if you do any XML work at all and don't know about oXygen, you should check it out. It's expensive but totally worth it.

  4. This doesn't even begin to get into what happens when everyone agrees that strings are the way to go but you run out of numbers and need to add a new zero to the front.

  5. Regular Expressions - a sequence of characters that define a search pattern - the heart and soul of text processing for an old Perl coder like me.


Protection from Adobe Creative Cloud's Folder Erasing Bug

February 12, 2016


Preface: This post tells you to run commands in the Terminal on your Mac. The Terminal is a powerful way to tell a computer what to do and well worth learning a little bit about. However, blindly following these types of directions from unknown folks on the Internet can be dangerous. Sneaky folks can trick you into installing viruses/spyware and other bad things. Always ask a tech-buddy you trust to look at anything like this before you follow the directions to run or install it. (Especially if you see the word sudo, which has Godlike abilities on Macs.)


Update: Good news, everybody. Further reports indicate the bug doesn't delete the folder, just the contents. So, all that needs to be done is to make a protection folder one time. That can be done with:

sudo mkdir /.aaaaaaProtectionFromAdobeCC

Since the folder itself isn't deleted, there's no need to go through the hassle of the rest of the stuff below.


A February 2016 update from Adobe Creative Cloud is deleting the first folder it finds alphabetically on Macs.

This is bad. It's breaking things like Backblaze's backup service.

Until it's fixed, the safest thing to do is create an empty, throw-away folder for it to find. Creative Cloud will kill that folder while leaving alone the stuff that makes your Mac actually run. And, because there are reports of it happening multiple times, you'll want to set things up to automatically recreate it.

I created a script that will make the folder then check to make sure it stays there. To install it, copy and paste the lines below into your Terminal application (hit "Return/Enter" after each one to run them).

  1. This line downloads the script file and puts it in a folder that Macs use for setting up automation:

    sudo curl -s -o "/Library/LaunchAgents/com.alanwsmith.adobeCreativeCloudProtection.plist" "http://alanwsmith.com/com.alanwsmith.adobeCreativeCloudProtection.plist"
  2. The script will automatically start if the Mac is rebooted because it's in the /Library/LaunchAgents folder. To start it without rebooting, run this:

    sudo launchctl load "/Library/LaunchAgents/com.alanwsmith.adobeCreativeCloudProtection.plist"

That should protect you until Adobe corrects the behavior.

Here's a video on how to open the Terminal if you need help with that. You'll also need to use an Admin account and enter your password after running the first command. Finally, these lines are long and some will scroll. Be sure to copy the entire thing.


To remove the protection after Adobe gets their side fixed, run these three lines in the Terminal:

  1. This stops the script from running:

    sudo launchctl unload /Library/LaunchAgents/com.alanwsmith.adobeCreativeCloudProtection.plist
  2. This deletes the file (so it doesn't start again next time you reboot):

    sudo rm /Library/LaunchAgents/com.alanwsmith.adobeCreativeCloudProtection.plist
  3. And this removes the throw-away folder that provided the protection:

    sudo rmdir /.aaaaaaProtectionFromAdobeCC

(Note: You'll need to use an Admin account and enter your password with these too.)

Software development is hard. Adobe's software is incredibly complex. Sure this sucks, but it's worth keeping that in mind before blasting Adobe. The real tests are how quickly they respond and if this same thing ever happens again.


Convert a Ruby Array into the Keys of a New Hash

December 02, 2015

The need to migrate an array into a hash crops up on occasion. The simplest approach is to turn each array item into a hash key pointing at an empty value. A situation where the Ruby Array object's .collect method works great. For example:

hash = Hash[array.collect { |item| [item, ""] } ]

Fleshing it out a bit more, here's a full demo showing it in action:

#!/usr/bin/env ruby

require 'pp'

array = %w(cat hat bat mat)
hash = Hash[array.collect { |item| [item, ""] } ]

pp array
pp hash

which produces the output showing the original array and then the hash with the desired structure:

["cat", "hat", "bat", "mat"]
{"cat"=>"", "hat"=>"", "bat"=>"", "mat"=>""}

Of course, the processing block can assign values as well. For example, changing the above example to use:

hash = Hash[array.collect { |item| [item, item.upcase] } ]

would produce the hash with:

{"cat"=>"CAT", "hat"=>"HAT", "bat"=>"BAT", "mat"=>"MAT"}

Good stuff.


P.S. Let me know if you have a simpler way to turn ["cat", "hat", "bat", "mat"] into {"cat"=>"", "hat"=>"", "bat"=>"", "mat"=>""}.
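Since the P.S. asks: plain Ruby does offer a couple of alternatives (no extra gems required); whether they count as simpler is a matter of taste.

```ruby
array = %w(cat hat bat mat)

# Array#map + Array#to_h (calling to_h on an array of pairs arrived in Ruby 2.1)
hash1 = array.map { |item| [item, ""] }.to_h

# each_with_object builds the hash without an intermediate array of pairs
hash2 = array.each_with_object({}) { |item, memo| memo[item] = "" }

p hash1  # the four keys, each pointing at an empty string
p hash2  # same result
```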


Using the Twitter API without 3rd Party Libraries

October 27, 2015

I've been playing with Twitter's REST API to collect stats for work. Data like: follower counts, what Tweets are being faved, etc… There are libraries for working with the API but no good examples for using it directly. So, I wrote my own[1]. It's not super complicated, but there are some tricky parts. I'm posting my code here for posterity's sake.

Each example follows the same basic flow outlined in Twitter's Application-only Authentication approach.

  1. Combine the Consumer Key and Consumer Secret (generated when you create an app) to make a Bearer Token.
  2. Base64 encode the Bearer Token (making sure newlines aren't introduced).
  3. Use the Bearer Token to obtain an Access Token.
  4. Use the Access Token to make the actual API call (in this case the basic info for my Twitter account).
  5. Print a dump of the JSON data that's returned.

Step 5 is where the real work would actually happen. It's also where looping to make additional API requests would take place, since the Access Token only needs to be pulled once[2].

Here's the code:

Ruby

require "base64"
require "json"
require "net/http"
require "uri"

### Setup access credentials

consumer_key = "YOUR_CONSUMER_KEY_STRING"
consumer_secret = "YOUR_CONSUMER_SECRET_STRING"

### Get the Access Token

bearer_token = "#{consumer_key}:#{consumer_secret}"
bearer_token_64 = Base64.strict_encode64(bearer_token)

token_uri = URI("https://api.twitter.com/oauth2/token")
token_https = Net::HTTP.new(token_uri.host,token_uri.port)
token_https.use_ssl = true

token_request = Net::HTTP::Post.new(token_uri)
token_request["Content-Type"] = "application/x-www-form-urlencoded;charset=UTF-8"
token_request["Authorization"] = "Basic #{bearer_token_64}"
token_request.body = "grant_type=client_credentials"

token_response = token_https.request(token_request).body
token_json = JSON.parse(token_response)
access_token = token_json["access_token"]

### Use the Access Token to make an API request

timeline_uri = URI("https://api.twitter.com/1.1/users/show.json?screen_name=TheIdOfAlan")
timeline_https = Net::HTTP.new(timeline_uri.host,timeline_uri.port)
timeline_https.use_ssl = true

timeline_request = Net::HTTP::Get.new(timeline_uri)
timeline_request["Authorization"] = "Bearer #{access_token}"

timeline_response = timeline_https.request(timeline_request).body
timeline_json = JSON.parse(timeline_response)

puts JSON.pretty_generate(timeline_json)

Python

import base64
import json
import urllib2

### Setup access credentials

consumer_key = "YOUR_CONSUMER_KEY_STRING"
consumer_secret = "YOUR_CONSUMER_SECRET_STRING"

### Get the Access Token

bearer_token = "%s:%s" % (consumer_key, consumer_secret)
bearer_token_64 = base64.b64encode(bearer_token)

token_request = urllib2.Request("https://api.twitter.com/oauth2/token") 
token_request.add_header("Content-Type", "application/x-www-form-urlencoded;charset=UTF-8")
token_request.add_header("Authorization", "Basic %s" % bearer_token_64)
token_request.data = "grant_type=client_credentials"

token_response = urllib2.urlopen(token_request)
token_contents = token_response.read()
token_data = json.loads(token_contents)
access_token = token_data["access_token"]

### Use the Access Token to make an API request

timeline_request = urllib2.Request("https://api.twitter.com/1.1/users/show.json?screen_name=TheIdOfAlan")
timeline_request.add_header("Authorization", "Bearer %s" % access_token)

timeline_response = urllib2.urlopen(timeline_request)
timeline_contents = timeline_response.read()
timeline_data = json.loads(timeline_contents)

print json.dumps(timeline_data, indent=2, sort_keys=True)

Perl

use strict;
use Data::Dumper;
use HTTP::Request::Common;
use JSON;
use LWP::UserAgent;
use MIME::Base64;
use Mozilla::CA; # Gets HTTPS working on Mac OSX (10.10)

### Setup access credentials

my $consumer_key = "YOUR_CONSUMER_KEY_STRING";
my $consumer_secret = "YOUR_CONSUMER_SECRET_STRING";

### Get the Access Token

my $bearer_token = "$consumer_key:$consumer_secret";
my $bearer_token_64 = encode_base64($bearer_token, "");

my $user_agent = LWP::UserAgent->new;

my $token_request = POST(
  "https://api.twitter.com/oauth2/token",
  "Content-Type" => "application/x-www-form-urlencoded;charset=UTF-8",
  "Authorization" => "Basic $bearer_token_64",
  Content => { "grant_type" => "client_credentials" },
);

my $token_response = $user_agent->request($token_request);
my $token_json = decode_json($token_response->content);

my $timeline_request = GET(
  "https://api.twitter.com/1.1/users/show.json?screen_name=TheIdOfAlan",
  "Authorization" => "Bearer " . $token_json->{access_token}
);

my $timeline_response = $user_agent->request($timeline_request);
my $timeline_json = decode_json($timeline_response->content);

print Dumper $timeline_json;

A few things to keep in mind with these examples:

  • Authentication credentials should not be hard coded in source files.
  • They are very high level. Just one step beyond pseudo-code.
  • The code is straight procedural and almost certainly not how you should actually do it.
  • There is no error handling in these examples.
  • There is no accounting for throttling based on the API rate-limits.
  • No step to RFC 1738-encode the strings (which the docs say to do) was added, since for these values the encoding changes nothing.
  • Probably some other caveats that I'm forgetting.
  • Authentication credentials should not be hard coded in source files (in case you missed it the first time).
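On that first (and last) bullet: one simple way to keep credentials out of the source is to pull them from environment variables. A Ruby sketch (the environment variable names are my own invention, not Twitter's):

```ruby
require "base64"

# Hypothetical env var names; anything works as long as the source stays clean.
consumer_key    = ENV.fetch("TWITTER_CONSUMER_KEY", "YOUR_CONSUMER_KEY_STRING")
consumer_secret = ENV.fetch("TWITTER_CONSUMER_SECRET", "YOUR_CONSUMER_SECRET_STRING")

# strict_encode64 matters here: plain encode64 adds a newline every 60
# characters, which quietly corrupts the Authorization header.
bearer_token_64 = Base64.strict_encode64("#{consumer_key}:#{consumer_secret}")
puts bearer_token_64
```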

Generally, I'm all for using code libraries to make life easier. Working with APIs is a different story. Unless whoever makes the API is also providing the library, I'd rather use the service directly. Avoiding third-party dependencies is well worth the little extra work it takes. And, if there are good examples available, there's not a huge difference in the levels of effort anyway.


Footnotes

  1. The person I'm helping with this is interested in using the pandas data analysis library, which is written in Python, so I wrote the first version in that language. I then ported it to Ruby since that's my go-to language these days. Once I started down the path of multiple versions, it was easy to throw in Perl (my previous language of choice). This was the first time in years that I've written any Perl. It's also the first time I've ever done anything beyond "Hello, World" in Python. All said, it was a good language exercise. It's fun to see the differences between them.

  2. The Access Token only needs to be pulled once for as long as it's valid. It should stay that way unless a specific request to invalidate it occurs.


Configuration Files and Multiple Ruby Object Constructors

October 12, 2015

I mostly write command line apps. Mostly. It's the nature of the gig, even though I work on a relatively big website[1]. There are lots of moving parts, many of which are well suited for automation, that combine to produce the overall site. As someone who likes avoiding hard-coded variables, I use configuration files to tell my various apps/scripts/services/software what to do.

The simplest way to instantiate a Ruby object with parameters (like a config file path) is to send them directly to the initialize method. For example[2]:

class Robot

  attr_reader :config

  def initialize(config_path)
    @config = parse_config(config_path)
  end
  
  def parse_config(config_path)
    YAML.load(ERB.new(File.read(File.expand_path(config_path))).result)
  end
end

robot = Robot.new("~/robot-config.yml")

That approach works but makes testing problematic. Unit tests for individual methods can't be run without loading a config file, creating an undesirable tight coupling. If the object validates the config, changes to the config format can break multiple, otherwise unrelated tests. A strong "Shotgun Surgery" Code Smell[3] indicating a great refactoring opportunity.

My first attempt at a remedy was to make the config file optional. For example:

class Robot

  attr_reader :config

  def initialize(config_path = nil)
    if config_path
      @config = parse_config(config_path)
    else 
      @config = {}
    end
  end
  
  def parse_config(config_path)
    YAML.load(ERB.new(File.read(File.expand_path(config_path))).result)
  end
end

robot = Robot.new("~/robot-config.yml")

robot_for_unit_tests = Robot.new

That works, but feels rough. Conditional logic has become a red flag ever since attending Sandi Metz's wonderful Practical Object-Oriented Design (POOD) course[4]. Also, having recently kicked around with The Big Nerd Ranch's Objective-C book[5], I'd just seen examples of using multiple object constructors. After some investigation[6], experiments, and hacking, I ended up with this approach:

class Robot

  attr_reader :config

  def initialize
    @config = {} # defaults can be applied here too.
  end
  
  def initialize_with_config config_path
    initialize
    @config = parse_config(config_path)
  end
  
  def self.new_with_config config_path
    forerunner = allocate
    forerunner.send(:initialize_with_config, config_path)
    forerunner  
  end
  
  def parse_config(config_path)
    YAML.load(ERB.new(File.read(File.expand_path(config_path))).result)
  end
end

The primary way to instantiate objects changes from:

robot = Robot.new("~/robot-config.yml")

To use the newly created class method:

robot = Robot.new_with_config("~/robot-config.yml")

The .new_with_config class method bounces to the .initialize_with_config object method, which in turn calls the default .initialize object method before doing its work.

Unit tests can now use the built-in .new() without params or worrying about a config file. An added bonus is the nice way this separates setting defaults (and any other initialization requirements) from loading the config.

This approach isn't limited to config files. It works any time there's a need to create objects with different parameters/options from a single class. The few extra lines for the class methods are totally worth the clean separation provided by multiple constructors.
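The allocate/send dance is easier to see stripped of the config machinery. A standalone toy sketch (the class and method names are made up for illustration):

```ruby
class Greeter
  attr_reader :greeting

  def initialize
    @greeting = "hello"  # defaults live in the standard constructor
  end

  def initialize_with_greeting(greeting)
    initialize           # apply defaults first
    @greeting = greeting
  end

  def self.new_with_greeting(greeting)
    instance = allocate  # raw object; no initialize call has happened yet
    instance.send(:initialize_with_greeting, greeting)
    instance
  end
end

puts Greeter.new.greeting                      # hello
puts Greeter.new_with_greeting("hi").greeting  # hi
```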

If you're interested in more details on how this works, check out Ruby Constructors from @terminalbreaker.


Footnotes

  1. Well, I really work on a few big sites: PGATOUR.COM, PresidentsCup.com, and WorldGolfChampionships.com but they are all directly related and powered by the same tools.

  2. The examples in this post contain working code with two caveats. First, they need require 'erb' and require 'yaml'; those lines were omitted to save a little vertical height. Second, they need a config file sitting at ~/robot-config.yml. I also used a minimal approach with no error handling to keep the size down. So, while they work, they are only proof-of-concept examples.

  3. The Wikipedia Shotgun Surgery entry is pretty formal. The simplified way I've heard it described, and how I think about it, is: an update in one place that requires modifying multiple classes/methods/functions in other places.

  4. I can't recommend Sandi's Practical Object-Oriented Design course highly enough. It was mind-expanding. The huge focus on "Practical" is hard to beat. (And yes, that's the back of my head with the WWSMD neck-tattoo.)

  5. I haven't read any other Objective-C books. So, I don't have a basis for comparison, but Objective-C Programming: The Big Nerd Ranch Guide seems like a decent intro. More than anything, it makes me want to attend a course at the ranch.

  6. This was a rare case where I wasn't able to score a hit on StackOverflow. I found the solution in Ruby Constructors and the related Constructors, Intro to Ruby Classes Part II slides. Both by Juan '@terminalbreaker' Leal.

P.S. Yes, I know making two calls works just fine. Something like:

robot = Robot.new
robot.parse_config("~/robot-config.yml")

But since production scripts require the config, I prefer to make it happen in one line.


The Density of Kanji

September 04, 2015

Kanji, the logographic script used in Japanese writing, is beautiful. A "logographic writing system[1]" is a fancy way of saying the written symbols primarily represent words (or parts of words) instead of sounds. That approach allows it to be much denser than English. Something I realized after picking up a Japanese Twitter follower.

Take this Tweet[2] for example:

最近有科学家研究发现,人类最理想的身高是168厘米,上下变动的范围可在167厘米至170厘米之间。研究指出,身材高大的人血液循环路线较长,心脏负担也较重,因此可能会对寿命有影响。矮个子体表面积相对较小,日常的能量消耗较少,所需营养物质相应减少,身体的耐受力较强。

It's 131 characters long. Nine shy of Twitter's limit. After running it through Google's translation engine, it becomes:

Recently, scientists found that the human ideal height is 168 cm, the upper and lower range of variation can be between 167-170 cm. Research indicates that tall people the blood circulation route is longer, it is also heavier burden on the heart, and therefore might affect life expectancy. Shorty body surface area is relatively small, less daily energy consumption, a corresponding reduction in the required nutrients, strong physical endurance.

That's 447 characters. More than 3x the length. I doubt the most efficient human translator/editor could get the same information across in English while staying under Twitter's 140 character limit.

It blew my mind a little to figure this out.
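The density difference is easy to poke at in code. A quick Ruby sketch (in Ruby 1.9+, String#length counts characters rather than bytes, which lines up with how Twitter counted at the time):

```ruby
# A fragment of the Tweet vs. a rough English equivalent.
japanese = "最理想的身高是168厘米"
english  = "the ideal height is 168 centimeters"

puts japanese.length    # 12 characters
puts japanese.bytesize  # 30 bytes in UTF-8 (each CJK character takes 3 bytes)
puts english.length     # 35 characters for roughly the same information
```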


Footnotes

  1. Here's the Wikipedia section on logographic writing systems I used when writing this post.

  2. And here's the original Tweet that got me thinking about all this.

