Progressive Data Constraints

As a follow-up to my last post, Richard Veryard described the concept of Post Before Processing, whereby rules are applied once you have safely captured the initial information. This is a good way of managing unstructured or semi-structured data, and it reminded me of other cases where data is progressively constrained and/or enriched during processing.

Content management systems use this technique at the unstructured end of the spectrum, where source and draft content is pulled into the system from the “jungle” and then progressively edited, enriched and reviewed until it is published. Programmers will be familiar with this from the way source code is progressively authored, compiled, unit tested and integrated via source code control (plus a bunch of QA rules and processes).

Many computer-aided design applications also provide this ability to impose different rules through the lifecycle of a process. Many years ago I worked on a CAD application for outside plant management for Telcos, which had a very interesting and powerful long transaction facility. Normal mode, representing the current state of the network, enforced an array of data integrity and business rules, such as which cables could be connected to each other and via what type of openable joints.

In design mode, different rules were in place so that model items could be edited and temporarily placed into invalid states. This was necessary because of the connected nature of the model (a Telco network) and the fact that the design environment reflected the future state of the network, which might not correctly “interface” to the current network state. The general mode of operation was multiple design projects, created by different network designers who managed the planning and evolution of the network from its current state to some future state. Multiple future states could exist within the system at any point in time. Design projects followed a lifecycle from draft through proposed and into “as built”. This progression was accompanied by various rules governing data visibility, completeness and consistency.
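To make the idea of lifecycle-dependent rules concrete, here is a minimal sketch in Python. It is not how the CAD application was implemented; the Cable fields, the three rules and the promote function are all hypothetical names chosen for illustration. The only point is that the set of rules applied depends on the stage of the design project, and that a project moves forward only when it satisfies the rules of the next stage.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Stage(Enum):
    DRAFT = 1
    PROPOSED = 2
    AS_BUILT = 3


@dataclass
class Cable:
    # Hypothetical model item; the fields are illustrative only.
    cable_id: str
    joint_type: Optional[str] = None    # None means "not decided yet"
    connected_to: Optional[str] = None  # id of the cable it joins, if any


# A rule returns an error message, or None when the item passes.
Rule = Callable[[Cable], Optional[str]]


def has_id(cable: Cable) -> Optional[str]:
    return None if cable.cable_id else "cable has no id"


def has_joint_type(cable: Cable) -> Optional[str]:
    return None if cable.joint_type else f"{cable.cable_id}: joint type missing"


def is_connected(cable: Cable) -> Optional[str]:
    return None if cable.connected_to else f"{cable.cable_id}: not connected"


# Each stage adds constraints: drafts may be incomplete, "as built" may not.
RULES_BY_STAGE = {
    Stage.DRAFT: [has_id],
    Stage.PROPOSED: [has_id, has_joint_type],
    Stage.AS_BUILT: [has_id, has_joint_type, is_connected],
}


def validate(cables: List[Cable], stage: Stage) -> List[str]:
    """Collect every rule violation for the given stage."""
    errors = []
    for cable in cables:
        for rule in RULES_BY_STAGE[stage]:
            message = rule(cable)
            if message:
                errors.append(message)
    return errors


def promote(cables: List[Cable], current: Stage) -> Tuple[Stage, List[str]]:
    """Advance to the next stage only if that stage's rules are satisfied."""
    if current is Stage.AS_BUILT:
        raise ValueError("project is already as built")
    target = Stage(current.value + 1)
    errors = validate(cables, target)
    return (current, errors) if errors else (target, [])
```

A draft containing a cable with no joint type passes validate at the DRAFT stage, but promote leaves it in DRAFT and returns the outstanding errors until the designer fills in the gap.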

This is a useful model of how to manage data which may be complex, distributed or collaboratively obtained and managed. Effectively, you build a process around a long transaction which manages the transition of data between states of completion or consistency.

Master Data Management is another topical example of this type of pattern. In this scenario data is distributed across systems or organizations. Local changes trigger processes which manage the dissemination of changes to bring the various parts of the data into consistency. During this process different business rules may be applied to manage the data in transition.

I think these concepts can be more generally applicable to SOA design-time governance, for example in the collaborative design of enterprise schemas or service contracts.

Talk to the Hand

A common issue with any kind of BPM implementation or distributed application is the problem of what to do with errors. The most important thing is to direct errors to the correct system or person, “correct” meaning one that can actually do something about the error.

Take our internal timesheet application as an example. I’m sure your system is similar, where an administrator sets up a project along with various business rules such as who can enter time against a project, or what is the valid date range for the project. When I enter my time records, I invariably run afoul of one of these rules – who could have guessed the project would run late?

The problem arises when I get a (usually very obscure) error message about one of these constraints. I can’t actually do anything about incorrect dates or ad hoc resource assignments because I don’t own the project. So it’s no good erroring out to me, the user. The really brain-dead thing is that I have to discard the timesheet entries, send an email to the project owner (whoever that may be) and enter the data later…if at all. It certainly doesn’t help collect accurate and timely information.

A better approach would be to accept the entered data and refer the errors to the project owner who is in a position to do something about it. This person can review the errors and correct the project rules accordingly. Alternatively they can reject the entry because I may have actually broken some valid rules.

This process flow is much more user friendly and streamlined. It requires a couple of important facilities:

First, you need to be able to suspend business rules for part of the process. My timesheet entries initially break the business rules, but we don’t know whether that is because I have erred or because the rules or reference data are in error. We need to accept the entries in an invalid state until the right authority can decide. This is an example where data validity rules are not absolute, but depend on the process.

Secondly, you need an error service: a facility whereby errors, when detected, can be routed to the appropriate person or system for resolution.
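The two facilities can be sketched together in a few lines of Python. This is only an illustration under assumed names, not a description of any real timesheet product: Project, TimeEntry, ErrorService, submit, recheck and reject are all hypothetical. An entry that breaks a rule is stored with a pending status instead of being thrown away, and the problem is queued for the project owner, who either fixes the project rules or rejects the entry.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Dict, List, Set


class Status(Enum):
    ACCEPTED = "accepted"
    PENDING_REVIEW = "pending review"
    REJECTED = "rejected"


@dataclass
class Project:
    code: str
    owner: str
    start: date
    end: date
    members: Set[str]


@dataclass
class TimeEntry:
    user: str
    project: Project
    day: date
    hours: float
    status: Status = Status.ACCEPTED
    problems: List[str] = field(default_factory=list)


class ErrorService:
    """Hypothetical error service: one queue of open problems per owner."""

    def __init__(self) -> None:
        self.queues: Dict[str, List[TimeEntry]] = {}

    def refer(self, owner: str, entry: TimeEntry) -> None:
        self.queues.setdefault(owner, []).append(entry)

    def close(self, owner: str, entry: TimeEntry) -> None:
        self.queues[owner].remove(entry)


def check(entry: TimeEntry) -> List[str]:
    """Evaluate the project rules without rejecting anything."""
    project = entry.project
    problems = []
    if not (project.start <= entry.day <= project.end):
        problems.append("date outside project range")
    if entry.user not in project.members:
        problems.append("user not assigned to project")
    return problems


def submit(entry: TimeEntry, errors: ErrorService) -> TimeEntry:
    """Always keep the entry; refer rule violations to the project owner."""
    entry.problems = check(entry)
    if entry.problems:
        entry.status = Status.PENDING_REVIEW
        errors.refer(entry.project.owner, entry)
    return entry


def recheck(entry: TimeEntry, errors: ErrorService) -> None:
    """Called after the owner has corrected the project rules."""
    entry.problems = check(entry)
    if not entry.problems:
        entry.status = Status.ACCEPTED
        errors.close(entry.project.owner, entry)


def reject(entry: TimeEntry, errors: ErrorService) -> None:
    """Called when the owner decides the rules were right after all."""
    entry.status = Status.REJECTED
    errors.close(entry.project.owner, entry)
```

The user who entered the time never sees the rule violation as a blocking error. The entry sits in the pending state until the owner extends the project dates and calls recheck, or calls reject.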

Our timesheet system is based on the old database paradigm: a monolithic data-centric application with only one place for rules to be applied. The “workflow-based” solution I describe is slightly more complicated than the monolithic application, but the payoff is a better user experience, which leads to more accurate and timely records. And if you’re a services organization with thousands of billable timesheet records each day, then you know what takes priority.