Yossi Dahan [BizTalk]


Monday, March 23, 2009

On Atomic Scope and Message Publishing

A few weeks back I worked on a process that looked something like this -

It was triggered by the scheduled task adapter and then used a SQL send port to call a stored procedure that returned a list of ‘things’.
It needed to split the list into individual records and, for each ‘thing’, start a new, different process through pub/sub (to avoid a binary dependency on the called process).

Fairly simple.

A lot has been said about the different ways to split messages, and I won’t repeat that discussion here. I will just say that initially I used a different approach: I used the SQL adapter in the initial, triggering receive port and then used a receive pipeline, with an XmlDisassembler component, to split the incoming message so that each record was published individually, thus avoiding the need for a ‘master process’. That backfired in my case, though: I quickly realised I would be choking the server with the number of messages published and needed a way to throttle the execution. I played a bit with host throttling, but came to the conclusion that the best approach for me would be to throttle in a process, which is what I did.

And so - to make things interesting, and because I already had it all ready - I decided to use a call to a pipeline from my process to split the message.

The first thing I realised, trying to take that approach, was that I had to change the type of the response message received from the SQL port to XmlDocument (an approach I generally dislike; I’m a sucker for strongly-typed-everything). My schema was configured as an envelope so that the pipeline would know how to split the message correctly when called from my process, but with that schema on the SQL port BizTalk split the message too early; I needed the whole message in the process first, so that was no good to me. If, however, I removed the envelope definition from the schema, the pipeline called from my process would not know how to split the message, which is no good either. Nor could I have two schemas (BizTalk, as we all know, doesn’t like that one bit, not without even more configuration). XmlDocument it is.

It then came back to me (in the form of a compile time error :-)) that the pipeline variable has to exist in an atomic scope, and so I added one to contain my pipeline variable; I then added the necessary loop with the condition set to the MoveNext() method of the pipeline output and in each iteration constructed a message using the GetCurrent() method; all standard stuff.

I would then set some context properties to route my message correctly and allow me to correlate the responses (I used a scatter-gather pattern in my master process), and publish it to the message box.

What I noticed when testing my shiny new process was that all those sub-processes that were meant to start as a result of the messages published in my loop were delayed by quite a few minutes (6-8), which seemed completely unreasonable, so I embarked on a troubleshooting exercise which resulted in that big “I should have thought of that!” moment.

The send shape in my loop successfully completed its act of publishing the message in each iteration, moving my loop on to the next message and so on. Because it sat in an atomic scope, however, BizTalk would not commit the newly published messages to the message box database (and so allow subscriptions to kick in) until the atomic scope had finished; this is what allows it to roll back should something in the atomic scope fail.
What it meant for me, though, was that all the messages were still effectively published at once, which brought me back to square one (or minus one, actually, considering that the great delay caused by this approach means I’m even worse off than with my first debatch-in-pipeline approach).
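
To make the behaviour concrete, here is a minimal sketch in Python of what I believe was going on. It is a toy model, not BizTalk code: `MessageBox`, `AtomicScope` and `run_demo` are names I made up for illustration, and the only thing it tries to capture is that publishes inside the scope stay invisible to subscribers until the scope commits.

```python
class MessageBox:
    """Toy stand-in for a message store with subscribers (not a BizTalk API)."""

    def __init__(self):
        self.delivered = []  # messages that subscribers have actually seen

    def commit(self, messages):
        # Subscriptions are only evaluated for committed messages.
        self.delivered.extend(messages)


class AtomicScope:
    """Hypothetical scope: buffers publishes and commits them only on success."""

    def __init__(self, box):
        self.box = box
        self.pending = []

    def __enter__(self):
        return self

    def publish(self, message):
        # From the caller's point of view the send has completed,
        # but nothing is visible to subscribers yet.
        self.pending.append(message)

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.box.commit(self.pending)  # everything becomes visible at once
        self.pending = []                  # on failure the buffer is simply dropped
        return False                       # never swallow the exception


def run_demo(records):
    """Publish each record in one scope and note what subscribers could see."""
    box = MessageBox()
    visible_during_loop = []
    with AtomicScope(box) as scope:
        for record in records:
            scope.publish(record)
            visible_during_loop.append(len(box.delivered))
    return visible_during_loop, box.delivered
```

Calling `run_demo` with a handful of records shows zero visible messages on every iteration and the whole batch appearing only after the loop, which is exactly the "published all at once" effect described above.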

And so I went back to the old and familiar approach of splitting the messages using xpath in the process, which allowed me to carefully control the publishing rate of messages for my process and throttle them as needed.
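
For illustration only, the shape of that loop can be sketched in Python (the real thing lives in orchestration shapes and expressions, not in code like this). The `Things`/`Thing` element names, the `split_and_publish` function and its `batch_size`/`pause_seconds` parameters are all assumptions of mine; the idea is simply to count the records, pull each one out by position and pause between batches.

```python
import time
import xml.etree.ElementTree as ET


def split_and_publish(xml_text, publish, batch_size=10, pause_seconds=1.0,
                      sleep=time.sleep):
    """Publish each record on its own, pausing after every full batch."""
    root = ET.fromstring(xml_text)
    total = len(root.findall("Thing"))            # like count(/Things/Thing)
    for position in range(1, total + 1):
        record = root.find(f"Thing[{position}]")  # like /Things/Thing[position]
        publish(ET.tostring(record, encoding="unicode"))
        # Throttle: wait after each full batch, but not after the last record.
        if position % batch_size == 0 and position < total:
            sleep(pause_seconds)
    return total
```

Because each publish happens outside any long-running transaction in this sketch, a subscriber could pick up the first record while the loop is still working through the rest, and the pause gives a crude but predictable publishing rate.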


Monday, December 31, 2007

BizTalk's Pub/Sub

The publish/subscribe mechanism in BizTalk is one of the key features of the product and is very useful and powerful.

I guess there's some learning curve around it: most first implementations by any BizTalk developer do not make much use of it (as they often start with all ports being directly bound), and it takes a while before a BizTalk developer and the organization involved establish a good architecture and move towards a true publish-and-subscribe, loosely-coupled approach to implementation.

However, the existing model is not perfect; in my view (and I suspect it is shared by many) it has two main weak points -

  • The pub/sub is implemented on top of MS-SQL, which introduces a significant performance overhead

  • The orchestration subscriptions are 'compiled' and cannot be configured without a build-and-deploy cycle

The first point is quite an obvious one - there will be latency associated with any implementation of a publish/subscribe mechanism; in the BizTalk case it involves writing the message and its metadata (context) to the message box (a SQL database) and having a separate process locate newly published messages, figure out which subscribers need to receive a copy of the message and manage the activation/correlation of the message-process interaction (as well as keeping a list of references for housekeeping etc.).

Reading and writing to the database, the polling interval of the subscription evaluation process, etc. all introduce latency, which, in certain scenarios, can be critical.

If the fragments of information floating around regarding Oslo are to be believed, we might see an in-memory pub/sub mechanism in a future version of BizTalk (in addition to the existing model, I suspect, not as a replacement for it) which, while it will no doubt come at a price (persistence, and therefore scalability and durability to some extent), should make supporting low-latency scenarios much easier.

As for the second point -

At first look the pub/sub in BizTalk is very flexible; in all the BizTalk demonstrations I can remember, the presenter would create a receive port and a couple of send ports and then edit the subscriptions of those ports in the administration console to show how easy it is to create content-based routing in BizTalk Server and configure it at runtime.

In BizTalk 2006 you did not even have to restart the host to speed things up (as you did in demos with 2004); the change takes effect pretty much instantly.

However, the case with orchestrations is not that simple...

The subscription for an orchestration is specified as a filter in the properties of the initializing receive shape in the process; this gets compiled into your assembly together with the process, and is used to create the subscription when you deploy the orchestration.

As far as I know, short of manipulating the management database yourself (which would not be supported) there's no way to change those subscriptions at runtime.

If you want to change the subscription you have to change the filter in the orchestration, build, undeploy the old version and deploy the new one (or version the process and perform a side-by-side deployment).

This is, in my view, an unnecessary pain in dynamic organizations (aren't they all?) that require changes often, and to that end developers have had to find a solution to the "I need to be able to change that subscription from outside the process" requirement.

That solution is often to add some routing metadata to messages in the form of context properties ('nextProcess', 'Operation', etc.), which are set by publishing processes and/or pipeline components, and to use these in the filters (rather than the actual content data).

So you often see a pipeline component, typically driven by some external configuration, that checks for certain bits in the message or its metadata and sets these context properties based on the values it finds. The premise is that pipeline components are easier to replace, but also that these components often use a database or a rules engine in one form or another to decide what goes into the message context, and by doing so introduce the real flexibility that is advertised.
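
As a rough sketch of that pattern (in Python, purely for illustration; a real implementation would be a pipeline component plus compiled filters), the rules below stand in for the external configuration and the subscriptions stand in for orchestration filters. Everything here is hypothetical, including the `promote_routing_properties` and `matching_subscribers` helpers and the rule format; only the 'nextProcess' property name comes from the examples above.

```python
def promote_routing_properties(message, rules):
    """Return the context properties chosen by the first matching rule."""
    context = {}
    for rule in rules:
        # A rule matches when every field it names has the expected value.
        if all(message.get(field) == value for field, value in rule["when"].items()):
            context.update(rule["promote"])
            break
    return context


def matching_subscribers(context, subscriptions):
    """Return the subscribers whose filter matches the routing metadata."""
    return [
        name
        for name, wanted in subscriptions.items()
        if all(context.get(prop) == value for prop, value in wanted.items())
    ]


# Externally editable configuration (assumed): most specific rule first.
RULES = [
    {"when": {"type": "Order", "country": "UK"},
     "promote": {"nextProcess": "UkOrderProcess"}},
    {"when": {"type": "Order"},
     "promote": {"nextProcess": "DefaultOrderProcess"}},
]

# Fixed filters, as they would be compiled into each process (assumed).
SUBSCRIPTIONS = {
    "UkOrders": {"nextProcess": "UkOrderProcess"},
    "DefaultOrders": {"nextProcess": "DefaultOrderProcess"},
}
```

The filters never change in this sketch; redirecting a message to a different process only means editing `RULES`, which is the whole point of the workaround.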

What all of this means is that we, developers, end up developing a pub/sub mechanism on top of the existing pub/sub simply because we need flexibility the product does not provide.

I don't like this approach, but I end up doing this myself occasionally, simply because I have to.

I could possibly understand why MS has decided to do so - there are benefits to editing the subscription expression within the orchestration (known types would be one thing), and one could also argue that the process subscription is part of the process design, so changing it is likely to involve code changes as well, which would require a re-build anyway. But really, I think we would all have benefited from the ability to edit the orchestration subscription in the same way we can edit send port subscriptions - through the admin console.
