Suspended instance and correlation subscription
We created a test scenario today which should not happen in real life, but it highlighted a behaviour of BizTalk that I think is very important to be aware of, as it can have a serious impact on some scenarios.
We created a process that starts with a parallel convoy, subscribing to the two messages it needs in order to start.
As this was a quick test scenario we used ReceivePortName as the correlation property, planning to drop both messages (of different types) into the same receive location and see them arrive at the orchestration.
It all worked very well. Well, most of it did. Something further down the process failed, which caused the orchestration instance to be suspended, and that's perfectly fine.
What I found out next was the big surprise.
We fixed our problem (or so we thought), so we re-GAC'ed the assembly (no need to re-deploy) and restarted the host.
We did not terminate the existing instance, but dropped two new messages in the receive location's folder, expecting BizTalk to pick them up (which it did) and start a new instance of the process.
It's the fact that the latter did not happen that surprised us.
Looking a bit more carefully, we soon realised that the two messages had been delivered to the existing, suspended instance rather than creating a new one.
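To make the behaviour concrete, here is a toy Python model of what we observed. It is emphatically not how the MessageBox is implemented: `ToyMessageBox`, `Instance`, the message type names and the port name `ReceivePort1` are all made up for illustration. The only rule it models is the one that surprised us, namely that an instance's correlation subscription keeps matching whatever state the instance is in.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    instance_id: int
    correlation_value: str
    state: str = "active"  # "active" or "suspended"
    received: list = field(default_factory=list)


class ToyMessageBox:
    """Hypothetical stand-in for the routing we observed, nothing more."""

    def __init__(self):
        self.instances = []

    def publish(self, message_type, correlation_value):
        # An existing instance with a matching correlation value wins,
        # and its state (active or suspended) is never consulted.
        for inst in self.instances:
            if inst.correlation_value == correlation_value:
                inst.received.append(message_type)
                return inst
        # Only when nothing matches is a new instance activated.
        inst = Instance(len(self.instances) + 1, correlation_value)
        inst.received.append(message_type)
        self.instances.append(inst)
        return inst


def replay_scenario():
    box = ToyMessageBox()
    # First pair of messages: activates the process.
    first = box.publish("TypeA", "ReceivePort1")
    box.publish("TypeB", "ReceivePort1")
    # Something further down fails and the instance is suspended.
    first.state = "suspended"
    # Second pair, dropped after the "fix": we expected a new instance.
    box.publish("TypeA", "ReceivePort1")
    box.publish("TypeB", "ReceivePort1")
    return box
```

Running `replay_scenario()` leaves a single suspended instance holding all four messages. Because the correlation value is the receive port name, every message through that port matches the same subscription, which is why this poor choice of property exposed the behaviour so quickly.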
Once we knew the behaviour we kind of managed to rationalise it, but to be honest, I do not think that is what I would expect.
In my mind a suspended instance's subscriptions should be removed, and when additional messages arrive a new instance of the process should be created.
I wrote a while back that I believe the fact that correlation subscriptions are not removed once fulfilled (or that we don't at least have the ability to remove them explicitly) makes a BizTalk developer's life much harder, and "zombies" more likely to occur.
Having to worry about the subscriptions of suspended instances as well makes things even harder and more error prone.
I guess a counter argument could be that, when a truly unique correlation is used (like a conversation id of some sort) and an instance is suspended, you may still want messages to be routed to that same instance. There is a small chance you could resume the process with all the related messages, or at least you will have all the messages grouped in one place (under the same process, as consumed) rather than having to look for them across instances or, in the more likely case of a non-activating correlation, among routing failures.
I think a distinction can be made between correlations used to activate processes, specifically parallel ones, and non-activating correlations, such as when you simply need to route response(s) back to your process.
If the correlation is used for a parallel convoy that activates a process, I think in most cases it is safe to assume a new instance can and should be started. I agree that in the second case, when a message needs to be routed to an existing process, correlating to suspended instances makes much more sense.
I know suspended instances are not something you should really have long term in the Admin Console. I usually like to refer to them as a to-do list, which is definitely not something you want to see grow too long.
Having said that, it is still more than reasonable to expect a few of them lying around waiting to be treated, and they should not have such a dramatic effect on the health of the system.
Keeping the subscriptions open means that newer instances of the process, which might well work if the error that caused the original failure has been fixed, may never be started, and that messages can appear to vanish (it is not at all obvious where they went without very careful investigation).
I know the correlation we used is not the best candidate for a real-world example, but I have in fact seen it used in a couple of places, so I guess it is not that strange. Anyway, I'm sure that if one thinks carefully enough it is possible to find more real-world cases of this problem.
I hope this approach will be reconsidered at some stage, and potentially changed.
Alternatively, it would be useful to have a property on the correlation set or the receive shape to specify the expected behaviour.
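As a rough sketch of what I mean by such a property, written in Python rather than in anything BizTalk actually exposes: `SuspendedRouting`, `default_policy` and `should_deliver` are hypothetical names for the setting I am wishing for, not an existing option. The defaults simply encode the distinction I argued for above, with activating convoys letting go of a suspended instance and non-activating correlations holding on to it.

```python
from enum import Enum


class SuspendedRouting(Enum):
    """Hypothetical per-correlation-set setting, not a real BizTalk option."""

    KEEP = "keep"  # keep routing to the suspended instance (what we observed)
    DROP = "drop"  # ignore the suspended instance's subscription


def default_policy(is_activating_convoy):
    # Activating convoys: assume a fresh instance can and should start.
    # Non-activating correlations: keep related messages with the instance.
    if is_activating_convoy:
        return SuspendedRouting.DROP
    return SuspendedRouting.KEEP


def should_deliver(instance_state, policy):
    """Decide whether an instance's subscription still receives a message."""
    if instance_state != "suspended":
        return True
    return policy is SuspendedRouting.KEEP
```

With these defaults, `should_deliver("suspended", default_policy(True))` is false, so our second pair of messages would have activated a new instance, while a response correlated back to a suspended process would still be delivered to it.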