Yossi Dahan [BizTalk]

Google
 

Monday, May 12, 2008

Flush failed to run error when using DES

A quick one -

If you're using one of the DirectEventStream BAM API and you're seeing an error "Flush failed to run", check the connection string (and related permissions) to the database.

Happened three times to me recently (don't ask!) - all of which were related to configuration problems.

Labels: ,

Monday, May 05, 2008

Do we need schemas?

This is somewhat of a recurring theme with me recently, but I want to discuss the contents of the management database; more specifically I want to discuss the fact that schemas get deployed to it and that most other things deployed will have a strong dependency on schemas.

As schemas are always at the bottom of the dependency chain, this means is that on top of the expected difficulties one can experience when needing to change schemas and the impact on other system, the actual act of deploying a new schema.

At best this is simply an annoyance to a developer who needs to re-deploy his entire solution as the schema evolves through the development cycle (versioning is not applicable in this scenario);

At worst this is an operational nightmare if a solution has to be updated/patched/evolved where a good versioning story does not exist (as is all too often the case, not that versioning would have solved this all).

As we are forced to remove the entire solution and then re-deploy with the new schema, we can expect, from my experience, the process to take quite a while for large solutions, which may take the business offline for a couple of hours.

Taking the risk of making a point about something I don't know enough about - the internal behaviour of BizTalk server with regards to deployed schemas (but one could say this is often the case...) - I would argue that as far as I can tell, schemas are not actually used all that often by the runtime.

(and because I accept I could be completely wrong here, please do share any thoughts/ideas/comments/insights/whatever on the subject - put a comment on this post or email me if you prefer. I'd love to hear some feedback on this.)

Anyway - as I was saying -

When you define a message type you select the schema at design time, and the designer may refer to that schema to do various things - draw the map designer, check validity of assignments in expression shapes, build intellisense, it would even check serialisation an de-serialisation attributes on classes vs. your schema when you try to assign a .net class to a message in an expression shape, but as far as I'm aware, the schemas are rarely used by the runtime.

At runtime, when message is received into an orchestration (and set to a pre-defined message type), it's contents are not checked against the schema; neither does it get validated at the end of a transform or message assignment shapes.

When you run a map you select a schema, but again - that map could well return something completely different; BizTalk couldn't care less.

When do I know schemas get used? in the pipelines. sometimes.

If you're using the XmlDisassembler for example it would try to resolve the message type based on the message's root node and namespace, and then try to get the schema from the database.

the disassembler may then use this schema to promote some properties, if configured it may debatch the message according to the schema and possibly use it to validate the message; all are very valid usages for the schema but - they are not always used, and they require specific configuration, either in the schema at design time or in the pipeline component (or both).

Also, at least with regards to property promotion, all that get's used is a bunch of xpaths provided in an annotation in the schema, not the actual schema information.

There are, of course, other cases where schemas are required - FlatFileDisassembler, XmlValidation, Xml and FlatFile Assemblers all need schemas for their work (to some extent at least) and definitely the design time environment uses them extensively, but what I'm arguing is - can we do without having to deploy schemas if they are not used?

BizTalk works in a late-binding fashion anyway, where assemblies and their contents are loaded from the GAC/database as needed (and may be unloaded after a period of them not being used), couldn't we get away with only deploying the schema when it is needed at runtime, and simply 'register' message types when it is not?

In fact - even if a schema is needed at runtime - why does it need to exist in the database? how is it different from maps, pipelines, orchestrations? all of which are 'known' to the database but physically exist only in the GAC? (well, that's not accurate - the orchestration's structure is stored, as XML in the database, but that's to be displayed in HAT, and possibly a bad design decision on it's own)

I can't help thinking I'm missing something, I'm sure the guys behind BizTalk's decision had given it a lot of thought and found good justification for it, wouldn't they? anyone can comment on what those might be?

One argument could be that BizTalk wants to know which messages are 'supported' by the solution - just as a message arriving with no subscription is considered an error, a message arriving which is not of a known 'type' should be considered an error. but in a sense - the two are the same, and in any case BizTalk is quite happy to support 'blob' messages through the use of passthrough pipelines and XmlDocument as a message type in the orchestrations.

Labels: ,

Thursday, April 24, 2008

Throttling in full action

Here's another one from the archives (=the list of things I have waiting to be blogged)

At some point we had a sudden peak in system load on our BizTalk processes and, as a result, our BizTalk solution that was running so nicely seem to have gotten "stuck".

In "stuck" I mean - we ended up with lots of processes in "Active" state, but they did not seem to be active at all; a closer inspection (of trace that should have been emitted) showed that although the instances status says "Active" they were all very passive indeed - nothing was executing on the server - close to 0% CPU and no trace whatsoever.

This is where you might expect me to describe the long hours we've spent investigating the issue, the sleepless nights and empty cartons of pizza... - but really what happened is that, not being able to afford any more down time, we called out premier support which turned out to be a great thing because the first thing they did (well, not literally, but anyway) was to ask us to check the state of the server using the MsgBoxViewer which in turn pointed out that we have simply "max-ed out" our memory consumption throttling level.

You see - we use a lot of caching of data in our processes; mostly because we access a lot of reference data frequently - data that does not change very often; this is by design. what we forgot to do is estimate the amount of memory this caching will require when many different clients use the system and adjust the throttling level accordingly.

As you can see from the image below - out of the box the BizTalk hosts are configured to throttle at 25% of the server's physical memory. the idea is to prevent the BizTalk processes from taking up too much memory and killing the server, and the assumption is that if throttling kicks in, and stops processing instances, memory consumption will slowly reduce until the server gets back to a more healthy state. however - from it's very nature - caching does not really release memory that often and so instances have stopped progressing but no memory was released as a result and so we got "stuck".

clip_image004

In our case, the solution was straight forward - as we know our memory consumption will be high, and we know there's nothing else running on the server to compete with that memory consumption (more or less) we could increase the threshold to 50%, which is enough to grant BizTalk Server enough memory for the caching and all the processing requirements.

In the process we monitored the situation by investigating two BizTalk performance counters - "Process memory usage threshold" (here shows as 500MB) compared to "process memory usage" (here showing around 130MB).

clip_image006

As long as there was large enough gap between the two we knew our processes are going to be just fine; it is always important, of course, to monitor these over time to ensure there's no memory leak in the processes, which we have done, on top of peak load tests - which we have not.

Now, while all of this is down to a test or two we may have neglected on our side, there are a couple of interesting points at the back of this from a product perspective -

  1. We were confused by what we saw mostly because of the "active" state of all instances (and we had quite a few); we would have diagnosed the problem much quicker, and on our own, had the admin console indicated that the server is not actually processing anything due to it's throttling state.

  2. I can't help but wondering whether the throttling mechanism couldn't be a bit more clever and identify it has reached a dead end and is not actually helping in improving the situation. following on our case the engine realised memory usage has gone too high and has stopped processing instances. wouldn't it be great if after, say, 10 minutes it realised that memory is not actually reducing and so it will never exit the throttling state and would write something to the event log?

Again - not trying to make any excuses, just thoughts with the power of hind sight...

Labels: , ,