Yossi Dahan [BizTalk]


Monday, May 05, 2008

Do we need schemas?

This is somewhat of a recurring theme with me recently, but I want to discuss the contents of the management database; more specifically I want to discuss the fact that schemas get deployed to it and that most other things deployed will have a strong dependency on schemas.

As schemas are always at the bottom of the dependency chain, this means that, on top of the expected difficulties one can experience when needing to change schemas and the impact on other systems, the actual act of deploying a new schema becomes a problem in itself: everything that depends on the schema has to be removed and re-deployed with it.

At best this is simply an annoyance to a developer who needs to re-deploy his entire solution as the schema evolves through the development cycle (versioning is not applicable in this scenario);

At worst this is an operational nightmare if a solution has to be updated/patched/evolved where a good versioning story does not exist (as is all too often the case, not that versioning would have solved this all).

As we are forced to remove the entire solution and then re-deploy with the new schema, we can expect, from my experience, the process to take quite a while for large solutions, which may take the business offline for a couple of hours.

Taking the risk of making a point about something I don't know enough about - the internal behaviour of BizTalk server with regards to deployed schemas (but one could say this is often the case...) - I would argue that as far as I can tell, schemas are not actually used all that often by the runtime.

(and because I accept I could be completely wrong here, please do share any thoughts/ideas/comments/insights/whatever on the subject - put a comment on this post or email me if you prefer. I'd love to hear some feedback on this.)

Anyway - as I was saying -

When you define a message type you select the schema at design time, and the designer may refer to that schema to do various things - draw the map designer, check the validity of assignments in expression shapes, build intellisense. It would even check serialisation and de-serialisation attributes on classes against your schema when you try to assign a .net class to a message in an expression shape. But as far as I'm aware, the schemas are rarely used by the runtime.

At runtime, when a message is received into an orchestration (and set to a pre-defined message type), its contents are not checked against the schema; neither does it get validated at the end of a transform or message assignment shape.

When you run a map you select a schema, but again - that map could well return something completely different; BizTalk couldn't care less.

When do I know schemas get used? In the pipelines. Sometimes.

If you're using the XmlDisassembler for example it would try to resolve the message type based on the message's root node and namespace, and then try to get the schema from the database.
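To make the lookup key concrete, here is a minimal Python sketch of deriving a message type from the root node and namespace, in the `namespace#rootnode` form BizTalk uses. It is only an illustration and not the disassembler's actual code: the `KNOWN_SCHEMAS` table, the example namespace and the schema name are hypothetical stand-ins for whatever the management database holds, and returning the bare root node name for a message with no namespace is a choice made for this sketch.

```python
import xml.etree.ElementTree as ET


def resolve_message_type(xml_text: str) -> str:
    """Build a 'namespace#rootnode' key from the root element of a message."""
    root = ET.fromstring(xml_text)
    # ElementTree encodes a namespaced tag as '{namespace}localname'.
    if root.tag.startswith("{"):
        namespace, local_name = root.tag[1:].split("}", 1)
    else:
        namespace, local_name = "", root.tag
    # Sketch choice: with no namespace, use the root node name alone.
    return f"{namespace}#{local_name}" if namespace else local_name


# Hypothetical stand-in for the schema lookup in the management database.
KNOWN_SCHEMAS = {"http://example.org/orders#Order": "Example.Schemas.Order"}


def find_schema(xml_text: str):
    """Return the registered schema name, or None for an unknown type."""
    return KNOWN_SCHEMAS.get(resolve_message_type(xml_text))
```

Note that nothing here looks past the root element: the key can be computed, and matched against a list of known types, without the schema body being available at all.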

The disassembler may then use this schema to promote some properties; if configured, it may debatch the message according to the schema and possibly use it to validate the message. All are very valid usages for the schema, but they are not always used, and they require specific configuration, either in the schema at design time or in the pipeline component (or both).

Also, at least with regards to property promotion, all that gets used is a bunch of xpaths provided in an annotation in the schema, not the actual schema information.
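As a rough illustration of that point, the following Python sketch promotes values into a context dictionary using nothing but a table of paths. Everything in it is assumed for the example: the `PROMOTIONS` table, the property names and the namespace are hypothetical, and the paths use ElementTree's limited path syntax rather than the full xpaths BizTalk stores in the schema annotation.

```python
import xml.etree.ElementTree as ET

# Hypothetical promotion table: property name -> path relative to the root.
# In BizTalk the equivalent paths come from an annotation in the schema.
PROMOTIONS = {
    "OrderId": "o:Header/o:Id",
    "Customer": "o:Header/o:Customer",
}
NAMESPACES = {"o": "http://example.org/orders"}


def promote(xml_text, promotions=PROMOTIONS, namespaces=NAMESPACES):
    """Collect promoted values; paths that match nothing are skipped."""
    root = ET.fromstring(xml_text)
    context = {}
    for name, path in promotions.items():
        node = root.find(path, namespaces)
        if node is not None and node.text is not None:
            context[name] = node.text.strip()
    return context
```

The element declarations, types and cardinalities of the schema play no part in this; only the list of paths does.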

There are, of course, other cases where schemas are required - FlatFileDisassembler, XmlValidation, Xml and FlatFile Assemblers all need schemas for their work (to some extent at least) and definitely the design time environment uses them extensively, but what I'm arguing is - can we do without having to deploy schemas if they are not used?

BizTalk works in a late-binding fashion anyway, where assemblies and their contents are loaded from the GAC/database as needed (and may be unloaded after a period of not being used). Couldn't we get away with only deploying the schema when it is needed at runtime, and simply 'register' message types when it is not?
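The idea can be sketched in a few lines of Python. This is purely hypothetical and does not describe how BizTalk works today: `SchemaRegistry` and its `loader` callable are invented for the example, with the loader standing in for "read the schema out of the deployed assembly".

```python
class SchemaRegistry:
    """Message types are registered up front; schemas are loaded on demand."""

    def __init__(self, loader):
        self._loader = loader      # callable: message type -> schema text
        self._registered = set()   # the 'known' message types
        self._cache = {}           # schemas loaded so far

    def register(self, message_type):
        # Registration records the type only; no schema is loaded here.
        self._registered.add(message_type)

    def is_known(self, message_type):
        return message_type in self._registered

    def get_schema(self, message_type):
        if message_type not in self._registered:
            raise KeyError(f"unknown message type: {message_type}")
        # Load on first use, then serve from the cache.
        if message_type not in self._cache:
            self._cache[message_type] = self._loader(message_type)
        return self._cache[message_type]
```

With something like this, subscriptions and type checks would only ever call `is_known`, while a component that validates or debatches would call `get_schema` and pay the loading cost once.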

In fact - even if a schema is needed at runtime - why does it need to exist in the database? How is it different from maps, pipelines and orchestrations, all of which are 'known' to the database but physically exist only in the GAC? (Well, that's not quite accurate - the orchestration's structure is stored as XML in the database, but that's so it can be displayed in HAT, and possibly a bad design decision on its own.)

I can't help thinking I'm missing something. I'm sure the people behind this decision in BizTalk gave it a lot of thought and found good justification for it. Can anyone comment on what that might be?

One argument could be that BizTalk wants to know which messages are 'supported' by the solution - just as a message arriving with no subscription is considered an error, a message arriving which is not of a known 'type' should be considered an error. But in a sense the two are the same, and in any case BizTalk is quite happy to support 'blob' messages through the use of passthrough pipelines and XmlDocument as a message type in the orchestrations.


6 Comments:

  • Hi Yossi,

    I had a situation the other day where I was creating a new message in a pipeline component which then went via the message box to an orchestration.

    The orchestration port knows what type of message it is expecting.

    When I ran my scenario I was getting an error from the receive shape (if I remember correctly) in the orchestration, which indicated that it had received a message which did not match the schema strong name context property.

    I'm assuming this means the orchestration is deserializing this message and checking it against a schema internally.

    Once I set the correct value in the context property it worked fine.

    What does this mean? I guess in some scenarios the schema is important to orchestration execution internally.

    HTH
    Mike

    By Anonymous Mike Stephenson, at 05/05/08 23:35  

  • Thanks Mike

    But I wonder - from your comment it is not clear if the engine actually checked the message content/structure AGAINST the schema, or simply checked that the message type is known (based on the message type identified in the pipeline, or even the root-node/namespace combination).

    I suspect all that happened is that the engine identified that the message received is not of the requested TYPE, but did not actually care about the contents of the message (which was examined in the pipeline), am I wrong?

    I'm happy with the check for message type vs. subscription, but does it really need the entire schema in the database?

    By Blogger Yossi Dahan, at 05/05/08 23:47  

  • BTS does not need the schema most of the time. One purpose for schemas in my solutions is schema-level validation of incoming documents. I use a custom pipeline component to get a schema and validate documents against it... having the schema available in this case is very useful.

    Schemas are of course important for mapping, but there are interesting ways around that too, and you can actually get away without having a schema around at all (for mapping).

    Things change if you have mandatory elements (min=1 max=1) and try to de/serialize something that does not conform - but that is a very basic level of validation.

    For true (as in functioning) schema-level validation, you have to write your own pipeline.

    By Anonymous Erik, at 08/05/08 04:02  

  • Yes, I agree with all the things you said. However, we still need schemas.

    Note that BizTalk creates .NET data types (classes) based on these schemas, and if your input is XML, BizTalk tries to match the XML based on the .NET class type generated from the schema. Sure, you can always declare all message types as XmlDocument, but how are you going to instantiate an Orchestration based on the Message type?

    Also, Schemas provide a way to AUTOMATICALLY promote a node as a promoted property... which is used in Correlation and all that good stuff.

    So, yeah... it may not seem much... I suppose it's a necessary evil.

    By Blogger Dexter Legaspi, at 09/05/08 17:06  

  • Thank you all for the comments, I'm very interested in this subject and am really happy to get (and share) as many opinions as possible.

    I don't think at all this is clear cut (not that my view matters).

    But - to be clear - I'm not arguing at all that schemas are not important or useful, nor do I deny that sometimes they are really needed and that BizTalk makes very good use of them.

    I'm definitely not arguing to make BizTalk any less strongly-typed. I think that well-known message types are a fundamental concept in BizTalk that should be maintained.

    What I am trying to argue, however, is that often the actual schema content is not used by BizTalk and so might not be needed, which would save quite a bit of hassle around deployment.

    I think often it would be enough for BizTalk to map a root node-namespace combination to a message type (as it does), without needing the entire schema in the database.

    Isn't that what's happening in practice anyway?


    In other words - I think that, like pipeline components for example, schemas should be "known of" but only exist fully in the database if they are needed by the runtime (for property promotion, for example, or debatching, or validation, when one chooses to have it). Maybe they can even simply be loaded from the assembly when needed at runtime, just as pipeline components and their configuration are, for example (and cached, of course!).

    Does that not make sense at all?

    By Blogger Yossi Dahan, at 12/05/08 22:02  

  • That does make sense, an interesting discussion.
    I think BizTalk's deep deployment dependency on schemas is due to early design paradigms/decisions rather than what's necessary and sufficient to make the thing tick. The .net assembly containing the text of the XSD could certainly be read from the GAC at runtime, but maybe the (message data) promoted properties must be deployed as a hard dependency because the subscriptions rely on them - it's better to make you undeploy your entire codebase to change the subscriptions than to allow them to change outside of BizTalk's control.

    Of course the intended use case is not that you should be undeploying the entire stack; you should be deploying the next version side by side. And whilst you may not have planned a great versioning story up front, you were compelled to at least assign a strong name to your schema assembly, which allows you to take things forward for side by side deployment.

    By Blogger Ben Cops, at 18/06/08 19:36  
