Yossi Dahan [BizTalk]


Thursday, January 21, 2010

What not to do – hard coded values

This always seemed like an obvious one to me, but as I do see it out there occasionally, I guess it should be mentioned in my little ‘series’ – think carefully whenever you are hard coding any data into your code, as it is almost guaranteed to change one day, however remote you think that chance is.

This post could probably end here, but that would be almost rude, so I’ll give an example -

A custom disassembler is being developed to identify a message type before parsing it.

The messages are expected to be received over POP3, and the message type is decided by identifying the sender of the email (rather than by looking in the actual message), which is fair enough.

So – the component has code to read the context property storing the sender’s email address and then a switch statement on the property’s value determines the message type required.

Of course this works very well, until a third party decides to change the email address, or another one is added. Now you have to update the code, re-deploy and re-test. Why? Because you did not want to spend another day moving this to configuration?

Even a simple AppSetting would be better than a hard-coded value, but of course usually you’d be looking at defining at least a custom configuration section, if not using a database or another store (SSO?).
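A minimal sketch of the configuration-driven lookup, written in Python purely for illustration (a real BizTalk component would be .NET code reading an AppSetting, a custom configuration section or SSO). The JSON layout, the email addresses, the message type names and the functions `load_sender_map` and `resolve_message_type` are all assumptions made for this example, not part of the solution described above:

```python
import json

# Hypothetical configuration text. In practice this would come from a config
# file, a database or another store rather than a string in the code.
CONFIG_JSON = """
{
  "sender_message_types": {
    "orders@partner-a.example": "PurchaseOrder",
    "billing@partner-b.example": "Invoice"
  }
}
"""


def load_sender_map(config_text):
    """Parse the configuration and return a sender -> message type lookup."""
    mapping = json.loads(config_text)["sender_message_types"]
    # Lower-case the keys so the lookup ignores the case of the address.
    return {sender.lower(): msg_type for sender, msg_type in mapping.items()}


def resolve_message_type(sender, sender_map):
    """Return the configured message type, or raise if the sender is unknown."""
    try:
        return sender_map[sender.strip().lower()]
    except KeyError:
        raise LookupError(f"No message type configured for sender {sender!r}") from None


sender_map = load_sender_map(CONFIG_JSON)
print(resolve_message_type("Orders@Partner-A.example", sender_map))
```

With something along these lines, a changed or additional sender address becomes a configuration change rather than a code change, a re-deploy and a re-test of the component.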


Friday, December 11, 2009

What not to do – avoid reading entire messages into memory unnecessarily

This one is fairly common, I suspect, and I can certainly see why – the temptation is simply too big – but too many pipeline components start by reading the message into memory when, with a little bit more effort, this could have been avoided.

One pipeline component I’ve seen, for example, receives a flat file, and needs to remove records already processed (duplicate elimination) – quite a good thing to do in a pipeline, and I also liked the approach of doing so before the disassembler, to make the xml produced smaller.

V1 of the component used a memory stream – the incoming stream was read line by line, each line was assessed, and – if it was not a duplicate – it would get written to the memory stream.

When the component had finished going through the entire incoming stream, the memory stream would be assigned to the message, replacing the original stream, and the message would get returned to the pipeline.

There are two downsides to this approach. The first is memory consumption – the component will always consume at least as much memory as the size of the (outgoing) message; done properly, BizTalk would then clean up this memory, but only after completing the processing of the message. The second is a potentially unnecessary delay in further processing of the message – one of the huge benefits of the pipeline, in my view, is its streaming fashion, where subsequent components, if developed in the correct manner, can start working on parts of the message before preceding components have completed their processing; basically, each component passes back to the pipeline the portion of the message it has already processed, whilst working on the next portion.

It appears that the customer in question encountered memory issues, as the component’s code was changed to use a virtual stream instead of a memory stream; a virtual stream is effectively a stream that uses disk for storage instead of memory.

This solves the memory consumption issue, but merely replaces it with IO operations which may have an even bigger impact on the server’s overall performance (and does not address the processing delay point at all).

What would have been the correct way to implement this in my view?

The component should have created a custom stream, wrapping the original stream from the message; it would then replace the message’s stream with the custom stream, immediately returning the message back to the pipeline. Note that so far the component hasn’t touched the message stream – zero bytes have been read.

As BizTalk (and not the component!) reads the message (for instance when persisting it to the message box), the custom stream’s read function would be called. This function would contain the logic that reads the underlying stream (the original stream received by the component), probably buffering reads until the end of a line for simplicity (although in many cases this is not necessary), and assesses whether the record is a duplicate or not; if it is a duplicate, the function simply reads the next line, and so on until a non-duplicate record is found, at which point the line is returned as the output bytes of the read function.

This effectively means that the next component, or the message box, will receive the message line by line, duplicate records removed, without having to wait for the component to process the entire message, and with only a maximum of one line ever loaded into memory.
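The wrapping-stream idea can be sketched as follows. This is Python rather than the .NET code a real pipeline component would use, and it is only an illustration under assumptions: the class name `DuplicateFilterStream`, the `is_duplicate` callback and the in-memory `make_seen_check` helper are all hypothetical, and a real component would check duplicates against its own store of processed records.

```python
import io


class DuplicateFilterStream(io.RawIOBase):
    """Read-only stream that lazily drops duplicate lines from a source stream."""

    def __init__(self, source, is_duplicate):
        self._source = source              # the original binary message stream
        self._is_duplicate = is_duplicate  # hypothetical callback: bytes -> bool
        self._pending = b""                # unread remainder of the current line

    def readable(self):
        return True

    def readinto(self, buffer):
        # Pull from the source only when the consumer asks for more data.
        while not self._pending:
            line = self._source.readline()
            if not line:
                return 0                   # end of the underlying stream
            if not self._is_duplicate(line):
                self._pending = line       # keep the first non-duplicate line
        count = min(len(buffer), len(self._pending))
        buffer[:count] = self._pending[:count]
        self._pending = self._pending[count:]
        return count


def make_seen_check():
    """Hypothetical duplicate test that remembers lines seen during this run."""
    seen = set()

    def is_duplicate(line):
        key = line.rstrip(b"\r\n")
        if key in seen:
            return True
        seen.add(key)
        return False

    return is_duplicate


original = io.BytesIO(b"A;1\nB;2\nA;1\nC;3\n")
message_stream = DuplicateFilterStream(original, make_seen_check())
print(original.tell())        # position of the original stream after wrapping
print(message_stream.read())  # the consumer, not the component, drives the reads
```

Constructing the wrapper reads nothing from the original stream; the filtering happens inside `readinto`, one line at a time, whenever whoever holds the stream asks for more bytes.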


Tuesday, December 01, 2009

What not to do – ‘bundle’ unrelated components together

So – here’s another real-world ‘they shouldn’t have done this – really!’ item; somewhat related to my previous post about projects, but with a different slant -

MyP had used pipeline components fairly extensively, a very good idea in my view, especially in this case.

But, as discussed in that previous post, all pipeline components, regardless of where they were used, were bundled together in a single assembly.

To make matters worse, a single pipeline component often served more than one purpose, for more than one client.

So – for example – a pipeline component initially created to remove an unwanted trailer from a message from a particular sender ended up also converting the message to xml, and was then extended to support another format, from another sender, except that the two don’t share any code – the execute method of the pipeline component has a switch statement on the sender name, and runs one of two separate functions.

Now – considering all the components are in the same assembly already, how can this make matters worse?

Well – the single responsibility principle is one that I generally like. Say I’m a new developer working on this project: I see a component called TrailerRemover, used in a pipeline called <someSender>_Receive, so I assume it is processing messages from <someSender> and that it removes a trailer.

I eventually discover it does a lot more, and processes messages from another sender as well.

One of the side effects of this is time wasting – it is much more difficult, in my experience, to maintain systems that don’t follow the single responsibility principle.

This is aggravated by the fact that this is usually a symptom, if not a cause, of bad architecture – I shouldn’t be able to mix logic for two different senders, not unless specified common logic is factored out and reused properly, from a shared assembly.

Now when I come to change some code I find it difficult to know what the impact may be – where exactly is this thing used?
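To make the alternative concrete, here is a rough sketch in Python (not the .NET a BizTalk component would use) of small single-purpose steps composed per sender, instead of one component that switches on the sender name. Everything in it is an assumption for illustration: the `TRL` trailer marker, the comma-separated format, the `<record>`/`<f>` element names and the pipeline names.

```python
from xml.sax.saxutils import escape


def remove_trailer(lines):
    """Yield every line except a final one starting with the assumed 'TRL' marker."""
    previous = None
    for line in lines:
        if previous is not None:
            yield previous
        previous = line
    if previous is not None and not previous.startswith("TRL"):
        yield previous


def csv_to_xml(lines):
    """Wrap each comma-separated record in a minimal XML element."""
    for line in lines:
        fields = "".join(f"<f>{escape(value)}</f>" for value in line.split(","))
        yield f"<record>{fields}</record>"


def run_pipeline(components, lines):
    """Chain the components so each consumes the output of the previous one."""
    stream = iter(lines)
    for component in components:
        stream = component(stream)
    return list(stream)


# Each pipeline lists exactly the steps it uses, so usage is visible in one place.
PIPELINES = {
    "SenderA_Receive": [remove_trailer, csv_to_xml],
    "SenderB_Receive": [csv_to_xml],
}

print(run_pipeline(PIPELINES["SenderA_Receive"], ["a,b", "TRL,1"]))
```

With each step doing one thing, the name of a component says what it does, and the question "where exactly is this thing used?" can be answered by looking at the pipeline definitions.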


Tuesday, November 24, 2009

What not to do – projects

So – what not to do?

Here’s one – don’t take 15 different interfaces, from 15 different providers, and the canonical format, and your internal process and bundle it into the same set of projects.

My predecessor (MyP in short) in this project had followed the best practice and ensured (for the most part) that schemas, maps, pipelines and processes each have their own project.

This is good. To start with, in the old days mixing them used to cause all sorts of build issues, although – I haven’t tried in several years now – I assume that’s all in the past. More importantly, mixing them is bad practice because different artifacts have different ‘resilience to change’ – if you need to add a small shape to a small process that gets instantiated as a result of a receive shape (and is not being ‘started’ or ‘called’), you can usually un-deploy it fairly easily.

Change a schema, on the other hand, and usually there’s a whole raft of artifacts you now have to un-deploy with it.

For that reason – mixing two artifact types in the same assembly, whilst not technically problematic, usually suggests you will have a maintenance nightmare in the near future (unless you don’t mind down time, and regression tests, that is).

Anyway, as I was saying, MyP did separate the different artifact types to different assemblies, but equally MyP only had one assembly of each; so – when we are processing a message arriving from partner A, for which we have a schema, a pipeline, a map or two and an orchestration – these were split across four assemblies; equally - when we are processing a message arriving from partner B, for which we also have a schema, a pipeline, a map or two and an orchestration, all different – these were also split across THE SAME four assemblies.

On the surface there is not much wrong with this; the problem is what happens when you want to change a small thing in, say, the schema for partner B, used from within a map, which in turn gets used within an orchestration – you now have to un-deploy the orchestration assembly, so that you can un-deploy the maps assembly, so that you can un-deploy the schema assembly, so that you can redeploy the new version of the schema assembly followed by the rest (ok – so I’m assuming versioning and side-by-side deployment are not being used – which I have to – for my story).

So – all that is the general pain that is part of the BizTalk developer’s day, so why am I bringing this up here? Well, because you’ve taken a BizTalk ‘challenge’ and doubled it. For one, you took process A down due to a change in process B; why? Had they existed in separate assembly sets you wouldn’t need to… And two, as you’ve released new code, you have to test new code; only now you have to test two sets of code, including one you hadn’t intended to change (but may have, by mistake or otherwise); again, had there been two sets of assemblies, you could probably get away with testing just the scenarios related to the ones you’ve changed.

Now multiply this by about 15 partners, and you see how it can be quite wrong. I hope.
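The knock-on effect can be illustrated with a small dependency walk. This is a Python sketch under assumptions: the assembly names, the `depends_on` layout and the helper functions are hypothetical, and it models only the schema, map and orchestration chain mentioned above.

```python
def dependents_of(changed, depends_on):
    """Return every assembly that directly or transitively depends on `changed`."""
    affected = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for assembly, dependencies in depends_on.items():
            if current in dependencies and assembly not in affected:
                affected.add(assembly)
                frontier.append(assembly)
    return affected


def partner_assemblies(partner):
    """Hypothetical per-partner assembly set: schemas <- maps <- orchestrations."""
    return {
        f"{partner}.Schemas": set(),
        f"{partner}.Maps": {f"{partner}.Schemas"},
        f"{partner}.Orchestrations": {f"{partner}.Maps"},
    }


# One shared set of assemblies holding the artifacts of every partner.
shared = {
    "Schemas": set(),
    "Maps": {"Schemas"},
    "Orchestrations": {"Maps"},
}

# Separate assembly sets for partner A and partner B.
per_partner = {**partner_assemblies("A"), **partner_assemblies("B")}

# Changing partner B's schema in the shared layout drags in assemblies that
# also contain partner A's maps and orchestrations.
print(sorted(dependents_of("Schemas", shared)))
# In the per-partner layout only partner B's assemblies are touched.
print(sorted(dependents_of("B.Schemas", per_partner)))
```

In the shared layout every affected assembly also carries the other partners’ artifacts, which is why they go down and need re-testing too; in the per-partner layout the walk never leaves partner B’s assemblies.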


What not to do - premiere

Again I find myself having to apologise for not writing for a while (it’s been one month since my last post, two since my last ‘real’ post), the usual suspects are to blame – too much work, too much bureaucracy, kids, new xbox games….

Work wise I’ve taken on a few more small engagements recently, one of which was to enhance/fix/maintain a small-ish solution someone else built.

I could not resist the temptation to take some notes of all the stuff I would have done differently, and am in the process of compiling them as a proper report for the client.

As I was not posting for a while, partly due to this work, I thought it’s only fitting that I publish some of these ‘recommendations’ here.

For obvious reasons, I will not name names and will try to generalise any samples/explanations provided; I can’t resist mentioning, though, that this other person, I learnt as I looked up his/her name on the web, had also nicked the entire text off my web site and used it on his/her own, which I found rather annoying (I believe the pages have since been removed, after a polite nudge…)

Anyway… keep in mind that many of my notes may come down to style, or MY best practice; they don’t necessarily mean that the other approach is completely wrong – I don’t presume I know better than anyone else (oh well….) but that there’s value in another point of view – in this case, mine.

So – a few posts coming, likely to be very short and to the point, hopefully someone will find them useful.

Yossi
