Yossi Dahan [BizTalk]

Saturday, February 21, 2009

Serialisation, mixed content and string[]

When you generate a class from a schema in which an element is configured to allow mixed content (child elements and attributes as well as text), you should expect the corresponding generated field to be a string array.

So, if you have a schema that looks like this:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema targetNamespace="http://tempuri.org/XMLSchema.xsd" elementFormDefault="qualified" xmlns="http://tempuri.org/XMLSchema.xsd" xmlns:mstns="http://tempuri.org/XMLSchema.xsd" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="SomeElement">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element name="Child1" type="xs:string"/>
        <xs:element name="Child2" type="xs:string"/>
        <xs:element name="Child3" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="SomeAttribute" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

(‘SomeElement’ being a complex type allowing mixed content)

The fields in the generated class would look like

public partial class SomeElement {

private string child1Field;

private string child2Field;

private string child3Field;

private string[] textField;

private string someAttributeField;
.
.
.

The reason for the array of strings (instead of just one string field) is that an XML corresponding to the schema might look like this –


<SomeElement xmlns="http://tempuri.org/XMLSchema.xsd" SomeAttribute="someAttributeValue">
Some free text
<Child1>Child1 text</Child1>
Some more free text
<Child2>Child2 text</Child2>
yet some more free text
<Child3>Child3 text</Child3>
</SomeElement>

And so by using a string array to hold the text the deserialiser can keep string portions separately.

Initially I thought this would let the structure represent the original XML accurately, but that is not quite the case: you still cannot know for certain where each string portion sat relative to the child elements, especially if the source XML contains several consecutive elements with no text between them. I suspect that is why, when I serialise the instance back to XML, I actually get:

<?xml version="1.0"?>
<SomeElement xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" SomeAttribute="someAttributeValue" xmlns="http://tempuri.org/XMLSchema.xsd">
<Child1>Child1 text</Child1>
<Child2>Child2 text</Child2>
<Child3>Child3 text</Child3>
Some free text

Some more free text

yet some more free text
</SomeElement>
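The round trip described above can be reproduced with a small console program. This is a minimal sketch, assuming a hand-written stand-in for the generated class: the attribute choices and the public fields are my assumptions, not the tool's verbatim output, and the names MixedContentDemo, Read and Write are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hand-written stand-in for the generated class (an assumption, not tool output).
[XmlType(AnonymousType = true, Namespace = "http://tempuri.org/XMLSchema.xsd")]
[XmlRoot(Namespace = "http://tempuri.org/XMLSchema.xsd", IsNullable = false)]
public class SomeElement
{
    public string Child1;
    public string Child2;
    public string Child3;

    // Each text node found between the child elements becomes one array entry.
    [XmlText]
    public string[] Text;

    [XmlAttribute]
    public string SomeAttribute;
}

public static class MixedContentDemo
{
    public const string Input =
        "<SomeElement xmlns=\"http://tempuri.org/XMLSchema.xsd\" SomeAttribute=\"someAttributeValue\">" +
        "Some free text<Child1>Child1 text</Child1>" +
        "Some more free text<Child2>Child2 text</Child2>" +
        "yet some more free text<Child3>Child3 text</Child3>" +
        "</SomeElement>";

    public static SomeElement Read(string xml)
    {
        var serializer = new XmlSerializer(typeof(SomeElement));
        using (var reader = new StringReader(xml))
        {
            return (SomeElement)serializer.Deserialize(reader);
        }
    }

    public static string Write(SomeElement value)
    {
        var serializer = new XmlSerializer(typeof(SomeElement));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        SomeElement instance = Read(Input);

        // The text portions are kept, but their positions relative to the elements are not.
        foreach (string portion in instance.Text)
        {
            Console.WriteLine("text portion: " + portion);
        }

        // Serialising back shows where the serialiser chooses to put the text.
        Console.WriteLine(Write(instance));
    }
}
```

Comparing the printed XML with Input makes the loss of position easy to see: nothing in the class records which portion sat before which child.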

Now, I don’t particularly like this sort of xml, and shy away from mixed content; I don’t believe that xml snippets like my samples above are useful, specifically I don’t think that mixing elements and text is particularly nice.


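However, consider an element with an attribute and some text. The following is quite reasonable, I think, and it too combines text with other content (although XSD can describe this particular shape with xs:simpleContent and an extension rather than mixed="true"):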


<Phone type="mobile">some text here</Phone>
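For what it's worth, this simpler shape does not need a string array on the .NET side. The sketch below assumes a hand-written class (Phone, PhoneDemo and Read are hypothetical names) in which the text maps to a single string, because there are no child elements for it to be interleaved with.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical hand-written class for an element with one attribute and text only.
public class Phone
{
    [XmlAttribute("type")]
    public string Type;

    // A single string is enough here: the element has no child elements.
    [XmlText]
    public string Value;
}

public static class PhoneDemo
{
    public static Phone Read(string xml)
    {
        var serializer = new XmlSerializer(typeof(Phone));
        using (var reader = new StringReader(xml))
        {
            return (Phone)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        Phone phone = Read("<Phone type=\"mobile\">some text here</Phone>");
        Console.WriteLine(phone.Type + ": " + phone.Value);
    }
}
```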




Saturday, September 13, 2008

Yet another example for a service's / xml bad design?

One of my clients has been using a third party's software for some time now (I won't name either organisation, for obvious reasons).

In their organisation both BizTalk and this software are key components and so they both play part in many scenarios.

Unfortunately, and quite surprisingly, this third party's software doesn't have any adequate support for integration; the main way to interact with it is to push data into 'import tables' in the software's database and then run an EXE to process them. Quite horrendous.

To make matters worse, only one instance of this EXE is allowed to run at any time, so most implementations require some form of queuing and a singleton pattern.

Recently that third party released a version with *some* support for web services, and so, excited by the great news the development team went on to implement a process using the newly introduced web service.

The service that had been exposed relates to the product's reporting features - users can create custom reports using the product's UI and then, using web services, export the result of these queries.

I would have expected, based on past experience, a generic schema in the WSDL that would describe a report's output, irrespective of the fields returned, something like -

<row>
<column fieldName=".."></column>
<column fieldName=".."></column>
.
.
</row>

But the vendor's schema included some generic fields from their domain (report name, time of execution, etc.) and then rows and fields based on the specific report generated.

Because the report's structure is not known at design time, its fields do not exist in the WSDL; instead a report node exists with an xs:any declaration to hold the actual report's contents.

Personally I prefer to avoid xs:any if possible, and I think a schema describing a generic report's result could have been created, but that's not the main problem; the main problem, in my view, is that the fields added underneath the report element (as well as the report element itself), which are generated by the application, all belong to the same namespace.

Because they could not predict the names report designers would use for fields, it is more than possible to end up with an element whose name clashes with another one of a different meaning, which is a bad idea overall and also causes quite a bit of headache when one needs to create schemas for BizTalk that accurately describe the response of such a service.

One thing I will grant them is that they have thought of defining lax validation for the report, so duplicate elements will not cause validation errors.


Thursday, August 02, 2007

Editing Xmls with intellisense in Visual Studio

Am I the last person in the world to realise that? hopefully not!...

...but I've just realised today that if you edit an XML file in Visual Studio and have the schema for that XML open in another tab, Visual Studio will recognise it (as soon as you type the root node and namespace) and provide IntelliSense.

I knew you could tell it to use a schema, but I did not realise it would automatically include open documents. Brilliant feature!


Tuesday, June 26, 2007

processing xml in two phases in xsl

Over the last few months I've been involved in the development of quite a few XSL scripts. One requirement that kept coming up, and which, not being an XSL expert, I always thought was impossible, was to perform what I would describe as two-phase processing: we run one bit of XSL to create an interim XML, only to run another bit of XSL on it to get the results we want.

Here's an example of this requirement, I hope I can describe it in a way that makes sense:

Imagine you have two xml messages that are linked - one has a list of items, and the other their prices, something like this:

<?xml version="1.0" encoding="utf-8"?>
<ns0:Root xmlns:ns0="http://schemas.microsoft.com/BizTalk/2003/aggschema">
<InputMessagePart_0>
<items>
<item type="book" name="book1" barcode="12"/>
<item type="book" name="book2" barcode="34"/>
<item type="cd" name="cd1" barcode="56"/>
<item type="cd" name="cd2" barcode="78"/>
</items>
</InputMessagePart_0>
<InputMessagePart_1>
<prices>
<price price="10.5">
<barcodes>
<barcode>12</barcode>
</barcodes>
</price>
<price price="24.70">
<barcodes>
<barcode>34</barcode>
</barcodes>
</price>
<price price="56.20">
<barcodes>
<barcode>56</barcode>
</barcodes>
</price>
<price price="90.14">
<barcodes>
<barcode>78</barcode>
</barcodes>
</price>
</prices>
</InputMessagePart_1>
</ns0:Root>


(This might not make much sense as an example, but trust me - it represents a real world scenario that does, also - note the structure we've used to get both messages is the one BizTalk uses to get multiple message parts into a map)

Now - what we want to get in the end is this:


<itemTotals>
<itemType>
<type>book</type>
<total>35.2</total>
</itemType>
<itemType>
<type>cd</type>
<total>146.34</total>
</itemType>
</itemTotals>



As far as I can tell, getting from #1 to #2 in one go is not possible (but I'd love to hear otherwise), so our only conclusion was that we need to go through two stages in processing - in the first one we would de-normalise the two messages to one flattened xml, and in the second we will get the distinct types and sums.

So, to tackle the first stage we wrote a simple XSL that creates the interim XML we wanted. The output looks like this:


<itemTotals>
<item type="book" total="10.5" />
<item type="book" total="24.70" />
<item type="cd" total="56.20" />
<item type="cd" total="90.14" />
</itemTotals>
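To make the two stages concrete outside XSL, here is a rough LINQ to XML sketch of the same idea: flatten first, then group and sum. It is not part of the original solution; TwoPhaseSketch and TotalsByType are hypothetical names, and it assumes every barcode has a matching price.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

public static class TwoPhaseSketch
{
    public static Dictionary<string, decimal> TotalsByType(XElement items, XElement prices)
    {
        // Phase 1: de-normalise. One flat row per item, carrying its type and price.
        // First() throws if no price lists the barcode (assumed not to happen here).
        var flattened =
            from item in items.Elements("item")
            let barcode = (string)item.Attribute("barcode")
            let price = prices.Elements("price")
                .First(p => p.Descendants("barcode").Any(b => b.Value == barcode))
            select new
            {
                Type = (string)item.Attribute("type"),
                Total = decimal.Parse((string)price.Attribute("price"), CultureInfo.InvariantCulture)
            };

        // Phase 2: distinct types and the sum of the rows belonging to each.
        return flattened
            .GroupBy(row => row.Type)
            .ToDictionary(g => g.Key, g => g.Sum(row => row.Total));
    }

    public static void Main()
    {
        XElement items = XElement.Parse(
            "<items>" +
            "<item type='book' name='book1' barcode='12'/>" +
            "<item type='book' name='book2' barcode='34'/>" +
            "<item type='cd' name='cd1' barcode='56'/>" +
            "<item type='cd' name='cd2' barcode='78'/>" +
            "</items>");
        XElement prices = XElement.Parse(
            "<prices>" +
            "<price price='10.5'><barcodes><barcode>12</barcode></barcodes></price>" +
            "<price price='24.70'><barcodes><barcode>34</barcode></barcodes></price>" +
            "<price price='56.20'><barcodes><barcode>56</barcode></barcodes></price>" +
            "<price price='90.14'><barcodes><barcode>78</barcode></barcodes></price>" +
            "</prices>");

        foreach (var pair in TotalsByType(items, prices))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value.ToString(CultureInfo.InvariantCulture));
        }
    }
}
```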



Then we put this XSL script inside a variable declaration; our XSL now looks like this:


<xsl:template match="/">
<xsl:variable name="items">
<xsl:for-each select="/*[local-name()='Root' and namespace-uri()='http://schemas.microsoft.com/BizTalk/2003/aggschema']/*[local-name()='InputMessagePart_0' and namespace-uri()='']/*[local-name()='items' and namespace-uri()='']/*[local-name()='item' and namespace-uri()='']">
<xsl:element name="item">
<xsl:attribute name="type">
<xsl:value-of select="@type"/>
</xsl:attribute>
<xsl:variable name="barcode" select="@barcode"/>
<xsl:attribute name="total">
<xsl:value-of select="/*[local-name()='Root' and namespace-uri()='http://schemas.microsoft.com/BizTalk/2003/aggschema']/*[local-name()='InputMessagePart_1' and namespace-uri()='']/*[local-name()='prices' and namespace-uri()='']/*[local-name()='price' and namespace-uri()='' and child::*/child::*=$barcode]/@price"/>
</xsl:attribute>
</xsl:element>
</xsl:for-each>
</xsl:variable>
</xsl:template>


(sorry for the long xpaths, we've had to make them "BizTalk friendly"...)

That gives us the interim XML we need to work on, and it is one of the two "magic" bits that get this working: we had not realised until now that we could put a whole chunk of XML into a variable, and that we could use XSL to create that XML in the variable.

Having the XML in a variable, we hoped, would allow us to run another set of XSL on it before outputting it as the script's result. But there was one additional magic point missing: because the variable was not populated in the normal <variable name="myVar" select="someXPath"/> way, its content is not considered a node-set but a result tree fragment, something that merely looks like a node-set, and as such we could not use the variable in further XPaths in the script.

Thankfully, Microsoft has provided a function to convert one to the other, so we added the next line of XSL to our script:


<xsl:variable name="itemsNodeSet" select="msxsl:node-set($items)"/>


We had to add the namespace declaration for msxsl to the stylesheet declaration:

xmlns:msxsl="urn:schemas-microsoft-com:xslt"


Now that we have the interim XML in a variable, and it is considered a node-set, we can simply run the last bit of XSL we need to get the totals:

(the for each uses another nice technique we use in xsl to get a distinct list of items in a list)


<xsl:for-each select="$itemsNodeSet/*[not(@type=preceding-sibling::*/@type)]">
<itemType>
<type>
<xsl:value-of select="@type"/>
</type>
<total>
<xsl:variable name="type" select="@type"/>
<xsl:value-of select="sum($itemsNodeSet/item[@type=$type]/@total)"/>
</total>
</itemType>
</xsl:for-each>


and voila! - the output is exactly what we wanted!
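For anyone who wants to try the whole thing outside BizTalk, here is a minimal console sketch that runs both phases with XslCompiledTransform. It is an approximation of our script, not a copy: the XPaths are shortened by declaring the ns0 prefix, the interim item elements are written as literal elements, and the names TwoPhaseXslt and Transform are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class TwoPhaseXslt
{
    // Phase 1 builds the interim XML in $items; phase 2 groups and sums it.
    public const string Stylesheet = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:msxsl='urn:schemas-microsoft-com:xslt'
    xmlns:ns0='http://schemas.microsoft.com/BizTalk/2003/aggschema'
    exclude-result-prefixes='msxsl ns0'>
  <xsl:output method='xml' omit-xml-declaration='yes'/>
  <xsl:template match='/'>
    <xsl:variable name='items'>
      <xsl:for-each select='/ns0:Root/InputMessagePart_0/items/item'>
        <xsl:variable name='barcode' select='@barcode'/>
        <item type='{@type}'
              total='{/ns0:Root/InputMessagePart_1/prices/price[barcodes/barcode=$barcode]/@price}'/>
      </xsl:for-each>
    </xsl:variable>
    <xsl:variable name='itemsNodeSet' select='msxsl:node-set($items)'/>
    <itemTotals>
      <xsl:for-each select='$itemsNodeSet/*[not(@type=preceding-sibling::*/@type)]'>
        <xsl:variable name='type' select='@type'/>
        <itemType>
          <type><xsl:value-of select='@type'/></type>
          <total><xsl:value-of select='sum($itemsNodeSet/item[@type=$type]/@total)'/></total>
        </itemType>
      </xsl:for-each>
    </itemTotals>
  </xsl:template>
</xsl:stylesheet>";

    public const string Input = @"<ns0:Root xmlns:ns0='http://schemas.microsoft.com/BizTalk/2003/aggschema'>
  <InputMessagePart_0><items>
    <item type='book' name='book1' barcode='12'/>
    <item type='book' name='book2' barcode='34'/>
    <item type='cd' name='cd1' barcode='56'/>
    <item type='cd' name='cd2' barcode='78'/>
  </items></InputMessagePart_0>
  <InputMessagePart_1><prices>
    <price price='10.5'><barcodes><barcode>12</barcode></barcodes></price>
    <price price='24.70'><barcodes><barcode>34</barcode></barcodes></price>
    <price price='56.20'><barcodes><barcode>56</barcode></barcodes></price>
    <price price='90.14'><barcodes><barcode>78</barcode></barcodes></price>
  </prices></InputMessagePart_1>
</ns0:Root>";

    public static string Transform(string stylesheet, string input)
    {
        var xslt = new XslCompiledTransform();
        using (var xslReader = XmlReader.Create(new StringReader(stylesheet)))
        {
            xslt.Load(xslReader);
        }

        using (var inputReader = XmlReader.Create(new StringReader(input)))
        using (var output = new StringWriter())
        {
            // No extension objects or parameters are needed, so the argument list is null.
            xslt.Transform(inputReader, null, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(Transform(Stylesheet, Input));
    }
}
```

Note that both variables and the final for-each live in the same template, since XSL variables are only visible within the scope that declares them.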


Tuesday, May 29, 2007

Passing xml between xsl and a helper method

I'm using a lot of custom XSL. In fact, most of my transformations are custom XSL files, skipping the mapper altogether.

In that, I'm also quite frequently using helper classes in assemblies called directly from the xsl, on which I have posted before.

Mostly I'm passing simple values between the two, such as strings and ints, but every now and then I need to pass a whole XML structure between them: it might be a node-set from the XSL that needs to be processed by the helper method, it might be that the method returns an XML fragment the XSL then needs to iterate over or, most likely, it will be both.

The way to do it is quite simply to use the XPathNodeIterator class in the System.Xml.XPath namespace.

I did not know about this class until I had to do this bit, so it has to be worth posting (for myself if not for anyone else).

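To demonstrate its use I've created a helper method as follows:

// Receives a node-set from the XSL and returns another node-set to it.
public XPathNodeIterator processXml(XPathNodeIterator nodeset)
{
    // Position the iterator on the first node it holds.
    nodeset.MoveNext();
    XPathNavigator nav = nodeset.Current;

    // Select every element in the document that node belongs to.
    XPathNodeIterator i = nav.Select("//*");
    return i;
}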

As you can see this method doesn't really do much, but it already demonstrates how to receive an xml node-set and how to return one.

The helper method can work with the following XSL:

<xsl:variable name="MyNodeSet" select="<some xpath>"/>

<xsl:variable name="HelperResult" select="helpers:processXml($MyNodeSet)"/>

<xsl:value-of select="$HelperResult/<some other xpath>"/>
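Outside BizTalk, the same call can be tried from a console program by registering the helper as an extension object. This is only a sketch under assumptions: the namespace URI urn:example-helpers, the sample order XML and the names Helpers, HelperDemo and Run are all made up for illustration, and BizTalk itself wires helpers up differently.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

public class Helpers
{
    // Same shape as the helper above: node-set in, node-set out.
    public XPathNodeIterator processXml(XPathNodeIterator nodeset)
    {
        nodeset.MoveNext();
        XPathNavigator nav = nodeset.Current;
        return nav.Select("//*");
    }
}

public static class HelperDemo
{
    // The helpers prefix is bound to a made-up namespace URI.
    const string Stylesheet = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:helpers='urn:example-helpers'
    exclude-result-prefixes='helpers'>
  <xsl:output method='text'/>
  <xsl:template match='/'>
    <xsl:variable name='MyNodeSet' select='/order/line'/>
    <xsl:variable name='HelperResult' select='helpers:processXml($MyNodeSet)'/>
    <xsl:value-of select='count($HelperResult)'/>
  </xsl:template>
</xsl:stylesheet>";

    public static string Run(string input)
    {
        var xslt = new XslCompiledTransform();
        using (var xslReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(xslReader);
        }

        // Make the helper instance callable under the namespace the stylesheet uses.
        var args = new XsltArgumentList();
        args.AddExtensionObject("urn:example-helpers", new Helpers());

        using (var inputReader = XmlReader.Create(new StringReader(input)))
        using (var output = new StringWriter())
        {
            xslt.Transform(inputReader, args, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        // Writes the number of elements the helper handed back to the stylesheet.
        Console.WriteLine(Run("<order><line>a</line><line>b</line></order>"));
    }
}
```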


The next challenge was how to create a brand new XPathNodeIterator. Here's one way (assuming xmlDoc is an XmlDocument that contains the XML you need to return):

XPathNavigator xPathNav;
xPathNav = xmlDoc.CreateNavigator();
xPathNav.MoveToRoot();

// Put the XPath for the nodes you want to return between the quotes.
XPathExpression xPathExpr = xPathNav.Compile("");
XPathNodeIterator xPathNodI = xPathNav.Select(xPathExpr);
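Wrapped up as a small self-contained method, the idea might look like the sketch below. The names IteratorFactory and FromXml, the sample XML and the /prices/price expression are illustrative assumptions only.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

public static class IteratorFactory
{
    // Builds a brand new iterator over the nodes of xml that match xpath.
    public static XPathNodeIterator FromXml(string xml, string xpath)
    {
        var xmlDoc = new XmlDocument();
        xmlDoc.LoadXml(xml);

        // A navigator created from the document starts at its root.
        XPathNavigator nav = xmlDoc.CreateNavigator();
        XPathExpression expr = nav.Compile(xpath);
        return nav.Select(expr);
    }

    public static void Main()
    {
        XPathNodeIterator prices = FromXml(
            "<prices><price>10.5</price><price>24.70</price></prices>",
            "/prices/price");

        while (prices.MoveNext())
        {
            Console.WriteLine(prices.Current.Value);
        }
    }
}
```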
