BizTalk Server: Teach me something new about Flat Files (or not) – Delimited Flat File Schema Optimizations

We ended the last post asking: Do you think the Schema is entirely correct? Does it handle all files accordingly to the expected results? Or do we need to tune some of the properties of your schema…

When we start working with Flat File schemas for BizTalk, we may think that just using the Flat File wizard is going to be enough. Once the schema is created, we just need to create a custom pipeline and add the schema to the Document Schemas of a Flat File disassembler and it will be good… well, let me tell you that most probably is not enough and the reality may be very different.

A common mistake that we sometimes do is to rely on a single sample message to generate/validate our Flat File Schemas. And we ended up getting possibly several different errors in production due to unforeseen cases that end up not being supported in our current schema version.

So it is really important to get all the information, as possible, from the different teams, for example:

  • Are there any optional elements?
  • If so, will they be deleted from the message? Or will they be present without any content (blank/null)
  • Does the message will always end with a blank line? i.e., does the last line contain a Carriage Return/Line Feed (CR+LF)
  • What are the Data type of elements?

And many, many more….

The following walkthrough shows you some of the common scenarios that you can expect and how you can tune some of the properties of our Flat File Schemas to deal with them.

Cosmetic operations

While generating the Flat File Schema with the Flat File Schema Wizard you should always provide/set:

  • The correct or proper namespace for the schema
  • Rename the parent node, each record, element or attribute
  • And set the correct data type of each element or attribute

However, if you forgot to do that or if you make a mistake defining one of them, you can always easily rectify it in the Schema Editor.

I call it cosmetic operations because they don’t infer “any changes” in the translation rules generated by the wizard, they only changed the way the elements are presented in the message if you change the name of the element or attributes. Or they will have effect in the validation of the message if we change the data types.

You can accomplish that by:

  • Open the schema in BizTalk Editor.
  • Change the Schema Target Namespace
    • In the BizTalk Editor dialog, right-click the <Schema> node in the tree view and click Properties. The Properties dialog appears.
    • In the Properties dialog, the Target Namespace displays under the General category for the <Schema> node.
    • In the Target Namespace field, type the name of your new target namespace and click OK.
  • Change the Field Attribute or Field Element Name
    • Select the Field Attribute or Field Element node and then right-click and select the “Rename” option
    • Type the desired Field Attribute or Field Element name and press “Enter”
  • Change the Field Attribute or Field Element Data Type
    • Select the Field Attribute or Field Element node and then right-click and select the “Properties” option
    • In Properties window, set the “Data Type” property to the desired type
BizTalk Flat File Schema Cosmetic Operations

Validating Flat Files Schemas through the Visual Studio with different scenarios

Let us forget the runtime for now, and focus firstly on the development phase where we can test and validate our schemas directly from Visual Studio as explained, once again, in the previous post.

Dealing with Files ending with a newline or that doesn’t end with a newline

Initially, we had the following example:

Sandro;Pereira;1978;Crestuma;4415{CR}{LF}
Vânia;Braziela;1981;Vila Nova de Gaia;4400{CR}{LF}
José;Silva;1972;Crestuma;4415{CR}{LF}
Rui;Barbosa;1975;Lever;4415{CR}{LF}

Which was the base to generate our Flat File schema: “FF_DSYMBOL_PEOPLE.xsd”.

If you paid attention, while we were generating our Flat File Schema, that in the “Select Document Data” window of the BizTalk Flat File Schema Wizard, you can notice that the last line contains a Carriage Return/Line Feed (CR+LF), which means that if you open the file for example in Notepad, you will notice that the file ends with a blank line.

BizTalk Flat File Schema Wizard Select Document Data Page CR-LF

Note: You can check the file “FF_DSYMBOL_PEOPLE.txt” that is available with the source code.

So let us try with a simple change by removing the Carriage Return/Line Feed so that the files do not end with a newline:

Sandro;Pereira;1978;Crestuma;4415{CR}{LF}
Vânia;Braziela;1981;Vila Nova de Gaia;4400{CR}{LF}
José;Silva;1972;Crestuma;4415{CR}{LF}
Rui;Barbosa;1975;Lever;4415

Check the file: “FF_DSYMBOL_PEOPLE WithoutEnterLastLine.txt

If we now try this file instance against our Flat File Schema by:

  • In Solution Explorer, right-click the schema that you want to validate, and then click Properties.
  • In Properties window for the schema, for Input Instance Filename enter the name and path for the “FF_DSYMBOL_PEOPLE
    • Check more about Schema File Properties here.
  • WithoutEnterLastLine.txt” message instance, or browse to the file, select it, and then click OK.
  • Right-click the schema that you want to validate, and then click Validate Schema

You will notice that we will get the following error:

Invoking component…
<path>\DemoFile\FF_DSYMBOL_PEOPLE WithoutEnterLastLine.txt: error BEC2004: Unexpected data found while looking for:
‘\r\n’
The current definition being parsed is Person. The stream offset where the error occured is 133. The line number where the error occured is 4. The column where the error occured is 23.
<path>\FF_DSYMBOL_PEOPLE.xsd: error BEC2004: Validate Instance failed for schema FF_DSYMBOL_PEOPLE.xsd, file: <file:///path\FF_DSYMBOL_PEOPLE WithoutEnterLastLine.txt>.
Component invocation succeeded.

Visual Studio Flat-File Instance Validation Failed

If, on the other hand, we had originally generated the Flat File Schema using the message instance without the Carriage Return/Line Feed at the end.

BizTalk Flat File Schema Wizard Select Document Data Page option

And if we know to try it with an instance containing a blank line at the end of the message we would receive a similar but different error:

Invoking component…
<path>\FF_DSYMBOL_PEOPLE.txt: error BEC2004: Unexpected end of stream while looking for:
‘;’
The current definition being parsed is People. The stream offset where the error occured is 139. The line number where the error occured is 5. The column where the error occured is 0.
<path>\FF_DSYMBOL_PEOPLE_Advance.xsd: error BEC2004: Validate Instance failed for schema FF_DSYMBOL_PEOPLE_Advance.xsd, file: <file:///path\FF_DSYMBOL_PEOPLE.txt>.
Component invocation succeeded.

Visual Studio Flat-File instance Validation Failed

I’ve seen situations/solutions where there were created pipeline components to address and solve these type of problems, which isn’t the best and the proper solution!

In both cases the solution for solving these problems is the same: ensure that the “Allow Message Breakup At Infix Root” property on the Schemas node is set to “true” by:

  • Double-click the schema to open the item in the Schema Editor;
  • In BizTalk Editor, right-click on the Schema node and select the Properties option
  • In Properties window for the schema, set the “Allow Message Breakup At Infix Root” property to “true”
Visual Studio Flat-File instance Validation Failed solved

After making this change, if we try to validate the two previous messages instances, we will notice that now everything will work perfectly.

So if you notice in the properties of our Schema:

  • The “Child Order” property of the People Root Node is set as “Postfix
  • And the “Child Order” property of the “Person” record is set as “Infix

What does it mean? Child Order must always have a value. The possible values are:

  • Prefix: Specifies that the delimiter appears before the data that it is delimiting.
  • Postfix: Specifies that the delimiter appears after the data that it is delimiting.
  • Infix: Specifies that the delimiter appears between different items of data, resulting in one less delimiter than when Prefix or Postfix is chosen.
  • Default Child Order: Sets the child_order attribute to “default”, specifying that the child order specified using the Default Child Order property should be used.
  • And Conditional Default: Removes the corresponding annotation in the XSD representation of the schema, if any, resulting in the following behavior:
    • If the Record node has a tag identifier, then the child order will be prefixed.
    • Otherwise, the child order will be infix.

There are two scenarios related to delimited flat files for which special considerations apply when setting the Child Order property.

  • The first such scenario concerns situations in which the flat-file document has a header, a body, and optionally, a trailer.
  • The second scenario is that this property must be set according to what the runtime components except for the nodes

In our instance message, the data/delimiter sequence in the conceptual loop would merely be one or more Person records. Of course, only in the case where there are more than one person records would there be a delimiter to separate them. In that case, the number of delimiters is one less than the sets of things being delimited, and the delimiters are located between the delimited items in a structure known as Infix.

  • Number delimiter = Number Records – 1

Which is not the case in all our messages instances. Our first instance has the same number of “valid records” and delimiters.

By setting “Allow Message Breakup At Infix Root” property to “true”, this tells the parser whether to allow message breakup and will ensure that the Infix stream is processed properly by the FF Disassembler.

Dealing with Optional Elements

Using Suppress Empty Nodes property

So let’s imagine a different scenario, this time we will deal with optional elements. In our first scenario while discussing with our time we realize that the element “Birthyear” was appearing empty in several messages, and it should be considered as an optional element:

Sandro;Pereira;1978;Crestuma;4415
Vânia;Braziela;1981;Vila Nova de Gaia;4300
José;Silva;;Crestuma;4415
Rui;Barbosa;1975;Lever;4415

By default, we need to have a complete instance of the message, for us developers be able to create all the transformation rules, which means that by default, when we finish creating the Flat File Schema through Flat File Schema Wizard tool, none of the records, elements or attributes are optional.

All of these changes/optimizations/configurations, they have to be made after the Flat File Schema been created.

You can always modify an existing Flat File schema that was created by the wizard (name of the elements, data type and so on). So to accomplish that we need to:

  • Double-click the schema to open the item in the Schema Editor;
  • In BizTalk Editor, expand People root node, expand Person record and right-click in the Birthyear element and select the Properties option
  • In Properties window, set the “Min Occurs” property to “0”

If we now try to validate the previous message instance (“FF_DSYMBOL_PEOPLE WithOptional.txt”), we ended up receiving another error:

Invoking component…
Validation generated XML output <file:///<path>\FF_DSYMBOL_PEOPLE_Advance_output.xml>.
<path>\FF_DSYMBOL_PEOPLE_Advance_output.xml: error BEC2004: The ‘Birthyear’ element is invalid – The value ” is invalid according to its datatype ‘http://www.w3.org/2001/XMLSchema:int’ – The string ” is not a valid Int32 value.
<path>\FF_DSYMBOL_PEOPLE_Advance.xsd: error BEC2004: Validate Instance failed for schema FF_DSYMBOL_PEOPLE_Advance.xsd, file: file:///path\FF_DSYMBOL_PEOPLE WithOptional.txt.
Component invocation succeeded.

Visual Studio Flat-File Instance Validation Failed Element String Not valid

And of course the reason is obvious, an empty string is not a valid integer.

Making this element nillable doesn’t make any difference, because in both situations the parser will continue to add an empty element:

<Person xmlns="">
    <Name>José</Name>
    <Surname>Silva</Surname>
    <Birthyear></Birthyear>
    <Address>Crestuma</Address>
    <ZipCode>4415</ZipCode>
</Person>

To solve that situation there are a few additional flat-file node properties that we can use. You can see the full list here:

And in our case, we are looking for the “Suppress Empty Nodes” property (or suppress_empty_nodes). By setting this property to “True” it will remove all the empty nodes from the XML stream.

  • This can be used to eliminate fields that are empty after being parsed, but the XSD type doesn’t allow empty values.

To accomplish that we need to:

  • In BizTalk Editor, right-click on the Schema node and select the Properties option
  • In Properties window for the schema, set the “Suppress Empty Nodes” property to “true”
Visual Studio Flat-File Schema Suppress Empty Nodes

After making this change, if we try to validate the previous message instance, we will notice that now everything will work perfectly.

Note: When using Delimited files, the optional element in the beginning or in the middle of each line (record node) need always to be present in the text with empty values (;;) otherwise the parsing rules will fail or will perform an incorrect translation of the message.

If the “Birthyear” is set as a string and we receive this type of messages

Sandro;Pereira;1978;Crestuma;4415
Vânia;Braziela;1981;Vila Nova de Gaia;4300
José;Silva;Crestuma;4415
Rui;Barbosa;1975;Lever;4415

The message will be translated incorrectly and there is nothing you can do about it, the end result would look like this:

<Person xmlns="">
    <Name>José</Name>
    <Surname>Silva</Surname>
    <Birthyear>Crestuma</Birthyear>
    <Address>
      4415
      Rui
    </Address>
    <ZipCode>Barbosa;1975;Lever;4415</ZipCode>
  </Person>

Dealing with Optional Elements (last element with the data type string)

Continue in the same scenario/message type, after a further series of tests, we noticed another unforeseen situation. Once again several messages were being executed incorrectly because some of the element was not being parsed the element “ZipCode” – note that this element is a string

Sandro;Pereira;1978;Crestuma;4415
Vânia;Braziela;1981;Vila Nova de Gaia;4300
José;Silva;1972;Crestuma
Rui;Barbosa;1975;Lever;4415

Note: Contrary to what happens with elements at the beginning and end of the message, if the last element is an optional element, then it is not required to appear in the text (see line 3 – it has one element less)

We don’t receive any error message in BizTalk Server runtime but we notice that we messages are being integrated with incorrect data. After further analysis, let’s take the sample above, we notice that the message is being translated incorrectly and the last node is being added to the “ZipCode” element and not in a different Person node as supposed

<Person xmlns="">
    <Name>José</Name>
    <Surname>Silva</Surname>
    <Birthyear>1972</Birthyear>
    <Address>
      Crestuma
      Rui
    </Address>
    <ZipCode>Barbosa;1975;Lever;4415</ZipCode>
  </Person>

To solve this problem we once again need set the element, in this case, “ZipCode” as optional:

  • Double-click the schema to open the item in the Schema Editor;
  • In BizTalk Editor, expand People root node, expand Person record and right-click in the ZipCode element and select the Properties option
  • In Properties window, set the “Min Occurs” property to “0”
<People xmlns="http://DelimitedBySymbols.FF_DSYMBOL_PEOPLE_Advance">
  <Person xmlns="">
    <Name>Sandro</Name>
    <Surname>Pereira</Surname>
    <Birthyear>1978</Birthyear>
    <Address>Crestuma</Address>
    <ZipCode>4415</ZipCode>
  </Person>
  <Person xmlns="">
    <Name>Vânia</Name>
    <Surname>Braziela</Surname>
    <Birthyear>1981</Birthyear>
    <Address>Vila Nova de Gaia</Address>
    <ZipCode>4300</ZipCode>
  </Person>
  <Person xmlns="">
    <Name>José</Name>
    <Surname>Silva</Surname>
    <Birthyear>1972</Birthyear>
    <Address>Crestuma</Address>
  </Person>
  <Person xmlns="">
    <Name>Rui</Name>
    <Surname>Barbosa</Surname>
    <Birthyear>1975</Birthyear>
    <Address>Lever</Address>
    <ZipCode>4415</ZipCode>
  </Person>
</People>

There is a Flat File Schema property call “allow_early_termination” (Early Terminate Optional Fields) that is a Boolean property. The use of this property is this situation is incorrect and does produce any changes in outcome. We will address and explain this property at another time.

Some references:

Related posts:

Download

You can download the source code from:
Flat Files Schemas: Dealing with Files Delimited by SymbolsFlat Files Schemas: Dealing with Files Delimited by Symbols
GitHub

#1 all-in-one platform for Microsoft BizTalk Server management and monitoring
Author: Sandro Pereira

Sandro Pereira lives in Portugal and works as a consultant at DevScope. In the past years, he has been working on implementing Integration scenarios both on-premises and cloud for various clients, each with different scenarios from a technical point of view, size, and criticality, using Microsoft Azure, Microsoft BizTalk Server and different technologies like AS2, EDI, RosettaNet, SAP, TIBCO etc. He is a regular blogger, international speaker, and technical reviewer of several BizTalk books all focused on Integration. He is also the author of the book “BizTalk Mapping Patterns & Best Practices”. He has been awarded MVP since 2011 for his contributions to the integration community.

3 thoughts on “BizTalk Server: Teach me something new about Flat Files (or not) – Delimited Flat File Schema Optimizations”

  1. Hi Sandro,
    I tried your article of flat file to XML and it works fine.
    Do you have a example of other way round whic is converting XML to CSV,
    I am new to Biztalk so some pointer will be helpful.

    Thanks
    S

  2. Hi Sandro,

    Brilliant article! Could you also let us know a way to get 0 or any other char in place of missing characters in fixed-length columns?

    like if a column width is 5 and the data coming is 12, so the output should be 00012

Leave a Reply

Your email address will not be published. Required fields are marked *

turbo360

Back to Top