Transformations are one of the most common components in integration processes. They act as essential translators that decouple the different systems being connected. We usually associate document transformations with BizTalk Server maps, but in reality there are two types of transformations:
- Semantic Transformations: This type of transformation usually occurs only in BizTalk maps. Here the document maintains the same represented syntax (XML) but changes its semantics (data content). This type of transformation is typically one-way since when we add and aggregate small parts of the information that compose the document into another different document, we may miss important details for its reconstruction.
- Syntax Transformations: This type of transformation occurs in the receive or send pipelines and aims to transform a document into another representation, e.g., CSV to XML. Here the document maintains the same data (semantics) but changes the represented syntax, i.e., we translate the document but typically don’t modify its structure. Usually, this type of transformation is bidirectional: since we still have the same semantic content, we can apply the same transformation logic in reverse and obtain the document in its original format. Common examples of these transformations are also conversions between HL7 and XML or EDI and XML.
These two types are sometimes also called data transformation and data translation, respectively.
This blog is an introductory note for those taking the first steps in this technology.
What are Flat Files?
One of the oldest and most common ways to represent messages is using text files (Flat Files) such as CSV (Comma Separated Values) or TXT files, many of which are custom-made for their systems.
Over time, XML and, nowadays, JSON have become the standard message formats because of their widespread use by major corporations and open-source development efforts. However, do not be fooled into thinking that flat-file messages are outdated and rarely used. A good example is EDI messages, which are used extensively by large companies, so it is often necessary to transform text files into XML and vice versa.
In the context of Microsoft BizTalk Server, a flat-file instance message is a text file that can contain three logical parts:
- A header.
- A body.
- And a trailer.
In that order. Of course, both the header and the trailer are optional. The following example shows a flat-file instance message consisting of all three parts: a two-line header, a one-line body (PO,1,BOOK,4415), and a one-line trailer:
Sandro Pereira
Porto, Portugal
PO,1,BOOK,4415
TRANS-1
For the flat file disassembler to correctly distinguish the header, the body, and the trailer of a flat-file instance message, you must create and configure a separate schema for each of them.
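To make the three-part layout concrete, here is a minimal Python sketch that splits the sample message above into header, body, and trailer. It is only an illustration: the function name and the fixed line counts (two header lines, one trailer line) are assumptions made for this example, whereas BizTalk finds the boundaries from the header, document, and trailer schemas.

```python
def split_flat_file(text, header_lines=2, trailer_lines=1):
    """Split a flat-file message into header, body, and trailer line lists.

    The line counts are assumptions for this example; BizTalk derives the
    boundaries from the configured schemas instead.
    """
    lines = text.splitlines()
    if len(lines) < header_lines + trailer_lines + 1:
        raise ValueError("message is too short to contain a body")
    body_end = len(lines) - trailer_lines
    header = lines[:header_lines]
    body = lines[header_lines:body_end]
    trailer = lines[body_end:]
    return header, body, trailer


message = "Sandro Pereira\nPorto, Portugal\nPO,1,BOOK,4415\nTRANS-1\n"
header, body, trailer = split_flat_file(message)
```

With the sample message, the header holds the name and address lines, the body holds the PO line, and the trailer holds TRANS-1.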
Flat File Message Headers
The Flat file disassembler’s parsing of the optional flat-file instance message header is controlled by the flat file schema that you have configured in the Header schema design-time property of the flat file disassembler or the XMLNORM.HeaderSpecName message context property. If you have not specified a schema using one of these two methods, the flat file disassembler assumes that the flat file instance message does not contain a header.
For outbound flat-file instance messages, you can configure the flat file assembler to produce a header by specifying the appropriate schema in its Header Specification Name design-time property or the XMLNORM.HeaderSpecName message context property.
Data found in inbound flat-file instance message headers can be preserved and utilized in two different ways.
- First, flat-file instance message headers can be saved in their entirety within the message context of the body for later restoration as the header of a corresponding outbound flat-file instance message. You can use the Preserve header property of the Flat file disassembler in the receive pipeline to specify that the header should be preserved. If a header is specified in the Flat file assembler, the preserved header will be used on the outbound message.
- Second, individual data items from a flat-file instance message header can be copied to the message context associated with the flat-file message body by specifying property promotion for one or more of the fields in the corresponding schema.
Flat File Message Bodies
A flat-file instance message body, which is required, is what the Flat file disassembler processes into one or more XML instance messages. To know what data to expect in an inbound flat-file instance message body, you must configure the Flat file disassembler with the flat file schema corresponding to the body. You can specify the schema by using the Document schema design-time property of the flat file disassembler or the XMLNORM.DocumentSpecName message context property. Because flat file instance messages must have a body part, you must configure the appropriate schema using one of these two methods.
For outbound flat-file instance messages, the Flat file assembler can dynamically determine the appropriate flat-file schema for the body of the instance message. The Flat file assembler determines the appropriate schema from the message type, which is a combination of the target namespace and the root element’s name, both of which must be present in the XML version of the outbound message. Alternatively, you can explicitly configure the flat-file schema to be used by configuring the Document schema design-time property of the Flat file assembler or the XMLNORM.DocumentSpecName message context property.
Data found in inbound flat-file instance message bodies can be copied to the corresponding message context by specifying property promotion in the flat-file schema being used by the Flat file disassembler to process the inbound instance message. Likewise, data in the message context can be copied back into outbound flat-file instance messages by specifying property demotion in the flat-file schema being used by the Flat file assembler to process the outbound message.
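As a rough mental model of property promotion and demotion, the following Python sketch copies selected fields into a separate context dictionary and writes context values back into a message. The function names and the field names (type, line, item, zipCode) are hypothetical and chosen only for illustration; in BizTalk, the fields to promote or demote are marked in the flat-file schema, not in code.

```python
def promote(fields, promoted_names):
    """Copy the selected body fields into a new message-context dictionary."""
    return {name: fields[name] for name in promoted_names if name in fields}


def demote(fields, context, demoted_names):
    """Return a copy of the fields with context values written back in."""
    updated = dict(fields)
    for name in demoted_names:
        if name in context:
            updated[name] = context[name]
    return updated


# Hypothetical field names for the PO,1,BOOK,4415 body line.
body_fields = {"type": "PO", "line": "1", "item": "BOOK", "zipCode": "4415"}
context = promote(body_fields, ["type", "zipCode"])

# An outbound message whose zipCode is filled from the context.
outbound = demote({"type": "PO", "zipCode": ""}, context, ["zipCode"])
```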
Flat File Message Trailers
As with flat-file instance message headers, the parsing of the optional flat-file instance message trailer by the Flat file disassembler is controlled by the flat file schema that you have configured in the Trailer schema design-time property of the flat file disassembler or the XMLNORM.TrailerSpecName message context property. If you have not specified a schema using one of these two methods, the Flat file disassembler will assume that the flat file instance message does not contain a trailer.
Unlike flat-file instance message headers, flat-file instance message trailers can neither be saved and restored as a single unit nor can they use property promotion to copy individual data items to the message context associated with the flat-file instance message body. However, a trailer can be added to an outbound flat file instance message by specifying the appropriate schema in the Trailer schema design-time property of the flat file assembler or the XMLNORM.TrailerSpecName message context property. The data that constitutes the variable portion of the trailer can be specified using property demotion from the message context of the flat-file instance message body or by specifying default or fixed values in the corresponding schema.
Flat-File Schema Types
Within a particular part of a flat-file instance message, different data items are grouped into records, which themselves can contain sub-records and, ultimately, individual data items known as fields. These records and fields are distinguished from each other using one of two different basic methodologies.
- The first methodology, known as positional, defines each data item as a pre-established length, with pad characters being used to bring a shorter item of data up to its expected length.
- The second methodology, known as delimited, uses one or more special characters to separate items of data from each other. This methodology avoids the need for otherwise superfluous pad characters, but introduces some special considerations when the data itself contains the character or sequence of characters being used as a delimiter.
Positional Flat Files
Positional records within a flat-file instance message contain individual fields (items of data) that are each of a predefined length. The fields are parsed according to these lengths. For example, consider the following positional record from a flat-file instance message containing an id, country code, client name, and country name:
01 PT Sandro Pereira Portugal
A reasonable definition for this record in a flat-file schema can be described as follows:
- A positional record named Client contains the following fields:
- An attribute named id that is left-aligned, 3 characters in length, with a zero character offset.
- An element named countryCode that is left-aligned, 3 characters in length, with a zero character offset.
- An element named name that is left-aligned, 37 characters in length, with a zero character offset.
- An element named country that is left-aligned, and the length is until the end of the line.
Given these record and field definitions, the Flat file disassembler will produce the following XML equivalent of this record:
<Client id="01 ">
<countryCode>PT </countryCode>
<name>Sandro Pereira </name>
<country>Portugal</country>
</Client>
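The slicing that this record definition describes can be sketched in a few lines of Python. This is a simplified illustration, not how the Flat file disassembler is implemented: the layout list and both function names are assumptions, and the sample record is built in code so that the pad characters are explicit.

```python
# Field layout from the description above: (name, length); None = rest of line.
CLIENT_LAYOUT = [("id", 3), ("countryCode", 3), ("name", 37), ("country", None)]


def parse_positional(line, layout):
    """Slice a positional record into fields, keeping the pad characters."""
    fields = {}
    position = 0
    for name, length in layout:
        if length is None:
            fields[name] = line[position:]
            position = len(line)
        else:
            fields[name] = line[position:position + length]
            position += length
    return fields


def build_positional(fields, layout, pad=" "):
    """Left-align each value and pad it up to its fixed length."""
    parts = []
    for name, length in layout:
        value = fields[name]
        parts.append(value if length is None else value.ljust(length, pad))
    return "".join(parts)


record = build_positional(
    {"id": "01", "countryCode": "PT", "name": "Sandro Pereira", "country": "Portugal"},
    CLIENT_LAYOUT,
)
client = parse_positional(record, CLIENT_LAYOUT)
```

Note that the parsed values still carry their trailing pad characters, which is why the XML above shows "01 " and "PT " rather than the trimmed values.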
There are several considerations related to positional records that will affect how the record is parsed when received and constructed when sent, including:
- The character used to fill the unused portion of each field, known as the pad character.
- An optional tag within the record that can be used to distinguish the record from other similar records. Tags usually occur at the beginning of the record but are allowed anywhere within it. Positional records can be defined with or without a tag, but once defined, the tag must be present or absent according to the definition.
- How data is justified within a fixed length field relative to the accompanying pad characters.
- Positional records nested within other positional or delimited records.
- Positional records with field lengths specified as a specific number of bytes rather than a specific number of characters.
Notes:
- If your flat file contains both delimited and positional records, you must set the Structure property of the root node to Delimited and the Structure property of subordinate record nodes to either Delimited or Positional as appropriate.
- Fields in positional records have a limit of 50000000 characters.
Delimited Flat Files
Delimited records within a flat-file instance message contain nested records and/or individual fields (items of data) that are separated by a predefined character or set of characters. The fields are parsed according to these separating delimiters. For example, consider the following delimited records from a flat-file instance message, which contain three client lines to add to our internal system hypothetically:
Sandro;Pereira;1978;Crestuma;4415
José;Silva;1972;Crestuma;4415
Rui;Barbosa;1975;Lever;4415
A reasonable definition for this record in a flat-file schema can be described as follows:
- A delimited repeating record named Client with child delimiter {CR}{LF}
- And delimited elements with child delimiter ;
- firstName
- lastName
- birthYear
- city
- zipCode
Given these record and field definitions, the Flat file disassembler produces the following XML equivalent of these records.
<Client>
<firstName>Sandro</firstName>
<lastName>Pereira</lastName>
<birthYear>1978</birthYear>
<city>Crestuma</city>
<zipCode>4415</zipCode>
</Client>
<Client>
...
</Client>
...
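For readers who prefer code, the same conversion can be approximated in Python with the standard xml.etree.ElementTree module. This is a sketch under assumptions: the function name is made up, and the wrapper root element Clients is hypothetical (the article does not name the root); the delimiters and field names come from the definition above.

```python
import xml.etree.ElementTree as ET

FIELD_NAMES = ["firstName", "lastName", "birthYear", "city", "zipCode"]


def delimited_to_xml(text, record_delimiter="\r\n", field_delimiter=";"):
    """Turn delimited records into <Client> elements under a wrapper root."""
    root = ET.Element("Clients")  # hypothetical root name, not from the article
    for record in text.split(record_delimiter):
        if not record:
            continue  # skip the empty piece after a trailing delimiter
        values = record.split(field_delimiter)
        if len(values) != len(FIELD_NAMES):
            raise ValueError(
                f"expected {len(FIELD_NAMES)} fields, got {len(values)}"
            )
        client = ET.SubElement(root, "Client")
        for name, value in zip(FIELD_NAMES, values):
            ET.SubElement(client, name).text = value
    return root


flat_file = (
    "Sandro;Pereira;1978;Crestuma;4415\r\n"
    "José;Silva;1972;Crestuma;4415\r\n"
    "Rui;Barbosa;1975;Lever;4415\r\n"
)
clients = delimited_to_xml(flat_file)
xml_text = ET.tostring(clients, encoding="unicode")
```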
There are several considerations related to delimited records that will affect how the record is parsed when received and constructed when sent, including:
- The character or characters used to override the interpretation of delimiters so that they are treated as part of the data (escape characters).
- An optional tag at the beginning of the record that can be used to distinguish the record from other similar records.
- How data is justified within fields with minimum lengths relative to the accompanying pad characters.
- Positional records nested within other delimited records.
- How data is justified within a fixed length field relative to its accompanying pad characters.
- Preservation and suppression of delimiters when flat-file messages are received and sent.
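The first consideration in this list, a delimiter that appears inside the data, is easy to see in a small Python sketch. The backslash escape character and the function name are assumptions for illustration only; in BizTalk, the escape character is whatever the flat-file schema defines.

```python
def split_escaped(record, delimiter=";", escape="\\"):
    """Split a record on the delimiter, honoring an escape character."""
    fields = []
    current = []
    chars = iter(record)
    for ch in chars:
        if ch == escape:
            # Keep the next character literally, even if it is a delimiter.
            current.append(next(chars, ""))
        elif ch == delimiter:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


# The escaped ';' stays inside the first field instead of splitting it.
fields = split_escaped("Porto\\; Portugal;4415")
```

A plain record.split(";") would have produced three fields here, silently corrupting the data.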
Notes:
- If your flat file contains both delimited and positional records, you must set the Structure property of the root node to Delimited and the Structure property of subordinate record nodes to either Delimited or Positional as appropriate.
- Delimited fields in flat files have a limit of 50000000 characters.
How are text files (Flat Files) processed by BizTalk?
Internally, BizTalk “prefers” to use the message type XML. If messages are in XML format, BizTalk “offers” numerous automatisms that are very useful in these environments, such as message routing based on a particular field (promoted property), tracking and analysis of multidimensional values and dimensions with BAM (Business Activity Monitoring), or making logical decisions within orchestrations (business processes) using elements of the message.
If messaging is the foundation of BizTalk Server, the message schemas are the bedrock on which messaging is built. Fortunately, BizTalk supports converting text files to XML simply and intuitively using Flat File Schemas, which are simple XML schemas (XSD) with specific annotations. At first glance, this may seem strange because XML Schemas (XSD) are used to describe XML files. However, BizTalk uses them as metadata to describe XML documents and text files (flat files).
The trick is that all the necessary information, such as the delimiter symbols or the element size in a positional file, i.e., the definition of the parsing rules (transformation rules), is embedded in the form of annotations in the XML Schema (XSD), thereby simplifying the reuse of these schemas in different parts of the process. The document can be translated back into a flat file at any point because the definition is declarative and symmetric.
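The idea of a declarative and symmetric definition can be illustrated with a toy Python example in which a single description drives both parsing and serialization. The SPEC dictionary and the two functions are hypothetical stand-ins; a real flat-file schema is an annotated XSD, not a Python dictionary.

```python
# One declarative description drives both directions.
SPEC = {
    "record": "Client",
    "field_delimiter": ";",
    "fields": ["firstName", "lastName", "birthYear", "city", "zipCode"],
}


def parse(line, spec):
    """Flat file to structured data, using only what the spec declares."""
    values = line.split(spec["field_delimiter"])
    return dict(zip(spec["fields"], values))


def serialize(record, spec):
    """Structured data back to a flat-file line, using the same spec."""
    return spec["field_delimiter"].join(record[name] for name in spec["fields"])


line = "Rui;Barbosa;1975;Lever;4415"
parsed = parse(line, SPEC)
round_trip = serialize(parsed, SPEC)
```

Because both functions read the same description, changing the delimiter or the field order in one place changes parsing and serialization together.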
Where can syntax transformations occur?
This type of transformation – Syntax Transformations – can occur in receive or send pipelines. Usually, text files (Flat Files) are processed at runtime as follows:
- The Flat Files are received by an adapter associated with a receive location (Folder in File System, for example).
- A pipeline configured in the receive location will be responsible for transforming the Flat File into its equivalent XML.
- One or more subscribers interested in the message, such as an orchestration, will subscribe to the XML document, and this message will go through the business process. Note that in a pure messaging scenario, orchestrations are unnecessary.
- If and when necessary, BizTalk can send XML messages again as text files (Flat Files) by using another pipeline in the send ports, which will be responsible for transforming the XML into its equivalent, the Flat File.
As the image below shows:
The receive pipeline consists of four stages, and syntax transformations may occur in two of them:
- Decode Stage: This stage is used for components that decode or decrypt the message. The MIME/SMIME Decoder pipeline component or a custom decoding component should be placed in this stage if the incoming messages need to be decoded from one format to another. The syntax transformations can occur in this stage through a custom component.
- Disassemble Stage: This stage is used for components that parse or disassemble the message. The syntax transformations should occur at this stage. In the example that will be demonstrated in this article, we will use the “Flat file disassembler” to transform a text file into XML.
- Validate Stage: This stage is used for components that validate the message format. A pipeline component processes only messages that conform to the schemas specified in that component. If a pipeline receives a message whose schema is not associated with any component in the pipeline, that message is not processed. Depending on the adapter that submits the message, the message is either suspended or an error is issued to the sender.
- Resolve Party Stage: This stage is a placeholder for the Party Resolution Pipeline Component.
Send pipelines consist of three stages, and syntax transformations may also occur in two of them:
- Pre-assemble Stage: This stage is a placeholder for custom components that should perform some action on the message before the message is serialized.
- Assemble Stage: Components in this stage are responsible for assembling or serializing the message and converting it to or from XML. The syntax transformations should occur at this stage.
- Encode Stage: This stage is used for components that encode or encrypt the message. Place the MIME/SMIME Encoder component or a custom encoding component in this stage if message signing is required. The syntax transformations can occur in this stage through a custom component.
Hope you find this useful!