Today I want to talk about a very common problem that can occur when we are invoking BizTalk Orchestrations exposed as synchronous services (Request-Response Receive ports):
“System.Net.WebException: The operation has timed out”
In my case, I was trying to invoke an orchestration exposed as a WCF service. Of course, this can be a very straightforward problem, most of the time easy to detect and probably also easy to fix… but sometimes BizTalk likes to play with us and throws a few good surprises our way…
Typical Causes and Solutions
This problem is typically associated with network issues or a lack of error handling inside orchestrations:
- You are trying to invoke an external system inside the orchestration and it takes too long to respond, so naturally we get a timeout error.
- For a web service that takes a long time to return a response, you can try setting the SOAP.ClientConnectionTimeout context property to address this problem.
- A high volume of requests can overload the external service with too many concurrent calls, or you can hit the limit of maximum connections to a certain address; naturally, this affects performance and the service will probably take too long to respond.
- There’s a nice post from Richard Seroter on how to avoid service timeouts in high-volume orchestration scenarios, where you can find several ways to address this.
- This error can also occur when BizTalk performance degrades and it starts responding slowly. If the BizTalk jobs are not properly configured and running, the BizTalk databases can grow excessively, everything responds very slowly, and we get stuck with this error.
- Validate the jobs and the databases; if necessary, configure the jobs, clean the databases with the Terminator tool, and probably also shrink the databases to resolve the problem.
- You can also get this error because you don’t handle errors inside your orchestrations. For example, if you don’t handle WCF fault messages from your external service, or you invoke C# code that raises an exception and you are not handling these situations in your orchestration, the orchestration will be suspended and the client will get a timeout exception.
- You can prevent this by handling errors inside orchestrations. See BizTalk Training – Handling Exceptions inside orchestration
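As a reference for the timeout point above: when calling a service through the classic SOAP adapter, the timeout can be raised per message in a Message Assignment shape. A minimal sketch in XLANG/s, assuming hypothetical message names `OutboundMsg` and `InboundMsg` (the property value is in milliseconds):

```
// Message Assignment shape (XLANG/s)
// OutboundMsg / InboundMsg are hypothetical names for this sketch.
OutboundMsg = InboundMsg;
// Raise the SOAP client connection timeout for this call to 5 minutes
// (value in milliseconds).
OutboundMsg(SOAP.ClientConnectionTimeout) = 300000;
```

For WCF send ports, the rough equivalent is the sendTimeout value on the adapter’s binding configuration, which you can change in the send port’s transport properties.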
Nothing fits the problem. What can it be?
But… what about when all of this has been checked and none of it fits the problem at hand? What can it be? Before explaining, let me describe my scenario:
- I have a simple demo service that receives a small message, invokes an external service, and returns the response to the source system.
- Because of API limitations, I decided to invoke this external service from C# code and handle exceptions inside the orchestration.
- By using HAT and debugging in the DEV environment, I knew that the external service was giving me a known error, which I was catching and re-throwing in order to create a response with the error description inside the orchestration, to be returned to the source system.
So, nothing too fancy; very simple stuff. However, every time I tested it, the orchestration got stuck in the MessageBox in a suspended state… the external service was invoked, the error was caught in the code, logged, and the exception was re-thrown to the orchestration, and after that, nothing… the engine seemed unaware of what was happening. Crazy, I know.
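The catch-log-rethrow pattern I used can be sketched in C# roughly like this (ServiceClient, Invoke, and the event source name are hypothetical placeholders, not the real API from my solution):

```
using System;
using System.Diagnostics;

public static class ExternalServiceHelper
{
    // Invokes the external service and re-throws any failure so the
    // orchestration's Catch Exception block can build a fault response.
    // ServiceClient/Invoke stand in for the real client API in this sketch.
    public static string CallExternalService(string request)
    {
        try
        {
            using (var client = new ServiceClient())
            {
                return client.Invoke(request);
            }
        }
        catch (Exception ex)
        {
            // Log first, then re-throw: the orchestration catches this
            // exception and returns the description to the source system.
            EventLog.WriteEntry("MyApplication", ex.Message,
                EventLogEntryType.Error);
            throw new ApplicationException("Fault Response: " + ex.Message, ex);
        }
    }
}
```

The key design point is that the helper never swallows the exception: it must reach the orchestration so the exception handler there can produce the error response for the caller.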
When I investigated the Event Viewer to try to find more information I found this description:
“xlang/s engine event log entry: Uncaught exception (see the ‘inner exception’ below) has suspended an instance of service ‘MyOrchestration(728766c9-9df2-609b-004e-fa2e7c3079c4)’.
The service instance will remain suspended until administratively resumed or terminated.
If resumed the instance will continue from its last persisted state and may re-throw the same unexpected exception.
InstanceId: 4dfd6041-d52f-4110-8d0d-92efc48f0c38
Shape name: InvokeExternalService Shape
ShapeId: eff276fd-f289-4778-ba47-ff66309ae8c6
Exception thrown from: segment 2, progress 8
Inner exception: Fault Response: My Error Description”
The first thing I thought was that I had incorrectly published some resource (DLL)… but after validating and publishing the solution again, the problem persisted.
Cause
The Orchestration Designer can play some tricks on developers. So be very careful when you copy shapes from one orchestration to another!
What I did was open a similar solution and copy the main scope (the body of the orchestration) into a new solution I had created. This also copies all the shapes inside the scope… and of course, I changed the shapes to fit my new requirements, deleting some shapes and changing the code inside others. Why? To be faster and not lose much time creating the main skeleton of the orchestration.
But be aware that the Orchestration Designer doesn’t like some of these operations (to me, this is a bug in the Orchestration Designer), and for some reason the designer doesn’t interpret the copied shapes correctly (the “refactoring” or the “graphical interpretation”). The solution compiles fine, but at runtime we get stuck with the error “The service instance will remain suspended until administratively resumed or terminated. If resumed the instance will continue from its last persisted state and may re-throw the same unexpected exception.”
I don’t know if anyone else has come across this behavior before, but I have already experienced it twice.
Solution
To solve this strange behavior you need to redesign the same Orchestration flow by:
- Drag new shapes onto the Orchestration Designer and copy the exact same code from the existing shapes into the new ones… you can even give them the same names!
- At the end, delete the existing shapes (the ones that had been copied).
- Compile and deploy the project again.
Without doing anything else, this solved my problem.
Hi Sandro,
I have noticed this “broken copy” behavior of the Orchestration Editor with Send shapes.
Also, the port types in the Orchestration View do not show the fully qualified .NET name. And when we copy them, they keep the old names (names with the previous orchestration name as a prefix, something like that), and we get a naming problem.
I worked this out by looking at the XLANG/s orchestration code and checking the full names.
All of these are bugs, 100%.
Hi Leonid,
Nice to know and thanks for the comment!
I like the name you gave it and we can call this the “broken copy” bug 😀
Thanks Sandro, this information is very clear.