UTM incorrectly detects interface error state

I recently added a 4.5 Mbps circuit (3 bonded T1s, waiting for 10X10 fiber build-out) to an existing 3 Mbps circuit and moved a couple of high-traffic VPN sites to the 4.5 Mbps interface. About once a week, the UTM reports that the new circuit is down (status shows an error condition), which causes the VPNs to fail over to the original 3 Mbps circuit. This disrupts the RDP sessions running over those VPNs. Typically the service is down for 5 to 10 minutes, then comes back up.

The vendor shows the circuits clean, with no errors. They did, however, say that they were seeing periods of up to 98% utilization.
What is the best-practice configuration for uplink monitoring to ensure that I don't fail over unless the circuit is really down? It would be really rare for 3 T1s to go down at the same time. I have switched uplink monitoring to manual and set ping to 60 seconds with a 10-second timeout. Should I also consider limiting bandwidth utilization on the circuit to maybe 90%? Or will that make the problem worse?

This thread was automatically locked due to age.

0 BarryG over 10 years ago

Hi Bob,

If the bonding routers are intelligent, the connection will be degraded, not down, when some of the T1s fail.

We used to have several T1s bonded at my office; unfortunately, it often went unnoticed for a long time (by the ISP or the resident Cisco 'engineer') that the system was degraded, even when everyone was complaining of slow speeds.
So, one needs to keep an eye on things at the router level, or run speed tests regularly.

If we assume that the whole system would go down, or that degraded mode is unacceptable, then you would want to use the Series reliability formula, which starts at the 3rd formula on the page "Probability, Reliability & Failure Analysis".

Strictly, you'd multiply the uptimes; so if 1 line has an uptime of 99% (1% downtime), then 3 lines in series would be 0.99 x 0.99 x 0.99, or about 97% (3% downtime).
(For small failure probabilities this is nearly the same as adding them, i.e. multiplying the downtime by 3.)
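As a quick check of the series arithmetic, here is a minimal Python sketch. It assumes the lines fail independently and uses the illustrative 99% per-line uptime from above; the function name is just a label for this example.

```python
def series_availability(per_line_uptime: float, lines: int) -> float:
    """Availability when ALL lines must be up: the product of the uptimes."""
    return per_line_uptime ** lines


# Illustrative figure from the post: 99% uptime per T1, 3 bonded T1s.
exact = series_availability(0.99, 3)

# Shortcut: add the downtimes (good approximation when downtime is small).
approx = 1 - 3 * (1 - 0.99)

print(f"exact:  {exact:.4%}")
print(f"approx: {approx:.4%}")
```

The two numbers differ only in the second decimal place here, which is why multiplying the downtime by 3 is a reasonable rule of thumb at these values.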

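If a single failure is non-catastrophic, you'd use the Parallel formula on the same page to calculate the probability of all 3 lines failing: multiply the individual failure probabilities, assuming the lines fail independently. I believe this is the multiplication rule for independent events rather than Bayes' Theorem; either has other explanations elsewhere.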
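For the parallel case, a similar sketch, again assuming independent failures and the illustrative 1% per-line downtime. Bonded T1s from one carrier often share equipment or cabling, so real-world failures may be correlated and the true figure could be noticeably worse than this.

```python
def parallel_unavailability(per_line_downtime: float, lines: int) -> float:
    """Probability that ALL lines are down at once: the product of the downtimes."""
    return per_line_downtime ** lines


# Illustrative figure: 1% downtime per T1, 3 bonded T1s.
all_down = parallel_unavailability(0.01, 3)

# Express the result as expected minutes per (365-day) year.
minutes_per_year = all_down * 365 * 24 * 60

print(f"P(all 3 down) = {all_down:.6%}")
print(f"about {minutes_per_year:.2f} minutes per year")
```

Under these assumptions a total outage of all three lines should be far rarer than a weekly 5 to 10 minute event, which is consistent with the suspicion that the UTM is reporting the circuit down when it is not.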

Barry
