
[BUG] XG v16/17 PPPoE doesn't endlessly try to reconnect

Hi All,

I have a weird issue with clients on xDSL lines, where the modem is bridged and connected to the WAN port of the XG.

There are more and more complaints about Internet outages (different clients, different DSL providers, different modems), and in most cases the PPPoE WAN port status is "Disconnected".

Just by clicking "Connect", the connection comes up immediately!

Is there a way to force the XG to keep retrying to reconnect indefinitely? (That should be a basic feature!)

Thanks!



This thread was automatically locked due to age.
  • Hello,

    I also had this problem yesterday with a Sophos XG 135 running Version SFOS 17.5.12 MR-12.

    The WAN connection was in status "disconnected" and didn't try to reconnect. After simply clicking "Connect" the connection was established in a few seconds without a problem.

    Why is this obvious problem not fixed yet?

    Best regards,
    Peter

  • Hey Guys,

    Everybody is frustrated by this issue, and I know that Sophos is taking the problem seriously, as the requests for specific case numbers and the direct escalations to support show.

    But please also understand that it is still an intermittent problem: there is no clear, definitive way to make it repeatable, and that always makes the cause harder to find.

    Unfortunately, until a unit is found that reproduces the issue in a predictable way or timeframe, support will need to look for similarities in unit behavior or log messages to try to determine the root cause.

    Some of the cases reported to support have involved only one or two occurrences, weeks apart. Across the 4 customer units where I have seen the issue, there have been 11 occurrences in total over 6 weeks.

    The best thing to do is to keep notifying support every time a unit fails, as quickly as feasible, and to enable support access for investigation. With luck, one of the test units will exhibit the issue on a secondary connection, which would allow direct debug access to a unit while it is stuck in the failed state.

    Regards,

    Gavin Daniels. DipIT(Networking)

  • Hello,

    On our side, the problem is easily reproducible...

    You just have to enable a PPPoE link on a port, plug the port into anything (even a switch) so that the link comes up, and wait for the status to change to "Disconnected" after a day or so!

  • Hi Gavin,

    I think Sophos is not taking this problem seriously enough.
    If you look at how long people have been reporting this issue, you will agree.

    It would help a lot if somebody could give us more information about the status of this case, because this thread has been open since 2018!
    It's easy to reproduce this problem, and a lot of people have already described it, but again:

    Set up a PPPoE gateway interface.
    Take down the connection, but not the link between the XG and the modem (for example, unplug the fiber cable on the modem that goes to the ISP).
    Leave it down for about 1-2 hours, then restore the connection (plug the fiber cable back into the modem).
    You will see that the interface does not reconnect automatically; you need to press "Connect now" on the PPPoE interface.
    Only then does the firewall reconnect and establish the gateway connection.

  • Hello,

    While that test scenario does produce an error, it is not the error I have seen on the units that had the issue.

    On one unit that had exhibited the fault, maintenance work was done on the fibre 2 weeks ago. It was offline for over an hour and reconnected perfectly.

    At all of the sites where I have had the issue, there was no noticeable outage of the link. Even the site with a DSL connection did not show a link fault first.

    The test method you are using may be hitting a different timeout state, where a number of unsuccessful connection attempts causes an extended delay to be introduced.
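    The "extended delay after a number of unsuccessful attempts" hypothesis can be illustrated with a small simulation. This is a minimal sketch, not Sophos code: both retry policies, the 5-second base delay, the 300-second cap and the 10-attempt limit are hypothetical values chosen only for illustration. It contrasts a policy that gives up after a fixed number of failed dial attempts (which would leave an interface "Disconnected" until someone clicks Connect) with one that retries forever using capped exponential backoff.

```python
# Hypothetical simulation of two PPPoE redial policies. This is NOT how SFOS
# works internally; every delay and limit here is an illustrative assumption.
from typing import Iterator, Optional


def limited_retries(base: float = 5.0, max_attempts: int = 10) -> Iterator[float]:
    """Doubling delays, then give up for good after max_attempts dials."""
    delay = base
    for _ in range(max_attempts):
        yield delay
        delay *= 2


def capped_backoff(base: float = 5.0, cap: float = 300.0) -> Iterator[float]:
    """Retry forever: delays double after each failure but never exceed cap."""
    delay = base
    while True:
        yield delay
        delay = min(delay * 2, cap)


def reconnect_time(outage_end: float, delays: Iterator[float],
                   horizon: float = 86_400.0) -> Optional[float]:
    """Time (s) of the first dial attempt at or after outage_end, else None.

    Each value from `delays` is the wait before the next dial attempt. A dial
    succeeds once the ISP side is back (t >= outage_end). None means the
    policy stopped dialing, or nothing succeeded within `horizon` seconds.
    """
    t = 0.0
    for delay in delays:
        t += delay
        if t > horizon:
            return None
        if t >= outage_end:
            return t
    return None  # retries exhausted: stays "Disconnected" until a manual Connect


if __name__ == "__main__":
    for hours in (1, 2):
        outage = hours * 3600.0
        print(f"{hours} h outage: limited={reconnect_time(outage, limited_retries())}, "
              f"capped={reconnect_time(outage, capped_backoff())}")
```

    With these assumed numbers, the limited policy still reconnects after a 1-hour outage (its last attempt is at 5115 s) but returns None for a 2-hour outage, while the capped policy reconnects at 7215 s, within one capped interval of the ISP side recovering.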

     

    A more likely test would be a very short burst of packet loss or corruption that breaks the PPPoE session on the ISP side but that the Sophos appears to miss. Alternatively, packet loss combined with some other process could cause a missed CPU cycle or interrupt, leaving PPPoE in an unknown state that whatever watchdog service is running does not pick up.

    I know that some other users have been having this issue for over 12 months, which makes the lack of earlier investigation a problem. But there has been a dramatic increase in occurrences over the last 2 months, not because the original units have become more unstable, but because more units are exhibiting the issue.

    When units that have run without the issue for the year or more since installation, with no firmware updates, now experience it, that points to an underlying condition that recent hotfix deployments have made worse.

    I believe the SQL injection hotfix is somehow tied to the error. My customers' units did not have the issue before it, but have experienced it since. The hotfix itself is not the cause, since people had the problem before that vulnerability was known, but I think the issue is somehow tied to the additional processing the Sophos does in constantly checking and reporting on it.

    Regards,

    Gavin Daniels. DipIT(Networking)

  • Well, well, well...

    Days are going by without any news from Sophos support...

    Is anyone still in charge, or would it be wise to remove this very complex and innovative feature from the XG (yes, yes, I'm talking about PPPoE)...