This discussion has been locked.
Sophos Firewall Web Protection Unable To Process Cyrillic Domain Names

Hello Everyone,

We've had requests to block websites whose domain names contain Cyrillic characters. However, whenever I attempt to add one in the Sophos interface, it is rejected as an invalid URL.

To avoid posting the full domain of the malicious site, an excerpt would be:

https://[domain].ком.рус

We're seeing more and more of these now, and I'm concerned: if the XG cannot process these characters for something as simple as a website URL, what else might the product be unable to interpret?

Many Thanks



Top Replies

  • in reply to rfcat_vk +1 verified

    Spoke with Sophos' support team, and after a lot of testing they decided the best approach was either to block the IP address (which doesn't really solve the issue) or to ping the Cyrillic domain name and determine the translated domain name, which in the case above comes to: xn--80akhqdddqo.xn--j1aef.xn--p1acf

    I'll be honest, I don't really understand it. Probably something to do with Unicode, ASCII, etc. I still wouldn't consider this a true fix, but maybe that's not Sophos' fault.
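
The translated name can also be reproduced without pinging anything. A minimal sketch, assuming Python's built-in `idna` codec (which implements the older IDNA 2003 rules, so a few edge cases may differ from what a current browser produces); the function names are illustrative only. It uses just the two right-hand labels from the excerpt above:

```python
def to_ascii_domain(domain: str) -> str:
    """Return the ASCII (xn--) form of a possibly non-ASCII domain name."""
    # Lowercase first, then let the built-in "idna" codec (IDNA 2003)
    # encode each label that contains non-ASCII characters.
    return domain.lower().encode("idna").decode("ascii")


def to_unicode_domain(domain: str) -> str:
    """Return the display (Unicode) form of an xn-- domain name."""
    return domain.encode("ascii").decode("idna")


# The last two labels of the excerpt from the original post.
print(to_ascii_domain("ком.рус"))                 # xn--j1aef.xn--p1acf
print(to_unicode_domain("xn--j1aef.xn--p1acf"))   # ком.рус
```

The output matches the last two labels of the name support arrived at, so the xn-- string is the form to enter in a rule.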

  • The user enters a Cyrillic domain name into the browser's address bar.
    The browser converts it to Punycode and does a DNS lookup for the Punycode domain name.
    The browser connects to the IP address of the remote server, passing an SNI that is Punycode.
    The proxy/DPI mode compares the Punycode SNI against any rules.
    The browser makes an HTTP request with a Host: header that is Punycode.
    The proxy/DPI mode compares the Punycode request against any rules.
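
The comparison steps above can be sketched as follows. This is not how Sophos implements its rule matching; it is a hypothetical illustration (the names `normalize` and `is_blocked` are made up) of why a rule has to be in Punycode form before it can match what arrives in the SNI or Host header. It assumes Python's built-in `idna` codec and simple exact-or-subdomain matching:

```python
def normalize(host: str) -> str:
    """Lowercase a host name and convert it to its ASCII (xn--) form."""
    cleaned = host.strip().rstrip(".").lower()
    return cleaned.encode("idna").decode("ascii")


def is_blocked(wire_host: str, rules: list[str]) -> bool:
    """Return True if the SNI/Host value matches a rule or is a subdomain of one."""
    host = normalize(wire_host)
    for rule in rules:
        blocked = normalize(rule)  # a rule typed in Cyrillic becomes xn--...
        if host == blocked or host.endswith("." + blocked):
            return True
    return False


# "ком.рус" stands in for the full domain, which the original post withheld.
rules = ["ком.рус"]
print(is_blocked("xn--j1aef.xn--p1acf", rules))      # True
print(is_blocked("www.xn--j1aef.xn--p1acf", rules))  # True
print(is_blocked("example.com", rules))              # False
```

A filter that skips the `normalize` step on its rules would be comparing Cyrillic text against an ASCII-only SNI, and the two would never match.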

    The browser displays Cyrillic.

    AFAIK all logging and block pages will show the Punycode.

    "Under the hood" everything is Punycode - the underlying RFCs and specs are ASCII only.

    To take another example: some browsers hide the http:// or https:// from the address bar. Some (more commonly on phones) even hide the path in the address bar unless you click into it. If you type UPPERCASE letters in a domain name, most browsers convert them to lowercase in the traffic they send (and sometimes in the address bar too). What browsers display in the address bar and what they send over the TCP connection are sometimes different.
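
To make the gap between "displayed" and "sent" concrete, here is a rough sketch that takes text as a user might type it and returns the host name a browser would be expected to put on the wire. It assumes Python's standard `urllib.parse` and `idna` codec; `host_on_the_wire` is a made-up helper, and real browsers apply more rules than this:

```python
from urllib.parse import urlsplit


def host_on_the_wire(address_bar_text: str) -> str:
    """Approximate the host name sent in DNS/SNI/Host for typed-in text."""
    text = address_bar_text.strip()
    if "://" not in text:
        # No scheme typed: prefix "//" so urlsplit treats the start as a host.
        text = "//" + text
    # .hostname drops the scheme, port, and path, and lowercases the host.
    hostname = urlsplit(text).hostname or ""
    # Convert any non-ASCII labels to their xn-- (Punycode) form.
    return hostname.encode("idna").decode("ascii")


print(host_on_the_wire("HTTPS://Example.COM/Some/Path"))  # example.com
print(host_on_the_wire("example.com/path"))               # example.com
print(host_on_the_wire("https://КОМ.рус/"))               # xn--j1aef.xn--p1acf
```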