
Automatically submit unknown websites for categorisation

Good Day,

 

When you have enabled blocking of unknown websites and you browse unusual websites, such as the personal websites of small companies or blogs, you end up having to submit thousands of websites for categorisation because they are unknown. That is a lot of work and disrupts the workflow.

So it would be a good idea to let XG automatically submit unknown/uncategorized/category "None" websites for categorisation.

I have started a feature request at Sophos Ideas and would like to encourage you to vote for it, because it makes using the web filtering of XG with restrictive rules so much easier!

Having more unknown websites submitted would also improve both the quantity and the quality of the "Sophos Website Category Database".

https://ideas.sophos.com/forums/330219-xg-firewall/suggestions/35847178-automatically-submit-unknown-web-address-for-categ

Kind regards,

 

 

Dwayne Parker



This thread was automatically locked due to age.
  • This is in fact already done, in a different manner.

     

    We have the cloud service, and we know when it replies Uncategorized.  The cloud service logs this, and we keep a record of which domains are uncategorized AND how often they are requested per day.  That list is prioritized so that the most frequently hit uncategorized domains are at the top.

    We then process the list, trying to categorize each domain, the highest priority first.  We will never make it to the bottom of the list.  The next day a new list comes out and we start again at the top.
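
    To make the daily process concrete, here is a minimal sketch of a frequency-ordered list with a fixed daily capacity.  It is an illustration only, not the actual Sophos pipeline: the function names, the `daily_budget` parameter, the example domains and the request counts are all assumptions.

```python
from collections import Counter


def prioritize(uncategorized_requests):
    """Return (domain, count) pairs, most frequently requested first."""
    counts = Counter(uncategorized_requests)
    # Sort by count descending, then by name so the order is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def process_day(uncategorized_requests, daily_budget):
    """Split one day's domains into those reached and those never reached.

    daily_budget is a hypothetical cap on how many domains can be
    categorized in a day.
    """
    ranked = prioritize(uncategorized_requests)
    handled = [domain for domain, _ in ranked[:daily_budget]]
    skipped = [domain for domain, _ in ranked[daily_budget:]]
    return handled, skipped


# Made-up request log for one day: two busy domains and one rarely seen one.
day_log = (
    ["popular.example"] * 5000
    + ["medium.example"] * 800
    + ["rarely-visited.example"] * 3
)
handled, skipped = process_day(day_log, daily_budget=2)
```

    With a budget of two domains, the rarely requested domain is skipped, and it is skipped again on every following day on which busier domains fill the top of the list.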

     

    The problem is twofold.  The first is one of the "long tail".  The nature of domains and categorization is a bit like Zeno's paradox.  You can categorize 1000 websites and get 50% of traffic.  Categorize 2000 sites and get 75% of traffic.  At some point you'll be categorizing hundreds of thousands of websites and be at 99%.  But that last 1% is still hundreds of thousands of websites.  https://en.wikipedia.org/wiki/Long_tail
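
    The long tail effect can be sketched numerically.  The example below assumes that traffic follows a Zipf distribution (the site at rank r receives traffic proportional to 1/r) across one million sites.  Both the distribution and the site count are assumptions made for illustration, so the resulting numbers will not match the rough figures quoted above; only the shape of the curve is the point.

```python
from itertools import accumulate


def sites_needed(coverage, n_sites=1_000_000):
    """Number of top-ranked sites needed to cover a share of all traffic.

    Assumes a Zipf distribution: the site at rank r gets weight 1/r.
    coverage is a fraction between 0 and 1.
    """
    weights = [1.0 / rank for rank in range(1, n_sites + 1)]
    target = coverage * sum(weights)
    # Walk down the ranking until the running traffic share reaches the target.
    for count, running in enumerate(accumulate(weights), start=1):
        if running >= target:
            return count
    return n_sites


n_sites = 1_000_000
for coverage in (0.50, 0.75, 0.99):
    needed = sites_needed(coverage, n_sites)
    print(f"{coverage:.0%} of traffic: {needed} sites, {n_sites - needed} left over")
```

    Under these assumptions each additional step in coverage costs far more sites than the previous one, and a very large number of sites still sits in the final 1%.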

    The second problem is localization and frequency.  I live in a town and there is a local beer store with one location and a website.  Let's say there are 20 Sophos customers in the same town, and I visit the site and find out that, even though my company blocks Alcohol, this site is uncategorized.  So I browse the site for 15 minutes, buy something, and leave.  It just so happens that no other Sophos customer in that town visits the website that day.  My visit probably generated 3 requests to the Sophos cloud categorization for that domain, all of which came back uncategorized.  Now the cloud servers handle billions of requests per day, and the top uncategorized sites have thousands per day.  My little visit that generated 3 requests is so far down the long tail that it never gets seen.  So I keep visiting it, day after day, and the admin starts complaining about why in the world Sophos can't get around to categorizing sites properly.  The answer is basically that small sites with very few visits, often related to local companies, don't get enough hits to be categorized.

    So if you know of a site that isn't categorized, and weeks have gone by and it still isn't categorized, it may be a site that is just lost in the noise of a billion other requests.  If it is important to you, then you have to tell us by submitting a request.

    Implementing a system where the XG sends a daily report of uncategorized sites to Sophos won't change this problem.

     

    Here is another analogy.  Ever vacuumed a carpet and then found dirt on the floor?  The vacuum is our attempt to categorize and the dirt is uncategorized sites.  Even if we vacuum every day there will still be dirt on that carpet.  Because no vacuum is perfect, because some dirt is tiny and easily missed, because people keep walking in the room and adding more dirt.

     

    That all being said, the squeakiest wheel gets the oil.  Complain about uncategorized sites a lot and you may get more resources applied to the problem.

     

    Note: If you have a web exception that excludes Policy, then categorization is not done on the matching traffic, and it will appear in the logs as uncategorized.  We have found that for some customers who used the logs and saw huge numbers of uncategorized requests, those requests turned out to go to only a few domains that were uncategorized due to exceptions.

  • Hi,

     

    Thank you very much for your long reply.

    I understand, categorizing takes a lot of resources, so it isn't possible to categorize all websites.

    But currently, submitting websites manually as a sample is very complicated and takes a lot of time, so it might be a good idea to embed the reclassification form in the message that appears after trying to access an uncategorized website.

    An even better way to make submitting websites easier would be to add a button to that message which automatically submits the address of the uncategorized website.

     

    Regards,

     

     

    Dwayne Parker
