This discussion has been locked.

Strange routing problem - Hosted in ESXi environment

Hello

I have uncovered a strange issue in routing (or lack of it). First let me describe my environment -

XG and a Windows server are both hosted on ESXi.

The Windows server has one NIC, connected to vswitch0 - port group 0, called Internal.

XG has two NICs: one connected to Internal, and the other connected to port group 1, called External. XG uses PPPoE to get internet access, and IPsec to connect to two other remote sites (Mikrotik and TMG).

Both port groups are backed by physical NICs (a 2-port Intel Ethernet card). One port is connected to the internal network (so my other laptops etc. can connect) and the other is connected to the fiber box (XG dials in over PPPoE). All of this works as intended.

 

Now, after establishing IPsec, I can ping the remote sites from other computers which are NOT hosted on ESXi. E.g., from my laptop (XG is the default gateway), I can ping the remote computers, and the remote computers can ping me.

The problem - whatever I try, I cannot ping from ANY of the ESXi-hosted computers (same network - Internal) to any of the remote sites. I even tried installing experimental Linux VMs in ESXi. Of course I have checked the usual suspects such as the default gateway etc. - I am OK with the basics of networking - so the usual suspects are taken care of. The remote computers CAN ping the ESXi-hosted computers, so it's a one-way ping. (This makes me think XG is somehow blocking ping from the Windows host.)

 

XG packet capture and logs do NOT show anything. So how do I troubleshoot this? Is this an ESXi issue or an XG issue? Should I install a packet capture tool on the Windows host (I doubt it will help)? The packet capture in XG seems to be useless (just like the logging).

Also, I tried promiscuous mode, forged MACs and the other security settings in ESXi. No change.

Tracert from Windows (in ESXi) shows that packets are sent to XG, but XG is silently dropping those packets without logging. XG has no problem routing correctly if the packets originate outside the ESXi host. This happens ONLY for the IPsec tunnels. I can access the XG UI in any case (from ESXi-hosted machines or from outside ESXi).

 

Thank You



  • Replying to myself - using pktcap-uw in ESXi seems to capture only the packets coming into the port of the VM.

    Is there a way to capture packets which are going OUT of the port?

  • Ok - Confirmed it. XG is dropping it.

     

    Here is the packet capture from my internal machine using XG (192.168.39.1) as GW.

    (switchport below is XG switchport)

     

    pktcap-uw --switchport 67117068 --proto 0x01 --capture PortOutput

    000c 2920 b357 000c 2951 1daa 0800 4500
    003c f52a 0000 8001 49eb c0a8 27f9 c0a8
    5261 0800 3a81 0001 12da 6162 6364 6566
    6768 696a 6b6c 6d6e 6f70 7172 7374 7576
    7761 6263 6465 6667 6869

     

    So, 192.168.39.249 (my site) is pinging 192.168.82.97 (site1). XG's NIC is receiving this ping (as the capture is taken on the ESXi side of XG's switch port), and XG then blackholes the packet, because there is no trace of it inside XG.

     

    What is so special about this configuration? How does XG even know this packet originated from ESXi?
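
For readers who want to check the addresses in the hex dump above without reading it by hand, here is a minimal sketch in Python (standard library only). It assumes the dump is a plain Ethernet II frame carrying IPv4 and ICMP with no VLAN tag, which matches the bytes shown; the function name `decode_icmp_frame` and the dictionary keys are made up for illustration.

```python
import ipaddress
import struct

# Hex dump copied from the pktcap-uw output above.
HEX_DUMP = """
000c 2920 b357 000c 2951 1daa 0800 4500
003c f52a 0000 8001 49eb c0a8 27f9 c0a8
5261 0800 3a81 0001 12da 6162 6364 6566
6768 696a 6b6c 6d6e 6f70 7172 7374 7576
7761 6263 6465 6667 6869
"""


def format_mac(raw):
    """Render 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def decode_icmp_frame(hex_dump):
    """Decode Ethernet II + IPv4 + ICMP headers (assumes no VLAN tag)."""
    raw = bytes.fromhex("".join(hex_dump.split()))

    # Ethernet header: destination MAC, source MAC, EtherType.
    dst_mac, src_mac, ethertype = struct.unpack("!6s6sH", raw[:14])

    # IPv4 header starts right after the 14-byte Ethernet header.
    ip = raw[14:]
    header_len = (ip[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    ttl, protocol = ip[8], ip[9]
    src_ip = str(ipaddress.IPv4Address(ip[12:16]))
    dst_ip = str(ipaddress.IPv4Address(ip[16:20]))

    # ICMP header: type, code, checksum, identifier, sequence number.
    icmp_type, icmp_code, _, icmp_id, icmp_seq = struct.unpack(
        "!BBHHH", ip[header_len:header_len + 8]
    )

    return {
        "dst_mac": format_mac(dst_mac),
        "src_mac": format_mac(src_mac),
        "ethertype": ethertype,
        "ttl": ttl,
        "protocol": protocol,
        "src_ip": src_ip,
        "dst_ip": dst_ip,
        "icmp_type": icmp_type,
        "icmp_code": icmp_code,
        "icmp_id": icmp_id,
        "icmp_seq": icmp_seq,
    }


frame = decode_icmp_frame(HEX_DUMP)
for key, value in frame.items():
    print(f"{key}: {value}")
```

The decoded fields agree with the description in the post: an ICMP echo request (type 8) from 192.168.39.249 to 192.168.82.97, with TTL 128 as is typical for a Windows sender.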

  • Hello Nitin,

    have you, by any chance, swapped the internal and external ports in your XG config?

    AND/OR did you put both physical NICs into the same switch?

  • BTW, and not meant to be impolite:

    I would NEVER state things like these when troubleshooting is going on;

    "Ofcourse I have checked the usual suspects such as default gateway etc - I am ok in the basics of networking - so usual suspects taken care of)."

    This implies that we should not bother asking these questions, even though something is clearly going wrong in your setup.

    First thing I teach my apprentices is "Ask questions, never make assumptions!"

  • Yes I have!!

    This XG setup was a physical installation a year ago.

    I moved it to ESXi later and restored the backup.

    Also, when reassigning switches in ESXi, I found myself putting both LAN and WAN into the same switch (I migrated the standard switch to a distributed switch using the vSphere wizard).

    Corrected it a bit later.

     

    ARP entries are usually short-lived, so could that be a problem?

    Thank you

  • Found the bug.

    Finally.

     

    I was able to capture the port that XG is using, and found that XG is using port 4444 to reply!!

    This is definitely a bug. The XG admin console resides on 4444.

    I switched the admin console to 4446, and everything is OK!!!

     

    So XG team - why are you using port 4444 when it is designated for the admin console?