Friday, June 19, 2015

Juniper SRX Vpn Debugging with "gateway-filter"


After a support session with jtac, I learned there's a hidden srx configuration to debug single vpns:

admin@srx# show security ike traceoptions
file vpn-trace size 10m;
gateway-filter {
    local-address x.x.x.x;
    remote-address y.y.y.y;
}

level 15;

This is equivalent to the operational command

request security ike debug-enable local x.x.x.x remote y.y.y.y

The jtac engineer used this command to debug an ipsec issue where SA's existed in the management plane but not in the data plane. Following command was used to list the dataplane SA's:

run request pfe execute command "show usp ipsec sa" target fwdd 

A security flow trace of the faulting session reported an unrelated error about missing traffic selectors:

Jun 17 18:47:04 18:47:04.409145:CID-1:RT:flow_ipv4_rt_lkup success 192.168.122.12, iifl 0x50, oifl 0xb8

Jun 17 18:47:04 18:47:04.409168:CID-1:RT:  routed (x_dst_ip 192.168.122.12) from srxhq (reth0.1001 in 1) to st0.108, Next-hop: 192.168.122.12

Jun 17 18:47:04 18:47:04.409185:CID-1:RT:flow_first_policy_search: policy search from zone srxhq-> zone OMP-VPN (0x0,0x2ee0001,0x1)

Jun 17 18:47:04 18:47:04.409205:CID-1:RT:Policy lkup: vsys 0 zone(8:srxhq) -> zone(13:OMP-VPN) scope:0

Jun 17 18:47:04 18:47:04.409205:CID-1:RT:             192.168.213.2/2048 -> 192.168.122.12/19053 proto 1

Jun 17 18:47:04 18:47:04.409212:CID-1:RT:  app 0, timeout 60s, curr ageout 60s

Jun 17 18:47:04 18:47:04.409212:CID-1:RT:  permitted by policy allow_any(6)

Jun 17 18:47:04 18:47:04.409294:CID-1:RT:  packet passed, Permitted by policy.

Jun 17 18:47:04 18:47:04.409294:CID-1:RT:flow_first_src_xlate:  nat_src_xlated: False, nat_src_xlate_failed: False

Jun 17 18:47:04 18:47:04.409294:CID-1:RT:flow_first_src_xlate:  incoming src port is : 750.

Jun 17 18:47:04 18:47:04.409294:CID-1:RT:flow_first_src_xlate: src nat returns status: 0, rule/pool id: 0/0, pst_nat: False.

Jun 17 18:47:04 18:47:04.409294:CID-1:RT:  dip id = 0/0, 192.168.213.2/750->192.168.213.2/750 protocol 0

Jun 17 18:47:04 18:47:04.409294:CID-1:RT:(flow_first_get_tun_info) Valid IP, using IP from session

Jun 17 18:47:04 18:47:04.409394:CID-1:RT:  Doing IPSec traffic-selector match for  192.168.213.2 -> 192.168.122.12

Jun 17 18:47:04 18:47:04.409394:CID-1:RT: Did not find traffic-selector enabled nsp_tunnel for  st0-ifp st0.108. Finding non-traffic-selector nsp_tunnel

Jun 17 18:47:04 18:47:04.409394:CID-1:RT: get_nsp_tunnel - Tunnel not found. if st0.108, nexthop ip 0xbadc0ded, policy id 6

Jun 17 18:47:04 18:47:04.409394:CID-1:RT:  packet dropped, outgoing tunnel missing in tunnel-if

Jun 17 18:47:04 18:47:04.409394:CID-1:RT: outgoing tunnel missing in tunnel-if st0.108

Jun 17 18:47:04 18:47:04.409394:CID-1:RT:flow_first_get_out_ifp: get tunnel_info failed!

Jun 17 18:47:04 18:47:04.409394:CID-1:RT:flow_initiate_first_path: first pak no session

Jun 17 18:47:04 18:47:04.409394:CID-1:RT:  flow find session returns error.

Jun 17 18:47:04 18:47:04.409394:CID-1:RT: ----- flow_process_pkt rc 0x7 (fp rc -1)


It seems this issue cannot be resolved without impact (commit full, reboot, delete ike/ipsec).

Update: according to the support case this issue should be fixed in the next srx firmware release: 12.1X46-D40 


5 comments:

Hisham said...

Hi Thomas !

Thanks for this post, wanted to ask how did you resolve the issue ? Did you upgrade Junos or did you configure a new st0.x interface instead of the old one ?

Also, did you ever know the reason behind the SA being absent from the data plane ?

Anonymous said...

Ran into the same issue here with an SRX running 11.4R6.6. I've tried clearing the security-associations and removing and then re-adding the IPSec settings. Next up is a commit full. If that doesn't work a reboot is the only thing left?

Anonymous said...

I struggled for MONTHS with this issue!
Juniper support was also involved and collected a vast amount of data (log files, console dump files...) although knowing I was on 12.1X44 and 12.1X46-D40 was already released!
So they escalated my case to the next level. This engineer had also no idea what was going on and I continued googling and found this blog. And at the end there's this note that said, that this issue will be fixed with version 12.1X46-D40!
So, I gave it a try and guess what - all VPN tunnels were up and running for two weeks now!!!

Why on earth do have to stumble across this blog?! Shouldn't this info be given by??

Regards
Andy

Anonymous said...


Shouldn't this info be given by JTAC??

Glad I found this blog! Saved me _A_LOT_ of time!


Cheers
Andy

Anonymous said...

thx for this info,
had this issue on srx240, fw 12.1x44.