VPN analysis
Because Envoy Mobile currently uses raw BSD sockets to perform API calls (the investigation into which is detailed in #13), we wanted to validate the library's behavior when working with VPNs.
In order for us to consider Envoy Mobile to be “working properly” with respect to VPNs, it needs to:
Send all traffic over the VPN if enabled when the library starts
Send all new requests over the VPN if the VPN is enabled after the library starts
Properly recover from dead VPN connections if the VPN is disabled after the library starts
Mirror this behavior on both iOS and Android
Investigation
Experiment
The following approach was taken to experiment with VPN connections on both iOS and Android:
Start a service running Envoy proxy
Create a mobile app that performs requests to the service using Envoy Mobile
In our case, we used a man-in-the-middle proxy to observe requests/responses between the two, but this could also be accomplished via logging on the service
Open the app running Envoy Mobile with a VPN disabled
Note the x-forwarded-for header (or the x-envoy-external-address header, which should be the same) of requests from the client
Enable a VPN on the mobile device
Monitor the above headers for changes in IP address
Repeat the same for disabling the VPN
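The header monitoring in the steps above could be automated along the following lines. This is a minimal sketch, not the tooling used in the experiment: the function names `client_ip` and `ip_transitions` are hypothetical, the addresses are placeholders from the documentation ranges, and it assumes the left-most `x-forwarded-for` entry is the original client.

```python
from typing import Iterable, List, Mapping, Optional, Tuple


def client_ip(headers: Mapping[str, str]) -> Optional[str]:
    """Return the client address reported in the request headers, if any."""
    # Header names are case-insensitive, so normalize before looking them up.
    lowered = {name.lower(): value for name, value in headers.items()}
    external = lowered.get("x-envoy-external-address")
    if external:
        return external.strip()
    forwarded = lowered.get("x-forwarded-for")
    if forwarded:
        # Assumption: the left-most entry is the original client address.
        return forwarded.split(",")[0].strip()
    return None


def ip_transitions(
    requests: Iterable[Mapping[str, str]],
) -> List[Tuple[int, str, str]]:
    """Return (request_index, old_ip, new_ip) for every change in client IP."""
    transitions: List[Tuple[int, str, str]] = []
    previous: Optional[str] = None
    for index, headers in enumerate(requests):
        current = client_ip(headers)
        if current is None:
            continue  # Skip requests that carry neither header.
        if previous is not None and current != previous:
            transitions.append((index, previous, current))
        previous = current
    return transitions


# Placeholder capture: two requests before the VPN is enabled, one after.
observed = [
    {"x-forwarded-for": "192.0.2.10"},
    {"x-forwarded-for": "192.0.2.10"},
    {"X-Forwarded-For": "198.51.100.20"},
]
changes = ip_transitions(observed)
```

Feeding it the headers captured by the man-in-the-middle proxy (or by service logs) would give the index of the first request that arrived from a different address, which makes the switch-over delay easier to measure.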
On both iOS and Android…
With the above workflow, we observed an initial IP address of 68.7.163.XXX.
Within a second or two of enabling the Hotspot Shield VPN, requests sent from the client changed IP addresses to 104.232.37.XXX, the location of the VPN servers.
When launching the app with the VPN enabled, all requests were seen as coming from the VPN IP address.
Upon disabling the VPN, some requests failed before switching IP addresses back to the original (non-VPN). This took several seconds (noticeably longer than switching onto the VPN when it was enabled).
Deep dive
Enabling the VPN
Our understanding of why Envoy Mobile sends traffic through VPNs the way it does is as follows.
When the library starts up with a VPN enabled, none of its clusters have been used yet. Upon utilizing each cluster, it selects the preferred network to use for establishing a connection. Since the OS is routing all of these through the VPN, Envoy Mobile immediately ends up sending all traffic over the VPN.
However, when a VPN is enabled after the library is already running, any number of clusters in use by Envoy Mobile may already have established connections. Based on what we’ve seen, the OS does not seem to aggressively terminate non-VPN connections when a VPN is enabled, and some requests continue to be made over these pre-existing connections.
Envoy clusters regularly rotate connections in their pools, so the existing non-VPN connections are eventually replaced by new connections which, when created, utilize any active VPN.
Interestingly, since Envoy Mobile typically utilizes 3 clusters (base, base_h2, and stats), it's possible that not all clusters have established connections when the VPN becomes enabled.
In the experiments we ran, it was merely a matter of a second or two before all new requests typically went through the VPN once enabled.
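The reasoning above can be illustrated with a toy model. This is only a sketch of our understanding, not Envoy's connection pool implementation: the `Connection` and `Cluster` classes and the "direct"/"vpn" route labels are hypothetical, and the single assumption it encodes is that a connection keeps the route it had when it was created.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Connection:
    # Route is fixed at creation time: "direct" or "vpn" (hypothetical labels).
    route: str


@dataclass
class Cluster:
    name: str
    connections: List[Connection] = field(default_factory=list)

    def send(self, active_route: str) -> str:
        """Send one request and return the route it travelled over."""
        if not self.connections:
            # A first connection picks up whatever route the OS prefers now.
            self.connections.append(Connection(route=active_route))
        # Pre-existing connections keep the route they were created with.
        return self.connections[0].route

    def rotate(self, active_route: str) -> None:
        """Replace pooled connections with fresh ones on the current route."""
        self.connections = [Connection(route=active_route) for _ in self.connections]


base = Cluster("base")
stats = Cluster("stats")

before_vpn = base.send("direct")       # base connects before the VPN is on
after_vpn_reused = base.send("vpn")    # VPN now on, but base reuses its connection
after_vpn_fresh = stats.send("vpn")    # stats had no connection yet
base.rotate("vpn")
after_rotation = base.send("vpn")      # rotation replaced the old connection
```

In this model, a cluster that was already in use keeps sending over its non-VPN connection until rotation replaces it, while a cluster that had not yet connected goes over the VPN on its first request.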
Disabling the VPN
When disabling the VPN, there was a noticeable delay during which requests sent through Envoy Mobile would fail before the library recovered and sent traffic through the non-VPN connection as expected.
Our current understanding of this behavior is that the delay is caused by the OS destroying the VPN connection immediately, at which point Envoy Mobile allows several failures before tearing down its connection and establishing a new one.
With trace logs enabled, we saw Envoy Mobile returning some local 503 errors before finally destroying its connection. An example of these logs is available in this gist.
The reachability findings outlined below suggest that we could potentially mitigate this delay by observing reachability updates from the OS, and work is being tracked in #727.
Reachability
On iOS, we added additional logging to indicate reachability state from SCNetworkReachability while enabling and disabling the VPN. During these transitions, the network remained .reachable, but oscillated between having and not having .transientConnection:
Reachability flags: [.reachable] // VPN off
Reachability flags: [.reachable, .transientConnection] // VPN on
Reachability flags: [.reachable] // VPN off
This behavior suggests that we could further optimize Envoy Mobile's behavior by switching preferred networks when we detect a change in .transientConnection.
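The detection step could look roughly like the sketch below. It models the reachability flags in Python rather than calling SCNetworkReachability itself, so `ReachabilityFlags` and `transient_changes` are hypothetical names, and the flag values are stand-ins local to this example. It also assumes, based only on the logs above, that a change in the transient-connection flag corresponds to the VPN being toggled.

```python
from enum import Flag
from typing import Iterable, List, Optional, Tuple


class ReachabilityFlags(Flag):
    # Stand-ins for the two flags seen in the logs; values are local to this sketch.
    TRANSIENT_CONNECTION = 1 << 0
    REACHABLE = 1 << 1


def transient_changes(
    updates: Iterable[ReachabilityFlags],
) -> List[Tuple[int, bool]]:
    """Return (update_index, is_transient) for each change in the transient flag."""
    changes: List[Tuple[int, bool]] = []
    previous: Optional[bool] = None
    for index, flags in enumerate(updates):
        transient = bool(flags & ReachabilityFlags.TRANSIENT_CONNECTION)
        if previous is not None and transient != previous:
            # This is where a preferred-network switch could be triggered.
            changes.append((index, transient))
        previous = transient
    return changes


vpn_off = ReachabilityFlags.REACHABLE
vpn_on = ReachabilityFlags.REACHABLE | ReachabilityFlags.TRANSIENT_CONNECTION

# Same sequence as the logged transitions: off, on, off.
events = transient_changes([vpn_off, vpn_on, vpn_off])
```

Each entry in the result marks a point where the library could proactively switch networks instead of waiting for connection rotation or request failures.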
When testing enabling a VPN with URLSession, URLSession seemed to switch traffic onto the VPN slightly faster than Envoy Mobile. This can likely be attributed to the above reasoning.
Conclusions
Based on the above investigations, Envoy Mobile handles VPN connections properly for the most part on both iOS and Android.
There is room for improvement, as outlined in #727: the library could switch connections more aggressively when the OS notifies it that a VPN has been enabled or disabled.