From a8e3f43370d6275aec2ab2914f32213e036d88ae Mon Sep 17 00:00:00 2001 From: Cathode Ray Dude <46694173+cathoderaydude@users.noreply.github.com> Date: Mon, 14 Oct 2019 15:55:14 -0700 Subject: [PATCH] Create README.md --- README.md | 225 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 225 insertions(+) create mode 100644 README.md diff --git a/README.md b/README.md new file mode 100644 index 0000000..b6954aa --- /dev/null +++ b/README.md @@ -0,0 +1,225 @@ +# sip-ping +A VoIP diagnostic utility, similar to ICMP Echo (ping) but using the SIP protocol + +## Purpose + +SIP Ping is a tool for monitoring a SIP gateway (PBX, SBC, phone) for deep dive diagnostics. Most tools for VoIP monitoring are based on meeting SLA figures and providing general "network availability" statistics. SIP Ping is for granular troubleshooting. + +Say you work in the NOC of a cloud-hosted VoIP provider. A high-profile customer states that every day their entire VoIP network goes down for ten to fifteen minutes. The packet captures from your session border controller confirm the gaps in registration, but ping tests show no loss throughout the day. + +You could be looking at an overloaded or flaky SIP gateway at the customer site, a problem with a network element in your core network or trouble with a firewall at the customer site, among other things. + +A key diagnostic datum is whether SIP traffic is able to transit to and from the customers site. ICMP is not SIP - you need to send real SIP packets, from different locations, and see if data can get through even when registration is failing. SIP Ping does that. + +## Disclaimers & Caveats + +I'm not a professional software developer. I created this tool to assist in my job as a VoIP tech, and as far as I can tell it works for me. I cannot make any assurances about the accuracy and reliability of this tool. + +If you need a serious, enterprise-grade tool, you're free to download and modify SIP Ping. + +Every packet that SIP Ping sends originates from a new source port. I don't know what will happen if you set the interval very low. Depending on your OS and how it handles ephemeral UDP ports, you could end up depleting your port pool. + +## Needs & Can'ts + +SIP Ping needs a working Python 2.x environment. It should work on anything that can provide that, and may work on 3.x. + +The host you're pinging needs to respond to OPTIONS packets. It doesn't matter what it responds with - 200, 403, even 501 are all fine as long as something is returned with the Call-ID intact. + +If you're pinging a device that filters SIP traffic by the domain name (e.g. only accepts packets that look like `From: user@**company.com**`) you may need to use -d to set the correct domain name. + +SIP Ping is UDP-only, it can't do TCP or TLS. + +SIP Ping will not answer any questions about _audio quality_ issues. SIP is not RTP, and the results from this tool will not tell you anything useful about how RTP is treated. +The exception is that if you're seeing very high latency _very frequently_ on pings, it's likely that RTP will not be very healthy - but that's no different than the data you'd get from ICMP ping. + +## Download + +Go to *Releases* to find the latest version. + +## Usage + +Here's all the options, **but I'll talk down below in human words.** + +
usage: sipping.py [-h] [-I interval] [-u userid] [-i ip] [-d domain] [-p port] + [--ttl ttl] [-w file] [-t timeout] [-c count] [-x [X]] + [-X [X]] [-q [Q]] [-S [S]] + host + +Send SIP OPTIONS messages to a host and measure response time. Results are +logged continuously to CSV. + +positional arguments: + host Target SIP device to ping + +optional arguments: + -h, --help show this help message and exit + -I interval Interval in milliseconds between pings (default 1000) + -u userid User part of the From header (default sipping) + -i ip IP to send in the Via header (will TRY to get local IP by + default) + -d domain Domain part of the From header (needed if your device filters + based on domain) + -p port Destination port (default 5060) + --ttl ttl Value to use for the Max-Forwards field (default 70) + -w file File to write results to. (default logs/[ip] - * to disable. + -t timeout Time (ms) to wait for response (default 1000) + -c count Number of pings to send (default infinite) + -x [X] Print raw transmitted packets + -X [X] Print raw received responses + -q [Q] Do not print status messages (-x and -X ignore this) + -S [S] Do not print loss statistics ++ + + +### Installation + +Download **sipping.py** from **Releases**. Now you're installed. For this doc, we'll just assume you have it in /home/fred + +### Use + +The simplest way to use this is: `python sipping.py 192.168.1.10` + +This will start sending OPTIONS packets once every second to 192.168.1.10\. It will wait up to a second for each response. Every five packets loss statistics will be printed to the screen, and the stats on those five pings will be written to /home/fred/sipping-logs/192.168.1.10.csv. + +The output looks like this: + +
dt@shell:sipping$ ./sipping.py 192.168.1.10 -t 40 +Press Ctrl+C to abort +> (15/05/15 02:08:07:230895) Sending to 192.168.1.10:5060 [id: 9437147350] +< (15/05/15 02:08:07:268631) Reply from 192.168.1.10 (37.58ms): SIP/2.0 403 Forbidden +> (15/05/15 02:08:08:270195) Sending to 192.168.1.10:5060 [id: 6142922627] +X (15/05/15 02:08:08:309415) Timed out waiting for response from 192.168.1.10 +> (15/05/15 02:08:09:310890) Sending to 192.168.1.10:5060 [id: 1143193559] +< (15/05/15 02:08:09:348649) Reply from 192.168.1.10 (37.61ms): SIP/2.0 403 Forbidden +> (15/05/15 02:08:10:350181) Sending to 192.168.1.10:5060 [id: 1569110399] +X (15/05/15 02:08:10:389392) Timed out waiting for response from 192.168.1.10 +> (15/05/15 02:08:11:390844) Sending to 192.168.1.10:5060 [id: 3661471762] +X (15/05/15 02:08:11:430038) Timed out waiting for response from 192.168.1.10 + [Recd: 3 | Lost: 2] [loss stats: longest run: 2 length of last run: 1 length of current run: 2 ] + [min/max/avg 37.58/39.13/38.10]+ +In the above, five packets were transmitted, two were received back, and three were lost. The two that responded came back with 403 and latencies of about 38 milliseconds. + +### Options + +Most of the things mentioned there can be changed. For instance, if you're working with a SIP gateway that uses a port other than 5060, you can do: `python sipping.py -p 5080 192.168.1.10`, assuming it's on port 5080. + +To ping at 300ms intervals: `python sipping.py -I 300 192.168.1.10` + +If you don't want to get a message for every single ping sent, do `python sipping.py -q 192.168.1.10`. This mutes the send/receive messages, but **still shows** the statistics summary every five pings. + +Conversely, if you don't like the stats: `python sipping.py -S 192.168.1.10` + +If you need to know if an ALG is mangling your packets, try `python sipping.py -X 192.168.1.10`. This shows return traffic, so you might be able to see if unexpected changes are appearing in the responses. + +### Using Logfiles + +SIP Ping outputs logfiles in CSV format. If you are a programmer you can probably use this with little difficulty. + +If you aren't a programmer, the easiest way to use the data is to load it into Microsoft Excel, which natively understands CSV. OpenOffice likely does as well, though I haven't tested it. + +The CSV files are human-readable if you know what to ignore. An example file looks like: + +
time,timestamp,host,latency,callid,response +15/05/15 08:18:48:383775,1431703128.38,192.168.10.10,3,0762101642,SIP/2.0 200 OK +15/05/15 08:18:49:387328,1431703129.39,192.168.10.10,drop,1257707506,drop +15/05/15 08:18:50:390695,1431703130.39,192.168.10.10,2,9081881016,SIP/2.0 200 OK+ +Each line represents a single ping attempt. Here are the fields, and how you would read the first line above: + +
Time | + +15/05/15 08:18:48:383775 | + +
UNIX timestamp | + +1431703129.39 | + +
Host that responded (for drops, the host that was pinged) | + +192.168.10.10 | + +
Latency | + +3 milliseconds | + +
SIP response code | + +200 OK | + +
15/05/15 08:18:49:387328,1431703129.39,192.168.10.10,drop,1257707506,drop+ +The Call-ID is 1257707506\. Now pull up the packet capture from the PBX. + +The packet capture shows an OPTIONS packet coming in with a Call-ID of 1257707506, and a 501 Not Implemented going back - but that 501 doesn't show up in the packet capture from your core network. + +You have now identified an instance of one-way signaling failure; the packet came in, the PBX tried to respond, but the response didn't arrive. You now have a much more specific definition of the problem. + +## Bugs / Known Issues + +* On Windows every line in the CSV is double-spaced. Excel will still read the file and a quick sort by the time column will strip the empty rows. + +If you find anything else, email me at [cathoderaydude@gekk.info](mailto:cathoderaydude@gekk.info) or open an Issue and I'd be happy to look into it. I do not have a QA team, so there could be lots of bugs I don't know about. + +## Future Development + +There are a lot of things I could do with this, but one of the big changes I intend to implement is **register-based** testing, where the tool runs at the "client" end and registers repeatedly against a server. + +## Why? + +There are at least two other SIP diagnostic tools that can send ping-esque messages that I can think of, but the problem is that, like all network diagnostic tools, they're only designed to answer the question, "is anything wrong right now?" + +In the real world, problems can often be intermittent, and their unpredictability certainly makes them no less serious. Hours or days might elapse between service outages, but when they occur they're showstoppers. + +I found that there was no tool available in this industry that could prove the integrity of the connection between two SIP devices, in full duplex, at high speed. + +I also found that a lot of techs in my field of work were using ping to do long-term monitoring, and ping just doesn't cut it for a wide variety of reasons, not the least being that ICMP is deprioritized on many routers and servers. Purpose-built tools decrease false positives. + +Additionally, while ping gives packet loss stats over time, it doesn't allow you to look back and say "two weeks ago at 11:31:43 voice traffic stopped passing for three minutes," and in this job, you sometimes need data with that level of precision. + + + + + +