Encryption Analysis

Encryption Analysis calculates the entropies of data flows and classifies each flow as either encrypted, text, media, or unknown.
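To make the idea of flow entropy concrete, here is a minimal sketch that assumes byte-level Shannon entropy measured in bits per byte. The function name `shannon_entropy` is hypothetical, and the computation in shrink_compute.py may differ.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0).

    Assumption: entropy is computed over single-byte frequencies.
    """
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Under this definition, a payload made of one repeated byte scores 0, while a payload in which all 256 byte values are equally frequent scores 8. Encrypted and compressed payloads tend to score near the top of that range.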

For step-by-step instructions on getting started, see the Getting Started document.

Setup

Wireshark/TShark must be installed on the system. We have tested versions v2.6.7, v2.6.8, and v2.6.10 and confirmed that they work.

We use Python 3 unless otherwise specified.

Usage

The Jupyter Notebook encryption_sample.ipynb provides steps to parse a pcap file and label each flow as one of the four data types (encrypted, text, media, unknown).

encryption.sh is a shell-script equivalent of the Jupyter Notebook that can be run directly from the terminal.

Usage: ./encryption.sh in_pcap out_csv ek_json

Example: ./encryption.sh sample.pcap sample.csv sample.json

The sample code demonstrates how we processed a single file. To process the whole dataset (traffic from 34,586 experiments), adapt the code to your own cluster environment.
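One possible way to adapt the single-file workflow is a small driver that calls encryption.sh once per capture. This is only a sketch: the directory layout, the helper names `build_command` and `run_all`, and the choice to name outputs after each pcap's stem are all assumptions, and a real cluster setup would likely distribute the jobs rather than run them in a loop.

```python
import subprocess
from pathlib import Path

SCRIPT = "./encryption.sh"  # script name taken from this README


def build_command(pcap: Path, out_dir: Path) -> list:
    """Build the argument list in the order: in_pcap, out_csv, ek_json."""
    out_csv = out_dir / (pcap.stem + ".csv")
    ek_json = out_dir / (pcap.stem + ".json")
    return [SCRIPT, str(pcap), str(out_csv), str(ek_json)]


def run_all(pcap_dir: Path, out_dir: Path) -> None:
    """Run the script for every .pcap file directly inside pcap_dir.

    Assumption: all captures sit in one flat directory with unique names.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    for pcap in sorted(pcap_dir.glob("*.pcap")):
        # check=True stops the batch at the first failing capture
        subprocess.run(build_command(pcap, out_dir), check=True)
```

Calling `run_all(Path("pcaps"), Path("results"))` would process each capture in turn. Existing output files with the same names are overwritten.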

Input

in_pcap - The path to the input pcap file.

out_csv - The path to the output CSV file that will be generated from ek_json.

ek_json - The path to the intermediate JSON file that will be generated from in_pcap.

Note: If out_csv and ek_json do not exist, the scripts will create them. If they already exist, they will be overwritten.

Output

First, TShark decodes the pcap file into the JSON file. shrink_compute.py then performs analysis on the JSON file, which produces the CSV file.
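The decoding step can be sketched as follows. The `-r` and `-T ek` options are real TShark flags (`-T ek` emits newline-delimited JSON), but using them here is an assumption based on the ek_json parameter name. encryption.sh may pass different or additional options, and the helper names are hypothetical.

```python
import subprocess


def tshark_command(in_pcap: str) -> list:
    """Return a TShark invocation that decodes in_pcap to JSON on stdout."""
    # -r reads a capture file; -T ek selects newline-delimited JSON output.
    # Assumption: the real script may use other or additional flags.
    return ["tshark", "-r", in_pcap, "-T", "ek"]


def decode_pcap(in_pcap: str, ek_json: str) -> None:
    """Write the decoded output to ek_json, overwriting any existing file."""
    with open(ek_json, "w") as out:
        subprocess.run(tshark_command(in_pcap), stdout=out, check=True)
```

For example, `decode_pcap("sample.pcap", "sample.json")` would produce the intermediate file for the later analysis step, provided `tshark` is on the PATH.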

The CSV file has ten headings. Their meanings are listed below:

  • ip_src - The IP address of the source.
  • ip_dest - The IP address of the destination.
  • srcport - The transport layer source port number.
  • dstport - The transport layer destination port number.
  • tp_proto - The transport layer protocol.
  • data_proto - The application layer protocol.
  • data_type - The data type. One of unknown, text, media, compressed, or encrypted.
  • data_len - The length of the data in bytes.
  • entropy - The entropy of the data.
  • reason - Information about the output.
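As a sketch of how the output might be consumed, the snippet below totals data_len per data_type. It assumes the CSV starts with a header row using the ten column names above and that data_len is an integer. The sample rows are made up for illustration and are not real output, and `bytes_by_type` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

COLUMNS = ["ip_src", "ip_dest", "srcport", "dstport", "tp_proto",
           "data_proto", "data_type", "data_len", "entropy", "reason"]


def bytes_by_type(csv_file) -> Counter:
    """Sum data_len for each data_type from an open CSV file object."""
    totals = Counter()
    for row in csv.DictReader(csv_file):
        totals[row["data_type"]] += int(row["data_len"])
    return totals


# Made-up rows for illustration only; field values are placeholders.
SAMPLE = (
    ",".join(COLUMNS) + "\n"
    "192.0.2.1,198.51.100.7,50000,443,tcp,tls,encrypted,1200,7.9,example\n"
    "192.0.2.1,198.51.100.8,50001,80,tcp,http,text,300,4.2,example\n"
    "192.0.2.1,198.51.100.7,50002,443,tcp,tls,encrypted,800,7.8,example\n"
)

totals = bytes_by_type(io.StringIO(SAMPLE))
```

To run the same tally on a real file, open it with `open("sample.csv", newline="")` and pass the file object to `bytes_by_type`.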