The goal of this challenge is to achieve the highest prediction scores while demonstrating advanced data-research capabilities: mapping out the right features in the way that best fits your machine learning model of choice.
You will need to predict and classify malicious and benign API traffic using your knowledge of data exploration and machine learning models.
In some datasets you will be required to create a model that labels not only whether a request is an attack but also the attack type.
API Security contains four datasets, each divided into Train (70%), Test (15%), and Validation (15%). The train/test split is done inside the code; the validation split is given without labels for later grading.
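The 70/15/15 scheme above can be sketched as follows. Assuming the labeled portion arrives as a single pandas DataFrame (the `label` column name is hypothetical), taking 15% of the whole dataset out of the 85% that is labeled works out to a `test_size` of 0.15 / 0.85:

```python
# Minimal sketch of the in-code 70/15 train/test split (the remaining
# 15% validation set is supplied separately, without labels).
# The DataFrame below and its "label" column are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"feature": range(100), "label": [0, 1] * 50})

# 15% of the full dataset out of the 85% labeled part -> 0.15 / 0.85
train_df, test_df = train_test_split(
    df, test_size=0.15 / 0.85, stratify=df["label"], random_state=42
)
```

Stratifying on the label keeps the benign/malicious ratio the same in both splits, which matters here because the attack classes are heavily imbalanced.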
All the datasets contain HTTP traffic of API requests and responses.
The datasets are:
- Dataset_1 - The most basic API traffic, containing the fewest attacks and endpoints. It basically gives you a soft start.
Dataset 1 score:
              precision    recall  f1-score   support

           0    1.00000   1.00000   1.00000       356
           1    1.00000   1.00000   1.00000       400

    accuracy                        1.00000       756
   macro avg    1.00000   1.00000   1.00000       756
weighted avg    1.00000   1.00000   1.00000       756
- Dataset_2 - A more advanced version of the former dataset, containing far more attacks and endpoints (almost double). The data is also more complex, with a stronger randomization mechanism.
Dataset 2 score:
              precision    recall  f1-score   support

           0    1.00000   1.00000   1.00000     20812
           1    1.00000   1.00000   1.00000      1692

    accuracy                        1.00000     22504
   macro avg    1.00000   1.00000   1.00000     22504
weighted avg    1.00000   1.00000   1.00000     22504
- Dataset_3 - Same as before, but this time each request includes more complex parameters, making the data more authentic but also much harder to simply "plug and play" into any machine learning model. Depending on the model you choose, some of you may need to approach data exploration and feature extraction differently.
Dataset 3 baseline score - Label phase:
              precision    recall  f1-score   support

      Benign    0.95715   0.97607   0.96652     30209
     Malware    0.77364   0.65181   0.70752      3791

    accuracy                        0.93991     34000
   macro avg    0.86539   0.81394   0.83702     34000
weighted avg    0.93669   0.93991   0.93764     34000
Dataset 3 score - Attack Type phase:
              precision    recall  f1-score   support

           0    1.00000   1.00000   1.00000     22667
           1    1.00000   1.00000   1.00000       575
           2    1.00000   1.00000   1.00000       272
           3    1.00000   1.00000   1.00000       278
           4    1.00000   1.00000   1.00000       287
           5    1.00000   1.00000   1.00000       285
           6    1.00000   1.00000   1.00000       560
           7    1.00000   1.00000   1.00000       581

    accuracy                        1.00000     25505
   macro avg    1.00000   1.00000   1.00000     25505
weighted avg    1.00000   1.00000   1.00000     25505
- Dataset_4 - The most advanced and complex dataset, containing all of the above plus more advanced features such as API redirection, more request types, deeper data access, and more.
Dataset 4 score - Label phase:
              precision    recall  f1-score   support

           0    1.00000   1.00000   1.00000     25292
           1    1.00000   1.00000   1.00000      6967

    accuracy                        1.00000     32259
   macro avg    1.00000   1.00000   1.00000     32259
weighted avg    1.00000   1.00000   1.00000     32259
Dataset 4 score - Attack type phase:
              precision    recall  f1-score   support

           0    1.00000   1.00000   1.00000     25292
           1    1.00000   1.00000   1.00000      1364
           2    1.00000   1.00000   1.00000       703
           3    1.00000   1.00000   1.00000       663
           4    1.00000   1.00000   1.00000       718
           5    1.00000   1.00000   1.00000       698
           6    1.00000   1.00000   1.00000      1398
           7    1.00000   1.00000   1.00000      1423

    accuracy                        1.00000     32259
   macro avg    1.00000   1.00000   1.00000     32259
weighted avg    1.00000   1.00000   1.00000     32259
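The score tables above follow the layout of scikit-learn's classification_report, which is presumably how they were produced; a minimal sketch with dummy labels:

```python
# The baseline score tables follow scikit-learn's classification_report
# layout; five decimal places come from the digits=5 argument.
from sklearn.metrics import classification_report

y_true = [0, 0, 1, 1, 1]  # dummy ground-truth labels
y_pred = [0, 0, 1, 1, 1]  # dummy (here perfect) predictions
report = classification_report(y_true, y_pred, digits=5)
print(report)
```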
You will need to prepare four models (one per dataset) from the Train and Test sets. The validation dataset does not contain labels. Once you have built your model, run it on the validation set and upload the output as described in the submission guide below.
The competition datasets include various API attacks, which you may or may not encounter in any given dataset. Below is a basic description of each; you are encouraged to research these attacks online in order to map out their features:
- SQL Injection - One of the most common web attacks; it abuses backend SQL queries to inject code directly into the server's SQL database.
- Directory Traversal - Uses basic terminal traversal strings to reach folders on the server's host that were not meant to be accessible to the user.
- Remote Code Execution (RCE) - Some may call it the most critical exploit in any system; it allows the attacker to run code remotely on the target machine.
- Cookie Injection - Cookies injected into a session they did not originate from. This is sometimes used to illegitimately access another user's account by reusing their tokens.
- Cross Site Scripting (XSS) - Another very widely used web vulnerability; XSS enables the attacker to run client-side code that eventually affects sensitive processes handled by the backend.
- Log4j - One of the most famous recently patched vulnerabilities. An exploit in Java servers using the well-known Apache logging library that allows code to be run remotely on the server (effectively an "easy" RCE exploit).
- Log Forging - A technique for making the system print fake or fraudulent logs. It lets an attacker "inject" other users' log entries or falsify their own attack logs, making it harder for a security researcher to find out what really happened during the session.
The provided model code contains:
- Basic log parsing (you can improve the logic)
- A basic example of feature extraction (you can improve the logic)
- A basic machine learning model (you can create any type of AI/ML model you want)
- Output format (this you cannot change - the competition's grading code needs this exact format in order to check the results correctly)
Each dataset contains a list of JSON files with this format:
- The log parsing logic reads the JSON and calls the feature-extraction function.
- The features are used to train the model. You can change each part and adapt it to your own logic.
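The parse → extract → train flow described above can be sketched as follows. The JSON field names (`url`, `label`) and the file path are placeholders, since the real schema is defined by the baseline code:

```python
# Hypothetical sketch of the parse -> extract -> train pipeline.
# Field names ("url", "label") and paths are placeholders, not the
# competition's actual schema.
import json
from sklearn.ensemble import RandomForestClassifier

def extract_features(entry: dict) -> list:
    """Turn one JSON request record into a numeric feature vector."""
    url = entry.get("url", "")
    return [len(url), url.count("/"), int("?" in url)]

def load_dataset(path: str):
    """Read a JSON list of request records and build X, y."""
    with open(path) as f:
        logs = json.load(f)
    X = [extract_features(e) for e in logs]
    y = [e["label"] for e in logs]
    return X, y

# Usage (path is a placeholder):
# X, y = load_dataset("dataset_1_train.json")
# model = RandomForestClassifier().fit(X, y)
```

Each part (parsing, features, model) can be swapped out independently, which is exactly the adaptation the challenge encourages.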
Once you upload the results to the Google Drive folder, the grading code will analyze them once a day and return the resulting grades, including precision and recall. If your score is in the top 10, your name will appear as a leader for that dataset.
The competition score is based on:
- Cisco judges
- Competition average top-10 ranking
- Solution presentation and innovation
As you can see from the baseline scores above, this challenge has 6 different phases, and you will be required to submit a result for each one separately:
- Dataset 1 label phase
- Dataset 2 label phase
- Dataset 3 label phase
- Dataset 4 label phase
- Dataset 3 attack_type phase
- Dataset 4 attack_type phase
In order to submit your result files, please upload them to this Google Drive folder (you will need to request permission):
- In the submissions folder, create a folder with your team's name.
- Upload all the result files into your team's folder.
- Please make sure to upload the files in this exact format and name (the baseline code currently saves in this format):
Once a day our grading script will run over the files and update your grading files and leaderboard accordingly.