Scam-V (Validation of Abstract Side-Channel Models for Computer Architectures)

In our paper, we describe a method to generate test inputs to validate side-channel models. Its implementation is called Scam-V and consists of a test input generation tool and a number of models, together with a hardware benchmark platform. The tool has been packaged in this VM together with our evaluation results. However, the actual evaluation of test inputs generated by the tool requires a benchmark platform, which can be built with a bit of special hardware according to the documentation in the repository we have made available as the GitHub project EmbExp-Box.

In order to simplify the evaluation process, we have given this VM access to our benchmark platform. This way, an evaluator is able to generate test inputs and execute them remotely on the actual hardware benchmark platform without having to build an experiment setup first. Note that executing test cases remotely requires internet access for the VM and the possibility to establish an SSH connection to tcs79.csc.kth.se on port 4422. The alternative is to build a local EmbExp-Box instance with a Raspberry Pi 3 board interfaced with appropriate JTAG and UART connections.
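
Before generating experiments, it can be worth verifying from inside the VM that the benchmark server is reachable on that port. The following is a small optional check (not part of the VM tooling); it only tests that a TCP connection can be opened:

    # Optional reachability check for the remote benchmark platform (not part of the VM tooling).
    import socket

    HOST, PORT = "tcs79.csc.kth.se", 4422  # SSH endpoint mentioned above

    try:
        with socket.create_connection((HOST, PORT), timeout=10):
            print(f"TCP connection to {HOST}:{PORT} succeeded; SSH access should be possible.")
    except OSError as err:
        print(f"Could not reach {HOST}:{PORT}: {err}")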

The evaluators may need to coordinate their efforts because whole experiment sets take a long time to execute and the hardware resources in our benchmark platform are limited. We currently have only 4 instances of Raspberry Pi 3 available, and one of us may also use an instance during the artifact evaluation process. Furthermore, be aware that the boards sometimes exhibit hardware issues, as mentioned at the end of the section "Executing and inspecting single experiments" in the EmbExp-Logs README document.

All paths given below are relative to ~/scamv in the VM.

1. The results presented in the paper

The first step is to review the results presented in the paper. These are the product of generating test inputs, or experiments, with the Scam-V tool and executing these experiments on our benchmark platform.

Experiment sets

When Scam-V generates test cases, it stores them in an SQLite database. At first, the database contains experiments that have not been executed yet. After executing them on a board, their outputs are stored in the database and they are ready for evaluation. The git repository containing the database and the scripts needed to drive Scam-V is in HolBA_logs/EmbExp-Logs. More detailed information about how to use the scripts is available in the GitHub project EmbExp-Logs. The scripts reside in the directory scripts, and all of them print basic usage information when executed with the command-line switch --help. Each bash terminal in the VM always has the HolBA environment loaded.
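
Since the experiments live in an ordinary SQLite file, the databases can also be inspected directly, independently of the EmbExp-Logs scripts. The sketch below merely lists the tables and their row counts; the actual schema is defined by EmbExp-Logs and is not reproduced here (the database path is one of the bundled files described further below):

    # Schema-agnostic peek into an experiment database (path is an example from orig_exps).
    import sqlite3

    DB_PATH = "orig_exps/table1/1_logs_cachepartnopageboundary.db"

    con = sqlite3.connect(DB_PATH)
    tables = [row[0] for row in con.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    for table in tables:
        (count,) = con.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        print(f"{table}: {count} rows")
    con.close()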

Evaluating experiment sets

The script db-eval.py in HolBA_logs/EmbExp-Logs/scripts can be used to present the results of an experiment set execution. Its command-line option --dbfile selects the experiment database of interest. The details about the evaluation of individual experiments and sets are described in the EmbExp-Logs README document. Here is an example output of this script taken from our prefetching experiments (the dbfile is orig_exps/table1/1_logs_cachepartnopageboundary.db and we look at the experiment set with HolBA run id 2021-04-05_18-19-13_541 in the database):

Scam-V/HolBA run id: 2021-04-05_18-19-13_541
==================================================
exps_list_id = 1
progs_list_id = 1

scamv arguments = -i 450 -t 40 --prog_size 5 --enumerate --generator prefetch_strides --obs_model cache_tag_index_part --hw_obs_model hw_cache_tag_index_part

logged scamv gen run time = Duration: 31931.859s


runspecs = ['run.32a82c3f7f63b0b3240873b6c0471f99dd6ebb0b']

numprogs with exps           = 450
numprogs with result         = 450
numprogs with counterexample = 89

numexps                      = 18000
numexps withresult           = 18000
numexps asexamples           = 12844
numexps ascounterexamples    = 447
numexps asinconclusive       = 4709
numexps asexception          = 0

exps until first cexp gen    = 398

The fields of the output are to be interpreted as follows:

  • numprogs with exps: Total number of programs that are used by the respective experiment set

  • numprogs with result: Number of programs that produced a result (an exception is not a result)

  • numprogs with counterexample: Number of programs that produced at least one counterexample

  • numexps: Total number of experiments in the set

  • numexps withresult: Number of experiments that produced a result

  • numexps asexamples: Number of experiments that were examples (indistinguishable)

  • numexps ascounterexamples: Number of experiments that were counterexamples (distinguishable)

  • numexps asinconclusive: Number of experiments that were inconclusive

  • numexps asexception: Number of experiments that threw an exception at runtime

  • exps until first cexp gen: The id of the experiment that was the first counterexample found; used to compute "time to first counterexample".

In this output we see that Scam-V generated 450 programs and 18000 experiments, of which 447 experiments were counterexamples. We can also see that these counterexamples arise from 89 different programs, and that 4709 experiments were inconclusive.
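
The headline rates follow directly from these counts. As a quick sanity check, the fractions for the example output above can be recomputed as follows:

    # Recompute the rates implied by the example db-eval.py output above.
    numprogs_with_exps = 450
    numprogs_with_cexp = 89
    numexps = 18000
    numexps_cexp = 447
    numexps_inconclusive = 4709

    print(f"programs with a counterexample:       {numprogs_with_cexp / numprogs_with_exps:.1%}")  # ~19.8%
    print(f"experiments that are counterexamples: {numexps_cexp / numexps:.2%}")                   # ~2.48%
    print(f"inconclusive experiments:             {numexps_inconclusive / numexps:.1%}")           # ~26.2%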

Showing bundled results

This VM includes the databases generated by the authors using Scam-V that contain the results presented in the paper. They can be found in orig_exps and correspond to the entries in Table 1 and Figure 7 according to the following mapping:

| Table | Column | Experiment description | Database file | HolBA run id |
| --- | --- | --- | --- | --- |
| Table 1 | 1.1 | cache partitioning non-aligned, no refinement | table1/1_logs_cachepartnopageboundary.db | 2021-04-06_08-57-18_724 |
| Table 1 | 1.2 | cache partitioning non-aligned, with refinement | table1/1_logs_cachepartnopageboundary.db | 2021-04-05_18-19-13_541 |
| Table 1 | 2.1 | cache partitioning aligned, no refinement | table1/2_1_logs_cachepartpage_noenum.db | 2021-01-27_22-44-05_349 |
| Table 1 | 2.2 | cache partitioning aligned, with refinement | table1/2_2_logs_cachepartpage_withenum.db | 2021-01-26_17-54-54_267 |
| Table 1 | 3.1 | template A, no refinement | table1/3_logs-TemplateA.db | 2021-03-30_10-21-45_317 |
| Table 1 | 3.2 | template A, with refinement | table1/3_logs-TemplateA.db | 2021-03-29_10-37-01_009 |
| Table 1 | 4.1 | template B, no refinement | table1/4_logs-templateB/logs_templateB_batch1_p1p2p3.db + table1/4_logs-templateB/logs_templateB_batch2_p1p2p3.db | 2021-04-07_19-19-07_315 + 2021-04-07_19-19-12_535 |
| Table 1 | 4.2 | template B, with refinement | table1/4_logs-templateB/logs_templateB_batch1_p1p2p3.db + table1/4_logs-templateB/logs_templateB_batch2_p1p2p3.db | 2021-04-10_03-48-55_400 + 2021-04-10_02-43-14_924 |
| Figure 7 | 1.1 | armclaim, no refinement | figure7/logs-armclaim_mem_address_pc.db | 2021-04-08_13-44-31_977 |
| Figure 7 | 1.2 | armclaim, with refinement | figure7/logs-armclaim_cache_speculation.db | 2021-04-07_17-30-41_937 |
| Figure 7 | 2.1 | armclaim, observe first speculative memory access, with refinement | figure7/logs-armclaim_cache_speculation_first.db | 2021-04-08_13-39-25_612 |
| Figure 7 | 2.2 | template B, observe first speculative memory access, with refinement | table1/4_logs-templateB/logs_templateB_batch1_p1p2p3.db + table1/4_logs-templateB/logs_templateB_batch2_p1p2p3.db | 2021-04-13_01-23-52_668 + 2021-04-13_01-24-13_308 |
| Figure 7 | 3 | straightline speculation, with refinement | figure7/straightline/logs_straightline_b1.db + figure7/straightline/logs_straightline_b2.db | 2021-04-10_14-54-43_533 + 2021-04-13_16-13-30_488 |

The script introduction/scripts/eval_all.py can be used to automatically run db-eval.py on all the included databases (a rough sketch of such a driver loop is given at the end of this section). Results are printed to the terminal in the format described above. The expected output can be found in introduction/scripts/eval_all_result.txt. We also provide the following two files, in which this output has been reorganized to match the table/column order above and the outputs of redundant experiment sets have been omitted:

  • introduction/scripts/experiment_index_table1.txt, and
  • introduction/scripts/experiment_index_figure7.txt.

NB. The script db-eval.py does not actually exercise the pipeline; it is simply a tool for inspecting results that have already been stored in a database.
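
Conceptually, eval_all.py just iterates over the bundled database files and invokes db-eval.py on each of them. A rough, hypothetical sketch of such a driver loop (the actual script in introduction/scripts may differ in its details) could look like this:

    # Hypothetical driver loop: run db-eval.py over every bundled database.
    # The real introduction/scripts/eval_all.py may differ; paths are relative to ~/scamv.
    import glob
    import subprocess

    DB_EVAL = "HolBA_logs/EmbExp-Logs/scripts/db-eval.py"

    for dbfile in sorted(glob.glob("orig_exps/**/*.db", recursive=True)):
        print(f"===== {dbfile} =====")
        subprocess.run(["python3", DB_EVAL, "--dbfile", dbfile], check=False)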

2. Reproducing experiments

The second step is to generate new experiments with Scam-V using the same configurations as in the paper.

In order to generate and run experiments with Scam-V, the tool needs a configuration that specifies parameters such as the observation model, the program generator, and the number of test cases to generate. We have included in this VM the same configurations that we used in our experiments; they are identified as follows:

| Table | Column | Experiment generation configuration id |
| --- | --- | --- |
| Table 1 | 1.1 | micro2021_t1_c1_1 |
| Table 1 | 1.2 | micro2021_t1_c1_2 |
| Table 1 | 2.1 | micro2021_t1_c2_1 |
| Table 1 | 2.2 | micro2021_t1_c2_2 |
| Table 1 | 3.1 | micro2021_t1_c3_1 |
| Table 1 | 3.2 | micro2021_t1_c3_2 |
| Table 1 | 4.1 | micro2021_t1_c4_1 |
| Table 1 | 4.2 | micro2021_t1_c4_2 |
| Figure 7 | 1.1 | micro2021_f7_c1_1 |
| Figure 7 | 1.2 | micro2021_f7_c1_2 |
| Figure 7 | 2.1 | micro2021_f7_c2_1 |
| Figure 7 | 2.2 | micro2021_f7_c2_2 |
| Figure 7 | 3 | micro2021_f7_c3 |

Note that the experiment execution process may stall due to run-time issues, as indicated in the EmbExp-Logs README file. In this case many experiments execute without a result, which is indicated by the warning unsuccessful. This requires either a complete restart or, better yet, cancelling the running experiments and resuming by manually orchestrating the scripts in the Scam-V examples or EmbExp-Logs according to the documentation. We do not provide a high-level script for this purpose.

The process to generate and validate an experiment set is as follows:

  1. Select a configuration from the list above (e.g., micro2021_f7_c1_1) and execute the following:
    ./introduction/scripts/1_reproduce_experimentset.sh micro2021_f7_c1_1
    
  2. Follow the outputs and answer the questions of the script. A second and a third terminal will open up during the process, and the first terminal will start running the experiments one by one.
  3. It is possible, and common practice, to monitor the status of the process by opening a fourth terminal and executing the following from time to time, in order to notice whether something has gone wrong or the hardware is stuck (see also the optional polling sketch after this list):
    ./introduction/scripts/2_status.sh
    
  4. Wait for the experiments to finish executing in the first terminal. NOTICE: This step takes about 24-48 hours (depending on your connection's latency and throughput to the experiment board server) for the example invocation shown in step 1.
  5. Make sure to terminate the board connection in the second terminal once the experiments have finished.
  6. Check the results using ./introduction/scripts/2_status.sh.
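
Because a full run takes many hours, the status check from step 3 can also be automated. The following optional polling loop is not provided in the VM; it assumes it is started from ~/scamv in a terminal with the HolBA environment loaded:

    # Optional helper (not provided in the VM): poll the status script every 10 minutes.
    import subprocess
    import time

    STATUS_SCRIPT = "./introduction/scripts/2_status.sh"  # same script as in step 3
    POLL_SECONDS = 600

    while True:
        subprocess.run([STATUS_SCRIPT], check=False)
        time.sleep(POLL_SECONDS)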

3. Evaluation checklist

Having executed the experiments, the following points must be checked for each experiment output in order to validate the obtained results (a small helper for computing these ratios is sketched after the checklists below). Notice that the numbers in our checklist are approximate and that the results of experiments can be slightly affected by various factors. For example, the execution time can be affected by the connection's latency and throughput to the experiment board server, the number of counterexamples by unforeseen and hidden microarchitectural interactions, etc.

Model M_part. With refinement in place:

  • Number of programs with counterexamples is ~4 times more
  • Number of counterexamples is ~20 times greater
  • Time to reach the first counterexample is ~4 times faster

Model M_ct Template A. With refinement in place:

  • Number of programs with counterexamples is ~100 times more
  • Number of counterexamples is ~2000 times greater
  • Time to reach the first counterexample is ~7000 times faster

Model M_ct Template B. Without refinement we do not expect to find any counterexample, while with refinement in place:

  • ~50% of all programs will have at least one counterexample
  • ~13% of all experiments will be counterexamples
  • Time to reach the first counterexample is ~15 minutes

Model M_ct Template C. Without refinement we do not expect to find a counterexample, while with refinement in place:

  • ~42% of all experiments will be counterexamples
  • Time to reach the first counterexample is less than a minute

Model M_spec_1, Templates C and B, with refinement. While with Template C we do not expect to find any counterexample, with Template B:

  • ~0.6% of all experiments will be counterexamples
  • Time to reach the first counterexample is ~4.5 hours
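
To check the factors in the lists above, it is enough to compare the db-eval.py counts of a no-refinement run with those of the corresponding with-refinement run. The following is a small hedged helper; the field names and placeholder values are ours (not part of the tooling) and should be replaced with the counts reported by db-eval.py and the measured times to the first counterexample:

    # Hedged helper: compare a no-refinement run against a with-refinement run.
    def refinement_factors(base, refined):
        return {
            "programs with counterexamples (times more)":
                refined["progs_cexp"] / max(base["progs_cexp"], 1),
            "counterexamples (times more)":
                refined["exps_cexp"] / max(base["exps_cexp"], 1),
            "time to first counterexample (times faster)":
                base["secs_to_first_cexp"] / max(refined["secs_to_first_cexp"], 1),
        }

    # Placeholder values only; fill in the numbers from your two runs.
    base    = {"progs_cexp": 1, "exps_cexp": 1, "secs_to_first_cexp": 1}
    refined = {"progs_cexp": 1, "exps_cexp": 1, "secs_to_first_cexp": 1}

    for name, factor in refinement_factors(base, refined).items():
        print(f"{name}: {factor:.1f}x")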
