In our paper, we describe a method to generate test inputs for validating side-channel models. Its implementation, called Scam-V, consists of a test input generation tool and a number of models, together with a hardware benchmark platform. The tool has been packaged in this VM together with our evaluation results. However, the actual evaluation of test inputs generated by the tool requires a benchmark platform, which can be built with some special hardware according to the documentation in the repository we have made available through the GitHub project EmbExp-Box.
In order to simplify the evaluation process, we have granted this VM access to our benchmark platform. This way, an evaluator is able to generate test inputs and execute them on the actual hardware benchmark platform remotely, without first having to invest in building an experiment setup.
Note that executing test cases remotely requires internet access for the VM and the ability to establish an SSH connection to tcs79.csc.kth.se on port 4422.
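Reachability of this endpoint can be verified from inside the VM with a plain TCP probe; a minimal check, assuming the `nc` (netcat) utility is available (it is not part of the artifact's tooling):

```bash
# Probe the SSH port of the benchmark platform gateway.
nc -zv tcs79.csc.kth.se 4422
```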
The alternative is to build a local EmbExp-Box instance with a Raspberry Pi 3 board interfaced with appropriate JTAG and UART connections.
The evaluators may need to coordinate their efforts, because whole experiment sets take a long time to execute and the hardware resources in our benchmark platform are limited. We currently have only 4 instances of Raspberry Pi 3 available, and one of us may also use one instance during the artifact evaluation process.
Furthermore, be aware that the hardware sometimes exhibits issues, as mentioned in the EmbExp-Logs README document at the end of the section "Executing and inspecting single experiments".
All paths given below are relative to `~/scamv` in the VM.
The first step is to review the results presented in this paper. These are the product of generating test inputs, or experiments, with the tool Scam-V and of executing these experiments on our benchmark platform.
When Scam-V generates test cases, it stores them in a SQLite database. At first, the database contains experiments that have not been executed yet. After executing them on a board, their outputs are stored in the database and they are ready for evaluation. The git repository containing the database and the scripts needed to drive Scam-V is in `HolBA_logs/EmbExp-Logs`. More detailed information about how to use the scripts is in the GitHub project EmbExp-Logs. The scripts reside in the directory `scripts`, and all of them provide basic usage information if executed with the command-line switch `--help`. Each bash terminal in the VM always has the HolBA environment loaded.
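For example, a sketch of such a `--help` invocation (assuming the scripts are directly executable; otherwise prefix the call with `python3`):

```bash
cd ~/scamv/HolBA_logs/EmbExp-Logs
./scripts/db-eval.py --help
```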
The script `db-eval.py` in `HolBA_logs/EmbExp-Logs/scripts` can be used to present the results of an experiment set execution. Its command-line option `--dbfile` selects the experiment database of interest. The details about the evaluation of individual experiments and sets are described in the EmbExp-Logs README document. Here is an example output of this script taken from our prefetching experiments (dbfile is `orig_exps/table1/1_logs_cachepartnopageboundary.db` and we look at the experiment set with HolBA run id `2021-04-05_18-19-13_541` in the database):
```
Scam-V/HolBA run id: 2021-04-05_18-19-13_541
==================================================
exps_list_id = 1
progs_list_id = 1
scamv arguments = -i 450 -t 40 --prog_size 5 --enumerate --generator prefetch_strides --obs_model cache_tag_index_part --hw_obs_model hw_cache_tag_index_part
logged scamv gen run time = Duration: 31931.859s
runspecs = ['run.32a82c3f7f63b0b3240873b6c0471f99dd6ebb0b']
numprogs with exps = 450
numprogs with result = 450
numprogs with counterexample = 89
numexps = 18000
numexps withresult = 18000
numexps asexamples = 12844
numexps ascounterexamples = 447
numexps asinconclusive = 4709
numexps asexception = 0
exps until first cexp gen = 398
```
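The output above can be reproduced with an invocation along the following lines (a sketch; how the specific HolBA run id is selected, interactively or via an additional option, is described in the EmbExp-Logs documentation and not shown here):

```bash
cd ~/scamv
./HolBA_logs/EmbExp-Logs/scripts/db-eval.py \
    --dbfile orig_exps/table1/1_logs_cachepartnopageboundary.db
```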
The fields of the output are to be interpreted as follows:
- `numprogs with exps`: Total number of programs that are used by the respective experiment set
- `numprogs with result`: Number of programs that produced a result (an exception is not a result)
- `numprogs with counterexample`: Number of programs that produced at least one counterexample
- `numexps`: Total number of experiments in the set
- `numexps withresult`: Number of experiments that produced a result
- `numexps asexamples`: Number of experiments that were examples (indistinguishable)
- `numexps ascounterexamples`: Number of experiments that were counterexamples (distinguishable)
- `numexps asinconclusive`: Number of experiments that were inconclusive
- `numexps asexception`: Number of experiments that threw an exception at runtime
- `exps until first cexp gen`: The id of the experiment that was the first counterexample found; used to compute "time to first counterexample"
In this output we see that Scam-V generated 450 programs and 18000 experiments, of which 447 experiments were counterexamples. We can also see that these counterexamples arise from 89 different programs, and that 4709 experiments were inconclusive.
This VM includes the databases generated by the authors using Scam-V that contain the results presented in the paper. These can be found in `orig_exps` and correspond to the entries in Table 1 and Figure 7 according to the following mapping:
Table | Column | Experiment description | Database file | HolBA run id |
---|---|---|---|---|
Table 1 | 1.1 | cache partitioning non-aligned, no refinement | table1/1_logs_cachepartnopageboundary.db | 2021-04-06_08-57-18_724 |
Table 1 | 1.2 | cache partitioning non-aligned, with refinement | table1/1_logs_cachepartnopageboundary.db | 2021-04-05_18-19-13_541 |
Table 1 | 2.1 | cache partitioning aligned, no refinement | table1/2_1_logs_cachepartpage_noenum.db | 2021-01-27_22-44-05_349 |
Table 1 | 2.2 | cache partitioning aligned, with refinement | table1/2_2_logs_cachepartpage_withenum.db | 2021-01-26_17-54-54_267 |
Table 1 | 3.1 | template A, no refinement | table1/3_logs-TemplateA.db | 2021-03-30_10-21-45_317 |
Table 1 | 3.2 | template A, with refinement | table1/3_logs-TemplateA.db | 2021-03-29_10-37-01_009 |
Table 1 | 4.1 | template B, no refinement | table1/4_logs-templateB/logs_templateB_batch1_p1p2p3.db + table1/4_logs-templateB/logs_templateB_batch2_p1p2p3.db | 2021-04-07_19-19-07_315 + 2021-04-07_19-19-12_535 |
Table 1 | 4.2 | template B, with refinement | table1/4_logs-templateB/logs_templateB_batch1_p1p2p3.db + table1/4_logs-templateB/logs_templateB_batch2_p1p2p3.db | 2021-04-10_03-48-55_400 + 2021-04-10_02-43-14_924 |
Figure 7 | 1.1 | armclaim, no refinement | figure7/logs-armclaim_mem_address_pc.db | 2021-04-08_13-44-31_977 |
Figure 7 | 1.2 | armclaim, with refinement | figure7/logs-armclaim_cache_speculation.db | 2021-04-07_17-30-41_937 |
Figure 7 | 2.1 | armclaim, observe first speculative memory access, with refinement | figure7/logs-armclaim_cache_speculation_first.db | 2021-04-08_13-39-25_612 |
Figure 7 | 2.2 | template B, observe first speculative memory access, with refinement | table1/4_logs-templateB/logs_templateB_batch1_p1p2p3.db + table1/4_logs-templateB/logs_templateB_batch2_p1p2p3.db | 2021-04-13_01-23-52_668 + 2021-04-13_01-24-13_308 |
Figure 7 | 3 | straightline speculation, with refinement | figure7/straightline/logs_straightline_b1.db + figure7/straightline/logs_straightline_b2.db | 2021-04-10_14-54-43_533 + 2021-04-13_16-13-30_488 |
The script `introduction/scripts/eval_all.py` can be used to automatically run `db-eval.py` on all the included databases. Results will be printed out to the terminal in the format described above. The expected output can be found in `introduction/scripts/eval_all_result.txt`.
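A fresh run can be compared against the expected output with a simple diff; a sketch, assuming the script writes only the evaluation results to stdout:

```bash
cd ~/scamv
./introduction/scripts/eval_all.py > /tmp/eval_all_now.txt
diff introduction/scripts/eval_all_result.txt /tmp/eval_all_now.txt
```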
We also provide the following two files, where this output has been organized to match the table column order and to omit the outputs of redundant experiment set data: `introduction/scripts/experiment_index_table1.txt` and `introduction/scripts/experiment_index_figure7.txt`.
NB. The script `db-eval.py` does not actually exercise the pipeline; it is simply a tool to inspect the results that have already been stored in a database.
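Since the results are stored in plain SQLite databases, they can also be inspected directly, independently of the provided scripts. For example, assuming the `sqlite3` command-line shell is installed in the VM (the database schema itself is not documented here):

```bash
# List the tables of one of the included experiment databases.
sqlite3 orig_exps/table1/1_logs_cachepartnopageboundary.db '.tables'
```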
The second step is to generate new experiments with Scam-V using the same configurations as in the paper.
In order to generate and run experiments with Scam-V, the tool needs a configuration that specifies parameters such as the observation model, the program generator, and the number of test cases to generate. We have included in this VM the same configurations we used in our experiments, and they are given by the following identifiers:
Table | Column | Experiment generation configuration id |
---|---|---|
Table 1 | 1.1 | micro2021_t1_c1_1 |
Table 1 | 1.2 | micro2021_t1_c1_2 |
Table 1 | 2.1 | micro2021_t1_c2_1 |
Table 1 | 2.2 | micro2021_t1_c2_2 |
Table 1 | 3.1 | micro2021_t1_c3_1 |
Table 1 | 3.2 | micro2021_t1_c3_2 |
Table 1 | 4.1 | micro2021_t1_c4_1 |
Table 1 | 4.2 | micro2021_t1_c4_2 |
Figure 7 | 1.1 | micro2021_f7_c1_1 |
Figure 7 | 1.2 | micro2021_f7_c1_2 |
Figure 7 | 2.1 | micro2021_f7_c2_1 |
Figure 7 | 2.2 | micro2021_f7_c2_2 |
Figure 7 | 3 | micro2021_f7_c3 |
Notice that the experiment execution process may stall due to run-time issues, as indicated in the EmbExp-Logs README file. In this case, many experiments execute without a result, which is indicated by the warning `unsuccessful`. This requires either a complete restart or, better yet, cancelling the running experiments and resuming by manually orchestrating the scripts in the Scam-V examples or EmbExp-Logs according to the documentation. We do not provide a high-level script for this purpose.
The process to generate and validate an experiment set is as follows:
- Select a configuration from the list above (e.g., `micro2021_f7_c1_1`) and execute the following: `./introduction/scripts/1_reproduce_experimentset.sh micro2021_f7_c1_1`
- Follow the outputs and answer the questions of the script. A second and third terminal will open up in the process, and the first terminal will start running experiment by experiment.
- It is possible, and common practice, to monitor the status of the process by opening a fourth terminal and executing `./introduction/scripts/2_status.sh` from time to time, to notice if something goes wrong or the hardware is stuck (see the sketch after this list).
- Wait for the experiments to finish executing in the first terminal. NOTICE: this step takes about 24-48 hours (depending on your connection's latency and throughput to the experiment board server) for the example invocation shown in the first step.
- Make sure to terminate the board connection in the second terminal once the experiments have finished.
- Check the results using `./introduction/scripts/2_status.sh`.
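For the monitoring step, the periodic re-execution can also be automated, for instance with `watch` (a suggestion on our part, not part of the provided scripts):

```bash
# Terminal 4: re-run the status script every 10 minutes.
cd ~/scamv
watch -n 600 ./introduction/scripts/2_status.sh
```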
Having executed experiments, the following points must be checked for each experiment output in order to validate the obtained results. Notice that the numbers in our checklist are approximate, and the results of experiments can be slightly affected by several factors: for example, the execution time can be affected by the connection's latency and throughput to the experiment board server, the number of counterexamples might be affected by unforeseen and hidden microarchitectural interactions, etc. A sketch for automating such comparisons follows the checklists below.
Model M_part. With refinement in place:
- The number of programs with counterexamples is ~4 times greater
- The number of counterexamples is ~20 times greater
- The first counterexample is reached ~4 times faster

Model M_ct, Template A. With refinement in place:
- The number of programs with counterexamples is ~100 times greater
- The number of counterexamples is ~2000 times greater
- The first counterexample is reached ~7000 times faster

Model M_ct, Template B. Without refinement we do not expect to find any counterexample, while with refinement in place:
- ~50% of all programs will have at least one counterexample
- ~13% of all experiments will be counterexamples
- The first counterexample is reached after ~15 minutes

Model M_ct, Template C. Without refinement we do not expect to find a counterexample, while with refinement in place:
- ~42% of all experiments will be counterexamples
- The first counterexample is reached in less than a minute

Model M_spec_1, Templates C and B, with refinement. While with Template C we do not expect to get any counterexample, with Template B:
- ~0.6% of all experiments will be counterexamples
- The first counterexample is reached after ~4.5 hours
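As an illustration, the first M_part check can be approximated by comparing the `numexps ascounterexamples` fields of two saved `db-eval.py` outputs. This is a minimal sketch under our own assumptions: the file names `norefine.txt` and `withrefine.txt` are hypothetical and presume that the two outputs were saved beforehand (e.g., with `db-eval.py --dbfile ... > norefine.txt`):

```bash
#!/usr/bin/env bash
# Compare 'numexps ascounterexamples' between two saved db-eval.py outputs.
# norefine.txt and withrefine.txt are hypothetical file names.
field() { grep "$2" "$1" | awk -F'= ' '{print $2}'; }

no_ref=$(field norefine.txt "numexps ascounterexamples")
with_ref=$(field withrefine.txt "numexps ascounterexamples")

# Guard against division by zero when no counterexamples were found.
if [ "$no_ref" -eq 0 ]; then
  echo "no counterexamples without refinement"
else
  echo "counterexample ratio: ~$(( with_ref / no_ref ))x"
fi
```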