In our paper, we describe a method to generate test inputs for validating side-channel models. Its implementation, called Scam-V, consists of a test input generation tool and a number of models, together with a hardware benchmark platform. The tool has been packaged in this VM together with our evaluation results. However, the actual evaluation of test inputs generated by the tool requires a benchmark platform, which can be built with some special hardware according to the documentation in the repository we have made available through the GitHub project EmbExp-Box.
In order to simplify the evaluation process, we have granted this VM access to our benchmark platform. This way, an evaluator is able to generate test inputs and execute them on the actual hardware benchmark platform remotely, without first having to invest in building an experiment setup.
Note that executing test cases remotely requires internet access for the VM and the ability to establish an SSH connection to tcs79.csc.kth.se on port 4422.
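Reachability of this endpoint can be verified from inside the VM with a plain TCP probe; a minimal check, assuming the `nc` (netcat) utility is available (it is not part of the artifact's tooling):

```bash
# Probe the SSH port of the benchmark platform gateway.
nc -zv tcs79.csc.kth.se 4422
```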
The alternative is to build a local EmbExp-Box instance with a Raspberry Pi 3 board interfaced with appropriate JTAG and UART connections.
The evaluators may need to coordinate their efforts, because whole experiment sets take a long time to execute and the hardware resources in our benchmark platform are limited. We currently have only 4 instances of Raspberry Pi 3 available, and one of us may also use one instance during the artifact evaluation process.
Furthermore, be aware that the hardware sometimes exhibits issues, as mentioned in the EmbExp-Logs README document at the end of the section "Executing and inspecting single experiments".
All paths given below are relative to `~/scamv` in the VM.
The first step is to review the results presented in this paper. These are the product of generating test inputs, or experiments, with the tool Scam-V and of executing these experiments on our benchmark platform.
When Scam-V generates test cases, it stores them in a SQLite database. At first, the database contains experiments that have not been executed yet. After executing them on a board, their outputs are stored in the database and they are ready for evaluation. The git repository containing the database and the scripts needed to drive Scam-V is in `HolBA_logs/EmbExp-Logs`. More detailed information about how to use the scripts is in the GitHub project EmbExp-Logs. The scripts reside in the directory `scripts`, and all of them provide basic usage information if executed with the command-line switch `--help`. Each bash terminal in the VM always has the HolBA environment loaded.
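For example, a sketch of such a `--help` invocation (assuming the scripts are directly executable; otherwise prefix the call with `python3`):

```bash
cd ~/scamv/HolBA_logs/EmbExp-Logs
./scripts/db-eval.py --help
```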
The script `db-eval.py` in `HolBA_logs/EmbExp-Logs/scripts` can be used to present the results of an experiment set execution. Its command-line option `--dbfile` selects the experiment database of interest. The details about the evaluation of individual experiments and sets are described in the EmbExp-Logs README document. Here is an example output of this script taken from our prefetching experiments (dbfile is `orig_exps/table1/1_logs_cachepartnopageboundary.db` and we look at the experiment set with HolBA run id `2021-04-05_18-19-13_541` in the database):
```
Scam-V/HolBA run id: 2021-04-05_18-19-13_541
==================================================
exps_list_id = 1
progs_list_id = 1
scamv arguments = -i 450 -t 40 --prog_size 5 --enumerate --generator prefetch_strides --obs_model cache_tag_index_part --hw_obs_model hw_cache_tag_index_part
logged scamv gen run time = Duration: 31931.859s
runspecs = ['run.32a82c3f7f63b0b3240873b6c0471f99dd6ebb0b']
numprogs with exps = 450
numprogs with result = 450
numprogs with counterexample = 89
numexps = 18000
numexps withresult = 18000
numexps asexamples = 12844
numexps ascounterexamples = 447
numexps asinconclusive = 4709
numexps asexception = 0
exps until first cexp gen = 398
```
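The output above can be reproduced with an invocation along the following lines (a sketch; how the specific HolBA run id is selected, interactively or via an additional option, is described in the EmbExp-Logs documentation and not shown here):

```bash
cd ~/scamv
./HolBA_logs/EmbExp-Logs/scripts/db-eval.py \
    --dbfile orig_exps/table1/1_logs_cachepartnopageboundary.db
```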
The fields of the output are to be interpreted as follows:
- `numprogs with exps`: Total number of programs that are used by the respective experiment set
- `numprogs with result`: Number of programs that produced a result (an exception is not a result)
- `numprogs with counterexample`: Number of programs that produced at least one counterexample
- `numexps`: Total number of experiments in the set
- `numexps withresult`: Number of experiments that produced a result
- `numexps asexamples`: Number of experiments that were examples (indistinguishable)
- `numexps ascounterexamples`: Number of experiments that were counterexamples (distinguishable)
- `numexps asinconclusive`: Number of experiments that were inconclusive
- `numexps asexception`: Number of experiments that threw an exception at runtime
- `exps until first cexp gen`: The id of the experiment that was the first counterexample found; used to compute "time to first counterexample"
In this output we see that Scam-V generated 450 programs and 18000 experiments, of which 447 experiments were counterexamples. We can also see that these counterexamples arise from 89 different programs, and that 4709 experiments were inconclusive.
This VM includes the databases generated by the authors using Scam-V that contain the results presented in the paper. These can be found in `orig_exps` and correspond to the entries in Table 1 and Figure 7 according to the following mapping:
Table | Column | Experiment description | Database file | HolBA run id |
---|---|---|---|---|
Table 1 | 1.1 | cache partitioning non-aligned, no refinement | table1/1_logs_cachepartnopageboundary.db | 2021-04-06_08-57-18_724 |
Table 1 | 1.2 | cache partitioning non-aligned, with refinement | table1/1_logs_cachepartnopageboundary.db | 2021-04-05_18-19-13_541 |
Table 1 | 2.1 | cache partitioning aligned, no refinement | table1/2_1_logs_cachepartpage_noenum.db | 2021-01-27_22-44-05_349 |
Table 1 | 2.2 | cache partitioning aligned, with refinement | table1/2_2_logs_cachepartpage_withenum.db | 2021-01-26_17-54-54_267 |
Table 1 | 3.1 | template A, no refinement | table1/3_logs-TemplateA.db | 2021-03-30_10-21-45_317 |
Table 1 | 3.2 | template A, with refinement | table1/3_logs-TemplateA.db | 2021-03-29_10-37-01_009 |
Table 1 | 4.1 | template B, no refinement | table1/4_logs-templateB/logs_templateB_batch1_p1p2p3.db + table1/4_logs-templateB/logs_templateB_batch2_p1p2p3.db | 2021-04-07_19-19-07_315 + 2021-04-07_19-19-12_535 |
Table 1 | 4.2 | template B, with refinement | table1/4_logs-templateB/logs_templateB_batch1_p1p2p3.db + table1/4_logs-templateB/logs_templateB_batch2_p1p2p3.db | 2021-04-10_03-48-55_400 + 2021-04-10_02-43-14_924 |
Figure 7 | 1.1 | armclaim, no refinement | figure7/logs-armclaim_mem_address_pc.db | 2021-04-08_13-44-31_977 |
Figure 7 | 1.2 | armclaim, with refinement | figure7/logs-armclaim_cache_speculation.db | 2021-04-07_17-30-41_937 |
Figure 7 | 2.1 | armclaim, observe first speculative memory access, with refinement | figure7/logs-armclaim_cache_speculation_first.db | 2021-04-08_13-39-25_612 |
Figure 7 | 2.2 | template B, observe first speculative memory access, with refinement | table1/4_logs-templateB/logs_templateB_batch1_p1p2p3.db + table1/4_logs-templateB/logs_templateB_batch2_p1p2p3.db | 2021-04-13_01-23-52_668 + 2021-04-13_01-24-13_308 |
Figure 7 | 3 | straightline speculation, with refinement | figure7/straightline/logs_straightline_b1.db + figure7/straightline/logs_straightline_b2.db | 2021-04-10_14-54-43_533 + 2021-04-13_16-13-30_488 |
The script `introduction/scripts/eval_all.py` can be used to automatically run `db-eval.py` on all the included databases. Results will be printed out to the terminal in the format described above. The expected output can be found in `introduction/scripts/eval_all_result.txt`.
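A fresh run can be compared against the expected output with a simple diff; a sketch, assuming the script writes only the evaluation results to stdout:

```bash
cd ~/scamv
./introduction/scripts/eval_all.py > /tmp/eval_all_now.txt
diff introduction/scripts/eval_all_result.txt /tmp/eval_all_now.txt
```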
We also provide the following two files, where this output has been organized to match the table column order and to omit the outputs of redundant experiment set data: `introduction/scripts/experiment_index_table1.txt` and `introduction/scripts/experiment_index_figure7.txt`.
NB. The script `db-eval.py` does not actually exercise the pipeline; it is simply a tool to inspect the results that have already been stored in a database.
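Since the results are stored in plain SQLite databases, they can also be inspected directly, independently of the provided scripts. For example, assuming the `sqlite3` command-line shell is installed in the VM (the database schema itself is not documented here):

```bash
# List the tables of one of the included experiment databases.
sqlite3 orig_exps/table1/1_logs_cachepartnopageboundary.db '.tables'
```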
The second step is to generate new experiments with Scam-V using the same configurations as in the paper.
In order to generate and run experiments with Scam-V, the tool needs a configuration that specifies parameters such as the observation model, the program generator, and the number of test cases to generate. We have included in this VM the same configurations we used in our experiments, and they are given by the following identifiers:
Table | Column | Experiment generation configuration id |
---|---|---|
Table 1 | 1.1 | micro2021_t1_c1_1 |
Table 1 | 1.2 | micro2021_t1_c1_2 |
Table 1 | 2.1 | micro2021_t1_c2_1 |
Table 1 | 2.2 | micro2021_t1_c2_2 |
Table 1 | 3.1 | micro2021_t1_c3_1 |
Table 1 | 3.2 | micro2021_t1_c3_2 |
Table 1 | 4.1 | micro2021_t1_c4_1 |
Table 1 | 4.2 | micro2021_t1_c4_2 |
Figure 7 | 1.1 | micro2021_f7_c1_1 |
Figure 7 | 1.2 | micro2021_f7_c1_2 |
Figure 7 | 2.1 | micro2021_f7_c2_1 |
Figure 7 | 2.2 | micro2021_f7_c2_2 |
Figure 7 | 3 | micro2021_f7_c3 |
Notice that the experiment execution process may stall due to run-time issues, as indicated in the EmbExp-Logs README file. In this case, many experiments execute without a result, which is indicated by the warning `unsuccessful`. This requires either a complete restart or, better yet, cancelling the running experiments and resuming by manually orchestrating the scripts in the Scam-V examples or EmbExp-Logs according to the documentation. We do not provide a high-level script for this purpose.
The process to generate and validate an experiment set is as follows:
- Select a configuration from the list above (e.g., `micro2021_f7_c1_1`) and execute the following: `./introduction/scripts/1_reproduce_experimentset.sh micro2021_f7_c1_1`
- Follow the outputs and answer the questions of the script. A second and third terminal will open up in the process, and the first terminal will start running experiment by experiment.
- It is possible, and common practice, to monitor the status of the process by opening a fourth terminal and executing `./introduction/scripts/2_status.sh` from time to time, to notice if something goes wrong or the hardware is stuck (see the sketch after this list).
- Wait for the experiments to finish executing in the first terminal. NOTICE: this step takes about 24-48 hours (depending on your connection's latency and throughput to the experiment board server) for the example invocation shown in the first step.
- Make sure to terminate the board connection in the second terminal once the experiments have finished.
- Check the results using `./introduction/scripts/2_status.sh`.
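For the monitoring step, the periodic re-execution can also be automated, for instance with `watch` (a suggestion on our part, not part of the provided scripts):

```bash
# Terminal 4: re-run the status script every 10 minutes.
cd ~/scamv
watch -n 600 ./introduction/scripts/2_status.sh
```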
Having executed experiments, the following points must be checked for each experiment output in order to validate the obtained results. Notice that the numbers in our checklist are approximate, and the results of experiments can be slightly affected by several factors: for example, the execution time can be affected by the connection's latency and throughput to the experiment board server, the number of counterexamples might be affected by unforeseen and hidden microarchitectural interactions, etc. A sketch for automating such comparisons follows the checklists below.
Model M_part. With refinement in place:
- The number of programs with counterexamples is ~4 times greater
- The number of counterexamples is ~20 times greater
- The first counterexample is reached ~4 times faster

Model M_ct, Template A. With refinement in place:
- The number of programs with counterexamples is ~100 times greater
- The number of counterexamples is ~2000 times greater
- The first counterexample is reached ~7000 times faster

Model M_ct, Template B. Without refinement we do not expect to find any counterexample, while with refinement in place:
- ~50% of all programs will have at least one counterexample
- ~13% of all experiments will be counterexamples
- The first counterexample is reached after ~15 minutes

Model M_ct, Template C. Without refinement we do not expect to find a counterexample, while with refinement in place:
- ~42% of all experiments will be counterexamples
- The first counterexample is reached in less than a minute

Model M_spec_1, Templates C and B, with refinement. While with Template C we do not expect to get any counterexample, with Template B:
- ~0.6% of all experiments will be counterexamples
- The first counterexample is reached after ~4.5 hours
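As an illustration, the first M_part check can be approximated by comparing the `numexps ascounterexamples` fields of two saved `db-eval.py` outputs. This is a minimal sketch under our own assumptions: the file names `norefine.txt` and `withrefine.txt` are hypothetical and presume that the two outputs were saved beforehand (e.g., with `db-eval.py --dbfile ... > norefine.txt`):

```bash
#!/usr/bin/env bash
# Compare 'numexps ascounterexamples' between two saved db-eval.py outputs.
# norefine.txt and withrefine.txt are hypothetical file names.
field() { grep "$2" "$1" | awk -F'= ' '{print $2}'; }

no_ref=$(field norefine.txt "numexps ascounterexamples")
with_ref=$(field withrefine.txt "numexps ascounterexamples")

# Guard against division by zero when no counterexamples were found.
if [ "$no_ref" -eq 0 ]; then
  echo "no counterexamples without refinement"
else
  echo "counterexample ratio: ~$(( with_ref / no_ref ))x"
fi
```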