Add workload generation #27

Merged
merged 14 commits into from
Jan 21, 2024
File renamed without changes.
61 changes: 61 additions & 0 deletions experiment/experiment-design.md
@@ -0,0 +1,61 @@
### Experiment design

We need to replay a production VM arrival trace on the system so that we can then evaluate different packing
strategies.

The limitation is that the inventory of the deployment (36 CPU cores) is far smaller than a cloud inventory (~15k+ machines).

The solution is to generate a VM trace suitable for the deployment, based on the production trace.

We need to decide the following (an example request record is sketched after these lists).

Trace features:

- VM arrival rate: How many VMs arrive at a given time
- VM size: vCPU count
- VM type: Evictable or Regular

Experiment settings:

- Total duration for the experiment
- Time period of Renewable availability
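
For illustration, a single generated VM request could look like the following sketch. The field names follow the
dictionaries built in `trace-generation/trace-generator.py`; the values here are made up.

```python
# Hypothetical example of one generated VM request.
vm_request = {
    'name': 'VM-<experiment-uuid>-<arrival-time>-regular-0',  # unique per experiment run
    'type': 'regular',    # 'regular' or 'evictable'
    'vcpu': 4,            # VM size in vCPUs
    'lifetime': 0.9,      # drawn from the trace lifetime distribution
}
```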

#### Inventory

A five-node OpenStack cluster with green cores enabled.

- 16 cores: 12 Regular + 4 Green
- 8 cores: 5 Regular + 3 Green
- 4 cores: 4 Regular + 1 Green
- 4 cores: 4 Regular + 1 Green
- 4 cores: 4 Regular + 1 Green \[Physical Access\]

One of the machines provides admin access to the BIOS and a host OS that runs on bare metal.

The physical-access machine provides power stats, as well as true control over CPU core power management. All machines
provide a packing inventory. Therefore, packing metrics such as green score, density, and utilization are calculated
across the whole cluster, while the physical-access machine provides power consumption metrics.

### Trace goals

The generated trace should utilize roughly 70-80% of the inventory (matching production cloud inventory data).

#### Pre-process

We select the X-th percentile of the workload, say the 90th percentile.

We further pick a time period, t1 to t2, e.g., 0.0 - 1.0.

Within the time period, we explore each time step: we calculate the number of requests in each step to get the request
distribution, and we obtain the vCPU distribution and the lifetime distribution. Then, for the X-th percentile, we find
the max value for each (y for vCPU, z for requests, etc.).

We then reduce the trace by capping each time step with the calculated max values.

If the request count of a step is too high, we omit all requests from that step; if the vCPU count of a request is too
high, we omit that VM request; and likewise for lifetime.

The result is the reduced trace.

When the pre-process script runs, it initially prints the max values, so the percentile can be adjusted to match the
size of the inventory we have.

E.g., for a 26-core inventory, a max of perhaps 4 requests at a time suits, so the 70th percentile is a better fit.
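
The following is a minimal sketch of the reduction step described above. It assumes the raw trace is a per-request
pandas DataFrame with `time`, `vcpu`, and `lifetime` columns (the column names and the capping logic are assumptions
for illustration, not the actual pre-process script).

```python
import numpy as np
import pandas as pd


def reduce_trace(df: pd.DataFrame, t1: float, t2: float, percentile: float = 90.0) -> pd.DataFrame:
    # Restrict the trace to the chosen time period [t1, t2].
    window = df[(df['time'] >= t1) & (df['time'] <= t2)]

    # Per-time-step request counts, plus per-request vCPU and lifetime values.
    step_counts = window.groupby('time').size()

    # X-th percentile values used as caps; printed so the percentile can be
    # tuned to the size of the target inventory.
    max_rq = np.percentile(step_counts, percentile)
    max_vcpu = np.percentile(window['vcpu'], percentile)
    max_lifetime = np.percentile(window['lifetime'], percentile)
    print('caps:', max_rq, max_vcpu, max_lifetime)

    # Drop whole time steps whose request count exceeds the cap, then drop
    # individual requests whose vCPU or lifetime exceeds the caps.
    kept_steps = step_counts[step_counts <= max_rq].index
    reduced = window[window['time'].isin(kept_steps)]
    return reduced[(reduced['vcpu'] <= max_vcpu) & (reduced['lifetime'] <= max_lifetime)]
```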
3 changes: 3 additions & 0 deletions experiment/trace-generation/create_flavours.py
@@ -0,0 +1,3 @@
from openstack_client import create_flavours

create_flavours()
68 changes: 68 additions & 0 deletions experiment/trace-generation/external.py
@@ -0,0 +1,68 @@
import csv
from openstack_client import create_flavours, create_vm, delete_vm


class RequestsManager:
    '''RequestsManager encapsulates VM lifecycle management based on the trace
    information. It further provides insight into core utilization.
    '''

    def __init__(self):
        self.local_tracking_total_cores = 36
        self.local_tracking_used_cores = 0
        self.created_vms = {}
        self.evictedVMs = []
        self.uninterruptedVMs = []

    def dispatch(self, vm_rqs, clk):
        self.create_vms(vm_rqs, clk)

    def handle_expired_vms(self, clk):
        # Retire VMs whose lifetime has elapsed at the current clock value.
        for vm_name, vm in self.created_vms.items():
            if vm['end-of-life'] <= clk:
                resp = self.delete_vm(vm)
                if not resp:
                    # VM does not exist anymore. It must have been evicted.
                    print('failed deletion. marking as evicted: ', vm['name'])
                    vm['is-evicted'] = True
                else:
                    vm['is-deleted'] = True
                # A very large value that we will never reach, so this VM is not
                # processed again on later ticks.
                vm['end-of-life'] = 100

    def create_vms(self, vm_rqs, clk):
        for vm in vm_rqs:
            resp = self.create_vm(vm, clk)
            if resp:
                print('marked successful vm creation of: ', vm['name'])
                vm['end-of-life'] = clk + vm['lifetime']
                vm['is-evicted'] = False  # assume the vm is going to live a full life.
                self.created_vms[vm['name']] = vm
                self.local_tracking_used_cores += vm['vcpu']

    def create_vm(self, vm, clk):
        return create_vm(vm)

    def delete_vm(self, vm):
        return delete_vm(vm)

    def get_utilization(self):
        # Used cores are only ever incremented, so this reports the cores
        # requested so far relative to the 36-core inventory.
        return self.local_tracking_used_cores / self.local_tracking_total_cores

    def dump(self, file_path):
        header = ['name', 'type', 'vcpu', 'lifetime', 'is-evicted']

        with open(file_path, 'w', encoding='UTF8') as f:
            writer = csv.writer(f)

            # write the header
            writer.writerow(header)

            for vm in self.created_vms.values():
                writer.writerow([
                    vm['name'],
                    vm['type'],
                    vm['vcpu'],
                    vm['lifetime'],
                    vm['is-evicted']
                ])
19 changes: 19 additions & 0 deletions experiment/trace-generation/math_utils.py
@@ -0,0 +1,19 @@
import scipy.stats


def pick_random(dst):
    '''Draw a random value from the empirical distribution `dst` using a
    Gaussian KDE, rejecting samples that fall outside the observed range.
    '''
    is_uniform = len(set(dst)) == 1
    if is_uniform:
        return dst[0]

    lower_bd = min(dst)
    upper_bd = max(dst)
    kde = scipy.stats.gaussian_kde(dst)

    # The KDE can produce samples outside the observed range; resample until
    # the value falls within [lower_bd, upper_bd].
    sample = kde.resample(size=1)
    val = sample[0][0]
    while val < lower_bd or val > upper_bd:
        sample = kde.resample(size=1)
        val = sample[0][0]
    return val
52 changes: 52 additions & 0 deletions experiment/trace-generation/openstack_client.py
@@ -0,0 +1,52 @@
import subprocess

# Module-level sanity check: run `date` once at import time and print the result.
cmd = "date"

# check_output returns the output as a byte string
returned_output = subprocess.check_output(cmd)

# decode() converts the byte string to a regular string
print('Current date is:', returned_output.decode("utf-8"))


def create_flavours():
    # Flavours support up to 12 vCPUs.
    for i in range(1, 13):
        try:
            cmd = (f"openstack flavor create --public pinned.vcpu-{i} "
                   f"--id pinned.vcpu-{i} --ram 256 --disk 1 --vcpus {i}")
            print(cmd)
            returned_output = subprocess.check_output(cmd, shell=True)
            print('flavour creation for vcpu:', i, returned_output.decode("utf-8"))
        except subprocess.CalledProcessError:
            print('failed flavour creation for vcpu:', i)
        try:
            # Pin the flavour to dedicated physical cores.
            cmd2 = f"openstack flavor set pinned.vcpu-{i} --property hw:cpu_policy=dedicated"
            print(cmd2)
            returned_output = subprocess.check_output(cmd2, shell=True)
            print('setting dedicated for flavour creation for vcpu:', i, returned_output.decode("utf-8"))
        except subprocess.CalledProcessError:
            print('failed setting dedicated for flavour creation for vcpu:', i)


def create_vm(vm):
    print('attempting to create vm: ', vm['name'])
    try:
        cmd = (f'openstack server create --nic net-id="public" '
               f'--image "cirros-0.6.2-x86_64-disk" '
               f'--flavor "pinned.vcpu-{round(vm["vcpu"])}" "{vm["name"]}" --wait')
        returned_output = subprocess.check_output(cmd, shell=True)
        print('vm creation for vm:', vm, returned_output.decode("utf-8"))
        return True
    except subprocess.CalledProcessError:
        return False


def delete_vm(vm):
    print('attempting to delete vm: ', vm['name'])
    try:
        cmd = f"openstack server delete {vm['name']} --wait --force"
        returned_output = subprocess.check_output(cmd, shell=True)
        print('vm deletion for vm:', vm, returned_output.decode("utf-8"))
        return True
    except subprocess.CalledProcessError:
        return False
88 changes: 88 additions & 0 deletions experiment/trace-generation/trace-generator.py
@@ -0,0 +1,88 @@
import ast
import math
import sys
import time
import uuid

import pandas as pd

from external import RequestsManager
from math_utils import pick_random

# 1 - normalized azure trace csv
# 2 - starting time
# 3 - end time
# 4 - max number of requests at a time
# 5 - max lifetime of a vm
# 6 - max vcpu of an instance

# Example:
# python3 trace-generator.py 4-min-test/nrl_azure_packing_2020_perc_55.csv 0.819502315018326 0.8208333 5 14.93 12
# The csv file is generated for 4 minutes (0.00277778 days), between 0.819502315018326 and 0.8208333. Max requests is
# set to 5 and the max lifetime is the duration of the experiment.
# Lifetime max:
# -- 24 hours: 89.6387860183604 (unscaled value in the trace)
# -- 4 min: 14.93 (the above, linearly scaled down for the 4-minute case)

nrl_trace_file = sys.argv[1]
t_start = float(sys.argv[2])
t_stop = float(sys.argv[3])

# max_rq_cnt = float(sys.argv[4])
# max_lft = float(sys.argv[5])
# max_vcpu_cnt = float(sys.argv[6])

print("nrl_trace_file: ", nrl_trace_file, " t_start: ", t_start, " t_stop: ", t_stop)

df = pd.read_csv(nrl_trace_file)
df = df[t_start <= df['time']]
df = df[df['time'] <= t_stop]

EXPERIMENT_UUID = uuid.uuid4()


def generate_rqs(rq_count, row, time, type, bucket):
    # Draw a lifetime and vCPU count for each request in this time step from the
    # per-step distributions stored in the trace row (saved as list literals).
    for rq in range(rq_count):
        lifetime = pick_random(dst=ast.literal_eval(row['lifetime_distribution'][0]))
        vcpu = round(pick_random(dst=ast.literal_eval(row['vcpu_distribution'][0])))
        if vcpu > 0:
            bucket.append({
                'name': 'VM-' + str(EXPERIMENT_UUID) + '-' + str(time) + '-' + type + '-' + str(rq),
                'type': type,
                'lifetime': lifetime,
                'vcpu': vcpu
            })


t_s = df['time'].values
os_manager = RequestsManager()
for idx, t in enumerate(t_s):
    row = df.loc[df['time'] == t].to_dict('list')

    vm_rqs = []
    total_rq_cnt = row['request_count'][0]
    # The rounding direction here is up to us. We favour more evictable VMs,
    # assuming fine-grained trace analysis allows us to realize slightly larger
    # evictable VM types.
    reg_rq_cnt = math.floor(total_rq_cnt * row['regular_vm_count'][0])
    evct_rq_cnt = math.ceil(total_rq_cnt * row['evictable_vm_count'][0])
    if reg_rq_cnt > 0:
        generate_rqs(rq_count=reg_rq_cnt, row=row, time=t, type='regular', bucket=vm_rqs)
    if evct_rq_cnt > 0:
        generate_rqs(rq_count=evct_rq_cnt, row=row, time=t, type='evictable', bucket=vm_rqs)

    # print('row: ', row, 'rq: ', vm_rqs, 'total rq: ', total_rq_cnt, 'evct: ', evct_rq_cnt, ' reg: ', reg_rq_cnt)

    os_manager.handle_expired_vms(clk=t)
    os_manager.dispatch(vm_rqs=vm_rqs, clk=t)

    if (idx + 1) < len(t_s):
        # Trace time is in days; scale the gap to the next arrival into seconds.
        t_to = t_s[idx + 1] - t
        wait_for = t_to * (24 * 3600)
        print("time: ", t, "total requested: ", len(vm_rqs), "waiting for: ", wait_for, "cls util: ",
              os_manager.get_utilization())
        time.sleep(wait_for)

os_manager.dump(file_path='./trace-emulation-report_' + str(time.time()) + '.csv')
1 change: 1 addition & 0 deletions experiment/trace-pre-process/.gitignore
@@ -0,0 +1 @@
secrets.txt