Tutorial: install and maintain cheta telemetry archive

Tom Aldcroft edited this page Jan 3, 2022 · 34 revisions

This tutorial will guide you through the steps of installing a subset or all of the cheta (AKA Ska engineering archive) data archive to a local directory on a standalone laptop or desktop, followed by instructions on how to keep your cheta archive up to date.

Cheta vs. Ska engineering archive

The Ska engineering archive is being re-branded to cheta to make it easier to import and easier to say. This is not an acronym, though you can think "Chandra engineering telemetry archive" if that makes you happier. Mostly cheta means it is fast. In any code you can replace Ska.engarchive with cheta (but the original will always work):

from cheta import fetch_eng as fetch

A quick word on sizes and speeds

The data files that comprise the cheta archive take around 200 GB as of late 2019. While this is large, it is within the storage capacity of modern laptops.

In contrast, network speeds can limit the initial sync: at 1 MB/sec the whole archive would take about 2.5 days, but if 10 MB/sec can be sustained it takes only about 6 hours. VPN servers can limit bandwidth substantially, so it is important to estimate your transfer speed before diving in to copy the whole archive.

Using my cheap Seagate external 2 TB drive with USB 3.0, I see transfer speeds of 80-90 MB/sec, which translates to copying the entire archive in a bit less than an hour.
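
The estimates above are simple size-over-rate arithmetic, so it is easy to redo them for your own connection. The following is a minimal sketch, assuming the quoted rates are megabytes per second and taking 1 GB as 1000 MB; the function name transfer_time_hours is made up for this illustration.

```python
def transfer_time_hours(size_gb, rate_mb_per_s):
    """Return the hours needed to copy size_gb gigabytes at rate_mb_per_s MB/sec."""
    seconds = size_gb * 1000.0 / rate_mb_per_s  # assumes 1 GB = 1000 MB
    return seconds / 3600.0


ARCHIVE_GB = 200  # approximate archive size quoted in this section

# Rates of 1 and 10 MB/sec (network) and 85 MB/sec (USB 3.0 drive)
for rate in (1, 10, 85):
    hours = transfer_time_hours(ARCHIVE_GB, rate)
    print(f"{rate:>3} MB/sec: {hours:6.1f} hours ({hours / 24:.2f} days)")
```

Substitute a measured rate (for example from copying one large file over your VPN) to get a figure for your own setup.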

Setup: installing cheta 4.47.3

At this time the cheta archive package is available in the Ska3 flight environment, so no independent installation is necessary.

FOT MATLAB tools users

To initialize python in MATLAB, type the following command at the MATLAB command prompt:

pyexec('update_path=True')

For syncing to work correctly, you will need to manually create the folder in which you plan to store the data. By default, the Python tools will look for the folder:

%SKA%\Ska_data\data\eng_archive

where %SKA% is a Windows environment variable set for you by the MATLAB FOT Tools. You can check what this is using the following command in the MATLAB Command Window:

getenv('SKA')

If the eng_archive directory doesn't exist on your machine, you will need to create it and then restart MATLAB before continuing, or manually specify a different directory using the --data-root flag discussed later in this tutorial.
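
If you prefer to create the folder from Python rather than by hand, something like the following would do it. This is only a sketch: the helper name ensure_eng_archive_dir is hypothetical, and the sub-path is copied from the default location shown above.

```python
from pathlib import Path


def ensure_eng_archive_dir(ska_root):
    """Create <ska_root>/Ska_data/data/eng_archive if missing and return its path."""
    path = Path(ska_root) / "Ska_data" / "data" / "eng_archive"
    # parents=True creates intermediate folders; exist_ok=True makes this repeatable
    path.mkdir(parents=True, exist_ok=True)
    return path


# Typical use, assuming the SKA environment variable is set:
#   import os
#   print(ensure_eng_archive_dir(os.environ["SKA"]))
```

As noted above, restart MATLAB after creating the directory.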

macOS and Linux users

A prerequisite for all of this is to have a standalone Ska3 environment installed and running on your machine, with a $SKA environment variable defined and pointing to a directory with a data/ subdirectory. This is covered in the Ska3 runtime environment for users wiki.

Now we will make a new conda environment just for doing cheta archive maintenance. This shows a good practice for doing development / experimental package updates with Ska3: leave your "flight Ska3" in a clean state corresponding to the most recent official (tested!) release, and check out new packages in a clone of the flight environment:

conda create --clone=ska3 --name ska3-cheta
source activate ska3-cheta

Now we will put the new version of cheta into our new environment by installing with pip directly from a branch/tag on GitHub. The old versions of the conda package manager and the pip installer that we use are not quite compatible, so we need to first uninstall the Ska.engarchive conda package and then install the new version with pip:

conda uninstall --force ska.engarchive
pip install --egg --no-deps --ignore-installed git+https://github.com/sot/[email protected]

Installing some or all of the cheta archive

In the examples you will see commands beginning with !. For Windows users you must type that ! symbol. For MacOS/linux users, do NOT type the ! symbol (instead pretend that is your command prompt).

First, let's confirm that you have the right version of cheta installed:

! python -m cheta.update_client_archive --version
update_client_archive.py 4.47.3

For this tutorial you need to be on a network that can see the ICXC web site. To test, try loading https://icxc.cfa.harvard.edu/.

Using a test data archive for this tutorial

In the commands below, you will see something like [--data-root=.]. If you already have a local copy of the cheta archive on your laptop and want to do the tutorial "on the side", then include that option to instruct the commands to store data in that root directory.

Most people can skip this, in which case the new data will be added in the standard location which you can discover with:

! python -c "from cheta import fetch; print(fetch.msid_files.basedir)"
/Users/aldcroft/ska/data/eng_archive

File of MSIDs to copy

The list of MSIDs that you want to copy is provided to the cheta_sync tool in a file. There are three related ways to select MSIDs:

  1. MSIDs that match the name or pattern are included, for example aopcadmd or aacccd*. Note that case does not matter.
  2. MSIDs with the same subsystem and sampling rate as the given MSID are included. For example, */1wrat gives all ACIS engineering telemetry nominally sampled at 16.4 sec, while */aopcadmd gives all PCAD telemetry at 1.025 sec sampling.
  3. MSIDs with the same subsystem regardless of sampling rate are included. For example, **/3tscpos gives all engineering SIM telemetry, while **/aopcadmd gives all PCAD telemetry (which is more than 100 GB, see Appendix: approximate sizes of cheta archive content types).

So in your favorite editor create a file named msid_specs and enter the following:

aacccd*
# aopcadmd (275 MB for just this MSID)
# */1wrat (205 MB for all 16.4 sec ACIS engineering telemetry)
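
To get a feel for how a plain name-or-pattern spec (the first form in the list above) selects MSIDs, here is a small illustration using Python's standard fnmatch module. It is not the code cheta uses internally, it does not handle the */ and **/ forms, and the list of available MSIDs is a made-up sample.

```python
from fnmatch import fnmatchcase


def match_msids(pattern, msids):
    """Return the MSIDs that match a glob-style pattern, ignoring case."""
    pattern = pattern.lower()
    return [msid for msid in msids if fnmatchcase(msid.lower(), pattern)]


# Hypothetical sample of MSID names, for illustration only
available = ["AACCCDPT", "AACCCDRT", "AOPCADMD", "1WRAT"]

print(match_msids("aacccd*", available))   # pattern with a wildcard
print(match_msids("AOPCADMD", available))  # exact name, any case
```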

Dry-run using remote server

Now we're finally ready for the copy, but let's do a dry-run to see what it will copy. Instead of ccosmos you can use chimchim.

WINDOWS users: you need to include the final & shown in the examples to allow typing your password. You will also need to set --server-data-root to chimchim.

! python -m cheta.update_client_archive --add-msids=msid_specs [--data-root=.] \
  --server-data-root=<username>@ccosmos.cfa.harvard.edu --dry-run [&]

Copy data from remote server

So let's do it:

! python -m cheta.update_client_archive --add-msids=msid_specs [--data-root=.] \
  --server-data-root=<username>@ccosmos.cfa.harvard.edu [&]
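
If you expect to repeat the dry-run-then-copy sequence, it can help to assemble the command line in one place so the two runs differ only by --dry-run. The sketch below only builds the argument list from the options shown in this tutorial; the helper name build_sync_command is hypothetical, and actually running the command (for example with subprocess.run) is left to you since it prompts for a password.

```python
import sys


def build_sync_command(msid_file, server, data_root=None, dry_run=False):
    """Build the argument list for python -m cheta.update_client_archive."""
    cmd = [
        sys.executable, "-m", "cheta.update_client_archive",
        f"--add-msids={msid_file}",
        f"--server-data-root={server}",
    ]
    if data_root is not None:
        cmd.append(f"--data-root={data_root}")  # optional, as in [--data-root=.]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


# Replace <username> with your own account name before running anything
server = "<username>@ccosmos.cfa.harvard.edu"
print(" ".join(build_sync_command("msid_specs", server, data_root=".", dry_run=True)))
print(" ".join(build_sync_command("msid_specs", server, data_root=".")))
```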

Force archive to be out of date

The next step is bringing your local cheta archive up to date with the server version. However, in this tutorial you have already synced with the latest available data, so doing an update will not actually do anything. So we will apply a command to truncate 7 days of data from the local archive to force it to be out of date. First, if you are installing to the standard cheta archive location (instead of --data-root=.) then use this command to discover that directory name:

! python -c "from cheta import fetch; print(fetch.msid_files.basedir)"
/Users/aldcroft/ska/data/eng_archive

Now run the actual truncate command, first with the --dry-run option:

! python -m cheta.update_archive --content=pcad5eng --data-root=<previous_output OR .> \
  --truncate=-7 --dry-run

Note that this command is slightly dangerous so the default for --data-root is . to prevent accidentally wiping out your local Ska cheta archive. If you do not supply the --content option then the entire archive will be truncated.

Note also that if your cheta archive gets corrupted during an update (e.g. power loss) then truncating the archive to a time before the update will often fix things.

Check out

First, get into Ska3 ipython.

For MATLAB FOT Tools users, this means executing the following command from the MATLAB Command Window:

starting_dir = cd(get_python_install_dir()); system('start_ipython.exe&'); cd(starting_dir);

Once you have ipython open you can test out the following python code:

import os
# set the 'SKA' environment variable to the absolute path
# of your --data-root.  The folder structure should look like:
#
#   [data_root]/data/eng_archive/data/[cheta data]
#
# where you set 'SKA' or --data-root to be [data_root]
#
os.environ['SKA'] = os.path.abspath('.')

# Now import fetch and get some data
from cheta import fetch_eng as fetch

# Print location we are fetching from
print(fetch.msid_files.basedir)

%matplotlib
dat = fetch.Msid('aacccdpt', -14)  # 14 days before now
dat.plot()

# Print the available time range
fetch.get_time_range('aacccdpt', 'fits')
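
The end of the available time range tells you how far behind your local archive is, which matters because of the 60-day sync window described later in this tutorial. The following is a rough sketch with hypothetical helper names. It assumes you have the stop time as CXC seconds (for instance from fetch.get_time_range('aacccdpt')[1], assuming that call returns CXC seconds when no format is given) and it ignores the roughly one-minute offset and leap seconds between CXC seconds and UTC, which is harmless when counting days.

```python
import time

# Unix time of 1998-01-01 00:00:00 UTC. CXC seconds count from 1998.0 (TT),
# so this conversion is only approximate (about a minute off, plus leap seconds).
CXC_EPOCH_UNIX = 883612800.0


def days_behind(tstop_cxcsec, now_unix=None):
    """Approximate number of days between the end of local data and now."""
    if now_unix is None:
        now_unix = time.time()
    return (now_unix - (tstop_cxcsec + CXC_EPOCH_UNIX)) / 86400.0


def outside_sync_window(tstop_cxcsec, window_days=60, now_unix=None):
    """True if the local data ends more than window_days before now."""
    return days_behind(tstop_cxcsec, now_unix) > window_days
```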

Maintaining your local cheta archive

The cheta server archive on the HEAD network (in /proj/sot/ska/data/eng_archive) is updated every morning by 9am Eastern local. In order to get your local cheta archive sync'ed to the primary server version you simply run this command:

! python -m cheta.update_client_archive [--data-root=.]

This command has plenty of options (see --help) but most users will never need them.

Now go back and do the steps in the previous Check out section to prove that it worked.

Performance

On a fast network with a solid-state drive, you can do a daily update of the entire cheta archive in about 6 minutes. You can catch up a month of data in about an hour or two. With a slower network or slower hard drive it will take longer, with the hard drive speed generally being the more important factor.

Cron jobs

We'll talk about this in a future session! For now run it by hand as needed.

Keeping up to date and what to do if you forget

The cheta sync archive keeps the last 60 days of updates. If you wait longer than that you will get a message like the one below when updating:

ERROR: unexpected discontinuity for full msid=1DEICACU content=acis2eng
Looks like your archive is in a bad state, CONTACT your local Ska expert with this info:
  First row0 in new data 19557749 != length of existing data 19406329

The way to recover from this is by using the rsync command to refresh your archive:

### Using kady for the data (SOT) ###
rsync -av --existing <user>@kady:/proj/sot/ska/data/eng_archive/data/ <local_SKA>/data/eng_archive/data/
rm <local_SKA>/data/eng_archive/data/*/5min/last_date_id
rm <local_SKA>/data/eng_archive/data/*/daily/last_date_id

### Using GRETA for the data (FOT) ###
rsync -av --existing <user>@cheru:/proj/sot/ska/data/eng_archive/data/ <local_SKA>/data/eng_archive/data/
# NOTE: the "rm" commands should not be necessary if you sync from cheru

On Windows machines, MATLAB FOT_Tools has an rsync executable available in the directory FOT_Tools\local\tools\.
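
The two rm commands in the kady recipe rely on shell wildcards, which may not behave the same way in a Windows command prompt. A possible Python equivalent is sketched below, assuming the same <local_SKA>/data/eng_archive/data/ layout; the helper names are hypothetical, and the default is a dry run that only prints what would be removed.

```python
from pathlib import Path


def find_last_date_id_files(data_dir):
    """Return the 5min and daily last_date_id files under each content directory."""
    data_dir = Path(data_dir)
    found = []
    for stat in ("5min", "daily"):
        found.extend(sorted(data_dir.glob(f"*/{stat}/last_date_id")))
    return found


def remove_last_date_id_files(data_dir, dry_run=True):
    """Remove (or, with dry_run=True, just list) the last_date_id files."""
    files = find_last_date_id_files(data_dir)
    for path in files:
        if dry_run:
            print(f"would remove {path}")
        else:
            path.unlink()
    return files


# Typical use (replace the path with your own <local_SKA>/data/eng_archive/data):
#   remove_last_date_id_files("/path/to/ska/data/eng_archive/data")
#   remove_last_date_id_files("/path/to/ska/data/eng_archive/data", dry_run=False)
```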

Appendix: approximate sizes of cheta archive content types

Circa 2021, these were the directory sizes for each content type.

953M	acis2eng
224M	acis3eng
235M	acis4eng
1.0G	acisdeahk
762M	angleephem
243M	ccdm10eng
2.6G	ccdm11eng
748M	ccdm12eng
380M	ccdm13eng
 11M	ccdm14eng
 19M	ccdm15eng
463M	ccdm1eng
399M	ccdm2eng
139M	ccdm3eng
4.4G	ccdm4eng
407M	ccdm5eng
575M	ccdm7eng
639M	ccdm8eng
510M	cpe1eng
273M	dp_acispow128
956M	dp_eps16
164M	dp_eps8
400M	dp_orbit1280
626M	dp_pcad1
450M	dp_pcad16
1.0G	dp_pcad32
6.2G	dp_pcad4
 18G	dp_thermal1
1.1G	dp_thermal128
437M	ephhk
 57M	ephin1eng
 88M	ephin2eng
2.9G	eps10eng
750M	eps1eng
217M	eps2eng
706M	eps3eng
200M	eps4eng
118M	eps5eng
691M	eps6eng
486M	eps7eng
5.3G	eps9eng
179M	hrc0hk
388M	hrc0ss
122M	hrc2eng
392M	hrc4eng
114M	hrc5eng
5.0M	hrc7eng
1.6G	lunarephem0
675M	lunarephem1
263M	misc1eng
357M	misc2eng
126M	misc3eng
617M	misc4eng
 85M	misc5eng
112M	misc6eng
242M	misc7eng
897M	misc8eng
249M	obc3eng
1.9G	obc4eng
353M	obc5eng
1.7G	orbitephem0
709M	orbitephem1
144M	pcad10eng
 76M	pcad11eng
656M	pcad12eng
952M	pcad13eng
264M	pcad14eng
7.7G	pcad15eng
 62G	pcad3eng
397M	pcad4eng
881M	pcad5eng
758M	pcad6eng
 21G	pcad7eng
 28G	pcad8eng
8.6M	pcad9eng
1.2G	prop1eng
582M	prop2eng
116M	sim1eng
 57M	sim21eng
 57M	sim2eng
315M	sim3eng
 92M	sim_mrg
 61M	simcoor
125M	simdiag
186M	sms1eng
 75M	sms2eng
1.5G	solarephem0
604M	solarephem1
572M	tel1eng
259M	tel2eng
589M	tel3eng
764M	thm1eng
176M	thm2eng
186M	thm3eng
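
To estimate how much disk space a given selection of content types needs, you can total the relevant rows of the listing above. The sketch below parses size strings such as 953M or 1.0G, assuming the suffixes are powers of 1024 as in du -h style output; the function names are made up for this example and only a few rows are copied in as sample input.

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a size string like '953M' or '1.0G' to a number of bytes."""
    text = text.strip()
    suffix = text[-1].upper()
    if suffix in UNITS:
        return float(text[:-1]) * UNITS[suffix]
    return float(text)


def total_size(listing, prefixes=()):
    """Sum sizes in a 'size name' listing, optionally only for names with given prefixes."""
    total = 0.0
    for line in listing.splitlines():
        if not line.strip():
            continue
        size, name = line.split()
        if not prefixes or name.startswith(tuple(prefixes)):
            total += parse_size(size)
    return total


# A few rows copied from the appendix listing
sample = """\
 62G	pcad3eng
 21G	pcad7eng
 28G	pcad8eng
116M	sim1eng
"""
print(f"pcad rows: {total_size(sample, prefixes=['pcad']) / 1024**3:.0f} GB")
```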