Releases: UCBerkeleySETI/turbo_seti

New plotSETI parameter: --h5dat_lists for pre-generated/edited lists of h5 files and dat files

25 May 21:16
7d9b4fd

Internally, plotSETI uses two lists stored as text files: one of h5 files and one of the corresponding dat files. The list of h5 files is formatted like this:

/home/giraffe/BASIS/seti_data/voyager_2020/h5_dir/single_coarse_guppi_59046_80036_DIAG_VOYAGER-1_0011.rawspec.0000.h5
/home/giraffe/BASIS/seti_data/voyager_2020/h5_dir/single_coarse_guppi_59046_80354_DIAG_VOYAGER-1_0012.rawspec.0000.h5
/home/giraffe/BASIS/seti_data/voyager_2020/h5_dir/single_coarse_guppi_59046_80672_DIAG_VOYAGER-1_0013.rawspec.0000.h5
/home/giraffe/BASIS/seti_data/voyager_2020/h5_dir/single_coarse_guppi_59046_80989_DIAG_VOYAGER-1_0014.rawspec.0000.h5
/home/giraffe/BASIS/seti_data/voyager_2020/h5_dir/single_coarse_guppi_59046_81310_DIAG_VOYAGER-1_0015.rawspec.0000.h5
/home/giraffe/BASIS/seti_data/voyager_2020/h5_dir/single_coarse_guppi_59046_81628_DIAG_VOYAGER-1_0016.rawspec.0000.h5

The list of corresponding dat files is formatted in the same manner.

Normally, both lists are generated internally by plotSETI and are never seen by the user. In some circumstances, however, it is useful for the user to prepare the lists. If the new parameter --h5dat_lists is given 2 file paths (one text file listing h5 files, one text file listing dat files), plotSETI uses those list files instead of generating its own. User-supplied lists will be:

  • Checked for existence and consistency.
  • Used for internal list processing.

E.g. plotSETI --h5dat_lists /dir_a/list_h5_files.txt /dir_b/list_dat_files.txt --out_dir ..... tells plotSETI that there exists a list of h5 files in /dir_a/list_h5_files.txt and a list of dat files in /dir_b/list_dat_files.txt.

If --h5dat_lists is absent (the default and most common usage), plotSETI internally generates the 2 list files as it did before.
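
As a rough illustration of the existence check on a user-supplied list, the sketch below reads a list file and confirms that every listed path is a real file. The function name load_path_list is hypothetical and is not taken from the plotSETI source; the actual checks in plotSETI may differ.

```python
from pathlib import Path


def load_path_list(list_file):
    """Read a list file (one path per line) and confirm every entry exists.

    Hypothetical helper for illustration; not part of turbo_seti.
    """
    with open(list_file, "r") as handle:
        # Ignore blank lines and surrounding whitespace.
        paths = [line.strip() for line in handle if line.strip()]
    missing = [p for p in paths if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError(f"{list_file}: listed files not found: {missing}")
    return paths
```

The same helper could be called once for the h5 list and once for the dat list before any further processing.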

Add diagnostics when plotSETI detects mismatch of h5 and dat files

25 May 13:43
a7ee618

The following circumstances are anomalous:

  • No h5 files are found.
  • No dat files are found.
  • The number of h5 files != the number of dat files.

They are now explicitly diagnosed to prevent confusion.
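
The three anomalies can be expressed as a small check. This is a minimal sketch; the function name and message wording are assumptions and do not reproduce the diagnostics that plotSETI actually prints.

```python
def diagnose_h5_dat_counts(h5_files, dat_files):
    """Return a list of human-readable problems (empty if none).

    Hypothetical helper; message texts are illustrative only.
    """
    problems = []
    if len(h5_files) == 0:
        problems.append("No h5 files were found.")
    if len(dat_files) == 0:
        problems.append("No dat files were found.")
    if len(h5_files) != len(dat_files):
        problems.append(
            f"Count mismatch: {len(h5_files)} h5 files vs {len(dat_files)} dat files."
        )
    return problems
```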

New utility for showing the differences between 2 dat files

16 May 20:54
f53f7bb

The new utility (dat_diff) executes the overall comparison as 2 independent processes in succession:

  • For each entry in dat file dat1, look for a match in dat file dat2.
  • For each entry in dat file dat2, look for a match in dat file dat1.

Given 2 dat file entries, the comparison is performed using the following data elements:

  • Coarse channel number (exact match)
  • Frequency (within rtol), where rtol is the relative tolerance given to {math,numpy}.isclose() (e.g. 0.0001, which signifies 0.01%)
  • Drift rate (within rtol)
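
The two-pass comparison can be sketched with math.isclose as follows. The Hit tuple and the function names are hypothetical and chosen for illustration; the real dat_diff parses DAT files and may structure the comparison differently.

```python
import math
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical container for the three compared fields of a dat entry."""
    coarse_channel: int
    frequency: float
    drift_rate: float


def hits_match(a, b, rtol=0.0001):
    """Exact match on coarse channel; relative-tolerance match on the rest."""
    return (
        a.coarse_channel == b.coarse_channel
        and math.isclose(a.frequency, b.frequency, rel_tol=rtol)
        and math.isclose(a.drift_rate, b.drift_rate, rel_tol=rtol)
    )


def unmatched(source, target, rtol=0.0001):
    """Entries of source that have no match anywhere in target."""
    return [h for h in source if not any(hits_match(h, t, rtol) for t in target)]
```

Calling unmatched(dat1, dat2) and then unmatched(dat2, dat1) gives the two directions of the comparison.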

Correct duplicate hits design

07 May 23:25
e74830d

The tophitsearch code considers each candidate hit and checks a window of nearby frequencies around it. If there is another, larger hit within that window, the candidate is not reported. This prevents a single signal from being reported as multiple hits.

There was a small bug in this logic. Previously, the index of the edge of the window was calculated as

i - obs_length*max_drift/2

Checking the units: obs_length is measured in seconds and max_drift in Hz/s, so obs_length * max_drift has units of Hz, yet it was being subtracted from i, which is a unitless index. The calculation was therefore not meaningful.

Also, it should be multiplying by 2 instead of dividing by 2, because the two signals could be moving toward each other. These two bugs roughly canceled each other out: for Green Bank data, for example, we were deduplicating over a window of radius 58 when it should have been a window of radius 80. That is not a big difference, and this fix won't change much in practice, but it is better to use the right calculation here.
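
To make the unit argument concrete, here is a minimal sketch of a window radius expressed in bins. It assumes that Hz are converted to an index by dividing by the channel width in Hz, which the note above does not spell out, so the exact expression in find_doppler.py may differ. The function and parameter names are hypothetical, and the values in the usage note are illustrative and not meant to reproduce the 58 and 80 quoted above.

```python
import math


def old_window_radius(obs_length_s, max_drift_hz_per_s):
    """Pre-fix expression: a quantity in Hz that was applied to a bin index."""
    return obs_length_s * max_drift_hz_per_s / 2


def window_radius_bins(obs_length_s, max_drift_hz_per_s, channel_width_hz):
    """Radius in bins, assuming Hz -> bins is division by the channel width.

    The factor of 2 covers two signals drifting toward each other.
    """
    span_hz = 2 * obs_length_s * max_drift_hz_per_s
    return math.ceil(span_hz / abs(channel_width_hz))
```

For example, window_radius_bins(300.0, 0.5, 3.0) gives a radius in bins, whereas old_window_radius(300.0, 0.5) yields a number of Hz that has no direct meaning as an index.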

Correct drift rate calculation in find_doppler.py

05 May 22:07
023a212

There was an off-by-one error when calculating the resolution of the drift rate. You don't want to divide by number of timesteps. Instead, you want to divide by (number of timesteps - 1). Think of the line as being between the centroid of a bin in the first and last row, rather than the very start of the first row and the end of the last row.
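
The centroid argument can be sketched as follows: with n time steps of duration tsamp, the first and last bin centroids are (n - 1) * tsamp apart, so that is the time span a one-channel frequency shift is spread over. The function and parameter names are hypothetical and are not the names used in find_doppler.py.

```python
def drift_rate_resolution(freq_resolution_hz, n_timesteps, tsamp_s):
    """Smallest resolvable drift rate in Hz/s, measured centroid to centroid.

    Hypothetical helper: divides by (n_timesteps - 1), not n_timesteps.
    """
    if n_timesteps < 2:
        raise ValueError("At least 2 time steps are needed to measure a drift.")
    return freq_resolution_hz / ((n_timesteps - 1) * tsamp_s)
```

With 16 time steps, dividing by 15 instead of 16 changes the resolution by a factor of 16/15, or about 6.7%.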

@lacker discussed this in the iseti meeting of 5/4/2022 and also with @stevecroft on the previous day. There is a general astronomer-consensus that this fix is an improvement.

GPU Performance Improvement

04 Apr 23:52
0fc6e73

This release replaces the flt function with a new implementation when turbo_seti is running in GPU mode. Thanks to Franklin Antonio (@fantonio2 on github) for his code at https://github.com/UCBerkeleySETI/dedopplerperf/blob/main/CudaTaylor5demo.cu; these turbo_seti changes are based on that. Kevin Lacker (@lacker on github) used a C++ template to handle multiple float types and other miscellaneous amendments.

Note that some of the surrounding code is refactored because the CPU implementation of flt stores rows of the output using a bit reversal technique. The GPU implementation doesn't, so the format is slightly different.

This speeds up the flt function by a factor of roughly 5. The flt function previously accounted for around 30% of the time spent by turbo_seti in the search_coarse_channel function. Overall, this change seems to provide a ~15% performance improvement.

Note that the output of the search_coarse_channel function is unchanged. This is purely a performance change when running in GPU mode.

Profiling before this change:
https://bldata.berkeley.edu/pipeline/tmp/turboseti_profile.svg

Profiling after this change:
https://bldata.berkeley.edu/pipeline/tmp/new_turboseti_profile.svg

Plot Event Improvements

30 Mar 22:19
a54d8c0

The turbo_seti plot_event.py code was fixed in two ways:

  • A plot offset (red barbell) used with the red guideline was not being placed correctly; this is corrected.
  • Extraneous code, which could be quite confusing, was removed.

Using Filter Parameters after turboSETI Completes

27 Mar 20:06
d0065a7

Sometimes, when running turboSETI (or the FindDoppler Python class object), one or more of the 3 filtering parameter values (minimum drift rate, maximum drift rate, and minimum SNR) are guessed, misspecified, or omitted. It is desirable to have a second chance at filtering out dedoppler top hits that are not interesting for analysis (e.g. RFI). This also reduces the number of plots (PNG files) produced, which otherwise need to be pruned manually.

This release of turbo_seti adds 2 courses of action that can be taken after the turboSETI execution:

  • With a new utility (dat_filter), apply one or more of the 3 above filtering parameters to permanently update the DAT file produced by turboSETI.
  • In the plotSETI program or through the use of the find_event_pipeline API, specify values for one or more of the 3 filtering parameters. Note that in this case, the DAT file is not updated.

For example, suppose turboSETI has produced xx.dat from xx.h5 with drift rates varying from -0.5 to 0.5. All of the SNR values are acceptable, but we'd like to avoid signals with drift rate absolute values below 0.1 and above 0.4. Then, the following dat_filter execution will permanently purge the signals outside that range, including those near 0 drift rate:

dat_filter -m 0.1 -M 0.4 xx.dat

Alternatively, without modifying xx.dat, we could use plotSETI with the new parameters instead. Assume that both xx.h5 and xx.dat are in the same directory abc. Then, the following execution will do event analysis and produce plots without permanently purging any signals from xx.dat:

plotSETI -m 0.1 -M 0.4 abc

The latter might be a useful tool for experimentation.
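
The filtering rule described above can be sketched as a predicate over one hit. This is an illustration only: the function name is hypothetical, the drift rate limits are applied to absolute values as in the example, and treating the boundaries as inclusive is an assumption.

```python
def keep_hit(drift_rate, snr, min_drift=None, max_drift=None, min_snr=None):
    """Return True if a hit passes every filter that was supplied.

    Hypothetical helper; omitted (None) parameters are not applied.
    Boundary values are kept, which is an assumption.
    """
    abs_drift = abs(drift_rate)
    if min_drift is not None and abs_drift < min_drift:
        return False
    if max_drift is not None and abs_drift > max_drift:
        return False
    if min_snr is not None and snr < min_snr:
        return False
    return True
```

With min_drift=0.1 and max_drift=0.4, hits at drift rates 0.0 and -0.5 are dropped while hits at -0.25 and 0.25 are kept.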

Correct drift rate resolution calculation

23 Feb 19:15
0b86826

When the number of time integrations was not a power of 2, the next power of 2 was used in its place in the drift rate resolution calculation (data_handler.py DATAH instantiation). This threw off the drift rate calculations in the doppler search main loop (find_doppler.py).

Visible symptom: the red-colored fit lines of the plots were not falling on the signal in the waterfall plots of the signal candidates.
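
To see the size of the effect, the sketch below computes the next power of 2 for a given number of integrations and the ratio by which the padded count overstates the real one. The function names are hypothetical; the sketch illustrates the mismatch only and is not the code in data_handler.py.

```python
def next_power_of_two(n):
    """Smallest power of 2 that is >= n, for n >= 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 << (n - 1).bit_length()


def padding_ratio(n_integrations):
    """Ratio of the padded count to the actual count of time integrations."""
    return next_power_of_two(n_integrations) / n_integrations
```

For a power of 2 such as 16 the ratio is exactly 1 and nothing changes, while for 10 integrations the padded count is 16, so any quantity derived from the count is off by that ratio.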

Enhance logging information

07 Feb 17:04
4334d7e

  • Show versions of hdf5plugin and the HDF5 library.
  • Enable the display of HDF5 library error messages which are inhibited by default.