Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Options to skip message headers between packets #74

Open
tloubrieu-jpl opened this issue Jun 27, 2023 · 7 comments
Open

Options to skip message headers between packets #74

tloubrieu-jpl opened this issue Jun 27, 2023 · 7 comments

Comments

@tloubrieu-jpl
Copy link

tloubrieu-jpl commented Jun 27, 2023

For Europa-Clipper test files, we're having packets separated by specific pieces of binary stream.

See spreadsheet https://docs.google.com/spreadsheets/d/1YX-_zw9tdEwkYxdQ5IA8tClHnzwaNehGvE04JfFdY2A/edit#gid=0

We have 3 cases:

  1. standard packet sequence
  2. fixed length header
  3. packet start marker

I propose:

  • case 2: skip a fixed number of bits in between packets
  • case 3: specify the start marker with as a callable taking as parameters n bits of the stream. The callable will be callable at every position in the stream. A packet will start a the position after whenever the callable return True (in our case the function works on 4 bytes and returns seq[1] == 0xF0 and seq[0] == seq[2] == seq[3] != 0x00)

I am attaching 2 test files for case 2 and 3:
case-2.bin.zip
case-3.bin.zip

@ddasilva
Copy link
Collaborator

Thanks for sharing this. We can definitely add a packet_spacing=N option to load() and related functions which will skip N bytes between packets to address case 2. Do we need to support skipping a number of bits that isn't divisible by 8? I believe we discussed supporting arbitrary spacing between packets to address outer frames attached to the packets, which generally come in sets of complete bytes.

To address case 3, I will think about the best way to support this. How are you doing this currently? We could maybe implement this with a keyword argument such as:

def locator_fun(byte_window):
    return byte_window[1] == 0xF0 and byte_window[0] != 0 and byte_window[2] != 0 and byte_window[2] != 0

pkt = VariableLength([...])
result = pkt.load('telemetry.bin', packet_locator=PacketLocator(locator_fun, window=4))

@tloubrieu-jpl
Copy link
Author

Thanks @ddasilva for your quick feedback.

We can do the packet_spacing in bytes, in our case it is 32 bits, so that will work in bytes.

I like your proposal for the packet_locator.

Thanks again,

@ddasilva
Copy link
Collaborator

Adding a note here for myself. I think in addition to packet_spacing=N we should support skip_extra_headers=N. The difference is an extra header is before each packet (so, the file starts with one), where as a spacing is between each packet (in which case, the file doesn't start with one).

@ehsteve
Copy link
Member

ehsteve commented Jul 7, 2023

@tloubrieu-jpl is this compatible with the ccsds standard? Are those messages also ccsds packets?

Like my comment in the other issue, I'd like us to maintain the ability to document packets non programmatically. If these messages always follow another packet, could they just defined as extra fields in the previous packet? Or we could define a new SPACER field.

@tloubrieu-jpl
Copy link
Author

@ehsteve , I don't think that is strictly speaking CCSDS. This is how our Science Data System receives the CCSDS packets embedded in other messages following a specific syntax. And specifically for test datasets, the source of the CCSDS packets differ and so does their wrapping.

Now that I think of it more, that would be an acceptable answer to say that this is outside the scope of CCSDSpy and that a preprocessing need to be done to strip the packets off their non CCSDS wrappers.

@ehsteve
Copy link
Member

ehsteve commented Jul 11, 2023

Thanks @tloubrieu-jpl. I wonder what @ddasilva thinks of this CCSDS strictness? I worry that if we start allowing for all kinds of packet shenanigans (1) this code base will become a giant mess as binary data can be organized in an infinite amount of ways and (2) we weaken the CCSDS standard. I actually wish the standard was more developed and stricter in some ways but I understand that others may disagree!

@ddasilva
Copy link
Collaborator

I'll have to think about this more deeply, but for this ticket I think we can conclude that we'll not implement packet spacing skipping right now for this ticket since @tloubrieu-jpl seems satisfied. But here are my thoughts.

On one hand, I think about the pandas pd.read_csv() function, which is incredibly useful and has an option for almost everything I'd ever need. It's probably a mess to maintain, but from my point of view, if I get data in a weird CSV format, it's most probably a lost cause to try to get the upstream provider to "fix" their weird CSV format. That's where pd.read_csv() comes in to save me. Pretty much everything pd.read_csv() does I could re-implement myself on a case by case basis, but in the end it's a matter of whether I spend 10 seconds or an hour on the same problem.

One the other hand, the code has the ability to become very complex if we implement a lot of different "helpful options" as keyword arguments to a single function. We might be able to manage this complexity by making functions like strip_spacing() that work together like unix command line tools. We would then be requiring someone to call pkt.load(strip_spacing(bytes_io, 10)).

At the end of the day, what I think this library does for most people is make it so they have to think less about bit manipulation and more about the problem they're trying to solve. If something like strip_spacing(bytes_io, 10) saves a lot of people time and avoids context switching to bit manipulation, I think it makes sense for the project. I think it's pretty unlikely that we'll be feeding the beast of poorly prepared CCSDS data, because in reality, that ship tends to sail for other reasons.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants