Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom value types #62

Open
msuhanov opened this issue Apr 26, 2016 · 14 comments
Open

Custom value types #62

msuhanov opened this issue Apr 26, 2016 · 14 comments

Comments

@msuhanov
Copy link

msuhanov commented Apr 26, 2016

Hello.

At present, the data_type() method of the VKRecord class applies the DEVPROP_MASK_TYPE (0x00000FFF) bit mask to a registry value type using the AND operation, and thus clears bits #12-#31 in this value type. I know that the purpose of this operation is to extract the DEVPROP_TYPE_FILETIME (0x00000010) type (you call it RegFileTime), but there is a major issue here:

  • the Windows kernel is using 12 predefined value types in the registry subsystem,
  • other value types are not predefined and can be specific to a kernel subsystem or a driver (e.g. DEVPROP_TYPE_FILETIME),
  • not all kernel subsystems and drivers are expected to use the DEVPROP_MASK_TYPE bit mask when parsing a value type.

Based on this, the library can't silently clear the bits in a value type. For example, the PnP subsystem is using the DEVPROP_TYPE_STRING_LIST (0x00002012) type to store device location paths in the registry (this type will be converted to following value: 0xFFFF2012). Reading such a type with python-registry will return 18 (0x12), or DEVPROP_TYPE_STRING, if we resolve this constant in the context of device properties, but this value is wrong, because it doesn't match the value stored by the subsystem.

@williballenthin
Copy link
Owner

Good catch.

I see that there are two modifiers that we should be taking into account:

#define DEVPROP_TYPEMOD_ARRAY                   0x00001000
#define DEVPROP_TYPEMOD_LIST                    0x00002000

src

I'm not quite sure how these affect parsing yet, but I'm sure we can work that out to support lists and arrays of the other basic types (int, string, timestamp, etc).

To do this, we'd also need to check the "type modifier" by using DEVPROP_MASK_TYPEMOD:

#define DEVPROP_MASK_TYPEMOD                    0x0000F000

With this approach, we'd be inspecting the lower two bytes of the DWORD. Do you know what is placed in the higher two bytes, and if we can parse it? Are 0xFFFF0012 and 0x00000012 both DEVPROP_TYPE_FILETIME? If so, this is an easy fix. If not, how do we resolve the type?

In any case, I suggest that we update VKRecord.data_type() to at least use the mask 0xFFFF (which is DEVPROP_MASK_TYPEMOD | DEVPROP_MASK_TYPE), and potentially returning the unmasked value (if there's no other option).

@msuhanov
Copy link
Author

Do you know what is placed in the higher two bytes, and if we can parse it?

On Windows 10, when storing device properties, these bytes are always set to 0xFF. But, as I said before, the higher two bytes in other keys can be used by something else.

@williballenthin
Copy link
Owner

williballenthin commented Apr 28, 2016

Ok, then I propose updating VKRecord.data(), VKRecord.data_type() and VKRecord.data_type_str() to handle the common case by masking the type down to the lower two bytes (using DEVPROP_MASK_TYPEMOD | DEVPROP_MASK_TYPE), and adding a method VKRecord.raw_data_type() that returns the unmasked data type.

The benefit is that this allows an advanced user to parse unusual structures manually using VKRecord.raw_data_type() and VKRecord.raw_data(), while keeping the common case very simple (if vk.data_type == RegMultiSZ: ...).

An issue is that most developers won't realize this could happen, and will assume that VKRecord.data_type() returns perfectly correct data. Clear documentation on these methods will help, but won't totally remove this issue.

I'm trying to balance ease-of-use of the module with providing the most correct data. As a developer, what would you most like to see?

@msuhanov
Copy link
Author

Sounds good.

As a developer, what would you most like to see?

I think that the best solution is to introduce the data_type_devprop() method and make the data_type() method return a raw value type. Any other custom value types will have the corresponding data_type_* methods. But this breaks backward compatibility.

@geoffblack
Copy link

+1 on the issue - I just ran into 0x00002012 today (along with 0x00000011 and 0x00000012, which python-registry does not handle).

@williballenthin
Copy link
Owner

@geoffblack having read the above discussion, do you have any opinions on how the API could be enhanced?

@msuhanov
Copy link
Author

An application (or a driver) can specify any value type possible. You may also find this post interesting.

@geoffblack
Copy link

I think introducing separate methods per data type set is the wrong thing to do, and not just for backwards compatibility. In the short term, I like fixing the mask and adding raw_data_type() - it gets what I need as the user to do my own decoding with raw_data(), and it's a quick fix.

In the long term, you could build profiles of data type definitions, where data() takes a parameter for the data type set. The default would be standard registry data types already included in python-registry, but you could add other data type lists like devprop_type* from devpropdef.h over time, and the necessary code to support decoding those data types. The user would still have to know which data type set to use in certain areas of the registry, but it would make it easier for the user.

@geoffblack
Copy link

@williballenthin I've just read through raw_data() and realized your implementation there is problematic for this, as well. Take, for example, the case of RegMultiSZ (0x07). In the DevPropType list 0x00000007 is a uint32. I have not tested this, but there's a case in there where you're not actually returning the raw data (!!!), but instead returning and empty byte string because, "it must be 4, and be composed of completely \x00, so the strings are empty." So, I'm not sure raw_data_type() is a quick fix if raw_data() can't be relied upon with non-default data types.

@geoffblack
Copy link

Just to follow up - I do indeed have a registry value with 4 bytes of data where raw_data() returns an empty byte string due to using the wrong data type as described above.

@williballenthin
Copy link
Owner

acknowledged. sounds like i need to sit down one weekend and implement some of these approaches to figure out what works best.

is there any chance you can share the hive? i understand if you cannot.

@geoffblack
Copy link

You can find samples of these values in the SANS Donald Blake Windows 8.1 image in various keys under Control\DeviceContainers. If you don't have that lying around, shoot me an email.

@williballenthin
Copy link
Owner

thanks!

@msuhanov
Copy link
Author

Is it possible to get the image? Not just registry files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants