dynamic scope design #1517
static and dynamic scopes can be mixed

in summary: capa can do both static and dynamic analysis and reason about the results together. that is, it can express things like: "when I see this in the file and that in the API trace, then the sample must be able to do XYZ".

discussion: we should find motivating examples of these rules; otherwise, the additional complexity of this approach may not be worth it.

invocation of capa.exe (ideas):

$ capa.exe /path/to/suspicious.exe # no dynamic analysis done here
$ capa.exe /path/to/suspicious.exe /path/to/suspicious.trace.json # existing trace file on disk
$ capa.exe /path/to/suspicious.exe http://cape.com/traces/suspicious.exe # fetch trace from CAPE API
$ capa.exe /path/to/suspicious.exe --submit-to-cape=true # submit to CAPE and fetch results

not all of these are required, though they may be useful/convenient for our users. the local trace file on disk is an easy place to start.

rule examples

these rule examples should show both static and dynamic features in the same rule, and be used in a way that cannot be done when static and dynamic are separate.
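To make the invocation ideas in this comment concrete, here is a minimal Python sketch of how the optional second argument might be classified. The helper name `classify_trace_source` and the returned labels are made up for illustration; nothing here is an existing capa function or flag beyond the `--submit-to-cape` idea quoted above.

```python
from typing import Optional
from urllib.parse import urlparse


def classify_trace_source(trace_arg: Optional[str], submit_to_cape: bool = False) -> str:
    """Decide where the dynamic trace would come from.

    Hypothetical helper: the labels returned here are illustrative only.
    """
    if trace_arg is None:
        # only the sample was given: either submit it to the sandbox,
        # or do no dynamic analysis at all
        return "submit-to-cape" if submit_to_cape else "static-only"
    if urlparse(trace_arg).scheme in ("http", "https"):
        # looks like a URL: fetch the trace from the sandbox API
        return "fetch-from-cape"
    # anything else is treated as a path to a trace file on disk
    return "local-trace-file"
```

The local-file branch is the fallback, which matches the suggestion that a trace file on disk is the easiest place to start.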
static and dynamic scopes are separate and cannot be mixed

in summary: static and dynamic analysis cannot be mixed. capa detects on startup which mode it's in and only does that thing.

discussion: this is probably much easier to implement; however, do we miss out on crucial functionality here?

invocation of capa.exe:

$ capa.exe /path/to/suspicious.exe
$ capa.exe /path/to/suspicious.trace.json

but note that these cannot be mixed.

rule examples:

static rule (existing format):

rule:
  meta:
    name: hash data with CRC32
    namespace: data-manipulation/checksum/crc32
    scope: function
    mbc:
      - Data::Checksum::CRC32 [C0032.001]
  features:
    - or:
      - and:
        - mnemonic: shr
        - or:
          - number: 0xEDB88320
          - bytes: 00 00 00 00 96 30 07 77 2C 61 0E EE BA 51 09 99 19 C4 6D 07 8F F4 6A 70 35 A5 63 E9 A3 95 64 9E = crc32_tab
        - number: 8
        - characteristic: nzxor
      - and:
        - number: 0x8320
        - number: 0xEDB8
        - characteristic: nzxor
      - api: RtlComputeCrc32

dynamic rule (proposed format):

rule:
  meta:
    name: hash data with CRC32
    namespace: data-manipulation/checksum/crc32
    scope: dynamic
    mbc:
      - Data::Checksum::CRC32 [C0032.001]
  features:
    - or:
      - api: RtlComputeCrc32

note that we update

question: can we somehow do this automatically, without changing

another example:

rule:
  meta:
    name: reference HTTP User-Agent string
    namespace: communication/http
    scope: function
  features:
    - or:
      - substring: "Mozilla/5.0"
      - substring: "like Gecko"
      - api: urlmon.ObtainUserAgentString
      - property/read: System.Net.HttpWebRequest::UserAgent

rule:
  meta:
    name: reference HTTP User-Agent string
    namespace: communication/http
    scope: dynamic
  features:
    - or:
      - substring: "Mozilla/5.0"
      - substring: "like Gecko"
      - api: urlmon.ObtainUserAgentString
      - property/read: System.Net.HttpWebRequest::UserAgent

in this example, all the features can probably be extracted by a dynamic trace.

the translation from

i imagine that
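As a rough illustration of "capa detects on startup which mode it's in", the sketch below sniffs the input bytes: PE and ELF magic values select static mode, and anything that parses as a JSON object is treated as a trace. The function name `detect_mode` and the assumption that a trace is a JSON object are mine, not something stated in this thread.

```python
import json

# leading bytes of PE ("MZ") and ELF executables
EXECUTABLE_MAGICS = (b"MZ", b"\x7fELF")


def detect_mode(buf: bytes) -> str:
    """Return "static" for executables and "dynamic" for JSON traces.

    Hypothetical helper; assumes a sandbox trace is a JSON object.
    """
    if buf.startswith(EXECUTABLE_MAGICS):
        return "static"
    try:
        doc = json.loads(buf)
    except ValueError:
        # covers both invalid JSON and undecodable bytes
        raise ValueError("input is neither a known executable nor a JSON trace") from None
    if isinstance(doc, dict):
        return "dynamic"
    raise ValueError("JSON input is not an object, so it is not treated as a trace")
```

Because the mode is decided from a single input, the two modes can never be mixed in one run, which is the property this design is after.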
Dynamic features

Static features to support
Static features that are out of (dynamic) scope :)
Additional dynamic-only features
New features
thread scope is a subscope of dynamic scope

in summary: introduce

this might enable us to better express that different parts of a program (threads) do different things, like enumerate files, serve HTTP responses, etc. This maps nicely to how a programmer structures their code.

in particular, it better supports other proposals like "sequence of API calls", since we want to see these in the context of a thread, and not interspersed with API calls from other threads.

however, in practice, I wonder if this would ever actually be used. unless,

example rules:

proposed:

rule:
  meta:
    name: attach user process memory
    namespace: host-interaction/process/inject
    scope: thread
  features:
    - and:
      - api: ntoskrnl.KeStackAttachProcess
      - api: ntoskrnl.KeUnstackDetachProcess

even more experimental:

rule:
  meta:
    name: attach user process memory
    namespace: host-interaction/process/inject
    scope: thread
  features:
    - and:
      - sequence: # this doesn't exist today
        - api: ntoskrnl.KeStackAttachProcess
        - api: ntoskrnl.KeUnstackDetachProcess
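To show why the thread context matters for a `sequence` feature, here is a small Python sketch that first groups a flat API trace by thread and then checks whether the wanted calls occur in order within one thread. The trace layout (a list of `(tid, api)` pairs) and both helper names are assumptions for illustration; `sequence` itself does not exist in capa today, as noted in the rule above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def group_calls_by_thread(trace: Iterable[Tuple[int, str]]) -> Dict[int, List[str]]:
    """Split a flat trace of (thread id, API name) pairs into per-thread call lists."""
    calls = defaultdict(list)
    for tid, api in trace:
        calls[tid].append(api)
    return dict(calls)


def matches_sequence(calls: Sequence[str], wanted: Sequence[str]) -> bool:
    """True if `wanted` occurs in `calls` in order; other calls may sit in between."""
    remaining = iter(calls)
    # `api in remaining` consumes the iterator up to the first hit,
    # so each wanted call must appear after the previous one
    return all(api in remaining for api in wanted)


wanted = ["ntoskrnl.KeStackAttachProcess", "ntoskrnl.KeUnstackDetachProcess"]
trace = [
    (1, "ntoskrnl.KeStackAttachProcess"),
    (2, "ntoskrnl.KeUnstackDetachProcess"),  # a different thread detaches first
    (2, "ntoskrnl.KeStackAttachProcess"),
    (1, "ntoskrnl.KeUnstackDetachProcess"),
]
threads = group_calls_by_thread(trace)
matching = [tid for tid, calls in threads.items() if matches_sequence(calls, wanted)]
```

In this made-up trace only thread 1 attaches and then detaches; thread 2 performs the calls in the opposite order, which a match over the interleaved global trace would not distinguish.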
Reuse existing rules

While trying to write the first example rules and exploring our current set I noticed the following:
brainstorming some ideas here...

why we're using dynamic scope:

Thinking about the best possible design for this scope made me rethink whether it's the right approach, so I am re-sharing the main motives for this design choice, both for documentation purposes and to figure out whether the motives are indeed valid.

The main reason that motivates this design choice for me is that it requires minimal changes to the current capa code design, as well as to the rule syntax. Most of the other alternatives that come to mind involve changing the rule syntax in a way that might break people's custom rules, which is undesirable.

Other reasons include: making it easier to integrate any future subscopes/parent-scopes into this logic, and making it easier to choose whether to do dynamic and/or static analysis (filtering can be done by the pre-existing scope-filtering mechanisms).

The two main proposed ways to integrate this dynamic scope are the following:

1. dynamic scope as a subscope of the file scope:

pros

cons

2. dynamic scope as a parent or an equal of the file scope:

I suggest either:

For both of these approaches, we would end up using the same features (as opposed to the first approach, where that is not necessarily the case). Therefore, settling on this approach would allow us to start working on the dynamic extractor right away, while we discuss the remaining rule-writing side of things...

pros

cons

implementation

Naively, one straightforward way to implement this idea would be to generate the RuleSet as capa does currently, and then pass that to the dynamic extractor. All rules that require non-dynamic features would end up being filtered out, since those non-dynamic features would never be detected. I believe this would have the benefit of supporting the variety between different extractors and the features they offer, since some might not extract API traces, while others might provide some useful instruction information in the future...

Thoughts?
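The "naive" filtering described above can be sketched in a few lines of Python. The rule representation here (nested tuples for `and`/`or`, plain `(type, value)` tuples for features) is a toy stand-in invented for this example, not capa's actual RuleSet or matching engine.

```python
from typing import Set, Tuple


def evaluate(node: tuple, features: Set[Tuple[str, object]]) -> bool:
    """Evaluate a toy rule tree against the set of extracted features."""
    kind = node[0]
    if kind == "and":
        return all(evaluate(child, features) for child in node[1])
    if kind == "or":
        return any(evaluate(child, features) for child in node[1])
    # leaf node: a (feature type, value) pair that must have been extracted
    return node in features


# simplified version of the CRC32 rule quoted earlier in this thread
crc32_rule = ("or", [
    ("and", [("number", 0x8320), ("number", 0xEDB8), ("characteristic", "nzxor")]),
    ("api", "RtlComputeCrc32"),
])
# a rule that only has instruction-level features
static_only_rule = ("and", [("mnemonic", "shr"), ("characteristic", "nzxor")])

# a dynamic extractor that only yields API calls
dynamic_features = {("api", "RtlComputeCrc32")}
```

With only API features extracted, `crc32_rule` still matches through its `api` branch, while `static_only_rule` silently never matches. That is the implicit filtering this comment relies on, with no special handling per extractor.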
I wonder if we are mixing concepts and terminology here around "static vs dynamic" and "scopes". Are these definitely the same thing? Does it make sense to consider them separately? I'll propose here to separate the concepts. First, some definitions:
We are trying to add dynamic analysis support to capa. It's desirable to do it in a way that lets us reuse the existing static analysis rules, because then we have less work to do. Also, if rules work for both static and dynamic analysis, rule authors get better value from writing each rule. To be clear, while this is very nice to have, we should be wary of taking shortcuts that save us a little time now and cost us a lot of time later, so we should be open to updating all the rules if we absolutely have to.

In many of the discussions so far, we've talked about adding a "dynamic scope". I think this stems from the idea that we need a place in the rule format to mark if a rule works during static analysis, dynamic analysis, or both, and the

I think this might be a mistake and we should separate scopes from analysis ... context/flavor/mechanism ("static" or "dynamic"). Let's call this "analysis flavor" for now, and it includes "static analysis flavor" and "dynamic analysis flavor".

Scopes should describe how features are collected together and matched. Not all features are available at all scopes. For example, section names are not available at instruction scope, naturally.

Analysis flavors should also describe when a rule can be applied: during static analysis, dynamic analysis, or both. Some features will only work in one flavor; for example, instruction mnemonics will not work in dynamic analysis flavor, because a full instruction trace is not expected to be available. Likewise, an ordered sequence of API calls is not available in the static analysis flavor (today, though maybe it's a good idea for future research). Some features will work in both flavors; for example, an API call can be extracted at both the disassembly and sandbox trace levels.

But scopes and analysis flavors should not be the same thing.
A rule must be evaluated with a scope, and I think that scope can depend on which analysis flavor is in play, but the scope is not the same thing as the analysis flavor.

In the most explicit world, we might have two rules for creating a file:

But obviously there's a lot of repetition, so perhaps we could do something like:

rule:
  meta:
    name: create file
    scope:
      - static: function
      - dynamic: thread
  features:
    - or:
      - api: CreateFile
      - api: fopen

Which is pretty nice. We might also support things like

Now, because we want to support a bunch of existing rules, we could provide some built-in shortcuts and logic, such as:

Then, the following would be equivalent:

We could either build this logic into capa, or do a one-time automated update to the rules using the translations. In the former, rule authors don't have to learn anything new. In the latter, rules are more explicit and dynamic analysis flavor is more of a first-class citizen.

With all this in mind, I'd propose that there are two flavors: static and dynamic. And when analyzing with a flavor, there's a scope in play, which includes:

We'll need to create a multi-dimensional table that describes which features are available in each flavor and scope, perhaps like:

This proposal doesn't initially address whether rules can match across flavors, but my intuition is that this isn't supported without additional research and design.
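A small Python sketch of how the per-flavor `scope` from the "create file" rule above might be resolved at match time. The function name `scope_for_flavor` is hypothetical, and because the built-in shortcuts for existing rules are not spelled out here, the sketch simply treats a legacy string scope as static-only; that is an assumption, not the proposed translation.

```python
from typing import Dict, List, Optional, Union

ScopeSpec = Union[str, List[Dict[str, str]]]


def scope_for_flavor(scope: ScopeSpec, flavor: str) -> Optional[str]:
    """Return the scope to evaluate a rule at for a flavor, or None if it does not apply."""
    if isinstance(scope, str):
        # legacy form, e.g. `scope: function`
        # assumption for this sketch: it only applies to the static flavor
        return scope if flavor == "static" else None
    for entry in scope:
        # proposed form: a list of single-key mappings, e.g. {"dynamic": "thread"}
        if flavor in entry:
            return entry[flavor]
    return None


create_file_scope = [{"static": "function"}, {"dynamic": "thread"}]
```

Returning `None` gives the matcher a simple way to skip rules that have no scope in the current flavor, which keeps "flavor" and "scope" as separate questions.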
I am really excited to hear that capa is adding the capacity to work with dynamic traces! I think dealing with API traces generated from a sandbox is a great start since it will provide more info than static analysis.

I think it would also be possible to run the feature extractor at instruction level on an execution trace recorded by tools like WinDbg TTD/Reven/RR/Undo, etc. The immediate benefit is seeing behavior that we cannot see if the relevant code is encrypted. For example, if the sample uses a crypto function but the code is encrypted, capa would not be able to see it directly. Besides, we can also see the concrete register/memory values at any time, which makes it possible not only to detect certain operations, but also to obtain the data they actually operated on.

There would also be some challenges. For example, the trace can be quite long even for a few seconds of execution. Also, function boundaries are not as clear when it comes to an execution trace. However, from my own experience of working with execution traces, despite the whole trace being very long, the unique instructions/basic blocks are pretty manageable.
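The last point, that a long trace collapses to a manageable set of unique instructions, can be illustrated with a short Python sketch. The trace here is just a list of executed instruction addresses, which is a simplification I am assuming; real TTD/rr recordings have their own formats and APIs that are not modeled.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_trace(executed_addresses: Iterable[int]) -> Tuple[int, int, Counter]:
    """Return (total executed, unique addresses, per-address hit counts)."""
    counts = Counter(executed_addresses)
    total = sum(counts.values())
    return total, len(counts), counts


# a made-up loop body executed 1000 times, followed by one exit instruction
loop_body = [0x401000, 0x401005, 0x40100A]
trace = loop_body * 1000 + [0x40100F]
total, unique, counts = summarize_trace(trace)
```

Here 3001 executed instructions reduce to 4 unique addresses, so a feature extractor that works on the unique set would do far less work than the raw trace length suggests.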
I am sharing some more thoughts I had when drafting a dynamic extractor. First of all, I thought I'd re-summarize what our design goals are:

Having agreed that these are the main goals, here's what I think would work best for feature extraction:

1. Scopes:

First of all, I believe that the scoping should be the following:

This hierarchy introduces 2 new scopes: Process and Thread. This choice was mainly inspired by the analysis traces I saw thus far:

2. RuleSet construction:

According to the hierarchy described above, I believe that it should be possible to use the same RuleSet for both static and dynamic extractors (comment?), with out-of-context rules being ignored by the extractor (static extractors would ignore dynamic-only rules). Additionally, however, we would probably implement some mechanism to optimize this by filtering out out-of-context rules (for example, by maintaining a list of supported features for each extractor; e.g. any.run doesn't support API traces).

Furthermore, following the design that has been proposed thus far should make it easier to add new features to extractors as they are introduced. It should also make it possible to write sandbox-agnostic rules, which in turn should make it possible to support newer sandboxes as they implement more functionality (support any.run when it introduces API traces, or CAPE when it adds origin pid/tid data to each file).

3. Implementation:

The way I am currently doing things is similar to static extractors: extract all processes from a trace; for each process, extract all file features and threads; for each thread, extract all thread features (haven't determined these yet) and function features. Function features include all sub-function features (such as API calls) as well as calls-to and calls-from features.
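The process, then thread, then call traversal described under "Implementation" might look roughly like the Python sketch below. The trace layout (`pid -> tid -> list of API names`) and the function name `collect_by_scope` are assumptions made for this example; real sandbox reports are structured differently, and file and function features are left out.

```python
from typing import Dict, List, Set, Tuple

# made-up trace layout: pid -> tid -> ordered API names
Trace = Dict[int, Dict[int, List[str]]]
Feature = Tuple[str, str]


def collect_by_scope(trace: Trace) -> Dict[str, Dict[object, Set[Feature]]]:
    """Collect API features per thread and roll them up to the owning process."""
    by_scope: Dict[str, Dict[object, Set[Feature]]] = {"process": {}, "thread": {}}
    for pid, threads in trace.items():
        process_features: Set[Feature] = set()
        for tid, calls in threads.items():
            thread_features = {("api", name) for name in calls}
            by_scope["thread"][(pid, tid)] = thread_features
            # features seen in a thread are also visible at process scope
            process_features |= thread_features
        by_scope["process"][pid] = process_features
    return by_scope


example_trace: Trace = {1234: {1: ["CreateFile"], 2: ["fopen", "CreateFile"]}}
scoped = collect_by_scope(example_trace)
```

Rolling thread features up into the process mirrors how static extraction lets features from inner scopes be matched at outer scopes, so the same rule could be evaluated at either scope.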
following the discussion in #1516, let's sketch out the various designs for dynamic scope, including how it looks to invoke capa.exe, what rules look like, and how the results might be rendered.
considerations include:
--do-dynamic-analysis=true
submits file to CAPE API and fetches results when ready, and/or fetches existing results