
Preloading Modules / Dockerization of FAME / Dockerception (combined PR of all changes, do not merge!) #67

Open · wants to merge 54 commits into master
Conversation

lucebac
Contributor

@lucebac lucebac commented Jul 24, 2019

EDIT: All changes proposed and committed in this pull request will be re-submitted one by one to make reviewing/changing easier for all of us. Please do not merge! The new PR(s) are:


We needed to dockerize FAME for our internal use. Fortunately, we can share these changes with the FAME community and that's what this PR is meant for.

In addition to just dockerizing FAME, there were also changes to the core of FAME that were necessary for the dockerization. These changes allowed us to move certain things around and, from our perspective, improved the general code base, e.g. by separating webserver-related code from worker code and the like.

Disclaimer

We tried to change the code of FAME in a way such that it can still be run without Docker. Nevertheless, we are not able to properly test upgrading older setups to the new codebase because our setup was already built with the dockerized version of FAME. We would appreciate any volunteers able to test upgrading to the new codebase!

Changes (Summary)

  • create docker containers for FAME web and FAME workers and create corresponding requirements.txt files for python
  • create a new worker task that pulls/updates the module repositories and PUTs the updated code back to the webserver, thus removing the requirement to run a FAME worker next to the FAME webserver
  • make the git pull/update logic more error-tolerant along the way
  • improved the installation of module requirements in the worker
  • fixed a content-type bug where unzipping data might not work under certain circumstances
  • introduced a new type of module: PreloadingModule; these modules allow for using more than just the VirusTotal downloader if only a hash for a sample is known (might be the case if one's company maintains a custom, internal virus database like GDATA does); please refer to the documentation for more details (docs/modules.rst)
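As a rough illustration of the preloading idea (the real API is documented in docs/modules.rst; the class and method names below are made up for the sketch):

```python
# Illustrative stand-in for FAME's preloading base class; all names
# here are assumptions, not the actual FAME API.

class PreloadingModule(object):
    """A module that tries to obtain a sample when only its hash is known."""

    def preload(self, target):
        raise NotImplementedError


class InternalDatabaseDownloader(PreloadingModule):
    """Hypothetical module fetching samples from an internal virus
    database (the GDATA-style use case mentioned above)."""

    name = "internal_db_downloader"

    def __init__(self, database):
        # 'database' maps sample hashes to raw sample bytes
        self.database = database
        self.sample = None

    def preload(self, target):
        sample = self.database.get(target)
        if sample is None:
            return False  # let the next preloading module (e.g. VirusTotal) try
        self.sample = sample
        return True
```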

Known Issues

Currently, there are still some known or potential issues (especially with the new docker-based community modules) which had not been resolved when this PR was created. Nevertheless, we are working on fixes and will provide them within the next couple of days (or 1-2 weeks, depending on internal workload).

  • (done) docker-based modules cannot be used with dockerized FAME
  • (done) the reworked/new virustotal downloader needs to be tested and provided to the community; until then, downloading samples from VT will not work with our changes
  • antivirus modules might not work due to the changes to the core code; if they still do, we would like to port these to the workers as well before this PR is merged to be consistent with the improved design principles

leba-gd added 7 commits July 31, 2019 10:08
This allows for install scripts which require root privileges.
Recent versions of the python:2-slim docker container did not
use bash as the default shell and thus broke virtualenv's 'activate' script.
@raggadhub
Contributor

Hi Luca,

Great work!
For the known issue "docker-based modules cannot be used with dockerized FAME" we can use these tricks (see the POC in PR #70):

  1. bind /var/run/docker.sock between the host and FAME's container, so that the docker-based module containers are started on the host's Docker daemon (avoiding the weird behaviors of Docker-in-Docker).

  2. bind /opt/fame/temp between the host and FAME's container: when a docker-based module container tries to mount FAME's temporary directory, the mount actually goes through the host's Docker socket, so the path must also exist on the host.

PS: if you want to rebuild the docker-based images, don't forget to delete them first; otherwise the worker script skips the build because the images already exist.
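The two binds described above can be sketched as `docker run` flags (the image name `fame-web` is a placeholder for illustration):

```shell
# Bind the host's Docker socket so module containers are launched on the
# host daemon (DooD) rather than inside a nested daemon (DinD), and bind
# /opt/fame/temp at the same path on both sides so that paths handed to
# module containers resolve identically on the host.
docker run -d \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /opt/fame/temp:/opt/fame/temp \
    fame-web
```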

@lucebac
Contributor Author

lucebac commented Aug 5, 2019

Thanks for your suggestion regarding running docker containers from within docker containers. I have to admit that I had already thought of that solution but did not have time to implement it yet, because there were other things to implement in FAME that had a higher priority for the time being. It's still on our todo list, though, and will be implemented in the near future.

leba-gd added 18 commits August 5, 2019 09:43
requirement which was introduced by using remote kvm hosts via ssh
install scripts of virtualization modules were not run (but required for
e.g. the kvm module); also, bash scripts are now run before the python
requirements installer (was necessary because the kvm module requires
libvirt-python which, in turn, depends on libvirt-dev and must be
installed prior to the python module)
if the module had more than the required IsolatedProcessingModule class,
the agent failed because the other classes most probably do not have a
'name' member
when a module is enabled, the worker must install module dependencies
The FAME containers are now configurable with environment variables only.
Thus, the requirement for binding the fame/conf directory into the
container can be dropped.
Because of the set logic behind the logging mechanism, only
one message per minute was logged for recurring messages. This commit
relaxes the logic to a list-based implementation which keeps the
previously dropped messages.
If triggered_by or acts_on were changed in the interface, all modules
following the updated one also had their triggered_by or acts_on
set to the new value.
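The logging change described in the commit messages above (relaxing a set to a list so that recurring messages are no longer collapsed) can be sketched roughly as follows; class and method names are made up for the illustration:

```python
import time


class RecurringLogBuffer:
    """Buffer log messages and flush them periodically.

    A set-based buffer would collapse identical messages within one flush
    window (only one copy per minute); a list keeps every occurrence, which
    is the behavior the commit above switches to.
    """

    def __init__(self, flush_interval=60.0):
        self.flush_interval = flush_interval
        self.messages = []  # list, not set: duplicates are preserved
        self.last_flush = time.monotonic()

    def log(self, message):
        self.messages.append(message)

    def flush(self, now=None):
        """Return all pending messages once the interval has elapsed."""
        now = time.monotonic() if now is None else now
        if now - self.last_flush < self.flush_interval:
            return []
        pending, self.messages = self.messages, []
        self.last_flush = now
        return pending
```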
@lucebac
Contributor Author

lucebac commented Dec 24, 2019

Pull request for Preloading modules created

@lucebac lucebac changed the title Dockerizing FAME Preloading Modules / Dockerization of FAME / Dockerception (reference PR with changes, do not merge!) Dec 24, 2019
@lucebac lucebac changed the title Preloading Modules / Dockerization of FAME / Dockerception (reference PR with changes, do not merge!) Preloading Modules / Dockerization of FAME / Dockerception (reference PR with all changes, do not merge!) Dec 24, 2019
@lucebac lucebac changed the title Preloading Modules / Dockerization of FAME / Dockerception (reference PR with all changes, do not merge!) Preloading Modules / Dockerization of FAME / Dockerception (combined PR of all changes, do not merge!) Dec 24, 2019
@lucebac
Contributor Author

lucebac commented Dec 26, 2019

Pull request for LDAP authentication created

@sinloft sinloft mentioned this pull request Apr 22, 2020
previously this broke the worker because it could not import jinja2
this prevents python from buffering output on stdout/stderr; otherwise
at least the worker may delay log messages on stdout significantly
@Augustin-FL
Collaborator

Augustin-FL commented Aug 17, 2022

Hi @lucebac ,

I am trying to figure out all the functionalities you proposed in this PR.

From my understanding, here is what's included in the PR:

  • Make FAME docker-compatible, using DooD for the worker
  • Rework the installation steps (use environment variables by default, user input otherwise)
  • Change the module update process, to remove the need for having a local worker. Also, move the worker code to a dedicated, independent folder.
  • Make git update process a bit more error-tolerant ("auto-clone the repo if the folder does not exist")
  • Do not install web (Flask, etc) requirements on the worker
  • Various function/file renames (get_processing_module => get_module, default.png => default_avatar.png, etc) without additional changes.
  • Implement pep8 compatibility
  • Create a new page (system), designed for admins to resume stalled analysis
  • Define a new module type: preloading modules
  • Add Content-Type: application/binary when downloading a file
  • Remove quote_plus() from the mongoDB connection (not required?)
  • Accept 1 in addition to True when running in single_module mode
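The "auto-clone the repo if the folder does not exist" behavior from the list above can be sketched like this (function names and the fast-forward-only pull are assumptions for the sketch, not FAME's exact implementation):

```python
import os
import subprocess


def git_sync_command(repo_url, path):
    """Pick 'git clone' when the checkout is missing (the auto-clone case)
    and a fast-forward pull when it already exists."""
    if not os.path.isdir(os.path.join(path, ".git")):
        return ["git", "clone", repo_url, path]
    return ["git", "-C", path, "pull", "--ff-only"]


def sync_repository(repo_url, path):
    """Run the chosen command; check=True raises CalledProcessError on
    failure so the caller can report the error instead of crashing."""
    subprocess.run(git_sync_command(repo_url, path), check=True)
```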

Am I missing anything?

@lucebac
Contributor Author

lucebac commented Aug 18, 2022

Hi @Augustin-FL,

you ask a tough question. It's been a while since I've implemented these changes and this branch and the master have severely diverged in the meantime. But I believe you got the most important changes correctly.

However, there are a couple of things I remember which need to be considered while inspecting the changes/diff:

  • preloading modules have already been implemented by another MR which was, if I'm not mistaken, based off some ideas I presented here
  • my implementation of the Docker containers and DooD are incompatible with the current Docker container implementation as far as I remember
  • many of the commits just fix (minor) inconveniences or squash bugs (e.g. PEP8, the git rework, adding a content-type to downloads, etc)

If you really want to use parts of this code, it's probably best to just cherry-pick the respective commits (or copy the files over) instead of trying to get this monster of a MR into a state which can be merged. I am happy to answer any question for which I still remember the answer, but I cannot promise that I remember everything in the necessary level of detail.

@Augustin-FL
Collaborator

Augustin-FL commented Aug 21, 2022

Yep, I'm aware of that :)

In the meantime, several features have been implemented. I was asking this question to check what the leftovers are.

Fyi, here is how I see things:

Already implemented

  • Make FAME docker-compatible (DinD was used instead of DooD... because, as you experienced, volume management with DooD is a complete nightmare)
  • Rework the installation steps (use environment variables by default, user input otherwise)
  • Define a new module type: preloading modules
  • Add Content-Type: application/binary when downloading a file.
  • Check if quote_plus() is actually required for a mongoDB connection. In any case, authenticate() no longer exists in recent pymongo versions, so the code performing the mongoDB connection will need a rework anyway
  • Accept 1 in addition to True when running in single_module mode (not really a game changer, but after all why not?...)
  • Make sure the code is pep8/black compliant (not yet implemented but PR ongoing)

More discussion needed

  • Remove the need for having a local worker. I personally don't think this is a good idea: this is how FAME has been designed from the start, and it should stay that way IMHO. I'm assuming you implemented this "to respect the docker micro-service model"... Another solution was found for this: run two docker containers with a shared volume.
  • Do not install web (Flask, etc) requirements on the worker, at the cost of managing two docker images. I'd prefer to keep it simple and have one single docker image, if possible. Also, I don't think having flask on the worker causes any issue... Any opinion on that?
  • Resume stalled analyses from the web interface: I don't see any need for this feature... In what situation would you need to resume an analysis? Database/worker restart?
  • Make the git update process a bit more error-tolerant: the feature was not implemented, but the error messages (which I agree were a bit confusing) have been improved.

Do you agree with me? (@gaelmuller If you have any opinion on this, I'd be also happy to have it)

@lucebac
Contributor Author

lucebac commented Aug 24, 2022

* Remove the need for having a local worker. I personally don't think this is a good idea: this is how FAME has been designed from the start, and it should stay that way IMHO. I'm assuming you implemented this "to respect the docker micro-service model"... Another solution was found for this: run two docker containers with a shared volume.

AFAIR, the local worker was only required because of how the analysis target files were shared with the worker. With my changes this is no longer necessary: workers do not rely on the data being present on the filesystem but instead just download them over the REST API (like they did with everything else already - think of the modules zip, the analysis results, etc).
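The download-over-REST approach described above can be sketched as follows; the endpoint path and API-key header name are assumptions for illustration, not FAME's actual API:

```python
import os
import urllib.request


def download_sample(api_base, file_hash, api_key, dest_dir):
    """Fetch an analysis target from the web API instead of relying on a
    shared filesystem, mirroring how workers already fetch the modules zip
    and analysis results. Endpoint and header names are hypothetical."""
    url = "%s/files/%s/download" % (api_base.rstrip("/"), file_hash)
    request = urllib.request.Request(url, headers={"X-API-KEY": api_key})
    dest = os.path.join(dest_dir, file_hash)
    with urllib.request.urlopen(request) as response, open(dest, "wb") as out:
        out.write(response.read())
    return dest
```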

* Do not install web (Flask, etc) requirements on the worker, at the cost of managing two docker images. I'd prefer to keep it simple and have one single docker image, if possible. Also, I don't think having flask on the worker causes any issue... Any opinion on that?

That's up to your liking. I, personally, just prefer to not store unneeded code in a container to have the image as small as possible - that's all there was to my decision.

* Resume stalled analyses from the web interface: I don't see any need for this feature... In what situation would you need to resume an analysis? Database/worker restart?

Mostly, yes. But worker crashes also cause an analysis to stall because the analysis job won't be picked up again. It does not happen often but when it does, it's very convenient to be able to just click a button to resume an analysis instead of having to schedule a new one - depending on which modules are involved in the analysis that may even save a significant amount of time (for example sandbox analyses, which usually take several minutes to complete).

* Make the git update process a bit more error-tolerant: the feature was not implemented, but the error messages (which I agree were a bit confusing) have been improved.

It's not just error tolerance. I also changed how the modules are being updated. If I'm not mistaken, the web server used to be the component responsible for updating the modules zip. However, this stalls the web interface for as long as the update takes and may even cause the webserver to time out if pulling the git repo takes too long. To avoid that, I introduced a new job for this so that a worker now picks up this task and updates the modules asynchronously (which, coincidentally, also doesn't crash the web interface anymore if there is an exception during the module update).
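The move described above (enqueue the module update as a job instead of running it inside the web request) can be sketched with a simple in-process queue; FAME's actual task dispatching works differently, so all names here are illustrative:

```python
import queue
import threading

# Shared task queue standing in for FAME's job mechanism.
task_queue = queue.Queue()


def request_module_update():
    """Called from the web interface: enqueue and return immediately,
    so a slow git pull can no longer time out the web server."""
    task_queue.put("update_modules")
    return "update scheduled"


def worker_loop(update_fn, stop):
    """Worker side: pick up the task and run the (slow) update."""
    while not stop.is_set():
        try:
            task = task_queue.get(timeout=0.1)
        except queue.Empty:
            continue
        if task == "update_modules":
            try:
                update_fn()  # e.g. git pull + repackage the modules zip
            except Exception:
                pass  # a failed update no longer takes down the web UI
        task_queue.task_done()
```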
