Just got done installing your lovely solution for time-lining and I love this workflow! However, I see some points of improvement:
Installer
Make uploading the complete .plaso file back to the S3 bucket optional through a switch in deploy.sh that simply does not activate the watch-plaso-to-s3.sh script (see the sketch below).
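A minimal sketch of what I have in mind, assuming deploy.sh installs the watcher as a systemd service; the `--no-plaso-upload` flag and the service name are hypothetical, not something the repo currently defines:

```bash
#!/bin/bash
# deploy.sh (excerpt) -- hypothetical --no-plaso-upload switch
UPLOAD_PLASO=true
for arg in "$@"; do
  [ "$arg" = "--no-plaso-upload" ] && UPLOAD_PLASO=false
done

if [ "$UPLOAD_PLASO" = true ]; then
  # Only install/start the upload watcher when .plaso uploads are wanted
  systemctl enable --now watch-plaso-to-s3.service
fi
```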
Docker continuously throws an error that it should not run as root. This can be checked with `sudo docker logs --tail 50 --follow --timestamps timesketch_timesketch-worker_1 | less`.
Create an option for deleting all raw data after plaso processing, for cases where you are strapped for storage (a possible sketch follows).
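For illustration only: a cleanup step along these lines could run once log2timeline has produced the .plaso file. The DELETE_RAW flag and the output-file naming are assumptions on my side; $PARENT_DATA_DIR and $SYSTEM are the variables the repo's own script already uses:

```bash
# Hypothetical cleanup after plaso processing -- only when storage is tight
DELETE_RAW=${DELETE_RAW:-false}
if [ "$DELETE_RAW" = true ] && [ -f "$PARENT_DATA_DIR/$SYSTEM.plaso" ]; then
  # The timeline exists, so the raw Kape data is no longer needed
  rm -r "$PARENT_DATA_DIR/$SYSTEM"
fi
```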
Create an option for the Timesketch instance not being run in AWS. I have it running on an on-prem hypervisor with Velociraptor running in AWS. Further down the road, one might also consider shipping the data via SFTP instead of AWS to allow for a fully on-prem solution.
watch-s3-to-timesketch.py
If the same hunt is executed twice for some reason, the filename in the S3 bucket will remain the same. It might be worth adding a unique ID to each item in the S3 bucket to tell them apart. I have no good solution for this yet; maybe AWS already has something built in. These per-item IDs would then be added to a list/database on the Timesketch instance and checked prior to downloading (sketched below).
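One way this could look, assuming boto3 and abusing the object ETag as the unique ID; the bucket name and state-file path are illustrative, and a real fix may want S3 versioning instead:

```python
import boto3

BUCKET = "my-velociraptor-bucket"                 # illustrative name
SEEN_FILE = "/var/lib/timesketch/seen-etags.txt"  # hypothetical state file

s3 = boto3.client("s3")

def load_seen():
    """Return the set of ETags that were already downloaded."""
    try:
        with open(SEEN_FILE) as f:
            return {line.strip() for line in f}
    except FileNotFoundError:
        return set()

def new_objects():
    """Yield only bucket objects whose ETag has not been seen before."""
    seen = load_seen()
    for obj in s3.list_objects_v2(Bucket=BUCKET).get("Contents", []):
        if obj["ETag"] not in seen:
            yield obj

def mark_seen(etag):
    """Record an ETag so the same upload is never processed twice."""
    with open(SEEN_FILE, "a") as f:
        f.write(etag + "\n")
```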
Currently there is a `while True` loop that sends requests at a very high frequency. This can quickly inflate your AWS bill: after running my pipeline for roughly 30 hours I had a 30€ bill despite almost no data being transferred. Having the script poll every 10 seconds or so would drastically decrease the number of requests without slowing the pipeline significantly.
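For illustration, the throttled loop could be as simple as this; poll_bucket is a stand-in for the script's existing listing/download logic, not a function the repo defines:

```python
import time

POLL_INTERVAL_SECONDS = 10  # trade-off: request volume vs. pipeline latency

def poll_bucket():
    """Stand-in for the existing S3 listing/download step."""
    pass

while True:
    poll_bucket()
    time.sleep(POLL_INTERVAL_SECONDS)  # throttle instead of hammering the S3 API
```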
The AWS credentials need to be put directly into the source code. I think following the AWS best practice of a dedicated file at `~/.aws/credentials` might be better. See this for reference.
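For illustration: with boto3 the hard-coded keys can simply be dropped, since the default credential chain picks up `~/.aws/credentials` on its own (the profile name here is illustrative):

```python
import boto3

# No keys in the source: boto3's default chain resolves credentials from
# ~/.aws/credentials, environment variables, or an attached instance role.
session = boto3.Session(profile_name="velociraptor")  # illustrative profile
s3 = session.client("s3")
```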
watch-to-timesketch.sh
The name of the service being installed (data-to-timesketch) is different from the name of the script. This is not the case for the Python downloader or the other bash script. It confused me for a moment, and I would align the two so both are called watch-data-to-timesketch.
There is a bug that causes all data from the unzipped Kape .zip to be deleted instead of only the unimportant bits. This is due to the file path, which, at least in my installation, is [...]$SYSTEM/fs/fs/clients/[...] instead of [...]$SYSTEM/fs/clients/[...]. Check the following code for reference:
```bash
# Remove from subdir
mv $PARENT_DATA_DIR/$SYSTEM/fs/clients/*/collections/*/uploads/* $PARENT_DATA_DIR/$SYSTEM/
# Delete unnecessary collection data
rm -r $PARENT_DATA_DIR/$SYSTEM/fs $PARENT_DATA_DIR/$SYSTEM/UploadFlow.json $PARENT_DATA_DIR/$SYSTEM/UploadFlow
```
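For comparison, here is how the snippet would look with the extra fs/ level I observe; this is only verified against my own installation, and a glob such as fs/*/clients might be a safer way to cover both layouts:

```bash
# Remove from subdir -- note the extra fs/ level seen in my installation
mv $PARENT_DATA_DIR/$SYSTEM/fs/fs/clients/*/collections/*/uploads/* $PARENT_DATA_DIR/$SYSTEM/
# Delete unnecessary collection data (safe now that the uploads were moved out)
rm -r $PARENT_DATA_DIR/$SYSTEM/fs $PARENT_DATA_DIR/$SYSTEM/UploadFlow.json $PARENT_DATA_DIR/$SYSTEM/UploadFlow
```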
Ideas
I think a central config file might solve some of the issues I faced, but I am not sure whether this is the best way to go about it (a rough sketch follows). I will try to create a pull request that offers a solution for the topics I mentioned. Furthermore, creating an SFTP-based solution in parallel to the AWS-based one would allow hosting the setup fully on-prem. I will see if I get around to that as well.
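Purely to illustrate the central-config idea; every variable name here is hypothetical and nothing the repo currently defines:

```bash
# config.env -- hypothetical central configuration, sourced by every script
S3_BUCKET="my-velociraptor-bucket"  # illustrative
POLL_INTERVAL_SECONDS=10            # see the polling point above
UPLOAD_PLASO=true                   # false skips watch-plaso-to-s3.sh
DELETE_RAW=false                    # true deletes raw data after plaso processing
TRANSPORT="s3"                      # "s3" today, "sftp" for a future on-prem mode
```

Each script would then just start with `source /path/to/config.env`.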