[BUG] #43

Open
fish-not-phish opened this issue May 30, 2024 · 21 comments
@fish-not-phish

Hello,

I am having some problems deploying this stack as it appears the script is not running as expected.
I am running on Ubuntu 22.04 LTS, fresh install. The virtual machine has 8 CPU cores, 16GB of RAM, and 500GB of storage, so I don't suspect a resource issue.

I edited the .env file, changing these 4 items:

  • LOCAL_KBN_URL=https://192.168.0.X:5601
  • ELASTIC_PASSWORD=<redacted_password>
  • KIBANA_PASSWORD=<redacted_password>
  • WindowsDR=1

No other modifications were made.

When I run the script sudo ./elastic-container.sh start, it runs and appears to set up the necessary containers. However, the output does not match what would be expected. The output I get is this:

Attempting to enable the Detection Engine and Prebuilt-Detection Rules

Kibana is up. Proceeding

Detection engine enabled. Installing prepackaged rules.

Prepackaged rules installed!

Enabling detection rules

Waiting 40 seconds for Fleet Server setup

Populating Fleet Settings

However, the final part of the expected output never appears:

READY SET GO!

Browse to https://localhost:5601

Username: elastic


Passphrase: you-changed-me-from-the-default-right?

When I try to go to the URL, it isn't up, and is only accessible if I run the restart option for the script: sudo ./elastic-container.sh restart. Then the URL becomes accessible, but the fleet settings are not configured.

When I run sudo ./elastic-container.sh status, this is the output:

NAME                IMAGE                                                  COMMAND                    SERVICE         CREATED          STATUS                    PORTS
ecp-elasticsearch   docker.elastic.co/elasticsearch/elasticsearch:8.12.2   "/bin/tini -- /usr/l..."   elasticsearch   27 minutes ago   Up 24 minutes (healthy)   0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 9300/tcp
ecp-fleet-server    docker.elastic.co/beats/elastic-agent:8.12.2           "/usr/bin/tini -- /u..."   fleet-server    27 minutes ago   Up 24 minutes             0.0.0.0:8220->8220/tcp, :::8220->8220/tcp
ecp-kibana          docker.elastic.co/kibana/kibana:8.12.2                 "/bin/tini -- /usr/l..."   kibana          27 minutes ago   Up 24 minutes (healthy)   0.0.0.0:5601->5601/tcp, :::5601->5601/tcp

I noticed that the ecp-fleet-server is online but is not denoted as "healthy". I checked the docker logs and observed connection refused errors:

"message": "Error dialing dial tcp [::1]:9200: connect: connection refused"

and also

"message": "Attempting to reconnect to backoff(elasticsearch(http://localhost:9200)) with 75 reconnect attempt(s)"

Reading this, I understand that there may be a connectivity issue. However, I am not running UFW and I have not altered iptables. The Proxmox firewall for this virtual machine is off, so that shouldn't have any impact either.
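One way to narrow this down is to count how often the refused-connection message shows up in the Fleet Server container logs. Note that [::1] and localhost in the messages above are loopback addresses, so, assuming the messages are emitted from inside the ecp-fleet-server container (not confirmed here), they point at that container itself rather than at the host, and a host firewall would not be involved. The sketch below is only illustrative; count_refused_9200 is a hypothetical helper name, not part of elastic-container.sh.

```shell
# Count log lines that report a refused connection to port 9200.
# Reads log text on stdin, so it works with live `docker logs` output or a saved file.
# Hypothetical helper; the pattern is taken from the error message quoted above.
count_refused_9200() {
  # grep -c prints the match count; `|| true` keeps a zero count from being an error.
  grep -c ':9200: connect: connection refused' || true
}
```

It could be fed from the container logs, for example: sudo docker logs ecp-fleet-server 2>&1 | count_refused_9200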

Any help would be greatly appreciated to get this working.

@fish-not-phish fish-not-phish added the bug Something isn't working label May 30, 2024
@fish-not-phish
Author

fish-not-phish commented May 30, 2024

I was looking through other issues and I noticed Issue 23 which talks about the Fleet server not coming back after reboot. I noticed the user mentioned version 8.5 still works. I went ahead and rolled back to that version within the repository history and that version does indeed work without any errors. A temporary fix, but obviously it would be preferred to run the most recent version.

@peasead
Owner

peasead commented May 30, 2024

Do you have this issue with 8.13?

If you want to test, you can do a ./elastic-container.sh destroy and then start fresh.

I'll also look into this.

@fish-not-phish
Author

> Do you have this issue with 8.13?
>
> If you want to test, you can do a ./elastic-container.sh destroy and then start fresh.
>
> I'll also look into this.

I was using STACK_VERSION=8.12.2; I have not tried 8.13. I will try it and let you know whether it works.

Sadly, I have already destroyed and started fresh and I get the same result each time.

@fish-not-phish
Author

fish-not-phish commented May 30, 2024

Doesn't appear to work with 8.13.

I tried changing STACK_VERSION=8.13.0 within .env.
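For anyone repeating this test across several versions, the edit can be scripted. This is a minimal sketch, assuming the .env file already contains a single STACK_VERSION= line; set_stack_version is a hypothetical helper name and is not part of the repository.

```shell
# Replace the STACK_VERSION line in an env file (hypothetical helper).
# $1 = path to the env file, $2 = version string such as 8.13.0
# Assumes the version contains no "/" or "&", which would confuse sed.
set_stack_version() {
  env_file=$1
  version=$2
  tmp=$(mktemp) || return 1
  # Rewrite only the STACK_VERSION= line; every other line is copied unchanged.
  if sed "s/^STACK_VERSION=.*/STACK_VERSION=${version}/" "$env_file" > "$tmp"; then
    # Write back through the original file so its permissions are preserved.
    cat "$tmp" > "$env_file"
  fi
  rm -f "$tmp"
}
```

Usage would look like set_stack_version .env 8.13.0, followed by the destroy and start steps mentioned above.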

@TrainLam

TrainLam commented Jun 3, 2024

Tried 8.4.3 and it worked. However, the portal responds very slowly.

Tried 8.13.4, 8.12.2, and 8.12.0; all failed.

@peasead
Owner

peasead commented Jul 2, 2024

Thanks for your patience. I will open an Issue upstream.

When I deployed, the Fleet server was never healthy. I ran sh elastic-container.sh restart, and then everything was healthy and Fleet was available. That's not a good solution, but it can work as a temporary one. I wonder if there is a race condition where, if one of the other containers isn't up and healthy, Fleet chokes and doesn't self-heal.

I'll try a few "relies on" options.
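If the race-condition guess is right, one experiment would be to wait until the other two containers report healthy before touching Fleet. The following is a rough sketch under that assumption, not a confirmed fix; wait_for and is_healthy are hypothetical helper names that do not exist in elastic-container.sh, and the container names are the ones shown in the status output earlier in this thread.

```shell
# Poll a command until it succeeds or the attempts run out (hypothetical helper).
# $1 = max attempts, $2 = seconds to sleep between attempts, rest = command to run
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Succeeds only when Docker reports the named container's health as "healthy".
# A container without a health check yields no status here, so this returns non-zero.
is_healthy() {
  [ "$(docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null)" = "healthy" ]
}
```

For example: wait_for 30 5 is_healthy ecp-elasticsearch && wait_for 30 5 is_healthy ecp-kibana && docker restart ecp-fleet-server. Whether ordering the startup this way actually avoids the problem is untested.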

@peasead
Owner

peasead commented Jul 2, 2024

Even when it is healthy in Kibana, it never shows healthy in Docker.

[image]
./elastic-container.sh status
NAME                IMAGE                                                  COMMAND                  SERVICE             CREATED             STATUS                   PORTS
ecp-elasticsearch   docker.elastic.co/elasticsearch/elasticsearch:8.14.1   "/bin/tini -- /usr/l…"   elasticsearch       5 minutes ago       Up 5 minutes (healthy)   0.0.0.0:9200->9200/tcp, 9300/tcp
ecp-fleet-server    docker.elastic.co/beats/elastic-agent:8.14.1           "/usr/bin/tini -- /u…"   fleet-server        5 minutes ago       Up 44 seconds            0.0.0.0:8220->8220/tcp
ecp-kibana          docker.elastic.co/kibana/kibana:8.14.1                 "/bin/tini -- /usr/l…"   kibana              5 minutes ago       Up 4 minutes (healthy)   0.0.0.0:5601->5601/tcp

But it wasn't healthy in Kibana until I did a restart. I tried just restarting the Fleet container and the whole stack. Both brought Fleet online.

I'll follow up here with the Elastic Issue for tracking.

@TrainLam

TrainLam commented Jul 3, 2024

Thanks. So with 8.14.1, the issue above can be fixed by manually restarting Kibana or all the containers. Am I correct?

Your comment is appreciated.

@peasead
Owner

peasead commented Jul 3, 2024

I believe I tried it both ways and both worked.

@TrainLam

TrainLam commented Jul 7, 2024

I tried, but the Fleet server is not displayed, as shown in Screen 1. However, Screen 2 shows that the Fleet server is running.

Screen 1
[image]

Screen 2
[image]

@TrainLam

TrainLam commented Jul 7, 2024

Tried 8.14.1 once, but the situation is the same.

@TrainLam

TrainLam commented Jul 11, 2024

Tested previous versions; only version 8.8.2 runs elastic-container.sh and builds everything successfully.

@kaliankhe

Any update on this? I am still facing this issue with 8.14.3 as well.

@TrainLam

TrainLam commented Aug 9, 2024

Tried with 8.15.0 and it is still not working, as shown in the following image.

[image]

@saidhfm

saidhfm commented Aug 20, 2024

Finally, after 2 hours of troubleshooting, I found a workaround with 8.14.0.
Use the code from this commit: https://github.com/peasead/elastic-container/tree/0ef92f1e7bce33ca5c42bbe545630fe18c5bf028
Copy the code from each file into your local files, then recheck that the .env file has STACK_VERSION=8.14.0.
Try this; it will work 100%.
If you have more questions about deployment, reach out to me on LinkedIn and I can help you: https://www.linkedin.com/in/saibatchu/

@fish-not-phish
Author

> Finally, after 2 hours of troubleshooting, I found a workaround with 8.14.0.
> Use the code from this commit: https://github.com/peasead/elastic-container/tree/0ef92f1e7bce33ca5c42bbe545630fe18c5bf028
> Copy the code from each file into your local files, then recheck that the .env file has STACK_VERSION=8.14.0.
> Try this; it will work 100%.
> If you have more questions about deployment, reach out to me on LinkedIn and I can help you: https://www.linkedin.com/in/saibatchu/

This actually seemed to work for me as well. I will update in 1-2 weeks if there are any health concerns regarding the fleet. I have a VM with a large amount of resources allocated to it, so there should not be any resource-related issues.

@DefSecSentinel
Collaborator

Hey @fish-not-phish, I'm jumping in to get this issue fixed. I just pushed a change to main in the shell script that fixes an issue with how Fleet settings are populated. I just tested on macOS by standing up a fresh stack, and everything works as advertised. Can you pull main again, try standing up a stack, and let me know if you still experience a problem?

@DefSecSentinel
Collaborator

You also should not have to change the LOCAL_KBN_URL value

@TrainLam

TrainLam commented Sep 5, 2024

I tested on Ubuntu with no luck, as shown in the following images.

[image]

[image]

@DefSecSentinel
Collaborator

I'll do a test on Ubuntu today and see if I can't figure out what's going on.

@octaviotron

I recently used this repo to deploy ELK for testing and studying. It worked like a charm on my computer, so I decided to deploy it on a server, where Fleet did not work.

After hours of debugging, I realized it has deployment problems on current Debian Stable (bookworm), but it works perfectly on current Debian Testing (trixie).

Maybe some of the cases above can be solved just by using another Docker host OS. I hope this is useful to someone.

I do not have enough ELK skills to work out where the problem behind this difference lies.
