-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
windows restart service #1681
windows restart service #1681
Conversation
a38204e
to
b84ffd4
Compare
1fca6bc
to
e530c23
Compare
c086936
to
faefa20
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial comments on the windows stuff. Haven't dug into the log or sqlite stuff
launcherWatchdogServiceName string = `LauncherKolideWatchdogSvc` | ||
launcherServiceName string = `LauncherKolideK2Svc` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are k2 specific names, as such, I would probably move this all into ee
// do nothing if watchdog is not enabled | ||
if !wc.knapsack.LauncherWatchdogEnabled() { | ||
return | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oo. This is interesting. What happens if there are pending logs, and then watchdog is disabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was unsure about the right call here- i think if we're expecting that most windows devices will have this enabled someday then removing this enabled check is the right call. I had added because the case of stuck logs seemed extremely rare (and logs would still be recoverable) vs most devices having this disabled and checking logs every 5 minutes for no reason
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is a fine choice. I guess if we get stale logs, hopefully we'll notice the timestamp
} | ||
|
||
func runLauncherWatchdogService(ctx context.Context, w *winWatchdogSvc) error { | ||
ticker := time.NewTicker(1 * time.Minute) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this frequency will be okay. Or if there's going to be some memory leak. Well, it's behind a feature flag, so we'll get to find out!
@@ -0,0 +1,30 @@ | |||
### Watchdog Service |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could call this file README.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or move to docs/architecture
?
} | ||
} | ||
|
||
func (w *winWatchdogSvc) checkLauncherStatus(ctx context.Context) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to skip this if we're in powersave mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add a link to the docs I'm working on at the end but the short answer is not easily- unless we are willing to:
- add logic into here to subscribe to power events from this process as well (similar to current launcher functionality)
- add logic to register directly for power events as a handler
I would love to be wrong here but there does not appear to be any reliable mechanism to gather the current state in real time (without significantly complicating the watchdog)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this powersave issue is the one we should focus on. The PR generally looks good, and the small details can drag out forever, but if we don't have an answer for powersave we probably can't ship.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense, updated with logic to integrate our existing power event watcher code 👍
@@ -19,20 +19,23 @@ import ( | |||
_ "modernc.org/sqlite" | |||
) | |||
|
|||
type storeName int | |||
type StoreName int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be exported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exported for use in the pkg/log/sqlitelogger
package, looks like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm. Feels a little weird. I wonder if we should be create stores and passing them around. But okay...
// TimestampedIteratorDeleterAppenderCloser is an interface to support the storage and retrieval of | ||
// sets of timestamped values. This can be used where a strict key/value interface may not suffice, | ||
// e.g. for writing logs or historical records to sqlite | ||
type TimestampedIteratorDeleterAppenderCloser interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat!
if err := wc.logPublisher.ForEach(func(rowid, timestamp int64, v []byte) error { | ||
logRecord := make(map[string]any) | ||
if err := json.Unmarshal(v, &logRecord); err != nil { | ||
wc.slogger.Log(ctx, slog.LevelError, "failed to unmarshal sqlite log", "log", string(v)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we include the err
in this log also?
logRecord := make(map[string]any) | ||
if err := json.Unmarshal(v, &logRecord); err != nil { | ||
wc.slogger.Log(ctx, slog.LevelError, "failed to unmarshal sqlite log", "log", string(v)) | ||
logsToDelete = append(logsToDelete, rowid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we can move this line to the top of the ForEach
function, since there's no circumstance where we won't want to delete this log -- instead of having the append here and on line 113?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes thank you! will do
return | ||
} | ||
|
||
if err.Error() == serviceDoesNotExistError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree w/ an if err != nil
+ a switch or if/elseif inside to check the error type
wc.slogger.Log(ctx, slog.LevelError, | ||
"installing launcher watchdog, unable to collect current executable path", | ||
"err", err, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this log since we're returning an error and the calling function logs it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope I can remove, good call thank you!
}, | ||
} | ||
|
||
if err = restartService.SetRecoveryActions(recoveryActions, 10800); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be more readable as a const with a comment --
const serviceResetPeriod = 10800 // 3 hours in seconds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because this looks like a fun, unimportant thing to quibble over, I would go with
const serviceResetPeriodSeconds = 3 * 60 * 60 // 3 hours in seconds
@@ -0,0 +1,30 @@ | |||
### Watchdog Service |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or move to docs/architecture
?
@@ -19,20 +19,23 @@ import ( | |||
_ "modernc.org/sqlite" | |||
) | |||
|
|||
type storeName int | |||
type StoreName int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exported for use in the pkg/log/sqlitelogger
package, looks like
"strings" | ||
) | ||
|
||
func (s *sqliteStore) getColumns() *sqliteColumns { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function feels like maybe it belongs in keyvalue_store_sqlite.go
instead, since it applies to both tables? What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that probably makes more sense, I was torn about adding a new shared file but think we can probably wait on that. i'll move there for now!
side note -- do not merge until 1.7 is stable |
@@ -19,20 +19,23 @@ import ( | |||
_ "modernc.org/sqlite" | |||
) | |||
|
|||
type storeName int | |||
type StoreName int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm. Feels a little weird. I wonder if we should be create stores and passing them around. But okay...
} | ||
) | ||
|
||
func NewSqliteLogWriter(ctx context.Context, rootDirectory string, tableName agentsqlite.StoreName) (*SqliteLogWriter, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This signature feels a little off to me. I wonder if it would be cleaner if it took types.LogStore
directly.
@@ -0,0 +1,52 @@ | |||
package sqlitelogger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's anything here related to sqlite or logging? It's implementing an io.WriteCloser
. Why not in ee/agent/storage/sqlite/logstore_sqlite.go
?
(I mean, it's also pretty cool, but I'm not sure why it's split out)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean, I think I originally assumed there would be more to do here. Consolidating should clean up those extra exported types too, I will see how that looks. thank you!
switch s { | ||
case StartupSettingsStore: | ||
return "startup_settings" | ||
case WatchdogLogStore: | ||
return "watchdog_logs" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to push other logs here (say early startup logs) how would we do it? Would we make a new store? Accept it as slightly misnamed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think new store will give us more implementation flexibility and we can still re-use all of the new interfaces and wire that in pretty easily
4c7b268
to
84ce6c1
Compare
PR feedback: less noisy log levels, comment updates and style fixes Co-authored-by: seph <[email protected]>
…ovements, add checks for non-LogStore table methods
Co-authored-by: Rebecca Mahany-Horton <[email protected]>
…ore readable const
…tore and unexport storeName, clean up
84ce6c1
to
2d68f1e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made it thru everything but ee/watchdog/watchdog_service_windows.go
, will come back for that tomorrow AM!
Co-authored-by: Rebecca Mahany-Horton <[email protected]>
Here is a basic sequence diagram displaying the enable path for the windows watchdog service. The
launcher_watchdog_enabled
control flag will trigger the initial configuration and installation, and removal of the flag will trigger removal of the service.launcher_watchdog_enabled
flag, and publishes all sqlite logs to debug.json.