Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

start: allow users to call job start command to start stopped jobs #24150

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
10 changes: 10 additions & 0 deletions command/commands.go
Original file line number Diff line number Diff line change
Expand Up @@ -521,6 +521,11 @@ func Commands(metaPtr *Meta, agentUi cli.Ui) map[string]cli.CommandFactory {
Meta: meta,
}, nil
},
"job start": func() (cli.Command, error) {
return &JobStartCommand{
Meta: meta,
}, nil
},
"job validate": func() (cli.Command, error) {
return &JobValidateCommand{
Meta: meta,
Expand Down Expand Up @@ -1071,6 +1076,11 @@ func Commands(metaPtr *Meta, agentUi cli.Ui) map[string]cli.CommandFactory {
Meta: meta,
}, nil
},
"start": func() (cli.Command, error) {
return &JobStartCommand{
Meta: meta,
}, nil
},
"system": func() (cli.Command, error) {
return &SystemCommand{
Meta: meta,
Expand Down
248 changes: 248 additions & 0 deletions command/job_start.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,248 @@
// Copyright (c) HashiCorp, Inc.
// SPDX-License-Identifier: BUSL-1.1

package command

import (
"fmt"
"github.com/hashicorp/nomad/api"
"github.com/hashicorp/nomad/api/contexts"
"github.com/posener/complete"
"os"
"strings"
"sync"
martisah marked this conversation as resolved.
Show resolved Hide resolved
)

type JobStartCommand struct {
Meta
}

func (c *JobStartCommand) Help() string {
helpText := `
Usage: nomad job start [options] <job>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we don't have any clue here to the user that they can use more than one job ID, or that if they do they all need to be in the same namespace.

Alias: nomad start

Start an existing stopped job. This command is used to start a previously stopped job's
most recent running version up. Upon successful start, an interactive
monitor session will start to display log lines as the job starts up its
martisah marked this conversation as resolved.
Show resolved Hide resolved
allocations based on its most recent running version. It is safe to exit the monitor
early using ctrl+c.

When ACLs are enabled, this command requires a token with the 'submit-job'
and 'read-job' capabilities for the job's namespace. The 'list-jobs'
capability is required to run the command with job prefixes instead of exact
job IDs.


martisah marked this conversation as resolved.
Show resolved Hide resolved
General Options:

` + generalOptionsUsage(usageOptsDefault) + `

Start Options:

-detach
Return immediately instead of entering monitor mode. After the
job start command is submitted, a new evaluation ID is printed to the
screen, which can be used to examine the evaluation using the eval-status
command.

-consul-token
The Consul token used to verify that the caller has access to the Service
Identity policies associated in the targeted version of the job.

-vault-token
The Vault token used to verify that the caller has access to the Vault
policies in the targeted version of the job.

-verbose
Display full information.
`
return strings.TrimSpace(helpText)
}

func (c *JobStartCommand) Synopsis() string {
return "Start a stopped job"
}

func (c *JobStartCommand) AutocompleteFlags() complete.Flags {
return mergeAutocompleteFlags(c.Meta.AutocompleteFlags(FlagSetClient),
complete.Flags{
"-detach": complete.PredictNothing,
"-verbose": complete.PredictNothing,
})
}

func (c *JobStartCommand) AutocompleteArgs() complete.Predictor {
return complete.PredictFunc(func(a complete.Args) []string {
client, err := c.Meta.Client()
if err != nil {
return nil
}

resp, _, err := client.Search().PrefixSearch(a.Last, contexts.Jobs, nil)
if err != nil {
return []string{}
}
return resp.Matches[contexts.Jobs]
})
}
func (c *JobStartCommand) Name() string { return "job start" }

func (c *JobStartCommand) Run(args []string) int {
var detach, verbose bool
var consulToken, vaultToken string

flags := c.Meta.FlagSet(c.Name(), FlagSetClient)
flags.Usage = func() { c.Ui.Output(c.Help()) }
flags.BoolVar(&detach, "detach", false, "")
flags.BoolVar(&verbose, "verbose", false, "")
flags.StringVar(&consulToken, "consul-token", "", "")
flags.StringVar(&vaultToken, "vault-token", "", "")

if err := flags.Parse(args); err != nil {
return 1
}

// Check that we got at least one job
args = flags.Args()
if len(args) < 1 {
c.Ui.Error("This command takes at least one argument: <job>")
c.Ui.Error(commandErrorText(c))
return 1
}

var jobIDs []string
for _, jobID := range flags.Args() {
jobIDs = append(jobIDs, strings.TrimSpace(jobID))
}

// Get the HTTP client
client, err := c.Meta.Client()
if err != nil {
c.Ui.Error(fmt.Sprintf("Error initializing client: %s", err))
return 1
}

statusCh := make(chan int, len(jobIDs))
var wg sync.WaitGroup

for _, jobID := range jobIDs {
jobID := jobID
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jobID := range jobIDs is the ideomatic way to write this, but in the goroutine we shadow this variable. That makes it easy to confuse whether we're dealing with the prefix we got from the user or the actual job ID we got from our JobIDByPrefix call. That results in a small bug later down.

Also, as of Go 1.22 we no longer need to rebind the loopvar, so we're free of having to remember! I always forget 😊 so good that you caught it, but we can drop it now. See https://go.dev/wiki/LoopvarExperiment

Suggested change
for _, jobID := range jobIDs {
jobID := jobID
for _, jobIDPrefix := range jobIDs {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooh I didn't know that! Thanks for catching this!


wg.Add(1)
go func() {
defer wg.Done()

// Truncate the id unless full length is requested
length := shortId
if verbose {
length = fullId
}

// Check if the job exists and has been stopped (status is dead)
jobId, namespace, err := c.JobIDByPrefix(client, jobID, nil)
if err != nil {
c.Ui.Error(err.Error())
statusCh <- 1
return
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where we originally shadowed jobID, but as it turns out we're not actually shadowing it because we changed the capitalization. This makes it really easy to miss that we're using the prefix when we make the Jobs().Versions call later.

But it turns out that we already need to call JobByPrefix below, which takes the prefix, so we can just delete this whole block.

Suggested change
jobId, namespace, err := c.JobIDByPrefix(client, jobID, nil)
if err != nil {
c.Ui.Error(err.Error())
statusCh <- 1
return
}

job, err := c.JobByPrefix(client, jobId, nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
job, err := c.JobByPrefix(client, jobId, nil)
job, err := c.JobByPrefix(client, jobIDPrefix, nil)

if err != nil {
c.Ui.Error(err.Error())
statusCh <- 1
return
}
if *job.Status != "dead" {
c.Ui.Info(fmt.Sprintf("Job %v has not been stopped and has the following status: %v", *job.Name, *job.Status))
statusCh <- 0
return

}

// Get all versions associated to current job
q := &api.QueryOptions{Namespace: namespace}

versions, _, _, err := client.Jobs().Versions(jobID, true, q)
martisah marked this conversation as resolved.
Show resolved Hide resolved
if err != nil {
c.Ui.Error(fmt.Sprintf("Error retrieving job versions: %s", err))
statusCh <- 1
}

// Find the most recent running version for this job
var chosenIndex uint64
versionAvailable := false
for i := len(versions) - 1; i >= 0; i-- {
if *versions[i].Status == "running" && *versions[i].Stop == true {
martisah marked this conversation as resolved.
Show resolved Hide resolved
chosenIndex = uint64(i)
versionAvailable = true
break
}

}
if !versionAvailable {
c.Ui.Error(fmt.Sprintf("No previous running versions of job %v, %s", *job.Name, err))
martisah marked this conversation as resolved.
Show resolved Hide resolved
statusCh <- 1
return
}

// Parse the Consul token
if consulToken == "" {
// Check the environment variable
consulToken = os.Getenv("CONSUL_HTTP_TOKEN")
}

// Parse the Vault token
if vaultToken == "" {
// Check the environment variable
vaultToken = os.Getenv("VAULT_TOKEN")
}
martisah marked this conversation as resolved.
Show resolved Hide resolved

// Revert to most recent running version!
m := &api.WriteOptions{Namespace: namespace}

resp, _, err := client.Jobs().Revert(jobID, chosenIndex, nil, m, consulToken, vaultToken)
martisah marked this conversation as resolved.
Show resolved Hide resolved
if err != nil {
c.Ui.Error(fmt.Sprintf("Error retrieving job version %v for job %s: %s,", chosenIndex, jobID, err))
statusCh <- 1
return
}

// Nothing to do
martisah marked this conversation as resolved.
Show resolved Hide resolved
evalCreated := resp.EvalID != ""

if !evalCreated {
statusCh <- 0
return
}

if detach {
c.Ui.Output("Evaluation ID: " + resp.EvalID)
statusCh <- 0
return
}

mon := newMonitor(c.Ui, client, length)
statusCh <- mon.monitor(resp.EvalID)

}()
}
// users will still see
// errors if any while we
// wait for the goroutines
// to finish processing
wg.Wait()

// close the channel to ensure
// the range statement below
// doesn't go on indefinitely
close(statusCh)

// return a non-zero exit code
// if even a single job start fails
for status := range statusCh {
if status != 0 {
return status
}
}
return 0
}
Loading