This repository has been archived by the owner on Jan 15, 2022. It is now read-only.

jobFileProcessor.sh complains about missing arguments. #87

Open
hcoyote opened this issue May 16, 2014 · 4 comments

Comments

@hcoyote

hcoyote commented May 16, 2014

Running from origin/master.

I have things patched up enough to get the jobFilePreprocessor.sh and jobFileLoader.sh connecting to our Hadoop environment. The last step in hraven-etl.sh invokes jobFileProcessor.sh, but this throws errors about missing arguments.

I poked around in the code, and it's not clear what these should be. machinetype looks like it should default to "default" when not explicitly set, but the argument parser makes it required. I also can't find much discussion of what's supposed to go in the cost properties file.

ERROR: Missing required options: z, m

usage: JobFileProcessor  [-b <batch-size>] -c <cluster> [-d] -m
       <machinetype> [-p <processFileSubstring>] [-r] [-t <thread-count>]
       -z <costfile>
 -b,--batchSize <batch-size>                        The number of files to
                                                    process in one batch.
                                                    Default 100
 -c,--cluster <cluster>                             cluster for which jobs
                                                    are processed
 -d,--debug                                         switch on DEBUG log
                                                    level
 -m,--machineType <machinetype>                     The type of machine
                                                    this job ran on
 -p,--processFileSubstring <processFileSubstring>   use only those process
                                                    records where the
                                                    process file path
                                                    contains the provided
                                                    string. Useful when
                                                    processing production
                                                    jobs in parallel to
                                                    historic loads.
 -r,--reprocess                                     Reprocess only those
                                                    records that have been
                                                    marked to be
                                                    reprocessed. Otherwise
                                                    process all rows
                                                    indicated in the
                                                    processing records,
                                                    but successfully
                                                    processed job files
                                                    are skipped.
 -t,--threads <thread-count>                        Number of parallel
                                                    threads to use to run
                                                    Hadoop jobs
                                                    simultaniously.
                                                    Default = 1
 -z,--costFile <costfile>                           The cost properties
                                                    file on local disk
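
The usage text above marks -c, -m and -z as required, so whatever calls JobFileProcessor has to supply all three. A minimal sketch of how a wrapper script might fall back to "default" for the machine type, as suggested in the report above. Only the flags come from the usage text; the function name and argument order are hypothetical and are not taken from the hRaven scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the option string for JobFileProcessor.
# Flags (-c, -m, -z) come from the usage text; everything else is illustrative.
build_processor_args() {
  local cluster="$1"
  local cost_file="$2"
  # Assumption: use "default" when no machine type is passed or it is empty.
  local machine_type="${3:-default}"

  if [ -z "$cluster" ] || [ -z "$cost_file" ]; then
    echo "usage: build_processor_args <cluster> <costfile> [machinetype]" >&2
    return 1
  fi

  printf '%s\n' "-c ${cluster} -m ${machine_type} -z ${cost_file}"
}
```

Calling `build_processor_args mycluster /tmp/cost.properties` would then yield an option string that includes `-m default`, which should avoid the "Missing required options" error as long as the cost file path is also given.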
@vrushalic
Collaborator

Hi Travis,

Yes, I can see that jobFileProcessor.sh was not updated. Give me a few minutes to update it now.
I will also add more documentation to a sample cost file in the conf dir. The job cost will be stored as a column in HBase.

The cost properties file can even be an empty file; in that case the cost simply won't be calculated.
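
Since an empty cost file is acceptable, one quick way to satisfy -z while testing is to create one if it does not exist yet. A sketch only: the function name is hypothetical, and the path is whatever you choose to pass in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure a cost properties file exists at the given
# path, creating an empty one (and its parent directory) if it is missing.
# An existing file is left untouched.
ensure_cost_file() {
  local path="$1"
  if [ -z "$path" ]; then
    echo "usage: ensure_cost_file <path>" >&2
    return 1
  fi
  if [ ! -e "$path" ]; then
    mkdir -p "$(dirname "$path")" && : > "$path" || return 1
  fi
  # Print the path so the caller can pass it on to -z.
  printf '%s\n' "$path"
}
```

The printed path can be handed to the -z option. With an empty file, no cost is calculated, as described above.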

vrushalic pushed a commit that referenced this issue May 16, 2014
@vrushalic
Collaborator

Updated the script and added a sample file. Please give this a try and let me know.

angadsingh pushed a commit to angadsingh/hraven that referenced this issue May 20, 2014
# By Vrushali Channapattan
# Via Joep Rottinghuis (1) and Vrushali Channapattan (1)
* 'master' of https://github.com/twitter/hraven:
  Issue twitter#87 Updating jobFileProcessor.sh with latest arguments and adding a sample cost file
  Updating class names to reflect their intention better, adding some more tests and cleaning up documentation
  Updating formatting
  Modifying to include AppService, App and AppKey classes, also making a single api call for new jobs given a cluster and making user as a query param
  Updating to move get new jobs to job history service
  Updating some more comments
  Updating java docs
  Updating to remove capacity info, ensure APIs don't mix service class calls
  Updating to add final modifiers, removing abstract in interfaces, changing to return Object instead of long
  updating to enable different schedulers via factory and interface
  Issue twitter#82 Allowing for different schedulers to be supported, presently adding for fair scheduler and Updating other things as per review comments.
  minor formatting changes
  Issue twitter#82: Add a newJobs REST API, Issue twitter#81: Correct the timestamp being stored in appVersion table, Issue twitter#80: Have queue/pool name returned at flow level

Conflicts:
	bin/etl/jobFileProcessor.sh
	hraven-core/src/main/java/com/twitter/hraven/FlowKey.java
	hraven-core/src/main/java/com/twitter/hraven/datasource/JobHistoryService.java
@hcoyote
Author

hcoyote commented May 21, 2014

Thanks, I'll see if I can get it working tomorrow.

@vrushalic
Collaborator

Hi,
Did this work for you?

thanks
Vrushali
