Helm chart deployment failing with /app/app-config.gz: not in gzip format
#10
I have tried to deploy the Batch Processing Gateway using the Helm chart in an EKS cluster, but I have hit an error saying:

```
gzip: /app/app-config.gz: not in gzip format
```

1/ Built a new Docker image (x86 arch) from the Dockerfile in the main branch and pushed the image to the public ECR repo public.ecr.aws/r1l5w1y9/batch-processing-gateway. Here is the values.yaml.
2/ Generated a BPG config YAML file, converted the file to a base64-encoded string, and added this string to encodedConfig in the Helm chart values.yaml.
3/ This deployment also uses an AWS S3 bucket and Postgres RDS, with IRSA configured for the BPG service account added to the bpg and bpg-helper deployments.

Here is the Terraform code snippet for the deployment.

I can see the ConfigMap value for bpg, but the pod is still throwing the error that the config is not found. Please advise if I am missing anything.

Errors:

I am seeing the following error from the BPG and bpg-helper pods.
Comments
Thanks @vara-bonthu for reporting this issue. It looks like when the project was open sourced, we missed documenting this step: a prerequisite before the YAML string is encoded is compressing it. A Python code snippet that does it looks like:
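A minimal sketch of the compress-then-encode step (the filename app-config.yaml is illustrative, not from the original snippet):

```python
import base64
import gzip

# gzip-compress the raw YAML config, then base64-encode the compressed
# bytes; the resulting string is what goes into the chart's encodedConfig.
with open("app-config.yaml", "rb") as f:  # illustrative filename
    raw = f.read()

encoded = base64.b64encode(gzip.compress(raw)).decode("ascii")
print(encoded)
```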
Then the resulting string can be used as the encodedConfig value in the Helm chart's values.yaml.
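To confirm the string will pass the container's gunzip step, one can round-trip it locally; a quick check (encoded-config.txt is a hypothetical file holding the string):

```python
import base64
import gzip

# Decode and decompress the encodedConfig string, mirroring what the
# container does at startup; this raises an error if the string was
# base64-encoded without being gzip-compressed first.
with open("encoded-config.txt") as f:  # hypothetical file
    encoded = f.read().strip()

gzip.decompress(base64.b64decode(encoded))
print("encodedConfig decodes to valid gzip")
```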
Thanks, @yuchaoran20! I will give it a try and raise a PR for the docs.
Thanks! The above issue has been resolved with the Python script, and I am able to successfully deploy the BPG Helm chart. I have moved on to the next step of executing the sample job, but I have hit a new error. This looks like a permission issue with the Kubernetes client.

Job details:

```sh
➜ vara git:(bpm) ✗ curl -u admin:admin http://localhost:8080/apiv2/spark -i -X POST \
-H 'Content-Type: application/json' \
-d '{
"applicationName": "demo",
"queue": "dev",
"sparkVersion": "3.2",
"mainApplicationFile": "s3a//spark-3279499/uploaded/foo/MinimalSparkApp.py",
"driver": {
"cores": 1,
"memory": "2g"
},
"executor": {
"instances": 1,
"cores": 1,
"memory": "2g"
}
}'
HTTP/1.1 500 Internal Server Error
Date: Mon, 07 Nov 2022 22:27:20 GMT
Content-Type: application/json
Content-Length: 196
{"code":500,"message":"io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] for kind: [SparkApplicationResource] with name: [null] in namespace: [spark-team-a] failed."}% BPG Pod log error
This issue is similar to mine: kubeflow/spark-operator#1277. How do I pass the …
Hi @vara-bonthu, yes, BPG does create the SparkApplication resource. It looks like your fabric8 client is trying to use the local kube config file, while it is supposed to use the tokens from the BPG config. A few pointers that may help:
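One such check (a sketch of my own, not from the original pointers; the endpoint and placeholder values are assumptions): verify that the token and CA from the BPG config authenticate against the API server directly, bypassing any local kube config:

```python
import base64
import ssl
import tempfile
import urllib.error
import urllib.request

# Sanity-check that the token/CA pair from the BPG config (userTokenSOPS /
# caCertDataSOPS, both base64-encoded) authenticates against the cluster
# API server, independent of any local kube config file.
master_url = "https://<cluster-endpoint>"  # masterUrl from the BPG config
token = base64.b64decode("<userTokenSOPS value>").decode()
ca_pem = base64.b64decode("<caCertDataSOPS value>")

with tempfile.NamedTemporaryFile(suffix=".crt", delete=False) as ca_file:
    ca_file.write(ca_pem)

ctx = ssl.create_default_context(cafile=ca_file.name)
req = urllib.request.Request(
    master_url + "/api/v1/namespaces/spark-team-a/pods",
    headers={"Authorization": "Bearer " + token},
)
try:
    with urllib.request.urlopen(req, context=ctx) as resp:
        print(resp.status)  # 200: token authenticates and RBAC allows listing pods
except urllib.error.HTTPError as e:
    # 401 means the token itself is rejected; 403 means authentication
    # succeeded but RBAC denies the request.
    print(e.code, e.reason)
```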
Thanks @tongtianqi777. Yes, I have all the config in place; I generated the server URL, token, and CA cert with:

```sh
# Current cluster's API server URL
context=$(kubectl config current-context)
serverUrl=$(kubectl config view -o jsonpath='{.clusters[?(@.name == "'${context}'")].cluster.server}')
# Service account secret, token, and CA cert (values in .data are base64-encoded)
saSecret=$(kubectl -n spark-team-a get sa/spark-team-a -o json | jq -r '.secrets[] | .name')
saToken=$(kubectl -n spark-team-a get secret/${saSecret} -o json | jq -r '.data.token')
saCA=$(kubectl -n spark-team-a get secret/${saSecret} -o json | jq -r '.data."ca.crt"')
```

Did you see anything wrong in the config? Here is the config from the BPG pod:

```yaml
defaultSparkConf:
spark.kubernetes.submission.connectionTimeout: 30000
spark.kubernetes.submission.requestTimeout: 30000
spark.kubernetes.driver.connectionTimeout: 30000
spark.kubernetes.driver.requestTimeout: 30000
spark.sql.debug.maxToStringFields: 75
sparkClusters:
- weight: 100
id: cluster-id-1
eksCluster: arn:aws:eks:us-west-2:345645745645:cluster/spark-k8s-operator
masterUrl: https://37F929459SDHFHDFHC7275B1.sk1.us-west-2.eks.amazonaws.com
caCertDataSOPS: REDACTED/FOR/SECURITY/LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUMvakNDQWVhZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWU==
userTokenSOPS: REDACTED/FOR/SECURITY/ZXlKaGJHY2lPaUpTVXpJMU5pSXNJbXRwWkNJNklrUXplRGhsV1hOQlNuaEZiM1ptTFVFeFREaDNhR3gzWlRoSllXRmtSWEZaVTNZNGRYa3dPRTVzVHpnaWZRLmV5SnBjM01pT2lKcmRXSmxjbTVsZEdWekwzTmxjblpwWTJWaFkyTnZkVzUScDFwbU96QW5odkNCS3VDaUdoaUE0SlNMMGR5TDlpMTdhTVFrVVZHYW1WTkdBSFYxMFJadzBKSnd5Rjh1YXpXMkMyTFlmNHZobmlyTDl4MkVsV2x0T01yRW9WUTJiM3djemx3N2xGWEFLMk5DYlJQNWxmVklfdlBOaDhTaGdVa0s3U0VzRFR0U1Y1LTBZU1FVZ3RpUzF5VlZqVi1lWjFrX2Q2Mnl6U0g2bkNzUGdSelNQakx1VHd1NEs4Ny05V1dDd3RwV0pCZW05eVcwcVROUDVCSWl3
userName: spark-team-a # Data Team name
sparkApplicationNamespace: spark-team-a # Namespace for running Spark jobs with Cluster role and Cluster role bindings to access Spark Operator
sparkServiceAccount: spark-team-a # Service account for running Spark jobs configured with IRSA
sparkVersions:
- 3.2
- 3.1
queues: # These are the queues available in YuniKorn
- dev
- test
- default
- prod
ttlSeconds: 86400 # 1 day TTL for terminated spark application
timeoutMillis: 180000
sparkUIUrl: http://localhost:8080
batchScheduler: yunikorn
sparkConf:
spark.kubernetes.executor.podNamePrefix: '{spark-application-resource-name}'
spark.eventLog.enabled: "true"
spark.kubernetes.allocation.batch.size: 2000
spark.kubernetes.allocation.batch.delay: 1s
spark.eventLog.dir: s3a://spark-32794/eventlog
spark.history.fs.logDirectory: s3a://spark-32794/eventlog
spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.change.detection.version.required: false
spark.hadoop.fs.s3a.change.detection.mode: none
spark.hadoop.fs.s3a.fast.upload: true
spark.jars.packages: org.apache.hadoop:hadoop-aws:3.2.2
spark.hadoop.fs.s3a.aws.credentials.provider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider # Use IRSA
# spark.hadoop.hive.metastore.uris: thrift://hms.endpoint.com:9083
spark.sql.warehouse.dir: s3a://spark-3279/warehouse
spark.sql.catalogImplementation: hive
spark.jars.ivy: /opt/spark/work-dir/.ivy2
spark.hadoop.fs.s3a.connection.ssl.enabled: false
sparkUIOptions:
ServicePort: 4040
ingressAnnotations:
nginx.ingress.kubernetes.io/rewrite-target: /$2
nginx.ingress.kubernetes.io/proxy-redirect-from: http://\$host/
nginx.ingress.kubernetes.io/proxy-redirect-to: /spark-applications/{spark-application-resource-name}/
kubernetes.io/ingress.class: nginx
nginx.ingress.kubernetes.io/configuration-snippet: |-
proxy_set_header Accept-Encoding "";
sub_filter_last_modified off;
sub_filter '<head>' '<head> <base href="/spark-applications/{spark-application-resource-name}/">';
sub_filter 'href="/' 'href="';
sub_filter 'src="/' 'src="';
sub_filter '/{{num}}/jobs/' '/jobs/';
sub_filter "setUIRoot('')" "setUIRoot('/spark-applications/{spark-application-resource-name}/')";
sub_filter "document.baseURI.split" "document.documentURI.split";
sub_filter_once off;
ingressTLS: # TODO configure with proper domain name
- hosts:
- localhost
secretName: localhost-tls-secret
driver:
env:
- name: STATSD_SERVER_IP
valueFrom:
fieldRef:
fieldPath: status.hostIP
- name: STATSD_SERVER_PORT
value: "8125"
- name: AWS_STS_REGIONAL_ENDPOINTS
value: "regional"
executor:
env:
- name: STATSD_SERVER_IP
valueFrom:
fieldRef:
fieldPath: status.hostIP
- name: STATSD_SERVER_PORT
value: "8125"
- name: AWS_STS_REGIONAL_ENDPOINTS
value: "regional"
sparkImages:
- name: apache/spark-py:v3.2.2
types:
- Python
version: "3.2"
- name: apache/spark:v3.2.2
types:
- Java
- Scala
version: "3.2"
s3Bucket: spark-3279
s3Folder: uploaded
sparkLogS3Bucket: spark-3279
sparkLogIndex: index/index.txt
batchFileLimit: 2016
sparkHistoryDns: localhost
gatewayDns: localhost
sparkHistoryUrl: http://localhost:8088
allowedUsers:
- '*'
blockedUsers:
- blocked_user_1
queues:
- name: dev
maxRunningMillis: 21600000
queueTokenSOPS: {}
dbStorageSOPS:
connectionString: jdbc:postgresql://bpg.abcdefgh.us-west-2.rds.amazonaws.com:5432/bpg?useUnicode=yes&characterEncoding=UTF-8&useLegacyDatetimeCode=false&connectTimeout=10000&socketTimeout=30000
user: bpg
password: <REDACTED/FOR/SECURITY>
dbName: bpg
statusCacheExpireMillis: 9000
server:
applicationConnectors:
- type: http
port: 8080
logging:
level: INFO
loggers:
com.apple.spark: INFO
sops: {}
```
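One detail worth double-checking (my observation, not something confirmed in this thread): secret `.data` values are base64-encoded, so the saToken captured by the script above is the encoded form. The userTokenSOPS value above starting with `ZXlK…` is the base64 encoding of a JWT beginning with `eyJ…`, i.e., it is still encoded once. A quick way to inspect the decoded token:

```python
import base64

# Kubernetes stores secret .data values base64-encoded; decode once to
# recover the actual service-account JWT.
sa_token_b64 = "<saToken value from the script above>"  # placeholder
token = base64.b64decode(sa_token_b64).decode()
print(token[:10])  # a service-account JWT starts with "eyJ"
```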
The Spark Operator add-on has been deployed using its Helm chart with the namespace configured, and YuniKorn has been deployed using its Helm chart as well. I am able to use …
Hey @vara-bonthu, if you are using IntelliJ for debugging, you can follow the steps below to run BPG locally with breakpoints. To speed up the start time, you can optionally set the …
switch postgres to mysql connector