dcos Spark doesn’t run jobs #70

Open
janpavtel opened this issue Oct 13, 2016 · 6 comments

Comments

@janpavtel

Please answer the following questions before submitting your issue. Thanks!

What version of DC/OS + DC/OS CLI are you using (dcos --version)?

dcoscli.version=0.4.13
dcos.version=1.7.0
dcos.commit=92d61c576b3fe0dd1b8b15e7695b55ff7ce254fd
dcos.bootstrap-id=0ab2e04446f34465aed3b1ffb4f56836d681d6c7

What operating system and version are you using?

Ubuntu 16.04 LTS

What did you do?

  • Create dcos cluster using Azure Container Service
  • Install spark via CLI:
./dcos package install spark

Installing Marathon app for package [spark] version [1.0.2-2.0.0]
Installing CLI subcommand for package [spark] version [1.0.2-2.0.0]
New command available: dcos spark
DC/OS Spark is being installed!

    Documentation: https://docs.mesosphere.com/current/usage/service-guides/spark/
    Issues: https://docs.mesosphere.com/support/

  • Run task
./dcos spark run --submit-args='-Dspark.mesos.coarse=true --driver-cores 1 --driver-memory 1024M --class org.apache.spark.examples.SparkPi https://downloads.mesosphere.com/spark/assets/spark-examples_2.10-1.4.0-SNAPSHOT.jar 30'

Run job succeeded. Submission id: driver-20161013074900-0001
  • Check job status after 20 min
./dcos spark status driver-20161013074900-0001
Submission ID: driver-20161013074900-0001
Driver state: QUEUED

What did you expect to see?

Spark should run jobs.

What did you see instead?

The job stays in the QUEUED state indefinitely.

Spark is listed under packages but not as a service:

./dcos package list
NAME         VERSION      APP                   COMMAND  DESCRIPTION                                                                                                                                         
chronos      2.4.0        /chronos-default      ---      A fault tolerant job scheduler for Mesos which handles dependencies and ISO8601 based schedules.                                                    
marathon-lb  1.2.2        /marathon-lb-default  ---      HAProxy configured using Marathon state                                                                                                             
spark        1.0.2-2.0.0  /spark                spark    Spark is a fast and general cluster computing system for Big Data.  Documentation: https://docs.mesosphere.com/current/usage/service-guides/spark/  
./dcos service
NAME         HOST     ACTIVE  TASKS  CPU   MEM     DISK  ID                                         
chronos   10.32.0.4    True     0    0.0   0.0     0.0   b7a4a4e3-c62f-4175-bc54-c7305a411174-0002  
marathon  172.16.0.5   True     7    6.0  7168.0  128.0  0f2fb632-3277-41d4-be0a-ed90d5c7c27d-0000  

System stats

{
    "allocator/event_queue_dispatches": 0,
    "frameworks/marathon/messages_processed": 117795,
    "frameworks/marathon/messages_received": 117795,
    "master/cpus_percent": 0.6,
    "master/cpus_revocable_percent": 0,
    "master/cpus_revocable_total": 0,
    "master/cpus_revocable_used": 0,
    "master/cpus_total": 10,
    "master/cpus_used": 6,
    "master/disk_percent": 0.000538493899873791,
    "master/disk_revocable_percent": 0,
    "master/disk_revocable_total": 0,
    "master/disk_revocable_used": 0,
    "master/disk_total": 237700,
    "master/disk_used": 128,
    "master/dropped_messages": 0,
    "master/elected": 1,
    "master/event_queue_dispatches": 25,
    "master/event_queue_http_requests": 0,
    "master/event_queue_messages": 0,
    "master/frameworks_active": 2,
    "master/frameworks_connected": 2,
    "master/frameworks_disconnected": 0,
    "master/frameworks_inactive": 0,
    "master/invalid_executor_to_framework_messages": 0,
    "master/invalid_framework_to_executor_messages": 0,
    "master/invalid_status_update_acknowledgements": 0,
    "master/invalid_status_updates": 0,
    "master/mem_percent": 0.241224970553592,
    "master/mem_revocable_percent": 0,
    "master/mem_revocable_total": 0,
    "master/mem_revocable_used": 0,
    "master/mem_total": 29715,
    "master/mem_used": 7168,
    "master/messages_authenticate": 0,
    "master/messages_deactivate_framework": 0,
    "master/messages_decline_offers": 2125953,
    "master/messages_executor_to_framework": 0,
    "master/messages_exited_executor": 0,
    "master/messages_framework_to_executor": 0,
    "master/messages_kill_task": 609,
    "master/messages_launch_tasks": 30777,
    "master/messages_reconcile_tasks": 38819,
    "master/messages_register_framework": 2,
    "master/messages_register_slave": 1,
    "master/messages_reregister_framework": 909,
    "master/messages_reregister_slave": 13,
    "master/messages_resource_request": 0,
    "master/messages_revive_offers": 3893,
    "master/messages_status_update": 43410,
    "master/messages_status_update_acknowledgement": 43402,
    "master/messages_suppress_offers": 0,
    "master/messages_unregister_framework": 0,
    "master/messages_unregister_slave": 0,
    "master/messages_update_slave": 14,
    "master/outstanding_offers": 0,
    "master/recovery_slave_removals": 0,
    "master/slave_registrations": 1,
    "master/slave_removals": 0,
    "master/slave_removals/reason_registered": 0,
    "master/slave_removals/reason_unhealthy": 0,
    "master/slave_removals/reason_unregistered": 0,
    "master/slave_reregistrations": 4,
    "master/slave_shutdowns_canceled": 0,
    "master/slave_shutdowns_completed": 0,
    "master/slave_shutdowns_scheduled": 0,
    "master/slaves_active": 5,
    "master/slaves_connected": 5,
    "master/slaves_disconnected": 0,
    "master/slaves_inactive": 0,
    "master/task_failed/source_slave/reason_container_launch_failed": 18737,
    "master/task_killed/source_master/reason_framework_removed": 1,
    "master/task_killed/source_slave/reason_executor_unregistered": 4,
    "master/task_lost/source_slave/reason_executor_terminated": 2,
    "master/tasks_error": 0,
    "master/tasks_failed": 19649,
    "master/tasks_finished": 8623,
    "master/tasks_killed": 605,
    "master/tasks_killing": 0,
    "master/tasks_lost": 2,
    "master/tasks_running": 7,
    "master/tasks_staging": 0,
    "master/tasks_starting": 0,
    "master/uptime_secs": 10799156.4709491,
    "master/valid_executor_to_framework_messages": 0,
    "master/valid_framework_to_executor_messages": 0,
    "master/valid_status_update_acknowledgements": 43402,
    "master/valid_status_updates": 43410,
    "registrar/queued_operations": 0,
    "registrar/registry_size_bytes": 1159,
    "registrar/state_fetch_ms": 4.617984,
    "registrar/state_store_ms": 6.88896,
    "system/cpus_total": 2,
    "system/load_15min": 0.15,
    "system/load_1min": 0.16,
    "system/load_5min": 0.17,
    "system/mem_free_bytes": 338624512,
    "system/mem_total_bytes": 7305834496
}

from dcos-cli issue

@debasishg

Also facing the same issue with the same example.

ubuntu@ip-10-10-1-77:~/dcos$ dcos --version
dcoscli.version=0.4.14
dcos.version=1.8.6
dcos.commit=cfccfbf84bbba30e695ae4887b65db44ff216b1d
dcos.bootstrap-id=405172d16eaff8798d6b090dac99b51a8a9004d7

@debasishg

Looks like I have been able to fix this issue. In my case I noticed that spark was being shown in Completed Frameworks instead of Active Frameworks in http://dcos_url/mesos.

I uninstalled Spark as follows:

  1. dcos package uninstall spark
  2. Remove the znode for Spark from ZK. This is the vital step which I was missing earlier (https://docs.mesosphere.com/1.8/usage/service-guides/spark/uninstall/). ZK maintains state which does not get cleaned up by the uninstall and has to be cleaned manually.

Reinstall Spark and now the submit works and the job finishes.
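
For step 2, a minimal sketch of the cleanup in Python. It assumes a client object that exposes `exists(path)` and `delete(path, recursive=True)`, which is the shape of the `KazooClient` API in the kazoo library. The znode path `/spark_mesos_dispatcher` is an assumption here, so confirm the actual path against the uninstall documentation linked above. `FakeZk` is a hypothetical in-memory stand-in so the snippet runs without a ZooKeeper server.

```python
def remove_znode(client, path):
    """Recursively delete `path` if it exists; return True if it was deleted."""
    if client.exists(path) is None:
        return False
    client.delete(path, recursive=True)
    return True


class FakeZk:
    """Hypothetical in-memory stand-in for a ZooKeeper client (illustration only)."""

    def __init__(self, paths):
        self.paths = set(paths)

    def exists(self, path):
        # A real client returns node metadata; any non-None value works here.
        return object() if path in self.paths else None

    def delete(self, path, recursive=False):
        children = {p for p in self.paths if p.startswith(path + "/")}
        if children and not recursive:
            raise RuntimeError("node has children")
        self.paths -= children | {path}


# Assumed znode layout, for illustration only.
zk = FakeZk({"/spark_mesos_dispatcher", "/spark_mesos_dispatcher/state", "/marathon"})
print(remove_znode(zk, "/spark_mesos_dispatcher"))  # True
print(remove_znode(zk, "/spark_mesos_dispatcher"))  # False, already gone
```

Against a real cluster, the same function should work with a connected `KazooClient` in place of `FakeZk`. Since the delete is recursive and irreversible, double-check the path first.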

@ignacio-dc

I am having this exact same issue, but on AWS with a fresh install, and Spark is the only service installed.

@mgummelt
Contributor

mgummelt commented Jul 5, 2017

Hi @ignacio-dc. Please ensure that your Spark Dispatcher is properly registered by verifying that it appears in the active frameworks listed in /mesos/state.json. If it doesn't, it's likely that you failed to fully uninstall Spark from a previous install, and must do that: https://docs.mesosphere.com/1.8/usage/service-guides/spark/uninstall/
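
The check described above can be sketched in Python once the state.json payload has been downloaded. This assumes the Mesos state layout with top-level `frameworks` and `completed_frameworks` lists whose entries carry `name` and `active` fields, and that the dispatcher registers under the name `spark`. The sample payload is hypothetical and heavily trimmed.

```python
import json


def framework_status(state, name):
    """Classify a framework as 'active', 'inactive', 'completed', or 'missing'."""
    for fw in state.get("frameworks", []):
        if fw.get("name") == name:
            return "active" if fw.get("active", False) else "inactive"
    for fw in state.get("completed_frameworks", []):
        if fw.get("name") == name:
            return "completed"
    return "missing"


# Hypothetical, trimmed state.json payload (illustration only).
sample = json.loads("""
{
  "frameworks": [
    {"name": "marathon", "active": true},
    {"name": "chronos", "active": true}
  ],
  "completed_frameworks": [
    {"name": "spark", "active": false}
  ]
}
""")

for name in ("marathon", "spark", "kafka"):
    print(name, framework_status(sample, name))
```

A result of `completed` or `missing` for the dispatcher matches the symptom reported in this thread, where submitted drivers stay in the QUEUED state.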

If you continue to have problems, please open a new issue. This issue has been closed.

@skonto
Contributor

skonto commented Nov 21, 2017

@ArtRand @susanxhuynh let's close this.

@hantuzun

hantuzun commented Nov 21, 2017

I'm experiencing the same error now but it's about the installation of Spark, not about running jobs. We may close this issue.

Edit: My new issue is Spark package fails to install with permission errors #208
