
Starting a production server for Shipyard (Django)

Don Kirkby edited this page Feb 4, 2015 · 34 revisions

Starting a production server

This is where I am going to record notes on setting up a production server for Shipyard (a Django web application) on our production cluster. I will be following the documentation provided by the Django developers, so much of this may be redundant; this is just a scratch-pad to track the process rather than a formal document.

Apache

  • The cluster is running CentOS release 5.9 (Final). This is the distribution associated with Penguin Computing Scyld Clusterware 5.9 (note, our support subscription for Scyld has expired so we no longer have access to their yum repositories; not sure whether this will cause problems).
  • Apache is installed (version 2.2.3) but not running.
  • Let's see if we can start up the Apache server:
[art@Bulbasaur httpd]$ sudo /usr/sbin/apachectl start
httpd: apr_sockaddr_info_get() failed for Bulbasaur
httpd: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1 for ServerName
  • Navigating to localhost in a web browser on the cluster brings up the Penguin Computing splash page, so Apache seems to be functioning despite the failed domain name resolution.
  • I'm also able to reach this splash page from my workstation by browsing to 192.168.69.179.
  • This is an acceptable level of functionality, but we might want to look into setting ServerName in the future.

mod_wsgi

  • This is an Apache server module that provides a WSGI-compliant interface for Python applications.
  • Not installed on cluster. Let's do this from the yum repo:
[art@Bulbasaur httpd]$ sudo yum install mod_wsgi.x86_64
  • mod_wsgi.so now appears in /etc/httpd/modules
  • Created backup of /etc/httpd/conf/httpd.conf
  • Added line to httpd.conf:
LoadModule wsgi_module modules/mod_wsgi.so
  • Restart Apache with sudo /usr/sbin/apachectl restart

Installing Django

  • Django is not installed on the cluster. Python 2.7.3 is installed at /usr/local/bin
  • Downloaded and installed Django-1.6.7 (source at /home/art/src)
  • Where should we clone the Shipyard git repo? We need to consider this because we need to configure Apache to allow access to this path.
  • Put in /usr/local/share because that is where the MiSeq repo currently lives, and it seems a reasonable place to put a Django app (local because it is not distributed with the system, share because it is architecture independent)
  • git is version 1.7.12.4, which is a little old. Might need an update.
  • Cloned repo to /usr/local/share/Shipyard

Configuring Apache for Django

  • Following the basic configuration instructions, add the following to httpd.conf, replacing /path/to/mysite.com with the absolute path to the Shipyard installation and making the required changes for versions of Apache older than 2.4:
WSGIScriptAlias / /path/to/mysite.com/mysite/wsgi.py
WSGIPythonPath /path/to/mysite.com

<Directory /path/to/mysite.com/mysite>
<Files wsgi.py>
Order deny,allow
Allow from all
</Files>
</Directory>
  • Internal Server Error; from /var/log/httpd/error_log:
[Fri Oct 03 13:17:16 2014] [error] [client 192.168.69.152] mod_wsgi (pid=25139): Target WSGI script '/usr/local/share/Shipyard/shipyard/shipyard/wsgi.py' cannot be loaded as Python module.
[Fri Oct 03 13:17:16 2014] [error] [client 192.168.69.152] mod_wsgi (pid=25139): Exception occurred processing WSGI script '/usr/local/share/Shipyard/shipyard/shipyard/wsgi.py'.
[Fri Oct 03 13:17:16 2014] [error] [client 192.168.69.152] Traceback (most recent call last):
[Fri Oct 03 13:17:16 2014] [error] [client 192.168.69.152]   File "/usr/local/share/Shipyard/shipyard/shipyard/wsgi.py", line 27, in ?
[Fri Oct 03 13:17:16 2014] [error] [client 192.168.69.152]     from django.core.wsgi import get_wsgi_application
[Fri Oct 03 13:17:16 2014] [error] [client 192.168.69.152] ImportError: No module named django.core.wsgi
  • /usr/local/share/Shipyard/shipyard/shipyard/wsgi.py exists
  • I'm able to load get_wsgi_application using the import statement above
  • Googling around suggests that this is a file permissions issue; I set wsgi.py to a+x
  • This might be a problem with using yum to install mod_wsgi:
[art@Bulbasaur modules]$ ldd mod_wsgi.so 
        linux-vdso.so.1 =>  (0x00007fff14ee3000)
        libpython2.4.so.1.0 => /usr/lib64/libpython2.4.so.1.0 (0x00002b3cb762f000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b3cb7964000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00002b3cb7b81000)
        libutil.so.1 => /lib64/libutil.so.1 (0x00002b3cb7d85000)
        libm.so.6 => /lib64/libm.so.6 (0x00002b3cb7f88000)
        libc.so.6 => /lib64/libc.so.6 (0x00002b3cb820c000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003821e00000)

The yum-installed module is linked against the system Python 2.4, not the Python 2.7 that Django is installed under, which would explain the ImportError.
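The same check can be scripted if several modules need verifying. The sketch below is hypothetical (the function names are made up for this page): it runs ldd on a shared module and pulls out the libpython versions it links against, so a mismatch with the interpreter Django lives in shows up immediately.

```python
import re
import subprocess

# Matches library names such as "libpython2.4.so.1.0" in ldd output.
LIBPYTHON = re.compile(r"libpython(\d+\.\d+)\.so")


def linked_python_versions(ldd_output):
    """Return the sorted, de-duplicated libpython versions found in ldd output."""
    return sorted(set(LIBPYTHON.findall(ldd_output)))


def check_module(path, expected="2.7"):
    """Run ldd on a shared module; report whether it links only the expected Python."""
    output = subprocess.check_output(["ldd", path]).decode()
    versions = linked_python_versions(output)
    return versions == [expected], versions
```

Given the ldd output above, check_module("/etc/httpd/modules/mod_wsgi.so") would return (False, ['2.4']) for the yum-installed module.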

  • Download mod_wsgi source code from GitHub
  • To configure and install mod_wsgi from source, we need Apache's apxs, "a tool for building and installing extension modules for the Apache HyperText Transfer Protocol (HTTP) server".
  • Installed it with yum as part of httpd-devel.x86_64
  • apxs is now at /usr/sbin/apxs
  • ran ./configure and make in /home/art/src/mod_wsgi-4.3.0
  • compile error:
/usr/bin/ld: /usr/local/lib/libpython2.7.a(node.o): relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
/usr/local/lib/libpython2.7.a: could not read symbols: Bad value
collect2: ld returned 1 exit status
apxs:Error: Command failed with rc=65536
.
make: *** [src/server/mod_wsgi.la] Error 1
  • Python 2.7.3 was compiled without a shared library (no libpython2.7.so on system, only libpython2.7.a); see here

  • Recompiled Python with ./configure --enable-shared, then copied libpython2.7.so to where mod_wsgi expects to find it (/usr/local/lib seems to work).

  • That did the trick! We're able to compile mod_wsgi: this produced mod_wsgi-py27.so, which I copied over to /usr/lib64/httpd/modules/ (the target of the /etc/httpd/modules symlink).

  • Apache then failed to restart; I had to copy libpython2.7.so.1.0 to /usr/local/lib as well.

  • And our reward is a new error message:

[Fri Oct 03 15:26:24 2014] [info] [client 192.168.69.152] mod_wsgi (pid=42958, process='', application='192.168.69.179|/shipyard'): Loading WSGI script '/usr/local/share/Shipyard/shipyard/shipyard/wsgi.py'.
[Fri Oct 03 15:26:25 2014] [error] [client 192.168.69.152] mod_wsgi (pid=42958): Exception occurred processing WSGI script '/usr/local/share/Shipyard/shipyard/shipyard/wsgi.py'.
[Fri Oct 03 15:26:25 2014] [error] [client 192.168.69.152] Traceback (most recent call last):
[Fri Oct 03 15:26:25 2014] [error] [client 192.168.69.152]   File "/usr/local/lib/python2.7/site-packages/django/core/handlers/wsgi.py", line 187, in __call__
[Fri Oct 03 15:26:25 2014] [error] [client 192.168.69.152]     self.load_middleware()
[Fri Oct 03 15:26:25 2014] [error] [client 192.168.69.152]   File "/usr/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 44, in load_middleware
[Fri Oct 03 15:26:25 2014] [error] [client 192.168.69.152]     for middleware_path in settings.MIDDLEWARE_CLASSES:
[Fri Oct 03 15:26:25 2014] [error] [client 192.168.69.152]   File "/usr/local/lib/python2.7/site-packages/django/conf/__init__.py", line 54, in __getattr__
[Fri Oct 03 15:26:25 2014] [error] [client 192.168.69.152]     self._setup(name)
[Fri Oct 03 15:26:25 2014] [error] [client 192.168.69.152]   File "/usr/local/lib/python2.7/site-packages/django/conf/__init__.py", line 49, in _setup
[Fri Oct 03 15:26:25 2014] [error] [client 192.168.69.152]     self._wrapped = Settings(settings_module)
[Fri Oct 03 15:26:25 2014] [error] [client 192.168.69.152]   File "/usr/local/lib/python2.7/site-packages/django/conf/__init__.py", line 132, in __init__
[Fri Oct 03 15:26:25 2014] [error] [client 192.168.69.152]     % (self.SETTINGS_MODULE, e)
[Fri Oct 03 15:26:25 2014] [error] [client 192.168.69.152] ImportError: Could not import settings 'shipyard.settings' (Is it on sys.path? Is there an import error in the settings file?): No module named settings

But this is a run-of-the-mill Django error because we haven't created settings.py!
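The static-library problem that blocked the mod_wsgi build can be spotted before compiling anything. Python's standard sysconfig module (new in 2.7) records how the interpreter was configured; this sketch assumes the Py_ENABLE_SHARED and LDLIBRARY build variables are recorded, which is normal for Unix builds. The function names are hypothetical. Run it with the interpreter mod_wsgi will embed.

```python
import sysconfig


def has_shared_libpython():
    """True if this interpreter was configured with --enable-shared."""
    # Py_ENABLE_SHARED is 1 for shared builds and 0 for static-only builds;
    # it may be None on platforms that don't record it.
    return bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))


def libpython_name():
    """Library that embedders link against, e.g. libpython2.7.so or libpython2.7.a."""
    return sysconfig.get_config_var("LDLIBRARY")
```

For the original Python 2.7.3 build these should have reported False and libpython2.7.a, which is exactly what made the linker complain about -fPIC.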

Configuring Shipyard

  • Following the INSTALL.md instructions, we make a copy of settings_default.py called settings.py in the same folder and edit the database settings.
  • Remember to set the SECRET_KEY!
  • Set the MEDIA_ROOT to a folder that the apache user can see, because that's where data sets and code resources get written. You might have to make a new directory for it.
  • Looks like we're running into more user permission problems:
[Fri Oct 03 15:55:45 2014] [error] [client 192.168.69.152] mod_wsgi (pid=42952): Exception occurred processing WSGI script '/usr/local/share/Shipyard/shipyard/shipyard/wsgi.py'.
[Fri Oct 03 15:55:45 2014] [error] [client 192.168.69.152] Traceback (most recent call last):
[Fri Oct 03 15:55:45 2014] [error] [client 192.168.69.152]   File "/usr/local/lib/python2.7/site-packages/django/core/handlers/wsgi.py", line 187, in __call__
[Fri Oct 03 15:55:45 2014] [error] [client 192.168.69.152]     self.load_middleware()
[Fri Oct 03 15:55:45 2014] [error] [client 192.168.69.152]   File "/usr/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 44, in load_middleware
[Fri Oct 03 15:55:45 2014] [error] [client 192.168.69.152]     for middleware_path in settings.MIDDLEWARE_CLASSES:
[Fri Oct 03 15:55:45 2014] [error] [client 192.168.69.152]   File "/usr/local/lib/python2.7/site-packages/django/conf/__init__.py", line 54, in __getattr__
[Fri Oct 03 15:55:45 2014] [error] [client 192.168.69.152]     self._setup(name)
[Fri Oct 03 15:55:45 2014] [error] [client 192.168.69.152]   File "/usr/local/lib/python2.7/site-packages/django/conf/__init__.py", line 50, in _setup
[Fri Oct 03 15:55:45 2014] [error] [client 192.168.69.152]     self._configure_logging()
[Fri Oct 03 15:55:45 2014] [error] [client 192.168.69.152]   File "/usr/local/lib/python2.7/site-packages/django/conf/__init__.py", line 80, in _configure_logging
[Fri Oct 03 15:55:45 2014] [error] [client 192.168.69.152]     logging_config_func(self.LOGGING)
[Fri Oct 03 15:55:45 2014] [error] [client 192.168.69.152]   File "/usr/local/lib/python2.7/logging/config.py", line 777, in dictConfig
[Fri Oct 03 15:55:45 2014] [error] [client 192.168.69.152]     dictConfigClass(config).configure()
[Fri Oct 03 15:55:45 2014] [error] [client 192.168.69.152]   File "/usr/local/lib/python2.7/logging/config.py", line 575, in configure
[Fri Oct 03 15:55:45 2014] [error] [client 192.168.69.152]     '%r: %s' % (name, e))
[Fri Oct 03 15:55:45 2014] [error] [client 192.168.69.152] ValueError: Unable to configure handler 'file': [Errno 13] Permission denied: '/shipyard.log'
  • Django is unable to create its log file; note that the path in the error is /shipyard.log, not a file in /usr/local/share/Shipyard/shipyard as expected.
  • In settings.py:
    'handlers': {
        'mail_admins': {
            'level': 'ERROR',
            'filters': ['require_debug_false'],
            'class': 'django.utils.log.AdminEmailHandler'
        },
        'console': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
            'formatter': 'debug'
        },
        'file': {
            'level': 'DEBUG',
            'class':'logging.handlers.RotatingFileHandler',
            'filename': 'shipyard.log',
            'formatter': 'debug',
            'maxBytes': 1024*1024*15, # 15MB
            'backupCount': 10
        }
    },

The filename entry is a relative path, which is resolved against the Apache process's working directory (/), so Django was attempting to write the log file to the root folder. It should be an absolute path; I changed it to /tmp/shipyard.log and Django seems happy.
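One way to make this mistake fail loudly in settings.py instead of at the first request is to build the handler entry with a helper that refuses relative directories. This is a sketch with a hypothetical function name; the keys mirror the 'file' handler above, minus 'formatter', which depends on the 'debug' formatter defined elsewhere in settings.py.

```python
import os


def file_handler_config(log_dir, filename="shipyard.log"):
    """Return a RotatingFileHandler entry for Django's LOGGING['handlers'].

    Rejects relative directories: a relative 'filename' is opened relative
    to the process's working directory, which under Apache is /.
    """
    if not os.path.isabs(log_dir):
        raise ValueError("log_dir must be an absolute path, got %r" % (log_dir,))
    return {
        'level': 'DEBUG',
        'class': 'logging.handlers.RotatingFileHandler',
        'filename': os.path.join(log_dir, filename),
        'maxBytes': 1024 * 1024 * 15,  # 15MB
        'backupCount': 10,
    }
```

In settings.py this would become 'file': file_handler_config('/some/absolute/dir'), for a directory the apache user can write to; add 'formatter': 'debug' to the returned dict to keep the existing format.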

  • Still getting a 500 error, with this message:
[Thu Oct 09 15:27:52 2014] [error] [client 127.0.0.1]     raise ImproperlyConfigured("Error loading either pysqlite2 or sqlite3 modules (tried in that order): %s" % exc)
[Thu Oct 09 15:27:52 2014] [error] [client 127.0.0.1] ImproperlyConfigured: Error loading either pysqlite2 or sqlite3 modules (tried in that order): No module named _sqlite3
  • So Django doesn't see the sqlite3 module (caveat: I recently upgraded sqlite3 on the cluster in order to upgrade Firefox from an ancient version that is no longer supported).

  • We needed to install sqlite-devel with yum so that _sqlite3.so is available

  • (By the way, setting $PYTHONPATH fouled up yum, presumably because yum runs under the system Python 2.4 and picked up the 2.7 packages. Have to watch out for that.)

  • _sqlite3.so was installed at /opt/scyld/python/2.6.5 so it isn't available for our current Python (2.7.3 at /usr/local)

  • Could try compiling the shared library into /usr/local instead.

  • For now, I just copied over _sqlite3.so to /usr/local/lib/python2.7/lib-dynload/ and this seems to work.

  • now we have Shipyard running, but we get an OperationalError, unable to open database file

  • This is because I haven't run nukeDB.bash like I'm supposed to, so there IS no database file.

  • this raises its own issues:

ValueError: Unable to configure handler 'file': [Errno 13] Permission denied: '/tmp/shipyard.log'
  • Why in the world would we have no permission to write to /tmp?
  • Because the log file was created by the apache user, with no write access for anyone else:
-rw-r--r-- 1 apache  apache  3052 Oct  9 15:59 shipyard.log
  • I set the log file to 777.

  • nukeDB.bash still fails with OperationalError: unable to open database file

  • Django apparently does not have permission to write to the folder /usr/local/share/Shipyard/db

  • 777 this too!

  • another error from nukeDB.bash:

IntegrityError: method_method.driver_id may not be NULL
Traceback (most recent call last):
  File "<console>", line 1, in <module>
NameError: name 'm' is not defined
Traceback (most recent call last):
  File "<console>", line 1, in <module>
NameError: name 'in1' is not defined
Traceback (most recent call last):
  File "<console>", line 1, in <module>
NameError: name 'in1' is not defined
Traceback (most recent call last):
  File "<console>", line 1, in <module>
NameError: name 'm' is not defined
Traceback (most recent call last):
  File "<console>", line 1, in <module>
NameError: name 'out1' is not defined
Traceback (most recent call last):
  File "<console>", line 1, in <module>
NameError: name 'out1' is not defined
  • Yeah, this was all permissions-related. After conferring with Don and Richard L., I created a group called shipyard and made all human users members. Then I recursively chgrp'd the folder /usr/local/share/Shipyard to that group and set its permissions to 775.

  • Richard L. set the setgid bit (chmod g+s, sometimes loosely called a sticky bit) so that files and directories created within the folder inherit its group.

  • nukeDB.bash now runs to completion correctly

  • CSS not being served.

  • Following the instructions in the Django docs to modify httpd.conf accordingly

  • Not working, and we're running into problems having Shipyard served from [cluster IP]/shipyard instead of the URL root: it breaks all the Django URLs

  • serving from / instead

  • found typo in AliasMatch regular expression, that fixed the CSS problem

  • nukeForDemo.bash fails:

django.db.utils.IntegrityError: Problem installing fixture '/usr/local/share/Shipyard/shipyard/archive/fixtures/demo.json': Could not load archive.Dataset(pk=1): archive_dataset.date_modified may not be NULL
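Looking back at the earlier _sqlite3 failure: Django's "Error loading either pysqlite2 or sqlite3 modules" message only means the standard sqlite3 package could not import its compiled _sqlite3 extension. A small check like this one (the function name is made up for this page), run with the same interpreter Apache embeds, separates that problem from Django configuration problems.

```python
def sqlite3_status():
    """Return (True, SQLite library version) or (False, the import error message)."""
    try:
        import sqlite3  # fails if the compiled _sqlite3 extension is missing
    except ImportError as exc:
        return False, str(exc)
    return True, sqlite3.sqlite_version
```

On the cluster the interpreter to test is /usr/local/bin/python (2.7.3); before _sqlite3.so was copied into lib-dynload this should have returned False along with "No module named _sqlite3".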

Further issues

  • Failed to execute Sums and Products default script:
OperationalError: attempt to write a readonly database
2014-10-10 15:24:10[DEBUG]django.db.backends.execute(): (0.000) QUERY = u'SELECT "archive_run"."id", "archive_run"."start_time", "archive_run"."end_time", "archive_run"."user_id", "archive_run"."pipeline_id", "archive_run"."name", "archive_run"."description", "archive_run"."parent_runstep_id" FROM "archive_run" WHERE "archive_run"."pipeline_id" = %s  LIMIT 21' - PARAMS = (3,); args=(3,)
  • had to change group and permissions of /db/shipyard.db
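Most of the failures in the last two sections came down to two rules, sketched below with hypothetical helper names. First, a directory shared by several users (and the apache user) should be group-writable with its setgid bit set, so new files pick up the shared group; that is the chmod 775 plus inherited group described above. Second, in its default rollback-journal mode SQLite creates a journal file next to the database while writing, so the apache user needs write access to the db directory as well as to shipyard.db itself.

```python
import os


def make_group_shared(directory, gid=None):
    """chmod 2775 a directory, optionally chgrp'ing it to `gid` first.

    The setgid bit (the 2 in 2775) makes files and subdirectories created
    inside inherit the directory's group instead of the creator's primary group.
    """
    if gid is not None:
        os.chown(directory, -1, gid)  # -1 leaves the owner unchanged
    os.chmod(directory, 0o2775)


def sqlite_writable(db_path):
    """True if this process can write both the database file and its directory."""
    directory = os.path.dirname(os.path.abspath(db_path))
    return os.access(db_path, os.W_OK) and os.access(directory, os.W_OK)
```

To cover an existing tree like /usr/local/share/Shipyard, call make_group_shared on each directory from os.walk; existing files still need their own chgrp and chmod.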

Installing PostgreSQL

Following instructions from the PostgreSQL wiki. I tried downloading the latest version (9.4) from the PostgreSQL web site, but there was some problem with the public keys. Instead, I just installed the version in the CentOS repository (8.4).

sudo yum install postgresql84-server

The service and chkconfig commands aren't on the path, so I had to give the full path.

sudo /sbin/service postgresql initdb
sudo /sbin/chkconfig postgresql on
sudo /sbin/service postgresql start

Then followed the instructions in Shipyard's INSTALL file. Use 192.168.1.1 for the database host in settings.py. When it asked whether to make shipyard a superuser or grant other permissions, said no.
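For reference, the database section of settings.py ends up looking roughly like this. It is a sketch: the engine string is the one Django 1.6 uses for psycopg2, the host comes from the step above and the user from the pg_hba.conf entry below, while the database name and password are placeholders.

```python
# Sketch of the DATABASES block in settings.py (Django 1.6 with psycopg2).
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'shipyard',        # placeholder: use the name chosen during INSTALL
        'USER': 'shipyard',
        'PASSWORD': 'change-me',   # placeholder
        'HOST': '192.168.1.1',     # address PostgreSQL listens on for the compute nodes
        'PORT': '',                # empty string means the default port, 5432
    }
}
```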

Used yum to install the python database driver.

yum list python-psycopg*
sudo yum install python-psycopg2.x86_64

That installed psycopg2 to the wrong version of Python, so I had to install it with pip. sudo doesn't get the LD_LIBRARY_PATH passed to it, so you have to be very explicit when you call pip on this system.

sudo LD_LIBRARY_PATH=:/usr/local/lib python /usr/local/bin/pip install psycopg2

Can't connect to PostgreSQL from other hosts, so add this line to /var/lib/pgsql/data/pg_hba.conf.

host    all         shipyard    192.168.1.0/24        md5

Need to make PostgreSQL listen on other IP address, so uncomment and modify the listen_addresses setting in /var/lib/pgsql/data/postgresql.conf.

listen_addresses = 'localhost,192.168.1.1'

Then restart PostgreSQL so it reads the configuration.

/etc/init.d/postgresql restart
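The pg_hba.conf entry above only admits clients whose address falls inside 192.168.1.0/24, that is 192.168.1.0 through 192.168.1.255. If a connection is still refused, Python 3's ipaddress module gives a quick way to confirm whether the client's address is covered; the helper name is hypothetical, and it checks only the address column, not the database, user, or auth method.

```python
import ipaddress


def pg_hba_allows(client_ip, cidr="192.168.1.0/24"):
    """True if client_ip falls inside the CIDR block of a pg_hba.conf host line."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(cidr)
```

For example, the head node's 192.168.1.1 address is covered, but a workstation at 192.168.69.152 is not and would need its own line.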

Load OpenMPI for Apache

Ran the initDB.bash script, and tried to hit the web site at http://192.168.69.179/. I get an error because the wrong version of MPI is loaded.

ImportError: libmpi.so.1: cannot open shared object file: No such file or directory

Worked around it for now by changing when the import happens, so Apache starts. When I try the runfleet admin command, the workers can't see the Shipyard source code on the head node. Mount the source code folder according to instructions in the Beowulf Admin Guide (PDF) chapter on file systems.
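Changing when the import happens amounts to a deferred import: the MPI module is loaded inside the code path that needs it rather than when the module is first imported, so Apache can load Shipyard without libmpi.so.1. A generic sketch; the helper and the mpi4py.MPI module name are assumptions, not necessarily what Shipyard imports.

```python
import importlib


def deferred(module_name):
    """Return a loader that imports `module_name` the first time it is called.

    Creating the loader imports nothing, so a process that never calls it
    never loads the module or the shared libraries it depends on.
    """
    cache = []

    def load():
        if not cache:
            cache.append(importlib.import_module(module_name))
        return cache[0]

    return load


# Hypothetical: "mpi4py.MPI" stands in for whatever module pulls in libmpi.
get_mpi = deferred("mpi4py.MPI")
```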

Fixed a bunch of code to support the MEDIA_ROOT folder, then found that we hadn't deployed the static files in a while.

sudo LD_LIBRARY_PATH=:/usr/local/lib ./manage.py collectstatic

Time zone wasn't being set properly on the compute nodes, because the /usr/share/zoneinfo folder wasn't being mapped. Mapped it using the same steps as for the source code folder.

More permissions

Uploading a dataset failed with this error:

Error while adding datasets. [Errno 13] Permission denied: '/data/shipyard/Datasets/2015_01/2000A-V3LOOP_S2_L001_R1_001.fastq'

It looks like that directory is owned by dkirkby, so run the reset command under the apache user.

sudo -u apache LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./manage.py reset -ldemo

When you run the fleet, it also has to run under the apache user.

sudo -u apache LD_LIBRARY_PATH=$LD_LIBRARY_PATH PATH=$PATH ./manage.py runfleet --workers 151 &>/dev/null &
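Because sudo doesn't pass the caller's environment through, every command run as apache needs the variables spelled out. A hypothetical helper that builds such a command line from the current environment, relying on sudo's VAR=value form used in the commands above, keeps LD_LIBRARY_PATH and PATH from being forgotten.

```python
import os


def sudo_as(user, command, env_names=("LD_LIBRARY_PATH", "PATH"), environ=None):
    """Build an argv for `sudo -u USER VAR=value ... command`.

    Only variables present in `environ` (default: os.environ) are passed on.
    """
    environ = os.environ if environ is None else environ
    assignments = ["%s=%s" % (name, environ[name]) for name in env_names if name in environ]
    return ["sudo", "-u", user] + assignments + list(command)
```

The result can go straight to subprocess.call, e.g. subprocess.call(sudo_as("apache", ["./manage.py", "reset", "-ldemo"])); since no shell is involved, the values are expanded by Python rather than by bash.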

If you want to see the top processes and where they are running during a pipeline run, you can use this:

ps -eo pcpu,pid,user,args|bpstat -P|python -c "import sys; print ''.join(['+' + line for line in sys.stdin])"|sort -g -k 2 -r | head -30

It reports all processes in the cluster with their CPU load, prepends the compute node number (bpstat -P), and sticks a + in front of every line. Processes that aren't owned by bproc don't get a node number, so the + guarantees that the CPU load is always the second field. The pipeline then sorts by descending CPU load and prints the top 30.
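If the one-liner gets hard to maintain, the prefix-and-sort part translates into a small function. This is a sketch that roughly mirrors the pipeline, assuming bpstat separates the node number from the ps columns with whitespace; the function name is hypothetical.

```python
def top_processes(lines, n=30):
    """Rank lines of `ps -eo pcpu,pid,user,args | bpstat -P` output by CPU load.

    Prefix every line with '+', sort numerically on the second
    whitespace-separated field, highest first, and keep the top n.
    """
    def cpu_load(line):
        fields = line.split()
        try:
            return float(fields[1])
        except (IndexError, ValueError):
            return float("-inf")  # header or malformed lines sort last

    # The '+' guarantees a first field even when bpstat added no node
    # number, so the CPU load is always the second field.
    marked = ["+" + line.rstrip("\n") for line in lines]
    return sorted(marked, key=cpu_load, reverse=True)[:n]
```

A two-line driver that prints "\n".join(top_processes(sys.stdin)) can then replace the python -c, sort, and head stages of the pipeline.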