Novice Linux users and seasoned sysadmins alike know that Cron is a great, simple solution to the problem of regularly calling the same application. It is present in almost every Linux system and has a simple syntax and a number of logging and warning functions that make it a great addition to a system administrator's tools.
However, the Cron solution has some limitations.
- The minimum interval is 1 minute: this is the shortest repeating interval that you can configure.
- It offers minimal support for preventing concurrent runs of the same job.
- It almost completely ignores the state of the command being called. If the script fails, Cron will just keep running it on schedule (as it should), and you, as the administrator, have to check and diagnose the problem.
But what if you need to implement a robust “long-lived” application in Python or PHP?
Not everything can fit into convenient 1-minute time intervals for program code execution. In addition, there is an unnecessary overhead for loading the required libraries and opening file and socket descriptors each time the program is called for execution.
What if your application code needs to maintain a connection to a remote server and respond to events on an ad-hoc basis? Someone will probably say that these are already full-fledged daemon programs, but the line between a daemon and a long-lived application in a scripting language is not very clear.
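To make the idea of a “long-lived” script concrete, here is a minimal sketch in Python. The `Worker` class, its method names, and the polling loop are hypothetical and only illustrate the shape of such a program: the expensive setup happens once, and the loop then runs until the process is asked to stop. It assumes the process manager stops it with SIGTERM, which is Supervisor's default stop signal.

```python
import signal
import time


class Worker:
    """Hypothetical long-lived worker: set up once, then loop until told to stop."""

    def __init__(self):
        self.running = True
        self.iterations = 0
        # Expensive setup (loading libraries, opening connections or files)
        # would happen here, once, instead of on every scheduled run.

    def request_stop(self, signum=None, frame=None):
        # Usable as a signal handler: let the loop exit after the current pass.
        self.running = False

    def handle_one_event(self):
        # Placeholder for real work, e.g. reading from a socket or a queue.
        self.iterations += 1

    def run(self, poll_interval=1.0, max_iterations=None):
        # max_iterations exists only so the loop can be bounded in tests.
        while self.running:
            self.handle_one_event()
            if max_iterations is not None and self.iterations >= max_iterations:
                break
            time.sleep(poll_interval)
        return self.iterations


def main():
    worker = Worker()
    # Stop cleanly on SIGTERM (sent by process managers) and on Ctrl+C.
    signal.signal(signal.SIGTERM, worker.request_stop)
    signal.signal(signal.SIGINT, worker.request_stop)
    worker.run()
```

In a real script you would call `main()` at the bottom of the file; it is left uncalled here because the loop runs until a signal arrives.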
Services
Server system administrators are familiar with the concept of a “service” in the operating system, or more specifically, daemon programs that run without interacting with a user terminal's I/O. macOS has launchd, and Linux currently has the systemd init system. All of these OS-level solutions allow administrators to write a small amount of configuration that instructs the operating system to launch applications at a specific point in the OS life cycle (at boot, at logon, etc.) and ensure that they continue to run.
Thus, all of the above systems take care of a number of key aspects of running an application. They all track the process(es) by process ID, they offer logging and output control, and they usually have options to control restarts and the number of start attempts in case of an error.
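The core of what such a service manager does can be sketched in a few lines of Python. This is only a teaching sketch, not how systemd, launchd, or Supervisor are actually implemented; the `supervise` function and its parameters are hypothetical. It starts a child process, tracks it by PID, and restarts it after a failure up to a fixed number of times.

```python
import subprocess
import sys
import time


def supervise(command, max_restarts=3, delay=0.0):
    """Run command, restarting it after a non-zero exit, up to max_restarts times.

    Returns the list of exit codes observed, one per run.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        process = subprocess.Popen(command)  # the child is tracked by its PID
        print("started pid", process.pid, "attempt", attempt + 1)
        exit_codes.append(process.wait())
        if exit_codes[-1] == 0:
            break  # clean exit: nothing to restart
        time.sleep(delay)  # pause before the next start attempt
    return exit_codes


# A child that always fails with exit status 3: one run plus two restarts.
failing_child = [sys.executable, "-c", "import sys; sys.exit(3)"]
print(supervise(failing_child, max_restarts=2))  # last line printed: [3, 3, 3]
```

A real service manager adds much more on top of this loop, such as output capture, back-off between attempts, and control commands.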
Supervisor
All of the above is fine, but writing service specifications can be tricky. You need to understand the run levels and the complexity of an OS-wide service management system, which can feel like it calls for a solid academic background in computer science.
Fortunately, there is a simpler intermediate solution that gives us all the benefits (and more) without the tedious configuration: Supervisor.
Its architecture includes three main parts:
- supervisord is a daemon that keeps all software processes “running”.
- supervisorctl is a command-line client for the daemon, where you can load, start, stop, restart, and remove Supervisor jobs.
- web and XML-RPC interfaces that do most of what you can do from supervisorctl.
We will not cover the web and XML-RPC interfaces here.
Let’s assume that you have a CentOS 7/8 server ready to go. If not, you can deploy one on OpenStack Cloud 3hcloud, an OpenStack Big Object Data Provider, or wherever you have a virtual machine or hardware server.
Installing Supervisor
```console
[root@dell8 ~]# yum search supervisor
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.datacenter.by
 * epel: mirror.datacenter.by
 * extras: mirror.datacenter.by
 * nux-dextop: mirror.li.nux.ro
 * updates: mirror.datacenter.by
====================================================== N/S matched: supervisor =======================================================
nodejs-supervisor.noarch : A supervisor program for running nodejs programs
python-simplevisor.noarch : Python simple daemons supervisor
supervisor.noarch : A System for Allowing the Control of Process State on UNIX

  Name and summary matches only, use "search all" for everything.
[root@dell8 ~]#
[root@dell8 ~]# yum install supervisor.noarch
[root@dell8 ~]# rpm -qi supervisor.noarch
Name        : supervisor
Version     : 3.4.0
Release     : 1.el7
Architecture: noarch
Install Date: Tue 01 Jun 2021 09:56:39 PM MSK
Group       : System Environment/Base
Size        : 2715962
License     : ZPLv2.1 and BSD and MIT
Signature   : RSA/SHA256, Wed 11 Mar 2020 03:48:18 AM MSK, Key ID 6a2faea2352c64e5
Source RPM  : supervisor-3.4.0-1.el7.src.rpm
Build Date  : Wed 11 Mar 2020 03:45:12 AM MSK
Build Host  : buildvm-ppc64le-10.ppc.fedoraproject.org
Relocations : (not relocatable)
Packager    : Fedora Project
Vendor      : Fedora Project
URL         : http://supervisord.org/
Bug URL     : https://bugz.fedoraproject.org/supervisor
Summary     : A System for Allowing the Control of Process State on UNIX
Description :
The supervisor is a client/server system that allows its users to control a
number of processes on UNIX-like operating systems.
[root@dell8 ~]#
```
If you list the files in the package, you can see that Supervisor is implemented as a Python application:
```console
[root@dell8 ~]# rpm -ql supervisor.noarch
/etc/logrotate.d/supervisor
/etc/supervisord.conf
/etc/supervisord.d
/etc/tmpfiles.d/supervisor.conf
/usr/bin/echo_supervisord_conf
/usr/bin/pidproxy
/usr/bin/supervisorctl
/usr/bin/supervisord
/usr/lib/python2.7/site-packages/supervisor
...
/usr/lib/systemd/system/supervisord.service
/usr/share/doc/supervisor-3.4.0
/usr/share/doc/supervisor-3.4.0/CHANGES.txt
/usr/share/doc/supervisor-3.4.0/COPYRIGHT.txt
/usr/share/doc/supervisor-3.4.0/LICENSES.txt
/usr/share/doc/supervisor-3.4.0/README.rst
/var/log/supervisor
/var/run/supervisor
[root@dell8 ~]#
```
A Python application can raise concerns about compatibility, reliability, and performance, but for a process manager like this they matter little, even when it runs on Python 2.7 as it does on CentOS 7.
Now check that the configuration file contains an [include] section that pulls in the configuration files of individual programs.
```console
[root@dell8 ~]# egrep -A 1 '^\[include\]' /etc/supervisord.conf
[include]
files = supervisord.d/*.ini
[root@dell8 ~]#
```
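The same check can be scripted. The sketch below uses Python's standard `configparser` module to read the [include] section; the `include_patterns` helper and the abbreviated `SAMPLE_CONF` text are made up for illustration, and a real supervisord.conf contains many more sections.

```python
import configparser

# Abbreviated, illustrative stand-in for /etc/supervisord.conf.
SAMPLE_CONF = """\
[supervisord]
logfile = /var/log/supervisor/supervisord.log

[include]
files = supervisord.d/*.ini
"""


def include_patterns(conf_text):
    """Return the file patterns listed in the [include] section, or [] if absent."""
    # RawConfigParser leaves %(...)s expressions untouched for Supervisor to expand.
    parser = configparser.RawConfigParser(inline_comment_prefixes=(";",))
    parser.read_string(conf_text)
    if not parser.has_option("include", "files"):
        return []
    # The value is a space-separated sequence of file globs.
    return parser.get("include", "files").split()


print(include_patterns(SAMPLE_CONF))  # prints ['supervisord.d/*.ini']
```

To run it against the real file, pass the text of /etc/supervisord.conf to `include_patterns` instead of the sample string.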
If everything is OK, enable and start the supervisord daemon:
```console
[root@dell8 ~]# systemctl enable supervisord
Created symlink from /etc/systemd/system/multi-user.target.wants/supervisord.service to /usr/lib/systemd/system/supervisord.service.
[root@dell8 ~]# systemctl start supervisord
[root@dell8 ~]# systemctl status supervisord
● supervisord.service - Process Monitoring and Control Daemon
   Loaded: loaded (/usr/lib/systemd/system/supervisord.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2021-06-01 22:09:58 MSK; 3s ago
  Process: 11456 ExecStart=/usr/bin/supervisord -c /etc/supervisord.conf (code=exited, status=0/SUCCESS)
 Main PID: 11465 (supervisord)
   CGroup: /system.slice/supervisord.service
           └─11465 /usr/bin/python /usr/bin/supervisord -c /etc/supervisord.conf

Jun 01 22:09:58 dell8 systemd[1]: Starting Process Monitoring and Control Daemon...
Jun 01 22:09:58 dell8 systemd[1]: Started Process Monitoring and Control Daemon.
[root@dell8 ~]#
```
Connecting to the service and testing with a Python program
supervisorctl is a simple command-line interface with decent built-in help. Let’s start by confirming the version we have running. Type version and hit Enter:
```console
[root@dell8 ~]# supervisorctl
supervisor> version
3.4.0
supervisor>
supervisor> help

default commands (type help <topic>):
=====================================
add    exit      open  reload  restart   start   tail
avail  fg        pid   remove  shutdown  status  update
clear  maintail  quit  reread  signal    stop    version

supervisor>
```
These are the commands that are available within Supervisor. Don’t worry too much about them at the moment: we are going to define a program first, so enter exit to return to the normal command line.
Our first Supervisor program (an endless process) in Python
Supervisor jobs or processes are called programs and are defined using a simple syntax, either in the main Supervisor configuration file or in separate files. We saw an include directive in supervisord.conf, which allows us to configure individual programs in separate files.
First we’re going to need a Python application to run. This can be any long-lived Python script, but for the sake of this post I’ve created a simple application that generates a random integer between 1 and 10 and sleeps for that number of seconds, unless it generates a 10, in which case it crashes:
```python
#!/usr/bin/env python3
# Sample Python script that generates a random sleep interval and will
# occasionally crash - for testing Supervisor

import logging
from random import randint
from time import sleep

logging.basicConfig(
    format='%(asctime)s %(levelname)s %(message)s',
    level=logging.DEBUG
)


def main():
    while True:
        i = randint(1, 10)

        # trigger a crash if we get a 10
        if i == 10:
            logging.error('Generated {}. Application Crashing'.format(i))
            raise Exception('Application Crashing')
        else:
            logging.info('Generated {}. Sleeping'.format(i))
            sleep(i)


if __name__ == "__main__":
    print('Starting the simple test application')
    main()
```
Let’s create this file in a suitable directory on the server and run it:
```console
user@dell8 -> ./supervisor.py
Starting the simple test application
2029-06-01 22:49:26,052 INFO Generated 2. Sleeping
2029-06-01 22:49:28,055 INFO Generated 5. Sleeping
2029-06-01 22:49:33,056 ERROR Generated 10. Application Crashing
Traceback (most recent call last):
  File "./supervisor.py", line 29, in <module>
    main()
  File "./supervisor.py", line 21, in main
    raise Exception('Application Crashing')
Exception: Application Crashing
```
Now we need to write a Supervisor program to run this very useful script. Programs are written in INI format and start with a section header made of the directive program and the name of your application. Names should be simple: firstly, we’ll need to type them into supervisorctl, and secondly, any excess punctuation can break the INI format. Below is the simplest form of Supervisor program we can create. Let’s make a new file in our /etc/supervisord.d/ directory so we can load it:
```console
[root@dell8 supervisord.d]# pwd
/etc/supervisord.d
[root@dell8 supervisord.d]# cat supervisor_python_daemon.ini
[program:supervisor_python_daemon]
command=/usr/bin/python3 /opt/services/supervisor_python_daemon.py

[root@dell8 supervisord.d]#
```
Now we jump back into the supervisorctl terminal and run a couple of commands. First, reread loads the new config file. Then add supervisor_python_daemon adds the program and starts it. Finally, status shows the state of the job:
```console
[root@dell8 supervisord.d]# supervisorctl
supervisor>
supervisor> reread
supervisor_python_daemon: available
supervisor> avail
supervisor_python_daemon         avail     auto      999:999
supervisor> add supervisor_python_daemon
supervisor_python_daemon: added process group
supervisor> status
supervisor_python_daemon         RUNNING   pid 15415, uptime 0:00:10
supervisor> status
supervisor_python_daemon         RUNNING   pid 15415, uptime 0:00:17
supervisor> status
supervisor_python_daemon         STARTING
supervisor> status
supervisor_python_daemon         RUNNING   pid 15439, uptime 0:00:05
supervisor>
```
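The status check above can also be done from a script, since supervisorctl accepts a command as an argument instead of opening the interactive prompt. The sketch below is only an illustration: `parse_status` and `current_states` are hypothetical helper names, and the parser assumes the column layout shown in the session above (program name first, state second).

```python
import subprocess


def parse_status(output):
    """Map each program name to its state from `supervisorctl status` text."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            states[fields[0]] = fields[1]  # e.g. RUNNING, STARTING
    return states


def current_states():
    """Run `supervisorctl status` and parse whatever it prints."""
    result = subprocess.run(
        ["supervisorctl", "status"],
        capture_output=True, text=True, check=False,
    )
    return parse_status(result.stdout)


sample = "supervisor_python_daemon         RUNNING   pid 15415, uptime 0:00:10\n"
print(parse_status(sample))  # prints {'supervisor_python_daemon': 'RUNNING'}
```

`current_states()` needs a running supervisord and sufficient permissions, so only the parser is exercised in the example.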
Our app works, but are we getting the added benefit of all this work? Well, let’s see.
If you’ve run the app outside of Supervisor (or read the source code), you’ll see that we have a number of script outputs. We have a print statement on initialization, log messages, and an application crash trace. If you run the script in the terminal, you expect all of this output to be reflected in your shell. But where is it in Supervisor?
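One of the most convenient features of the standard configuration is that Standard Out and Standard Error are redirected to log files. Supervisor does this by default: each program’s output is captured in automatically named log files, while Supervisor’s own events, such as process starts and exits, go to the main supervisord.log file.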
```console
[root@dell8 supervisor]# tail -n 5 /var/log/supervisor/supervisord.log
2029-06-01 23:14:35,739 INFO spawned: 'supervisor_python_daemon' with pid 16022
2029-06-01 23:14:36,797 INFO success: supervisor_python_daemon entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2029-06-01 23:15:05,838 INFO exited: supervisor_python_daemon (exit status 1; not expected)
2029-06-01 23:15:06,841 INFO spawned: 'supervisor_python_daemon' with pid 16064
2029-06-01 23:15:07,908 INFO success: supervisor_python_daemon entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
[root@dell8 supervisor]#
```
Keep watching and you should see the app crash time and time again, and every time, Supervisor brings it back to life. It’s a Pythonic Reincarnation Miracle!
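Rather than watching the log by eye, you could count the crashes. The following sketch is a hypothetical helper, not part of Supervisor: it assumes the "exited: ... not expected" line format seen in the supervisord.log excerpt above and tallies unexpected exits per program.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... INFO exited: supervisor_python_daemon (exit status 1; not expected)
EXIT_LINE = re.compile(
    r"INFO exited: (?P<name>\S+) \(exit status (?P<status>\d+); not expected\)"
)


def count_unexpected_exits(log_text):
    """Count unexpected exits per program name in supervisord.log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EXIT_LINE.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts


sample_log = """\
2029-06-01 23:14:35,739 INFO spawned: 'supervisor_python_daemon' with pid 16022
2029-06-01 23:15:05,838 INFO exited: supervisor_python_daemon (exit status 1; not expected)
2029-06-01 23:15:06,841 INFO spawned: 'supervisor_python_daemon' with pid 16064
"""
print(dict(count_unexpected_exits(sample_log)))  # prints {'supervisor_python_daemon': 1}
```

Pointing it at the contents of /var/log/supervisor/supervisord.log gives a quick view of which programs are crashing most often.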
Supervisor has many options for both the main application and the definition of programs. Review /etc/supervisord.conf and edit it as needed: the file is well commented, and the default configuration is likely to require some work.
To define a program, you should definitely read the “[program:x] Section Values” part of the documentation, which details all the parameters available to you. A couple of things to try:
- Make a change to your application and use supervisorctl to reload and restart it
- Limit your application to 3 runs before giving up (because we all know the definition of insanity)
- Try spinning up multiple copies of your application
- Redirect your application output to a different set of log files
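As a starting point for these experiments, here is a sketch that renders a fuller program definition. The `render_program` helper, its defaults, and the log file paths are made up for illustration; the option names (autorestart, startretries, numprocs, process_name, stdout_logfile, stderr_logfile) come from the Supervisor documentation, but check their exact behaviour there. In particular, startretries limits failed start attempts, so it may not stop a program that runs for a while before crashing, and the use of %(process_num)02d inside the log file paths is an assumption worth verifying against your Supervisor version.

```python
import configparser
import io


def render_program(name, command, retries=3, copies=2, log_dir="/var/log/supervisor"):
    """Render a [program:x] section as INI text (all values are illustrative)."""
    # RawConfigParser keeps %(...)s expressions as-is for Supervisor to expand.
    parser = configparser.RawConfigParser()
    parser["program:" + name] = {
        "command": command,
        "autorestart": "true",
        # Failed start attempts allowed before the program is marked FATAL.
        "startretries": str(retries),
        # Run several copies; process_name must then include the process number.
        "numprocs": str(copies),
        "process_name": "%(program_name)s_%(process_num)02d",
        # Write each copy's output to named files instead of automatic ones.
        "stdout_logfile": log_dir + "/%(program_name)s_%(process_num)02d.out.log",
        "stderr_logfile": log_dir + "/%(program_name)s_%(process_num)02d.err.log",
    }
    buffer = io.StringIO()
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


print(render_program(
    "supervisor_python_daemon",
    "/usr/bin/python3 /opt/services/supervisor_python_daemon.py",
))
```

After saving the rendered text as an .ini file in /etc/supervisord.d/, running reread and then update in supervisorctl should pick up the change.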
Also, please don’t forget that having an application that comes back to life is no replacement for writing solid code in the first place. You should always monitor your logs and, where possible, fix the issues that cause your applications to crash. Remember, exceptions should be exceptional.
If you have any questions or feedback, please feel free to reach out on email services@openstack.by.