Four Steps to Daemonize Your Go Programs

If you have ever worked with Ruby, or have maybe maintained a Rails application, I am sure the name Sidekiq will sound familiar. For those unfamiliar with the project, Sidekiq is a job system for Ruby. It is a wildly popular project, and the author has turned it into a successful business.

None of the above would be relevant if Sidekiq’s author, Mike Perham, had not written a concise and informative post in 2014 titled “Don’t Daemonize your Daemons!”. In it, he covers four guidelines for daemonizing programs correctly:

Log to STDOUT
Shut down on SIGTERM/SIGINT
Reload config on SIGHUP
Provide the necessary config file for your favorite init system to control your daemon

(You can also read the whole article on his website.)

So I was thinking, why don’t we explore how to apply these guidelines while daemonizing a Go program?

Website Observer #

The program in question is a simple command-line program that can monitor any website by sending periodic HTTP requests to it. If you have ever heard of Datadog’s synthetic tests or Pingdom, think of our program as their little sibling.

The observer program will read its configuration from flags, environment variables, or a configuration file. If the configuration is not present as a flag, it will look into the ENV vars for it and then in a configuration file (if present). If nothing is found, it will use the default value or exit with an error depending on how crucial the configuration is.

To do this, we will use the namsral/flag package, which is a drop-in replacement for Go’s flag package, with the addition of parsing files and environment variables. Being a drop-in replacement means that using the namsral/flag package is as simple as using the flag package from the standard library.
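The lookup order described above (flag, then environment variable, then config file, then default) can be sketched with plain maps. This is only an illustration of the precedence, not how namsral/flag is implemented; the names `source`, `fromMap`, and `resolve` are hypothetical.

```go
package main

import "fmt"

// source looks up a configuration key and reports whether it was set.
type source func(key string) (string, bool)

// fromMap adapts a plain map into a source; the maps stand in for parsed
// flags, environment variables, and config file entries.
func fromMap(m map[string]string) source {
	return func(key string) (string, bool) {
		v, ok := m[key]
		return v, ok
	}
}

// resolve returns the value from the first source that has the key,
// falling back to def when none of them do.
func resolve(key, def string, sources ...source) string {
	for _, src := range sources {
		if v, ok := src(key); ok {
			return v
		}
	}
	return def
}

func main() {
	flags := fromMap(map[string]string{"tick": "10s"})
	env := fromMap(map[string]string{"tick": "30s", "status": "500"})
	file := fromMap(map[string]string{"status": "404", "url": "https://example.com"})

	fmt.Println(resolve("tick", "60s", flags, env, file))   // flag wins: 10s
	fmt.Println(resolve("status", "200", flags, env, file)) // env wins: 500
	fmt.Println(resolve("url", "", flags, env, file))       // file: https://example.com
	fmt.Println(resolve("server", "n/a", flags, env, file)) // default: n/a
}
```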

First, observer will have a config type, which will encapsulate the configuration for the website that it will observe:

const defaultTick = 60 * time.Second

type config struct {
	contentType string
	server      string
	statusCode  int
	tick        time.Duration
	url         string
	userAgent   string
}

func (c *config) init(args []string) error {
	flags := flag.NewFlagSet(args[0], flag.ExitOnError)
	flags.String(flag.DefaultConfigFlagname, "", "Path to config file")

	var (
		statusCode  = flags.Int("status", 200, "Response HTTP status code")
		tick        = flags.Duration("tick", defaultTick, "Ticking interval")
		server      = flags.String("server", "", "Server HTTP header value")
		contentType = flags.String("content_type", "", "Content-Type HTTP header value")
		userAgent   = flags.String("user_agent", "", "User-Agent HTTP header value")
		url         = flags.String("url", "", "Request URL")
	)

	if err := flags.Parse(args[1:]); err != nil {
		return err
	}

	c.statusCode = *statusCode
	c.tick = *tick
	c.server = *server
	c.contentType = *contentType
	c.userAgent = *userAgent
	c.url = *url

	return nil
}

The init method takes the command-line arguments as input and builds a FlagSet, which represents a set of defined flags. Each of the flags is defined and parsed; then, their values are assigned to the config. Additionally, registering flag.DefaultConfigFlagname as a flag enables our observer to load the configuration from a file passed through -config, such as config.conf. The .conf file has a key=value format, with a new line after each key-value pair.

Here’s the main function:

func main() {
	ctx := context.Background()
	ctx, cancel := context.WithCancel(ctx)

	c := &config{}

	defer func() {
		cancel()
	}()

	if err := run(ctx, c); err != nil {
		fmt.Fprintf(os.Stderr, "%s\n", err)
		os.Exit(1)
	}
}

Following Mat Ryer’s advice, we are going to keep main very thin while keeping the main logic of the observer in the run method. main here just sets up the main context that will propagate down to the run method, and it initializes the observer config. Then it passes all of the relevant arguments to the run method.

Here’s the run method:

func run(ctx context.Context, c *config) error {
	c.init(os.Args)

	for {
		select {
		case <-ctx.Done():
			return nil
		case <-time.Tick(c.tick):
			resp, err := http.Get(c.url)
			if err != nil {
				return err
			}
			// Only the status and headers are inspected, so release the body right away.
			resp.Body.Close()

			if resp.StatusCode != c.statusCode {
				log.Printf("Status code mismatch, got: %d\n", resp.StatusCode)
			}

			if s := resp.Header.Get("server"); s != c.server {
				log.Printf("Server header mismatch, got: %s\n", s)
			}

			if ct := resp.Header.Get("content-type"); ct != c.contentType {
				log.Printf("Content-Type header mismatch, got: %s\n", ct)
			}

			if ua := resp.Header.Get("user-agent"); ua != c.userAgent {
				log.Printf("User-Agent header mismatch, got: %s\n", ua)
			}
		}
	}
}

First, the run method initializes the config instance c, using the init method. Then, it loops infinitely until the context ctx is done. When ctx is done, it means the observer process is terminated, so it merely returns a nil and finishes with its execution.

Alternatively, it will execute the other case every tick. By using the time.Tick channel here we run this code by receiving a signal through the channel every c.tick period. For example, if c.tick is 30 seconds, we will receive a signal every 30 seconds, meaning the code will run every 30 seconds.

The tick case in run is simple: it sends an HTTP GET request to the URL assigned to c.url. Once the response returns, run compares the relevant response headers and the status code with the ones provided through the configuration. If any mismatch is detected, it logs the mismatch.

Running the observer is relatively simple. One way is to supply a config file through the command line:

$ ./observer -config ./config.conf
2020/04/23 19:41:54 Status code mismatch, got: 200

Alternatively, using flags:

$ ./observer -status=500 -tick=10s -url=https://ieftimov.com -server=Cloudflare
2020/04/23 19:43:34 Status code mismatch, got: 200
2020/04/23 19:43:34 Server header mismatch, got: cloudflare
2020/04/23 19:43:34 Content-Type header mismatch, got: text/html; charset=utf-8

We can do the same using environment variables, or a combination of all three: config file, environment variables, and flags.

Now, how can we apply the four simple rules of daemonization?

Logging to `STDOUT` #

While daemons don’t have much to do with web services, one of the 12 factors of modern web services is treating logs as event streams. While the 12 factors are not strictly applicable in this particular case, the guiding principle stays: the daemon itself should not manage log streams, nor should it concern itself with writing to or managing log files. Instead, daemons should send their log stream, unbuffered, to STDOUT.

The service management system will capture each daemon’s stream. The init config file is what we will use to configure logging, such as where the logs should be stored or streamed.

So, how can we adapt the observer to log to STDOUT?

First, we will add another argument to the run function, called out of type io.Writer. Then, we will invoke the log.SetOutput function passing the out as argument to it.

func run(ctx context.Context, c *config, out io.Writer) error {
	c.init(os.Args)
	log.SetOutput(out)

	for {
		select {
		case <-ctx.Done():
			return nil
		case <-time.Tick(c.tick):
			// Identical to above, omitted for brevity
		}
	}
}

By doing this, we will have to pass STDOUT from the main function, but we keep our run function more testable. Using a separate run method means we can invoke it with any instance that implements the io.Writer interface. We basically couple the run method to a behavior instead of type.

Then, we need to update the main function to pass the additional argument to the run function when invoking it. The io.Writer will simply be os.Stdout:

func main() {
	ctx := context.Background()
	ctx, cancel := context.WithCancel(ctx)

	c := &config{}

	defer func() {
		cancel()
	}()

	if err := run(ctx, c, os.Stdout); err != nil {
		fmt.Fprintf(os.Stderr, "%s\n", err)
		os.Exit(1)
	}
}

If we run the program again we won’t see a difference:

$ ./observer -status=500 -tick=10s -url=https://ieftimov.com -server=Cloudflare
2020/04/23 19:43:34 Status code mismatch, got: 200
2020/04/23 19:43:34 Server header mismatch, got: cloudflare
2020/04/23 19:43:34 Content-Type header mismatch, got: text/html; charset=utf-8

Why is that? Well, the log package logs to STDERR by default, and in a terminal both STDERR and STDOUT end up on the screen, so there is no visible change in behavior. Still, we make the dependency on an output stream explicit to the run function, which clearly states that run needs to know where to send its logs when running.

Shut down on `SIGTERM`/`SIGINT` #

In Go, having errors as values is very helpful to think about what will happen to our program if an error is returned. While this makes our Go programs always have some repetitive error handling, it also gives us confidence that our program will gracefully handle any error.

Termination signals #

*nix operating systems (OS) employ a system of signals, which is a mechanism of the OS to ask a process to perform a particular action. There are two general types of signals: those that cause termination of a process and those that do not.

(Refer to the full list of the POSIX-defined signals to learn more.)

Using these system signals, a process that has received one can choose one of the following behaviors to take place: perform the default POSIX-defined action, ignore the signal, or catch the signal with a signal handler and perform some sort of a custom action.

Some signals just can’t be caught or ignored, which means that the default action has to happen. SIGSTOP and SIGKILL are such signals. Once a process receives either of these two signals, we know that it will be stopped/killed by the OS.

But other signals are more polite. They can be caught, which gives our process a chance to clean up and go away with grace. Most of the ones on the list are of the polite kind. In this section, we will look into the SIGTERM and SIGINT signals and how we can treat them in our Go programs.

Handling `SIGTERM` & `SIGINT` #

The os/signal package implements access to incoming signals for the purpose of signal handling. Through the Notify function, a Go program can accept signals through a channel of type os.Signal.

In our observer’s case, we don’t have to do any cleanup once it receives a SIGTERM/SIGINT. All we have to do is to stop further execution and shut down gracefully. So, how can we achieve that?

First, we need to create a channel through which we will accept these two signals:

signalChan := make(chan os.Signal, 1)
signal.Notify(signalChan, syscall.SIGINT, syscall.SIGTERM)

Once the observer process receives a SIGINT or a SIGTERM, it will proxy it through the signalChan channel. To process the signals, we would need to create a goroutine that will receive signals through the signalChan. Once it gets a signal, it will have to cancel() the context, which would stop the further execution of the run method:

go func() {
        select {
        case <-signalChan:
                log.Printf("Got SIGINT/SIGTERM, exiting.")
                cancel()
                os.Exit(1)
        case <-ctx.Done():
                log.Printf("Done.")
                os.Exit(1)
        }
}()

So, once the cancel function is executed, in the for loop of the run method the execution will stop:

func run(ctx context.Context, c *config, out io.Writer) error {
        c.init(os.Args)
        log.SetOutput(out)

        for {
                select {
                case <-ctx.Done():
                        return nil
                case <-time.Tick(c.tick):
                        // Same as above...
                }
        }
}

The last thing we need to do in the main function is to stop relaying signals to signalChan when the program exits:

func main() {
	// Same as above...

	defer func() {
		signal.Stop(signalChan)
		cancel()
	}()

	// Same as above...
}

The Stop function will stop relaying incoming signals to signalChan. When Stop returns, it is guaranteed that signalChan will receive no more signals.

Let’s run the observer program and see the signal handling in action:

$ ./observer -config=config.conf
2020/04/26 00:14:46 Status code mismatch, got: 200
...
2020/04/26 00:15:46 Status code mismatch, got: 200

Now, having the PID of observer, we can send any signal using the kill command line tool:

$ kill -SIGINT 37212

By executing the kill command, we will send a SIGINT to the observer process. This will force observer to wrap up the execution, log a line to STDOUT and exit:

$ ./observer -config=config.conf
2020/04/26 00:29:22 Status code mismatch, got: 200
2020/04/26 00:29:22 Got SIGINT/SIGTERM, exiting.
exit status 1

We can try the same exercise with SIGTERM as well:

$ kill -SIGTERM 37827

This causes observer to exit with the same behavior:

$ ./observer -config=config.conf
2020/04/26 00:33:42 Status code mismatch, got: 200
2020/04/26 00:33:44 Got SIGINT/SIGTERM, exiting.
exit status 1

Reload config on `SIGHUP` #

Now that we know how to handle signals, we need to add another signal to the mix - SIGHUP. To do that, we can just add syscall.SIGHUP to the signal.Notify call:

signalChan := make(chan os.Signal, 1)
signal.Notify(signalChan, syscall.SIGINT, syscall.SIGTERM, syscall.SIGHUP)

Now that we have SIGHUP covered, in the goroutine that handles the signals, once a SIGHUP is received, it should re-run the config.init method. By doing that, we will reload the configuration of the observer, loading any changes in the configuration:

go func() {
        select {
        case s := <-signalChan:
                switch s {
                case syscall.SIGINT, syscall.SIGTERM:
                        log.Printf("Got SIGINT/SIGTERM, exiting.")
                        cancel()
                        os.Exit(1)
                case syscall.SIGHUP:
                        log.Printf("Got SIGHUP, reloading.")
                        c.init(os.Args)
                }
        case <-ctx.Done():
                log.Printf("Done.")
                os.Exit(1)
        }
}()

The change is relatively small. By using a switch construct, we detect the received signal. If it’s a SIGHUP, we invoke c.init(os.Args). Otherwise, we cancel() the context and os.Exit the program.

We can test this using the same trick from before:

$ kill -SIGHUP 38761

This will cause the observer to reload:

$ ./observer -config=config.conf
2020/04/26 01:15:40 Status code mismatch, got: 200
2020/04/26 01:15:44 Got SIGHUP, reloading.

This looks nice. Let’s shut down the observer now by sending a SIGTERM:

$ kill -SIGTERM 38761

In case you’re following along, you will find out that the observer is still running; this is a bug – the goroutine that was receiving signals exited because the select construct completed once it received the SIGHUP.

To make the goroutine accept signals without exiting, we need to make the goroutine run infinitely – using a for loop:

go func() {
        for {
                select {
                case s := <-signalChan:
                        switch s {
                        case syscall.SIGINT, syscall.SIGTERM:
                                log.Printf("Got SIGINT/SIGTERM, exiting.")
                                cancel()
                                os.Exit(1)
                        case syscall.SIGHUP:
                                log.Printf("Got SIGHUP, reloading.")
                                c.init(os.Args)
                        }
                case <-ctx.Done():
                        log.Printf("Done.")
                        os.Exit(1)
                }
        }
}()

By wrapping the whole goroutine in a for loop, we will make sure that it will not exit, except when a SIGINT/SIGTERM is received, or if the context is done. By having this endless goroutine, we also can send multiple SIGHUPs to the observer, and it will process them correctly.

Let’s send two SIGHUPs, to perform two reloads, and SIGTERM to shut down the observer:

$ kill -SIGHUP 38960
$ kill -SIGHUP 38960
$ kill -SIGTERM 38960

And the observer output:

$ ./observer -config=config.conf
2020/04/26 01:25:02 Status code mismatch, got: 200
2020/04/26 01:25:03 Got SIGHUP, reloading.
2020/04/26 01:25:05 Got SIGHUP, reloading.
2020/04/26 01:25:08 Got SIGINT/SIGTERM, exiting.

And that’s it. The observer now knows how to log to STDOUT, gracefully exit when it receives SIGINT or SIGTERM, and reload the configuration when it receives a SIGHUP.

Provide the necessary config file for your favorite init system to control your daemon #

Now, given that my computer of choice is a MacBook, I will explain here how you can create a config file for launchd, macOS’s service management framework for starting, stopping, and managing daemons, applications, processes, and scripts. In macOS, the system runs daemons, while the users run programs as agents. So, we will turn our observer into an agent.

In the past, I have written about creating and managing macOS agents, so if you would like to learn more about this topic, you can read that as well. Still, let’s see a minimal launchd configuration for observer:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
    <dict>
        <key>Label</key>
        <string>com.ieftimov.observer</string>

        <key>RunAtLoad</key>
        <true/>

        <key>KeepAlive</key>
        <true/>

        <key>ProgramArguments</key>
        <array>
          <string>/usr/local/bin/observer</string>
          <string>-config</string>
          <string>/etc/observer.conf</string>
        </array>

        <key>StandardOutPath</key>
        <string>/tmp/observer.log</string>

        <key>StandardErrorPath</key>
        <string>/tmp/observer.error.log</string>
    </dict>
</plist>

The configuration is relatively straightforward. Here are all of the pieces in order:

The Label identifies the job and has to be unique for the launchd instance. Think of it as a unique name for the given agent.
RunAtLoad means launchd will start the job as soon as it loads it.
KeepAlive tells launchd to keep the agent running no matter what.
ProgramArguments provides command-line options to the agent command. In our case, this will create the following command: /usr/local/bin/observer -config /etc/observer.conf.
StandardOutPath and StandardErrorPath are the paths to where launchd will write the respective output. In our case, we write these to the /tmp directory. An alternative would be to add the log files to /var/log, but that requires granting the agent write access to /var/log.

To make sure we can run the agent, we have to also supply the configuration file observer.conf in the /etc directory. On my machine, its contents are as follows:

status=500
tick=30s
url=https://ieftimov.com
server=cloudflare
content_type=text/html; charset=utf-8
user_agent=

After placing the observer.conf file in /etc, for the agent to work, we have to place its .plist file in ~/Library/LaunchAgents, and load it with:

$ launchctl load ~/Library/LaunchAgents/com.ieftimov.observer.plist

Now, if we tail -f the log files in /tmp, we will see the agent’s output there:

$ tail -f /tmp/observer.*

==> /tmp/observer.error.log <==

==> /tmp/observer.log <==
2020/05/02 11:49:03 Status code mismatch, got: 200

Voila! The agent is running and logging its output to STDOUT, while launchd redirects that output to a log file.

If we would like to run the observer on GNU/Linux, we cannot use this launchd configuration.

In Linux-land, systemd is widespread and popular. If you are interested in a deeper explanation of systemd units and unit files, Digital Ocean’s blog has an article on “Understanding Systemd Units and Unit Files” by Justin Ellingwood. I recommend reading it! And keep in mind, the community’s opinion on systemd is pretty divided.

There are a bunch of other alternatives, but my knowledge of GNU/Linux init systems is minimal. Therefore, I will stop right here and ask for your help: if you would like to contribute a Linux init system configuration to this article, drop the link to a gist/repo in the comments, and I will include it in this article.

Of course, with proper attribution.

You can see the final implementation of the observer here.
