Four Steps to Daemonize Your Go Programs
If you have ever worked with Ruby, or have maybe maintained a Rails application, I am sure the name Sidekiq will sound familiar. For those unfamiliar with the project, Sidekiq is a job system for Ruby. It is a wildly popular project, and the author has turned it into a successful business.
None of the above would be relevant if Sidekiq’s author Mike Perham, in 2014, did not write a concise and informative post titled “Don’t Daemonize your Daemons!”. In it, he covers four guidelines to daemonizing programs correctly:
- Log to STDOUT
- Shut down on SIGTERM/SIGINT
- Reload config on SIGHUP
- Provide the necessary config file for your favorite init system to control your daemon
(You can also read the whole article on his website.)
So I was thinking, why don’t we explore how to apply these guidelines while daemonizing a Go program?
Website Observer
The program in question is a simple command-line program that can monitor any website by sending periodic HTTP requests to it. If you have ever heard of Datadog’s synthetic tests or Pingdom, think of our program as their little sibling.
The observer program will read its configuration from flags, environment variables, or a configuration file. If the configuration is not present as a flag, it will look into the ENV vars for it and then in a configuration file (if present). If nothing is found, it will use the default value or exit with an error depending on how crucial the configuration is.
To do this, we will use the namsral/flag package, which is a drop-in replacement for Go’s flag package, with the addition of parsing files and environment variables. Being a drop-in replacement means that using the namsral/flag package is as simple as using the flag package from the standard library.
First, observer will have a config type, which will encapsulate the configuration for the website that it will observe:
const defaultTick = 60 * time.Second

type config struct {
	contentType string
	server      string
	statusCode  int
	tick        time.Duration
	url         string
	userAgent   string
}

func (c *config) init(args []string) error {
	flags := flag.NewFlagSet(args[0], flag.ExitOnError)
	flags.String(flag.DefaultConfigFlagname, "", "Path to config file")

	var (
		statusCode  = flags.Int("status", 200, "Response HTTP status code")
		tick        = flags.Duration("tick", defaultTick, "Ticking interval")
		server      = flags.String("server", "", "Server HTTP header value")
		contentType = flags.String("content_type", "", "Content-Type HTTP header value")
		userAgent   = flags.String("user_agent", "", "User-Agent HTTP header value")
		url         = flags.String("url", "", "Request URL")
	)

	if err := flags.Parse(args[1:]); err != nil {
		return err
	}

	c.statusCode = *statusCode
	c.tick = *tick
	c.server = *server
	c.contentType = *contentType
	c.userAgent = *userAgent
	c.url = *url

	return nil
}
The init method will take the command line arguments as input and build a FlagSet, which represents a set of defined flags. Each of the flags is listed and parsed; then, their values are assigned to the config. Additionally, registering flag.DefaultConfigFlagname as a flag enables our observer to load the configuration from a file such as config.conf. The .conf file has a key=value format, with one key-value pair per line.
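To make the key=value format concrete, here is a minimal sketch of reading such content by hand. The parseConf helper is hypothetical and is not the parser namsral/flag uses; it only shows how little structure the format has.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseConf reads key=value pairs, one per line, skipping blank lines and
// lines without an "=". Hypothetical helper, not the namsral/flag parser.
func parseConf(content string) map[string]string {
	values := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}
		// Split on the first "=" only, so values may contain "=" themselves.
		key, value, found := strings.Cut(line, "=")
		if !found {
			continue
		}
		values[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return values
}

func main() {
	conf := "status=500\ntick=30s\nurl=https://ieftimov.com\n"
	values := parseConf(conf)
	fmt.Println(values["status"], values["tick"], values["url"])
}
```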
Here’s the main function:
func main() {
	ctx := context.Background()
	ctx, cancel := context.WithCancel(ctx)

	c := &config{}

	defer func() {
		cancel()
	}()

	if err := run(ctx, c); err != nil {
		fmt.Fprintf(os.Stderr, "%s\n", err)
		os.Exit(1)
	}
}
Following Mat Ryer’s advice, we are going to keep main very thin while keeping the main logic of the observer in the run function. main here just sets up the main context that will propagate down to the run function, and it initializes the observer config. Then it passes all of the relevant arguments to the run function.
Here’s the run function:
func run(ctx context.Context, c *config) error {
	if err := c.init(os.Args); err != nil {
		return err
	}

	for {
		select {
		case <-ctx.Done():
			return nil
		case <-time.Tick(c.tick):
			resp, err := http.Get(c.url)
			if err != nil {
				return err
			}
			// Only the status and headers are inspected, so close the body right away.
			resp.Body.Close()

			if resp.StatusCode != c.statusCode {
				log.Printf("Status code mismatch, got: %d\n", resp.StatusCode)
			}

			if s := resp.Header.Get("server"); s != c.server {
				log.Printf("Server header mismatch, got: %s\n", s)
			}

			if ct := resp.Header.Get("content-type"); ct != c.contentType {
				log.Printf("Content-Type header mismatch, got: %s\n", ct)
			}

			if ua := resp.Header.Get("user-agent"); ua != c.userAgent {
				log.Printf("User-Agent header mismatch, got: %s\n", ua)
			}
		}
	}
}
First, the run function initializes the config instance c, using the init method. Then, it loops until the context ctx is done. When ctx is done, it means the observer process is being terminated, so run simply returns nil and finishes its execution.
Alternatively, it will execute the other case on every tick. By using the time.Tick channel here, we run this code each time a value arrives on the channel, which happens once every c.tick period. For example, if c.tick is 30 seconds, we will receive a value every 30 seconds, meaning the code will run every 30 seconds.
The code that runs on each tick is simple: it sends an HTTP GET request to the URL assigned to c.url. Once the response returns, the run function compares the relevant response headers and the status code with the ones provided through the configuration. If any mismatch is detected, it logs the mismatch.
Running the observer is relatively simple. One way is to supply a config file through the command line:
$ ./observer -config ./config.conf
2020/04/23 19:41:54 Status code mismatch, got: 200
Alternatively, using flags:
$ ./observer -status=500 -tick=10s -url=https://ieftimov.com -server=Cloudflare
2020/04/23 19:43:34 Status code mismatch, got: 200
2020/04/23 19:43:34 Server header mismatch, got: cloudflare
2020/04/23 19:43:34 Content-Type header mismatch, got: text/html; charset=utf-8
We can do the same using environment variables, or a combination of all three: config file, environment variables, and flags.
Now, how can we apply the four simple rules of daemonization?
Logging to STDOUT
While daemons don’t have much to do with web services, one of the 12 factors of modern web services is treating logs as event streams. While the 12 factors are not applicable in this particular case, the guiding principle stays: the daemon itself should not manage log streams, nor should it concern itself with writing to or managing log files. Instead, daemons should send their log stream, unbuffered, to STDOUT.
The service management system will capture each daemon’s stream. The init
config file is what we will use to configure logging, such as where the logs
should be stored or streamed.
So, how can we adapt the observer to log to STDOUT?
First, we will add another argument to the run function, called out, of type io.Writer. Then, we will invoke the log.SetOutput function, passing out as the argument to it.
func run(ctx context.Context, c *config, out io.Writer) error {
	if err := c.init(os.Args); err != nil {
		return err
	}
	log.SetOutput(out)

	for {
		select {
		case <-ctx.Done():
			return nil
		case <-time.Tick(c.tick):
			// Identical to above, removed for brevity
		}
	}
}
By doing this, we will have to pass STDOUT from the main function, but we keep our run function more testable. Because run accepts an io.Writer, we can invoke it with any value that implements that interface. We basically couple the run function to a behavior instead of a type.
Then, we need to update the main function to pass the additional argument to the run function when invoking it. The io.Writer will simply be os.Stdout:
func main() {
	ctx := context.Background()
	ctx, cancel := context.WithCancel(ctx)

	c := &config{}

	defer func() {
		cancel()
	}()

	if err := run(ctx, c, os.Stdout); err != nil {
		fmt.Fprintf(os.Stderr, "%s\n", err)
		os.Exit(1)
	}
}
If we run the program again we won’t see a difference:
$ ./observer -status=500 -tick=10s -url=https://ieftimov.com -server=Cloudflare
2020/04/23 19:43:34 Status code mismatch, got: 200
2020/04/23 19:43:34 Server header mismatch, got: cloudflare
2020/04/23 19:43:34 Content-Type header mismatch, got: text/html; charset=utf-8
Why is that? Well, the log package logs to STDERR by default, and a terminal shows both streams, so there is no visible change of behavior there. Still, we make the dependency on an output stream explicit to the run function, which clearly states that run needs to know where to send its logs when running.
Shut down on SIGTERM/SIGINT
In Go, having errors as values is very helpful to think about what will happen to our program if an error is returned. While this makes our Go programs always have some repetitive error handling, it also gives us confidence that our program will gracefully handle any error.
Termination signals
*nix operating systems (OS) employ a system of signals, which is a mechanism of the OS to ask a process to perform a particular action. There are two general types of signals: those that cause termination of a process and those that do not.
(Refer to the full list of the POSIX-defined signals to learn more.)
Using these system signals, a process that has received one can choose one of the following behaviors to take place: perform the default POSIX-defined action, ignore the signal, or catch the signal with a signal handler and perform some sort of a custom action.
Some signals just can’t be caught or ignored, which means that the default action has to happen. For example, SIGSTOP and SIGKILL are such signals. Once a process receives either of these two signals, we know that it will be stopped/killed by the OS.
But other signals are more polite. While we should not ignore them, they give our process a chance to clean up and exit gracefully. Most of the ones on the list are of the polite kind. In this section, we will look into the SIGTERM and SIGINT signals and how we can treat them in our Go programs.
Handling SIGTERM & SIGINT
The os/signal package implements access to incoming signals with the purpose of signal handling. Through the Notify function, a Go program can accept signals through a channel of type os.Signal.
In our observer’s case, we don’t have to do any cleanup once it receives a SIGTERM/SIGINT. All we have to do is to stop further execution and shut down gracefully. So, how can we achieve that?
First, we need to create a channel through which we will accept these two signals:
signalChan := make(chan os.Signal, 1)
signal.Notify(signalChan, syscall.SIGINT, syscall.SIGTERM)
Once the observer process receives a SIGINT or a SIGTERM, it will proxy it through the signalChan channel. To process the signals, we need to create a goroutine that will receive signals through signalChan. Once it gets a signal, it will have to cancel() the context, which stops further execution of the run function:
go func() {
	select {
	case <-signalChan:
		log.Printf("Got SIGINT/SIGTERM, exiting.")
		cancel()
		os.Exit(1)
	case <-ctx.Done():
		log.Printf("Done.")
		os.Exit(1)
	}
}()
So, once the cancel function is executed, the for loop in the run function will stop:
func run(ctx context.Context, c *config, out io.Writer) error {
	if err := c.init(os.Args); err != nil {
		return err
	}
	log.SetOutput(out)

	for {
		select {
		case <-ctx.Done():
			return nil
		case <-time.Tick(c.tick):
			// Same as above...
		}
	}
}
The last thing we need to do in the main function is to stop relaying signals to the signalChan channel when the program exits:
func main() {
	// Same as above...

	defer func() {
		signal.Stop(signalChan)
		cancel()
	}()

	// Same as above...
}
The Stop function will stop relaying incoming signals to signalChan. When Stop returns, it is guaranteed that signalChan will receive no more signals.
Let’s run the observer program and see the signal handling in action:
$ ./observer -config=config.conf
2020/04/26 00:14:46 Status code mismatch, got: 200
...
2020/04/26 00:15:46 Status code mismatch, got: 200
Now, having the PID of observer, we can send any signal using the kill command line tool:
$ kill -SIGINT 37212
By executing the kill command, we will send a SIGINT to the observer process. This will force observer to wrap up the execution, log a line to STDOUT and exit:
$ ./observer -config=config.conf
2020/04/26 00:29:22 Status code mismatch, got: 200
2020/04/26 00:29:22 Got SIGINT/SIGTERM, exiting.
exit status 1
We can try the same exercise with SIGTERM as well:
$ kill -SIGTERM 37827
This causes observer to exit with the same behavior:
$ ./observer -config=config.conf
2020/04/26 00:33:42 Status code mismatch, got: 200
2020/04/26 00:33:44 Got SIGINT/SIGTERM, exiting.
exit status 1
Reload config on SIGHUP
Now that we know how to handle signals, we need to add another signal to the mix: SIGHUP. To do that, we can just add syscall.SIGHUP to the signal.Notify call:
signalChan := make(chan os.Signal, 1)
signal.Notify(signalChan, syscall.SIGINT, syscall.SIGTERM, syscall.SIGHUP)
Now that we have SIGHUP covered, in the goroutine that handles the signals, once a SIGHUP is received, it should re-run the config.init method. By doing that, we will reload the configuration of the observer, loading any changes in the configuration:
go func() {
	select {
	case s := <-signalChan:
		switch s {
		case syscall.SIGINT, syscall.SIGTERM:
			log.Printf("Got SIGINT/SIGTERM, exiting.")
			cancel()
			os.Exit(1)
		case syscall.SIGHUP:
			log.Printf("Got SIGHUP, reloading.")
			c.init(os.Args)
		}
	case <-ctx.Done():
		log.Printf("Done.")
		os.Exit(1)
	}
}()
The change is relatively small. By using a switch construct, we detect the received signal. If it’s a SIGHUP, we invoke c.init(os.Args). Otherwise, we cancel() the context and os.Exit the program.
We can test this using the same trick from before:
$ kill -SIGHUP 38761
This will cause the observer to reload:
$ ./observer -config=config.conf
2020/04/26 01:15:40 Status code mismatch, got: 200
2020/04/26 01:15:44 Got SIGHUP, reloading.
This looks nice. Let’s shut down the observer now by sending a SIGTERM:
$ kill -SIGTERM 38761
In case you’re following along, you will find out that the observer is still running. This is a bug: the goroutine that was receiving signals exited because the select construct completed once it received the SIGHUP.
To make the goroutine accept signals without exiting, we need to make the goroutine run indefinitely, using a for loop:
go func() {
	for {
		select {
		case s := <-signalChan:
			switch s {
			case syscall.SIGINT, syscall.SIGTERM:
				log.Printf("Got SIGINT/SIGTERM, exiting.")
				cancel()
				os.Exit(1)
			case syscall.SIGHUP:
				log.Printf("Got SIGHUP, reloading.")
				c.init(os.Args)
			}
		case <-ctx.Done():
			log.Printf("Done.")
			os.Exit(1)
		}
	}
}()
By wrapping the body of the goroutine in a for loop, we will make sure that it will not exit, except when a SIGINT/SIGTERM is received, or if the context is done. By having this endless goroutine, we also can send multiple SIGHUPs to the observer, and it will process them correctly.
Let’s send two SIGHUPs, to perform two reloads, and SIGTERM to shut down the observer:
$ kill -SIGHUP 38960
$ kill -SIGHUP 38960
$ kill -SIGTERM 38960
And the observer output:
$ ./observer -config=config.conf
2020/04/26 01:25:02 Status code mismatch, got: 200
2020/04/26 01:25:03 Got SIGHUP, reloading.
2020/04/26 01:25:05 Got SIGHUP, reloading.
2020/04/26 01:25:08 Got SIGINT/SIGTERM, exiting.
And that’s it. The observer now knows how to log to STDOUT, gracefully exit when it receives SIGINT or SIGTERM, and reload the configuration when it receives a SIGHUP.
Provide the necessary config file for your favorite init system to control your daemon
Now, given that my computer of choice is a MacBook, I will explain here how you can create a config file for launchd, macOS’s service management framework for starting, stopping and managing daemons, applications, processes, and scripts. In macOS, the system runs daemons, while the users run programs as agents. So, we will turn our observer into an agent.
In the past, I have written about creating and managing macOS agents, so if you would like to learn more about this topic, you can read that as well. Still, let’s see a minimal launchd configuration for observer:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
  <dict>
    <key>Label</key>
    <string>com.ieftimov.observer</string>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>ProgramArguments</key>
    <array>
      <string>/usr/local/bin/observer</string>
      <string>-config</string>
      <string>/etc/observer.conf</string>
    </array>
    <key>StandardOutPath</key>
    <string>/tmp/observer.log</string>
    <key>StandardErrorPath</key>
    <string>/tmp/observer.error.log</string>
  </dict>
</plist>
The configuration is relatively straightforward. Here are all of the pieces in order:
- The Label identifies the job and has to be unique for the launchd instance. Think of it as a unique name for the given agent.
- RunAtLoad means launchd will start the job as soon as it loads it.
- KeepAlive tells launchd to keep the agent running no matter what.
- ProgramArguments provides command-line options to the agent command. In our case, this will create the following command: /usr/local/bin/observer -config /etc/observer.conf.
- StandardOutPath and StandardErrorPath are the paths to where launchd will write the respective output. In our case, we write these to the tmp directory. An alternative would be to add the log files to /var/log, but that requires granting the agent write access to /var/log.
To make sure we can run the agent, we have to also supply the configuration file observer.conf in the /etc directory. On my machine, its contents are as follows:
status=500
tick=30s
url=https://ieftimov.com
server=cloudflare
content_type=text/html; charset=utf-8
user_agent=
After placing the observer.conf file in /etc, for the agent to work, we have to place its .plist file in ~/Library/LaunchAgents, and load it with:
$ launchctl load ~/Library/LaunchAgents/com.ieftimov.observer.plist
Now, if we tail -f the log files in /tmp, we will see the output there:
$ tail -f /tmp/observer.*
==> /tmp/observer.error.log <==
==> /tmp/observer.log <==
2020/05/02 11:49:03 Status code mismatch, got: 200
Voila! The agent is running and it’s logging output to STDOUT, while launchd is redirecting that output to a log file.
If we would like to run the observer on GNU/Linux, we cannot use this launchd configuration.
In Linux-land, systemd is widespread and popular. If you are interested in a deeper explanation of systemd units and unit files, Digital Ocean’s blog has an article on “Understanding Systemd Units and Unit Files” by Justin Ellingwood. I recommend reading it! And keep in mind, the community’s opinion on systemd is pretty divided.
There are a bunch of other alternatives, but my knowledge of GNU/Linux init systems is minimal. Therefore, I will stop right here and ask for your help: if you would like to contribute a Linux init system configuration to this article, drop the link to a gist/repo in the comments, and I will include it in this article.
Of course, with proper attribution.
You can see the final implementation of the observer here.