Testing in Go: Stop Leaking Files
No test suite is perfect. Some test suites are missing good helper functions; others are under-configured or over-customized. Some include obsolete packages and are left unmaintained. Folks who have experience with more mature projects will likely agree that all of the above can be found in the wild.
Often, when we test our Go programs, we need to create files. Such files can be just fixture files, or whole file trees, to set up the correct environment for the tests to run. Once the tests finish running, we have to clean them up, so they don’t linger around.
If the test suite already has helpers for managing test files, we should rely on them. Unfortunately, not all test suites have such clean-up helpers set up. Sometimes, you might find a few different implementations, instead of one obvious way to do it.
Coming up in Go v1.15, the testing package will improve the support for creating temporary test files and directories.
Let’s see how we can put them to use.
Converting PDFs to TXT files
While thinking about a small program that will aid our understanding of leaking test files, I was asking myself, “What is a program that generates files?”. Because I work for a company that does a lot with documents, I thought, “let’s do something straightforward with PDFs”.
Imagine we have a program that extracts text out of PDFs. Why? Well, for one, if we want to know how long it will take us to read the PDF, we can take the number of words in a PDF and divide it by the average reading speed (which, according to Google, is 250 words per minute).
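As a rough sketch of that estimate, assuming words are simply whitespace-separated tokens (the `readingTime` helper and the `wordsPerMinute` constant are made up for this illustration and are not part of the program we build below):

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// wordsPerMinute is the average reading speed quoted in the text.
const wordsPerMinute = 250

// readingTime is a hypothetical helper: it counts the whitespace-separated
// words in text and divides that count by the reading speed.
func readingTime(text string) time.Duration {
	words := len(strings.Fields(text))
	return time.Duration(words) * time.Minute / wordsPerMinute
}

func main() {
	text := strings.Repeat("word ", 500)
	// 500 words at 250 words per minute.
	fmt.Println(readingTime(text)) // prints "2m0s"
}
```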
But to do that, we first have to take a PDF and create a TXT file with all of the sentences inside. To do that, we can use one of the many PDF parsing libraries for Go.
Here’s the code that takes a PDF, extracts all rows of text from it, and saves them in a TXT file.
First, the persist function:
func persist(content []byte, w io.Writer) error {
	_, err := w.Write(content)
	if err != nil {
		log.Println("Failed to persist contents.")
		return err
	}
	return nil
}
It takes some bytes as content and persists them to an io.Writer. That writer can be a file, a strings.Builder, or any other type that implements the io.Writer interface. The writer argument is an interface by design, so the function is more liberal in what it accepts. You will see why when we test this function.
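To make the io.Writer point concrete, here is a small sketch that sends the same kind of content to three different writers. The persist function is copied from above; `persistToString` is a hypothetical helper added only for this illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"os"
	"strings"
)

// persist is copied from the article.
func persist(content []byte, w io.Writer) error {
	_, err := w.Write(content)
	if err != nil {
		log.Println("Failed to persist contents.")
		return err
	}
	return nil
}

// persistToString is a hypothetical helper: it persists into an in-memory
// strings.Builder and returns whatever was written.
func persistToString(content []byte) (string, error) {
	var sb strings.Builder
	if err := persist(content, &sb); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// A bytes.Buffer satisfies io.Writer.
	var buf bytes.Buffer
	if err := persist([]byte("to a buffer"), &buf); err != nil {
		log.Fatal(err)
	}
	fmt.Println(buf.String())

	// So does a strings.Builder.
	s, err := persistToString([]byte("to a builder"))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(s)

	// And so does an *os.File, such as os.Stdout.
	if err := persist([]byte("to a file\n"), os.Stdout); err != nil {
		log.Fatal(err)
	}
}
```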
The slurp function is next. It takes a pdf.Reader, which is a type from the pdf library we use. We will slurp all of the relevant content (text) from the PDF and return a slice of bytes.
func slurp(r *pdf.Reader) []byte {
	var bs []byte
	total := r.NumPage()
	for i := 1; i <= total; i++ {
		p := r.Page(i)
		if p.V.IsNull() {
			continue
		}
		rows, _ := p.GetTextByRow()
		for _, row := range rows {
			for _, word := range row.Content {
				bs = append(bs, []byte(word.S)...)
			}
		}
	}
	return bs
}
By returning a slice of bytes instead of a string, we can hand the result straight to a generic interface such as io.Writer (like in persist). This works because the Write method that io.Writer declares takes a slice of bytes as an argument, which makes the return value of slurp compatible with the content argument of persist.
Next, the function that ties them all together, run:
func run(args []string, out io.Writer) error {
	log.SetOutput(out)
	if len(args) < 3 {
		return fmt.Errorf("Expected at least 2 arguments, got %d.", len(args)-1)
	}
	pdfFile, r, err := pdf.Open(args[1])
	if err != nil {
		return err
	}
	defer pdfFile.Close()
	contents := slurp(r)
	txtFile, err := os.Create(args[2])
	if err != nil {
		return err
	}
	defer txtFile.Close()
	err = persist(contents, txtFile)
	if err != nil {
		return err
	}
	return nil
}
run’s role is to check the arguments received from main and to point the logger at the io.Writer that should receive its output. Then it opens the PDF file for reading and passes the resulting pdf.Reader to the slurp function.
Once slurp returns the contents of the file, which is just a slice of bytes, run will create a new file for writing, called txtFile. Once it opens the file, it will send the contents and the txtFile to persist as arguments.
As we already saw above, persist will save the contents to the file and return any potential errors. If no errors are returned, run exits successfully.
Lastly, the straightforward main function:
func main() {
	if err := run(os.Args, os.Stdout); err != nil {
		log.Fatal(err)
	}
}
Locally, I have a simple PDF with some text inside that I’ve found online. Its contents, according to the author, are popular interview questions. We can run the above program with any PDF, and as long as it finds some text inside, it will save that text to an output file.
Here it is in action:
$ go run main.go input.pdf out.txt
$ cat out.txt
50 Common Interview Questions and Answers1. Tell me about yourself: The most often asked question in interviews. ...
That’s really it. Our program took the contents from input.pdf and stored them in out.txt. Let’s see how we can test this program.
Testing persist
The persist function does not do much. In fact, it just invokes the Write method of the io.Writer it receives. In our program that writer is a file, which is part of the standard library, so we do not need to test the writing itself. But given that there is some custom error handling, we can add some tests to get closer to full test coverage.
TestPersist, in all of its glory:
func TestPersist(t *testing.T) {
	tt := []struct {
		name    string
		content []byte
		out     func() (io.ReadWriter, error)
	}{
		{
			name:    "WithNoContent",
			content: []byte{},
			out: func() (io.ReadWriter, error) {
				return os.Create("empty.txt")
			},
		},
		{
			name:    "WithContent",
			content: []byte("Some content"),
			out: func() (io.ReadWriter, error) {
				return os.Create("not-empty.txt")
			},
		},
	}
	for _, tc := range tt {
		t.Run(tc.name, func(t *testing.T) {
			f, err := tc.out()
			if err != nil {
				t.Fatalf("Cannot create output file: %s", err)
			}
			err = persist(tc.content, f)
			if err != nil {
				t.Fatalf("Cannot persist to output file: %s", err)
			}
			// Writing leaves the offset at the end of the file, so rewind
			// before reading the content back.
			if s, ok := f.(io.Seeker); ok {
				if _, err = s.Seek(0, io.SeekStart); err != nil {
					t.Fatalf("Cannot rewind test output file: %s", err)
				}
			}
			// io.ReadFull fills the whole buffer, so size it to the
			// content we expect to find.
			b := make([]byte, len(tc.content))
			if _, err = io.ReadFull(f, b); err != nil {
				t.Fatalf("Cannot read test output file: %s", err)
			}
			if !bytes.Equal(b, tc.content) {
				t.Errorf("Persisted content is different than saved content.")
			}
		})
	}
}
Each of the test cases, part of the table-driven tests, contains the name of the test case, the content that it will persist, and the out function, which will create the output file.
In the test itself, we create a subtest for each of the test cases, which will try to write the content to a file, and then it will read all of the content back from the file. If the persisted content and the test case content are the same, then the test successfully passes.
If we run the tests, this is the output we will see:
$ go test -v -run TestPersist
=== RUN TestPersist
=== RUN TestPersist/WithNoContent
=== RUN TestPersist/WithContent
--- PASS: TestPersist (0.00s)
--- PASS: TestPersist/WithNoContent (0.00s)
--- PASS: TestPersist/WithContent (0.00s)
PASS
ok github.com/fteem/go-playground/testing-in-go-leak-test-files 0.250s
We ran the two test cases where we try to persist a file with no content and some content. Both of them passed, and we can move on!
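The error-handling branch of persist is the part the file-based test cases never reach. One way to exercise it, sketched here under the assumption that an in-memory writer is acceptable for this check, is a writer that always fails. The `failingWriter` type, the `errDiskFull` error, and the `checkPersist` function are all hypothetical names introduced for this illustration.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"log"
)

// persist is copied from the article.
func persist(content []byte, w io.Writer) error {
	_, err := w.Write(content)
	if err != nil {
		log.Println("Failed to persist contents.")
		return err
	}
	return nil
}

// errDiskFull is a made-up error used only in this sketch.
var errDiskFull = errors.New("disk full")

// failingWriter is a hypothetical io.Writer whose Write always fails.
type failingWriter struct{}

func (failingWriter) Write(p []byte) (int, error) {
	return 0, errDiskFull
}

// checkPersist exercises both paths of persist without touching the disk.
// It returns nil when persist behaves as expected.
func checkPersist() error {
	content := []byte("some content")

	// Happy path: the buffer must hold exactly what we persisted.
	var buf bytes.Buffer
	if err := persist(content, &buf); err != nil {
		return fmt.Errorf("unexpected error: %w", err)
	}
	if !bytes.Equal(buf.Bytes(), content) {
		return errors.New("persisted content differs from input")
	}

	// Error path: persist must hand back the writer's error.
	if err := persist(content, failingWriter{}); !errors.Is(err, errDiskFull) {
		return fmt.Errorf("expected errDiskFull, got %v", err)
	}
	return nil
}

func main() {
	log.SetOutput(io.Discard) // silence persist's log line for the demo
	if err := checkPersist(); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok")
}
```

In a real suite the two checks would live in ordinary subtests; they are wrapped in a plain function here only so the sketch runs on its own.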
Testing slurp
The slurp function is more involved. It requires two different test files: two dummy PDFs, one with some content and one without (empty). Then, by passing the two different files to slurp, we can test if extracting the text from the PDF works as intended.
This is the test:
func TestSlurp(t *testing.T) {
	tt := []struct {
		name    string
		pdfPath string
		size    int
	}{
		{
			name:    "PDFWithContent",
			pdfPath: "testdata/content.pdf",
			size:    11463,
		},
		{
			name:    "PDFWithoutContent",
			pdfPath: "testdata/empty.pdf",
			size:    0,
		},
	}
	for _, tc := range tt {
		t.Run(tc.name, func(t *testing.T) {
			pdfFile, r, err := pdf.Open(tc.pdfPath)
			if err != nil {
				t.Fatalf("Couldn't open PDF %s, error: %s", tc.pdfPath, err)
			}
			defer pdfFile.Close()
			contents := slurp(r)
			if len(contents) != tc.size {
				t.Errorf("Expected contents to be %d bytes, got %d", tc.size, len(contents))
			}
		})
	}
}
Each of the test cases from the table-driven tests has a pdfPath, which points to an actual PDF file on disk. For each of the test cases, a subtest will be run, which will open the PDF using the ledongthuc/pdf library. We will then pass the resulting pdf.Reader to the slurp function, expecting it to return the contents.
Once it returns the content, we simply compare sizes: the number of bytes we expect (tc.size) against len(contents), the number of bytes returned. If the sizes match, we assume that the content is correct, and the test will pass.
Here’s the test in action:
$ go test -v -run TestSlurp
=== RUN TestSlurp
=== RUN TestSlurp/PDFWithContent
=== RUN TestSlurp/PDFWithoutContent
--- PASS: TestSlurp (0.03s)
--- PASS: TestSlurp/PDFWithContent (0.03s)
--- PASS: TestSlurp/PDFWithoutContent (0.00s)
PASS
ok github.com/fteem/go-playground/testing-in-go-leak-test-files 0.253s
Testing run
The run function is what glues everything together. It validates the arguments, then opens the PDF for reading, slurps all of the contents using slurp, and lastly saves all of the text content to the TXT file using persist.
Here’s the test:
func TestRun(t *testing.T) {
	tt := []struct {
		name   string
		input  string
		output string
	}{
		{
			name:   "WithValidArguments",
			input:  "testdata/input.pdf",
			output: "testdata/output.txt",
		},
		{
			name:   "WithEmptyInput",
			input:  "testdata/empty.pdf",
			output: "testdata/output.txt",
		},
	}
	for _, tc := range tt {
		t.Run(tc.name, func(t *testing.T) {
			err := run([]string{"foo", tc.input, tc.output}, os.Stdout)
			if err != nil {
				t.Fatalf("Expected no error, got: %s", err)
			}
			if _, err := os.Stat(tc.output); os.IsNotExist(err) {
				t.Errorf("Expected persisted file at %s, did not find it: %s", tc.output, err)
			}
		})
	}
}
In TestRun, we check that, for each of the test PDFs we provide, the run function creates the corresponding TXT file. We do not care about the actual contents here; we can assume that the rest of the unit tests cover that part of the functionality.
Then, for each test case, we use os.Stat, which will return an error if the file does not exist. If the file does exist, we consider the run function as properly functioning and mark the test as passed.
Here’s the test in action:
$ go test -v -run TestRun
=== RUN TestRun
=== RUN TestRun/WithValidArguments
=== RUN TestRun/WithEmptyInput
--- PASS: TestRun (0.03s)
--- PASS: TestRun/WithValidArguments (0.03s)
--- PASS: TestRun/WithEmptyInput (0.00s)
PASS
ok github.com/fteem/go-playground/testing-in-go-leak-test-files 0.106s
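The existence check that TestRun relies on can be looked at in isolation. The sketch below wraps os.Stat in a hypothetical `fileExists` helper that, unlike the inline check in the test, also reports errors other than "does not exist" (a permission problem, for example) instead of silently treating them as success. The `demo` function and its scratch directory are likewise assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
)

// fileExists is a hypothetical helper, not part of the article's program.
// It reports whether path exists and surfaces any error other than
// "does not exist".
func fileExists(path string) (bool, error) {
	_, err := os.Stat(path)
	switch {
	case err == nil:
		return true, nil
	case os.IsNotExist(err):
		return false, nil
	default:
		return false, err
	}
}

// demo checks a path before and after a file is written to it, inside a
// scratch directory that is removed before demo returns.
func demo() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "exists-demo-")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "output.txt")
	before, err := fileExists(path)
	if err != nil {
		return false, false, err
	}
	if err := os.WriteFile(path, []byte("text"), 0o644); err != nil {
		return before, false, err
	}
	after, err := fileExists(path)
	return before, after, err
}

func main() {
	before, after, err := demo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("before:", before, "after:", after) // prints "before: false after: true"
}
```

os.MkdirTemp and os.WriteFile need Go 1.16 or newer; on older versions ioutil.TempDir and ioutil.WriteFile play the same role.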
We can also test the returned errors. We will create another function called TestRunErrors, which will cover the potential errors returned by run. Here’s the test function:
func TestRunErrors(t *testing.T) {
	tt := []struct {
		name   string
		input  string
		output string
	}{
		{
			name:   "WithoutArguments",
			input:  "",
			output: "",
		},
		{
			name:   "WithoutOneArgument",
			input:  "testdata/input.pdf",
			output: "",
		},
		{
			name:   "WithNonexistentInput",
			input:  "testdata/nonexistent.pdf",
			output: "testdata/output.txt",
		},
	}
	for _, tc := range tt {
		t.Run(tc.name, func(t *testing.T) {
			err := run([]string{"foo", tc.input, tc.output}, os.Stdout)
			if err == nil {
				t.Fatalf("Expected an error, did not get one.")
			}
		})
	}
}
TestRunErrors is similar to TestRun, but it focuses on the returned errors. It checks that the run function returns an error for each of the bad inputs it receives. We could take this a step further by implementing sentinel errors and asserting on them, but this will do just fine for this article.
Here’s the TestRunErrors function in action:
$ go test -v -run TestRunErrors
=== RUN TestRunErrors
=== RUN TestRunErrors/WithoutArguments
=== RUN TestRunErrors/WithoutOneArgument
=== RUN TestRunErrors/WithNonexistentInput
--- PASS: TestRunErrors (0.02s)
--- PASS: TestRunErrors/WithoutArguments (0.00s)
--- PASS: TestRunErrors/WithoutOneArgument (0.02s)
--- PASS: TestRunErrors/WithNonexistentInput (0.00s)
PASS
ok github.com/fteem/go-playground/testing-in-go-leak-test-files 0.140s
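One detail worth noticing: every case in TestRunErrors still passes three arguments, so the empty strings make run fail when it opens the PDF or creates the TXT file, and the len(args) < 3 branch is never reached. A minimal sketch of what a case for that branch would need to pass, using a hypothetical `checkArgs` helper that mirrors only the first check in run:

```go
package main

import "fmt"

// checkArgs is a hypothetical helper that mirrors only the argument-count
// check at the top of run; the real run function goes on to open files.
func checkArgs(args []string) error {
	if len(args) < 3 {
		return fmt.Errorf("expected at least 2 arguments, got %d", len(args)-1)
	}
	return nil
}

func main() {
	cases := [][]string{
		{"foo"},                       // no arguments at all
		{"foo", "testdata/input.pdf"}, // output path missing
		{"foo", "", ""},               // what the "WithoutArguments" case passes
	}
	for _, args := range cases {
		fmt.Printf("%d argument(s): %v\n", len(args)-1, checkArgs(args))
	}
}
```

Only the first two slices trip the check; the third passes it, because two empty strings still count as two arguments.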
Cleaning up after our test
If you were following along, you should notice that test files are created in your project’s directory. The files stay there because we never clean up the output files that our program creates when the tests run.
Starting from Go v1.15, there will be a nice way to do this: TB.TempDir().
To clean up the test files, we can use TB.TempDir as the parent directory wherever we pass an output file path. Once the test and all of its subtests complete, whether they pass or fail, Go will automatically get rid of this directory, without us having to do any clean-up.
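For comparison, here is a rough sketch of the manual clean-up that TB.TempDir saves us from writing. It is not how the testing package implements TempDir; the `withTempDir`, `exists`, and `demo` functions are hypothetical and only illustrate the create, use, and remove cycle.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
)

// withTempDir is a hypothetical helper showing manual clean-up: it creates
// a directory, hands it to fn, and removes it once fn returns.
func withTempDir(fn func(dir string) error) error {
	dir, err := os.MkdirTemp("", "leak-demo-")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir) // runs even when fn returns an error

	return fn(dir)
}

// exists reports whether path is present on disk.
func exists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

// demo writes a file inside the temporary directory and returns its path,
// so the caller can verify that nothing was left behind.
func demo() (string, error) {
	var out string
	err := withTempDir(func(dir string) error {
		out = filepath.Join(dir, "output.txt")
		return os.WriteFile(out, []byte("text"), 0o644)
	})
	return out, err
}

func main() {
	out, err := demo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("left behind:", exists(out)) // prints "left behind: false"
}
```

Every test that writes files would have to repeat this dance, or share a helper like it, which is exactly the boilerplate a built-in temporary directory removes.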
First, let’s see how we can change the TestPersist function to clean up the empty.txt and not-empty.txt files it creates:
func TestPersist(t *testing.T) {
	tt := []struct {
		name    string
		content []byte
		out     func() (io.ReadWriter, error)
	}{
		{
			name:    "WithNoContent",
			content: []byte{},
			out: func() (io.ReadWriter, error) {
				return os.Create(filepath.Join(t.TempDir(), "empty.txt"))
			},
		},
		{
			name:    "WithContent",
			content: []byte("Some content"),
			out: func() (io.ReadWriter, error) {
				return os.Create(filepath.Join(t.TempDir(), "not-empty.txt"))
			},
		},
	}
	for _, tc := range tt {
		// Snipped...
	}
}
The only notable change is to use filepath.Join with t.TempDir() and the file name as arguments. This combination composes a valid temporary path that Go will remove once the tests finish. Given that at the time of writing this, Go 1.15 is still not out, we can use the gotip tool to run Go’s latest master version:
$ gotip test -v -run TestPersist
=== RUN TestPersist
=== RUN TestPersist/WithNoContent
=== RUN TestPersist/WithContent
--- PASS: TestPersist (0.00s)
--- PASS: TestPersist/WithNoContent (0.00s)
--- PASS: TestPersist/WithContent (0.00s)
PASS
ok github.com/fteem/go-playground/testing-in-go-leak-test-files 0.048s
If we inspect the project root, we will see that no new files are being created. The output files are cleaned up after the tests have finished running.
The gotip tool compiles and runs the go command from the development tree. Using the gotip command instead of the normal go command will run the latest version of the language, as found on the main Git branch. You can see its documentation for more details.
Next, we can make the same change to the TestRun function:
func TestRun(t *testing.T) {
	tt := []struct {
		name   string
		input  string
		output string
	}{
		{
			name:   "WithValidArguments",
			input:  "testdata/input.pdf",
			output: filepath.Join(t.TempDir(), "output.txt"),
		},
		{
			name:   "WithEmptyInput",
			input:  "testdata/empty.pdf",
			output: filepath.Join(t.TempDir(), "output.txt"),
		},
	}
	for _, tc := range tt {
		// Same as before...
	}
}
In the TestRun function, we use the same trick: we use T.TempDir to build the path of the output file. Because the -run TestRun pattern also matches TestRunErrors, both test functions run. Running the test the same way, we can see T.TempDir once more in action:
$ gotip test -v -run TestRun
=== RUN TestRun
=== RUN TestRun/WithValidArguments
=== RUN TestRun/WithEmptyInput
--- PASS: TestRun (0.03s)
--- PASS: TestRun/WithValidArguments (0.03s)
--- PASS: TestRun/WithEmptyInput (0.00s)
=== RUN TestRunErrors
=== RUN TestRunErrors/WithoutArguments
=== RUN TestRunErrors/WithoutOneArgument
=== RUN TestRunErrors/WithNonexistentInput
--- PASS: TestRunErrors (0.02s)
--- PASS: TestRunErrors/WithoutArguments (0.00s)
--- PASS: TestRunErrors/WithoutOneArgument (0.02s)
--- PASS: TestRunErrors/WithNonexistentInput (0.00s)
PASS
ok github.com/fteem/go-playground/testing-in-go-leak-test-files 0.265s
If you want to check out the motivation and the discussion around this addition to the new Go version 1.15, you can head over to the original proposal.