Ilija Eftimov

A human, interested in building products and software.

22 May 2020

Testing in Go: Stop Leaking Files

No test suite is perfect. Some test suites are missing good helper functions; others are under-configured or over-customized. Some include obsolete packages and are left unmaintained. Folks who have experience with more mature projects will likely agree that all of the above can be found in the wild.

Often, when we test our Go programs, we need to create files. Such files can be simple fixture files or whole file trees that set up the correct environment for the tests to run. Once the tests finish running, we have to clean them up so they don't linger around.

If the test suite already has helpers for managing test files, we should rely on them. Unfortunately, not all test suites have such clean-up helpers set up. Sometimes, you might find a few different implementations instead of one obvious way to do it.

Coming up in Go v1.15, the testing package will improve the support for creating temporary test files and directories. Let's see how we can put them to use.

Converting PDFs to TXT files

While thinking about a small program that will aid our understanding of leaking test files, I asked myself, “What is a program that generates files?”. Because I work for a company that does a lot with documents, I thought, “let's do something straightforward with PDFs”.

Imagine we have a program that extracts text out of PDFs. Why? Well, for one, if we want to know how long it will take us to read the PDF, we can take the number of words in a PDF and divide it by the average reading speed (which, according to Google, is 250 words per minute).

But to do that, we first have to take a PDF and create a TXT file with all of the sentences inside. To do that, we can use one of the many PDF parsing libraries for Go.

Here's the code that takes a PDF, extracts all rows of text from it, and saves them in a TXT file.

First, the persist function:

func persist(content []byte, w io.Writer) error {
	_, err := w.Write(content)
	if err != nil {
		log.Println("Failed to persist contents.")
		return err
	}

	return nil
}

It takes some bytes as content and persists them to an io.Writer: it can be a file, a strings.Builder or a different type that implements the io.Writer interface. The writer argument is an interface by design, so the function is more liberal in what it accepts. You will see why when we test this function.

The slurp function is next. It will take a pdf.Reader type, which is a type from the pdf library we use. We will slurp all of the relevant content (text) from the PDF and return a slice of bytes.

func slurp(r *pdf.Reader) []byte {
	var bs []byte

	total := r.NumPage()
	for i := 1; i <= total; i++ {
		p := r.Page(i)
		if p.V.IsNull() {
			continue
		}

		rows, _ := p.GetTextByRow()
		for _, row := range rows {
			for _, word := range row.Content {
				bs = append(bs, []byte(word.S)...)
			}
		}
	}
	return bs
}

By returning a slice of bytes instead of a string, we can hand the result straight to a generic interface such as io.Writer (like in persist). This works because the Write method that io.Writer declares takes a slice of bytes as an argument, making the return value of slurp and the arguments of persist compatible.

Next, the function that ties them all together – run:

func run(args []string, out io.Writer) error {
	log.SetOutput(out)

	if len(args) < 3 {
		return fmt.Errorf("Expected at least 2 arguments, got %d.", len(args)-1)
	}

	pdfFile, r, err := pdf.Open(args[1])
	if err != nil {
		return err
	}
	defer pdfFile.Close()

	contents := slurp(r)

	txtFile, err := os.Create(args[2])
	if err != nil {
		return err
	}
	defer txtFile.Close()

	err = persist(contents, txtFile)
	if err != nil {
		return err
	}

	return nil
}

run's role is to check the arguments received from main and to point the logger at the io.Writer it receives, so that all log output goes there. Then it opens the PDF file for reading and passes the resulting pdf.Reader to the slurp function.

Once slurp returns the contents of the file, which is just a slice of bytes, run will create a new file for writing, called txtFile. Once it has created the file, it will send the contents and the txtFile to persist as arguments.

As we already saw above, persist will save the contents to the file and return any potential errors. If no errors are returned, run successfully exits.

Lastly, the straightforward main function:

func main() {
	if err := run(os.Args, os.Stdout); err != nil {
		log.Fatal(err)
	}
}

Locally, I have a simple PDF with some text inside that I've found online. Its contents, according to the author, are popular interview questions. We can run the above program with any PDF, and as long as it finds some text inside, it will save it to an output file.

Here it is in action:

$ go run main.go input.pdf out.txt
$ cat out.txt
50 Common Interview Questions and Answers1. Tell me about yourself: The most often asked question in interviews. ...

That's really it. Our program took the contents from input.pdf and stored them in out.txt. Let's see how we can test this program.

Testing persist

The persist function does not do much. In fact, it just invokes the Write method of the io.Writer it receives. When that writer is a file, Write is part of the standard library, so we do not need to test it. But given that persist has some custom error handling, we can add some tests and strive for full test coverage.

TestPersist, in all of its glory:

func TestPersist(t *testing.T) {
	tt := []struct {
		name    string
		content []byte
		out     func() (io.ReadWriter, error)
	}{
		{
			name:    "WithNoContent",
			content: []byte{},
			out: func() (io.ReadWriter, error) {
				return os.Create("empty.txt")
			},
		},
		{
			name:    "WithContent",
			content: []byte("some content"),
			out: func() (io.ReadWriter, error) {
				return os.Create("not-empty.txt")
			},
		},
	}

	for _, tc := range tt {
		t.Run(tc.name, func(t *testing.T) {
			f, err := tc.out()
			if err != nil {
				t.Fatalf("Cannot create output file: %s", err)
			}

			err = persist(tc.content, f)
			if err != nil {
				t.Fatalf("Cannot persist to output file: %s", err)
			}

			// persist leaves the file offset at the end of the data it wrote,
			// so rewind before reading the content back.
			if s, ok := f.(io.Seeker); ok {
				if _, err = s.Seek(0, io.SeekStart); err != nil {
					t.Fatalf("Cannot rewind test output file: %s", err)
				}
			}

			// io.ReadAll needs Go 1.16+; use ioutil.ReadAll on older versions.
			b, err := io.ReadAll(f)
			if err != nil {
				t.Fatalf("Cannot read test output file: %s", err)
			}

			if !bytes.Equal(b, tc.content) {
				t.Errorf("Persisted content is different than saved content.")
			}
		})
	}
}

Each of the test cases, part of the table-driven tests, contains the name of the test case, the content that it will persist, and the out function, which will create the output file.

In the test itself, we create a subtest for each of the test cases, which will try to write the content to a file, and then it will read all of the content back from the file. If the persisted content and the test case content are the same, then the test successfully passes.

If we run the tests, this is the output we will see:

$ go test -v -run TestPersist
=== RUN   TestPersist
=== RUN   TestPersist/WithNoContent
=== RUN   TestPersist/WithContent
--- PASS: TestPersist (0.00s)
    --- PASS: TestPersist/WithNoContent (0.00s)
    --- PASS: TestPersist/WithContent (0.00s)
PASS
ok  	github.com/fteem/go-playground/testing-in-go-leak-test-files	0.250s

We ran the two test cases where we try to persist a file with no content and some content. Both of them passed, and we can move on!

Testing slurp

The slurp function is more involved. It requires two different test files: one dummy PDF with some content and one that is empty. Then, by passing the two different files to slurp, we can test if extracting the text from the PDF works as intended.

This is the test:

func TestSlurp(t *testing.T) {
	tt := []struct {
		name    string
		pdfPath string
		size    int
	}{
		{
			name:    "PDFWithContent",
			pdfPath: "testdata/content.pdf",
			size:    11463,
		},
		{
			name:    "PDFWithoutContent",
			pdfPath: "testdata/empty.pdf",
			size:    0,
		},
	}

	for _, tc := range tt {
		t.Run(tc.name, func(t *testing.T) {
			pdfFile, r, err := pdf.Open(tc.pdfPath)
			if err != nil {
				t.Fatalf("Couldn't open PDF %s, error: %s", tc.pdfPath, err)
			}
			defer pdfFile.Close()

			contents := slurp(r)

			if len(contents) != tc.size {
				t.Errorf("Expected contents to be %d bytes, got %d", tc.size, len(contents))
			}
		})
	}
}

Each of the test cases from the table-driven tests will have a pdfPath, which is an actual PDF file on disk. For each of the test cases, a subtest will be run, which will open the PDF using the ledongthuc/pdf library. We will then pass the returned pdf.Reader to the slurp function, expecting the contents to be returned by slurp.

Once it returns the content, we simply compare sizes: the number of bytes we expect (tc.size) against len(contents), the number of bytes actually returned. If the sizes match, we assume that the content is correct, and the test will pass.

Here's the test in action:

$ go test -v -run TestSlurp
=== RUN   TestSlurp
=== RUN   TestSlurp/PDFWithContent
=== RUN   TestSlurp/PDFWithoutContent
--- PASS: TestSlurp (0.03s)
    --- PASS: TestSlurp/PDFWithContent (0.03s)
    --- PASS: TestSlurp/PDFWithoutContent (0.00s)
PASS
ok  	github.com/fteem/go-playground/testing-in-go-leak-test-files	0.253s

Testing run

The run function is what glues everything together. It validates the arguments, then opens the PDF for reading, slurps all of the contents using slurp, and lastly saves all of the text content to the TXT file using persist.

Here's the test:

func TestRun(t *testing.T) {
	tt := []struct {
		name   string
		input  string
		output string
	}{
		{
			name:   "WithValidArguments",
			input:  "testdata/input.pdf",
			output: "testdata/output.txt",
		},
		{
			name:   "WithEmptyInput",
			input:  "testdata/empty.pdf",
			output: "testdata/output.txt",
		},
	}

	for _, tc := range tt {
		t.Run(tc.name, func(t *testing.T) {
			err := run([]string{"foo", tc.input, tc.output}, os.Stdout)
			if err != nil {
				t.Fatalf("Expected no error, got:  %s", err)
			}

			if _, err := os.Stat(tc.output); os.IsNotExist(err) {
				t.Errorf("Expected persisted file at %s, did not find it: %s", tc.output, err)
			}
		})
	}
}

In TestRun, we check if, for each of the test PDFs we provide, the run function creates the corresponding TXT file. In TestRun, we do not care about the actual contents – we can assume that the rest of the unit tests cover that part of the functionality.

Then, for each test case, we use os.Stat, which will return an error if the file does not exist. If the file does exist, we consider the run function as properly functioning and mark the test as passed.

Here's the test in action:

$ go test -v -run TestRun
=== RUN   TestRun
=== RUN   TestRun/WithValidArguments
=== RUN   TestRun/WithEmptyInput
--- PASS: TestRun (0.03s)
    --- PASS: TestRun/WithValidArguments (0.03s)
    --- PASS: TestRun/WithEmptyInput (0.00s)
PASS
ok  	github.com/fteem/go-playground/testing-in-go-leak-test-files	0.106s

We can also test the returned errors. We will create another function called TestRunErrors, which will cover the potential errors returned by run. Here's the test function:

func TestRunErrors(t *testing.T) {
	tt := []struct {
		name string
		args []string
	}{
		{
			// Only the program name, so run fails its argument-count check.
			name: "WithoutArguments",
			args: []string{"foo"},
		},
		{
			// The output path is missing.
			name: "WithoutOneArgument",
			args: []string{"foo", "testdata/input.pdf"},
		},
		{
			name: "WithNonexistentInput",
			args: []string{"foo", "testdata/nonexistent.pdf", "testdata/output.txt"},
		},
	}

	for _, tc := range tt {
		t.Run(tc.name, func(t *testing.T) {
			err := run(tc.args, os.Stdout)

			if err == nil {
				t.Fatalf("Expected an error, did not get one.")
			}
		})
	}
}

TestRunErrors is similar to TestRun, but it focuses on the returned errors. It checks that the run function returns an error for each of the bad inputs it receives. We could take this a step further by implementing sentinel errors and asserting on them, but this will do just fine for this article.

Here's the TestRunErrors function in action:

$ go test -v -run TestRunErrors
=== RUN   TestRunErrors
=== RUN   TestRunErrors/WithoutArguments
=== RUN   TestRunErrors/WithoutOneArgument
=== RUN   TestRunErrors/WithNonexistentInput
--- PASS: TestRunErrors (0.02s)
    --- PASS: TestRunErrors/WithoutArguments (0.00s)
    --- PASS: TestRunErrors/WithoutOneArgument (0.02s)
    --- PASS: TestRunErrors/WithNonexistentInput (0.00s)
PASS
ok  	github.com/fteem/go-playground/testing-in-go-leak-test-files	0.140s

Cleaning up after our test

If you were following along, you should notice that test files are created in your project's directory. The files stay there because we never clean up the output files that our program creates when the tests run.

Starting from Go v1.15, there will be a nice way to do this: TB.TempDir(). To clean up the test files, we can use the directory that TempDir returns as the parent directory wherever we pass an output file path. Once the test and all of its subtests complete, Go automatically removes this directory, without us having to do any clean-up.

First, let's see how we can change the TestPersist function to clean up the empty.txt and not-empty.txt files it creates:

func TestPersist(t *testing.T) {
	tt := []struct {
		name    string
		content []byte
		out     func() (io.ReadWriter, error)
	}{
		{
			name:    "WithNoContent",
			content: []byte{},
			out: func() (io.ReadWriter, error) {
				return os.Create(filepath.Join(t.TempDir(), "empty.txt"))
			},
		},
		{
			name:    "WithContent",
			content: []byte("some content"),
			out: func() (io.ReadWriter, error) {
				return os.Create(filepath.Join(t.TempDir(), "not-empty.txt"))
			},
		},
	}

	for _, tc := range tt {
		// Snipped...
	}
}

The only notable change is the use of filepath.Join with t.TempDir() and the file name as arguments. This combo composes a path inside a temporary directory that Go will remove once the tests finish. Given that Go 1.15 is still not out at the time of writing, we can use the gotip tool to run Go's latest master version:

$ gotip test -v -run TestPersist
=== RUN   TestPersist
=== RUN   TestPersist/WithNoContent
=== RUN   TestPersist/WithContent
--- PASS: TestPersist (0.00s)
    --- PASS: TestPersist/WithNoContent (0.00s)
    --- PASS: TestPersist/WithContent (0.00s)
PASS
ok  	github.com/fteem/go-playground/testing-in-go-leak-test-files	0.048s

If we inspect the project root, we will see that no new files are being created. The output files are cleaned up after the tests have finished running.

Note

The gotip tool compiles and runs the go command from the development tree. Using the gotip command, instead of the normal go command, will run the latest version of the language, as seen in the main Git trunk.

You can see its documentation for more details.

Next, we can make the same change to the TestRun function:

func TestRun(t *testing.T) {
	tt := []struct {
		name   string
		input  string
		output string
	}{
		{
			name:   "WithValidArguments",
			input:  "testdata/input.pdf",
			output: filepath.Join(t.TempDir(), "output.txt"),
		},
		{
			name:   "WithEmptyInput",
			input:  "testdata/empty.pdf",
			output: filepath.Join(t.TempDir(), "output.txt"),
		},
	}

	for _, tc := range tt {
		// Same as before...
	}
}

In the TestRun function, we use the same trick: we use T.TempDir to build the path of the output file. Running the test the same way, we can see T.TempDir in action once more. Note that the -run TestRun pattern also matches TestRunErrors, so both appear in the output:

$ gotip test -v -run TestRun
=== RUN   TestRun
=== RUN   TestRun/WithValidArguments
=== RUN   TestRun/WithEmptyInput
--- PASS: TestRun (0.03s)
    --- PASS: TestRun/WithValidArguments (0.03s)
    --- PASS: TestRun/WithEmptyInput (0.00s)
=== RUN   TestRunErrors
=== RUN   TestRunErrors/WithoutArguments
=== RUN   TestRunErrors/WithoutOneArgument
=== RUN   TestRunErrors/WithNonexistentInput
--- PASS: TestRunErrors (0.02s)
    --- PASS: TestRunErrors/WithoutArguments (0.00s)
    --- PASS: TestRunErrors/WithoutOneArgument (0.02s)
    --- PASS: TestRunErrors/WithNonexistentInput (0.00s)
PASS
ok  	github.com/fteem/go-playground/testing-in-go-leak-test-files	0.265s

If you want to check out the motivation and the discussion around this addition to the new Go version 1.15, you can head over to the original proposal.
