“The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.” - The Art of Computer Programming
Why bother?
Profiling is a technique to analyze how an application uses computational resources, such as CPU and memory. It’s very useful for spotting snippets of code with high resource consumption and excessive function calls. The goal of profiling is to find sections of code that can be optimized, as well as opportunities to improve code performance.
When we talk about code optimization, one of the most important things is knowing when to optimize our programs. We should be aware that optimization often comes at the expense of code readability and design. In most situations, it’s more valuable to have clean code with a flexible design than high-performance code with low maintainability. We don’t need code with the best possible performance, we just need code with enough performance.
Moreover, most of the time, the bulk of the cost comes from a few small sections of our code. Given that, it’s wise to focus on optimizing those few sections instead of spending time on sections that will have little or no impact on the overall program performance.
How does it work?
We can think about the process of profiling and code optimization in three steps:
- Collecting data: the profiling tool collects samples of CPU and memory usage from the running program. To do that, we can use the testing tool (go test) or the net/http/pprof package (in this case, an endpoint is exposed in our server; see the sketch after this list);
- Visualizing data: we can visualize the collected samples both in the terminal and in a web interface, using the go tool pprof command;
- Optimizing the program: at last, we decide how to optimize our program (and whether we should optimize it at all). In general, there are two ways of doing that: we can change the data structures or we can change the algorithm.
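For the second collection option, here is a minimal sketch of how the profiling endpoints could be exposed over HTTP with net/http/pprof (the localhost:6060 address is just a conventional placeholder):

package main

import (
    "log"
    "net/http"
    // The blank import registers the /debug/pprof/* handlers on the default HTTP mux.
    _ "net/http/pprof"
)

func main() {
    // The profiling endpoints are now served by the default mux,
    // e.g. http://localhost:6060/debug/pprof/heap for a heap profile.
    log.Println(http.ListenAndServe("localhost:6060", nil))
}

With that running, go tool pprof http://localhost:6060/debug/pprof/heap would fetch a heap sample directly from the server. In this post, though, we will stick with the go test approach.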
Example: Memory profiling
The program below sorts the numbers from a file named numbers.txt. This file has one million randomly generated numbers between zero and one million. The function SortNumbers() opens the file, then uses a Scanner from the bufio package to read one line at a time and append each number read to the numbers slice. To simplify the example, let’s keep the numbers as strings and ignore error handling (do not do that at home).
package main
import (
"bufio"
"fmt"
"os"
"sort"
)
func main() {
numbers := SortNumbers("./numbers.txt")
fmt.Println(numbers)
}
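// SortNumbers reads one number per line from the file at filepath and
// returns the lines sorted as strings (i.e., in lexicographic order).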
func SortNumbers(filepath string) []string {
var numbers []string
f, _ := os.Open(filepath)
scanner := bufio.NewScanner(f)
for scanner.Scan() {
numbers = append(numbers, scanner.Text())
}
sort.Strings(numbers)
return numbers
}
1. Collecting data
For this example, we are going to use go test
to collect samples of memory usage.
To do that, we create the following benchmark testing file:
package main
import "testing"
func BenchmarkSortNumbers(b *testing.B) {
for i := 0; i < b.N; i++ {
SortNumbers("./numbers.txt")
}
}
Now we run the following command:
➜ go test -bench SortNumbers -benchmem -memprofile mem.prof ./example/...
-bench accepts a regex pattern matching the names of the benchmarks to run.
-benchmem prints additional information about memory allocations.
-memprofile indicates that we are collecting a memory profile and that the sample data should be saved to the mem.prof file.
In addition to generating two new files, mem.prof and example.test (test binary), the command execution shows us the following output:
goos: linux
goarch: amd64
pkg: example
cpu: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
... 3 497555656 ns/op 96406397 B/op 1000040 allocs/op
PASS
ok example 3.017s
2. Visualizing data
In the output above, we can already notice some relevant information, such as the total number of memory allocations made by our program (1000040). That is enough to trigger the garbage collector a considerable number of times during the program execution.
Now, let’s use the pprof tool to visualize and analyze the samples in more detail. To do that, we run:
➜ go tool pprof mem.prof
pprof
has some interesting subcommands. To see the full list, you can
use the help subcommand at any time. Let’s use the top5 subcommand
to visualize the five most significant function calls.
(pprof) top5
Showing nodes accounting for 558.16MB, 100% of 558.16MB total
Showing top 5 nodes out of 6
flat flat% sum% cum cum%
504.66MB 90.41% 90.41% 558.16MB 100% example.SortNumbers
53.50MB 9.59% 100% 53.50MB 9.59% bufio.(*Scanner).Text (inline)
0 0% 100% 558.16MB 100% example.BenchmarkSortNumbers
0 0% 100% 469.17MB 84.06% testing.(*B).launch
0 0% 100% 88.99MB 15.94% testing.(*B).run1.func1
As SortNumbers() is the only function we have, we can see that all 558.16MB are being allocated within it. Now, let’s use the list subcommand to see more details of that function.
(pprof) list SortNumbers
Total: 558.16MB
ROUTINE ======================== profiling-poc/example.SortNumbers
504.66MB 558.16MB (flat, cum) 100% of Total
. . 21:
. . 22: f, _ := os.Open(filepath)
. . 23:
. . 24: scanner := bufio.NewScanner(f)
. . 25: for scanner.Scan() {
504.66MB 558.16MB 26: numbers = append(numbers, scanner.Text())
. . 27: }
. . 28:
. . 29: sort.Strings(numbers)
. . 30:
. . 31: return numbers
(pprof)
We can see that the memory allocations happen on line 26. Since we append to a slice without preallocating its capacity, the backing array has to grow repeatedly: every time it runs out of room, a larger array is allocated and the existing elements are copied over. (Using the command list Text, we could also identify that only around 25MB were being allocated by the scanner.Text() calls.)
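One way to avoid those repeated reallocations (not the path we take below, just a sketch that assumes we know the file has about one million lines; the name SortNumbersPrealloc is hypothetical) would be to preallocate the slice capacity:

func SortNumbersPrealloc(filepath string) []string {
    // Assumption: numbers.txt has roughly one million lines, so we
    // preallocate that capacity and let append fill the backing array
    // without ever having to grow it.
    numbers := make([]string, 0, 1_000_000)

    f, _ := os.Open(filepath)
    defer f.Close()

    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        numbers = append(numbers, scanner.Text())
    }

    sort.Strings(numbers)
    return numbers
}

In this post, though, we will change the algorithm instead.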
3. Optimizing the program
To optimize our program, we’re going to be using the replace algorithm strategy.
In this case, we are going to replace the implementation of SortNumbers()
function
by the following one:
func SortNumbers(filepath string) []string {
var numbers []string
bs, err := os.ReadFile(filepath)
if err != nil {
return numbers
}
numbers = strings.Split(string(bs), "\n")
sort.Strings(numbers)
return numbers
}
In this implementation, we read the whole content of numbers.txt at once. We also delegate the allocation of numbers to the strings.Split() call. Internally, that function counts the separators first and allocates the result slice once, at exactly the right size. Therefore, we can hope for better results this time.
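As a quick standalone illustration of that behavior (separate from the benchmark), strings.Split allocates the result slice in one go; note that a trailing newline produces a trailing empty string:

package main

import (
    "fmt"
    "strings"
)

func main() {
    // The result slice is allocated once, sized to the number of
    // separator-delimited segments.
    parts := strings.Split("3\n1\n2\n", "\n")
    fmt.Printf("%q (len=%d)\n", parts, len(parts))
    // Prints: ["3" "1" "2" ""] (len=4)
}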
➜ go test -bench SortNumbers -benchmem -memprofile mem.prof ./example/...
goos: linux
goarch: amd64
pkg: example
cpu: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
... 3 495895932 ns/op 29786464 B/op 8 allocs/op
PASS
ok profiling-poc/example 2.978s
Using the go test
command to execute the benchmark tests once again,
we see that we reduced the total number of memory allocations to only 8. Great!
Now let’s analyze the result of go tool pprof
one more time.
(pprof) list SortNumbers
Total: 163.87MB
ROUTINE ======================== example.SortNumbers
39.42MB 163.87MB (flat, cum) 100% of Total
. . 31: //return numbers
. . 32://}
. . 33:
. . 34:func SortNumbers(filepath string) []string {
. . 35: var books []string
. 32.85MB 36: bs, err := os.ReadFile(filepath)
. . 37: if err != nil {
. . 38: return books
. . 39: }
39.42MB 131.02MB 40: books = strings.Split(string(bs), "\n")
. . 41: sort.Strings(books)
. . 42: return books
. . 43:}
. . 44:
. . 45:func generateFile(filepath string, n int) {
163.87MB of memory allocated (compared to 558.16MB in the first implementation). Again, great!
We have a 32.85MB allocation for the bs buffer, which holds the whole content of the numbers.txt file, and a second allocation of 131.02MB, used to split the file content into a slice of strings. Not bad.
What’s more?
In the same way that we profiled the memory usage, we can use the same approach to profile the CPU usage, with the goal of finding functions that take longer to execute or that are called many times. To do that, we just have to run the go test command with the -cpuprofile flag.
➜ go test -bench SortNumbers -cpuprofile cpu.prof ./example/...
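The resulting cpu.prof file can then be explored with the same go tool pprof workflow we used for the memory profile, for example:

➜ go tool pprof cpu.prof
(pprof) top5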
At last, it’s worth remembering that there are other ways of profiling in Go. As mentioned at the beginning of this post, we can use the net/http/pprof package. Besides that, there are tools specific to each operating system: perf on Linux and Instruments on macOS. You can learn more about them by reading the diagnostics page of the language documentation.