Allocations

Overview

In server-side Swift applications, memory allocations are fundamental for various tasks like creating objects, manipulating data structures, and managing resources. Swift allocates memory resources as needed and provides built-in memory management mechanisms, such as automatic reference counting (ARC), to handle allocations, deallocations, and memory ownership.

Allocations aid in optimizing memory usage by allocating the precise amount of memory required for each object or data structure, reducing memory wastage and improving application performance. However, Swift allocations can be padded to enforce memory alignment requirements for data types or structures that need to be accessed efficiently by the hardware, reducing the risk of misaligned memory access issues and improving performance.

Additionally, proper allocation management prevents memory leaks and ensures that memory is released when it is no longer needed. This helps in maintaining the stability and reliability of server applications.

Heaps and stacks

Generally speaking, Swift has two fundamental locations for memory allocations: the heap and the stack.

Swift automatically decides whether a given allocation goes on the heap or on the stack.

For high-performance software in Swift, understanding where your heap allocations come from and reducing the number of allocations your software performs is paramount. Answering these questions is similar to answering other performance questions.

Note: While heap allocations can be relatively expensive regarding computational overhead, they provide flexibility and dynamic memory management capabilities essential for tasks like working with variable-sized or dynamic data structures.

Profiling

You can use different tools and techniques to profile your Swift code, depending on the specific requirements of your project.

For macOS, you can use the Allocations instrument in Xcode Instruments to help you analyze and optimize memory usage in your apps. The Allocations instrument tracks the size and number of all heap and anonymous virtual memory allocations and organizes them by category.

If your production workloads run on Linux instead of macOS, the number of allocations can differ significantly depending on your setup.

This document mainly focuses on the number of heap allocations and not their size.

Getting started

In release mode, Swift’s optimizer produces faster code that allocates less memory. By profiling your Swift code in release mode and optimizing based on the results, you can achieve better performance and efficiency in your applications.

Follow the steps below:

Step 1. Build and run your code in release mode with this command:

swift run -c release

Step 2. Install perf for your environment so that you can gather performance data from your Swift server applications.

Step 3. Clone the FlameGraph project to generate a flame graph visualization that helps you quickly identify hotspots in the codebase, visualize call paths, understand the flow of execution, and optimize performance. To generate a flame graph, you will need to clone the FlameGraph repository on your machine or into a container, making it available at ~/FlameGraph.

Run this command to clone the https://github.com/brendangregg/FlameGraph repository into ~/FlameGraph:

git clone https://github.com/brendangregg/FlameGraph ~/FlameGraph

When running in Docker, use this command to bind-mount the FlameGraph repository into the container:

docker run -it --rm \
           --privileged \
           -v "/path/to/FlameGraphOnYourMachine:/FlameGraph:ro" \
           -v "$PWD:$PWD" -w "$PWD" \
           swift:latest

By visually highlighting the most frequently called functions or the functions consuming the most processing time, you can focus your optimization efforts on improving the performance of critical code paths.

Tools

You can identify areas for optimization and make informed decisions to improve the performance and efficiency of your Swift server code using the Linux perf tool.

The perf tool is a performance profiling and analysis tool available on Linux systems. Although it is not specific to Swift, it is valuable for profiling Swift code on the server.

Tip 1: If you’re running perf in a Docker container, you will need a privileged container to provide the necessary permissions and access to the tool to gather performance data.

Tip 2: Prefix the commands with sudo if you need root access. See Getting perf to work for more information.

Installing a perf user probe

As previously mentioned, this document’s example programs focus on counting the number of allocations.

Most allocations in a Swift program on Linux go through the malloc function. Installing perf user probes on the allocation functions tells you when each of them is called.

The commands below install a user probe on every allocation function, because Swift also uses functions like calloc and posix_memalign.

# figures out the path to libc
libc_path=$(readlink -e /lib64/libc.so.6 /lib/x86_64-linux-gnu/libc.so.6)

# delete all existing user probes on libc (instead of * you can also list them individually)
perf probe --del 'probe_libc:*'

# installs a probe on `malloc`, `calloc`, and `posix_memalign`
perf probe -x "$libc_path" --add malloc --add calloc --add posix_memalign

Subsequently, an event in perf will trigger whenever one of the allocation functions is called.

The output should look like this:

Added new events:
  probe_libc:malloc    (on malloc in /usr/lib/x86_64-linux-gnu/libc-2.31.so)
  probe_libc:calloc    (on calloc in /usr/lib/x86_64-linux-gnu/libc-2.31.so)
  probe_libc:posix_memalign (on posix_memalign in /usr/lib/x86_64-linux-gnu/libc-2.31.so)

[...]

Here, you can see that perf added the new events probe_libc:malloc, probe_libc:calloc, and probe_libc:posix_memalign, each of which triggers whenever the respective function is called.

To confirm the user probe probe_libc:malloc works, run this command:

perf stat -e probe_libc:malloc -- bash -c 'echo Hello World'

The output should look similar to this:

Hello World

 Performance counter stats for 'bash -c echo Hello World':

              1021      probe_libc:malloc

       0.003840500 seconds time elapsed

       0.000000000 seconds user
       0.003867000 seconds sys

In this case, the user probe was hit 1021 times, meaning bash called malloc 1021 times.

Important: If the probe was hit 0 times, something is wrong with the probe setup.
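The zero-hit check above can be scripted. The following is a minimal sketch, not part of the original workflow: probe_hits and probe_fired are hypothetical helper names, and the sketch assumes the perf stat output layout shown above (count in the first column, event name in the second).

```shell
# Hypothetical helper: reads `perf stat` output on stdin and prints the count
# recorded for the event named in $1 (prints 0 when the event is absent).
probe_hits() {
    # Strip thousands separators in case the locale prints "1,021".
    awk -v ev="$1" '$2 == ev { gsub(/,/, "", $1); n = $1 } END { print n + 0 }'
}

# Hypothetical helper: succeeds only when the event in $1 fired at least once.
probe_fired() {
    [ "$(probe_hits "$1")" -gt 0 ]
}
```

Because perf stat writes its statistics to standard error, a possible invocation is: perf stat -e probe_libc:malloc -- bash -c 'echo Hello World' 2>&1 | probe_fired probe_libc:malloc || echo "probe did not fire".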

Running allocation analysis

By running allocation analysis, you can gain a better understanding of the memory usage patterns in your application and identify and fix memory issues such as leaks or inefficient usage, ultimately improving the performance and stability of your code.

Example program

Once you’ve confirmed the user probe on malloc is working, you can analyze the allocations of a program. For instance, you can analyze a program that performs ten consecutive HTTP requests using AsyncHTTPClient.

Analyzing a program using AsyncHTTPClient can help optimize its performance, improve error handling, ensure proper concurrency and threading, enhance code readability and maintainability, and assess scalability considerations.

The example program has the following dependencies:

dependencies: [
    .package(url: "https://github.com/swift-server/async-http-client.git", from: "1.3.0"),
    .package(url: "https://github.com/apple/swift-nio.git", from: "2.29.0"),
    .package(url: "https://github.com/apple/swift-log.git", from: "1.4.2"),
],

An example program using AsyncHTTPClient can be written as:

import AsyncHTTPClient
import NIO
import Logging

let urls = Array(repeating:"http://httpbin.org/get", count: 10)
var logger = Logger(label: "ahc-alloc-demo")

logger.info("running HTTP requests", metadata: ["count": "\(urls.count)"])
MultiThreadedEventLoopGroup.withCurrentThreadAsEventLoop { eventLoop in
    let httpClient = HTTPClient(eventLoopGroupProvider: .shared(eventLoop),
                                backgroundActivityLogger: logger)

    func doRemainingRequests(_ remaining: ArraySlice<String>,
                             overallResult: EventLoopPromise<Void>,
                             eventLoop: EventLoop) {
        var remaining = remaining
        if let first = remaining.popFirst() {
            httpClient.get(url: first, logger: logger).map { [remaining] _ in
                eventLoop.execute { // for shorter stacks
                    doRemainingRequests(remaining, overallResult: overallResult, eventLoop: eventLoop)
                }
            }.whenFailure { error in
                overallResult.fail(error)
            }
        } else {
            return overallResult.succeed(())
        }
    }

    let promise = eventLoop.makePromise(of: Void.self)
    // Kick off the process
    doRemainingRequests(urls[...],
                        overallResult: promise,
                        eventLoop: eventLoop)

    promise.futureResult.whenComplete { result in
        switch result {
        case .success:
            logger.info("all HTTP requests succeeded")
        case .failure(let error):
            logger.error("HTTP request failure", metadata: ["error": "\(error)"])
        }

        httpClient.shutdown { maybeError in
            if let error = maybeError {
                logger.error("AHC shutdown failed", metadata: ["error": "\(error)"])
            }
            eventLoop.shutdownGracefully { maybeError in
                if let error = maybeError {
                    logger.error("EventLoop shutdown failed", metadata: ["error": "\(error)"])
                }
            }
        }
    }
}

logger.info("exiting")

If you are running the program as a Swift package, compile it in release mode first, using this command:

swift build -c release

This should produce a binary called .build/release/your-program-name, which you can analyze to get the number of allocations.

Counting allocations

Counting allocations and visualizing them as a graph can help you analyze memory utilization, profile memory usage, optimize performance, refactor and optimize code, and debug memory-related issues in your program.

Before visualizing the allocations as a flame graph, start with an analysis using the binary to get the number of allocations by running the command:

perf stat -e 'probe_libc:*' -- .build/release/your-program-name

This command instructs perf to run your program and count how many times each probe_libc user probe was hit, that is, how many times your application called each allocation function.

The output should look similar to this:

Performance counter stats for '.build/release/your-program-name':

                68      probe_libc:posix_memalign
                35      probe_libc:calloc_1
                 0      probe_libc:calloc
              2977      probe_libc:malloc

[...]

In this instance, the program allocated 2977 times through malloc and a small number of times through the other allocation functions.

Note that the command uses -e 'probe_libc:*' instead of listing every event individually, such as -e probe_libc:malloc, -e probe_libc:calloc, -e probe_libc:calloc_1, and -e probe_libc:posix_memalign.

Tip: This approach assumes you don’t have other perf user probes installed. If other perf user probes are installed, you need to specify each event you want to use individually.
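To get a single total across all allocation functions, the per-event counts can be summed. This is a minimal sketch under the assumption that perf stat prints one line per event with the count first and the event name second, as in the output above; total_allocations is a hypothetical helper name.

```shell
# Hypothetical helper: reads `perf stat` output on stdin and prints the sum of
# the counts of all probe_libc:* events.
total_allocations() {
    awk '$2 ~ /^probe_libc:/ { gsub(/,/, "", $1); total += $1 } END { print total + 0 }'
}
```

A possible invocation is: perf stat -e 'probe_libc:*' -- .build/release/your-program-name 2>&1 | total_allocations. Note that the program's own output passes through the same pipe, so this only works if the program does not print lines that look like perf counters.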

Collecting raw data

Collecting raw data is crucial for obtaining an accurate representation of the system’s behavior, performing detailed performance analysis and debugging, analyzing trends, enabling profiling flexibility, and guiding performance optimization efforts.

The perf tool cannot create live graphs while the program is running. However, its perf record subcommand captures events for later analysis, and the collected data can then be transformed into a graph.

In general, you can use perf record to run the program and collect information from the probe_libc user probes, as shown here:

perf record --call-graph dwarf,16384 \
     -m 50000 \
     -e 'probe_libc:*' -- \
     .build/release/your-program-name

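Breaking down this command: --call-graph dwarf,16384 records a call stack for each event using DWARF unwinding information, with 16384 bytes as the maximum stack dump size; -m 50000 sets the size of the ring buffer that perf uses, in memory pages; and -e 'probe_libc:*' selects the allocation user probes as the events to record.

The output should look similar to this: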

<your program's output>
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 401.088 MB perf.data (49640 samples) ]

Because the user probes sit on the allocation functions, each recorded sample carries the call stack that led to an allocation, which lets you see where in your application allocations originate.

Important: If perf reports lost chunks and prints Check IO/CPU overload!, see Overcoming lost chunks of data below.

Creating flame graphs

Once you’ve successfully recorded data using perf record, you can invoke the following command to produce an SVG file with the flame graph:

perf script | \
    /FlameGraph/stackcollapse-perf.pl - | \
    swift demangle --simplified | \
    /FlameGraph/flamegraph.pl --countname allocations \
        --width 1600 > out.svg

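Here’s a breakdown of this command: perf script dumps the binary data that perf record captured as text; stackcollapse-perf.pl folds each recorded stack into a single line followed by its count; swift demangle --simplified converts mangled symbol names into readable Swift names; and flamegraph.pl renders the folded stacks as an SVG, with --countname allocations labeling the counts and --width 1600 setting the image width.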

Once the command has been completed, an SVG file is generated that you can open in your browser.

Note: Depending on how much data perf record captured, this pipeline can take a while to run.
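The folded stacks that stackcollapse-perf.pl emits are plain text, one stack per line with semicolon-separated frames followed by a count, so they can also be inspected without rendering an SVG. The following is a minimal sketch; top_stacks is a hypothetical helper name and not part of the FlameGraph project.

```shell
# Hypothetical helper: reads folded stacks ("frame;frame;frame count") on stdin
# and prints the $1 stacks with the highest counts (10 by default).
top_stacks() {
    # Copy the trailing count to the front so sort can order by it, then drop it.
    awk '{ print $NF "\t" $0 }' | sort -rn | head -n "${1:-10}" | cut -f2-
}
```

A possible invocation is: perf script | /FlameGraph/stackcollapse-perf.pl - | swift demangle --simplified | top_stacks 5.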

Reading flame graphs

This flame graph is a direct result of the example program in this section. Hover over the stack frames to get more information, or click on any stack frame to zoom in on a sub-tree.

Flame graph

Allocation flame graphs on macOS

Although much of this tutorial focuses on the perf tool, you can create the same graphs on macOS.

Step 1. To get started, collect the raw data using the DTrace framework by running this command:

sudo dtrace -n 'pid$target::malloc:entry,pid$target::posix_memalign:entry,pid$target::calloc:entry,pid$target::malloc_zone_malloc:entry,pid$target::malloc_zone_calloc:entry,pid$target::malloc_zone_memalign:entry { @s[ustack(100)] = count(); } ::END { printa(@s); }' -c .build/release/your-program > raw.stacks

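Like Linux’s perf user probes, DTrace also uses probes. The previous command instructs dtrace to count the calls to the allocation functions malloc, posix_memalign, calloc, malloc_zone_malloc, malloc_zone_calloc, and malloc_zone_memalign, aggregated by user stack (up to 100 frames).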

Note: On Apple platforms, Swift uses a slightly larger number of allocation functions than Linux.

Step 2. Once the data is collected, run this command to create an SVG file:

cat raw.stacks |\
    /FlameGraph/stackcollapse.pl - | \
    swift demangle --simplified | \
    /FlameGraph/flamegraph.pl --countname allocations \
        --width 1600 > out.svg

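You will notice this command is similar to the perf invocation, except that it reads the stacks from raw.stacks with cat instead of perf script, and it uses stackcollapse.pl instead of stackcollapse-perf.pl because the stacks come from DTrace rather than perf.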

Other perf tricks

Swift’s allocation patterns

Optimizing memory allocations based on the information in the flame graph can help make your Swift code more performant. The shape of an allocation's call stack varies with the type of memory being allocated and the way it is used.

A common shape is the class instance allocation: creating a class instance calls swift_allocObject, which calls swift_slowAlloc, which calls malloc, which contains the user probe.

“Prettifying” allocation patterns

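To make your flame graph look good, post-process the collapsed stacks after demangling them in the Linux perf script pipeline above by removing specialized prefixes and replacing the swift_allocObject, swift_slowAlloc, __libc_malloc chain with a single A (for allocation).

These changes should look like this: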

sed -e 's/specialized //g' \
    -e 's/;swift_allocObject;swift_slowAlloc;__libc_malloc/;A/g'

To produce a visually appealing SVG file flame graph when analyzing memory allocations in Swift, use the complete command:

perf script | \
    /FlameGraph/stackcollapse-perf.pl - | \
    swift demangle --simplified | \
    sed -e 's/specialized //g' \
        -e 's/;swift_allocObject;swift_slowAlloc;__libc_malloc/;A/g' | \
    /FlameGraph/flamegraph.pl --countname allocations --flamechart --hash \
    > out.svg
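To see what the sed step does to a single folded stack, the two substitutions can be wrapped in a function and applied to a sample line. This is a minimal sketch; prettify_stacks is a hypothetical name, and the sample stack in the usage note is made up for illustration rather than taken from a real recording.

```shell
# Hypothetical wrapper around the two sed substitutions shown above.
prettify_stacks() {
    sed -e 's/specialized //g' \
        -e 's/;swift_allocObject;swift_slowAlloc;__libc_malloc/;A/g'
}
```

For instance, the made-up line main;specialized foo();swift_allocObject;swift_slowAlloc;__libc_malloc 3 becomes main;foo();A 3.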

Overcoming lost chunks of data

When using perf with the DWARF call stack unwinding, you may encounter this issue:

[ perf record: Woken up 189 times to write data ]
Warning:
Processed 4346 events and lost 144 chunks!

Check IO/CPU overload!

[ perf record: Captured and wrote 30.868 MB perf.data (3817 samples) ]

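If perf reports lost chunks, it lost data, and the resulting counts are incomplete. Options that can help include reducing the amount of work the program does during the run, lowering the maximum stack dump size (the 16384 in --call-graph dwarf,16384), and raising the ring buffer size given to -m.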
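Two knobs from the earlier perf record command are natural candidates to tune when chunks are lost: the stack dump size after dwarf, and the ring buffer size given to -m. The following is a minimal sketch under that assumption; record_allocations and lost_chunks are hypothetical helper names, and the values in the usage note are illustrative rather than recommendations.

```shell
# Hypothetical wrapper: $1 is the maximum stack dump size in bytes, $2 the ring
# buffer size in pages; the remaining arguments are the program to run.
record_allocations() {
    stack_bytes=$1
    mmap_pages=$2
    shift 2
    perf record --call-graph "dwarf,$stack_bytes" \
         -m "$mmap_pages" \
         -e 'probe_libc:*' -- "$@"
}

# Hypothetical helper: reads perf record's messages on stdin and prints the
# number of lost chunks it reported (0 if it reported none).
lost_chunks() {
    awk '/lost [0-9]+ chunks/ { for (i = 1; i < NF; i++) if ($i == "lost") n = $(i + 1) }
         END { print n + 0 }'
}
```

A possible experiment is record_allocations 8192 100000 .build/release/your-program-name, which halves the stack dump size and doubles the ring buffer compared to the earlier command. A smaller stack dump size may truncate very deep stacks, so check the resulting flame graph for cut-off frames.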

Overall, these practices help you understand your program’s behavior, identify bottlenecks, and improve performance in your Swift applications.