A previous article, Debugging applications within Red Hat OpenShift containers, gives an overview of the tools available for debugging such applications and of the existing restrictions on their use. One of the restrictions discussed in that article is the inability to install debugging tool packages into an ordinary, unprivileged container once it has been instantiated. In such a container, debugging tool packages have to be included when the container image is built, because package installation commands require elevated privileges that are not available to the ordinary container user.
However, there are important situations where it is desirable to install a debugging tool into an already-instantiated container. In particular, if the resolution of a problem requires access to the temporary state of a long-running containerized application, the usual method of adding debugging tools to the container by rebuilding the container image and restarting the application will destroy that temporary state.
To provide a way to add debugging tools to unprivileged containers, I developed a utility called oc-inject that can temporarily copy a debugging tool into a container. Instead of relying on package management or other privileged operations, oc-inject’s implementation is based on the existing and well-supported OpenShift operations oc rsync and oc exec, which do not require any elevated privileges.
This article describes the current capabilities of the oc-inject utility, which is available on GitHub or via a Fedora COPR repository. The oc-inject utility works on any Linux system that includes Python 3, the ldd utility, and the Red Hat OpenShift command-line tool oc.
How oc-inject works
oc-inject is a command-line utility that can be invoked from any local Linux system that has been configured to communicate with an OpenShift cluster via the oc command-line tool. The oc-inject utility has the following command-line syntax:
oc-inject <pod_ID> <executable>
Here, pod_ID is the name of an OpenShift container, and executable is the name of an executable on the local system.
The oc-inject utility installs the specified executable into the container and then runs it. An executable installed by oc-inject could be a debugging tool or another system utility that would otherwise not be available in the container.
For example, using oc-inject, we can install and run the htop utility in order to visualize the CPU and memory usage of processes within the container myapp-rxxrw (outlined in Figure 1):
$ oc-inject -it myapp-rxxrw htop
Figure 1: The flow of an oc-inject operation.
The oc-inject utility operates as follows:
- First, oc-inject uses the ldd utility to identify the set of shared libraries required by the executable.
- Second, oc-inject invokes the OpenShift oc rsync command to copy the executable and the identified shared libraries into a temporary directory within the container.
- Finally, oc-inject invokes the oc exec command to run the executable. In order for the executable to use the shared libraries within the temporary directory, oc-inject sets the executable’s LD_LIBRARY_PATH environment variable to this directory.
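The three steps above can be sketched in Python as follows. This is only an illustration of the flow, not the actual oc-inject implementation: the function names, the local staging directory, the explicit mkdir step, and the use of env inside the container to set LD_LIBRARY_PATH are all assumptions made for the example.

```python
import subprocess


def parse_ldd_output(ldd_text):
    """Return the absolute library paths found in the output of ldd."""
    paths = []
    for line in ldd_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if "=>" in line:
            # Form: "libc.so.6 => /lib64/libc.so.6 (0x...)"
            target = line.split("=>", 1)[1].strip()
            if target.startswith("not found"):
                continue  # unresolved library: nothing to copy
            candidate = target.split(" (", 1)[0].strip()
        else:
            # Form: "/lib64/ld-linux-x86-64.so.2 (0x...)" or "linux-vdso.so.1 (0x...)"
            candidate = line.split(" (", 1)[0].strip()
        # Keep only real filesystem paths; this skips the virtual vdso entry.
        if candidate.startswith("/") and candidate not in paths:
            paths.append(candidate)
    return paths


def list_shared_libraries(executable):
    """Step 1: run ldd on a local executable and collect its libraries."""
    result = subprocess.run(
        ["ldd", executable], capture_output=True, text=True, check=True
    )
    return parse_ldd_output(result.stdout)


def build_commands(pod, staging_dir, remote_dir, executable_name, args=()):
    """Steps 2 and 3: build the oc command lines (they are not executed here).

    staging_dir is a hypothetical local directory that already holds copies
    of the executable and its libraries; remote_dir is a hypothetical
    temporary directory inside the container.
    """
    mkdir_cmd = ["oc", "exec", pod, "--", "mkdir", "-p", remote_dir]
    # A trailing slash makes rsync copy the directory contents.
    rsync_cmd = ["oc", "rsync", staging_dir.rstrip("/") + "/", f"{pod}:{remote_dir}"]
    # Assumption: the container image provides the env utility.
    exec_cmd = [
        "oc", "exec", "-i", pod, "--",
        "env", f"LD_LIBRARY_PATH={remote_dir}",
        f"{remote_dir}/{executable_name}", *args,
    ]
    return mkdir_cmd, rsync_cmd, exec_cmd
```

Each returned list can be passed to subprocess.run on a system where oc is logged in to the cluster. Because only the libraries reported by ldd are staged, any other file the executable needs at run time would still be missing in the container.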
It is important to keep in mind that if the executable installed by oc-inject depends on files other than shared libraries, oc-inject will not copy these files into the container. This limitation narrows the set of executables that can be installed with oc-inject. However, in practice, such commonly used debugging tools as gdbserver and strace require only shared libraries and can be successfully installed and run using oc-inject.
The examples in the following sections illustrate how the gdbserver and strace debugging tools can be installed and used to observe the behavior of a containerized application. The procedures in these examples were tested on an OpenShift 4.2.8 cluster managed with CodeReady Containers 1.2.0.
Example 1: Tracing system calls in a PostgreSQL process using strace
- Create an OpenShift application based on the rails-ex application template from the software-collections.org repository:
$ git clone https://github.com/sclorg/rails-ex
$ oc new-app rails-ex/openshift/templates/rails-postgresql.json -p SOURCE_REPOSITORY_URL=https://github.com/sclorg/rails-ex
This template creates several containers, including a container with a PostgreSQL database.
- Run oc get pods and ps -ax to identify the name of the PostgreSQL container and the PIDs of processes within the container:
$ oc get pods
NAME                                  READY   STATUS      RESTARTS   AGE
postgresql-1-deploy                   0/1     Completed   0          4m23s
postgresql-1-jfg52                    1/1     Running     0          4m8s
rails-postgresql-example-1-build      0/1     Completed   0          4m24s
rails-postgresql-example-1-deploy     0/1     Completed   0          72s
rails-postgresql-example-1-gg5hm      1/1     Running     0          26s
rails-postgresql-example-1-hook-pre   0/1     Completed   0          63s
$ oc exec -it postgresql-1-jfg52 -- ps -ax
PID TTY   STAT TIME COMMAND
  1 ?     Ss   0:00 postgres
 62 ?     Ss   0:00 postgres: logger process
 64 ?     Ss   0:00 postgres: checkpointer process
 65 ?     Ss   0:00 postgres: writer process
 66 ?     Ss   0:00 postgres: wal writer process
 67 ?     Ss   0:00 postgres: autovacuum launcher process
 68 ?     Ss   0:00 postgres: stats collector process
 69 ?     Ss   0:00 postgres: bgworker: logical replication launcher
391 ?     Ss   0:00 postgres: userY5Q root 10.128.0.190(39754) idle
414 ?     Ss   0:00 postgres: userY5Q root 10.128.0.190(39882) idle
481 pts/0 Rs+  0:00 ps -ax
- We are interested in tracing the system calls made by one of the PostgreSQL worker processes. The output of ps -ax lists two such processes, with PIDs 391 and 414. For this example, we will trace the process with PID 414. We invoke oc-inject to install an strace executable into the container and attach it to that process:
$ oc-inject -it postgresql-1-jfg52 -- strace -p 414
/tmp/oc-inject-af154698/strace: Process 414 attached
epoll_wait(3, [{EPOLLIN, {u32=34512144, u64=34512144}}], 1, -1) = 1
recvfrom(11, "Q\0\0\0\rSELECT 1\0", 8192, 0, NULL, NULL) = 14
sendto(10, "\2\0\0\0\230\0\0\0\1@\0\0\1\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 152, 0, NULL, 0) = 152
sendto(11, "T\0\0\0!\0\1?column?\0\0\0\0\0\0\0\0\0\0\27\0\4\377\377\377\377"..., 66, 0, NULL, 0) = 66
recvfrom(11, "P\0\0\0+\0SELECT \"articles\".* FROM \""..., 8192, 0, NULL, NULL) = 81
lseek(15, 0, SEEK_END) = 8192
lseek(16, 0, SEEK_END) = 16384
lseek(15, 0, SEEK_END) = 8192
sendto(11, "1\0\0\0\0042\0\0\0\4T\0\0\0\204\0\5id\0\0\0@\24\0\1\0\0\0\27\0\4"..., 268, 0, NULL, 0) = 268
recvfrom(11, 0xcad700, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(3, [{EPOLLIN, {u32=34512144, u64=34512144}}], 1, -1) = 1
...
^C
/tmp/oc-inject-af154698/strace: Process 414 detached
<detached ...>
command terminated with exit code 130
Thus, we obtained a trace of interactions between the PostgreSQL process and the underlying operating system.
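A trace like the one above quickly grows long, so it can help to summarize it. The following is a minimal sketch, assuming the strace output has been saved to a text file or captured as a string; the function name count_syscalls and the regular expression are illustrative and are not part of oc-inject or strace.

```python
import re
from collections import Counter

# A traced call starts with the system call name followed by "(",
# optionally preceded by a "[pid N]" prefix when several processes are traced.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?([a-z_][a-z0-9_]*)\(")


def count_syscalls(strace_text):
    """Count how often each system call name appears in strace output."""
    counts = Counter()
    for line in strace_text.splitlines():
        match = SYSCALL_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts
```

Status lines such as "Process 414 attached" do not match the pattern and are ignored. Calling count_syscalls(text).most_common() lists the system calls in descending order of frequency.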
Example 2: Tracing PostgreSQL's internal behavior by attaching to SDT markers with gdbserver
In this example, we demonstrate how to collect trace data from a PostgreSQL process. In order to do this, we use statically defined tracing (SDT) markers, which identify various high-level events within the process. The SDT marker for an event has a list of arguments that provide information about the event to a debugging tool.
Note: Many applications, libraries, and runtimes provide built-in SDT markers that can be traced by GDB, including the PostgreSQL, MySQL, and MariaDB database engines; core system libraries such as glibc; and language runtimes for Python, Ruby, Java, and Node.js. A more comprehensive list of applications and libraries with SDT markers is maintained on the SystemTap wiki. In addition to the official GDB documentation, an earlier blog series by Sergio Durigan Junior (part 2, part 3) gives more information about GDB’s support for tracing SDT markers.
PostgreSQL’s built-in SDT markers identify various high-level database events. The PostgreSQL documentation gives a full description of available markers and associated arguments.
The following steps illustrate how to collect SDT trace data using gdbserver:
- Start a containerized PostgreSQL process following the same procedure as described in Example 1's first step.
- Start a GDB session outside the container:
$ gdb
GNU gdb (GDB) Fedora 8.2-6.fc29
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
(gdb)
- Invoke oc-inject to install and run gdbserver within the container, and instruct the GDB session to connect to the gdbserver’s standard input and output:
(gdb) target extended-remote | oc-inject -i postgresql-1-jfg52 -- gdbserver --multi -
Remote debugging using | oc-inject -i postgresql-1-jfg52 -- gdbserver --multi -
Remote debugging using stdio
(gdb)
- Instruct the running gdbserver to attach to the desired PostgreSQL worker process (PID 391 in this case):
(gdb) attach 391
Attaching to process 391
Attached; pid = 391
Reading /opt/rh/rh-postgresql10/root/usr/bin/postgres from remote target...
warning: File transfers from remote targets can be slow. Use "set sysroot" to access files locally instead.
Reading /opt/rh/rh-postgresql10/root/usr/bin/postgres from remote target...
Reading symbols from target:/opt/rh/rh-postgresql10/root/usr/bin/postgres...Reading /opt/rh/rh-postgresql10/root/usr/bin/postgres.debug from remote target...
Reading /opt/rh/rh-postgresql10/root/usr/bin/.debug/postgres.debug from remote target...
Missing separate debuginfo for target:/opt/rh/rh-postgresql10/root/usr/bin/postgres
Try: dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/14/1c5e620c9ea33d7e9214b6e4d5a4d4519e4a10.debug
Reading symbols from .gnu_debugdata for target:/opt/rh/rh-postgresql10/root/usr/bin/postgres...(no debugging symbols found)...done.
(no debugging symbols found)...done.
...
0x00007fee94155973 in __select_nocancel () from target:/lib64/libc.so.6
- After attaching to the process, obtain a list of available SDT markers with GDB’s info probes command:
(gdb) info probes
...
stap postgresql query__rewrite__start 0x00000000007189fd 0x0000000000cac470 target:/opt/rh/rh-postgresql10/root/usr/bin/postgres
stap postgresql query__start 0x0000000000718c45 0x0000000000cac464 target:/opt/rh/rh-postgresql10/root/usr/bin/postgres
stap postgresql smgr__md__read__done 0x00000000007142c5 0x0000000000cac42a target:/opt/rh/rh-postgresql10/root/usr/bin/postgres
...
- Define a tracepoint that will trigger on the query__start marker, which identifies the event of the process starting to execute a database query:
(gdb) trace -probe-stap query__start
Tracepoint 1 at 0x718c45
As described in the PostgreSQL documentation, the query__start marker provides the query string through an argument of type char *. In a GDB session, this argument can be referenced via the identifier $_probe_arg0.
- Define the actions we want GDB to take whenever the tracepoint triggers. Let’s say that we want GDB to collect the value of the query__start SDT marker's query string argument. To do so, we must instruct GDB to collect both the value of the argument and the memory locations this value points to:
(gdb) actions 1
>collect $_probe_arg0
>collect *(unsigned char *)$_probe_arg0@512
>end
- Use the tstart command described in GDB’s documentation on tracepoints to collect trace data while the program continues running:
(gdb) tstart
(gdb) continue &
- As the program continues to run, GDB will continue to collect trace data. To view the collected data, interrupt the program and use the tfind command to step through the collected trace data:
(gdb) interrupt
(gdb) tstatus
Trace is running on the target.
Collected 18 trace frames.
Trace buffer has 5229992 bytes of 5242880 bytes free (0% full).
Trace will stop if GDB disconnects.
Not looking at any trace frame.
Trace started at 7393.098048 secs, stopped -111.-534238 secs later.
(gdb) tstop
(gdb) tfind start
Found trace frame 0, tracepoint 1
#0 0x0000000000718c45 in exec_simple_query ()
(gdb) print/x $_probe_arg0
$1 = 0x1763358
(gdb) tdump
Data collected at tracepoint 1, trace frame 0:
$_probe_arg0 = 24523608
*(unsigned char *)$_probe_arg0@512 = "SELECT 1\000\000\000ticles\".* FROM \"articles\"\000\000\000\000CT indrelid, indkey, generate_subscripts(indkey, 1) idx\n", ' ' <repeats 11 times>, "FROM pg_index\n WHERE indrelid = '\"articles\"'::regclass\n", ' ' <repeats 12 times>, "AND indisprimary\n"...
(gdb) while ($trace_frame != -1)
>tfind
>tdump
>end
...
Found trace frame 10, tracepoint 1
Data collected at tracepoint 1, trace frame 10:
$_probe_arg0 = 24523608
*(unsigned char *)$_probe_arg0@512 = "BEGIN\000\000\000\000\000\000\002\000\000\000\001\062\000\000\000\001\061\000\001\000\000\000 SELECT indrelid, indkey, generate_subscripts(indkey, 1) idx\n", ' ' <repeats 11 times>, "FROM pg_index\n WHERE indrelid = '\"comments\"'::regclass\n", ' ' <repeats 12 times>, "AND indisprimary\n"...
Found trace frame 11, tracepoint 2
Data collected at tracepoint 2, trace frame 11:
Found trace frame 12, tracepoint 1
Data collected at tracepoint 1, trace frame 12:
$_probe_arg0 = 24523608
*(unsigned char *)$_probe_arg0@512 = "COMMIT\000\000\000\000\000\000\000\000\000\005\000\000\000\021This is a comment\000\000\000\vRead me!\000\000\000\001\062\000\000\000\032\062\060\061\071-11-27 19:49:34.771435\000\000\000\032\062\060\061\071-11-27 19:49:34.771435\000\001\000\000\000ING \"id\"\000\000\005", '\000' <repeats 21 times>, "ents\"'::regclass\n", ' ' <repeats 12 times>, "AND indisprimary\n"...
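In the tdump output above, each collected value holds a fixed 512 bytes, so the query string is followed by whatever bytes happened to lie after its terminating NUL (shown by GDB as \000). The following is a small sketch of how the query text could be recovered from a saved tdump line. The helper first_c_string is hypothetical, and it handles only the first quoted segment of GDB's output, which is an assumption that holds for the frames shown here.

```python
# Single-character escapes that GDB may print inside a quoted string.
SIMPLE_ESCAPES = {
    "n": "\n", "t": "\t", "r": "\r", "v": "\v",
    "a": "\a", "b": "\b", "f": "\f", "e": "\x1b",
}


def first_c_string(gdb_value):
    """Return the text before the first NUL in a GDB-printed char array."""
    text = gdb_value.strip()
    if text.startswith('"'):
        text = text[1:]
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == '"':
            break  # closing quote of the first quoted segment
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            if nxt in "01234567":
                # Octal escape of up to three digits, such as \000 or \021.
                j = i + 1
                while j < len(text) and j < i + 4 and text[j] in "01234567":
                    j += 1
                code = int(text[i + 1:j], 8)
                if code == 0:
                    break  # NUL terminator: the C string ends here
                out.append(chr(code))
                i = j
                continue
            # Escaped quote, backslash, or a named escape such as \n.
            out.append(SIMPLE_ESCAPES.get(nxt, nxt))
            i += 2
            continue
        out.append(ch)
        i += 1
    return "".join(out)
```

The input is the part of the tdump line to the right of the equals sign. A longer query that GDB splits into several segments (for example, around a repeats marker) would need additional handling.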
Thus, we used SDT markers to extract information about an event in the PostgreSQL process. After we finish observing this PostgreSQL process, we detach from it and use GDB’s monitor exit command to stop the gdbserver process within the container:
(gdb) detach
(gdb) monitor exit
(gdb) ^D
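When the same tracing session has to be repeated for several pods or processes, the GDB commands from this example can be generated rather than typed. The sketch below only assembles the command text. The function name build_trace_script and its parameters are hypothetical, and it uses actions without a number, which GDB applies to the most recently defined tracepoint.

```python
def build_trace_script(pod, pid, probe, nbytes=512):
    """Return the GDB commands that start an SDT trace, one per list item."""
    return [
        # Run gdbserver in the container via oc-inject and talk to it over stdio.
        f"target extended-remote | oc-inject -i {pod} -- gdbserver --multi -",
        f"attach {pid}",
        f"trace -probe-stap {probe}",
        # With no number, "actions" refers to the tracepoint defined just above.
        "actions",
        "collect $_probe_arg0",
        f"collect *(unsigned char *)$_probe_arg0@{nbytes}",
        "end",
        "tstart",
        "continue &",
    ]


def render_script(commands):
    """Join the commands into the text of a GDB command file."""
    return "\n".join(commands) + "\n"
```

The rendered text could be written to a file and loaded with gdb -x, after which the interrupt, tstop, tfind, and tdump steps are carried out interactively as shown above.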
Conclusion
The examples in this article illustrate how the current version of oc-inject increases the options available for debugging containerized applications.