Creating an extension for Postgres is an experience worthy of immense satisfaction. You get to contribute to the extension ecosystem while providing valuable functionality to other Postgres users. It’s also an incredibly challenging exercise in many ways, so we’re glad you’ve returned to learn a bit more about building Postgres extensions.

In the previous article in this series, we discussed creating an extension to block DDL. That sample extension was admittedly fairly trivial, in that it only added a single configuration parameter and utilized one callback hook. A more complete extension would provide a function or view to Postgres so users could interact with the extension itself. So let’s do just that!

Deciding on Data

Once again we’re faced with choosing a topic for the extension. Users will sometimes ask the question: how much memory is Postgres using? Using various tools like top or ps will show a fairly inaccurate picture of this, limiting results to opaque fields like VIRT, RES, SHR, PSS, RSS, and others. Some are aggregates, others include Postgres shared buffers, and none really describe how memory is being used.

Luckily on Linux systems, there’s an incredibly handy /proc filesystem that provides a plethora of information about several process metrics, including memory. The smaps file in particular reports several process-specific memory categories, and does so on a per-allocation basis.  What if we could parse that file for every Postgres backend process and return output in a table? Admins could then see exactly which user sessions or worker processes are using the most memory and why, rather than an imprecise virtual or resident memory summary.

Sounds interesting!
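Before we dive in, it helps to know the shape of that data. Each mapping in smaps consists of a header line followed by a series of "Field: N kB" lines. As a taste of what's ahead, here's a minimal, Postgres-independent sketch for pulling apart one such field line; the function name here is ours for illustration, not part of any real API:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse a smaps field line such as "Rss:    132 kB" into a name and a
 * kilobyte count. Returns 1 on success, 0 if the line isn't that shape. */
static int
parse_kb_field(const char *line, char *name, size_t name_len, long *kb)
{
	char fmt[32];

	/* Build a scanf format that limits the name to the caller's buffer. */
	snprintf(fmt, sizeof(fmt), "%%%zu[^:]: %%ld kB", name_len - 1);
	return sscanf(line, fmt, name, kb) == 2;
}
```

Feeding it "Rss:     132 kB" yields the name "Rss" and the value 132; the real extension will end up tracking over thirty such fields per mapping.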

Starting the Extension

As with our previous extension, we need to bootstrap the project with a few files. Start with creating the project folder:

mkdir pg_meminfo
cd pg_meminfo

And create a pg_meminfo.control file with these contents:

comment = 'PG MemInfo Extension'
default_version = '1.0'
module_pathname = '$libdir/pg_meminfo'

As before, we just need to name the extension, give it a version, and provide an installation path for the resulting library file. Nothing surprising so far.

Next we need a makefile; let’s start with this:

MODULES = pg_meminfo

PGFILEDESC = "Extension to view Postgres related memory information"
EXTENSION = pg_meminfo

PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

Look familiar? Simple extensions won’t likely need to customize the makefile very much, and in this case we’re just describing the extension and its contents. More sophisticated extensions require more complicated makefiles, but we’re still covering easier ground.

Don’t forget your copy of the Postgres source code; it’s going to be incredibly valuable in this exercise.

Functions and Macros

Now it’s time to work on the body of our extension. Before that, we need to learn a bit more about some of the macros and API functions Postgres provides for this kind of extension. We already know about the PG_MODULE_MAGIC macro which prepares the extension for Postgres, but there are several more.

The C-Language Functions documentation says that any function exposed to users must be declared with the PG_FUNCTION_INFO_V1() macro, and the function should use PG_FUNCTION_ARGS rather than an actual argument list. The function we’re building won’t take any arguments, but the macro is required anyway.

The “Returning Sets” section in particular explains that there are two types of functions which can return a set: value-per-call, or materialized. Given we’re parsing potentially ephemeral kernel memory mappings, we probably want to return the entire data set at once. We also have to consider that we need to obtain the list of running processes, and if we used value-per-call, we’d have to effectively cache that output so the function could use it on each call. It’s simply easier to materialize for now.

Since we’re writing a function, the src/backend/utils/fmgr/README file also presents a few pertinent functions and fields we’ll need to know:

  • InitMaterializedSRF() - Sets up the Postgres memory environment to handle the result set, accepted modes, error handling, memory contexts, and more. It takes care of a lot of boilerplate we’d normally have to do ourselves. Take note that this function only exists in Postgres 15 and above! Declared in src/include/funcapi.h.

  • fcinfo->resultinfo - A structure holding information about the result set we’re expected to return.

  • PG_RETURN_VOID() - This macro is actually declared in src/include/fmgr.h, and materialized set-returning functions should use it because their results travel through the tuplestore rather than the return value. We mention it here because it really should be in the README.

The documentation also says a set-returning function will need to interact with tuplestores. That means we should examine src/include/utils/tuplestore.h to understand that API. The most interesting function here is tuplestore_putvalues, which takes a Tuplestorestate, a TupleDesc, an array of Datum values, and an array of booleans defining which are NULL. This will allow us to pass an entire array of values representing the row, and another array specifying which (if any) are null.

The tricky part is figuring out what Tuplestorestate and TupleDesc are. It turns out that fcinfo->resultinfo in this context is a ReturnSetInfo struct, and if we examine src/include/nodes/execnodes.h where it’s defined, we’ll see that the setResult field is a Tuplestorestate which is intended to contain our result set! Similarly, setDesc is a TupleDesc which describes the tuples we’re returning. That should be all we need for storing the return values.

We don’t quite have everything, however. Postgres expects result sets to be composed of Datum values, which encapsulate supported data types like integers, text, and so on. In order to transform “raw” C types into these, we need to use another set of functions defined in src/include/postgres.h and src/include/utils/builtins.h. Primarily we are looking for these functions:

  • CStringGetTextDatum - Convert a raw C string to a Postgres Datum type.

  • Int32GetDatum - Convert a raw 32-bit integer to a Postgres Datum type.

Why is CStringGetTextDatum defined in a completely different file than Int32GetDatum? Who knows. It might be tempting to use CStringGetDatum, but that would be a mistake; Postgres Text and CString types are treated quite differently and this would make it difficult to manipulate the fields in queries.

Next we need the ability to capture the contents of pg_stat_activity so we can use the PID of each process to obtain all of the memory mapping information. Finding that is a bit of an adventure. First we start at src/backend/catalog/system_views.sql where it’s defined, and find that it’s a view which calls the pg_stat_get_activity() function. That function is defined in src/backend/utils/adt/pgstatfuncs.c and provides a great example of how to make internal system calls. The function is meant to be used by Postgres user sessions, so we want to effectively “mine” it for internal routines. Here are the important ones:

  • pgstat_fetch_stat_numbackends() - Reports the number of backends currently in use, which we can use to control a looping variable.

  • pgstat_get_local_beentry_by_index() - Returns a LocalPgBackendStatus struct for the indexed backend parameter.

These are all defined in src/include/utils/backend_status.h, which is fairly convenient for us.

Finally, we can’t forget about security. This function returns a lot of potentially sensitive information, so we want to restrict it to users who are either superusers or who have access to the pg_read_all_stats predefined role. If we check out src/include/miscadmin.h, the GetUserId() function retrieves the UID of the current user.

Nailing down the ability to check the pg_read_all_stats role is a bit trickier. Searching the source basically only shows this string in source comments, but the usage is clear: use has_privs_of_role() to check a user’s roles against ROLE_PG_READ_ALL_STATS. That means we also need the src/include/utils/acl.h header.

There’s one final piece that’s slightly non-obvious. The ROLE_PG_READ_ALL_STATS definition doesn’t exist in the raw source tree. It lives in a header generated during the build process, which is pulled in by src/include/catalog/pg_authid.h. That file is in turn included by src/backend/utils/adt/acl.c rather than the previously mentioned utils/acl.h. This happens frequently when working with catalog data, since catalog contents are defined in files ending with a .dat extension that are converted into actual headers at build time.

It’s admittedly a bit confusing, but forewarned is forearmed!

Ins and Outs

Isn’t it interesting how much we had to search the code for various Postgres internal system calls? The published documentation really only scratches the surface of everything necessary to write even a fairly basic extension, so always have a copy of the actual source handy!

Given our discussion so far, we’ll need several includes:

#include "postgres.h"
#include "nodes/execnodes.h"
#include "catalog/pg_authid.h"
#include "utils/acl.h"
#include "utils/elog.h"
#include "utils/backend_status.h"
#include "utils/builtins.h"
#include "utils/tuplestore.h"
#include "miscadmin.h"
#include "fmgr.h"
#include "funcapi.h"

It’s possible to reduce this list by almost half by accounting for the dependency graph, but in the interest of being complete, it’s best to simply include every header which declares a referenced symbol. Also note that as with the previous extension, the postgres.h header always comes first.

Then start the module, declare the function we plan to expose to Postgres, and begin with the function definition:

PG_MODULE_MAGIC;
PG_FUNCTION_INFO_V1(get_smaps);

Datum
get_smaps(PG_FUNCTION_ARGS)
{

These are all things we’ve discussed thus far, and they should come as no surprise. The function takes PG_FUNCTION_ARGS, which expands to FunctionCallInfo fcinfo, a variable we’ll be using frequently. The function must return a Datum, even if we’re returning a set.

For the function body, we need to do a bit of housekeeping. As discussed, the first step is to limit execution only to allowed roles. That means we need some variant of this code in the body:

Oid userid = GetUserId();

if (!has_privs_of_role(userid, ROLE_PG_READ_ALL_STATS))
	ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
		errmsg("must be superuser or a member of pg_read_all_stats")));

The only thing we haven’t discussed here is the ERRCODE_INSUFFICIENT_PRIVILEGE error code. These are described in src/backend/utils/errcodes.txt, so always check this to see if there is a more applicable error when rejecting user actions.

Next we call the indispensable helper function that makes all of this possible:

InitMaterializedSRF(fcinfo, 0);

This single function call prepares our function environment for returning a set, performing all of these actions:

  • Ensuring the caller can receive a result set.

  • Asserting materialize mode is allowed.

  • Creating a tuple store and description in the correct context to store the result set.

  • Setting several result-set attributes to enable returning a result set.

It’s about forty lines of code we no longer have to worry about.

Next comes the actual program body, which we’ll discuss at greater depth in the following section. The function should always end with this:

PG_RETURN_VOID();

This indicates to Postgres that row production is complete.

The Main Loop

Now we’re ready to build the central loop for the set results. Recall that we have a function to retrieve the number of existing backend processes, and another to fetch the status of each. Here’s how we could use those to locate the appropriate smaps file for parsing:

int backend;
int num_backends = pgstat_fetch_stat_numbackends();

for (backend = 1; backend <= num_backends; backend++)
{
	char proc_path[256] = {0};
	FILE *fp;
	struct MapFields *MapFields;

	LocalPgBackendStatus *entry = pgstat_get_local_beentry_by_index(backend);
	int pid = entry->backendStatus.st_procpid;

	snprintf(proc_path, sizeof(proc_path), "/proc/%d/smaps", pid);
	fp = fopen(proc_path, "r");

	if (fp == NULL)
		continue;

	while ((MapFields = ParseMap(fp)) != NULL)
	{
		/* Processing code here */
	}
	fclose(fp);

} /* End backend loop */

It’s quite beneficial that we can retrieve a backend’s full status simply by using its index. In our case we only need the PID. Once we have that, we just open the corresponding smaps file with fopen. Perhaps it would be better to include some error handling here for files which are unavailable or can’t be opened, but silently skipping them is fine for demonstration purposes.

The parsing loop is a bit more complicated. Since this is a blog, we’re using a magical function called ParseMap which returns a MapFields struct of all known smaps fields. For the actual implementation, we’ve put together a GitHub repository you can use for a full understanding of how we built this extension.

The inside of the parsing loop needs a few things for the tuple functions we discussed earlier: an array of Datums, an array of nulls, and a call to the tuplestore_putvalues function. Given our magic function and struct, here’s how that might work:

/* As discussed earlier, resultinfo is really a ReturnSetInfo here */
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;

Datum values[SMAP_FIELD_COUNT] = {0};
bool nulls[SMAP_FIELD_COUNT] = {0};

values[0] = Int32GetDatum(pid);
values[1] = CStringGetTextDatum(MapFields->start_addr);
values[2] = CStringGetTextDatum(MapFields->end_addr);
values[3] = CStringGetTextDatum(MapFields->perms);
values[4] = CStringGetTextDatum(MapFields->offset);
values[5] = CStringGetTextDatum(MapFields->dev_major);
values[6] = CStringGetTextDatum(MapFields->dev_minor);
values[7] = Int32GetDatum(MapFields->inode);
values[8] = CStringGetTextDatum(MapFields->path);
values[9] = Int32GetDatum(MapFields->size);
/* skip several fields */
values[32] = CStringGetTextDatum(MapFields->vm_flags);

tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
	values, nulls);

Note from this example that we never set any values in the nulls array, since zero (false) is both the default and the value we want. Aside from the metadata fields describing the memory segment itself, most of the other fields are sizes in kilobytes. Rather than showing all 33 possible assignments, we’ve elected to omit most of them for the sake of brevity.

So long as the parsing function returns records from the memory mapping, the materialized result will continue accumulating rows. This will repeat for each Postgres backend until every one has been processed, and then we exit the loop. Once the function returns, the accumulated tuples are handed to the caller.

Becoming a Postgres Extension

Writing the function is only half of the story; it’s also necessary to create a SQL to C binding. This accomplishes a couple of things:

  • Provides a SQL wrapper so the function is usable within Postgres sessions.

  • Defines the type of each column in the result set. This is how Postgres builds the setDesc portion of the result-set struct.

The file has a naming convention based on the Packaging Related Objects into an Extension documentation. For our extension, this would be: pg_meminfo--1.0.sql. The contents should be the function declaration, including the entirety of the result set. For this function and all fields in the memory mappings, it would look like this:

CREATE OR REPLACE FUNCTION get_all_smaps()
RETURNS TABLE(
	pid INT, start_address TEXT, 
	end_address TEXT, permissions CHAR(4),
	byte_offset TEXT, dev_major CHAR(2),
	dev_minor CHAR(2), inode INT,
	sys_path TEXT, mem_size INT,
	kernel_page_size INT, mmu_page_size INT,
	rss INT, pss INT, pss_dirty INT, shared_clean INT,
	shared_dirty INT, private_clean INT, private_dirty INT,
	referenced INT, anonymous INT, ksm INT, lazy_free INT,
	anon_huge_pages INT, shmem_pmd_mapped INT,
	file_pmd_mapped INT, shared_huge_tlb INT,
	private_huge_tlb INT, swap INT, swap_pss INT,
	locked INT, thp_eligible INT, vm_flags TEXT
)
AS 'MODULE_PATHNAME', 'get_smaps' 
LANGUAGE C VOLATILE;

The MODULE_PATHNAME portion will automatically be substituted for the installed location of the shared library in Postgres. Running this SQL after the extension shared object files are installed will create an SQL function named get_all_smaps() that would call our get_smaps() C function.

This would be the minimum to get a working Postgres extension. Assuming all of the files are complete, installing consists of a few steps. First is to build and install the actual library:

make
sudo make install

Next, restart Postgres using your favorite method. And finally, install the extension itself:

CREATE EXTENSION pg_meminfo;

This actually executes the SQL file we prepared above, ensuring that the function mapping exists. Once that happens, all subsequent Postgres sessions should be able to utilize the get_all_smaps() function like this:

SELECT * FROM get_all_smaps();

It sure was a lot of work getting to that point, but it was all worth it!

Odds and Ends

We definitely glossed over a few pertinent details here, the most obvious of which is the implementation of the memory map parser itself. Unless all intermediate structs and functions are defined in the same file as the extension function itself, these are probably in a separate C file for organization purposes. (Again, full sources for this blog are available in a dedicated GitHub repository.)

Imagine these are in a parser.c source file that should be part of the extension build procedure. The Makefile example we used doesn’t really account for building multiple sources, or binding them together into a single module. It’s easy to remedy, however, by using dedicated build variables in the Makefile itself:

#MODULES = pg_meminfo    # Using MODULE_big instead
MODULE_big = pg_meminfo
OBJS = meminfo.o parser.o

The assumption is now that meminfo.c and parser.c exist and produce their respective object files during the build step, which are then linked into a single pg_meminfo shared library. Everything else is the same as before, and the extension will work as described once installed; we simply have more organized source files as a result.
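Assembled in one place, and assuming the source files are named meminfo.c and parser.c as above, the full multi-file Makefile might read as follows. Note the added DATA line: PGXS uses it to install the pg_meminfo--1.0.sql script alongside the control file, and without it CREATE EXTENSION can’t find the script:

```make
MODULE_big = pg_meminfo
OBJS = meminfo.o parser.o

PGFILEDESC = "Extension to view Postgres related memory information"
EXTENSION = pg_meminfo
DATA = pg_meminfo--1.0.sql

PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
```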

Conclusion

We’ve performed a rather deep dive into the Postgres extension API in this blog in order to write a set-returning function. Hopefully you’ve been able to learn these concepts by following along:

  • Creating a materialized set-returning function in Postgres.

  • Searching through the Postgres source code for necessary APIs.

  • Checking user role access.

  • Examining Postgres backend status.

  • Building extensions from multiple source files.

As you might expect, there’s a whole lot more content to explore under the topic of Postgres extensions. PgEdge is dedicated to giving back to the Postgres community, and part of that is imparting hard-won knowledge to budding developers who want to contribute to Postgres. We hope this is only the first step in your journey.

There is at least one more article planned for this extension series where we tackle the topic of background workers. What would you like to see after that? Let us know, and maybe your topic will be next in the series!