Optimizing pgEdge Distributed PostgreSQL in your AWS environment
Amid the considerable activity and announcements at AWS re:Invent and the Las Vegas PostgreSQL Meetup sponsored by pgEdge last week, we were reminded of a recurring question from customers and prospects who self-manage pgEdge Platform in their own AWS environments: how do you optimize individual node performance when running pgEdge Platform on Amazon EC2 instances, and how do you achieve IOPS comparable to Amazon RDS? Distributing the workload across the cluster often brings an inherent performance improvement. However, the key to harnessing its full potential lies in carefully configuring PostgreSQL parameters, sizing and configuring EBS volumes appropriately, selecting EC2 instance types with ample resources, refining indexing strategies, and maintaining vigilance through robust monitoring practices. Addressing these considerations will help ensure optimal performance across your pgEdge cluster. Let’s look at each of them in turn.
Optimize PostgreSQL Configuration:
Carefully tune PostgreSQL configuration parameters such as shared_buffers, effective_cache_size, work_mem, and a few other key settings to match the resources available on your EC2 instance. This ensures that PostgreSQL makes effective use of the allocated memory. When setting up pgEdge Platform, we use pgtune together with cluster orchestration to streamline performance tuning and ensure efficient resource utilization across the entire database cluster. Because pgtune automates the configuration adjustments, each node in the cluster can be adapted to its workload and the resources available to it. This integration simplifies the management of PostgreSQL clusters, letting administrators fine-tune parameters, allocate resources effectively, and maintain optimal performance across the distributed environment.
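To make the memory parameters above concrete, here is a minimal Python sketch that derives starting values from an instance's RAM. It assumes a dedicated database server and uses widely cited rules of thumb: about 25% of RAM for shared_buffers and about 75% for effective_cache_size. These ratios, the 2 GB cap on maintenance_work_mem, and the function name suggest_memory_settings are our assumptions for illustration. They are not pgtune's exact formulas or pgEdge's tuning logic.

```python
# Rule-of-thumb memory settings for a dedicated PostgreSQL node.
# Assumption: common community heuristics, not pgtune's exact formulas.


def suggest_memory_settings(total_ram_mb: int, max_connections: int = 100) -> dict:
    """Return starting values (in MB) for key PostgreSQL memory parameters."""
    # shared_buffers: about 25% of RAM on a dedicated database server
    shared_buffers = total_ram_mb // 4
    # effective_cache_size: planner hint for shared_buffers plus OS page cache
    effective_cache_size = total_ram_mb * 3 // 4
    # maintenance_work_mem: about 1/16 of RAM, capped at 2 GB (assumed cap)
    maintenance_work_mem = min(total_ram_mb // 16, 2048)
    # work_mem applies per sort/hash operation, so split the remaining RAM
    # across connections, assuming about 3 such operations per query,
    # with a 4 MB floor (PostgreSQL's default work_mem)
    work_mem = max((total_ram_mb - shared_buffers) // (max_connections * 3), 4)
    return {
        "shared_buffers": f"{shared_buffers}MB",
        "effective_cache_size": f"{effective_cache_size}MB",
        "maintenance_work_mem": f"{maintenance_work_mem}MB",
        "work_mem": f"{work_mem}MB",
    }


if __name__ == "__main__":
    # Example: an EC2 instance with 64 GiB of RAM and 200 client connections
    for name, value in suggest_memory_settings(64 * 1024, max_connections=200).items():
        print(f"{name} = '{value}'")
```

The printed lines use postgresql.conf syntax. Treat them as starting points to refine under real load, and keep in mind that changing shared_buffers requires a server restart.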
As a best practice, we place the pg_wal directory, which stores the Write-Ahead Log (WAL) files, on a separate mount point. This separation can keep WAL writes from contending with data-file I/O for the same disk, reducing the risk of I/O bottlenecks. A separate mount point also lets the pg_wal directory benefit from its own optimizations, such as a storage device with higher write throughput. This configuration is particularly valuable where write performance is critical, such as high-transaction-rate databases or systems with stringent durability requirements. When implementing this setup, carefully consider the storage characteristics and monitor I/O performance. Additionally, we employ parallelism so that the subscriber stays reasonably up to date, especially while the publisher is processing large data ingestions.
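A quick way to confirm that pg_wal really lives on its own mount point is to compare filesystem device IDs. The sketch below assumes a standard layout in which pg_wal sits inside the data directory, either as a plain directory or as a symlink to another mount such as the one `initdb --waldir` creates. The script name check_wal_mount.py and the function wal_on_separate_device are hypothetical.

```python
import os
import sys


def wal_on_separate_device(data_dir: str) -> bool:
    """Return True if <data_dir>/pg_wal is on a different filesystem than data_dir.

    os.stat() follows symlinks, so if pg_wal is a symlink to another mount
    (as created by `initdb --waldir=...`), we get the target's device ID.
    """
    data_dev = os.stat(data_dir).st_dev
    wal_dev = os.stat(os.path.join(data_dir, "pg_wal")).st_dev
    return data_dev != wal_dev


if __name__ == "__main__":
    # Data directory from the command line or $PGDATA (no default assumed)
    pgdata = sys.argv[1] if len(sys.argv) > 1 else os.environ.get("PGDATA", "")
    if pgdata and os.path.isdir(os.path.join(pgdata, "pg_wal")):
        print("pg_wal on a separate device:", wal_on_separate_device(pgdata))
    else:
        print("usage: python check_wal_mount.py /path/to/data_directory")
```

A different device ID means a different filesystem, which is not necessarily a different EBS volume. A tool such as lsblk can confirm which block device backs each mount.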
Properly Size and Configure EBS Volumes:
We assess the workload thoroughly during discovery calls and help our customers configure Elastic Block Store (EBS) volumes appropriately for optimal PostgreSQL performance on EC2. A few of our key considerations include:
Volume Type: Selecting the right EBS volume type for the workload is essential. General Purpose SSD volumes (gp2/gp3) offer a balance of performance and cost, while Provisioned IOPS SSD volumes (io1/io2) and NVMe-based storage provide predictable, dedicated I/O performance.
Size and IOPS: Properly sizing the EBS volume involves determining the required storage capacity and ensuring sufficient provisioned I/O operations per second (IOPS). Align the volume size and IOPS with the database workload and performance requirements.
Throughput: Consider the throughput requirements of your database workload and choose EBS volumes with adequate provisioned throughput. This is particularly important for workloads with high I/O demands.
To sum up, choose an appropriate EBS volume type for your workload (e.g., General Purpose SSD, Provisioned IOPS SSD) and size it according to your database requirements. Ensure that the IOPS and throughput of the EBS volume meet or exceed the desired performance level.
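The sizing arithmetic behind these considerations can be sketched in Python. Required throughput is roughly peak IOPS multiplied by the average I/O size, and a gp3 volume must then be provisioned with enough IOPS, throughput, and capacity to cover both. The gp3 limits hard-coded below are long-standing published values that we assume here: a baseline of 3,000 IOPS and 125 MiB/s, maximums of 16,000 IOPS and 1,000 MiB/s, at most 500 IOPS per GiB, and at most 0.25 MiB/s per provisioned IOPS. AWS revises these limits, so check the current EBS documentation. The Workload fields and the function names are hypothetical.

```python
import math
from dataclasses import dataclass

# Assumed gp3 limits (long-standing published values); AWS revises these,
# so confirm against current EBS documentation before relying on them.
GP3_BASELINE_IOPS = 3_000
GP3_BASELINE_MIBPS = 125
GP3_MAX_IOPS = 16_000
GP3_MAX_MIBPS = 1_000
GP3_MAX_IOPS_PER_GIB = 500
GP3_MAX_MIBPS_PER_IOPS = 0.25


@dataclass
class Workload:
    peak_iops: int      # measured or projected peak I/O operations per second
    avg_io_kib: float   # average I/O size (PostgreSQL pages are 8 KiB)
    data_gib: int       # capacity needed, including growth headroom


def required_throughput_mibps(w: Workload) -> float:
    # Throughput = operations per second x size of each operation
    return w.peak_iops * w.avg_io_kib / 1024


def plan_gp3(w: Workload) -> dict:
    """Return a gp3 size/IOPS/throughput plan, or raise if gp3 cannot fit."""
    mibps = max(GP3_BASELINE_MIBPS, math.ceil(required_throughput_mibps(w)))
    # Throughput is capped per provisioned IOPS, so high throughput can force more IOPS
    iops = max(GP3_BASELINE_IOPS, w.peak_iops, math.ceil(mibps / GP3_MAX_MIBPS_PER_IOPS))
    if iops > GP3_MAX_IOPS or mibps > GP3_MAX_MIBPS:
        raise ValueError("workload exceeds assumed gp3 limits; consider io2")
    # IOPS above the baseline are capped per GiB, which can force a larger volume
    size_for_iops = math.ceil(iops / GP3_MAX_IOPS_PER_GIB) if iops > GP3_BASELINE_IOPS else 1
    return {
        "size_gib": max(w.data_gib, size_for_iops),
        "iops": iops,
        "throughput_mibps": mibps,
    }


if __name__ == "__main__":
    # Hypothetical workload: 8,000 peak IOPS of 8 KiB pages, 500 GiB of data
    print(plan_gp3(Workload(peak_iops=8_000, avg_io_kib=8, data_gib=500)))
```

The resulting values correspond to the Size, Iops, and Throughput parameters of the EC2 CreateVolume API. If a workload exceeds what gp3 can deliver, Provisioned IOPS SSD (io2) volumes are the natural next option to evaluate.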
Leverage Instance Types with Sufficient Resources:
Select an EC2 instance type that adequately balances CPU, memory, and network performance for the PostgreSQL workload. Instances with high I/O capabilities, such as the storage-optimized I3 instances, can significantly improve database performance. A few of our key considerations include:
Understanding Resource Requirements: We begin by assessing the resource requirements of the PostgreSQL workload, considering factors such as CPU utilization, memory usage, and I/O patterns. We also examine the characteristics of the database, including its mix of read and write operations, to make informed decisions about the resources it needs.
EC2 Instance Families: AWS offers a variety of EC2 instance families, each tailored to different use cases. For optimal PostgreSQL performance, we recommend that customers consider instances from the Storage Optimized (I) family, or instances optimized for specific use cases, such as burstable performance (T) instances for lighter or intermittent workloads.
Storage Performance: Depending on the database size and I/O requirements, we help choose instances with appropriate storage options. Storage-optimized instances, such as the I3 family, provide high-speed local NVMe SSD storage, enhancing I/O performance. We match EC2 instance types and their associated storage options to the IOPS requirements of your workload.
GPU Instances for Specialized Workloads: In cases where the workload involves specialized computations or data processing that can benefit from GPU acceleration, we consider leveraging instances that come with GPU support. GPU instances can significantly enhance performance for tasks like parallel processing or data analytics.
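The instance-selection process above can be sketched as a filter over candidate instance types. The filter keeps the types that meet the CPU, memory, and local-storage requirements and then takes the smallest. The catalog entries below are a small illustrative sample, so verify current specifications and pricing with AWS. pick_instance is a hypothetical helper. Local NVMe instance storage, such as on I3 instances, is ephemeral: its data does not survive stopping or terminating the instance.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: float
    local_nvme: bool  # has instance-store NVMe SSDs (fast but ephemeral)


# Illustrative sample only; verify current specs and prices with AWS.
CATALOG = (
    InstanceType("m6i.2xlarge", 8, 32, False),
    InstanceType("r6i.2xlarge", 8, 64, False),
    InstanceType("i3.2xlarge", 8, 61, True),
    InstanceType("r6i.4xlarge", 16, 128, False),
    InstanceType("i3.4xlarge", 16, 122, True),
)


def pick_instance(
    min_vcpus: int,
    min_memory_gib: float,
    need_local_nvme: bool = False,
    catalog: Sequence[InstanceType] = CATALOG,
) -> Optional[InstanceType]:
    """Return the smallest instance type meeting the requirements, or None."""
    candidates = [
        t
        for t in catalog
        if t.vcpus >= min_vcpus
        and t.memory_gib >= min_memory_gib
        and (t.local_nvme or not need_local_nvme)
    ]
    # "Smallest" here means fewest vCPUs, then least memory (price not modeled)
    return min(candidates, key=lambda t: (t.vcpus, t.memory_gib), default=None)


if __name__ == "__main__":
    # Hypothetical requirements: 8 vCPUs, at least 62 GiB of RAM
    print(pick_instance(8, 62))                        # general case
    print(pick_instance(8, 62, need_local_nvme=True))  # must have local NVMe
```

In practice, the requirements fed into a helper like this would come from the workload assessment described above, and price per hour would be a natural extra sort key.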
Achieving the same level of IOPS as Amazon RDS might not always be fully possible, because the performance characteristics and optimizations of EC2 instances and RDS instances can differ. However, implementing these best practices will help you maximize the performance of your pgEdge nodes running on EC2.