Ultimate Guide To Persistent Disks

Author

Ankur Mandal

5 min read

Among the different storage options available in Google Cloud Platform (GCP), Persistent Disk is one of the most reliable and scalable block storage solutions. As a durable, high-performance service, it ensures that data remains intact despite sudden disruptions or system failures, and it lets users manage data efficiently whether it serves application development, analytics, or any other part of the cloud infrastructure.

When we talk about persistent disks, there are several aspects to keep in mind, such as their features, operations, and types. This blog dives deep into all of these aspects so that you come away with a comprehensive understanding of GCP's Persistent Disks.

Like other leading cloud service providers such as AWS and Azure, GCP offers two main cloud storage options: block storage and object storage. Persistent disks fall under the block storage category and are available in different sizes and types, allowing users to select the one that best suits their needs. They offer exceptional flexibility because they can be created and attached to VM instances at any time.

Persistent disk has the following features that make it suitable for various applications.

  • Performance: The performance of a persistent disk depends on factors such as disk type, size, I/O block size, and the vCPU count of the attached virtual machine. Capable of achieving up to 100,000 IOPS and 1,200 MB/s of read/write throughput, Persistent Disk is well equipped to handle demanding workloads.
  • Availability: Persistent disks utilize redundant data storage within a specific region to ensure high availability and durability. This redundancy approach helps minimize the effects of planned and unexpected infrastructure failures, ultimately protecting applications and reducing the risk of data loss.
  • Encryption: Persistent disk data is encrypted by default using Google-managed encryption keys. Customers can also use customer-managed encryption keys through Google Cloud Key Management Service (KMS). Note that when a disk is deleted, its encryption keys are also deleted, ensuring that unauthorized parties cannot access the data.
  • Data integrity: Persistent disks maintain data integrity by storing data redundantly across multiple zones or regions. This guarantees durability and protection against data loss during zonal or regional failures, and dispersing data across multiple locations enhances resilience and reduces the possibility of downtime or data corruption.
  • Snapshots: Persistent disk allows users to create snapshots to back up data from zonal or regional persistent disks. By default, these snapshots are geo-replicated and available for restoration in all regions.
  • Decoupled compute and storage: Storage is separated from virtual machine instances, so disks can be detached or moved individually. This preserves data even if an instance is deleted and gives you the flexibility to manage your storage resources efficiently (see the sketch after this list).
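
To make the compute and storage decoupling concrete, here is a minimal sketch, assuming the google-cloud-compute Python client library (pip install google-cloud-compute): it creates a standalone balanced persistent disk and attaches it to an existing VM. The project, zone, disk, and instance names are placeholders, and field names such as type_ and auto_delete reflect the generated client API as I understand it, so verify them against the current library.

```python
# Create a standalone persistent disk and attach it to an existing VM,
# illustrating how storage is decoupled from compute.
# Hypothetical project/zone/VM/disk names; requires google-cloud-compute.
from google.cloud import compute_v1

project, zone = "my-project", "us-central1-a"       # placeholder values
disk_name, instance_name = "data-disk-1", "my-vm"   # placeholder values

# 1. Create a 200 GB balanced persistent disk, independent of any VM.
disk_client = compute_v1.DisksClient()
disk = compute_v1.Disk(
    name=disk_name,
    size_gb=200,
    type_=f"zones/{zone}/diskTypes/pd-balanced",
)
disk_client.insert(project=project, zone=zone, disk_resource=disk).result()

# 2. Attach the disk to an existing instance; it can later be detached and
#    re-attached to another VM without losing data.
instance_client = compute_v1.InstancesClient()
attached = compute_v1.AttachedDisk(
    source=f"projects/{project}/zones/{zone}/disks/{disk_name}",
    auto_delete=False,  # keep the disk even if the VM is deleted
)
instance_client.attach_disk(
    project=project, zone=zone, instance=instance_name,
    attached_disk_resource=attached,
).result()
```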

The features above make Persistent Disk suitable for a wide range of scenarios, such as:

  • Applications that depend on IaaS offerings such as Compute Engine and Google Kubernetes Engine
  • App Engine flexible environment deployments
  • Databases
  • LOB applications
  • Enterprise file storage

Types Of Persistent Disks

Now that we have a comprehensive idea of persistent disks' features and use cases, let's proceed with the different types of Persistent Disks in GCP.

Before discussing the individual persistent disk types, we must understand the two primary categories they fall into: HDD and SSD. Both are offered as persistent disk storage choices for virtual machine instances.

  • HDD (Hard Disk Drive): Traditional storage devices, such as HDDs, utilize spinning magnetic disks for data storage. They provide higher capacity at a lower cost per gigabyte when compared to SSDs. 
    HDDs are suitable for workloads that require extensive storage capacity and can accommodate slower data access speeds, such as archival storage, data backup, and file storage.

HDD-Based Persistent Disk

Standard Persistent Disks (pd-standard): Standard persistent disks are block storage designed to deliver consistent, dependable performance for virtual machine instances. They provide well-rounded performance across various workloads, making them suitable for many general-purpose applications. Backed by HDDs, they are commonly recommended for workloads dominated by sequential I/O and are one of the most cost-effective options for workloads that do not have high performance requirements.

  • SSD (Solid State Drive): SSDs, on the other hand, utilize flash memory for data storage, resulting in faster data access speeds and lower latency than HDDs. 
    They offer superior performance, especially regarding IOPS and latency, making them well-suited for latency-sensitive workloads and applications that demand high-performance storage. 
    They are commonly used for databases, transactional applications, virtual machine boot disks, and any workload that requires fast, responsive storage.

SSD-based Persistent Disk

Balanced Persistent Disks (pd-balanced): Offering a balance between cost and performance, balanced persistent disks are the usual choice for general-purpose workloads that do not need the peak performance of the pd-ssd and pd-extreme options.

Performance Persistent Disks (pd-ssd): Performance persistent disks offer higher storage performance than balanced persistent disks, with latencies in the single-digit millisecond range. Engineered for speed and responsiveness, they are designed to meet the needs of the most demanding workloads by providing very fast data access.

Extreme Persistent Disks (pd-extreme): Alongside high performance, extreme persistent disks let you fine-tune and provision a target number of IOPS. They are recommended for workloads such as SAP HANA and other enterprise databases.
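
To show how the disk types described above are selected at creation time, here is a hedged sketch using the google-cloud-compute Python client: it defines a pd-ssd disk for a latency-sensitive database and a pd-extreme disk with explicitly provisioned IOPS. Names, sizes, and IOPS targets are placeholders, and the provisioned_iops field name is an assumption based on the generated client.

```python
# Sketch: specify the disk type (and, for pd-extreme, the target IOPS)
# when creating persistent disks. Placeholder names and values.
from google.cloud import compute_v1

project, zone = "my-project", "us-central1-a"
disk_client = compute_v1.DisksClient()

# Latency-sensitive database disk backed by pd-ssd.
ssd_disk = compute_v1.Disk(
    name="db-disk",
    size_gb=500,
    type_=f"zones/{zone}/diskTypes/pd-ssd",
)

# pd-extreme disk with an explicitly provisioned IOPS target.
extreme_disk = compute_v1.Disk(
    name="hana-disk",
    size_gb=2000,
    type_=f"zones/{zone}/diskTypes/pd-extreme",
    provisioned_iops=40000,  # assumed field name for the target IOPS
)

for d in (ssd_disk, extreme_disk):
    disk_client.insert(project=project, zone=zone, disk_resource=d).result()
```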

Aside from these performance-based categories, persistent disks are categorized based on availability into zonal and regional disks. 

Zonal persistent disk offers robustness against hardware failure within a single zone by storing multiple replicated copies of data locally. On the other hand, regional persistent disks enhance resilience against zonal outages by replicating data across multiple zones in the same region.

Zonal disks are suitable for workloads that require low-latency access to a specific zone. Regional disks, on the other hand, offer increased availability by spanning multiple zones within the same region. 
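
Under the same assumptions as the earlier sketches (google-cloud-compute client, placeholder project and region names), a regional persistent disk replicated across two zones might be created roughly like this; the replica_zones field and the regional disk-type path are assumptions to verify against the library documentation.

```python
# Sketch: create a regional persistent disk replicated across two zones.
from google.cloud import compute_v1

project, region = "my-project", "us-central1"
regional_client = compute_v1.RegionDisksClient()

disk = compute_v1.Disk(
    name="regional-data-disk",
    size_gb=200,
    type_=f"regions/{region}/diskTypes/pd-balanced",
    replica_zones=[
        f"projects/{project}/zones/{region}-a",
        f"projects/{project}/zones/{region}-b",
    ],
)
regional_client.insert(project=project, region=region, disk_resource=disk).result()
```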

Different Functions Of Persistent Disks

Persistent disks are essential for storing and managing data for virtual machine instances on the Google Cloud Platform. They offer essential features such as reliability, scalability, and data protection, which are crucial for modern cloud computing environments. Let us look at the different functions of persistent disks in GCP:

  • Data Storage: Persistent disks are primarily used for storing data persistently. This data may consist of operating system files, application binaries, user data, databases, and any other necessary files for virtual machine instances running on Google Cloud.
  • Boot Disk: Persistent disks can also function as boot disks for virtual machine instances. When creating a virtual machine, you can designate a persistent disk as the boot disk, enabling the VM to boot from that disk.
  • Data Persistence: Persistent disks guarantee that data remains intact even after the virtual machine instance is stopped, restarted, or terminated. This means you can halt or delete a VM without losing the data stored on its persistent disks.
  • Scalable Storage: The ability to easily resize persistent disks allows for flexibility in accommodating changing storage needs without data loss or disruptions to VM operations.
  • Snapshot Backup: Persistent disks enable the creation of point-in-time backups of disk data, expanding options for data protection, disaster recovery, creating new disk instances, or migrating data across regions or zones.
  • Data Warehousing: Persistent disks give users consistent, durable storage for data in GCP.
  • Data Transfer: Persistent disks facilitate efficient data transfer between virtual machine instances and Google Cloud services, offering reliable and high-performance storage for applications with frequent data access and transfer requirements.
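
As a sketch of the snapshot backup and data persistence functions above, the following example snapshots a zonal disk and restores it into a new disk in a different zone. It again assumes the google-cloud-compute Python client with placeholder names; the create_snapshot and source_snapshot usage reflects the generated API as I understand it.

```python
# Sketch: take a point-in-time snapshot of a zonal disk, then restore it
# into a new disk in another zone. Placeholder project/zone/disk names.
from google.cloud import compute_v1

project, zone = "my-project", "us-central1-a"
disk_client = compute_v1.DisksClient()

# 1. Snapshot the source disk (snapshots are geo-replicated by default).
snapshot = compute_v1.Snapshot(name="data-disk-backup-20240601")
disk_client.create_snapshot(
    project=project, zone=zone, disk="data-disk-1",
    snapshot_resource=snapshot,
).result()

# 2. Restore the snapshot into a new disk, here in a different zone.
restored = compute_v1.Disk(
    name="data-disk-restored",
    source_snapshot=f"projects/{project}/global/snapshots/{snapshot.name}",
)
disk_client.insert(project=project, zone="europe-west1-b",
                   disk_resource=restored).result()
```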

Tips To Optimize Persistent Disk 

Follow the tips mentioned below to improve persistent disk performance.

  • Selecting the right disk type: Evaluate your workload requirements for performance, durability, and cost-effectiveness to choose the appropriate disk type, such as Standard, SSD, Balanced, or Extreme.
  • Increase disk size: If your requirements have changed since the disk was first provisioned, you can increase the disk size without any downtime for the associated VM (see the resize sketch after this list).
    Increasing disk size on the fly offers valuable flexibility: you can start with a smaller disk and adapt the configuration as your application's usage grows. This leverages the cloud's scalability, ensuring you only pay for the resources you need at any given time and optimizing cost-efficiency and resource utilization.
  • Improve performance: The performance of a persistent disk depends on numerous factors, such as disk size, disk type, and the type of VM it is attached to. It also depends on CPU resources, so attaching the disk to a VM with a higher vCPU count can improve performance.
    When sizing disk performance for your workload, also consider the network egress cap associated with each VM type: disk write operations are limited to 60% of the VM's maximum egress bandwidth, measured in Gbps. This factor plays a significant role in disk performance and should not be overlooked.
  • Implement cloud monitoring: By closely monitoring key metrics such as disk throughput, IOPS, disk space utilization, latency, queue length, and idle time, you can manage persistent disks proactively, quickly identify potential issues, optimize disk usage, and keep your storage infrastructure performing reliably (a monitoring sketch follows this list).
    With Cloud Monitoring, you can customize dashboards, set up alerts, and track disk performance and usage trends over time. By acting on these data-driven insights, you can tune your disk configuration for better deployments, ensuring peak performance, efficiency, and cost-effectiveness.
  • Share persistent disks between VMs: The Persistent Disk feature allows disk sharing between two virtual machines, allowing them to operate in a multi-writer mode where both machines can read from and write to the disks simultaneously. This functionality is essential for creating large, distributed, highly available storage systems. By allowing multiple virtual machines to access the same disk simultaneously, Persistent Disk enables users to build resilient storage architectures that can efficiently and reliably meet various workload requirements.
  • Use a high I/O queue depth: Compared to Local SSD disks, persistent disks often have higher latency. While persistent disks can deliver impressive IOPS and throughput, doing so requires keeping enough I/O requests in flight at the same time; this number of outstanding requests is known as the I/O queue depth. To fully maximize the performance of persistent disks, maintain a high I/O queue depth.
  • Set limit for heavy I/O loads to a maximum span: A "span" is a consecutive range of logical block addresses assigned on a single physical disk. To achieve optimal performance under high I/O loads, limiting operations to a specific maximum span is often beneficial.
    By restricting heavy I/O activities to a designated maximum span, disk access is enhanced, and seek times are minimized. This targeted strategy facilitates quicker data retrieval and processing, maximizing disk performance during intensive I/O operations.
  • Use GCP cost optimization tools: GCP cost optimization tools are software applications, platforms, or solutions that businesses and individuals use to analyze, manage, and reduce expenses across different operational areas. They provide capabilities such as monitoring expenditure, pinpointing areas of excessive spending, recommending cost-saving strategies, and improving resource allocation, ultimately enhancing efficiency and profitability. Many GCP cost optimization tools are available; a practical approach is to pick one from each category: tools that focus on compute resource optimization, tools that focus on storage resource optimization, and tools that enhance reporting and visibility.
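
To illustrate the "increase disk size" tip, here is a minimal resize sketch under the same google-cloud-compute assumptions (placeholder names and sizes); note that persistent disks can only be grown, not shrunk, through this API, and the filesystem inside the VM still has to be extended afterwards.

```python
# Sketch: grow a zonal persistent disk in place, without stopping the VM.
from google.cloud import compute_v1

project, zone, disk_name = "my-project", "us-central1-a", "data-disk-1"
disk_client = compute_v1.DisksClient()

resize_request = compute_v1.DisksResizeRequest(size_gb=500)  # new, larger size
disk_client.resize(
    project=project, zone=zone, disk=disk_name,
    disks_resize_request_resource=resize_request,
).result()
# The filesystem on the VM must then be extended (e.g. resize2fs or
# xfs_growfs) before the extra capacity becomes usable.
```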
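
And for the cloud monitoring tip, this hedged sketch pulls the last hour of per-instance disk write-throughput samples with the google-cloud-monitoring client (pip install google-cloud-monitoring). The project name is a placeholder, and the metric type shown is my best guess at the relevant Compute Engine disk metric, so confirm it against the Cloud Monitoring metrics list.

```python
# Sketch: read recent persistent-disk write-throughput samples from
# Cloud Monitoring as raw material for dashboards or alerts.
import time

from google.cloud import monitoring_v3

project = "my-project"  # placeholder
client = monitoring_v3.MetricServiceClient()

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {
        "end_time": {"seconds": now},
        "start_time": {"seconds": now - 3600},  # last hour
    }
)

results = client.list_time_series(
    request={
        "name": f"projects/{project}",
        # Assumed metric type for bytes written to attached disks.
        "filter": 'metric.type = "compute.googleapis.com/instance/disk/write_bytes_count"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    instance_id = series.resource.labels["instance_id"]
    for point in series.points:
        print(instance_id, point.value.int64_value)
```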

Persistent Disk Pricing

Persistent disks, including Standard, SSD, and balanced disks, are priced according to the provisioned space and include I/O operations. It is important to assess your I/O needs carefully when choosing the disk size, as performance is directly related to disk size.

Extreme persistent disks are priced based on provisioned space and the number of provisioned IOPS per disk. Upon deleting a persistent disk, you will no longer incur charges for that disk, leading to cost-effective resource management.

To know a detailed breakdown of the price, you can click here.

Optimize Your Persistent Disk With Automated Monitoring & Scaling Of Storage Resources

You need effective GCP cost optimization best practices to ensure your persistent disk stays cost-efficient without compromising performance. 

Why do we need to optimize persistent disks?

Block storage, a.k.a. GCP persistent disks, accounts for a significant portion of the overall cloud bill. While the Virtana report pointed to a concerning increase in storage costs, we did an independent study to find out how storage costs impact the overall cloud bill. After investigating over 100 clients using cloud service providers such as GCP, we found that:

  • Aside from contributing significantly to the overall cloud bill, disk utilization for root volumes, self-hosted databases, and application disks is severely low.
  • Despite prominent overprovisioning, organizations face at least one downtime per quarter.

Upon further investigation, we've identified that enhancing the buffer to ensure optimal system responsiveness during periods of heightened or unpredictable demand necessitates the following actions:

  • Streamlining Manual Processes: The process involves four manual touchpoints, requiring the DevOps team to navigate through three different tools to manage block storage.
  • Downtime Considerations: Certain cloud providers require a minimum downtime of 4 hours to shrink 1 TB of disk space, while an upgrade to the disk necessitates a 3-hour downtime.
  • Idle volumes: Idle volumes have a significant impact on system efficiency and cost-effectiveness.
  • Disk utilization: Low disk utilization is one of the causes of inflated persistent disk costs. 
  • Scaling Wait Time: After each scaling process, there is a waiting period of at least 6 hours before initiating the following scaling action.

Despite the challenges outlined, organizations often prioritize overprovisioning storage resources over optimizing storage, for several compelling reasons:

  • Custom Tool Development: Addressing storage optimization limitations often necessitates developing custom tools, which is a time-consuming and resource-intensive endeavor. This approach demands significant DevOps efforts and entails considerable time investment, making it daunting for many organizations.
  • CSP Tool Limitations: Relying solely on cloud service providers' (CSPs') native tools may result in inefficient and resource-intensive processes. Continuous storage optimization using CSP tools alone can be impractical for day-to-day operations, potentially leading to suboptimal resource allocation and operational inefficiencies.
  • Lack of Live Shrinkage Process: Major cloud service providers like AWS, Azure, or GCP lack a live shrinkage process, necessitating manual intervention for resizing storage volumes. While there are ways to achieve this manually, the process involves stopping instances, taking snapshots, and mounting new volumes, resulting in downtime and leaving room for errors and misconfigurations.

These challenges significantly impede the regular functioning of organizations, compelling them to allocate excessive resources to mitigate potential risks. However, this approach incurs substantial expenses and operational consequences, ultimately reflecting inefficiency in resource utilization as organizations are burdened with paying for unused resources.

There is an urgent need to stop this practice of overprovisioning, as it leads to a significant financial drain. This calls for cloud cost automation solutions that can:

  • Monitor idle/unused and overprovisioned resources
  • Automate the scaling of storage resources

The first step in the ultimate guide to GCP cost optimization is monitoring idle/unused and overprovisioned resources and promptly acting on them. If you have been discovering idle resources manually or using monitoring tools to find them, we suggest you stop these processes right now.

Why?

Using manual discovery processes or implementing monitoring tools has inherent limitations, such as increased DevOps workloads or additional deployment expenses. As storage environments become more intricate, the possibility of issues escalating to unmanageable levels becomes more evident.

The Lucidity Storage Audit addresses this challenge through automated streamlining, delivered as a user-friendly executable tool. It gives users valuable insights into disk health and utilization, enabling cost optimization and proactive downtime prevention without the burdensome complexity of manual intervention.

storage audit report using Lucidity

Our storage audit solution, Lucidity, features an intuitive executable process designed to deliver the following key insights:

  • Cost Optimization Insights: Lucidity's analysis enhances overall cost-effectiveness by identifying and highlighting significant opportunities to reduce storage expenses, empowering organizations to make informed financial decisions.
  • Efficiency Assessment: Uncover and address inefficiencies from idle or unused resources and over-provisioning. Lucidity streamlines your storage environment, optimizing it for maximum cost-effectiveness and resource utilization.
  • Proactive Bottleneck Detection: Lucidity's proactive approach identifies potential performance bottlenecks before they escalate, safeguarding against downtime that could adversely affect finances and reputation.

By leveraging Lucidity's insights, organizations can confidently identify idle or unused resources and take appropriate actions such as deletion or right-sizing, thereby enhancing operational efficiency and cost savings.

With Lucidity Block Storage Auto-Scaler, you can eliminate the possibility of overprovisioning and underprovisioning. An industry-first autonomous storage orchestration solution, Lucidity automates the shrinkage and expansion of storage resources within minutes of the requirements being raised. 

Deployed in just a few clicks, the Lucidity Block Storage Auto-Scaler sits atop your block storage and your cloud service provider, and offers the following benefits.

  • Automatic Expansion and Shrinkage: The Lucidity Auto-Scaler is designed to automatically adjust disk scaling in just 90 seconds, efficiently managing large datasets. Unlike traditional block storage volumes, which have a limit of around 8GB per minute (125MB/sec), the Auto-Scaler surpasses this restriction by strategically maintaining a robust buffer. 
  • Storage Cost Savings up to 70%: Lucidity's automated scaling capability revolutionizes operational efficiency, leading to substantial cost reductions of up to 70% in storage expenses. By employing the Auto-Scaler, users can optimize disk usage, increasing it from a mere 35% to an impressive 80%. Furthermore, Lucidity offers an ROI calculator to forecast potential savings upon installation of the Auto-Scaler. 
Lucidity cloud ROI calculator for spend optimization
  • Zero Downtime: Lucidity's Block Storage Auto-Scaler ensures uninterrupted performance despite unpredictable demand spikes or activity-level fluctuations. Through dynamic adjustment of storage resources, the Auto-Scaler maintains optimal operations without disruptions.
    In contrast to manual shrinkage methods that entail configuration downtime, Lucidity's Auto-Scaler seamlessly expands and shrinks resources, ensuring uninterrupted functionality. This intelligent and automated solution facilitates agile adjustments in response to evolving workloads, guaranteeing smooth operations at all times.
  • Customized Policy: With Lucidity's "Create Policy" feature, users can effortlessly tailor system settings to meet specific performance requirements. This feature allows for creating personalized protocols, where users can define key parameters such as policy name, desired utilization, maximum disk size, and buffer size. 
Lucidity custom policy feature for maintaining buffer and eliminating downtime

Once established, Lucidity diligently adheres to these policies, delivering precise instance management that aligns perfectly with user-defined specifications. The extensive customization options offered by Lucidity enhance both performance and cost-effectiveness, providing users with a dependable and bespoke storage solution tailored to their unique needs.

You can read about how Lucidity helps with cloud cost optimization on the blog here.

We hope you have a fair understanding of GCP’s persistent disks by now. If you want to optimize your persistent disk for storage, you can reach out to Lucidity for a demo. We will help you understand how automation can help you save costs associated with storage usage and wastage.
