Google Cloud Platform Overview for a Data Professional - Part 1
As a Data Professional, what should I know about Google Cloud Platform (GCP) cloud services and is this something I should consider for running SQL Server or other data platforms?
This tutorial provides an overview of the GCP core infrastructure for a data professional, explores the available products, and shows how they interact.
Virtual Private Cloud (VPC) is Google's network in the cloud, built on top of its actual physical network. A VPC network is a global resource that spans all available regions, so one network exists in every region at the same time. Inside a network, resources can be divided into regional subnetworks. A region contains multiple zones, and a subnetwork can span the zones of its region. VPCs can be shared between projects or peered with VPCs in other projects.
One thing to keep in mind is that every subnet has four reserved IP addresses in its primary range: the first two addresses, the second-to-last address, and the last address. They are reserved for the network address, the subnet's default gateway, future use, and broadcast, respectively. In the secondary IP ranges, there are no IP reservations.
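To make the reservation rule concrete, here is a minimal sketch that lists the four reserved addresses of a primary range using Python's standard ipaddress module. The range 10.128.0.0/20 and the function names are illustrative assumptions, and the role assigned to each address (network, default gateway, future use, broadcast) reflects my reading of Google's VPC documentation rather than anything this article defines.

```python
import ipaddress


def reserved_addresses(primary_range: str) -> dict:
    """Return the four addresses reserved in a subnet's primary IPv4 range.

    Role names are an assumption based on Google's VPC documentation.
    """
    net = ipaddress.ip_network(primary_range)
    return {
        "network": net[0],               # first address
        "default_gateway": net[1],       # second address
        "reserved_future_use": net[-2],  # second-to-last address
        "broadcast": net[-1],            # last address
    }


def usable_address_count(primary_range: str) -> int:
    """Addresses left for resources after the four reservations."""
    net = ipaddress.ip_network(primary_range)
    return net.num_addresses - 4


# Hypothetical primary range, chosen only for illustration.
for role, address in reserved_addresses("10.128.0.0/20").items():
    print(f"{role}: {address}")
print("usable:", usable_address_count("10.128.0.0/20"))
```

This calculation applies only to the primary range, since secondary ranges carry no reservations.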
As you can see below, subnet-1 is an IP range, and we can use IP addresses from that range. Although the virtual machines are in different zones, they can communicate with each other because they belong to the same subnet.
There are Google Cloud resources that can have internal and external IP addresses.
- An internal IP is allocated from the subnet where the resource resides, and resources, like two VM instances from the same VPC, can communicate with each other using their internal IP.
- External IP addresses are optional. They can be ephemeral (assigned from a pool) or static. These addresses are used for internet-facing communication or to reach resources in a VPC from outside that network.
For the pricing of ingress and egress network traffic, see All networking pricing.
Compute Engine is an IaaS solution that lets us run VMs in the cloud. We can choose between predefined or custom machine types, configure the desired memory and CPU, and select the type of disk: standard hard drives (HDD), flash storage using SSDs, local SSDs, or a combination of them. We can also configure networking by adding network interfaces and choose between Windows and Linux for the OS, which makes it a flexible way to build the desired setup. The CPU count influences network throughput, which scales at 2 Gbit/s for each CPU core. There is one exception: instances with 2 or 4 CPUs receive up to 10 Gbit/s of bandwidth.
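The bandwidth rule above can be written as a small function. This is only a sketch of the rule as stated in this article: the function name is hypothetical, and it does not model any ceiling that a specific machine type may impose, so treat the result as a rough estimate and check the machine type documentation for real limits.

```python
def estimated_bandwidth_gbps(vcpus: int) -> int:
    """Estimate network bandwidth in Gbit/s from the CPU count.

    Implements only the rule described in the text:
    2 Gbit/s per CPU core, except that instances with
    2 or 4 CPUs receive up to 10 Gbit/s.
    """
    if vcpus < 1:
        raise ValueError("vcpus must be a positive integer")
    if vcpus in (2, 4):
        return 10  # the stated exception
    return 2 * vcpus


for cpu_count in (1, 2, 4, 8):
    print(cpu_count, "CPUs ->", estimated_bandwidth_gbps(cpu_count), "Gbit/s")
```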
To see the machine types, purposes, and comparisons between them, review the Machine families resource and comparison guide.
A sole-tenant node is a physical Compute Engine server dedicated to your workload and isolated from the VMs of other customers or projects. Consider sole-tenant nodes when a workload must be physically separated from other VMs, when you do not want to share the host with other workloads, or when a regulatory compliance requirement must be met. You can also bring existing OS licenses, if required.
Below you can see the difference between a normal and sole-tenant host with multiple VMs.
To prevent the application from experiencing downtime, Compute Engine can migrate the virtual machine to another host when a maintenance event occurs. Also, if the VM is terminated because of a crash or a maintenance event, the instance is automatically restarted. A VM's availability policy determines how the instance behaves in such events, and these policies can be changed as desired.
You can also use snapshots to back up disks or move data between VMs. Snapshots are incremental and automatically compressed, but remember that they are unavailable for local SSDs.
Cloud Storage Options
We mentioned that a Compute Engine instance can have multiple disk types. We usually go with persistent disks, which come as standard (HDD) or SSD; the third option is a local SSD. Choosing between them is a matter of price vs. performance because they have different price structures.
Persistent disks are durable network storage devices that instances can access like a physical disk in a server. The data on each persistent disk is spread across numerous physical disks, and the redundancy, performance, and data distribution are handled automatically. A persistent disk is located separately from the virtual machine instance, which lets us detach or move the disk and keep our data even after the instance is deleted. Standard and SSD persistent disk performance scales automatically with size, and there's no downtime for resizing operations.
SSD persistent disks provide more IOPS per dollar, while standard disks provide more capacity per dollar. Local SSDs are physically attached to the host server, which gives them higher throughput and lower latency than SSD persistent disks. Keep in mind that data stored on local SSDs persists only until the instance is stopped or deleted. A local SSD can therefore be used as a swap disk, pagefile, or for temporary data, and it is a good choice for a SQL Server tempdb database.
Compute Engine encrypts all data at rest by default. GCP handles and manages the encryption automatically without any interaction from us. For more control, you can manage the encryption yourself, either by using Cloud Key Management Service to create and manage key encryption keys (known as customer-managed encryption keys) or by creating and managing your own encryption keys (known as customer-supplied encryption keys).
Compute Engine uses per-second billing with a minimum of 1 minute. So, if the virtual machine runs for 30 seconds, you will be billed for 1 minute of usage. After 1 minute, instances are charged in 1-second increments. It uses a resource-based pricing model, where each resource, like CPU or RAM, is billed separately. After we create Compute Engine instances of a particular type, the bill will list CPU and memory consumption as separate items.
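As a rough illustration of the billing rule, the sketch below computes the billed duration and a resource-based cost. The function names and the hourly rates passed in are hypothetical placeholders, not real GCP prices, and rounding a partial second up to the next whole second is my assumption.

```python
import math

MINIMUM_BILLED_SECONDS = 60  # 1-minute minimum described in the text


def billed_seconds(runtime_seconds: float) -> int:
    """Seconds billed for one run: at least 60, then 1-second increments."""
    if runtime_seconds <= 0:
        return 0  # assumption: an instance that never ran is not billed
    # Assumption: a partial second is rounded up to the next whole second.
    return max(MINIMUM_BILLED_SECONDS, math.ceil(runtime_seconds))


def instance_cost(runtime_seconds: float, vcpus: int, memory_gb: float,
                  vcpu_rate_per_hour: float,
                  memory_gb_rate_per_hour: float) -> float:
    """Resource-based cost: CPU and memory are priced separately."""
    hours = billed_seconds(runtime_seconds) / 3600
    cpu_cost = vcpus * vcpu_rate_per_hour * hours
    memory_cost = memory_gb * memory_gb_rate_per_hour * hours
    return cpu_cost + memory_cost


print(billed_seconds(30))   # a 30-second run is billed as one minute
print(billed_seconds(61))   # past one minute, billing is per second
# Hypothetical rates, for illustration only.
print(instance_cost(3600, vcpus=2, memory_gb=8,
                    vcpu_rate_per_hour=0.03, memory_gb_rate_per_hour=0.004))
```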
Because of the resource-based pricing, Compute Engine applies sustained use discounts to all VM usage in a region combined rather than to individual machine types.
If you have a stable and predictable workload, you can purchase a certain amount of CPU and RAM at a discount in return for committing to 1 or 3 years of usage. The discount can go up to 70% for some machine types.
The discount can increase even more (up to 90%) with preemptible VM instances or their latest version, Spot VMs. This type of machine has some limitations and can be stopped by Compute Engine if the resources are needed elsewhere. It could be considered for workloads that are fault-tolerant and can tolerate possible VM stoppage. For more information about preemptible VMs and Spot VMs, review What is a preemptible instance?
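Because both figures are quoted as maximums, they give only a lower bound on what a workload could cost. The sketch below turns that into arithmetic; the on-demand amount is a hypothetical input, the names are illustrative, and actual discounts vary by machine type and region.

```python
MAX_COMMITTED_USE_DISCOUNT = 0.70  # "up to 70%" for a 1- or 3-year commitment
MAX_SPOT_DISCOUNT = 0.90           # "up to 90%" for preemptible or Spot VMs


def lowest_possible_cost(on_demand_cost: float, max_discount: float) -> float:
    """Best-case cost if the full quoted discount applied."""
    if not 0 <= max_discount <= 1:
        raise ValueError("max_discount must be between 0 and 1")
    return on_demand_cost * (1 - max_discount)


# Hypothetical monthly on-demand cost, for illustration only.
on_demand = 100.0
print(round(lowest_possible_cost(on_demand, MAX_COMMITTED_USE_DISCOUNT), 2))
print(round(lowest_possible_cost(on_demand, MAX_SPOT_DISCOUNT), 2))
```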
Check out these links for up-to-date information regarding Compute Engine:
All pricing for Compute Engine is provided on the Google Cloud website.
For a general overview of billing, stay tuned for a future article in this series.
- In part 2 we will continue to discuss GCP data storage options and available database services.
Article Last Updated: 2023-03-20