Snowflake: virtual warehouses

This piece covers Snowflake virtual warehouses. A virtual warehouse is one or more clusters of compute resources used to execute queries and perform Data Manipulation Language (DML) operations.

Snowflake supports multiple warehouses for different uses, such as data loading and analysis. Multiple warehouses can also be used to maintain separate development, testing and production environments. 

Warehouses come in t-shirt sizes: extra small, small, medium, large, and so on. The size determines the number of servers in each cluster in the warehouse.

Start by creating a warehouse using the web interface. Navigate to the Warehouses page and click Create. 

[Image: Create Virtual Warehouse]

PRO TIP: Giving the warehouse a descriptive name helps identify its intended purpose. 

Then choose a size. If multi-cluster warehouses are enabled for the Snowflake account, this is also where you set the minimum and maximum number of clusters for the warehouse.

The auto-suspend and auto-resume features control when a warehouse stops and starts. Auto-suspend suspends a warehouse if it sits idle for a specified period. Auto-resume, in turn, starts a suspended warehouse when queries are submitted to it. Why use these features? A warehouse consumes Snowflake credits only while it is running, so suspending warehouses when they are not in use helps conserve credits and control costs.
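
The same settings can also be expressed in SQL rather than through the web interface. The following is a minimal sketch, not a definitive implementation: the helper name create_warehouse_sql, the warehouse name LOAD_WH, and the default values are assumptions made for illustration. It only builds a CREATE WAREHOUSE statement as text and does not connect to Snowflake.

```python
def create_warehouse_sql(name, size="XSMALL", auto_suspend_seconds=300,
                         auto_resume=True, min_clusters=None, max_clusters=None):
    """Build a CREATE WAREHOUSE statement as a string (nothing is executed)."""
    # Hypothetical safety check: allow only letters, digits, and underscores.
    if not name.replace("_", "").isalnum():
        raise ValueError("warehouse name must be alphanumeric or underscores")

    options = [
        f"WAREHOUSE_SIZE = '{size}'",
        # Idle time, in seconds, before the warehouse is suspended.
        f"AUTO_SUSPEND = {auto_suspend_seconds}",
        # Whether a suspended warehouse restarts when a query arrives.
        f"AUTO_RESUME = {'TRUE' if auto_resume else 'FALSE'}",
    ]

    # Cluster bounds only make sense when multi-cluster warehouses are enabled.
    if min_clusters is not None and max_clusters is not None:
        if not 1 <= min_clusters <= max_clusters:
            raise ValueError("need 1 <= min_clusters <= max_clusters")
        options.append(f"MIN_CLUSTER_COUNT = {min_clusters}")
        options.append(f"MAX_CLUSTER_COUNT = {max_clusters}")

    return f"CREATE WAREHOUSE IF NOT EXISTS {name} WITH " + " ".join(options) + ";"


# Example: a small loading warehouse that suspends after one idle minute.
statement = create_warehouse_sql("LOAD_WH", size="SMALL", auto_suspend_seconds=60)
print(statement)
```

A short auto-suspend period saves credits on bursty workloads, while a longer one avoids repeated restarts when queries arrive steadily; the right value depends on the workload.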

How do you choose the right size for a warehouse? Factors such as data size and query complexity affect query performance. Each increase in warehouse size, such as from small to medium, doubles the number of servers in each cluster. Generally, query performance scales roughly linearly with warehouse size, although small, simple queries may see little benefit. To determine an optimal size, load real data into Snowflake and run a representative set of queries on differently sized warehouses. Experimentation like this lets you evaluate query performance directly against real data.
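
The doubling rule gives a useful yardstick for such experiments. The sketch below assumes the smallest size has one server per cluster (an assumption, as are the function names and the sample timings) and computes the best-case runtime a query would reach if it scaled perfectly linearly. Comparing measured runtimes against that ideal shows where a larger warehouse stops paying off.

```python
# Sizes in ascending order; each step doubles the servers per cluster.
SIZES = ["XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE"]


def relative_servers(size, base_servers=1):
    """Servers per cluster, assuming the smallest size has base_servers."""
    return base_servers * 2 ** SIZES.index(size)


def ideal_runtime(baseline_seconds, baseline_size, target_size):
    """Best-case runtime under perfectly linear scaling with server count."""
    ratio = relative_servers(baseline_size) / relative_servers(target_size)
    return baseline_seconds * ratio


def scaling_efficiency(baseline_seconds, baseline_size, measured_seconds, target_size):
    """Ideal runtime divided by measured runtime; 1.0 means perfectly linear."""
    return ideal_runtime(baseline_seconds, baseline_size, target_size) / measured_seconds


# Hypothetical measurements: 120 s on SMALL, 40 s on LARGE.
print(ideal_runtime(120, "SMALL", "LARGE"))           # 30.0
print(scaling_efficiency(120, "SMALL", 40, "LARGE"))  # 0.75
```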

When a warehouse does not have enough resources to concurrently process all the queries submitted to it, incoming queries are queued and completed as resources become available. Snowflake provides two options for increasing compute resources: warehouse resizing and multi-cluster warehouses.

Resize a warehouse if queries are taking too long or data loading is slow. Queries already in progress when you resize the warehouse do not take advantage of the size increase. Queued and new queries, however, start using the additional servers as soon as they are provisioned.

Multi-cluster warehouses use multiple clusters of servers to handle fluctuating numbers of concurrent queries, such as during peak hours. As the load increases, the warehouse automatically starts more clusters to prevent queries from queuing. When the additional clusters are no longer needed, the warehouse shuts them down. Multi-cluster warehouses are a Snowflake Enterprise Edition feature.

In summary, resizing a warehouse improves performance for slow-running queries and data loading, while multi-cluster warehouses adapt dynamically to increases or decreases in the number of concurrent queries.
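
Both options described above can be applied to an existing warehouse with ALTER WAREHOUSE. As a hedged sketch, the helpers below (hypothetical names, with ANALYTICS_WH as a made-up warehouse) only build the statements as text; running them would require a Snowflake session, and the cluster settings assume an account with multi-cluster warehouses enabled.

```python
def resize_warehouse_sql(name, new_size):
    """Statement that changes the warehouse size (helps slow queries and loads)."""
    return f"ALTER WAREHOUSE {name} SET WAREHOUSE_SIZE = '{new_size}';"


def set_cluster_range_sql(name, min_clusters, max_clusters):
    """Statement that sets the cluster range (helps with concurrent queries)."""
    if not 1 <= min_clusters <= max_clusters:
        raise ValueError("need 1 <= min_clusters <= max_clusters")
    return (
        f"ALTER WAREHOUSE {name} SET "
        f"MIN_CLUSTER_COUNT = {min_clusters} MAX_CLUSTER_COUNT = {max_clusters};"
    )


# Slow queries: move to a larger size.
print(resize_warehouse_sql("ANALYTICS_WH", "LARGE"))
# Queuing at peak hours: let the warehouse scale between 1 and 4 clusters.
print(set_cluster_range_sql("ANALYTICS_WH", 1, 4))
```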

How does warehouse usage translate into costs? All warehouse costs are based on Snowflake credits. A warehouse only consumes credits while it runs, so you only pay for what you use. The number of credits consumed by a running warehouse is based on its size, the number of clusters, and how long it runs. Each time a warehouse is resumed or increased in size, your account is billed for one minute of usage. After the first minute, billing is calculated per second.
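
The billing rule can be worked through with a small calculation. This is an illustrative sketch only: the credits-per-hour table assumes one credit per hour for the smallest size, doubling with each size, and the function name is hypothetical. Check your account's pricing for the actual rates.

```python
# Assumed hourly credit rates per cluster; they double with each size step.
CREDITS_PER_HOUR = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8, "XLARGE": 16}


def credits_for_run(size, seconds_running, clusters=1):
    """Credits for one run after a resume: 60-second minimum, then per second."""
    if seconds_running <= 0:
        return 0.0  # a warehouse that never ran consumes no credits
    billed_seconds = max(60, seconds_running)
    return CREDITS_PER_HOUR[size] * clusters * billed_seconds / 3600


# A 30-second run is billed as a full minute.
print(credits_for_run("XSMALL", 30))
# One hour on a medium warehouse with a single cluster.
print(credits_for_run("MEDIUM", 3600))
# 90 seconds on a small warehouse running two clusters.
print(credits_for_run("SMALL", 90, clusters=2))
```

Because of the one-minute minimum, a warehouse that resumes many times for very short queries can cost more than the raw query time suggests.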

This concludes the guide to virtual warehouses.
