Configuring Private Connectivity between Snowflake and Azure Storage


Snowflake recently started supporting a whole range of new private connectivity functionality when interacting with Azure, and as a security-conscious architect I really couldn’t be happier.

We’ve already seen private connectivity support between Azure and Snowflake in other ways.

However, these all relate to authentication and none of them specifically targets native data ingress/egress between Snowflake and Azure storage. This is where the new private connectivity support for connecting to storage containers and queues comes in.

Gone are the days of needing to whitelist entire subnets for your Snowflake region. Gone are the days of blindly trusting that the traffic is secure simply because Azure have stated that traffic between Azure-hosted objects will always go via the Azure backbone. That’s not to say that these two things aren’t both valuable of course, and if you don’t have strict enough security requirements to demand Snowflake’s Business Critical edition then the approaches above are certainly sufficient for most traffic. But in worlds such as public health, banking or other highly secure environments, those approaches aren’t always enough to make security specialists do the proverbial jump for joy.

So how do we make use of this fantastic new functionality? How do we ensure that any traffic between our Azure-hosted Snowflake account and our Azure storage containers/queues is secured using private connectivity? And what does this mean for network-level whitelisting? Keep reading if you’d like to know!

Requirements

Before anything is configured, the following requirements must be met:

  • Your Snowflake account must be using the Business Critical edition or higher.
  • You must have ACCOUNTADMIN access in Snowflake.
  • You must have sufficient access to the Azure storage account to be able to approve new private endpoints.

Please note that at this time, this functionality is in PUBLIC PREVIEW and Snowflake’s official recommendation is that any PUBLIC PREVIEW features should not be relied on as Production-grade objects.

Also, at time of writing, each Snowflake account is limited to only five outbound private endpoints by default. This number can be increased by contacting Snowflake Support.

Configuration

If you meet the requirements above, then you’re good to go. The rough steps we will take are as follows:

  1. Find and note down the resource ID for the Azure storage account
  2. Configure private connectivity for any storage integrations
    1. Provision a private endpoint for traffic between Snowflake and any blobs within the Azure storage account
    2. Approve the new private endpoint in the Azure storage account
    3. Update any existing storage integrations to leverage the privatelink endpoint
  3. Configure private connectivity for any notification integrations
    1. Provision a private endpoint for traffic between Snowflake and any queues within the Azure storage account
    2. Approve the new private endpoint in the Azure storage account
    3. Recreate any existing notification integrations to leverage the privatelink endpoint
    4. Recreate any objects that leverage the notification integrations that have been recreated in the previous step
  4. OPTIONAL: Remove any subnet-level whitelisting you previously applied that allowed Snowflake to access your storage account without private connectivity

1. Find and Note Down the Resource ID for the Azure Storage Account

This step is easy as there are multiple ways to track down the resource ID for a storage account in Azure. My personal favourite is to navigate to the storage account itself within the Azure Portal and select “JSON View” in the upper right corner:

At the top of the “JSON View” will be an easily copied resource ID:

For our example, we’ll use the following storage account and resource ID:

Storage account: mystorageaccount
Resource ID: /subscriptions/a123bc45-67de-89f0-1234-ab5678cd9ef0/resourceGroups/rg-resource-group/providers/Microsoft.Storage/storageAccounts/mystorageaccount

2. Configure Private Connectivity for any Storage Integrations

Whilst it is possible to create an external stage that connects to an Azure storage container without a storage integration, I would highly advise against it. In my opinion, storage integration objects should always be used when configuring external stages as this avoids the need to note down any authentication information. In general, many Azure administrators have started removing the capability to configure these kinds of authentication methods at all, and prefer the route taken by the storage integration method.

If desired, you can learn more about storage integrations between Snowflake and Azure here: Configuring Storage Integrations Between Snowflake and Azure Storage

To configure private connectivity for any storage integrations, we complete the following steps:

  1. Provision a private endpoint for traffic between Snowflake and any blobs within the Azure storage account
  2. Approve the new private endpoint in the Azure storage account
  3. Update any existing storage integrations to leverage the privatelink endpoint

2a. Provision a private endpoint for traffic between Snowflake and any blobs within the Azure storage account

To complete the first step, we must execute the following command using the ACCOUNTADMIN role in Snowflake:

select SYSTEM$PROVISION_PRIVATELINK_ENDPOINT(
  '<Resource ID for storage account>', -- This is the resource ID for the storage account
  '<Storage account name>.blob.core.windows.net', -- This is the host name for blob storage within the storage account
  'blob' -- This is the subresource for blob storage within the storage account
)
;

From step 1, we have already retrieved the following values:

Storage account: mystorageaccount
Resource ID: /subscriptions/a123bc45-67de-89f0-1234-ab5678cd9ef0/resourceGroups/rg-resource-group/providers/Microsoft.Storage/storageAccounts/mystorageaccount

These values can be entered directly into the SQL statement to provision the private endpoint:

select SYSTEM$PROVISION_PRIVATELINK_ENDPOINT(
  '/subscriptions/a123bc45-67de-89f0-1234-ab5678cd9ef0/resourceGroups/rg-resource-group/providers/Microsoft.Storage/storageAccounts/mystorageaccount', -- This is the resource ID for the storage account
  'mystorageaccount.blob.core.windows.net', -- This is the host name for blob storage within the storage account
  'blob' -- This is the subresource for blob storage within the storage account
)
;

Executing the code in Snowflake will give a response similar to the following:

2b. Approve the new private endpoint in the Azure storage account

After completing the previous step, Snowflake will have provisioned a private endpoint within its own infrastructure, and this will be pending approval within the target storage account. Within the Azure Portal, navigate to the “Networking” pane and select the “Private endpoint connections” tab. You will now see a pending entry for the new private endpoint, which you can approve:

When approving the connection you are given the option of populating a description. I would highly recommend populating this value with information that indicates which Snowflake account the connection is coming from, and what subresource it is accessing. For example, you could enter the following description:

To confirm that everything is working, execute the following command in Snowflake using the ACCOUNTADMIN role and review the “STATUS” field of the output:

select
    parse_json("VALUE"):"host"::string as "HOST"
  , parse_json("VALUE"):"status"::string as "STATUS"
  , parse_json("VALUE"):"subresource"::string as "SUBRESOURCE"
  , parse_json("VALUE"):"endpoint_state"::string as "ENDPOINT_STATE"
  , parse_json("VALUE"):"provider_resource_id"::string as "PROVIDER_RESOURCE_ID"
  , parse_json("VALUE"):"snowflake_resource_id"::string as "SNOWFLAKE_RESOURCE_ID"
from table(flatten(
    input => parse_json(system$get_privatelink_endpoints_info())
  ))
;

This will return a table similar to the following:

We can see that the status is “Approved” so our endpoint should be good to go!

2c. Update any existing storage integrations to leverage the privatelink endpoint

This final step is straightforward for storage integrations: simply enable the “use_privatelink_endpoint” option on the storage integration, either when creating it or by altering an existing one.

If you have not followed the previous steps to provision and approve the private endpoint(s) between your Snowflake account and the blob subresource(s) of the storage account(s) that you have listed under the “storage_allowed_locations” option, then enabling the “use_privatelink_endpoint” option for your storage integration will break it, since it will not be able to access the underlying storage containers without a private endpoint.

To create a new storage integration, execute a command in Snowflake similar to the following using the ACCOUNTADMIN role:

create storage integration if not exists "MY_STORAGE_INTEGRATION"
  type = EXTERNAL_STAGE
  storage_provider = 'AZURE'
  enabled = TRUE
  azure_tenant_id = 'a123bc45-67de-89f0-1234-ab5678cd9ef0'
  storage_allowed_locations = (
      'azure://mystorageaccount.blob.core.windows.net/my-container-1/'
    , 'azure://mystorageaccount.blob.core.windows.net/my-container-2/'
  )
  use_privatelink_endpoint = TRUE
  comment = 'Storage integration that leverages a private endpoint to connect to containers within the storage account mystorageaccount.'
;

To modify an existing storage integration, execute a command in Snowflake similar to the following using the ACCOUNTADMIN role:

alter storage integration "MY_STORAGE_INTEGRATION"
set
  use_privatelink_endpoint = TRUE
;

Remember to apply any desired Role-Based Access Control steps here to allow roles other than ACCOUNTADMIN to leverage the storage integration. For example, you may wish to grant USAGE on the storage integration to the role SYSADMIN.
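
For illustration, a minimal sketch of such a grant is below; adjust the integration name and role to match your own RBAC model:

grant usage on integration "MY_STORAGE_INTEGRATION" to role "SYSADMIN";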

3. Configure Private Connectivity for any Notification Integrations

Contrasting with storage integrations, which are optional (though still strongly advised), a notification integration is strictly necessary if you wish to leverage Snowflake’s native functionality for automated data ingestion from Azure storage. This includes performing the following actions when new data lands in the underlying storage account:

  • Automatically ingesting new data into a table using Snowpipe
  • Automatically updating directory tables on stages
  • Automatically updating external tables

If desired, you can learn more about notification integration and automated ingestion between Snowflake and Azure here: Automated Ingestion from Azure Storage into Snowflake via Snowpipe

To configure private connectivity for any notification integrations, we complete the following steps:

  1. Provision a private endpoint for traffic between Snowflake and any queues within the Azure storage account
  2. Approve the new private endpoint in the Azure storage account
  3. Recreate any existing notification integrations to leverage the privatelink endpoint
  4. Recreate any objects that leverage the notification integrations that have been recreated in the previous step

3a. Provision a private endpoint for traffic between Snowflake and any queues within the Azure storage account

To complete the first step, we must execute the following command using the ACCOUNTADMIN role in Snowflake:

select SYSTEM$PROVISION_PRIVATELINK_ENDPOINT(
  '<Resource ID for storage account>', -- This is the resource ID for the storage account
  '<Storage account name>.queue.core.windows.net', -- This is the host name for queues within the storage account
  'queue' -- This is the subresource for queues within the storage account
)
;

From step 1, we have already retrieved the following values:

Storage account: mystorageaccount
Resource ID: /subscriptions/a123bc45-67de-89f0-1234-ab5678cd9ef0/resourceGroups/rg-resource-group/providers/Microsoft.Storage/storageAccounts/mystorageaccount

These values can be entered directly into the SQL statement to provision the private endpoint:

select SYSTEM$PROVISION_PRIVATELINK_ENDPOINT(
  '/subscriptions/a123bc45-67de-89f0-1234-ab5678cd9ef0/resourceGroups/rg-resource-group/providers/Microsoft.Storage/storageAccounts/mystorageaccount', -- This is the resource ID for the storage account
  'mystorageaccount.queue.core.windows.net', -- This is the host name for queues within the storage account
  'queue' -- This is the subresource for queues within the storage account
)
;

Executing the code in Snowflake will give a response similar to the following:

 

3b. Approve the new private endpoint in the Azure storage account

After completing the previous step, Snowflake will have provisioned a private endpoint within its own infrastructure, and this will be pending approval within the target storage account. Within the Azure Portal, navigate to the “Networking” pane and select the “Private endpoint connections” tab. You will now see a pending entry for the new private endpoint, which you can approve.

It’s possible you will also see the connection you previously configured for the storage integration.

 

When approving the connection you are given the option of populating a description. I would highly recommend populating this value with information that indicates which Snowflake account the connection is coming from, and what subresource it is accessing. For example, you could enter the following description:

To confirm that everything is working, execute the following command in Snowflake using the ACCOUNTADMIN role and review the “STATUS” field of the output:

select
    parse_json("VALUE"):"host"::string as "HOST"
  , parse_json("VALUE"):"status"::string as "STATUS"
  , parse_json("VALUE"):"subresource"::string as "SUBRESOURCE"
  , parse_json("VALUE"):"endpoint_state"::string as "ENDPOINT_STATE"
  , parse_json("VALUE"):"provider_resource_id"::string as "PROVIDER_RESOURCE_ID"
  , parse_json("VALUE"):"snowflake_resource_id"::string as "SNOWFLAKE_RESOURCE_ID"
from table(flatten(
    input => parse_json(system$get_privatelink_endpoints_info())
  ))
;

This will return a table similar to the following, which also includes the previously-configured endpoint for the example storage integration:

 

We can see that the status is “Approved” so our endpoint should be good to go!

3c. Recreate any existing notification integrations to leverage the privatelink endpoint

Unlike for storage integrations, this step can be more complex for notification integrations. Again, the intent is to enable the “use_privatelink_endpoint” option for the notification integration. However, with existing notification integrations it is not sufficient to run an ALTER command (at time of writing). Instead, we must recreate existing notification integrations, which can break downstream objects.

Since this involves recreating existing notification integrations instead of just altering them, there will be several downstream impacts. Be sure you are confident with managing the downstream repercussions before recreating your notification integration.

Our testing has also shown that in some situations, the underlying queue in Azure itself needs to be deleted and recreated. This seems to be necessary when a previous notification integration existed that leveraged the same queue.

To mitigate this risk, I would recommend taking the following steps:

  1. Execute a “show grants on integration …” command to determine any Role-Based Access Control grants that will need to be added again after the notification integration is recreated
  2. Determine any pipes, directory tables, external tables, etc. that leverage the notification integration, as these may need to be recreated. Unfortunately, at time of writing these do not appear to be included in the output of the object dependencies view, so your approach to determine these objects will be more complex. One option is to run SHOW commands for all pipes, external tables and directory tables in your account and then parse the DDL of each to determine the notification integration.
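
As a rough sketch of these mitigation steps, the following commands could be used. The output of SHOW PIPES includes an “integration” column that can be filtered with RESULT_SCAN; external tables and directory tables will still require parsing their DDL:

-- Capture existing grants so they can be re-applied after the recreation
show grants on integration "MY_NOTIFICATION_INTEGRATION";

-- Identify pipes that reference the notification integration
show pipes in account;
select "database_name", "schema_name", "name"
from table(result_scan(last_query_id()))
where "integration" = 'MY_NOTIFICATION_INTEGRATION'
;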

If you have not followed the previous steps to provision and approve the private endpoint(s) between your Snowflake account and the queue subresource(s) of the storage account(s) referenced in the “azure_storage_queue_primary_uri” option, then enabling the “use_privatelink_endpoint” option for your notification integration will break it, since it will not be able to access the underlying queue without a private endpoint.

To recreate a notification integration so that it leverages private connectivity, execute a command in Snowflake similar to the following using the ACCOUNTADMIN role:

create or replace notification integration "MY_NOTIFICATION_INTEGRATION"
  type = QUEUE
  notification_provider = 'AZURE_STORAGE_QUEUE'
  enabled = TRUE
  azure_tenant_id = 'a123bc45-67de-89f0-1234-ab5678cd9ef0'
  azure_storage_queue_primary_uri = 'https://mystorageaccount.queue.core.windows.net/my-queue'
  use_privatelink_endpoint = TRUE
  comment = 'Notification integration that leverages a private endpoint to connect to the queue "my-queue" within the storage account mystorageaccount.'
;

Now that the notification integration has been created, be sure to restore any downstream RBAC and objects!
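
As a hedged sketch, restoring a dependent Snowpipe and its access might look something like the following; all object names here are illustrative:

-- Re-apply any grants captured before the recreation
grant usage on integration "MY_NOTIFICATION_INTEGRATION" to role "SYSADMIN";

-- Recreate a dependent auto-ingest pipe so that it picks up the recreated integration
create or replace pipe "MY_DATABASE"."MY_SCHEMA"."MY_PIPE"
  auto_ingest = TRUE
  integration = 'MY_NOTIFICATION_INTEGRATION'
as
  copy into "MY_DATABASE"."MY_SCHEMA"."MY_TABLE"
  from @"MY_DATABASE"."MY_SCHEMA"."MY_STAGE"
;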

4. OPTIONAL: Remove any Subnet-Level Whitelisting you Previously Applied that Allowed Snowflake to Access your Storage Account Without Private Connectivity

If you have previously configured your storage account to block public network access and still allowed Snowflake to connect to it, then you likely followed Snowflake’s steps to whitelist the entire subnets for your Snowflake region. Now that you have configured private link connectivity between Snowflake and your storage account, you no longer need this!

Within the Azure Portal, navigate to the “Networking” pane and view the “Firewalls and virtual networks” tab. You may see an entry here for the “deployment-infra-rg-vnet,” allowing the subnets for your Snowflake region. To remove the whitelisting, select the three-dot menu on the far right and select “Remove.”

 

Summary

So there we have it. We have deployed a secure mechanism that allows Snowflake to interact with both containers and queues within our storage account, all through private connectivity.

If you’ve just finished this and you’re interested in taking further steps to secure your Snowflake account, be sure to check out these two articles:

Hardening Snowflake Security with Network Rules and Policies


An important step towards ensuring the security of your Snowflake account(s) is the appropriate use of network rules and policies to prevent unauthorised access. These policies are leveraged to ensure that access is only allowed through restricted methods and to/from exclusive sources.

Without this protection, you have one less wall to prevent potential bad actors from accessing your environment. Whilst it may be easy to think that MFA is enough to secure an environment, this is certainly not enough as there are plenty of situations where this can be defeated when it is the only line of defence. What if the bad actor has managed to get hold of an employee’s phone? Or the bad actor is a recent ex-employee who has not yet turned in their phone? What if the access is for a service principal instead of a human user, such as an automated process orchestrated by Matillion or an automated extract refresh in Tableau, where MFA simply is not appropriate as no human will be approving the MFA request when the process runs at 2 a.m. or every 15 minutes?

In all of the above scenarios, reliance on MFA does not ensure your account is secure. Network rules and policies are an excellent way to harden your security and help keep your data safe.

Here are a few example scenarios where network rules and policies are useful:

  • A user should only be able to log in from their home office, which has an explicit IP address.
  • All users who work from the head office should only be able to log in from within the head office, which has a few public-facing IP addresses all within a precise range.
  • A dedicated service principal for a third party (Matillion, Atlan, etc.) should only be able to log in from that third party’s IP address range.
  • SCIM requests via Entra (Active Directory) should only be allowed if they come from the appropriate subnet in the Azure network.
  • OAuth integrations with a third party (dbt Cloud, Dataiku, Thoughtspot, Tableau, etc.) should only be allowed to leverage OAuth tokens from the appropriate third-party network.
  • For a highly secure account leveraging private connectivity with Azure/AWS, all users should only be able to authenticate through an exclusive set of private endpoints, and all traffic over the public internet should be blocked.

This article walks through how to configure your Snowflake account(s) appropriately to handle the situations such as those above. To achieve this, we will cover the following:

  • What are network rules?
  • What are network policies?
  • How to leverage network policies to secure authentication requests and security integrations.
  • A tip for mapping login history to known network rules.

What are Network Rules?

In short, network rules are objects that store information on a type of network traffic for Snowflake. This traffic can be one of two categories:

  • Ingress — Traffic from outside of Snowflake that is coming in to Snowflake.
  • Egress — Traffic that is leaving Snowflake and travelling outside.

This article focuses on the ingress category, and how we can leverage it to restrict inbound traffic to Snowflake to exclusive sources. Note that in Snowflake’s terminology, the category (ingress/egress) is referred to as the “mode,” and this terminology will be used for the rest of this article. If you are interested in the egress mode instead, see our series on Snowflake External Access.

Ingress Mode — Traffic Types

There are three types of inbound traffic that Snowflake can leverage for network rules:

  • IPV4 — The standard internet.
  • AWSVPCEID — Private endpoints established via AWS PrivateLink.
  • AZURELINKID — Private endpoints established via Azure Private Link.

As stated above, this article focuses on the IPv4 category to handle traffic across the standard internet. For guidance on further securing private endpoints, see this more targeted article: Snowflake Network Rules: Restrict Access to Specific Private Endpoints.

When creating a network rule, the type is paired with a list of values. For example, if the type is IPV4 then each value in the list must either be an IP address or a range of addresses provided as a CIDR block. When creating a network rule for private endpoints via AWS or Azure, then each value in the values list must be Snowflake’s internal ID for that private endpoint.

An Important Note

An important piece of information to note is that network rules only ever act on traffic of their specific traffic type. So if a network rule is created for IPV4 that matches a certain IP range, traffic arriving through a different type but from that same IP range will not be matched.

For example, consider a network rule of type IPV4 that matches the IP address 10.10.10.10. Any inbound traffic that travels via the public internet using IPv4 from the IP address 10.10.10.10 will be matched. However, this IP address may belong to a user that is connecting to Snowflake using private connectivity. In that case, the type of their connection would be AZURELINKID or AWSVPCEID, not IPV4. In this scenario, even though the connection comes from the IP address 10.10.10.10, the traffic does not match against the IPV4 rule as the traffic is not of that type.

Example Network Rule – IPv4

Here is a simple example for creating an IPV4 network rule:

create network rule if not exists "NET_RULE__ALL_IPV4"
  type = IPV4
  mode = INGRESS
  value_list = ('0.0.0.0/0')
  comment = 'Network rule that can match against the full range of IPv4 addresses, i.e. all public network traffic. This can be included in blocked lists for private connectivity policies'
;

This network rule uses CIDR notation to span the full range of IP addresses. This means that any connection to Snowflake that comes across the public internet would match against this rule. Of course, it is unlikely that you will want to include this network rule in any allowed lists as it is fully open; however, it may be leveraged as part of a blocked list to ensure traffic for a given process can only leverage another type (AZURELINKID or AWSVPCEID).

If you are not sure what a CIDR block is, the simplest and quickest description is that a CIDR block is a quick notation for a range of IP addresses. For example, the CIDR block 123.123.123.32/29 covers all IP addresses in the range 123.123.123.32 to 123.123.123.39. If you are not sure about CIDR blocks, all functionality here can be achieved by either listing IP addresses one-by-one, or by using an IP range to CIDR conversion tool to quickly find the ranges you need.
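
For comparison with the fully open rule above, here is a more targeted example for a single office, reusing the illustrative CIDR block from the previous paragraph:

create network rule if not exists "NET_RULE__HEAD_OFFICE"
  type = IPV4
  mode = INGRESS
  value_list = ('123.123.123.32/29')
  comment = 'Network rule matching the public-facing IP range of the company head office'
;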

A Best Practice Recommendation

Network rules are schema-level objects. This means that each network rule must sit within a schema, within a database. This can make things much harder to manage if network rules are created sporadically without a structure in place. It is highly recommended to agree beforehand on a central location or set of locations for network rules and only ever create them there. A very simple example could be having a database called “ACCOUNT_ADMIN” that contains a schema called “NETWORK_RULES,” then granting the “CREATE NETWORK RULE” privilege on this schema to the “SECURITYADMIN” role. This makes it far easier for anybody managing network rules/policies to understand what is already there.
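
A minimal sketch of that setup might look like the following, with the database and schema names matching the example above:

create database if not exists "ACCOUNT_ADMIN";
create schema if not exists "ACCOUNT_ADMIN"."NETWORK_RULES";

grant usage on database "ACCOUNT_ADMIN" to role "SECURITYADMIN";
grant usage on schema "ACCOUNT_ADMIN"."NETWORK_RULES" to role "SECURITYADMIN";
grant create network rule on schema "ACCOUNT_ADMIN"."NETWORK_RULES" to role "SECURITYADMIN";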

What are Network Policies?

In short, network policies are a mechanism for allowing/blocking specific traffic. These can be thought of as a bridge between a set of network rules and a particular user, integration or account. There is no limit to the number of network rules that can be linked to a network policy, however an individual user/integration/account can only leverage a single network policy at any time.

The following diagram demonstrates how multiple network rules can be leveraged by a network policy, and yet only a single network policy can be leveraged by any particular user/integration/account:

Example Network Policies

Here is a simple example for creating a network policy that only allows traffic for two specific network rules:

create network policy "NET_POL__HEAD_OFFICE_AND_VPN"
  allowed_network_rule_list = ('NET_RULE__HEAD_OFFICE', 'NET_RULE__INTERNAL_VPN')
  comment = 'Allows access to traffic originating from either the company head office or the company Virtual Private Network (VPN)'
;

Similarly, here is an example of a network policy that only allows access via an exclusive private endpoint and blocks all access that comes through the public internet:

create network policy "NET_POL__PRIVATE_ENDPOINT_ONLY"
  allowed_network_rule_list = ('NET_RULE__MY_PRIVATE_ENDPOINT')
  blocked_network_rule_list = ('NET_RULE__ALL_IPV4')
  comment = 'Allows user access through the private endpoint. Blocks any access that comes through the public internet.'
;
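
Once created, existing network policies and their rule lists can be reviewed with the usual SHOW and DESCRIBE commands, for example:

show network policies;

describe network policy "NET_POL__PRIVATE_ENDPOINT_ONLY";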

Why Bother with Network Rules?

If you look at the documentation for network policies in Snowflake, you can see that it is possible to directly block/allow IP addresses and CIDR blocks within a network policy, instead of relying on network rules. This is the legacy mechanism that Snowflake provided for managing network security, before network rules were available. Whilst this legacy functionality is still available, I would highly recommend leveraging network rules instead. This has multiple benefits, including:

  • Network rules can be attached to multiple network policies, so you only need to define/maintain the information in a single network rule instead of in multiple network policies.
    • For example, there could be many network policies that involve an internal VPN. For whatever reason, the outbound IP address of this internal VPN could change. If using network rules, this new outbound IP address only needs to be changed in one location instead of in every network policy.
  • Network rules provide a better mechanism for clearly explaining what each IP address or CIDR block is.
    • For example, consider a network policy that allows traffic from multiple locations, including the head office, an internal VPN and two other offices. This could easily result in 4 or more different IP addresses or CIDR blocks within a single policy. When a network administrator is reviewing these 3 months later, it is much harder to tell that 123.123.123.10 is the head office and 123.45.67.89 is one of the other offices. Depending on the maturity of the networking team, they may not have this information easily on hand and might have to investigate more deeply in other resources to determine which is the appropriate IP address for which office. This problem goes away if each office has its own network rule, as it is easier to match an IP address against a network rule.

How to Leverage Network Policies To Secure Authentication Requests and Security Integrations

Once a network policy has been created, leveraging it to secure a user/integration/account is very simple. All that is required is an alter statement, such as the following examples:

-- Apply a network policy to an individual user
alter user "MY_USER"
set
  network_policy = "NET_POL__MY_NETWORK_POLICY"
;

-- Apply a network policy to a security integration
alter security integration "INT__MY_INTEGRATION"
set
  network_policy = "NET_POL__MY_NETWORK_POLICY"
;

-- Apply a network policy to an entire account.
alter account
set
  network_policy = "NET_POL__MY_NETWORK_POLICY"
;
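
For completeness, a network policy can later be detached again with a matching UNSET statement, for example:

-- Remove a network policy from an individual user
alter user "MY_USER" unset network_policy;

-- Remove a network policy from an entire account
alter account unset network_policy;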

Important Notes

It is important to keep the following things in mind when applying network policies:

  • As stated above, only one network policy can be applied to an individual user/integration/account at any time. If you need to apply the rules for two network policies to a single user, the only option is to create a new network policy that combines the required properties.
  • To apply a network policy to a whole account, the user must either use the “SECURITYADMIN” role (or higher), or use a role with the global ATTACH POLICY privilege.
  • Snowflake will only leverage the most granular network policy when authenticating a user or service.
    • For example, if a user attempts to authenticate:
      1. Snowflake will first check to see if there is a user-level network policy for that user. If one exists, only this user-level network policy will be leveraged for authentication.
      2. If no user-level network policy exists, Snowflake will then check to see if there is an integration-level network policy against the authentication method being leveraged. If one exists, only this integration-level network policy will be leveraged for authentication.
      3. If no integration-level network policy exists either, Snowflake will then check to see if there is an account-level network policy for the account. If one exists, only this account-level network policy will be leveraged for authentication.
      4. If no account-level network policy exists either, Snowflake will not perform any network-level validation and will simply allow the access.
    • More plainly, the order is:
      1. User-level
      2. Integration-level
      3. Account-level

Viewing Active Network Policies for an Object

Snowflake have provided a simple view that greatly simplifies viewing any and all policies that are active against an object. Beyond network policies, this also includes other policies such as authentication policies and password policies.

To quickly see what policies are active against a user/integration/account, you can query the “POLICY_REFERENCES” table function with the following options:

-- View current account-level network policy
select "POLICY_NAME"
from table(
  "SNOWFLAKE"."INFORMATION_SCHEMA"."POLICY_REFERENCES"(
      REF_ENTITY_DOMAIN => 'ACCOUNT'
    , REF_ENTITY_NAME => current_account()
  )
)
where "POLICY_KIND" = 'NETWORK_POLICY'
;

-- View current integration-level network policy
select "POLICY_NAME"
from table(
  "SNOWFLAKE"."INFORMATION_SCHEMA"."POLICY_REFERENCES"(
      REF_ENTITY_DOMAIN => 'INTEGRATION'
    , REF_ENTITY_NAME => 'INT_OAUTH__MY_INTEGRATION'
  )
)
where "POLICY_KIND" = 'NETWORK_POLICY'
;

-- View current user-level network policy
select "POLICY_NAME"
from table(
  "SNOWFLAKE"."INFORMATION_SCHEMA"."POLICY_REFERENCES"(
      REF_ENTITY_DOMAIN => 'USER'
    , REF_ENTITY_NAME => 'MY_USER'
  )
)
where "POLICY_KIND" = 'NETWORK_POLICY'
;

Each of these statements will return the name of the active network policy should one exist.

Viewing Where a Specific Network Policy is Leveraged

Similar to the above, it is also possible to query this information for a specific network policy, and therefore view all users/integrations/accounts where the network policy is leveraged:

-- View where a specific network policy is leveraged
select "REF_ENTITY_DOMAIN", "REF_ENTITY_NAME"
from table(
  "SNOWFLAKE"."INFORMATION_SCHEMA"."POLICY_REFERENCES"(
      POLICY_NAME => 'NET_POL__MY_NETWORK_POLICY'
    , POLICY_KIND => 'NETWORK_POLICY'
  )
)
;

This statement will show all users/integrations/accounts where the network policy is being leveraged, with the “REF_ENTITY_DOMAIN” field showing whether this is applied to a user, integration or account.

For full details, check out Snowflake’s documentation on the POLICY_REFERENCES table function and view.

A Tip for Mapping Login History to Known Network Rules

Through the joy of Snowpark, it is not too difficult to create a process that retrieves the IP addresses and CIDR blocks for all network rules in a schema/database/account and stores this information in a table. Through Snowflake’s parse_ip function, this information can be joined to the “CLIENT_IP” field in Snowflake’s LOGIN_HISTORY table function and view to assist with identifying access requests and their sources. I think you’ll have a hard time finding anybody who actually knows all appropriate IP addresses for their network without having to look them up, and this approach can drastically simplify any investigations or analytics taking place.
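
As a rough sketch of the join itself, assume the network rule values have already been extracted into a hypothetical table called “NETWORK_RULE_CIDRS” with the columns “NETWORK_RULE_NAME” and “CIDR_BLOCK” (both names are illustrative):

select
    lh."EVENT_TIMESTAMP"
  , lh."USER_NAME"
  , lh."CLIENT_IP"
  , nr."NETWORK_RULE_NAME"
from "SNOWFLAKE"."ACCOUNT_USAGE"."LOGIN_HISTORY" as lh
left join "NETWORK_RULE_CIDRS" as nr
  -- parse_ip in permissive mode avoids failures on values that are not valid IPv4 addresses
  on parse_ip(lh."CLIENT_IP", 'INET', 1):ipv4::int
    between parse_ip(nr."CIDR_BLOCK", 'INET', 1):ipv4_range_start::int
        and parse_ip(nr."CIDR_BLOCK", 'INET', 1):ipv4_range_end::int
;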

There is also the added benefit of potentially finding out that you already know the IP address for traffic that you thought was unknown, and that you can actually use an existing network rule/policy for something that you did not previously realise. For example, you may be trying to identify how traffic for a service principal called “MY_SERVICE_PRINCIPAL” should be secured, only to realise from this login history that the traffic always comes from “NET_RULE__MY_TABLEAU_SERVER” and is actually a service principal for a Tableau workload.

I have provided the full code for this tip here for anybody to leverage.

Wrap up

To finish this article, I would like to simply reiterate what I said at the start. Network rules and policies are a critical component in any secure Snowflake environment. Without this protection, you have one less wall to prevent potential bad actors from accessing your environment. Whilst it may be easy to think that MFA is enough to secure an environment, this is certainly not enough and you should not rely on MFA alone to keep accounts secure. In fact, I would not rely on the combination of MFA and network rules/policies alone either, as there’s also a great benefit from using authentication policies, password policies and security integrations.

In short, the best way to ensure that your Snowflake accounts are secure is to remain vigilant and put up as many walls as you can, whilst of course being mindful not to introduce unnecessary roadblocks that hinder legitimate users without providing any real security benefit. It’s better to be overcautious than to have a breach.

If you like what you have seen here and would like some hands-on support and guidance, I strongly recommend checking out our Snowflake Security and Cost Management Health Checks. If you are interested in this or any of our services, then contact us here.

Querying Stock Data with an API Query Profile in Matillion ETL


Matillion ETL is a GUI-based orchestration and transformation tool for data movement that has many functionalities, including querying external data via API. There are three primary ways to query this data from Matillion ETL:

  1. Create a custom Python script to send requests to the API directly, storing the response in variables in Matillion ETL:
    • This option is code-heavy and then requires further steps to convert the response into structured data. However, there are plenty of scenarios where this is preferable.
  2. Leverage functionality in the underlying data platform, such as external access integrations in Snowflake:
    • This option is also code-heavy and requires deeper knowledge of the functionality of the underlying data platform. However, there are also plenty of scenarios where this is preferable. At a minimum, this option allows all compute to be pushed down to the underlying data platform. 
  3. Utilize Matillion ETL’s own API Query Profiles functionality to fetch data in the structure you desire:
    • This option is the focus of this article.

Note that this article covers the functionality in Matillion ETL, which is the Matillion technology deployed within customer environments. The process is different if using Matillion Data Productivity Cloud (DPC), which is Matillion’s newer SaaS technology.

Understanding Our Example API — Stock Prices

For the purpose of this blog, we will use a stock price API as an example. Our intention is to retrieve the recent performance of a given symbol over time, by leveraging the time series endpoint. This is documented here.

Specifically, the endpoint has the following URL:

https://api.twelvedata.com/time_series

To query any endpoint in this API, an API key is required to authenticate requests. If you wish to stick with simple dummy data, the value “demo” can be leveraged. If you wish to explore further, you may wish to configure your own free API key by signing up to the free plan, here. This API key can either be provided as a query parameter, which we will do quickly in a moment, or it can be included in the header of the request, which we will do in our main query profile.

Also, the following two parameters are required for the specific endpoint that we wish to query:

  • symbol: This is the symbol of the instrument, otherwise known as the code for the stock.
  • interval: This is the length of the interval between values in the returned time series.

Finally, among others, there is an optional parameter that I would recommend:

  • outputsize: This is the number of data points to receive, ranging from 1 to 5000.

The following URL can be clicked to quickly query the API and see a sample of the output:

https://api.twelvedata.com/time_series?symbol=AAPL&interval=1day&outputsize=5&apikey=demo

Along with some metadata, the endpoint returns the following information for the symbol (stock) for each interval:

  • datetime: Datetime when the interval began.
  • open: Price at the opening of the interval.
  • close: Price at the end of the interval.
  • high: Highest price that occurred during the interval.
  • low: Lowest price that occurred during the interval.
  • volume: Volume of trades that occurred during the interval.

Configuring the Query Profile

The first step is creating a query profile. This profile is how Matillion packages up any information it has about the API and allows the API to be leveraged as part of an orchestration job.

To create the query profile, follow these steps:

  1. Under the “Project” menu, select “Manage API Profiles” > “Manage Query Profiles:”

  2. This menu will list all query profiles that currently exist within your Matillion ETL environment. To create a new one, select the + icon:

  3. Give the query profile a name, such as “Stocks,” then click “OK:”
  4. Find your new query profile in the list, then select the gear icon to configure it:

  5. At first, this will be empty as no endpoints are configured. For this article we will only create a single endpoint, however more complex scenarios can leverage multiple endpoints in the same query profile. Select “New Endpoint:”

  6. Under “Source Details,” give the new endpoint a name and description. Personally, I like to provide the link to the documentation of the endpoint in the description where possible, though you may wish to provide further information.

  7. Not surprisingly, the “Endpoint Configuration” area is where the majority of work takes place when configuring an endpoint. We start by populating the main endpoint URL. Recall from earlier that the value is as follows: https://api.twelvedata.com/time_series. Enter this and click “Send” to confirm that the URL is correct. Since nothing else is configured yet, the response will include an error stating a missing API key:

    In addition to the main endpoint URL at the top, there are a few different tabs to look at here. The changes below will resolve the missing API key issue, along with allowing some flexibility when querying the API.

    1. First, switch to the “Auth” tab. This tab is where optional security/authentication information can be provided. In our case, an API key is required. As mentioned above, you can either use the value “demo” or configure your own free API key by signing up to the free plan, here. The value should be added as a query parameter:

      Once the API key is configured, hit “Send” again to see a different error message. This confirms that the API key is now valid, and we can move on to providing the required parameters.
    2. Switch to the “Params” tab. This is where we can enter any other query parameters. Recall from earlier that we intend to send the following parameters to the endpoint:
      • symbol: This is the symbol of the instrument, otherwise known as the code for the stock.
      • interval: This is the length of the interval between values in the returned time series.
      • outputsize: This is the number of data points to receive, ranging from 1 to 5000.

      Each of these can be entered into the “Params” tab.

      Other than the parameter name, there are three fields here:

      • Value: This is the value of the parameter when you are testing the functionality by clicking “Send” in the “Endpoint Configuration” screen. This is sometimes persisted after you close the screen, depending on the variable type.
      • Parameter Type: This determines what type of parameter is being used
        • Query: Parameters that are added to the end of the URI after a ? character, separated by & characters. This is what we need for our example, as our parameters look like this:
          ?symbol=AAPL&interval=1day&outputsize=5
        • URI: Parameters that are actually inserted into the URI, replacing placeholder values of the form {param-name}
        • Header: Parameters should be included in the header of the request sent to the API
      • Variable Type: This determines how the variable can be populated when the query profile and endpoint are leveraged as part of a Matillion orchestration job
        • Connection: This is the default variable type, where the variable is leveraged when sending the request to the API. This is what we need for our example.
        • Constant: For this variable type, the value is persisted at all times and cannot be changed. It therefore cannot be overridden or populated when the endpoint is queried as part of a Matillion orchestration job.
        • Input: This allows the values which are returned by the API query to be filtered from inside the API Query component in a Matillion orchestration job.

      Again, click “Send” to see that we now have a working API query!

    3. Our example is a simple GET endpoint that does not require any more configuration. However, if you are attempting a more complex example then note that the “Body” tab can be used to add a body to your request.
  8. We can now proceed to the “Response Configuration” area, where the final standardised output can be configured. The response we receive from our endpoint is rather simple, consisting of a dictionary containing multiple keys, of which one is a list of row-level data. The “Repeating Element” entry is the key that is used to determine where row-level data is retrieved. Matillion can understand this without further intervention and will automatically identify the “values” key as the list of row-level information, populating the “Repeating Element” in this area to match.

    For more complex scenarios, the “Repeating Element” may need to be defined specifically. This is achieved by right clicking on a field and selecting “Set Repeating Element.”
    This area is also capable of handling paginated responses, however this is outside the scope of this article.
  9. Finally, the “Review” area can be used to quickly confirm that the output appears as expected. Select “Finish” when ready.

  10. Once finished, the query can again be tested by selecting the corresponding “Table” (this is what the Matillion UI labels the endpoint) and populating any parameter values.

  11. Now for the less fun part of Matillion API Query Profiles. You may have noticed that this output does not include all of the expected fields! There were a set of meta fields outside of our repeating element which have not been included. Unfortunately, this is a known limitation in the UI, but there is a workaround! (This has also been resolved in Matillion’s new Data Productivity Cloud, which is their SaaS offering.) To work around this, toggle on “Advanced Mode” in the upper right corner. This will show you the Report template Schema Definition file, otherwise known as the RSD. This is the code that forms the endpoint’s configuration and is documented in full detail here and here. Our goal is nothing too complex: we simply add the missing columns to the top. Testing again will then return the correct fields.

The full RSD for the endpoint is as follows:

<api:script xmlns:api="http://apiscript.com/ns?v1" xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- See Column Definitions to specify column behavior and use XPaths to extract column values from JSON. -->
  <api:info title="time-series" desc="Generated schema file." xmlns:other="http://apiscript.com/ns?v1">
    <!-- You can modify the name, type, and column size here. -->
    <attr name="currency"          xs:type="string" readonly="false"  other:xPath="/json/meta/currency"          />
    <attr name="exchange"          xs:type="string" readonly="false"  other:xPath="/json/meta/exchange"          />
    <attr name="exchange_timezone" xs:type="string" readonly="false"  other:xPath="/json/meta/exchange_timezone" />
    <attr name="interval"          xs:type="string" readonly="false"  other:xPath="/json/meta/interval"          />
    <attr name="mic_code"          xs:type="string" readonly="false"  other:xPath="/json/meta/mic_code"          />
    <attr name="symbol"            xs:type="string" readonly="false"  other:xPath="/json/meta/symbol"            />
    <attr name="type"              xs:type="string" readonly="false"  other:xPath="/json/meta/type"              />
    <attr name="close"             xs:type="string" readonly="false"  other:xPath="/json/values/close"           />
    <attr name="datetime"          xs:type="date"   readonly="false"  other:xPath="/json/values/datetime"        />
    <attr name="high"              xs:type="string" readonly="false"  other:xPath="/json/values/high"            />
    <attr name="low"               xs:type="string" readonly="false"  other:xPath="/json/values/low"             />
    <attr name="open"              xs:type="string" readonly="false"  other:xPath="/json/values/open"            />
    <attr name="volume"            xs:type="string" readonly="false"  other:xPath="/json/values/volume"          />
  </api:info>

  <api:set attr="DataModel" value="DOCUMENT" />
  <api:set attr="uri" value="https://api.twelvedata.com/time_series" />

  <api:set attr="JSONPath" value="/values" />

  <!-- Uri parameters set up -->
  <api:set attr="paramname#1" value="symbol" />
  <api:set attr="paramvalue#1" value="[_connection.symbol]" />
  <api:set attr="paramname#2" value="interval" />
  <api:set attr="paramvalue#2" value="[_connection.interval]" />
  <api:set attr="paramname#3" value="outputsize" />
  <api:set attr="paramvalue#3" value="[_connection.outputsize]" />

  <!-- The GET method corresponds to SELECT. Here you can override the default processing of the SELECT statement. The results of processing are pushed to the schema's output. See SELECT Execution for more information. -->
  <api:script method="GET">
    <api:set attr="method" value="GET"/> <!-- HTTP request type -->
    <api:set attr="method" value="GET"/>
    <api:call op="jsonproviderGet">
      <api:push/>
    </api:call>
  </api:script>

</api:script>

Leveraging the Query Profile Inside an Orchestration Job

To leverage the query profile inside an orchestration job, follow these steps:

  1. If needed, create a new orchestration job. This is the job which will be used to execute the API queries.
  2. We start by creating some variables, matching the variables we created for the query profile. Inside the orchestration job, right click the canvas and select “Manage Job Variables.”

  3. Create the following variables, with either the defaults listed or whichever other values you wish:
    • API_KEY: demo
    • INTERVAL: 1day
    • OUTPUT_SIZE: 5
    • SYMBOL: AAPL

  4. Now find the “API Query” component and drag it onto the canvas:
  5. There are several things to configure within this component, most of which are out of scope for this article as they are standard Matillion functionality. The important things that we configure are:
    • Profile: The name of the API query profile. For our example, we called this “Stocks.”
    • Data Source: The name of the endpoint/table within the API query profile. For our example, we called this “time-series.”
    • Data Selection: The fields that we wish to retrieve. For this example, simply select them all.
    • Target Table: The name of the table in Snowflake where data will be loaded to. For this example, we have called this “API_STOCKS.”
  6. There is one more thing to configure here: the connection options, which supply the input variables so that our API query targets the correct data. There are two connection options we need to configure:
    • Other: This is the connection option required for any of our parameters within the endpoint with the “Connection” variable type. Specifically, we use this to provide our three main parameters, using a semicolon-delimited list and leveraging the ${variable-name} notation to leverage the variables from our orchestration job.
    • CustomUrlParams: This is where any additional URL parameters can be provided. This uses an &-delimited list, just like when adding parameters at the end of a regular URL.

    We could have made the API key another parameter in our endpoint within the query profile, and then populated it along with everything else in the “Other” connection option. Personally, I prefer to keep any authentication values separate in some way so that they are more easily found and secured.

  7. It should now be possible to see some results by sampling the component:
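
Once the job has run successfully, you can also verify the load from the Snowflake side with a quick query against the target table. The table name below matches the “Target Table” configured earlier; adjust the quoting and casing to suit your environment:

select *
from "API_STOCKS"
limit 10;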

And so concludes our adventure into query profiles in Matillion ETL. We have created an API query profile and leveraged it inside an orchestration job. By aligning the inputs with job variables, we have improved the flexibility of the job and could leverage it as part of a larger job to iterate over multiple symbols, for example.

Snowflake External Access: Retrieve Data from an API


Recently, Snowflake released external access integrations into general availability. These allow Snowflake admins to create secure tunnels through which Snowflake can communicate with the outside world, and can therefore be used to interact with external APIs to complete all manner of tasks.

This specific article focuses on how to retrieve the latest status summary from the Snowflake Status API as a table.

How Does External Access in Snowflake Work?

Before we dive into the specific example, let’s cover the architecture involved for external access in Snowflake. There are four main components:

  1. A network rule that controls where traffic is allowed to travel to.
  2. An external access integration that allows Snowflake objects to leverage the network rule.
  3. An optional security component to facilitate authentication. This usually leverages a Snowflake Secret object that can either securely store information such as a username and password, or make use of a related Security Integration object that facilitates OAuth authentication with an API endpoint.
  4. A Snowflake object such as a stored procedure or a user defined function (UDF) that leverages the external access integration.

This is demonstrated in the following diagram:

How to Configure an Outbound Network Rule

In short, network rules are database objects that store information on a type of network traffic for Snowflake. This traffic can be one of two categories:

  • Ingress – Traffic from outside of Snowflake that is coming in to Snowflake
  • Egress – Traffic that is leaving Snowflake and travelling outside

This article specifically focuses on the egress category, and how we can leverage this to enable Snowflake to communicate with endpoints outside of Snowflake’s network. Note that in Snowflake’s terminology, the category (ingress/egress) is referred to as the “mode,” and this terminology will be used for the rest of this article.

Example Network Rule for External Access

Here is a simple example for creating an outbound/egress network rule:

create network rule "MY_OUTBOUND_NETWORK_RULE"
  type = HOST_PORT -- Always use this value for external access integrations
  mode = EGRESS -- Always use this value for external access integrations
  value_list = ('?????') -- Populate with host(s) of the target endpoint(s), such as matillion.customer.com, host.online.tableau.com, customer.webhook.office.com, eu1.api.matillion.com, etc
  comment = 'Allows traffic to be sent to the host(s) in the value_list'
;

This template can be used to configure the network rule for any external access.

It is important to note that Snowflake stores network rules as database objects, meaning they sit inside schemas in databases. This is different from network policies, which are account-level objects. The two are easy to confuse, which is why the distinction is mentioned here.
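
As a quick illustration of the difference in scope, the following commands list the two object types at their respective levels. The database and schema names below are placeholders.

-- Network rules live inside a schema of a database
show network rules in schema "MY_DATABASE"."MY_SCHEMA";

-- Network policies are account-level objects
show network policies;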

Specific Network Rule Example

The following network rule could be leveraged to enable outbound/egress traffic with the Snowflake Status API:

create network rule "EXTERNAL_ACCESS"."NETWORK_RULES"."EGRESS__SNOWFLAKE_STATUS_API"
  type = HOST_PORT
  mode = EGRESS
  value_list = ('status.snowflake.com') -- The host for the Snowflake Status API
  comment = 'Allows traffic to be sent to the Snowflake Status API'
;

How to Set Up an External Access Integration

In short, external access integrations are account-level objects that act as the bridge between egress network rules and any other Snowflake objects (stored procedures/UDFs).

Here is a simple example for creating an external access integration that leverages an outbound/egress network rule:

create external access integration "MY_EXTERNAL_ACCESS_INTEGRATION"
  allowed_network_rules = ("MY_DATABASE"."MY_SCHEMA"."MY_OUTBOUND_NETWORK_RULE")
  enabled = FALSE
  comment = 'External access integration to support traffic sent to the <endpoint(s) enabled by the network rule(s)>'
;

Specific External Access Integration Example

In many cases, authentication is needed to interact with an API. This is demonstrated in many of our other external access articles — for example, leveraging an OAuth integration when connecting to Matillion DPC, or leveraging standard username and password authentication when connecting to Matillion ETL.

In this specific example, however, no authentication is needed. The Snowflake Status API is open. We can therefore create the external access integration directly, with nothing else to configure:

create external access integration "EAI__SNOWFLAKE_STATUS_API"
  allowed_network_rules = ("EXTERNAL_ACCESS"."NETWORK_RULES"."EGRESS__SNOWFLAKE_STATUS_API")
  enabled = TRUE
  comment = 'External access integration to support traffic sent to the Snowflake Status API'
;

How to Retrieve the Current Status Summary from the Snowflake Status API

Now that we have discussed and configured the underlying architecture behind our process, we can move on to our true goal, which is to create a User-Defined Table Function (UDTF) in Snowflake that can query the Snowflake Status API and return the current status summary.

A Quick Overview of the Snowflake Status API

Since we are only using this API for an example, we will not spend much time discussing the API itself. The most important detail is that we will be querying the summary endpoint, https://status.snowflake.com/api/v2/summary.json.

This endpoint returns the current status for each core component in Snowflake, and is similar to viewing the user-friendly equivalent page. If you like, open that endpoint now to see an example of the output. If Snowflake’s status is currently healthy, the output will be fairly dull. If Snowflake is encountering issues then the output gets more interesting.

Our purpose is to return the current status of every module within every cloud provider region for Snowflake, which is all contained in the “components” key.

Here is an example output, where we have only included the response for the region “AWS – US West (Oregon)”:

{
  "page": {
    "id": "94s7z8vpy1n8",
    "name": "Snowflake",
    "url": "https://status.snowflake.com",
    "time_zone": "America/Los_Angeles",
    "updated_at": "2024-06-11T22:26:37.674-07:00"
  },
  "components": [
    {
      "id": "l8zmwsrmltjd",
      "name": "AWS - US West (Oregon)",
      "status": "operational",
      "created_at": "2019-01-02T15:16:31.219-08:00",
      "updated_at": "2019-05-03T13:56:49.012-07:00",
      "position": 1,
      "description": null,
      "showcase": false,
      "start_date": null,
      "group_id": null,
      "page_id": "94s7z8vpy1n8",
      "group": true,
      "only_show_if_degraded": false,
      "components": [
        "pkrzt59vb1cm",
        "b96r20d6p5g7",
        "xjvzhg4llmqt",
        "6jy082tyx85d"
      ]
    },
    {
      "id": "pkrzt59vb1cm",
      "name": "Snowflake Data Warehouse (Database)",
      "status": "operational",
      "created_at": "2019-01-15T09:18:30.171-08:00",
      "updated_at": "2024-05-28T21:46:53.905-07:00",
      "position": 3,
      "description": null,
      "showcase": false,
      "start_date": null,
      "group_id": "l8zmwsrmltjd",
      "page_id": "94s7z8vpy1n8",
      "group": false,
      "only_show_if_degraded": false
    },
    {
      "id": "b96r20d6p5g7",
      "name": "Snowpipe (Data Ingestion)",
      "status": "operational",
      "created_at": "2019-01-30T00:40:49.592-08:00",
      "updated_at": "2024-02-20T14:16:33.244-08:00",
      "position": 4,
      "description": null,
      "showcase": false,
      "start_date": null,
      "group_id": "l8zmwsrmltjd",
      "page_id": "94s7z8vpy1n8",
      "group": false,
      "only_show_if_degraded": false
    },
    {
      "id": "xjvzhg4llmqt",
      "name": "Replication",
      "status": "operational",
      "created_at": "2020-04-28T08:52:25.259-07:00",
      "updated_at": "2024-02-20T14:16:33.256-08:00",
      "position": 5,
      "description": null,
      "showcase": false,
      "start_date": null,
      "group_id": "l8zmwsrmltjd",
      "page_id": "94s7z8vpy1n8",
      "group": false,
      "only_show_if_degraded": false
    },
    {
      "id": "6jy082tyx85d",
      "name": "Snowsight",
      "status": "operational",
      "created_at": "2021-08-06T09:12:25.424-07:00",
      "updated_at": "2024-04-30T07:31:05.031-07:00",
      "position": 6,
      "description": null,
      "showcase": false,
      "start_date": "2021-08-06",
      "group_id": "l8zmwsrmltjd",
      "page_id": "94s7z8vpy1n8",
      "group": false,
      "only_show_if_degraded": false
    }
  ]
}

A Quick Note on User-Defined Table Functions (UDTFs) in Snowflake

We are about to create a UDTF to return the current Snowflake status as a table. This article assumes familiarity with UDTFs; however, the example itself is relatively simple. If you would like more information on UDTFs, the following articles are recommended:

Creating a UDTF to Return the Current Snowflake Status

Finally, we come to the main meat of the article. First, I will provide the full code for the UDTF. I will then explain some key areas.

This article does not explain Python UDTFs in general. If you wish to know more about these, the following articles are recommended: An Introduction to Python UDTFs in Snowflake, A Definitive Guide to Creating Python UDTFs Directly within the Snowflake User Interface. These are both part of our Snowflake with Python series.

The full UDTF definition is as follows:

-- Create UDTF
create or replace function "EXTERNAL_ACCESS"."FUNCTIONS"."API__SNOWFLAKE_STATUS__COMPONENTS"()
  copy grants
  returns table(
      "ID" string
    , "CLOUD_PROVIDER_REGION" string
    , "COMPONENT" string
    , "STATUS" string
    , "CREATED_AT" timestamp_ntz
    , "UPDATED_AT" timestamp_ntz
    , "DESCRIPTION" string
    , "POSITION_IN_GROUP" int
    , "DATA_QUERIED_AT" timestamp_ntz
  )
  language python
  runtime_version = 3.10
  handler = 'main'
  external_access_integrations = ("EAI__SNOWFLAKE_STATUS_API")
  packages = ('requests', 'pandas')
as
$$

## Imports
import requests
import pandas as pd

## Define main class
class main:
  
  ### Define process method that acts
  ### on each individual input row.
  ### This method must be provided in all UDTFs
  ### but is not needed for our example as we
  ### are doing nothing at a row level.
  def process(self) :
    pass

  ### Define end_partition method that acts
  ### on full partition after rows are processed
  def end_partition(self) :
    
    ### Retrieve the components from the endpoint
    snowflake_components_json = retrieve_snowflake_summary_as_json()

    ### Process JSON into a Pandas DataFrame
    df_snowflake_components = process_raw_components_data(snowflake_components_json)

    ### Output the result as tuples to match Snowflake UDTF requirements
    return list(df_snowflake_components.itertuples(index=False, name=None))
  
## Define function to send the request to the summary endpoint
def retrieve_snowflake_summary_as_json():

  ### Define the endpoint URL
  endpoint_url = r"https://status.snowflake.com/api/v2/summary.json"

  ### Populate headers for the API call
  headers = {
    "Content-Type": "application/json"
  }

  ### Execute API call
  response = requests.request("GET", endpoint_url, headers=headers)

  ### Only return something if the request is successful
  if response.status_code == 200 :
    return response.json()
  else :
    raise Exception(f"Error. Response code: {str(response.status_code)}")

## Define function to process JSON into a Pandas DataFrame
def process_raw_components_data(snowflake_components_json: dict):

  ### Load components into dataframe
  df_all = pd.json_normalize(snowflake_components_json["components"])
  df_all["data_queried_at"] = snowflake_components_json["page"]["updated_at"]

  ### Convert datetime fields.
  ### 1. Convert string to datetime
  ### 2. Remove timezone localization (output is now without timezone)
  df_all["created_at"] = \
    pd.to_datetime(df_all["created_at"], utc=True) \
      .dt.tz_localize(None)
      
  df_all["updated_at"] = \
    pd.to_datetime(df_all["updated_at"], utc=True) \
      .dt.tz_localize(None)
      
  df_all["data_queried_at"] = \
    pd.to_datetime(df_all["data_queried_at"], utc=True) \
      .dt.tz_localize(None)

  ### Build a formatted dataframe for all cloud provider region groups
  df_groups = df_all[df_all["group"] == True].copy()
  df_groups["component"] = "All"
  df_groups["cloud_provider_region"] = df_groups["name"]
  df_groups["position_in_group"] = 1

  ### Determine group-level output
  df_output_groups = df_groups[["id", "cloud_provider_region", "component", "status", "created_at", "updated_at", "description", "position_in_group", "data_queried_at"]]

  ### Build a formatted dataframe for all components
  df_components = df_all[df_all["group"] == False].copy()
  df_components["component"] = df_components["name"]
  df_components["position_in_group"] = df_components["position"]

  ### Attach the cloud provider region to the component
  df_components_merged = pd.merge(
      left = df_components
    , right = df_groups
    , left_on = ["group_id"]
    , right_on = ["id"]
    , suffixes = ["", "_from_group"]
  )

  ### Determine component-level output
  df_output_components = df_components_merged[["id", "cloud_provider_region", "component", "status", "created_at", "updated_at", "description", "position_in_group", "data_queried_at"]]

  ### Build final output dataframe, combining both outputs
  df_output = pd.concat([df_output_groups, df_output_components], axis=0)
  df_output.sort_values(["cloud_provider_region", "position_in_group"], inplace=True)

  return df_output

$$
;

-- Test query
select *
from table("EXTERNAL_ACCESS"."FUNCTIONS"."API__SNOWFLAKE_STATUS__COMPONENTS"())
;

The following elements are important here:

  • The external_access_integrations parameter enables the UDTF to leverage the external access integration.
  • The packages parameter ensures that the relevant Python packages ("requests" and "pandas") are available, which are then imported at the top of the Python body and leveraged throughout the function. Fortunately, these are all simple to ingest as they are already included in Anaconda’s Snowflake channel, which Snowflake leverages when executing Python.
  • The retrieve_snowflake_summary_as_json function sends a request to the Snowflake Status API and returns the response JSON. This is leveraged within the end_partition method.
  • The process_raw_components_data function converts the response JSON from the Snowflake Status API into a Pandas DataFrame, performing some light data engineering along the way. This is also leveraged within the end_partition method.
  • The end_partition method returns the final result as a list of tuples, as per Snowflake Python UDTF requirements. The output aligns with the columns defined in the returns table clause.
  • The test query at the end of the script can be used to test the process.

This UDTF provides the main structure to achieve our goals, but only uses some basic error handling. Depending on your purposes, it may be worth expanding this when deploying it to your own environment.

Testing the UDTF

A successful test of the UDTF should yield a table detailing the status of every Snowflake component. As a quick bonus, this screenshot also shows off Snowsight’s new Dark Mode!

Wrap Up

Hopefully the information in this article provides you with all of the tools you need to build your own UDTFs that retrieve data from external APIs. Naturally, we’ve deliberately chosen a simpler example for this article and have therefore not tackled challenges such as paging, schema changes or authentication. However, those are all challenges better faced by a standard Python/APIs article rather than an article about enabling external access from Snowflake.

The post Snowflake External Access: Retrieve Data from an API appeared first on InterWorks.

]]>
Snowflake Security and Cost Management: Health Checks https://interworks.com/blog/2024/06/19/snowflake-security-and-cost-management-health-checks/ Wed, 19 Jun 2024 15:31:05 +0000 https://interworks.com/?p=60221 As many are already aware, there has recently been a lot of noise about security on the Snowflake platform. On June 2nd, 2024, Snowflake issued this statement regarding a targeted threat campaign against some Snowflake customer accounts. This appeared to be a targeted campaign directed...

The post Snowflake Security and Cost Management: Health Checks appeared first on InterWorks.

]]>

As many are already aware, there has recently been a lot of noise about security on the Snowflake platform. On June 2nd, 2024, Snowflake issued this statement regarding a targeted threat campaign against some Snowflake customer accounts. This appeared to be a targeted campaign directed at users with single-factor authentication, where threat actors leveraged credentials previously purchased or obtained through infostealing malware. It’s important to note that their investigation has not discovered any vulnerabilities, exposure or misconfiguration in Snowflake directly, but rather the attack seems to leverage passwords that were retrieved by threat actors outside of Snowflake and then used in attempts to log into various accounts and websites.

As a response to this incident, Snowflake is repeating their best practice suggestions that all customers review their own security configurations and ensure sufficient security hardening is in place. This includes:

  • Reviewing your security and user lifecycle policies:
    • Password rotation
    • Password policies
    • Stale account review
    • Etc.
  • Ensuring that MFA is enabled on all user accounts (not just the ones with admin-level access)
  • Ensuring network policies are in place that limit access to Snowflake down to specific IP addresses or subnets where possible, especially for service principals (a rough sketch follows this list)
  • Leveraging SSO where possible
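
As a rough sketch of the network policy recommendation above, something along the following lines could be used to restrict a service principal to a known IP range. The policy name, user name and IP range below are placeholders and should be adjusted to your environment.

-- Create a network policy that only allows traffic from a known IP range
create network policy "MY_SERVICE_NETWORK_POLICY"
  allowed_ip_list = ('203.0.113.0/24')
  comment = 'Restricts access to the IP range leveraged by the service principal'
;

-- Attach the network policy to the service principal user
alter user "MY_SERVICE_USER" set network_policy = 'MY_SERVICE_NETWORK_POLICY';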

Snowflake’s security hardening documentation in response to this risk can be found here. They have included scripts that can be executed to detect if your account was compromised, as well as steps to review your security and harden it against these types of attacks.

It is important to point out that whilst this is an excellent step in ongoing security automation, it lacks the same depth and breadth of experience as trusted Snowflake experts, such as InterWorks. If you require assistance in optimising your Snowflake environment, not only tightening the security but also optimising the architecture and improving cost efficiency, our Snowflake Health Check solution is ideally positioned to support you. If you have any requirements beyond this, we can support you in finding and implementing the appropriate solution.

What Is an InterWorks’ Snowflake Health Check?

In one sentence, this is a useful solution for any customer who is interested in reviewing whether their Snowflake environment is following best practices for both security hardening and cost optimisation.

Even though Snowflake has an excellent services layer that removes a lot of the general admin involved in maintaining your data platform, there are still plenty of opportunities for cost optimisation and security hardening. Instead of failing to match compute needs with cost constraints, worrying whether the right people have the right permissions or stressing about whether your environment is secure, let our team of expert data architects perform a Snowflake Health Check to ensure your Snowflake environment is following best practices.

A Snowflake Health Check is an opportunity for Snowflake customers to have their Snowflake instance evaluated by our certified experts. During the health check, a Snowflake-certified expert from our team will leverage a combination of Snowflake’s Trust Center scans and our own in-house scripts to identify areas where best practices could be implemented, including opportunities for cost savings and potential security risks.

This solution applies to any customer that is using Snowflake and wants reassurance on either their security hardening or their cost effectiveness. The Snowflake Health Check will help identify opportunities to match compute needs with cost constraints, ensure the right people have the right permissions and verify that the Snowflake environment(s) have security hardening that follows best practices.

Expected Outcomes

At the end of a Snowflake Health Check, the customer will receive a report that details how their environment has been performing, combined with recommendations on how to improve it. Recommendations include potential savings, which we have seen reach as high as a 30% reduction in credit consumption.

Within the report, the client will be provided with recommendations on how to ensure they do not have any vulnerabilities related to security hardening or access control issues. Anything that is easily fixed within the engagement time will be actioned by our team where possible.

Our experts will apply any quick fixes we find as we go, and work with you to plan any wider changes.

Why InterWorks?

InterWorks is a full-service, BI and IT consulting firm specializing in data-driven business solutions. Our team’s world-class expertise will help you find the right answers. At InterWorks, our global bench brings familiarity and expertise across a wide range of technologies, analytics paradigms and industry verticals. InterWorks can help you throughout the full continuum of analytics development and data engineering needs. Whether it’s transforming datasets, applying thoughtful dashboard design and UX, going deeper with machine learning and data science, or simply deploying analytics environments, we have you covered. As a regular partner of the year among partners like Tableau, Snowflake, Matillion and more, InterWorks boasts one of the largest and most talented teams of data and analytics consultants in the world.

Our Snowflake Partnership

We have a long-standing partnership with Snowflake, which includes multiple Partner of the Year awards across several regions over several years. Here are some examples:

You can also view our main page about our Snowflake partnership.

Our Demonstrable History

We have a strong history in supporting customers with their Snowflake architecture directly, which you can see from our case studies, and we regularly share our knowledge through our blog. Here are some particularly relevant articles when it comes to the security of your Snowflake environment:

Contact Us

Interested in an InterWorks’ Snowflake Health Check? Then contact us here.

The post Snowflake Security and Cost Management: Health Checks appeared first on InterWorks.

]]>
How to Improve Security with Snowflake’s Trust Center https://interworks.com/blog/2024/06/18/how-to-improve-security-with-snowflakes-trust-center/ Tue, 18 Jun 2024 20:15:51 +0000 https://interworks.com/?p=60208 As many are already aware, recently there has been a lot of noise about security on the Snowflake platform. On June 2, 2024 Snowflake issued this statement regarding a targeted threat campaign against some Snowflake customer accounts. This appeared to be a targeted campaign directed at users...

The post How to Improve Security with Snowflake’s Trust Center appeared first on InterWorks.

]]>

As many are already aware, recently there has been a lot of noise about security on the Snowflake platform. On June 2, 2024 Snowflake issued this statement regarding a targeted threat campaign against some Snowflake customer accounts. This appeared to be a targeted campaign directed at users with single-factor authentication, where threat actors leveraged credentials previously purchased or obtained through infostealing malware. It’s important to note that their investigation has not discovered any vulnerabilities, exposure or misconfiguration in Snowflake directly, but rather the attack seems to leverage passwords that were retrieved by threat actors outside of Snowflake and then used in attempts to log into various accounts and websites.

As a response to this incident, Snowflake is repeating their best practice suggestions that all customers review their own security configurations and ensure sufficient security hardening is in place. This includes:

  • Reviewing your security and user lifecycle policies
    • Password rotation
    • Password policies
    • Stale account review
    • Etc.
  • Ensuring that MFA is enabled on all user accounts (not just the ones with admin-level access)
  • Ensuring network policies are in place that limit the access to Snowflake down to specific IP addresses or subnets where possible, especially for service principals
  • Leveraging SSO where possible

Snowflake’s security hardening documentation in response to this risk can be found here. They have included scripts that can be executed to detect if your account was compromised, as well as steps to review your security and harden it against these types of attacks.

The rest of this article is dedicated to providing a setup and review of new functionality in Snowflake that can help automate the continued assessment of your environment for security vulnerabilities. It is important to point out that whilst this is an excellent step in ongoing security automation, it lacks the same depth and breadth of experience as trusted Snowflake experts, such as InterWorks. If you require assistance in optimising your Snowflake environment, not only tightening the security but also optimising the architecture and improving cost efficiency, our Snowflake Health Check solution is ideally positioned to support you. If you have any requirements beyond this, we can support you in finding and implementing the appropriate solution.

Introducing Snowflake Trust Center

Snowflake has been working on security and threat detection automation and has now released the Trust Center into public preview.

Trust Center is a new area in the Snowflake user interface where administrators can deploy and monitor scanner packages, which are collections of security hardening tests that can be scheduled to run at regular intervals and display the results in the same UI. This includes a chart that shows the volume of potential risks over time, along with a list of all potential risks and suggested remediation actions:

How to Access the Trust Center

By default, the only way to access the Trust Center is by using the ACCOUNTADMIN role. However, access can be granted to other roles if desired by granting one of the following database roles, depending on your requirements (a short example follows this list):

  • SNOWFLAKE.TRUST_CENTER_ADMIN – Grants full access to the Trust Center.
  • SNOWFLAKE.TRUST_CENTER_VIEW – Grants read-only access to the “findings,” which is the area in the above screenshot where potential risks and remediations are listed and visualised.
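
As a minimal sketch, granting these database roles to existing account roles could look like the following. The target role names below are placeholders; the database role names are those listed above.

-- Grant full access to the Trust Center
grant database role SNOWFLAKE.TRUST_CENTER_ADMIN to role "MY_SECURITY_ADMIN_ROLE";

-- Grant read-only access to the Trust Center findings
grant database role SNOWFLAKE.TRUST_CENTER_VIEW to role "MY_MONITORING_ROLE";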

Once you have your role sorted (or just use the ACCOUNTADMIN role for simplicity), the Trust Center can be accessed under the “Monitoring” area in the navigation menu. When this area is first opened, you will see a welcome screen. As long as you can see both the “Findings” and “Scanner Packages” tabs at the top, you’ve got the right access to set things up:

Deploying a Scanner Package

To configure the Trust Center to start monitoring the security of your Snowflake account, you must deploy a scanner package. Begin by switching to the “Scanner Packages” tab, which contains a list of all available packages. At this time, there is only a single package called “CIS Benchmarks” which has been provided by Snowflake themselves:

Select the scanner package that you wish to deploy, which is most likely “CIS Benchmarks,” unless you are reading this article a while after publishing. This will give you more details about the scanner package, including a full list of all the scanners included in the package:

A scanner is an individual test that will be executed on your Snowflake environment to identify a specific risk. If you select any of the scanners on the list, a more detailed description will be provided:

When you are happy that you wish to deploy the scanner package, select the large “Enable” button in the top right corner. You may need to minimise a selected scanner description by selecting the minimise arrow:

When the “Enable” button is selected, a scheduling window will open. You can set any frequency you wish here. The important things to note are:

  1. The Trust Center will only include results up to the most recent scan. Naturally if you wish for the area to be updated more frequently then you must increase the frequency of the schedule.
  2. Each time the scanner package runs, it will leverage serverless compute credits in your Snowflake account. More details on this below.

With these in mind, select a schedule that you feel is appropriate.

This schedule can be modified at any time, so don’t overthink it for now! You can also manually trigger a scanner package after deployment. More details on this below.

Whilst you are familiarising yourself with the general environment, you may wish to set a weekly schedule:

Alternatively, if your Snowflake environment is a larger deployment and/or has needs of tighter security monitoring, you may wish to set a daily schedule, or maybe something even more frequent:

Once you select “Continue,” the scanner package will be deployed. This will include triggering a scan immediately, which should populate the “Findings” tab after a short time. Now that the scanner package is deployed, a new “Settings” subtab will be available. This area can be used to modify the schedule or to disable the scanner package completely:

Manually Triggering a Scan

At any time, you can manually trigger a scan by selecting the refresh icon in the upper right corner of the scanners list:

Reviewing Identified Potential Risks and Suggested Remediation Steps

Once the scanner package has completed its scan, the “Findings” tab will be populated. You may need to wait a few minutes and/or refresh your page for this to populate. This view provides a strong overview of all current potential risks that have been identified for your account:

To investigate a potential risk in more detail, select it. This will open a new pane that provides suggested remediation steps to resolve the potential risk:

To view the object(s) that have triggered the potential risk to be flagged, navigate to the “Summary” pane. This will provide further details, including the number of objects that failed the test, along with a sample of the affected objects.

If desired, select “Open a Worksheet” at the bottom of the “Summary” pane to retrieve a query that can provide the full list of objects that failed the test.

Monitoring Costs

The final piece of the puzzle is how to monitor the costs of each scan. All of this information is stored in the shared SNOWFLAKE database, inside the SERVERLESS_TASK_HISTORY view within the ACCOUNT_USAGE schema. This view can be queried directly for granular information on all Trust Center executions.
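
A minimal sketch of such a query, using the same filters as the aggregated example further below, could look like the following:

select *
from "SNOWFLAKE"."ACCOUNT_USAGE"."SERVERLESS_TASK_HISTORY"
where "DATABASE_NAME" = 'SNOWFLAKE'
  and "SCHEMA_NAME" = 'TRUST_CENTER_STATE'
order by "START_TIME" desc
;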

A few things to note about this view:

  • This view is inside the shared SNOWFLAKE database and may not be available depending on which role you are using. Full details on granting access to these views can be found here, however it should be sufficient to continue leveraging the ACCOUNTADMIN role if you are already using that when working with the Trust Center.
  • This view has a latency of up to three hours, so recent executions may not be listed.
  • Activity in this view is aggregated into one-hour intervals and cannot show information more granular than this.
  • When queried directly, this view will contain separate records for each scan instead of representing the cost of the entire package at once.
  • This view contains more data than just the Trust Center scan history, so be sure to include appropriate filters.
  • This view is fully documented here.

Here is an example showing some granular results:

To see an aggregated view of total costs per hour across all scanners, leverage the following statement:

select "START_TIME", "END_TIME", sum("CREDITS_USED") as "TOTAL_CREDITS_USED"
from "SNOWFLAKE"."ACCOUNT_USAGE"."SERVERLESS_TASK_HISTORY"
where "DATABASE_NAME" = 'SNOWFLAKE'
  and "SCHEMA_NAME" = 'TRUST_CENTER_STATE'
group by "START_TIME", "END_TIME"
order by "START_TIME" desc
;

Here is an example showing aggregated results:

Final Comments

This new Trust Center is certainly a huge leap forward for enabling Snowflake customers to monitor their own Snowflake accounts and address security concerns. At a minimum, the individual scans within a scanner package provide a great list of potential risks to keep in mind to protect your environment. Being able to see the volume of potential risks through the “Findings” tab and immediately access remediation steps is incredibly valuable. Overall, I applaud Snowflake for providing this functionality and advise all customers to leverage it.

That said, there are a few points I would advise keeping in mind:

  • Whilst there is a strong list of scanners that should definitely be leveraged, it is not a fully exhaustive strategy for protecting your environment. Snowflake (and any other technology) is only capable of protecting you if you regularly leverage these tools and stay vigilant of updates. It may be that a new potential risk is identified that is not available with the scanner package you have deployed, or is simply not covered by an existing scanner package yet. That said, the existing scanner package is already a strong place to start and will go a long way towards securing your environment.
  • Extending the above, one key area I have not yet seen covered by the Trust Center is private connectivity. Yes, network policies are covered here, but you can take things a step further by enabling private connectivity between your Snowflake account and your cloud service provider. More details can be found in the following articles:
  • The very nature of security risks is that malicious entities are always trying to find new ways to breach an environment. It is strongly recommended to train your users in standard security best practice in addition to relying on this technology. For example, you could have the most secure technology available but it means nothing if one of your employees leaves their laptop and phone on a table in a cafe with a post-it note of their password manager credentials stuck to the laptop screen. An exaggerated example for sure, but still worth keeping in mind.
  • The Trust Center, and by extension the scanner packages, are still in Public Preview. This means that the technology is still being developed in part. To protect themselves, Snowflake caveat these features in their documentation with the following statement:

    Preview features are provided primarily for evaluation and testing purposes. They should not be used in production systems or with production data.

  • In our internal testing we have found that these scans can be overprotective in their outputs. That is to say, some potential risks are raised that are not relevant or appropriate. These are minor flaws but worth keeping in mind. Among others, here are a couple of examples that demonstrate how each remediation step must be carefully considered before actioning:
    • CIS-1.3 recommends ensuring that the password is not set for Snowflake users that leverage SSO. This is definitely a good recommendation and worth leveraging, however this risk will appear even if SSO is not enabled for your Snowflake account. If you proceed with the remediation step of removing passwords in this situation, your users will be unable to log into Snowflake at all as there is no SSO to fall back on.
    • CIS-4.9 recommends ensuring that Tri-Secret Secure is enabled for the Snowflake account. This is another good recommendation where appropriate, however this risk will be flagged even if the Snowflake account is on a lower tier than “Business Critical,” despite Tri-Secret Secure only being available for the “Business Critical” tier or higher.

We hope you find this article useful in identifying and correcting potential security risks in your Snowflake accounts. It is worth keeping in mind that the new Trust Center may be able to tell you all of the potential risks that it has identified, but it is still an automated querying process under the hood that does not have the same depth and breadth of experience as trusted Snowflake experts, such as InterWorks. If you wish for assistance in optimising your Snowflake accounts, not only tightening the security but also optimising the architecture and improving cost efficiency, our Snowflake Health Check solution is ideally positioned to support you. And if you have any requirements beyond this, I’m positive we can support you.

InterWorks is a full-service, BI and IT consulting firm specializing in data-driven business solutions. Our team’s world-class expertise will help you find the right answers. At InterWorks, our global bench brings familiarity and expertise across a wide range of technologies, analytics paradigms and industry verticals. InterWorks can help you throughout the full continuum of analytics development and data engineering needs. Whether it’s transforming datasets, applying thoughtful dashboard design and UX, going deeper with machine learning and data science, or simply deploying analytics environments, we have you covered. As a regular partner of the year among partners like Tableau, Snowflake, Matillion and more, InterWorks boasts one of the largest and most talented teams of data and analytics consultants in the world.

The post How to Improve Security with Snowflake’s Trust Center appeared first on InterWorks.

]]>
Snowflake External Access: Trigger Tableau Datasource Refreshes https://interworks.com/blog/2024/06/18/snowflake-external-access-trigger-tableau-datasource-refreshes/ Tue, 18 Jun 2024 15:32:18 +0000 https://interworks.com/?p=60193 Recently, Snowflake released external access integrations into general availability. This allows Snowflake admins to create secure tunnels for Snowflake to communicate with the outside world, and can therefore communicate with external APIs to complete all manner of tasks. For example – external access could be...

The post Snowflake External Access: Trigger Tableau Datasource Refreshes appeared first on InterWorks.

]]>

Recently, Snowflake released external access integrations into general availability. These allow Snowflake admins to create secure tunnels through which Snowflake can communicate with the outside world, meaning it can call external APIs to complete all manner of tasks. For example, external access could be used for any of the following, among many others:

This specific article focuses on how to communicate with the REST API for a Tableau Server/Cloud deployment to refresh extracts in a published datasource/workbook; however, this series will include the other examples above as well. With each new article, the entry above will be converted to a link.

How Does External Access in Snowflake Work?

Before we dive into the specific example, let’s cover the architecture involved for external access in Snowflake. There are four main components:

  1. A network rule that controls where traffic is allowed to travel to.
  2. An external access integration that allows Snowflake objects to leverage the network rule.
  3. An optional security component to facilitate authentication. This usually leverages a Snowflake Secret object that can either securely store information such as a username and password, or make use of a related Security Integration object that facilitates OAuth authentication with an API endpoint.
  4. A Snowflake object such as a stored procedure or a user defined function (UDF) that leverages the external access integration.

This is demonstrated in the following diagram:

How to Configure an Outbound Network Rule

In short, network rules are database objects that store information on a type of network traffic for Snowflake. This traffic can be one of two categories:

  • Ingress – Traffic from outside of Snowflake that is coming in to Snowflake
  • Egress – Traffic that is leaving Snowflake and travelling outside

This article specifically focuses on the egress category, and how we can leverage this to enable Snowflake to communicate with endpoints outside of Snowflake’s network. Note that in Snowflake’s terminology, the category (ingress/egress) is referred to as the “mode,” and this terminology will be used for the rest of this article.

Example Network Rule for External Access

Here is a simple example for creating an outbound/egress network rule:

create network rule "MY_OUTBOUND_NETWORK_RULE"
  type = HOST_PORT -- Always use this value for external access integrations
  mode = EGRESS -- Always use this value for external access integrations
  value_list = ('?????') -- Populate with host(s) of the target endpoint(s), such as matillion.customer.com, host.online.tableau.com, customer.webhook.office.com, eu1.api.matillion.com, etc
  comment = 'Allows traffic to be sent to the host(s) in the value_list'
;

This template can be used to configure the network rule for any external access.

It is important to note that Snowflake stores network rules as database objects, meaning they sit inside schemas in databases. This is different from network policies, which are account-level objects. The two are easy to confuse, which is why the distinction is mentioned here.

Specific Network Rule Example

The following network rule could be leveraged to enable outbound/egress traffic with the Tableau Server/Cloud API:

create network rule "EXTERNAL_ACCESS"."NETWORK_RULES"."EGRESS__TABLEAU_API"
  type = HOST_PORT
  mode = EGRESS
  value_list = ('?????') -- Populate with host of the target Tableau Server/Cloud instance, such as tableau.company.com for a server or 10ax.online.tableau.com for Tableau Cloud
  comment = 'Allows traffic to be sent to the API for the Tableau Server deployment'
;

How to Set Up an External Access Integration

In short, external access integrations are account-level objects that act as the bridge between egress network rules and any other Snowflake objects (stored procedures/UDFs).

Here is a simple example for creating an external access integration that leverages an outbound/egress network rule:

create external access integration "MY_EXTERNAL_ACCESS_INTEGRATION"
  allowed_network_rules = ("MY_DATABASE"."MY_SCHEMA"."MY_OUTBOUND_NETWORK_RULE")
  enabled = FALSE
  comment = 'External access integration to support traffic sent to the <endpoint(s) enabled by the network rule(s)>'
;

How To Attach Optional Security Components – Secrets for a Tableau API

Depending on your required target, you may have a need for additional security components to authenticate your requests with the destination. For example, the destination may expect authentication via a username/password combination that has been converted into a bearer token, or it may leverage OAuth and expect a combination of client ID and secret ID. Of course, these are just two examples and there are other possibilities for more complex use cases.

In the case of a Tableau API, a personal access token (PAT) is the simplest way to authenticate. To support this, two secrets are required in Snowflake: one to store the name of the personal access token and one to store the value. This is required as PAT values take a form that is not accepted by a standard password-type secret in Snowflake, and thus a pair of generic-string-type secrets are used instead.

My personal recommendation is to create your external access integration before creating any additional security components, then apply them using alter commands.

To start, create the external access integration, using the network rule that was defined in the specific example earlier in this article. As the integration is still a work in progress, set “enabled” to FALSE.

create external access integration "EAI__TABLEAU_API"
  allowed_network_rules = ("EXTERNAL_ACCESS"."NETWORK_RULES"."EGRESS__TABLEAU_API")
  enabled = FALSE
  comment = 'External access integration to support traffic sent to the API for the Tableau Server deployment'
;

Required Permissions in Tableau Server/Cloud

Before creating the personal access token (PAT) as detailed in the next step, a service principal user should be created in Tableau Server/Cloud with the appropriate access to any target published workbooks/datasources. Specifically, for any user to be able to refresh a data extract in a published workbook/datasource in Tableau Server/Cloud, that user must have one of the following levels of access:

  • Ownership of the published workbook/datasource itself.
  • Project Leader access for the project that contains the published workbook/datasource.
  • Site Administrator access, which grants full control over the entire Tableau Server/Cloud site.

How To Create a Personal Access Token (PAT) in Tableau Server/Cloud

Creating a PAT in Tableau Server/Cloud can be achieved in a handful of clicks:

  1. After logging in to your Tableau Server/Cloud site, select your profile icon in the upper right corner and select “My Account Settings.”
  2. Scroll down to the “Personal Access Tokens” area, enter a name for your new token and select “Create new token:”

  3. A pop-up will appear allowing you to copy the new access token to your clipboard. This is only available once, so be sure to leverage the value straight away, either by inserting it directly into the destination by creating the secret in Snowflake, or by noting it down temporarily in another secure location whilst you work:

  4. When complete, your token name should then be listed in the “Personal Access Tokens” area. This can be revoked if required:

How to Configure Secrets to Contain Generic Strings

Now that the personal access token (PAT) has been created, we can place the credentials inside secrets in Snowflake.

-- Secret for token name
create secret "EXTERNAL_ACCESS"."SECRETS"."CREDS__TABLEAU_API__ACCESS_TOKEN_NAME"
  type = GENERIC_STRING
  secret_string = 'my-pat-token'
  comment = 'Name of the Personal Access Token (PAT) for the Tableau Server API'
;

-- Secret for token value
create secret "EXTERNAL_ACCESS"."SECRETS"."CREDS__TABLEAU_API__ACCESS_TOKEN_VALUE"
  type = GENERIC_STRING
  secret_string = '?????' -- Enter the value of the token here
  comment = 'Personal Access Token (PAT) for the Tableau Server API'
;

Attach the Optional Security Components to the External Access Integration

Now that the secrets have been configured, attaching them to the external access integration is achieved with an alter statement. This is also a good opportunity to enable the integration.

alter external access integration "EAI__TABLEAU_API"
  set allowed_authentication_secrets = (
        "EXTERNAL_ACCESS"."SECRETS"."CREDS__TABLEAU_API__ACCESS_TOKEN_NAME"
      , "EXTERNAL_ACCESS"."SECRETS"."CREDS__TABLEAU_API__ACCESS_TOKEN_VALUE"
    )
    , enabled = TRUE
;

How To Trigger a Resource Extract Refresh in Tableau Server/Cloud

Now that we have discussed and configured the underlying architecture behind our process, we can move on to our true goal, which is to create a stored procedure in Snowflake that can automatically trigger an extract refresh for a published workbook/datasource in Tableau Server/Cloud.

Creating a Mapping Table for Resources and Their Backend IDs

Whilst this step is not necessary and you could develop the process without it, leveraging a mapping table can make the process much simpler. The idea here is to create a table that maps combinations of resource type (workbook or datasource), site ID, project name and resource name to backend resource IDs. That way, if you have resources that you wish to refresh under different scenarios, you can have a single stored procedure that determines the target resource based on a set of inputs. The benefit of the mapping table specifically is that the stored procedure can then convert the inputs into a backend resource ID, removing the effort required for the end user to retrieve the resource ID each time.

Here is some example code for creating such a mapping table:
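
The sketch below assumes only the columns that the stored procedure later in this article queries; extend it with any additional metadata columns you need.

create table if not exists "EXTERNAL_ACCESS"."MAPPING_TABLES"."API__TABLEAU__RESOURCES" (
    "RESOURCE_TYPE" varchar -- Either 'workbook' or 'datasource'
  , "RESOURCE_NAME" varchar -- Name of the published workbook/datasource
  , "PROJECT_NAME" varchar -- Name of the project containing the resource
  , "SITE_ID" varchar -- ID of the Tableau Server/Cloud site
  , "RESOURCE_ID" varchar -- Backend ID ("luid") leveraged by the Tableau API
)
comment = 'Maps Tableau resources to the backend IDs leveraged by the Tableau Server/Cloud API'
;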

Populating the Mapping Table

Populating the mapping table is relatively simple once you know the process, but can be confusing at first. The main component we need is the resource ID, which is stored in Tableau’s backend and not exposed in any of the friendly parts of the user interface. Instead, the ID must be obtained from the backend.

The backend ID for a resource can change for a number of reasons, so it is vital to keep this table updated.

The following options are available for identifying the backend ID for a resource:

  1. Leverage the GraphiQL interface to directly retrieve the resource based on an ID that is visible from the URL in the UI.
    • This is the option I would recommend for smaller/simpler scenarios where it is not difficult to maintain the table.
  2. Leverage the REST API, either directly or through a library such as the tableauserverclient for Python, to execute a series of queries to identify the resource.
    • I would not recommend this option as you may struggle with duplicates if multiple resources exist with the same name in different projects, especially if the projects are children that also have the same name but sit inside different parent projects.
  3. Query the backend postgreSQL repository directly to identify the IDs.
    • Not an option for Tableau Cloud as the postgreSQL repository is not exposed.
    • I would not recommend this option as it is more complex than option (1).
  4. Develop an ELT process to replicate the backend postgreSQL repository into Snowflake regularly, then remove the mapping table and modify the process to leverage the actual backend tables.
    • Not an option for Tableau Cloud as the postgreSQL repository is not exposed.
    • This is the option I would recommend for larger Tableau Server scenarios as you can ensure you are always using the most recent ID for each resource. It requires more development time for initial setup, but will then automatically update as needed without further manual intervention.
  5. Develop an ELT process to query the GraphiQL backend regularly.
    • This is similar to option (4) but can work on both Tableau Server and Tableau Cloud.
    • This is the option I would recommend for larger Tableau Cloud scenarios.

To keep things simple for this article, only the first option is covered. Fortunately, it is a quick and simple one to use.

Enabling Metadata Services for Tableau Server

Don’t worry, you do not need to have purchased Tableau’s Data Management add-on for this to work, as that add-on enables the frontend support for the metadata services. The backend is already there behind the scenes.

If you are using Tableau Cloud then this step can be skipped as the metadata services will already be enabled.

For Tableau Server, you will need one of your administrators to access the backend of the Tableau Server and execute this command to enable the services:

tsm maintenance metadata-services enable

This will require a restart of your Tableau Server.

Retrieve the Backend ID for a Published Workbook or Datasource Using the GraphiQL Interface

To retrieve the backend ID for a published workbook/datasource, start by navigating to it in the Tableau Server/Cloud user interface and looking at the URL.

Depending on if it is a workbook or a datasource, it should be in one of the following formats:

https://<server_url>/#/site/<site_id>/workbooks/<vizportal_url_id>/views
https://<server_url>/#/site/<site_id>/datasources/<vizportal_url_id>/askData

For example, the URL could be the one of the following:

https://tableau.company.com/#/site/MySite/workbooks/12345/views
https://tableau.company.com/#/site/MySite/datasources/6789/askData

The important detail we are looking for here is that “vizportal_url_id” near the end of the URL. This is the unique identifier for the resource in Tableau’s frontend.

The next step is to use this to retrieve the backend ID by leveraging the metadata services through the GraphiQL interface. This interface can be accessed using the following URL:

https://<instance_url>/metadata/graphiql/

For example, this URL could be the following:

https://tableau.company.com/metadata/graphiql/

Once you are within this area, the following query can be used to retrieve the backend ID, referred to as the “luid,” for a workbook:

{
  workbooks(filter:{vizportalUrlId: "<viz_portal_url_id>"}) {
    luid
    vizportalUrlId
    name
    projectName
  }
}

Similarly, the following query can be used to retrieve the backend ID, referred to as the “luid”, for a datasource:

{
  publishedDatasources(filter:{vizportalUrlId: "<viz_portal_url_id>"}) {
    luid
    vizportalUrlId
    name
    projectName
  }
}

The following screenshot demonstrates an example of this for a workbook:

From this output, the “luid” is the backend ID that is required when populating the mapping table. This is the unique ID for the resource that is leveraged by the Tableau Server/Cloud API.
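
As a quick sketch, a retrieved luid could then be inserted into the mapping table along the following lines, assuming the mapping table structure sketched earlier. The values below are placeholders.

insert into "EXTERNAL_ACCESS"."MAPPING_TABLES"."API__TABLEAU__RESOURCES" (
    "RESOURCE_TYPE", "RESOURCE_NAME", "PROJECT_NAME", "SITE_ID", "RESOURCE_ID"
)
values ('workbook', 'My workbook', 'My Project', 'MySite', '<luid-from-graphiql>')
;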

Creating a Stored Procedure To Trigger a Resource Refresh

Finally, we come to the main meat of the article. First, I will provide the full code for the stored procedure. I will then explain some key areas.

This article does not explain Python stored procedures in general. If you wish to know more about these, the following article is recommended: A Definitive Guide to Python Stored Procedures in the Snowflake UI. This is part of our Snowflake with Python series.

The full procedure definition is as follows:

-- Create stored procedure
create or replace procedure "EXTERNAL_ACCESS"."PROCEDURES"."API__TABLEAU__REFRESH_RESOURCE"(
      RESOURCE_TYPE varchar
    , RESOURCE_NAME varchar
    , PROJECT_NAME varchar
    , SITE_ID varchar
    , CONNECTION_STRING varchar default '?????' -- Populate with host of the target Tableau Server/Cloud instance, such as tableau.company.com for a server or 10ax.online.tableau.com for Tableau Cloud
  )
  copy grants
  returns varchar
  language python
  runtime_version = 3.10
  handler = 'main'
  external_access_integrations = ("EAI__TABLEAU_API")
  secrets = (
        'pat_name' = "EXTERNAL_ACCESS"."SECRETS"."CREDS__TABLEAU_API__ACCESS_TOKEN_NAME"
      , 'pat_value' = "EXTERNAL_ACCESS"."SECRETS"."CREDS__TABLEAU_API__ACCESS_TOKEN_VALUE"
    )
  packages = ('snowflake-snowpark-python', 'tableauserverclient')
as
$$

## Import module for inbound Snowflake session
from snowflake.snowpark import Session as snowparkSession
from snowflake.snowpark.functions import col, lit

## Imports
import _snowflake
import tableauserverclient as TSC

## Define main function
## Put at top of SProc to avoid Snowflake linting bug
## if converted to a Snowpark for Python worksheet
def main(
      snowpark_session: snowparkSession
    , resource_type: str
    , resource_name: str
    , project_name: str
    , site_id: str
    , connection_string: str
  ):

  ### Validate inputs
  if resource_type is None \
    or len(resource_type) == 0 \
    or resource_type not in ["workbook", "datasource"] \
    or resource_name is None \
    or len(resource_name) == 0 \
    or project_name is None \
    or len(project_name) == 0 \
    or site_id is None \
    or len(site_id) == 0 \
      :
    return "Missing inputs. Must provide all of resource_type, resource_name, project_name, site_id. Also, resource_type must be either 'workbook' or 'datasource'"
  
  ### Retrieve the resource id
  resource_id = retrieve_resource_id(
      snowpark_session = snowpark_session
    , resource_type = resource_type
    , resource_name = resource_name
    , project_name = project_name
    , site_id = site_id
  )
  
  if resource_id is None:
    return "Resource not in mapping table"

  ### Trigger the refresh of the resource
  try:
    trigger_resource_refresh(
        resource_type = resource_type
      , resource_id = resource_id
      , site_id = site_id
      , connection_string = connection_string
    )
    return "Resource refresh complete"
  except Exception as e:
    return f"Error triggering resource refresh:\n\n{str(e)}"

### Define the function to retrieve the resource ID
def retrieve_resource_id(
      snowpark_session: snowparkSession
    , resource_type: str
    , resource_name: str
    , project_name: str
    , site_id: str
  ):

  ### Query the RESOURCES table, filtered by other inputs
  sf_df_resource_id = snowpark_session.table('"EXTERNAL_ACCESS"."MAPPING_TABLES"."API__TABLEAU__RESOURCES"') \
    .filter(
        (col('"RESOURCE_TYPE"') == lit(resource_type))
      & (col('"RESOURCE_NAME"') == lit(resource_name))
      & (col('"PROJECT_NAME"') == lit(project_name))
      & (col('"SITE_ID"') == lit(site_id))
    ) \
    .select(col('"RESOURCE_ID"')) \
    .collect()

  ### If a value is found, return it.
  ### Otherwise, return None
  if sf_df_resource_id is not None \
    and len(sf_df_resource_id) > 0 \
      :
    resource_id = sf_df_resource_id[0]["RESOURCE_ID"]
  else:
    resource_id = None

  return resource_id

### Define the function to trigger the refresh of a resource
def trigger_resource_refresh(
      resource_type: str
    , resource_id: str
    , site_id: str
    , connection_string: str
  ):
  
  ### Retrieve authentication object
  access_token_name = _snowflake.get_generic_secret_string("pat_name")
  access_token_value = _snowflake.get_generic_secret_string("pat_value")
  tableau_auth = TSC.PersonalAccessTokenAuth(
      access_token_name
    , access_token_value
    , site_id = site_id
  )
  
  ### Retrieve Tableau server object
  tableau_server_url = f"https://{connection_string}"
  tableau_server = TSC.Server(tableau_server_url, use_server_version=True)

  ### Execute the refresh statement
  with tableau_server.auth.sign_in(tableau_auth):
    if resource_type == "workbook":
      # Get the workbook by its Id to make sure it exists
      resource = tableau_server.workbooks.get_by_id(resource_id)

      # trigger the refresh, you'll get a job id back which can be used to poll for when the refresh is done
      job_id = tableau_server.workbooks.refresh(resource)
    else:
      # Get the datasource by its Id to make sure it exists
      resource = tableau_server.datasources.get_by_id(resource_id)

      # trigger the refresh, you'll get a job id back which can be used to poll for when the refresh is done
      job_id = tableau_server.datasources.refresh(resource)
    
    # `wait_for_job` will throw if the job isn't executed successfully
    job = tableau_server.jobs.wait_for_job(job_id)

  return

$$
;

-- Test query
call "EXTERNAL_ACCESS"."PROCEDURES"."API__TABLEAU__REFRESH_RESOURCE"(
    RESOURCE_TYPE => 'datasource'
  , RESOURCE_NAME => 'My datasource'
  , PROJECT_NAME => 'My Project'
  , SITE_ID => 'MySite'
)
;

-- Another test query
call "EXTERNAL_ACCESS"."PROCEDURES"."API__TABLEAU__REFRESH_RESOURCE"(
    RESOURCE_TYPE => 'workbook'
  , RESOURCE_NAME => 'My workbook with embedded datasource(s)'
  , PROJECT_NAME => 'My Project'
  , SITE_ID => 'MySite'
)
;

The following lines are important here:

  • Lines 14-18 enable the stored procedure to leverage the external access integration and accompanying secrets that facilitate authentication to Tableau.
  • Line 19 ensures that the “tableauserverclient” Python package is available, which is then imported on line 29 and leveraged throughout the procedure. Fortunately, this is simple to include, as it is already present in Anaconda’s Snowflake channel, which Snowflake leverages when executing Python.
  • Lines 120-126 leverage the secrets to generate an authentication object for “tableauserverclient.” That authentication object is then leveraged on line 133 to authenticate to the Tableau Server/Cloud.
  • Lines 81-109 define a function that leverages Snowpark for Python to query the mapping table and retrieve the backend resource ID. This is leveraged on lines 57-63.
  • Lines 112-150 define a function that triggers the resource refresh
  • Lines 156-161 can be used to test the process for a datasource
  • Lines 165-170 can be used to test the process for a workbook

This stored procedure provides the main structure to achieve our goals, but only uses some basic error handling. Depending on your purposes, it may be worth expanding this when deploying it to your own environment.

Testing the Stored Procedure

A successful test of the stored procedure should yield a simple “Resource refresh complete” response, which has been configured within the stored procedure:

Once executed, you should see the last refresh timestamp update for the extract in Tableau Server/Cloud:

Wrap Up

Hopefully the information in this article provides you with all of the tools you need to programmatically trigger datasource extract refreshes in Tableau. This stored procedure can now be made available to other users using Snowflake’s Role-Based Access Control (RBAC) and could be useful in multiple scenarios, including the following (the first of which is sketched after the list):

  • Refreshing extracts at the end of scheduled pipelines
  • Refreshing extracts as the result of a Snowflake Alert
  • Refreshing extracts as part of a process that executes when new data is detected in a stream
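
For instance, the first scenario could be handled with a Snowflake task that calls the procedure once the rest of the pipeline has finished. This is only a rough sketch: the task, warehouse and predecessor names ("MY_PIPELINE_DATABASE", "MY_WAREHOUSE", "MY_FINAL_LOAD_TASK") are hypothetical placeholders and the call simply reuses the earlier test parameters, so adapt it to your own pipeline:

create or replace task "MY_PIPELINE_DATABASE"."MY_PIPELINE_SCHEMA"."TRIGGER_TABLEAU_REFRESH"
  warehouse = "MY_WAREHOUSE" -- Hypothetical warehouse
  after "MY_PIPELINE_DATABASE"."MY_PIPELINE_SCHEMA"."MY_FINAL_LOAD_TASK" -- Hypothetical final task in the pipeline
as
  call "EXTERNAL_ACCESS"."PROCEDURES"."API__TABLEAU__REFRESH_RESOURCE"(
      RESOURCE_TYPE => 'datasource'
    , RESOURCE_NAME => 'My datasource'
    , PROJECT_NAME => 'My Project'
    , SITE_ID => 'MySite'
  )
;

-- Tasks are created in a suspended state and must be resumed before they will run
alter task "MY_PIPELINE_DATABASE"."MY_PIPELINE_SCHEMA"."TRIGGER_TABLEAU_REFRESH" resume;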

The post Snowflake External Access: Trigger Tableau Datasource Refreshes appeared first on InterWorks.

Snowflake External Access: Send Alerts to Teams https://interworks.com/blog/2024/06/11/snowflake-external-access-send-alerts-to-teams/ Tue, 11 Jun 2024 15:49:56 +0000 https://interworks.com/?p=60097

Recently, Snowflake released external access integrations into general availability. This allows Snowflake admins to create secure tunnels for Snowflake to communicate with the outside world, meaning Snowflake can communicate with external APIs to complete all manner of tasks. For example, external access could be used for any of the following, among many others:

This specific article focuses on how to communicate with a Teams webhook to send alert messages to a Teams channel; however, this series will include the other examples above as well. With each new article, the entry above will be converted to a link.

How To Enable a Webhook to a Channel in Teams

Before we configure the external access to allow Snowflake to communicate with a Teams webhook, we need to create the webhook itself. To achieve this, we need to take the following steps:

  1. Create a team in Teams, if you haven’t already.
    1. Navigate to the “Teams” pane in Teams and select the plus icon:
    2. Enter a name and description for the team. Then select “Create.” In our example, the team is called “Sandbox:”
  2. Add a channel to the team.
    1. Right click on your team and select “Add channel:”
    2. Enter a name and description for the channel. For our example, the channel name is “Snowflake Alerts:”

  3. Enable an incoming webhook for the channel.
    1. Right click on the channel and select “Manage channel:”

    2. Select “Edit” in the “Connectors” area:
    3. Find “Incoming Webhook” and select “Add.” You may need to use the search bar on the left of the page:
    4. Select “Add” again on the application:
    5. Give your webhook a name, upload an image if you wish and select “Create.” The image here is the profile picture that will be used by the webhook when posting messages to the channel:
    6. In the final screen, copy the webhook URL. This takes the following form, and consists of the host for the account, an identifier for the channel, and some authentication materials:
      https://?????.webhook.office.com/webhookb2/???????@???????/IncomingWebhook/???????/???????

How Does External Access in Snowflake Work?

Before we dive into the specific example, let’s cover the architecture involved for external access in Snowflake. There are four main components:

  1. A network rule that controls where traffic is allowed to travel to.
  2. An external access integration that allows Snowflake objects to leverage the network rule.
  3. An optional security component to facilitate authentication. This usually leverages a Snowflake Secret object that can either securely store information such as a username and password, or make use of a related Security Integration object that facilitates OAuth authentication with an API endpoint.
  4. A Snowflake object such as a stored procedure or a user defined function (UDF) that leverages the external access integration.

This is demonstrated in the following diagram:

How to Configure an Outbound Network Rule

In short, network rules are database objects that store information on a type of network traffic for Snowflake. This traffic can be one of two categories:

  • Ingress – Traffic from outside of Snowflake that is coming in to Snowflake.
  • Egress – Traffic that is leaving Snowflake and travelling outside.

This article specifically focuses on the egress category, and how we can leverage this to enable Snowflake to communicate with endpoints outside of Snowflake’s network. Note that in Snowflake’s terminology, the category (ingress/egress) is referred to as the “mode,” and this terminology will be used for the rest of this article.

Example Network Rule for External Access

Here is a simple example for creating an outbound/egress network rule:

create network rule "MY_OUTBOUND_NETWORK_RULE"
  type = HOST_PORT -- Always use this value for external access integrations
  mode = EGRESS -- Always use this value for external access integrations
  value_list = ('?????') -- Populate with host(s) of the target endpoint(s), such as matillion.customer.com, host.online.tableau.com, customer.webhook.office.com, eu1.api.matillion.com, etc
  comment = 'Allows traffic to be sent to the host(s) in the value_list'
;

This template can be used to configure the network rule for any external access.
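
As a side note, value_list can contain more than one host, and a port can be appended to a host if the endpoint does not listen on a standard port. The hosts below are purely hypothetical placeholders:

create network rule "MY_OUTBOUND_NETWORK_RULE"
  type = HOST_PORT
  mode = EGRESS
  value_list = (
      'first-host.example.com'       -- Hypothetical host on the standard HTTPS port
    , 'second-host.example.com:8443' -- Hypothetical host listening on a non-standard port
  )
  comment = 'Allows traffic to be sent to multiple hosts from a single network rule'
;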

It is important to note that Snowflake stores network rules as database objects, meaning they sit inside schemas in databases. This is different to network policies, which are account-level objects. The two are easy to confuse, which is why it has been mentioned here.
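
If you want to check what already exists, the difference in scope is also reflected in how each object type is listed. The database and schema names below are placeholders:

-- Network rules are schema-level objects, so they are listed within a schema
show network rules in schema "MY_DATABASE"."MY_SCHEMA";

-- Network policies are account-level objects, so they are listed for the account
show network policies;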

Specific Network Rule Example

The following network rule could be leveraged to enable outbound/egress traffic with a set of Teams webhooks within a target Teams account:

create network rule "EXTERNAL_ACCESS"."NETWORK_RULES"."EGRESS__TEAMS_WEBHOOKS"
  type = HOST_PORT
  mode = EGRESS
  value_list = ('?????.webhook.office.com') -- Populate with webhook host of the target Teams account, such as customer.webhook.office.com
  comment = 'Allows traffic to be sent to webhooks in the target Teams environment'
;

How To Set up an External Access Integration

In short, external access integrations are account-level objects that act as the bridge between egress network rules and any other Snowflake objects (stored procedures/UDFs).

Here is a simple example for creating an external access integration that leverages an outbound/egress network rule:

create external access integration "MY_EXTERNAL_ACCESS_INTEGRATION"
  allowed_network_rules = ("MY_DATABASE"."MY_SCHEMA"."MY_OUTBOUND_NETWORK_RULE")
  enabled = FALSE
  comment = 'External access integration to support traffic sent to the <endpoint(s) enabled by the network rule(s)>'
;

Depending on your required target, you may have a need for additional security components to authenticate your requests with the destination. For example, the destination may expect authentication via a username/password combination that has been converted into a bearer token, or it may leverage OAuth and expect a combination of client ID and secret ID. Of course, these are just two examples and there are other possibilities for more complex use cases.

My personal recommendation is to create your external access integration before creating any additional security components, then apply them using alter commands.

In the case of Teams webhooks, no security components are required since the authentication is included as part of the webhook URL itself. It is therefore safe to create the integration and enable it in the same statement:

create external access integration "EAI__TEAMS_WEBHOOKS"
  allowed_network_rules = ("EXTERNAL_ACCESS"."NETWORK_RULES"."EGRESS__TEAMS_WEBHOOKS")
  enabled = TRUE
  comment = 'External access integration to support traffic sent to webhooks in the target Teams environment'
;

How To Send Snowflake Alerts to a Teams Channel

Now that we have discussed and configured the underlying architecture behind our process, we can move on to our true goal, which is to create a stored procedure in Snowflake that can automatically send alerts to a Teams channel.

Creating a Mapping Table for Teams Channels and Webhooks

Whilst this step is not necessary and you could develop the process without it, leveraging a mapping table can make the process much simpler. The idea here is to create a table that maps channel names to webhook URLs. That way, if you have multiple channels and webhooks that you might use for different scenarios, you can have a single stored procedure that leverages the target channel name as an input. The benefit of the mapping table specifically is that the stored procedure can then convert the channel name into a webhook URL, both avoiding the webhook URL being exposed to end users and making it easier for them to send messages to their desired channel.

Here is some example code for creating such a mapping table:

create or replace table "EXTERNAL_ACCESS"."MAPPING_TABLES"."API__TEAMS__ENDPOINTS" (
    "NAME" string comment 'Name of the Teams channel, for reference'
  , "URL" string comment 'Endpoint for the webhook where messages should be delivered'
)
as
select *
from values
    ('Sandbox - Snowflake Alerts', 'https://?????.webhook.office.com/webhookb2/???????@???????/IncomingWebhook/???????/???????')
;

Understanding Adaptive Cards

Teams webhooks (and the wider Teams API) use Adaptive Cards as the main mechanism for sending messages. To make things easier, the stored procedure provided below already converts the inputs into an Adaptive Card body and sends it to the webhook.

If you wish to learn more about these and create your own input formats, take a look at the documentation, including the schema explorer and the samples.

Creating a Stored Procedure To Send a Message to a Teams Channel

Finally, we come to the main meat of the article. First, I will provide the full code for the stored procedure. I will then explain some key areas.

This article does not explain Python stored procedures in general. If you wish to know more about these, the following article is recommended: A Definitive Guide to Python Stored Procedures in the Snowflake UI. This is part of our Snowflake with Python series.

The full procedure definition is as follows:

-- Create stored procedure
create or replace procedure "EXTERNAL_ACCESS"."PROCEDURES"."API__TEAMS__SEND_MESSAGE"(
      PIPELINE varchar
    , DESCRIPTION varchar
    , ENVIRONMENT varchar default 'Snowflake - Main - Production'
    , ENDPOINT_NAME varchar default 'Sandbox - Snowflake Alerts'
  )
  copy grants
  returns string
  language python
  runtime_version = 3.10
  handler = 'main'
  external_access_integrations = ("EAI__TEAMS_WEBHOOKS")
  packages = ('snowflake-snowpark-python','requests')
as
$$

## Snowpark imports
from snowflake.snowpark import Session as snowparkSession
from snowflake.snowpark.functions import col, lit

## Other imports
import requests
import json

## Define main function
def main(
      snowpark_session: snowparkSession
    , pipeline: str
    , description: str
    , environment: str
    , endpoint_name: str
  ):

  ### Retrieve the endpoint url
  endpoint_url = retrieve_endpoint_url(
      snowpark_session = snowpark_session
    , endpoint_name = endpoint_name
  )

  if endpoint_url is None:
    return "Endpoint name not approved"

  ### Retrieve the adaptive card body
  adaptive_card_body = create_adaptive_card_body(
      pipeline = pipeline
    , description = description
    , environment = environment
  )

  ### Send the message to the endpoint
  response_message = send_message_to_endpoint(
      adaptive_card_body = adaptive_card_body
    , endpoint_url = endpoint_url
  )

  return response_message
  

## Define function to retrieve the endpoint URL
## from the ENDPOINTS table
def retrieve_endpoint_url(
      snowpark_session: snowparkSession
    , endpoint_name: str
  ):

  ### Query the ENDPOINTS table, filtered by NAME
  sf_df_endpoint_url = snowpark_session.table('"EXTERNAL_ACCESS"."MAPPING_TABLES"."API__TEAMS__ENDPOINTS"') \
    .filter(col('"NAME"') == lit(endpoint_name)) \
    .select(col('"URL"')) \
    .collect()

  ### If a value is found, return it.
  ### Otherwise, return None
  if sf_df_endpoint_url is not None \
    and len(sf_df_endpoint_url) > 0 \
      :
    endpoint_url = sf_df_endpoint_url[0]["URL"]
  else:
    endpoint_url = None

  return endpoint_url

## Define function to create the adaptive card body
## for the message send to the endpoint
def create_adaptive_card_body(
      pipeline: str
    , description: str
    , environment: str
  ):

  adaptive_card_body = [
            {
              "items": [
                {
                  "size": "large",
                  "text": f"Alert - {environment}",
                  "type": "TextBlock",
                  "weight": "bolder"
                },
                {
                  "text": pipeline,
                  "type": "TextBlock",
                  "weight": "bolder",
                  "wrap": True
                },
                {
                  "text": description,
                  "type": "TextBlock",
                  "wrap": True
                }
              ],
              "type": "Container"
            }
          ]
  
  return adaptive_card_body
  
## Define function to send the message to the endpoint
def send_message_to_endpoint(
      adaptive_card_body: list
    , endpoint_url: str
  ):
  
  json_body = {
    "attachments": [
      {
        "content": {
          "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
          "body": adaptive_card_body,
          "type": "AdaptiveCard",
          "msteams": {
            "width": "Full"
          },
          "version": "1.2"
        },
        "contentType": "application/vnd.microsoft.card.adaptive",
        "contentUrl": None
      }
    ],
    "type": "message"
  }

  ### Execute API call
  response = requests.request("POST", endpoint_url, json=json_body)

  try:
    if response.status_code == 200 :
      return "Complete"
    else :
      return f"Incomplete. Response code: {str(response.status_code)}"
  except Exception as e:
    return f"Error encountered:\n\n{str(e)}"

$$
;

-- Test query
call "EXTERNAL_ACCESS"."PROCEDURES"."API__TEAMS__SEND_MESSAGE"(
    PIPELINE => 'TEST PIPELINE'
  , DESCRIPTION => 'A test description'
)
;

The following lines are important here:

  • Line 13 enables the stored procedure to leverage the external access integration that facilitates access to the set of Teams webhooks.
  • Lines 62-82 define a function that leverages Snowpark for Python to query the mapping table and retrieve the webhook URL, using the input endpoint name. This is leveraged on lines 36-39.
  • Lines 86-117 define a function that leverages the inputs to create an adaptive card body. This is leveraged on lines 45-49.
  • Lines 120-153 define a function that sends the adaptive card body to the webhook URL. This includes:
    • Lines 125-142 place the adaptive card body into a wider json body to be sent to the webhook URL.
    • Line 145 sends the POST request itself to the webhook URL.
    • Lines 147-153 return the response from the request, with some minor error handling.
  • Lines 159-162 can be used to test the process.

Testing the Stored Procedure

A successful test of the stored procedure should yield a simple “Complete” response, which has been configured within the stored procedure:

Once executed, you should see the message appear in your Teams channel.

Leveraging the Stored Procedure Inside a Snowflake Alert

The final step on our journey is to leverage this new stored procedure inside a Snowflake alert. Here is a simple template:

create or replace alert "MY_EXAMPLE_ALERT"
  warehouse = MY_WAREHOUSE
  schedule = 'USING CRON 0 * * * * UTC' -- Hourly
if (
  exists (
    select 1
    from "MY_SOURCE_TABLE"
    where <condition>
  )
)
then
  call "EXTERNAL_ACCESS"."PROCEDURES"."API__TEAMS__SEND_MESSAGE"(
      PIPELINE => 'My Pipeline'
    , DESCRIPTION => '<condition description> has been encountered in MY_SOURCE_TABLE'
  )
;

You may want to take this further by creating another stored procedure in the middle. The alert can execute a wider stored procedure that performs multiple activities, one of which could be to call the stored procedure that sends a message to the Teams channel.
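
As a rough sketch of that pattern, the wrapper below is a simple SQL stored procedure that performs a (hypothetical) logging step and then calls the Teams procedure; the logging table and the exact activities are placeholders that would depend on your own pipelines:

create or replace procedure "EXTERNAL_ACCESS"."PROCEDURES"."HANDLE_ALERT_CONDITION"()
  returns string
  language sql
as
$$
begin
  -- Perform any other desired activities here, for example writing to a hypothetical logging table
  insert into "MY_DATABASE"."MY_SCHEMA"."ALERT_LOG" ("EVENT_TIMESTAMP", "EVENT_DESCRIPTION")
    values (current_timestamp(), 'Condition encountered in MY_SOURCE_TABLE');

  -- Send the notification to the Teams channel
  call "EXTERNAL_ACCESS"."PROCEDURES"."API__TEAMS__SEND_MESSAGE"(
      PIPELINE => 'My Pipeline'
    , DESCRIPTION => 'Condition encountered in MY_SOURCE_TABLE'
  );

  return 'Complete';
end;
$$
;

The alert would then call "EXTERNAL_ACCESS"."PROCEDURES"."HANDLE_ALERT_CONDITION"() instead of calling the Teams procedure directly.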

For full details on how alerts work, check out this article: Understanding Snowflake Alerts and Notifications

Wrap Up

Hopefully the information in this article provides you with all of the tools you need to establish your own connections that enable Snowflake to send alerts and other messages to Teams. This stored procedure can now be made available to other users using Snowflake’s Role-Based Access Control (RBAC), as sketched after the list below, and could be useful in multiple scenarios, including:

  • Sending alerts when dynamic table refreshes have failed.
  • Sending alerts when tasks have failed.
  • Sending alerts when other alerts have failed.
  • Sending messages when specific pipelines are starting/finishing.
  • Sending messages to confirm when bespoke activity has taken place.
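
As a minimal sketch of that RBAC step, the grants below would allow a hypothetical role to call the procedure; adjust the role name and the argument signature to match your own deployment:

grant usage on database "EXTERNAL_ACCESS" to role "MY_ANALYST_ROLE"; -- Hypothetical role
grant usage on schema "EXTERNAL_ACCESS"."PROCEDURES" to role "MY_ANALYST_ROLE";
grant usage on procedure "EXTERNAL_ACCESS"."PROCEDURES"."API__TEAMS__SEND_MESSAGE"(varchar, varchar, varchar, varchar) to role "MY_ANALYST_ROLE";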

The post Snowflake External Access: Send Alerts to Teams appeared first on InterWorks.

Snowflake External Access: Send Alerts to Slack https://interworks.com/blog/2024/06/04/snowflake-external-access-send-alerts-to-slack/ Tue, 04 Jun 2024 14:42:49 +0000 https://interworks.com/?p=59950

Recently, Snowflake released external access integrations into general availability. This allows Snowflake admins to create secure tunnels for Snowflake to communicate with the outside world, meaning Snowflake can communicate with external APIs to complete all manner of tasks. For example, external access could be used for any of the following, among many others:

This specific article focuses on how to communicate with a Slack webhook to send alert messages to a Slack channel; however, this series will include the other examples above as well. With each new article, the entry above will be converted to a link.

How To Enable a Webhook to a Channel in Slack

Before we configure the external access to allow Snowflake to communicate with a Slack webhook, we need to create the webhook itself. Note that this may require a Slack admin for authorisation.

To achieve this, we need to take the following steps:

  1. Create a new Slack API app for your workspace.
    1. Navigate to the Slack API Apps page and select “Create New App:”
    2. In the next menu, select “From scratch:”
    3. Enter a name for the app and select the workspace in which it should be created. Then select “Create App.” In our example, the app is called “Snowflake Alerts:”
  2. Open the new app and configure the display information, including a short description and which icon the app will use as a profile picture when posting messages in Slack:
  3. Enable an incoming webhook for a channel.
    1. In the “Incoming Webhooks” pane, select “Request to Add New Webhook:”
    2. Select the channel where the webhook should send messages. In our example, the channel is called #data-pipeline-alerts:
    3. Confirm that the new webhook has been created, then copy the webhook URL. This will take the following form, and consists of the host for the Slack webhooks, identifiers for the workspace and channel, and some authentication materials:
      https://hooks.slack.com/services/?????????/???????????/????????????????????????

How Does External Access in Snowflake Work?

Before we dive into the specific example, let’s cover the architecture involved for external access in Snowflake. There are four main components:

  1. A network rule that controls where traffic is allowed to travel to
  2. An external access integration that allows Snowflake objects to leverage the network rule
  3. An optional security component to facilitate authentication. This usually leverages a Snowflake Secret object that can either securely store information such as a username and password, or make use of a related Security Integration object that facilitates OAuth authentication with an API endpoint
  4. A Snowflake object such as a stored procedure or a user defined function (UDF) that leverages the external access integration

This is demonstrated in the following diagram:

Snowflake External Access Diagram

How to Configure an Outbound Network Rule

In short, network rules are database objects that store information on a type of network traffic for Snowflake. This traffic can be one of two categories:

  • Ingress – Traffic from outside of Snowflake that is coming in to Snowflake
  • Egress – Traffic that is leaving Snowflake and travelling outside

This article specifically focuses on the egress category, and how we can leverage this to enable Snowflake to communicate with endpoints outside of Snowflake’s network. Note that in Snowflake’s terminology, the category (ingress/egress) is referred to as the “mode,” and this terminology will be used for the rest of this article.

Example Network Rule for External Access

Here is a simple example for creating an outbound/egress network rule:

create network rule "MY_OUTBOUND_NETWORK_RULE"
  type = HOST_PORT -- Always use this value for external access integrations
  mode = EGRESS -- Always use this value for external access integrations
  value_list = ('?????') -- Populate with host(s) of the target endpoint(s), such as matillion.customer.com, host.online.tableau.com, customer.webhook.office.com, eu1.api.matillion.com, etc
  comment = 'Allows traffic to be sent to the host(s) in the value_list'
;

This template can be used to configure the network rule for any external access.

It is important to note that Snowflake stores network rules as database objects, meaning they sit inside schemas in databases. This is different to network policies, which are account-level objects. The two are easy to confuse, which is why it has been mentioned here.

Specific Network Rule Example

The following network rule could be leveraged to enable outbound/egress traffic with a set of Slack webhooks:

create network rule "EXTERNAL_ACCESS"."NETWORK_RULES"."EGRESS__SLACK_WEBHOOKS"
  type = HOST_PORT
  mode = EGRESS
  value_list = ('hooks.slack.com') -- All Slack webhooks have the same host
  comment = 'Allows traffic to be sent to webhooks in Slack'
;

How To Set up an External Access Integration

In short, external access integrations are account-level objects that act as the bridge between egress network rules and any other Snowflake objects (stored procedures/UDFs).

Here is a simple example for creating an external access integration that leverages an outbound/egress network rule:

create external access integration "MY_EXTERNAL_ACCESS_INTEGRATION"
  allowed_network_rules = ("MY_DATABASE"."MY_SCHEMA"."MY_OUTBOUND_NETWORK_RULE")
  enabled = FALSE
  comment = 'External access integration to support traffic sent to the <endpoint(s) enabled by the network rule(s)>'
;

Depending on your required target, you may have a need for additional security components to authenticate your requests with the destination. For example, the destination may expect authentication via a username/password combination that has been converted into a bearer token, or it may leverage OAuth and expect a combination of client ID and secret ID. Of course, these are just two examples and there are other possibilities for more complex use cases.

My personal recommendation is to create your external access integration before creating any additional security components, then apply them using alter commands.

In the case of Slack webhooks, no security components are required since the authentication is included as part of the webhook URL itself. It is therefore safe to create the integration and enable it in the same statement:

create external access integration "EAI__SLACK_WEBHOOKS"
  allowed_network_rules = ("EXTERNAL_ACCESS"."NETWORK_RULES"."EGRESS__SLACK_WEBHOOKS")
  enabled = TRUE
  comment = 'External access integration to support traffic sent to webhooks in Slack'
;

How To Send Snowflake Alerts to a Slack Channel

Now that we have discussed and configured the underlying architecture behind our process, we can move on to our true goal, which is to create a stored procedure in Snowflake that can automatically send alerts to a Slack channel.

Creating a Mapping Table for Slack Channels and Webhooks

Whilst this step is not necessary and you could develop the process without it, leveraging a mapping table can make the process much simpler. The idea here is to create a table that maps channel names to webhook URLs. That way, if you have multiple channels and webhooks that you might use for different scenarios, you can have a single stored procedure that leverages the target channel name as an input. The benefit of the mapping table specifically is that the stored procedure can then convert the channel name into a webhook URL, both avoiding the webhook URL being exposed to end users and making it easier for them to send messages to their desired channel.

Here is some example code for creating such a mapping table:

create or replace table "EXTERNAL_ACCESS"."MAPPING_TABLES"."API__SLACK__ENDPOINTS" (
    "NAME" string comment 'Name of the Slack channel, for reference'
  , "URL" string comment 'Endpoint for the webhook where messages should be delivered'
)
as
select *
from values
    ('data-pipeline-alerts', 'https://hooks.slack.com/services/?????????/???????????/????????????????????????')
;
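
If you later enable webhooks for additional channels, new rows can simply be inserted into the mapping table. The channel name below is a hypothetical example:

insert into "EXTERNAL_ACCESS"."MAPPING_TABLES"."API__SLACK__ENDPOINTS" ("NAME", "URL")
values ('another-example-channel', 'https://hooks.slack.com/services/?????????/???????????/????????????????????????')
;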

Understanding Block Kits

Slack webhooks (and the wider Slack API) use Block Kits as the main mechanism for sending messages. To make things easier, the stored procedure provided below already converts the inputs into a blocks array and sends it to the webhook.

If you wish to learn more about these and create your own input formats, take a look at the documentation, including the reference and the builder.

Creating a Stored Procedure To Send a Message to a Slack Channel

Finally, we come to the main meat of the article. First, I will provide the full code for the stored procedure. I will then explain some key areas.

This article does not explain Python stored procedures in general. If you wish to know more about these, the following article is recommended: A Definitive Guide to Python Stored Procedures in the Snowflake UI. This is part of our Snowflake with Python series.

The full procedure definition is as follows:

-- Create stored procedure
create or replace procedure "EXTERNAL_ACCESS"."PROCEDURES"."API__SLACK__SEND_MESSAGE"(
      PIPELINE varchar
    , DESCRIPTION varchar
    , ENVIRONMENT varchar default 'Snowflake - Main - Production'
    , ENDPOINT_NAME varchar default 'data-pipeline-alerts'
  )
  copy grants
  returns string
  language python
  runtime_version = 3.10
  handler = 'main'
  external_access_integrations = ("EAI__SLACK_WEBHOOKS")
  packages = ('snowflake-snowpark-python','requests')
as
$$

## Snowpark imports
from snowflake.snowpark import Session as snowparkSession
from snowflake.snowpark.functions import col, lit

## Other imports
import requests
import json

## Define main function
def main(
      snowpark_session: snowparkSession
    , pipeline: str
    , description: str
    , environment: str
    , endpoint_name: str
  ):

  ### Retrieve the endpoint url
  endpoint_url = retrieve_endpoint_url(
      snowpark_session = snowpark_session
    , endpoint_name = endpoint_name
  )

  if endpoint_url is None:
    return "Endpoint name not approved"

  ### Retrieve message body
  message_body = create_message_body(
      pipeline = pipeline
    , description = description
    , environment = environment
  )

  ### Send the message to the endpoint
  response_message = send_message_to_endpoint(
      message_body = message_body
    , endpoint_url = endpoint_url
  )

  return response_message
  

## Define function to retrieve the endpoint URL
## from the ENDPOINTS table
def retrieve_endpoint_url(
      snowpark_session: snowparkSession
    , endpoint_name: str
  ):

  ### Query the ENDPOINTS table, filtered by NAME
  sf_df_endpoint_url = snowpark_session.table('"EXTERNAL_ACCESS"."MAPPING_TABLES"."API__SLACK__ENDPOINTS"') \
    .filter(col('"NAME"') == lit(endpoint_name)) \
    .select(col('"URL"')) \
    .collect()

  ### If a value is found, return it.
  ### Otherwise, return None
  if sf_df_endpoint_url is not None \
    and len(sf_df_endpoint_url) > 0 \
      :
    endpoint_url = sf_df_endpoint_url[0]["URL"]
  else:
    endpoint_url = None

  return endpoint_url

## Define function to create the body
## for the message send to the endpoint
def create_message_body(
      pipeline: str
    , description: str
    , environment: str
  ):

  message_body = {
      "blocks": [
        {
          "type": "header",
          "text": {
            "type": "plain_text",
            "text": f"Alert - {environment}"
          }
        },
        {
          "type": "section",
          "text": {
            "type": "mrkdwn",
            "text": f"*{pipeline}*"
          }
        },
        {
          "type": "section",
          "text": {
            "type": "plain_text",
            "text": description
          }
        }
      ]
    }
  
  return message_body
  
## Define function to send the message to the endpoint
def send_message_to_endpoint(
      message_body: list
    , endpoint_url: str
  ):

  ### Populate headers for the API call
  headers = {
    "Content-Type": "application/json"
  }

  ### Execute API call
  response = requests.request("POST", endpoint_url, headers=headers, json=message_body)

  try:
    if response.status_code == 200 :
      return "Complete"
    else :
      return f"Incomplete. Response code: {str(response.status_code)}"
  except Exception as e:
    return f"Error encountered:\n\n{str(e)}"

$$
;

-- Test query
call "EXTERNAL_ACCESS"."PROCEDURES"."API__SLACK__SEND_MESSAGE"(
    PIPELINE => 'TEST PIPELINE'
  , DESCRIPTION => 'A test description'
)
;

 

The following lines are important here:

  • Line 13 enables the stored procedure to leverage the external access integration that facilitates access to Slack webhooks.
  • Lines 62-82 define a function that leverages Snowpark for Python to query the mapping table and retrieve the webhook URL, using the input endpoint name. This is leveraged on lines 36-39.
  • Lines 86-118 define a function that leverages the inputs to create a message body as a blocks array. This is leveraged on lines 45-49.
  • Lines 121-140 define a function that sends the message body to the webhook URL. This includes:
    • Lines 127-129 determine the headers for the request, simply instructing the webhook to expect JSON in case it defaults to something else.
    • Line 132 sends the POST request itself to the webhook URL.
    • Lines 134-140 return the response from the request, with some minor error handling.
  • Lines 146-149 can be used to test the process.

Testing the Stored Procedure

A successful test of the stored procedure should yield a simple “Complete” response, which has been configured within the stored procedure:

Once executed, you should see the message appear in your Slack channel:

Leveraging the Stored Procedure Inside a Snowflake Alert

The final step on our journey is to leverage this new stored procedure inside a Snowflake alert. Here is a simple template:

create or replace alert "MY_EXAMPLE_ALERT"
  warehouse = MY_WAREHOUSE
  schedule = 'USING CRON 0 * * * * UTC' -- Hourly
if (
  exists (
    select 1
    from "MY_SOURCE_TABLE"
    where <condition>
  )
)
then
  call "EXTERNAL_ACCESS"."PROCEDURES"."API__SLACK__SEND_MESSAGE"(
      PIPELINE => 'My Pipeline'
    , DESCRIPTION => '<condition description> has been encountered in MY_SOURCE_TABLE'
  )
;

You may want to take this further by creating another stored procedure in the middle. The alert can execute a wider stored procedure that performs multiple activities, one of which could be to call the stored procedure that sends a message to the Slack channel.

For full details on how alerts work, check out this article: Understanding Snowflake Alerts and Notifications

Wrap Up

Hopefully the information in this article provides you with all of the tools you need to establish your own connections that enable Snowflake to send alerts and other messages to Slack. This stored procedure can now be made available to other users using Snowflake’s Role-Based Access Control (RBAC) and could be useful in multiple scenarios, including:

  • Sending alerts when dynamic table refreshes have failed.
  • Sending alerts when tasks have failed.
  • Sending alerts when other alerts have failed.
  • Sending messages when specific pipelines are starting/finishing.
  • Sending messages to confirm when bespoke activity has taken place.

The post Snowflake External Access: Send Alerts to Slack appeared first on InterWorks.

Snowflake External Access: Trigger Matillion ETL Jobs https://interworks.com/blog/2024/05/28/snowflake-external-access-trigger-matillion-etl-jobs/ Tue, 28 May 2024 14:33:44 +0000 https://interworks.com/?p=59857

Recently, Snowflake released external access integrations into general availability. This allows Snowflake admins to create secure tunnels for Snowflake to communicate with the outside world, meaning Snowflake can communicate with external APIs to complete all manner of tasks. For example, external access could be used for any of the following, among many others:

This specific article focuses on how to communicate with a Matillion ETL API to trigger jobs; however, this series will include the other examples above, as well. With each new article, the entry above will be converted to a link.

How Does External Access in Snowflake Work?

Before we dive into the specific example, let’s cover the architecture involved for external access in Snowflake. There are four main components:

  1. A network rule that controls where traffic is allowed to travel to
  2. An external access integration that allows Snowflake objects to leverage the network rule
  3. An optional security component to facilitate authentication. This usually leverages a Snowflake Secret object that can either securely store information such as a username and password or make use of a related Security Integration object that facilitates OAuth authentication with an API endpoint
  4. A Snowflake object such as a stored procedure or a user defined function (UDF) that leverages the external access integration

This is demonstrated in the following diagram:

How to Configure an Outbound Network Rule

In short, network rules are database objects that store information on a type of network traffic for Snowflake. This traffic can be one of two categories:

  • Ingress – Traffic from outside of Snowflake that is coming in to Snowflake
  • Egress – Traffic that is leaving Snowflake and travelling outside

This article specifically focuses on the egress category, and how we can leverage this to enable Snowflake to communicate with endpoints outside of Snowflake’s network. Note that in Snowflake’s terminology, the category (ingress/egress) is referred to as the “mode,” and this terminology will be used for the rest of this article.

Example Network Rule for External Access

Here is a simple example for creating an outbound/egress network rule:

create network rule "MY_OUTBOUND_NETWORK_RULE"
  type = HOST_PORT -- Always use this value for external access integrations
  mode = EGRESS -- Always use this value for external access integrations
  value_list = ('?????') -- Populate with host(s) of the target endpoint(s), such as matillion.customer.com, host.online.tableau.com, customer.webhook.office.com, eu1.api.matillion.com, etc
  comment = 'Allows traffic to be sent to the host(s) in the value_list'
;

This template can be used to configure the network rule for any external access.

It is important to note that Snowflake stores network rules as database objects, meaning they sit inside schemas in databases. This is different to network policies, which are account-level objects. The two are easy to confuse, which is why it has been mentioned here.

Specific Network Rule Example

The following network rule could be leveraged to enable outbound/egress traffic with a Matillion ETL API:

create network rule "EXTERNAL_ACCESS"."NETWORK_RULES"."EGRESS__MATILLION_ETL_API"
  type = HOST_PORT
  mode = EGRESS
  value_list = (
      'matillion.company.com' -- Populate with host of the target Matillion instance, such as matillion.company.com
  )
  comment = 'Allows traffic to be sent to the API for the Matillion ETL deployment'
;

How to Set Up an External Access Integration

In short, external access integrations are account-level objects that act as the bridge between egress network rules and any other Snowflake objects (stored procedures/UDFs).

Here is a simple example for creating an external access integration that leverages an outbound/egress network rule:

create external access integration "MY_EXTERNAL_ACCESS_INTEGRATION"
  allowed_network_rules = ("MY_DATABASE"."MY_SCHEMA"."MY_OUTBOUND_NETWORK_RULE")
  enabled = FALSE
  comment = 'External access integration to support traffic sent to the <endpoint(s) enabled by the network rule(s)>'
;

How to Attach Optional Security Components: Secrets for a Matillion ETL API

Depending on your required target, you may have a need for additional security components to authenticate your requests with the destination. For example, the destination may expect authentication via a username/password combination that has been converted into a bearer token, or it may leverage OAuth and expect a combination of client ID and secret ID. Of course, these are just two examples and there are other possibilities for more complex use cases.

In the case of a Matillion ETL API, a secret is required to store the username and password of the Matillion service principal that will be leveraged.

My personal recommendation is to create your external access integration before creating any additional security components, then apply them using alter commands.

To start, create the external access integration, using the network rule that was defined in the specific example earlier in this article. As the integration is still a work in progress, set “enabled” to FALSE.

create external access integration "EAI__MATILLION_ETL_API"
  allowed_network_rules = ("EXTERNAL_ACCESS"."NETWORK_RULES"."EGRESS__MATILLION_ETL_API")
  enabled = FALSE
  comment = 'External access integration to support traffic sent to the API for Matillion ETL'
;

How to Create a Matillion ETL Service Principal for API Requests

Creating a service principal in Matillion ETL is as simple as creating a new user and granting it the relevant access.

  1. First, log into Matillion ETL as an administrator, open the “Admin” tab and select “User Configuration:”
  2. Select the + icon to create a new user and grant it API access. The following example demonstrates a user called “my-api-user” that has been granted API access. It has also been granted full admin access to ensure that the full API functionality can be leveraged:

How to Configure a Secret to Contain a Password

Now that the service principal has been created, we can place the credentials inside a secret in Snowflake.

create secret "EXTERNAL_ACCESS"."SECRETS"."CREDS__MATILLION_ETL_API"
  type = PASSWORD
  username = 'my-api-user'
  password = '<REDACTED>' -- Enter the password here
  comment = 'Credentials for the Matillion ETL API user'
;

Attach the Optional Security Components to the External Access Integration

Now that the secret has been configured, attaching it to the external access integration is achieved with an alter statement. This is also a good opportunity to enable the integration.

alter external access integration "EAI__MATILLION_ETL_API"
  set allowed_authentication_secrets = ("EXTERNAL_ACCESS"."SECRETS"."CREDS__MATILLION_ETL_API")
    , enabled = TRUE
;

How to Trigger Matillion ETL Jobs in a Stored Procedure

Now that we have discussed and configured the underlying architecture behind our process, we can move on to our true goal, which is to create a stored procedure in Snowflake that can trigger a Matillion ETL job. In fact, we now have the tools we need to query any endpoint in the Matillion ETL API.

The specific endpoint we are interested in for this example is the following:

/v1/group/name/{groupName}/project/name/{projectName}/version/name/{versionName}/job/name/{jobName}/run?environmentName={environmentName}

This endpoint can be leveraged to trigger a job execution; however, it requires four important inputs, along with an optional fifth input:

  1. The name of the project group
  2. The name of the project within the project group
  3. The name of the version within the project
  4. The name of the job within the project version
  5. (Optional) The name of the environment in which the job should be executed. If none is provided, the default is used.

It is also possible to include values for any scalar or grid variables within your API request.

Locating the Input Variables

Locating the input variables for a job in Matillion ETL is straightforward. If you do not know them already, open the job that you wish to trigger in Matillion ETL and look at the top of the screen where they will all be provided:

Providing Values for Scalar and Grid Variables

The API endpoint for triggering a job in Matillion ETL is a POST endpoint capable of accepting a payload. Here is an example from Matillion’s own documentation:

{
  "scalarVariables": {
      "{variable-name-1}": "{value-1}"
    , "{variable-name-2}": "{value-2}"
  }
  , "gridVariables": {
        "{grid-variable-name}": [
            [
                "{column-value-1}"
              , "{column-value-2}"
              , "{column-value-3}"
            ]
        ]
    }
}

Creating a Stored Procedure to Trigger a Matillion ETL Job

Finally, we come to the main meat of the article. First, I will provide the full code for the stored procedure. I will then explain some key areas.

This article does not explain Python stored procedures in general. If you wish to know more about these, the following article is recommended: A Definitive Guide to Python Stored Procedures in the Snowflake UI. This is part of our Snowflake with Python series.

The full procedure definition is as follows:

-- Create stored procedure
create or replace procedure "EXTERNAL_ACCESS"."PROCEDURES"."API__MATILLION_ETL__TRIGGER_JOB"(
      PROJECT_GROUP_NAME varchar
    , PROJECT_NAME varchar
    , JOB_NAME varchar
    , VERSION_NAME varchar default 'default'
    , ENVIRONMENT_NAME varchar default NULL
    , SCALAR_VARIABLES array default NULL
    , GRID_VARIABLES array default NULL
    , CONNECTION_STRING varchar default '?????' -- Populate with host of the target Matillion instance, such as matillion.company.com
  )
  copy grants
  returns variant
  language python
  runtime_version = 3.10
  handler = 'main'
  external_access_integrations = ("EAI__MATILLION_ETL_API")
  secrets = ('cred' = "EXTERNAL_ACCESS"."SECRETS"."CREDS__MATILLION_ETL_API")
  packages = ('snowflake-snowpark-python','requests', 'furl')
as
$$

## Import module for inbound Snowflake session
from snowflake.snowpark import Session as snowparkSession

## Imports
import _snowflake
import requests
from requests.auth import HTTPBasicAuth
from furl import furl

## Define main function
## Put at top of SProc to avoid Snowflake linting bug
## if converted to a Snowpark for Python worksheet
def main(
      snowpark_session: snowparkSession
    , project_group_name: str
    , project_name: str
    , job_name: str
    , version_name: str
    , environment_name: str
    , scalar_variables: list
    , grid_variables: list
    , connection_string: str
  ):

  ### Validate inputs
  if project_group_name is None \
    or len(project_group_name) == 0 \
    or project_name is None \
    or len(project_name) == 0 \
    or job_name is None \
    or len(job_name) == 0 \
    or version_name is None \
    or len(version_name) == 0 \
      :
    return "Missing inputs. Must provide all of project_group_name, project_name, job_name, version_name"
    
  ### Retrieve authentication token
  username_password_object = _snowflake.get_username_password("cred")
  basic_auth_token = HTTPBasicAuth(username_password_object.username, username_password_object.password)

  ### Generate URL to trigger job
  api_endpoint_url_unsafe = f"https://{connection_string}/rest/v1/group/name/{project_group_name}/project/name/{project_name}/version/name/{version_name}/job/name/{job_name}/run"

  ### Make the URL safe (replacing spaces with %20 etc)
  api_endpoint_url = furl(api_endpoint_url_unsafe)

  ### Populate the parameters for the API call
  api_parameters = {}
  if environment_name is not None \
    and len(environment_name) > 0 \
      :
    api_parameters["environmentName"] = environment_name

  ### Populate the payload for the API call
  api_payload = {}
  if scalar_variables is not None \
    and len(scalar_variables) > 0 \
      :
    api_payload["scalarVariables"] = scalar_variables
  if grid_variables is not None \
    and len(grid_variables) > 0 \
      :
    api_payload["gridVariables"] = grid_variables

  ### Populate headers for the API call
  headers = {
      "Content-Type": "application/json"
  }

  ### Execute API call to trigger job
  response = requests.request("POST", api_endpoint_url, headers=headers, params=api_parameters, json=api_payload, auth=basic_auth_token)

  ### Return the parsed response, falling back to the raw response text if the body is not valid JSON
  try:
    return response.json()
  except Exception:
    return response.text

$$
;

-- Test query
call "EXTERNAL_ACCESS"."PROCEDURES"."API__MATILLION_ETL__TRIGGER_JOB"(
    PROJECT_GROUP_NAME => 'Example Group'
  , PROJECT_NAME => 'Example Project'
  , JOB_NAME => 'Example Job'
)
;

The following lines are important here:

  • Lines 17 and 18 enable the stored procedure to leverage the external access integration and accompanying secret that facilitate authentication to the Matillion ETL API.
  • Lines 60 and 61 are required to leverage the secret to generate an authentication token for the Matillion ETL API. That authentication token is then leveraged for authorization as part of the request on line 93. This also leverages the HTTPBasicAuth function that is imported on line 29.
  • Line 64 leverages the input variables to establish the full API endpoint that the request is sent to. This is then made URL-safe on line 67 using the furl function that is imported on line 30. This is an important step, as names often include whitespace, which is not URL-safe.
  • Lines 70-74 establish the parameters that are used in the API endpoint, consisting of the environment name if it was provided as an input variable.
  • Lines 77-85 establish the payload that is sent to the API, consisting of any scalar or grid variable values that may have been provided as input variables.
  • Line 93 sends the POST request itself to the Matillion ETL API, combining all of the elements established so far.
  • Lines 95-101 return the response from the request, with some minor error handling.
  • Lines 107-111 can be used to test the process.

Testing the Stored Procedure

A successful test of the stored procedure should yield a response that includes a message field called msg and an id field, which would be the ID for the execution that has just been triggered:

Once triggered, you should be able to monitor the execution in the task history area of either the Matillion ETL environment or the Matillion Hub. You can also track the status by sending requests to the relevant API endpoint.

Example task history in Matillion ETL:

Example task history in Matillion Hub:

Wrap Up

Hopefully the information in this article provides you with all of the tools you need to establish your own connections from Snowflake to the Matillion ETL API. This stored procedure can now be made available to other users using Snowflake’s Role-Based Access Control (RBAC) and could be useful in multiple scenarios, including the following (the last of which is sketched after the list):

  • Triggering a job as part of a scheduled task
  • Triggering a job as part of a wider set of stored procedures
  • Triggering a job when a stream detects new data from a snowpipe
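
As a sketch of that last scenario, a task can pair a schedule with the system$stream_has_data function so that the Matillion ETL job is only triggered when the stream contains new rows. The task, warehouse and stream names below are hypothetical placeholders, and the call reuses the earlier test parameters:

create or replace task "MY_DATABASE"."MY_SCHEMA"."TRIGGER_MATILLION_ON_NEW_DATA"
  warehouse = "MY_WAREHOUSE" -- Hypothetical warehouse
  schedule = '5 MINUTE'
  when system$stream_has_data('MY_DATABASE.MY_SCHEMA.MY_SNOWPIPE_STREAM') -- Hypothetical stream on the snowpipe target table
as
  call "EXTERNAL_ACCESS"."PROCEDURES"."API__MATILLION_ETL__TRIGGER_JOB"(
      PROJECT_GROUP_NAME => 'Example Group'
    , PROJECT_NAME => 'Example Project'
    , JOB_NAME => 'Example Job'
  )
;

-- Tasks are created in a suspended state and must be resumed before they will run
alter task "MY_DATABASE"."MY_SCHEMA"."TRIGGER_MATILLION_ON_NEW_DATA" resume;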

The post Snowflake External Access: Trigger Matillion ETL Jobs appeared first on InterWorks.
