Snowflake Masking Policies Demystified

Organizations are constantly collecting and analyzing vast amounts of sensitive information. Protecting this data has become more important than ever. Snowflake’s masking policies are a powerful tool in keeping sensitive information safe while still allowing easy access for those with the proper credentials. This blog will introduce you to the main concepts of masking policies in Snowflake and provide resources for further learning.

Masking policies are considered a column-level security feature in Snowflake. Masking involves transforming or obfuscating specific fields to prevent unauthorized users from seeing sensitive values while still allowing access to other, less sensitive fields. This allows data engineers to balance ease of access with security considerations.

A common use case for column-level masking is a table that includes social security numbers (SSNs) or credit card numbers (CCNs). We may still want analysts to see the accompanying names, cities and other useful fields, while hiding the sensitive ones, like SSN and CCN.

There are several masking approaches that can accomplish this, including dynamic data masking, external tokenization and tag-based masking.

Dynamic Data Masking (DDM)

Dynamic data masking is the type of masking policy I have seen most users leveraging. It offers a fine-grained mechanism for concealing sensitive information from unauthorized users while still allowing them access to non-sensitive data.

Data Definition

DDM is implemented using SQL queries. Users define masking policies that specify which columns to mask, the type of masking to apply and the conditions under which the masking should occur.

Run at Query Run Time

Masking policies are applied at query runtime. The policy is applied to the column at every location where it occurs, like tables and views. Depending on the masking policy conditions, the SQL execution context and the role hierarchy, a Snowflake query returns the plain-text value, a partially masked value or a fully masked value.

Types of Masking

Snowflake offers various masking techniques (a short sketch of each follows the list):

  • Full masking: Replaces the entire column value with a default mask, such as “Xs” or “***.”
  • Partial masking: Reveals only a portion of the data, such as displaying only the last four digits of a credit card number.
  • Pseudonymized masking: Substitutes the original value with a masked value, for example by leveraging the SHA2 function to create a new value. This allows for obfuscation while keeping the same analytical value, as two identical inputs would result in identical masked outputs.
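
As a quick sketch of what each technique could look like as a SQL expression (CCN is a hypothetical credit card number column; these expressions are illustrative and not taken from the policies later in this post):

-- Hypothetical expressions showing each masking technique against a credit card number column
select
  '****'                                    as fully_masked_ccn,      -- full masking: replace the whole value
  concat('****-****-****-', right(CCN, 4))  as partially_masked_ccn,  -- partial masking: keep only the last four digits
  sha2(CCN)                                 as pseudonymized_ccn      -- pseudonymized masking: identical inputs hash to identical outputs
from MY_TABLE
;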

Role-Based Access Control (RBAC)

DDM seamlessly integrates with Snowflake’s RBAC system, allowing you to assign masking policies to specific roles. This ensures that users see different levels of data obfuscation based on their roles and privileges, enhancing data security.
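
The examples later in this post check for custom roles such as EMAIL_ACCESS, DOMAIN_ACCESS and ANALYST. These are placeholders rather than built-in roles; a minimal sketch of creating and granting them might look like this (the user name is hypothetical):

-- Hypothetical roles referenced by the masking policy examples below
create role if not exists EMAIL_ACCESS;
create role if not exists DOMAIN_ACCESS;
create role if not exists ANALYST;

-- Grant one of them to an example user
grant role ANALYST to user MY_ANALYST_USER;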

Dynamic data masking in Snowflake provides a flexible, role-based approach to safeguarding sensitive information without compromising data accessibility for authorized users.

Example

Here is an example of a standard masking policy leveraging RBAC to mask the data in different ways:

create or replace masking policy MY_EMAIL_MASKING_POLICY
as (email_address string) returns string
->
  case

    -- Full access:
    -- Access to read the email address
    -- Applies when the querying user has the "EMAIL_ACCESS" role in their session
    -- Example output: my_email_address@my_domain.com
    when is_role_in_session('EMAIL_ACCESS') 
      then email_address
    
    -- Partial masking:
    -- Access to see the email domain but not the individual
    -- Applies when the querying user has the "DOMAIN_ACCESS" role in their session
    -- Example output: *********@my_domain.com
    when is_role_in_session('DOMAIN_ACCESS') 
      then regexp_replace(email_address,'.+\@','*********@')

    -- Pseudonymised masking:
    -- Cannot see the value whilst maintaining analytical value
    -- Applies when the querying user has the "ANALYST" role in their session
    -- Example output: 1dbd59f661d68b90724f21084396b865497173e4d2714f4d91cf05fa5fc5e18d
    when is_role_in_session('ANALYST')
      then sha2(email_address)

    -- Full masking
    -- Cannot see anything
    -- Example output: *********
      else '*********'
  end
;

This policy can then be assigned to a field in a table directly:

alter table MY_TABLE
  alter column EMAIL_ADDRESS
    set masking policy MY_EMAIL_MASKING_POLICY
;
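
Once the policy is attached, the same query returns different results depending on which role is active in the session. A quick, hypothetical check (assuming the roles above have been granted to your user):

-- Pseudonymized output: SHA2 hashes instead of email addresses
use role ANALYST;
select EMAIL_ADDRESS from MY_TABLE;

-- Plain-text output: the real email addresses
use role EMAIL_ACCESS;
select EMAIL_ADDRESS from MY_TABLE;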

Tag-Based Masking Policies

Tag-based masking policies combine masking policies with object tagging in Snowflake to offer a more scalable approach to data protection: objects are classified with tags, and masking rules are applied based on those classification tags.

Data Classification

Before implementing tag-based masking policies, data must be classified. This can be accomplished by classifying columns and tagging them as you create tables. If you plan on using the tag-based approach, I would recommend at least adding tags based on the sensitivity of the data in each column. For the most usability, it helps to tag objects at every level, including columns, tables and schemas. Snowflake’s object tagging documentation is a helpful resource.
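
The examples that follow assume a tag already exists. A minimal sketch of creating one (the comment text is illustrative):

-- Create a tag that will be used to classify email columns
create tag if not exists MY_EMAIL_TAG
  comment = 'Identifies columns containing email addresses'
;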

Policy Assignment

Once data is classified, administrators can assign masking policies to specific tags. This means that any column or table with a matching tag will automatically have the associated masking policy applied.

Consistency and Efficiency

Tag-based masking helps with consistency and efficiency by allowing you to manage policies programmatically with tags. Instead of manually defining masking rules for each column, you know every column with a certain tag has the same masking policy.

Adaptability

As your data evolves, a tag-based approach allows your masking policies to adapt with it. New columns or tables with the same sensitivity classification automatically have the associated masking policy applied. This reduces administrative overhead by streamlining data protection efforts, making it easier to maintain a secure environment.
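
As a sketch of this in practice, once MY_EMAIL_TAG has a masking policy attached (shown in the example below), protecting a brand-new table is just a matter of tagging its column (the table name here is hypothetical):

-- Hypothetical new table; tagging its email column picks up the tag's masking policy automatically
create table MY_NEW_TABLE (EMAIL_ADDRESS string);

alter table MY_NEW_TABLE
  alter column EMAIL_ADDRESS
    set tag MY_EMAIL_TAG = 'My tag value'
;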

Example

Here is an example of assigning a masking policy to a tag:

alter tag MY_EMAIL_TAG set masking policy MY_EMAIL_MASKING_POLICY;

This tag can then be assigned to a field in a table to apply the masking policy:

alter table MY_TABLE
  alter column EMAIL_ADDRESS
    set tag MY_EMAIL_TAG = 'My tag value'
;

In the above example it does not matter what the value of the tag is, as we have not attempted to leverage it in our example masking policy. If desired, we could choose to leverage the SYSTEM$GET_TAG_ON_CURRENT_COLUMN and/or SYSTEM$GET_TAG_ON_CURRENT_TABLE functions within our masking policy to change the behaviour based on the tag value. For example, the following masking policy will only mask email addresses if the tag value is “MASKED”:

create or replace masking policy MY_TAG_BASED_EMAIL_MASKING_POLICY
as (email_address string) returns string
->
  case
    
    -- Full access:
    -- Access to read the email address
    -- Applies when the tag value is not 'MASKED'
    -- Example output: my_email_address@my_domain.com  
    when SYSTEM$GET_TAG_ON_CURRENT_COLUMN('MY_EMAIL_TAG') != 'MASKED'
      then email_address

    -- Full access:
    -- Access to read the email address
    -- Applies when the querying user has the "EMAIL_ACCESS" role in their session
    -- Example output: my_email_address@my_domain.com
    when is_role_in_session('EMAIL_ACCESS') 
      then email_address
    
    -- Partial masking:
    -- Access to see the email domain but not the individual
    -- Applies when the querying user has the "DOMAIN_ACCESS" role in their session
    -- Example output: *********@my_domain.com
    when is_role_in_session('DOMAIN_ACCESS') 
      then regexp_replace(email_address,'.+\@','*********@')

    -- Pseudonymised masking:
    -- Cannot see the value whilst maintaining analytical value
    -- Applies when the querying user has the "ANALYST" role in their session
    -- Example output: 1dbd59f661d68b90724f21084396b865497173e4d2714f4d91cf05fa5fc5e18d
    when is_role_in_session('ANALYST')
      then sha2(email_address)

    -- Full masking
    -- Cannot see anything
    -- Example output: *********
      else '*********'
  end
;
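
To activate this behaviour, the tag-aware policy would replace the earlier one on the tag, and the tag value on the column would be set to 'MASKED'. A sketch of what that might look like (note that a tag can only carry one masking policy per data type, so the earlier policy is unset first):

-- Swap the policy attached to the tag for the tag-aware version
alter tag MY_EMAIL_TAG unset masking policy MY_EMAIL_MASKING_POLICY;
alter tag MY_EMAIL_TAG set masking policy MY_TAG_BASED_EMAIL_MASKING_POLICY;

-- Set the tag value so the policy's first branch no longer grants full access
alter table MY_TABLE
  alter column EMAIL_ADDRESS
    set tag MY_EMAIL_TAG = 'MASKED'
;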

External Tokenization

External tokenization takes data security to the next level by replacing sensitive information with tokens generated and managed by external functions. Tokens are undecipherable values that replace values in columns with sensitive data. This is the most secure option for obfuscating sensitive data as the real values are not available in any table or view.

Pre-Load Tokenization

External tokenization allows accounts to tokenize data before loading it into Snowflake: a tokenization provider transforms the data, and only the tokenized values are loaded. The benefit is that, even without a masking policy in place, users cannot see the real data values.

De-Tokenization at Runtime

The data is de-tokenized at query runtime so that appropriate users can view the original values. This is performed by the external functions set in the masking policy. Like other types of masking, queries may return a plain-text value, a partially masked value or a fully masked value, depending on the masking policy conditions, the SQL execution context and the role hierarchy.

Data Governance and Access Management

Masking policies like external tokenization support contextual data access by role or custom entitlements. They support both centralized and decentralized administration models. For example, a dedicated security or privacy officer can decide which columns to protect instead of the object owner. They also allow you to apply one policy to thousands of columns across databases and schemas. And all of this can easily be changed by updating the masking policy, instead of having to update columns individually.

Enhanced Security and Compliance

External tokenization helps organizations comply with strict data protection rules and regulations, as it minimizes the risk of data breaches while preserving the functionality of the data for authorized users.

Example

Here is an example of a masking policy that leverages an external function to detokenize data that is secured with external tokenization:

create or replace masking policy MY_TOKENIZATION_MASKING_POLICY
as (input_value string) returns string
->
  case
    
    -- Full access:
    -- Execute the external function "DETOKENISE_VALUE"
    -- which detokenizes the data.
    -- Applies when the querying user has the "SECURE_ACCESS" role in their session
    when is_role_in_session('SECURE_ACCESS') 
      then DETOKENISE_VALUE(input_value)
    
    -- No access
    -- Return the original input value,
    -- which is still tokenized
      else input_value
  end
;
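
Note that DETOKENISE_VALUE is not a built-in function; it stands in for an external function that calls the tokenization provider's API. A rough, hypothetical sketch of how such a function might be declared (the API integration name and URL are placeholders):

-- Hypothetical declaration of the external function used by the policy above.
-- MY_API_INTEGRATION and the URL are placeholders for a real tokenization provider.
create or replace external function DETOKENISE_VALUE(tokenized_value string)
  returns string
  api_integration = MY_API_INTEGRATION
  as 'https://tokenization-provider.example.com/detokenize'
;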

Encryption

Encryption allows users to safeguard their sensitive values by using a passphrase to encrypt/decrypt values. ENCRYPT is used during data ingestion to load the encrypted values (as BINARY data) into Snowflake. DECRYPT can then be used as a function within a masking policy to decrypt values using the varchar passphrase.

There are also ENCRYPT_RAW and DECRYPT_RAW variations of ENCRYPT and DECRYPT that use binary keys and initialization vectors instead of varchar passphrases.
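
As a rough sketch of the ingestion side (MY_STAGING_TABLE, EMAIL_ADDRESS_ENCRYPTED and the passphrase are placeholders; ENCRYPT returns BINARY, so the target column would be a BINARY column):

-- Hypothetical ingestion step: encrypt values with a passphrase while loading.
-- The same passphrase must later be supplied to DECRYPT.
set passphrase = 'my-secret-passphrase';

insert into MY_TABLE (EMAIL_ADDRESS_ENCRYPTED)
select encrypt(EMAIL_ADDRESS, $passphrase)
from MY_STAGING_TABLE
;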

Below is an example of a masking policy that leverages DECRYPT:

create or replace masking policy MY_DECRYPTION_MASKING_POLICY
as (input_value binary) returns binary
->
  case
    
    -- Full access:
    -- Execute the DECRYPT function, which decrypts the data.
    -- DECRYPT operates on and returns BINARY values, so this policy is
    -- declared against a BINARY column. $passphrase is assumed to be a
    -- session variable holding the same passphrase used with ENCRYPT.
    -- Applies when the querying user has the "SECURE_ACCESS" role in their session
    when is_role_in_session('SECURE_ACCESS') 
      then DECRYPT(input_value, $passphrase)
    
    -- No access
    -- Return the original input value,
    -- which is still encrypted
      else input_value
  end
;
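
The policy can then be attached to the encrypted BINARY column in the same way as the earlier examples (the column name here is the hypothetical one from the ingestion sketch above):

alter table MY_TABLE
  alter column EMAIL_ADDRESS_ENCRYPTED
    set masking policy MY_DECRYPTION_MASKING_POLICY
;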

Additional Resources

I hope this has been a helpful introduction to Snowflake’s masking policies. For deeper dives into each of the topics we covered, I have attached links to some helpful resources:

Installing dbt with Mac M1 Chips

Have you encountered errors trying to install older versions of dbt on your fancy, new MacBook? This could be related to compatibility issues with the new M1 CPU. Dependency packages written for Intel CPUs can cause issues preventing proper installation. But don’t worry, Apple provides an Intel emulator called Rosetta. Below are the steps that worked for me.

Before we go through our instructions, please note that I am installing dbt 0.20.2 as my example. Your results may vary depending on the version you need. Also be aware that I found this process successful with CLI dbt, but still encounter issues with virtual environments, like "dbtenv."

With those notes in mind, here are the steps I recommend (a condensed command sketch follows the list):

  1. Start by referencing the dbt Install Documentation. dbt includes specific instructions for installation on an M1 Mac. There are two important considerations to note from this documentation:
    1. We need to run our commands through Rosetta to emulate an Intel chip. 
    2. We will also follow the pip install instructions instead of brew, as we need to be careful that our Python version matches the old dbt version’s requirements.

  2. I found this guide to be most helpful in setting up a Rosetta terminal environment. It also shows you how to use that terminal environment directly in VS Code. Running uname -m is a helpful command to check whether you are correctly emulating the Intel architecture.
  3. I followed this guide to set up Python. I specifically followed the pyenv instructions, as Python versioning is important for the necessary dependencies. I installed Python 3.9.7, as that was the most recent version when dbt v0.20.2 was released.
  4. Pip install dbt using your Rosetta terminal. Here are the dbt docs for reference.
  5. I had to downgrade the MarkupSafe dependency package.
  6. Don’t forget to set up your profiles.yml file in the .dbt directory in your home folder.
  7. Here is one other guide for reference that doesn’t use pyenv.
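
To pull the steps together, here is a rough sketch of the terminal commands involved (version numbers follow the example above; the exact MarkupSafe version to downgrade to is an assumption and may differ for you):

# Condensed, hypothetical version of the steps above, run from a Rosetta (Intel-emulated) terminal
softwareupdate --install-rosetta --agree-to-license   # install Rosetta if it is not already present
arch -x86_64 /bin/zsh                                 # start an Intel-emulated shell
uname -m                                              # should print x86_64

pyenv install 3.9.7                                   # Python version contemporary with dbt v0.20.2
pyenv local 3.9.7

pip install dbt==0.20.2                               # pre-1.0 dbt installs as a single package
pip install markupsafe==2.0.1                         # downgrade MarkupSafe (exact version is an assumption)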

At this point you should be able to run dbt successfully from the CLI. I hope this helped you overcome some of the challenges I ran into.
