Take Control of your Governance in Snowflake

Using object tags to enforce your compliance and governance policies.

Motivation

One thing we didn’t expect when we started Spyglass was the types of high level policies companies want to define about their data access:

  • Reliability — Staging users can only access staging tables, and production users should only be able to access production tables.
  • Privacy — Tables or columns classified as PII Level 3 can only be accessed by users with PII Level 3+ classifications.
  • Security — Users classified as “external sharing” accounts can only access objects with an “external sharing” classification.

The good news is that Snowflake’s robust RBAC system can support all of these patterns.

The bad news is that it takes a bit of work for us to get there; here’s what we need to do:

  1. Classify objects with tags.
  2. Classify users with tags.
  3. Check for conflicts between classified and actual access patterns.

Snowflake Tags

Snowflake Object Tags are a way to classify objects and users in Snowflake in order to achieve governance goals:

Tags enable data stewards to track sensitive data for compliance, discovery, protection, and resource usage use cases through either a centralized or decentralized data governance management approach.

Object (Data) Classification

First, we need to classify our data and users into categories. There are tools for this out (Amundsen, etc.) or you may have your own tools for doing it.

In our example, let’s create an environment classification to ensure staging users can only access staging data, and production users can only access production data, which will help prevent data contamination.

Step 1: Create a Tag

We create a tag called application_environment with values for production, staging, and development:

CREATE SCHEMA merchants.governance;
CREATE TAG merchants.governance.application_environment
    allowed_values 'production', 'staging', 'development';

(Note that tags are created within a database schema scope, so we created a “governance” schema for this purpose)

Step 2: Assign Tags to Objects

We can assign tags to databases, schemas, tables, warehouses, and many more Snowflake objects. In this case, let’s imagine we have a “merchants” database with schemas “production” and “staging” that have otherwise identical table structures.

ALTER SCHEMA merchants.production
    SET TAG merchants.governance.application_environment = 'production';

ALTER SCHEMA merchants.staging
    SET TAG merchants.governance.application_environment = 'staging';

Notice that we don’t need to apply the tag to all tables in the schema, since tags are inherited:

Tag inheritance means that if a tag is applied to a table, the tag also applies to the columns in that table. This behavior is referred to as tag lineage.

To illustrate this point, we can query for the environment tag of a table within our “merchants.production” schema and verify the result is “production”:

SELECT SYSTEM$GET_TAG(
    'merchants.governance.application_environment',
    'merchants.production.customer_view_1',
    'table'
);
-- Result: production

User Classification

Next, we need to classify our users. You may already do this through an identity provider (Okta, Azure AD, etc.). For completeness, we’ll do it ourselves in SQL.

Here we use a “service account user” such as an application that accesses tables, but it works the same for a human user like Alice or Bob.

ALTER USER payment_service
    SET TAG merchants.governance.application_environment = 'production';
ALTER USER payment_service_staging
    SET TAG merchants.governance.application_environment = 'staging';

Role Classification (optional)

To manage classification at scale, you’d probably have separate roles for staging versus production users. We’d want to classify those roles as well, but one pattern we see work is to suffix your role name to help distinguish it. For example: payment_services_staging vs payment_services_production.

Enforcing Governance

To enforce our policy, we need to understand the difference between:

  1. Which objects a user has access to;
  2. Which objects a user should have access to according to the policy.

For simplicity, we’ll just query for schemas and the associated users, tags, and role grants.

WITH

    -- CTE to find all user privileges for SCHEMAS
    -- based on first degree role assignments
    user_privs (username, privilege, database, schema) AS (
        SELECT
            u.grantee_name AS username,
            r.privilege,
            r.table_catalog AS database,
            r.table_schema AS schema --, u.role, r.granted_on as object_type, r.name as object_name
        FROM
            snowflake.account_usage.grants_to_users AS u
        JOIN
            snowflake.account_usage.grants_to_roles AS r
        ON
            u.role = r.grantee_name
        WHERE
            r.granted_on = 'SCHEMA'
    ),

    -- CTE to find all "application_environment" tags
    -- that are assigned to users and objects
    tag_refs (tag, tag_value, domain, object_database, object_name) AS (
        SELECT
            concat(tag_database, '.', tag_schema, '.', tag_name) AS tag,
            tag_value,
            domain,
            object_database,
            object_name
        FROM
            snowflake.account_usage.tag_references
        WHERE
            tag_database = 'MERCHANTS' AND
            tag_schema = 'GOVERNANCE' AND
            tag_name = 'APPLICATION_ENVIRONMENT'
    )
SELECT
    p.username,
    concat(p.privilege, ' on ', p.database, '.', p.schema) AS schema_grant,
    tu.tag_value as user_tag,
    tp.tag_value as object_tag
FROM user_privs as p

-- Join the tags associated with the user
JOIN tag_refs as tu ON p.username = tu.object_name

-- Join the tags associated with the schema
JOIN tag_refs as tp ON p.database = tp.object_database and p.schema = tp.object_name

-- Only return users that have mismatched user and schema tags
WHERE tu.tag_value != tp.tag_value;

/*
Returns
  username:       PAYMENT_SERVICE
  schema_grant:   USAGE on MERCHANTS.STAGING
  user_tag:       production
  object_tag:     staging
*/

In this case, we see that the payment_service user has a production tag, but has been granted access to a staging-tagged schema! Now we know there is an issue and can fix it.

From here, you can add a query like this to your CI/CD pipeline for Terraform or Permifrost and alert when access rules become non-compliant.

Limitations

The above query, while already pretty complex, is naive and operates only at a schema level and doesn’t support hierarchical role grants, which are common.

So if your roles have table-level granularity or you use an access vs. functional style role framework, there is more work to be done!

About Spyglass

Since you’re here, let me tell you what we’ve cooked up at Spyglass. In short, we make Snowflake data access controls easy - or provide an automated and better way to do the above.

If you’ve nodded your head while reading this, reach out at spyglass.software (or demo@spyglass.software) and we’ll show you a product demo to give you a taste of the future of data access management.

Subscribe to stay informed

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.