Take Control of your Governance in Snowflake
Using object tags to enforce your compliance and governance policies.
Motivation
One thing we didn’t expect when we started Spyglass was the types of high level policies companies want to define about their data access:
- Reliability — Staging users can only access staging tables, and production users should only be able to access production tables.
- Privacy — Tables or columns classified as PII Level 3 can only be accessed by users with PII Level 3+ classifications.
- Security — Users classified as “external sharing” accounts can only access objects with an “external sharing” classification.
The good news is that Snowflake’s robust RBAC system can support all of these patterns.
The bad news is that it takes a bit of work for us to get there; here’s what we need to do:
- Classify objects with tags.
- Classify users with tags.
- Check for conflicts between classified and actual access patterns.
Snowflake Tags
Snowflake Object Tags are a way to classify objects and users in Snowflake in order to achieve governance goals:
Tags enable data stewards to track sensitive data for compliance, discovery, protection, and resource usage use cases through either a centralized or decentralized data governance management approach.
Object (Data) Classification
First, we need to classify our data and users into categories. There are tools for this out (Amundsen, etc.) or you may have your own tools for doing it.
In our example, let’s create an environment classification to ensure staging users can only access staging data, and production users can only access production data, which will help prevent data contamination.
Step 1: Create a Tag
We create a tag called application_environment
with values for production, staging, and development:
(Note that tags are created within a database schema scope, so we created a “governance” schema for this purpose)
Step 2: Assign Tags to Objects
We can assign tags to databases, schemas, tables, warehouses, and many more Snowflake objects. In this case, let’s imagine we have a “merchants” database with schemas “production” and “staging” that have otherwise identical table structures.
Notice that we don’t need to apply the tag to all tables in the schema, since tags are inherited:
Tag inheritance means that if a tag is applied to a table, the tag also applies to the columns in that table. This behavior is referred to as tag lineage.
To illustrate this point, we can query for the environment tag of a table within our “merchants.production” schema and verify the result is “production”:
User Classification
Next, we need to classify our users. You may already do this through an identity provider (Okta, Azure AD, etc.). For completeness, we’ll do it ourselves in SQL.
Here we use a “service account user” such as an application that accesses tables, but it works the same for a human user like Alice or Bob.
Role Classification (optional)
To manage classification at scale, you’d probably have separate roles for staging versus production users. We’d want to classify those roles as well, but one pattern we see work is to suffix your role name to help distinguish it. For example: payment_services_staging
vs payment_services_production
.
Enforcing Governance
To enforce our policy, we need to understand the difference between:
- Which objects a user has access to;
- Which objects a user should have access to according to the policy.
For simplicity, we’ll just query for schemas and the associated users, tags, and role grants.
In this case, we see that the payment_service user has a production tag, but has been granted access to a staging-tagged schema! Now we know there is an issue and can fix it.
From here, you can add a query like this to your CI/CD pipeline for Terraform or Permifrost and alert when access rules become non-compliant.
Limitations
The above query, while already pretty complex, is naive and operates only at a schema level and doesn’t support hierarchical role grants, which are common.
So if your roles have table-level granularity or you use an access vs. functional style role framework, there is more work to be done!
About Spyglass
Since you’re here, let me tell you what we’ve cooked up at Spyglass. In short, we make Snowflake data access controls easy - or provide an automated and better way to do the above.
If you’ve nodded your head while reading this, reach out at spyglass.software (or demo@spyglass.software) and we’ll show you a product demo to give you a taste of the future of data access management.