Last week, your engineer created a Snowflake procedure (PROD_DB.SCHEMA.ADD_CUSTOMER) that inserts a new customer record into the customers table (PROD_DB.SCHEMA.CUSTOMERS).
Notice that these object names are “fully qualified” with specified database, schema, and object name. This engineer prefers explicit and descriptive code. Yesterday, the code in TEST_DB was recreated to match production and this code was executed with test customers. Suddenly we have test data flowing into our production environment because even though the procedure is now in TEST_DB, the copied code explicitly refers to the PROD_DB.SCHEMA.CUSTOMERS table.
For more technical details on how qualified names work, refer to Snowflake’s docs on Object Name Resolution.
Following standard DevOps principles and leveraging modern data tooling like DataOps.Live can help to avoid these occurrences by enforcing proper environment management, parameterizing database names and running automated tests. Still, with Snowflake’s ease of cloning, and a trend of having multiple databases per account, these scenarios seem all too common. Cross database references like this can cause significant issues with data governance, lineage, and coding standards. While these references are sometimes useful and necessary, it’s important to be on the lookout for any that are used improperly.
TL;DR, here is a script that can help you do a quick scan through cross database references in your Snowflake environment. Read on for more details and considerations.
Considerations of Cross Database References
There are several things consider when using them in Snowflake databases:
- Impact on cloning: When an object with fully qualified references is cloned, the new object will still use the original reference even if the newly cloned object is in a different database. This can lead to unintuitive behaviour and messy object dependencies.
- Impact on permissions: The scenario above highlights the importance of having roles with minimally required permissions. Ideally a Test environment role wouldn’t have permission to insert into Production environment tables. If set up correctly, that scenario would simply throw an error. Still not ideal, but better than data going to unexpected places.
- Impact on lineage: Cross database references, even if intentional, can make it more difficult to trace data lineage, which can impact data governance efforts.
- Impact on governance: Using object references across all databases with masking policies or other utilities can be useful for standardization but requires care and consideration to ensure that these are applied consistently. Inconsistent use of standards can sometimes do more harm than good.
- Impact on coding standards: applying standards around using qualified names is always a challenge whether across databases, across schemas, or even within a single schema.
In short, cross database references are useful when used intentionally and consistently, but it’s easy to use them accidently and inconsistently.
So how do you check if this is an issue for you?
Finding Cross Database References
This script, https://github.com/datalytyx/snowflake-audit/blob/main/cross_db_references.sql will quickly check your entire account for foreign keys, masking policies, row access policies, and views that span databases. This covers a large majority of the common sources of cross database issues.
In short, the main thing to look for in the results are anything that you don’t expect. A policy reference from ADMIN_DB that points at multiple other databases is probably normal. You may choose to filter out common expected results. However, seeing a PROD_DB view in this list that queries a DEV_DB table, might mean that a developer wrote code that worked in their development environment without realizing the effect of elevating that code to production, unchanged. It’s certainly worth taking a second look at.
- This version does not include procedures because there is no built-in Snowflake metadata around the content of a procedure. If you do have views show up in this query, it’s certainly worth checking your procedures as well. This can be achieved with decent accuracy by parsing DDLs but is a bigger challenge than I want to tackle in this post.
- These issues can also apply to cross schema references, although in my experience those are less common as a source of trouble. The query filters can be modified to check for schema differences instead of database differences if desired.
Cross database references can be a powerful tool for working with Snowflake databases, but they can also cause large headaches if not used carefully. By understanding their impact on your Snowflake environment, identifying where they are being used, and taking steps to monitor and maintain them, you can ensure that your databases are easier to manage.
If nothing else, try running the query above to see anything unexpected comes up. If you do, you’re already one step closer to understanding your environment, and if you are looking for further help you can of course contact us.
We include this analysis in our Snowflake Health Check where we have reduced operational overhead tenfold. We work with clients all the time to follow best practices and we can help you find and fix issues like this in your Snowflake environment.
Read here to know more about our Snowflake Health Check Services