ETL vs. ELT: How data transformation changes in the cloud
September 2, 2021
The old saying “garbage in, garbage out” goes back more than 60 years in computing, but it's as relevant today as ever. ETL helps fix that age-old problem.
ETL stands for Extract/Transform/Load, and it’s essential in data warehousing. It’s the three-step process of extracting data from source databases; transforming raw data into a format for analysis; and loading the reformatted data into the data warehouse.
ETL is vital for data quality because it involves creating a “clean” version of data that has been integrated, joined, deduped, etc. And, significantly for anyone building a cloud data warehouse, ETL is different in the cloud.
With on-premises data warehouses, data transformation typically happens in a staging environment prior to being moved into the warehouse. In the cloud, that often (but not always) happens inside the cloud data warehouse itself, where there is plenty of processing power.
In other words, in the cloud, data transformation is the last step in the three-step process, and ETL becomes ELT. Among the benefits of moving data transformation to the end stage are speed (immediate access to raw data) and efficiency (you transform only data that needs it).
'Reverse ETL,' too
On this week’s episode of the Cloud Database Report podcast, I talk about ETL and ELT with Ciaran Dynes, chief product officer of Matillion, a company that specializes in this process. Matillion’s tools are used with Amazon Redshift, Google Cloud BigQuery, Microsoft Azure Synapse, Snowflake, and other cloud data warehouses. (Other ETL/ELT vendors include Informix, Talend, Tibco, and leading database software companies.)
ETL may sound like deep-in-the-weeds tech, but Ciaran makes it accessible and relevant to everyone who should care, including business managers who utilize data warehouses for reports and analytics. We also talk about “reverse ETL,” when formatted data is redistributed to be used in more ways.
“That’s the game we’re in — connect, combine, then synchronize back out into the operational system so we can take an action with a customer in real time,” says Ciaran.
It's a wide-ranging conversation on data integration, quality, and movement in the cloud.
LISTEN TO THE PODCAST: Data Transformation for Cloud Data Warehouses
The Cloud Database Report Podcast is available on Amazon, Apple, Google, Spotify, and Cloud Wars Live. Next week, I talk with Karthik Ranganathan, co-founder and CTO with Yugabyte. So stay tuned — and thank you!
John Foley, Editor
|