Ehab Abdelhamid, Christian Bedford, Michael Duller, Gourab Mitra, Jozsef Patvarczki, Niko Tsikoudis
“That’s too good to be true!” is the standard reaction when prospects first understand what Datometry Hyper-Q can do for them. With the latest release, we now bring our technology to Databricks customers as well. Today we announced the general availability of Datometry Hyper-Q for Databricks.
Supporting Databricks was the logical next step for Datometry. No other platform currently commands the mindshare that Databricks does; no other platform has evolved as rapidly. Databricks captivates data scientists, data engineers, and enterprise IT alike.
Because Databricks is taking on enterprise data warehousing, Hyper-Q can be a critical building block. With Hyper-Q, any workload currently depending on Teradata can move directly to Databricks: ETL, BI, analytics, custom development, or scripts. You name it.
A powerful framework for interoperability
Hyper-Q makes existing applications written for a legacy system work directly with Databricks. Hyper-Q translates all SQL and API calls in real time. On the way in, SQL and API calls are translated to Databricks SQL. On the way out, data is formatted to match the legacy format.
The translations applied can be elaborate and, in contrast to those of static code converters, are not limited to one-for-one substitutions. Instead, think of an emulator like Apple’s Rosetta 2, which makes Intel-based applications run on Apple silicon Macs.
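Hyper-Q’s actual translation engine is proprietary and operates on full query structures rather than text. Purely as an illustration of the kind of dialect rewrite involved, the toy sketch below (our own example, not Hyper-Q code) expands two well-known Teradata idioms into ANSI SQL that Databricks understands:

```python
import re

# Toy illustration only: Hyper-Q's real engine translates parse trees,
# not strings. Two classic Teradata idioms are rewritten here:
#   SEL  -> SELECT   (Teradata's abbreviation for SELECT)
#   NE   -> <>       (Teradata's named inequality operator)
def translate_teradata_sql(sql: str) -> str:
    # Expand the SEL abbreviation at the start of the statement.
    sql = re.sub(r"^\s*SEL\b", "SELECT", sql, flags=re.IGNORECASE)
    # Replace the NE comparison operator with ANSI <>.
    sql = re.sub(r"\bNE\b", "<>", sql, flags=re.IGNORECASE)
    return sql

print(translate_teradata_sql("SEL * FROM orders WHERE status NE 'closed'"))
# SELECT * FROM orders WHERE status <> 'closed'
```

Real-world rewrites go far beyond token substitution, restructuring whole statements where the two dialects diverge.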
The figure below shows how Hyper-Q is typically deployed. Existing client applications (1) connect directly to Hyper-Q (2) which in turn connects to the Databricks instance (3). Hyper-Q is agnostic to application logic and semantics of its clients. That is to say, any application can work through Hyper-Q.
Because the communication uses TCP/IP network connections (4), client applications can reside anywhere: on-premises or in the cloud, on laptops or on servers. In addition, native Databricks applications (5) communicate directly with Databricks using Spark, MLflow, or Databricks SQL, accessing the same data.
Lastly, Unity Catalog (6) is critical for how Hyper-Q operates. Through it, Hyper-Q can emulate the entire data dictionary and catalog of a legacy system. Client applications rely on these catalog queries either explicitly or, in the case of third-party applications, often in the background.
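Teradata applications typically interrogate the dictionary through system views such as DBC.TablesV. A hypothetical sketch of what answering such a lookup from Unity Catalog could look like (the exact rewrite is our assumption, not Hyper-Q’s actual output):

```python
# Hypothetical sketch: answering a Teradata dictionary lookup of the form
#   SELECT TableName FROM DBC.TablesV WHERE DatabaseName = '<db>'
# from Unity Catalog's information_schema instead.
def emulate_tablesv(database_name: str) -> str:
    """Return an information_schema query equivalent to the DBC.TablesV lookup."""
    return (
        "SELECT table_name FROM system.information_schema.tables "
        f"WHERE table_schema = '{database_name}'"
    )

print(emulate_tablesv("sales"))
```

Because the application still sees the familiar DBC answer shape, tools that probe the catalog in the background continue to work unchanged.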
Coverage, Security, and Scalability
While the principle of Hyper-Q seems simple enough, the engineering behind it is where things get interesting. In this post we discuss the three most immediate questions our prospects typically raise. In future posts, we will deep-dive into other areas of the product, so stay tuned.
“Ok, can you really do everything that Teradata does?” is probably the most common question we get. In a nutshell, we support all commonly used SQL and APIs. In practice, our customers see coverage of over 99.7% on average, and sometimes up to 99.9%, out of the box.
Naturally, your mileage may vary; no two workloads are the same. However, many data warehousing workloads are quite similar. A few workloads may need adjustments, but our customers have found that what is not supported is usually a corner case or an “exotic” workload.
Security is the next big question. As you can see from the architecture drawing, Hyper-Q does not own any data at rest. Instead, requests and results are passed through transiently. This makes for a lean resource footprint and eliminates most security concerns right away.
Even authentication works on a pass-through basis. For every incoming connection from a client application, Hyper-Q opens an ODBC connection to Databricks. Isolating individual sessions ensures the identity of the original user is preserved and, with it, their privileges and the privacy of their data.
Because Hyper-Q was designed with isolation of sessions in mind, it scales extremely well. For most customers, vertical scaling through sizing up the VM on which Hyper-Q runs is sufficient. But if needed, horizontal scaling through adding independent instances of Hyper-Q provides even more power.
Hyper-Q for Databricks in practice: Identical result sets for data validation
So far, we have looked at Hyper-Q as a building block that opens up the world of Databricks to existing Teradata applications. Let’s look at the bigger picture, though: how does Hyper-Q interact with its environment, and how does one go about an implementation?
Let’s start with database schemas. Hyper-Q supports not only queries but also DDL statements. In enterprise workloads, DDL is a big part of daily processing. Our support for DDL is rather intricate and includes everything from stored procedures and macros to operational constructs like Global Temporary Tables.
We may write in more detail about the conversion of Teradata DDL to Databricks constructs in a later post. For now, suffice it to say, using this capability empowers implementors to migrate entire schemas to Databricks—including stored procedures and other highly system-specific elements.
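To give a flavor of what such a DDL conversion entails, the toy sketch below (our own simplification, not Hyper-Q’s converter) strips two Teradata-specific table options that have no direct Databricks equivalent and targets a Delta table instead:

```python
import re

# Toy sketch of one DDL rewrite: Teradata's SET/MULTISET table kind and
# PRIMARY INDEX clause have no direct Databricks counterpart, so they are
# dropped and the table is created as a Delta table. Real conversions are
# far more involved (types, defaults, constraints, and more).
def convert_create_table(ddl: str) -> str:
    ddl = re.sub(r"\b(SET|MULTISET)\s+TABLE\b", "TABLE", ddl, flags=re.IGNORECASE)
    ddl = re.sub(r"\s*PRIMARY INDEX\s*\([^)]*\)", "", ddl, flags=re.IGNORECASE)
    return ddl.rstrip().rstrip(";") + " USING DELTA;"

td_ddl = ("CREATE MULTISET TABLE sales.orders "
          "(id INT, amt DECIMAL(18,2)) PRIMARY INDEX (id);")
print(convert_create_table(td_ddl))
# CREATE TABLE sales.orders (id INT, amt DECIMAL(18,2)) USING DELTA;
```

Constructs like stored procedures and macros require much deeper restructuring than this clause-level example suggests.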
Once the setup is complete and data is loaded—also through Hyper-Q—Hyper-Q delivers bit-identical results when compared to the legacy platform. It does so by formatting and converting the results returned from Databricks to match the legacy format, conforming even with the strict requirements of our financial services customers.
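Matching the legacy format comes down to details like date display formats and fixed decimal scales. The sketch below illustrates the idea with two formats commonly associated with Teradata defaults; the exact formats are our assumption for illustration:

```python
from datetime import date
from decimal import Decimal

# Sketch of result-set conformance: values returned by Databricks are
# rendered to match legacy defaults. The YY/MM/DD date format and
# fixed-scale decimal padding shown here are illustrative assumptions.
def to_teradata_date(d: date) -> str:
    return d.strftime("%y/%m/%d")  # e.g. Teradata's classic default DATE display

def to_teradata_decimal(x: Decimal, scale: int) -> str:
    # Pad to a fixed scale so DECIMAL(18,2) always shows two digits.
    return str(x.quantize(Decimal(1).scaleb(-scale)))

print(to_teradata_date(date(2023, 5, 1)))      # 23/05/01
print(to_teradata_decimal(Decimal("4.5"), 2))  # 4.50
```

When every value is rendered identically on both systems, result sets can be compared byte for byte during validation.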
For enterprise clients, especially in banking, risk management, insurance, retail, and manufacturing, this is a huge boon. We empower them to conduct “apples to apples” comparisons effortlessly. This kind of side-by-side testing of Databricks and the legacy system accelerates the validation process tenfold and instantly builds confidence with business users.
How to get started?
Quite a few iconic enterprises already use Hyper-Q, having successfully replaced their Teradata systems with it. With today’s release, we bring this capability to Databricks. Modern enterprises can now consolidate their data infrastructure onto their Databricks Lakehouse.
The modular architecture of Hyper-Q lends itself to adding further source systems. Going forward, you can expect support for other legacy database systems besides Teradata. We will keep you posted. In the meantime, we will deep-dive on this blog into some of the innovations that make all this possible.