Azure Databricks previews parallelized Photon query engine
Microsoft and Databricks say the vectorized query engine written in C++ accelerates Apache Spark workloads by up to 20x
Microsoft has unveiled a preview of a C++-based vectorized query engine for Azure Databricks, its cloud analytics and AI service based on Apache Spark. Azure Databricks, which is delivered in partnership with Databricks, introduced the Photon-powered Delta Engine on September 22.
Written in C++ and compatible with Spark APIs, Photon is a vectorized query engine that leverages modern CPU architecture and the Delta Lake open source transactional storage layer to improve Apache Spark 3.0 performance by as much as 20x, according to the companies. Microsoft said that as organizations embrace data-driven decision-making, it is now imperative for them to have a platform that can quickly analyze massive amounts of data of many types.
Photon offers greater parallelism of CPU processing at the data and instruction levels. Other components in Delta Engine include an improved query optimizer and a caching layer. The combination of these technologies boosts big data use cases including data engineering, machine learning, data science, and data analytics.
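To illustrate what "vectorized" means in this context, here is a minimal Python sketch that contrasts row-at-a-time evaluation with batch-oriented processing over columnar data. It is not Photon code, and Photon's internals are not described in the announcement. The sample data, function names, and batch size are all hypothetical. Pure Python also gains no speed from this layout. The point is only the shape of the computation: each operation runs over a whole batch of one column, which is the pattern a C++ engine can map onto SIMD instructions.

```python
from array import array

# Hypothetical sample data: one row per sale as (price, quantity).
rows = [(2.5, 4), (10.0, 1), (3.0, 3), (7.5, 2)]


def revenue_row_at_a_time(rows, min_price):
    """Apply the filter and the aggregate to one row at a time."""
    total = 0.0
    for price, quantity in rows:
        if price >= min_price:
            total += price * quantity
    return total


def to_columns(rows):
    """Store each column contiguously, as a columnar engine would."""
    prices = array("d", (r[0] for r in rows))
    quantities = array("q", (r[1] for r in rows))
    return prices, quantities


def revenue_columnar(prices, quantities, min_price, batch_size=2):
    """Apply each operation to a whole batch of column values at once."""
    total = 0.0
    for start in range(0, len(prices), batch_size):
        p = prices[start:start + batch_size]
        q = quantities[start:start + batch_size]
        # Step 1: evaluate the filter over the batch, giving a selection mask.
        mask = [x >= min_price for x in p]
        # Step 2: evaluate the expression over the batch.
        products = [x * y for x, y in zip(p, q)]
        # Step 3: aggregate only the selected values.
        total += sum(v for v, keep in zip(products, mask) if keep)
    return total


prices, quantities = to_columns(rows)
print(revenue_row_at_a_time(rows, 3.0))
print(revenue_columnar(prices, quantities, 3.0))
```

Both functions compute the same total. The columnar version simply organizes the work as tight loops over single columns, which is the data-level parallelism the announcement refers to.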
Azure Databricks is intended to allow users to quickly set up optimized Apache Spark environments. It offers native integration with Azure Active Directory and other Azure cloud services such as Azure Synapse Analytics and Azure Machine Learning, so customers can build end-to-end data warehouse, machine learning, and real-time analytics solutions. Users can request access to the Photon preview by filling out a questionnaire.