Esc

Cross-source queries

Join data across Postgres, Snowflake, MongoDB, and more in a single SQL statement, no ETL pipelines required.

What is DataFusion?

Apache DataFusion is a fast, extensible query engine written in Rust. It executes SQL against in-memory Apache Arrow columnar data, with a vectorized, streaming execution model and a pluggable table-provider interface that lets a host application feed it rows from any source. It is a top-level Apache project, used as the engine behind many analytical databases and data tools.

Arris embeds DataFusion to run cross-source queries: a single SQL statement that reads from several of your database connections at once and joins the results locally. DataFusion runs entirely in-process, so there is no cluster to deploy and no external service to call.

How it works

When you run a cross-source query, Arris hands it to the embedded DataFusion engine, which:

Writing a cross-source query

In an editor tab, flip the DataFusion toggle in the run bar. The toggle needs at least two connections configured; once it is on, the connection selector switches to All Connections and the tab can reference tables from any of them. Reference a table with its connection name in front: connection.schema.table.

Table references

A reference is <connection>.<schema>.<table>, or <connection>.<table> for sources without schemas (e.g., for MongoDB, the database name takes the schema slot). The connection name is the name you gave the connection in Arris, and the editor tints each connection's segment in its own color. Autocomplete spans every connection, so typing a connection name and a dot suggests that connection's schemas, then its tables and columns.

The Arris editor with the DataFusion toggle on and All Connections selected, running a cross-source query that joins a Postgres customers table with a MongoDB orders collection, results shown below
With the DataFusion toggle on, one query joins a Postgres table with a MongoDB collection across connections.

Execution plan

A cross-source query runs through DataFusion's physical plan, which Arris surfaces as a live execution graph. Use the Show execution plan toggle () in the results toolbar to swap the grid for a node graph of the plan: each scan, join, aggregate, sort, and the final result, with each node's status updating as the query runs.

The Arris execution plan graph for a cross-source query: two Scan nodes (Postgres customers and MongoDB orders) feeding Sort, Inner Join, and Projection nodes, each showing row count and elapsed time
The execution plan graph shows each scan, sort, join, and projection with its row count and timing.

Limitations

Federation is designed for analytical and exploratory workloads. Keep these limits in mind:

DataFusion SQL dialect
Cross-source queries use DataFusion SQL syntax. Most standard SQL works as expected. Check the DataFusion SQL reference for dialect-specific details on functions, types, and syntax.