Joining data

Last updated:

|Edit this page

The real power of the data warehouse is the ability to combine data from multiple tables in a single query. Joins enable you to do this. They enable you to choose fields that act as connections between PostHog and external sources.

Table joins

You can join external data on existing PostHog schemas and other external data tables. These joins are saved and interpreted anytime they're accessed on the origin table.

To define a join, go to the data warehouse tab, click the three dots next to your source table, and click Add join. Here you define the source table key, joining table, and joining table key as well as how the fields are accessed.

For example, if you import your Stripe data, you can define a join between the events table's distinct_id key and the stripe_customer table's email key. You can then access the stripe_customer table through the events table like SELECT stripe_customer.id FROM events.

Create a join

Once joined, source properties can be used in filters, breakdowns, and HogQL expressions.

To edit or delete a table join, click the three dots next to your source table, click View table schema, click the three dots next to your joined table, and select Edit or Delete.

Person joins

Person joins are a special type of table joins. They are joins on the persons table in PostHog. When you join external data on this table, we enable you to use it like a person filter in insights.

Filter on joined person properties

Note: Be sure that your joined keys actually match. For example persons.id returns a UUID, even if you use an email as a distinct_id when capturing events. You might need to add a person property like email.

Query joins

If you only want to join data together for a single insight or query, you can use SQL commands like WHERE IN and JOIN SQL commands.

For example, to get a count of events for your Hubspot contacts you can filter events.distinct_id by email FROM hubspot_contacts like this:

SQL
SELECT COUNT() AS event_count, distinct_id
FROM events
WHERE distinct_id IN (SELECT email FROM hubspot_contacts)
GROUP BY distinct_id
ORDER BY event_count DESC

You can also use a JOIN such as INNER JOIN or LEFT JOIN to combine data. For example, to get a count of events for your Stripe customers you can INNER JOIN on distinct_id and email like this:

SQL
SELECT events.distinct_id, COUNT() AS event_count
FROM events
INNER JOIN prod_stripe_customer ON events.distinct_id = prod_stripe_customer.email
GROUP BY events.distinct_id
ORDER BY event_count DESC

Questions?

Was this page useful?

Next article

Creating views

In the PostHog data warehouse, you can save your most used queries as views and reference them in subsequent queries. Creating a view Query views are created directly inside SQL insights. If the query has valid view characteristics, the "Save as view" button is enabled. When clicked, you are prompted to give the view a name which can then be referenced in other queries. For a query to be a valid view, all fields being accessed must be aliased (with the SQL as keyword). The alias names are…

Read next article