How Model 2 Fact Tables with some common dimensions!

Hello

trying to model 2 Fact Tables with some common dimensions,

I managed to generate the model but the results of the Query are very wrong, it is like it is doing a cross join or something

is there a documentation on how to model multiple fact tables ?

thanks

dataset BIENGINETESTING
// this is an auto-generated default dataset
// please feel free to delete it, if not used
table actual budget category_dim
join actual inner join category_dim
on actual.category = category_dim.category
join budget inner join category_dim
on budget.category = category_dim.category

Hi Mim - thanks for your question. I’ve flagged with the customer support team and someone will get back to you ASAP - Jeremy

Update
Or maybe our CEO who invented our data modeling language will jump in…

Do you have a one-to-many relationship between each fact table table and the dimension table?

Fact1
id metric1
1 2
1 3

Fact2
id metric2
1 20
1 30

DimensionTable
id
1

In that case, according to standard SQL semantics, a simultaneous join is equivalent to a cross join.

Fact1 JOIN Fact1 JOIN DimensonTable
id metric1 metric2
1 2 20
1 2 30
1 3 20
1 3 30

Please let us know if our understanding is correct. Whether we are right or wrong in our interpretation of your problem, if we know your intent and how your tables are related, we can help find the correct solution.

BTW if you haven’t already done so, you can look at the auto-generated SQL by clicking on “View SQL” (near top right) when you are exploring a dataset.

here is my model

I expect the SQL to be something like this

SELECT category,
actualhours ,
budgethours ,

FROM (
SELECT category, SUM(actualhours) AS actualhours
FROM xxxxxxxx.starschema.actual

GROUP BY category
) t2
FULL OUTER JOIN (
SELECT category, SUM(budgethours) AS budgethours
FROM xxxxxxx.starschema.budget
GROUP BY category
) t3 USING(category)

We strive to provide general and powerful solutions to wide a variety of cases, while respecting basic semantics of SQL.

I understand if you expect the above SQL to be generated from your data model. Generalizing it to the case where the two JOINs use different columns may be hard. We will certainly look into this.

Meanwhile, one possible data model that would produce queries similar to your intent is as follows.

table budget
  sql xxxxxxx.starschema.budget
  datasource <yourDataSource>
  ...

table actual
  sql xxxxxxx.starschema.actual
  datasource <yourDataSource>
  ...

table joinableBudget
  sql {budget category sumBudgetHours}
  datasource <yourDataSource>
  column sumBudgetHours
    sql SUM(budgethours)
    type int
  column category
    sql ...
  ...

table joinableActual
  sql {actual category sumActualHours}
  datasource <yourDataSource>
  column sumActualHours
    sql SUM(actualhours)
    type int
  column category
    sql ...
  ...


dataset BIENGINETESTING
  table joinableBudget joinableActual
  join joinableBudget full outer join category_dim
    on joinableBudget.category = category_dim.category
  join joinableActual full outer join category_dim
    on joinableActual.category = category_dim.category

We could have defined a join rule between joinableBudget and joinableActual directly. But the above solution will work even when you select additional columns from category_dim.

The {} syntax for a table sql is documented here: Views using Structured SQL | bipp Analytics

As always, we solicit and welcome any feedback. If any of this looks unnecessarily complex, we would love to hear back from you.

1 Like

Thanks a lot, that’s very interesting approach, maybe it will be useful to show cases like this in the documentation

Yes, I completely agree. Out documentation is still far from ideal. We are working on it.

We now have a better solution for your use case in bipp.

I created tables ‘budget’, ‘actual’ and ‘category_dim’ on bigquery using your schema. Then I created a project in bipp called ‘multipleFactTables’. I connected to bigquery and added the three tables to the project. Then I created a dataset (notice the “oneToMany” keyword):

dataset multipleFactTables
  table actual budget category_dim
  join oneToMany category_dim left join actual
    on category_dim.category = actual.category
  join oneToMany category_dim left join budget
    on category_dim.category = budget.category

Then I can do what you want:
image

The underlying auto-generated query is:

SELECT t000._1 AS _0, SUM(t000._0) AS _1, SUM(t000._2) AS _2 FROM (
  SELECT t001._1 AS _0, t001._0 AS _1, SUM(t000.budgetHours) AS _2 
  FROM `test-datasources-294120.oneToMany`.`budget` AS t000 
  RIGHT OUTER JOIN (
    SELECT t001.category AS _0, SUM(t000.actualHours) AS _1 
    FROM `test-datasources-294120.oneToMany`.`actual` AS t000 
    RIGHT OUTER JOIN `test-datasources-294120.oneToMany`.`category_dim` AS t001 
    ON t001.category=t000.category 
    GROUP BY 1) AS t001 
  ON t001._0=t000.category 
  GROUP BY 1, 2) AS t000 
GROUP BY 1

Hope that helps.

1 Like

That’s great, it works as expected.
now, when I try to add another dimension, it generate an error

error : Error from makeSql: internal error: hell

I apologize for that. Can you help me figure out what the problem is? Can you share with me your bippLang data model and what columns you select?

We just deployed a version with better logging. Can you try again? Maybe that will be enough for us to debug it.

dataset starschena
table actual budget category_Dim_mv commodity_dim_mv
join oneToMany category_Dim_mv left join actual
on category_Dim_mv.category = actual.category
join oneToMany category_Dim_mv left join budget
on category_Dim_mv.category = budget.category
join oneToMany commodity_dim_mv left join actual
on commodity_dim_mv.commodity = actual.commodity
join oneToMany commodity_dim_mv left join budget
on commodity_dim_mv.commodity = budget.commodity

when trying to use two dimensions :

category_Dim_mv.category

commodity_dim_mv.commodity

and a metric from Actual

I get this error : error : Error from makeSql: internal error: assertion failure