๐Ÿ–‡๏ธManaging the data model

Each data model describes its:

  • Source of data: a database table, a SQL query or a PRQL query

  • Fields: the columns present on this data model

  • Relations: relationships to other data models

Additionally, for certain kinds of data models that almost every B2B SaaS company has, such as accounts, users and analytics events, you can describe additional semantic properties about the data models.

These later make it even easier for anyone on your team to do complex things like cohort retention analysis, without having to worry about database tables or configuring anything.

models:
  supersimple_demo_account:
    name: Account

    semantics:
      kind: Account
      properties:
        created_at: created_at

    properties:
      account_id:
        name: Account ID
        type: String
      name:
        name: Name
        type: String
      payment_plan:
        name: Payment Plan
        type: Enum
        enum_options:
          load: static
          options:
            - value: enterprise
              label: Enterprise
            - value: pro
              label: Pro
            - value: basic
              label: Basic
            - value: free
              label: Free
      created_at:
        name: Created At
        type: Date
        
    table: raw_account
    primary_key:
      - account_id
    relations:
      users:
        name: Users
        type: hasMany
        model_id: supersimple_demo_user
        join_strategy:
          join_key: account_id
      onboarding_response:
        name: Onboarding Response
        type: hasOne
        model_id: supersimple_demo_onboarding_response
        join_strategy:
          join_key: account_id

Data source

The data source can be defined in three ways. Only one may be used for a given model.

SQL

models:
  somemodel:
    name: My model
    sql: select id, amount / 100 as amount_usd from x

Table

models:
  somemodel:
    name: My model
    table: schemaname.tablename # use the whole table as-is

PRQL

We internally often like to use PRQL for certain things.

models:
  somemodel:
    name: My model
    prql: |
      from x
      derive y = amount > 0

Properties

Sets the properties (aka Fields) that a model has. Each field must be listed explicitly in order to be shown on the platform. Fields that are present in the data source but not in the properties list are not visible to users.

Properties can have the following type:

  • String

  • Enum

  • Boolean

  • Number

  • Integer

  • Float

  • Date

Relations

Relations describe how different data models are linked together. Relations:

  • Encapsulate the semantic meaning of the relationships โ€“ย two data models might have several relationships between each other, with different meanings (e.g. a Person might have multiple relations to other Persons: friends and enemies)

  • Centrally define the SQL join logic

Relations are, by default, unidirectional. They are defined from the "base model" โ€“ย the data model from which you can use them. For example, for an User's Car, User would be the base model, and Car would be the related model.

The types of relations are:

  • hasMany: each row in the base model has zero or more (up to infinity) matches in the related model (e.g. Company->Employee)

  • hasOne: each row in the base model has exactly one match in the related model, or it has none at all (e.g. Employee->Employer)

  • manyToMany: functions just like hasMany; each row in the base model has zero or more matches in the related model (e.g. User->Team where every user can be in multiple teams and every team can have multiple users)

  • hasOneThrough: functions just like hasOne; the underlying database has an intermediary table (e.g. a Person's Grandfather is defined through Person->Parent->Parent)

Note that you only need to define one level of relations between data models. It's always possible to later dynamically traverse through your entire data graph, e.g. going from accounts to their users, to the users' analytics events.

Metrics

Metrics allow you to reuse calculation logic in a flexible way. Metrics correspond to a single base model, and can be used from anywhere that has access to that data model. A metric can also be broken down (grouped) by any of that data model's Fields.

In your models YAML file, you can define metrics as follows:

metrics:
  transaction_gmv:
    name: GMV
    model_id: transaction
    aggregation:
      type: sum
      key: amount

The Metric can then be used as described under summarization options.

Last updated