Skip to main content

SQL

We use a lot of SQL!

Our system uses quite large data across both Postgres (OLTP style transactional) and Redshift (OLAP style large queries). Because of this, we have a large benefit from using efficient queries and we are not afraid to put more complicated logic in SQL (indeed our entire DBT layer is such).

Inside application code (TS / python) we have 2 common ways to access data:

  • ORM - useful for basic CRUD style operations affecting few rows. We use Sequelize in Typescript, which is handy as it gives us type safety. Typically this is used only when we access Postgres (ORMS are not a good fit for DWHs) and 'stateful' operations.
  • Direct SQL - useful for more complicated queries (eg aggregations, window functions etc etc) that are hand optimised, or require more complicated SQL logic or require DWH specifics. Also used when we want to access dynamic tables (eg. where we create a table specifically for a client for example) where ORMs get in the way.

As a general tip, if you find yourself struggling to write the Sequelize syntax for a complex query, or it contains any aggregations or window functions, you probably want a Direct SQL query.

This approach has a few downsides:

  • Defining schema in more than one place. Currently we have to modify both the Sequelize models and manually create SQL migrations. Not ideal.
  • Query safety and composability. One of the traditional pros of using an ORM is that you can safely pass in parameters and also compose queries without concatenating strings.

For the second issue we use Sql Templates.

SQL Templates

As we all know, manually creating SQL strings by concatenating strings is painful and also totally insecure as its open to SQL injection for parameters. We avoid it using the SQL Template helper in Typescript:

const id=1;
const query = sqlT`SELECT * FROM mytable WHERE id=${id}`;

Note that this is, under the hood, using parameter placeholders. The above actually results in query being an instance of SqlFragment class:

instance SqlFragment {
sql: 'SELECT * FROM mytable WHERE id=:v_1',
params: {
v_1: 1
}
}

We then use this with one of these helpers:

PG:
await queryFetchAllRows(query); => return array of rows
await queryFetchFirstRow(query); => first row
await queryFetchFirstValue(query); => first value of first row
await quertReturnNoRows(query); => return nothing

Redshift:
const dwh = new DWHClient(false);
await dwh.queryFetchAllRows(query); => return array of rows
await dwh.queryFetchFirstRow(query); => first row
await dwh.queryFetchFirstValue(query); => first value of first row
await dwh.quertReturnNoRows(query); => return nothing

And these pass the parameters safely to the database (using the inbuilt postgres parameter format). These helpers ONLY accept an instance of SqlFragment created with the sqlT template, you cant accidentally pass in an unsafe string.

Therefore, as long as we do this, SQL injection is not an issue.

There are some helpers for other insertion types:

sqlT`SELECT * FROM ${sqlT.table(tableName)}`;
sqlT`SELECT * FROM mytable WHERE ${sqlT.column(dynamicColumn)}=${value}`;
sqlT`SELECT * FROM mytable WHERE id IN(${sqlT.inList(ids)})`;

etc… These helpers are all safe and will take care of escaping all parameters and inputs.

There is a sqlT.raw('SQL') helpers but it should only be used in extreme cases and with careful thought.

Note that SqlFragments are composable, e.g. one can do:

const wherePart = sqlT`id=${id}`;
const finalSql = sqlT`SELECT * FROM mytable WHERE ${wherePart}`;

Which will result in a new SqlFragment with the SQL and the parameters merged.

This allows conditions etc (not a great example but to tell a point):

const wherePart = sqlT`id=${id}`;
const finalSql = sqlT`SELECT * FROM mytable WHERE ${id ? wherePart: sqlT`TRUE`}`;

There are also helpers for cases when you want to join an array of SQL Fragments together, e.g. to construct a list of AND or ORs:

const parts = [
sqlT`id=${1}`,
sqlT`status=${'DONE'}`,
];
const sql = sqlT.join(parts, ' OR ');
// sql = id=1 OR status='DONE'