Configuration


Configuring your project is done with a simple YAML file, named quary.yaml and placed at the root of a project. The file defines which database to use and how to connect to it. The following are the different options for configuring the database.

BigQuery

BigQuery is a cloud based database, it is a good choice for large projects and for production. It is operated by Google and is part of the Google Cloud Platform. To use BigQuery you need to create a project in the Google Cloud Platform and enable BigQuery for the project. You can configure the connection to BigQuery with the following configuration.

bigQuery:
  projectId: my-project-id
  datasetId: my-dataset-id

Snowflake

Snowflake is a cloud-based data warehousing platform that offers scalable, on-demand storage and computing power and that separates storage and compute, enabling users to independently scale and pay for each.

snowflake:
  accountUrl: account-url
  clientId: client-id
  clientSecret: client-secret
  role: your-quary-role
  database: database-name
  schema: schema-name
  warehouse: warehouse-name

Clickhouse

Clickhouse is a column-oriented database management system developed by Yandex. It is designed for OLAP workloads and is a good choice for large-scale data storage and analysis. Unlike other databases, Clickhouse does not have a concept of a schema, so you only need to specify the database.

clickhouse:
  database: transform

All connection details are passed as environment variables. To connect to Clickhouse you need to set the following environment variables: CLICKHOUSE_HOST, CLICKHOUSE_PORT, CLICKHOUSE_USER and CLICKHOUSE_PASSWORD.

Postgres

Postgres is a popular open-source database. It is a good choice for small to medium projects and for production.

postgres:
    schema: transform

All connection details are passed as environment variables. The following environment variables are used to connect to the database. PGHOST, PGPORT (if not set defaults to 5432), PGUSER, PGPASSWORD, PGDATABASE. They are the same as those used by the psql command line tool.

Redshift

Redshift is a cloud-based data warehousing service provided by Amazon Web Services (AWS). It is designed for large-scale data storage and analysis.

redshift:
    schema: your-schema

All connection details are passed as environment variables. The following environment variables are used to connect to the database:

  • RSHOST
  • RSUSER
  • RSPASSWORD
  • RSDATABASE
  • RSPORT (if not set defaults to 5432)

DuckDb

DuckDb is an in-process SQL database, it's a feature-rich SQL dialect that it is a great way to query data locally on your machine.

File Based

The database is stored in a file on the disk and is persisted between runs. In this case the path to the file is specified, and each run creates the file if it does not exist.

duckdb:
    path: database.db

In Memory

The database is created in memory and is lost when the application is closed. It is faster than the file-based version and is good for rapid testing.

duckdbInMemory: {}

SQLite

SQLite is a database type that runs locally, it is a file-based database. It is a good choice for small projects and for testing. It can be operated in two manners

In Memory

The database is created in memory and is lost when the application is closed. It is faster than the file-based version and is good for rapid testing.

sqliteInMemory: {}

File Based

The other option is to use a file based database, where the database is stored in a file on the disk and is persisted between runs. In this case the path to the file is specified, and each run creates the file if it does not exist. It is useful for testing, for small projects and when you want to use the database in different applications.

sqlite:
  path: db.sqlite

Environment variables

In order to make your set up easier, you can use a .env files to set environment variables. By default, quary will pick up variables inside of .env. You can specify other .env files with the -e flag. For example, the following will read both .env files.

quary init -e='.env' -e='local.env'