The following guide will help you start using BigQuery with Quary. By the end of this guide, you will have a BigQuery project with a dataset for Quary inside it, and a Quary project that you can use to run queries and push views.
Before you can get started, you will need the following:
A Google Cloud Platform account
A Google Cloud Platform project with BigQuery enabled
A Quary account
The Quary Visual Studio Code extension, installed and signed in
Data inside the BigQuery project to query
Create your target dataset
Quary requires a dataset to be created inside your BigQuery project. This dataset will be used to store all the views that you create with Quary. To create a dataset, follow the steps below:
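If you prefer, the dataset can also be created with a single DDL statement from the BigQuery console's SQL editor. The project and dataset names below are placeholders; replace them with your own:

```sql
-- Creates a dataset (BigQuery calls this a schema in DDL) for Quary
-- to write its views into. Pick a location that matches your data.
CREATE SCHEMA `my-gcp-project.quary_dataset`
OPTIONS (location = 'US');
```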
Create a Quary project
To create a Quary project, open Visual Studio Code with the Quary extension installed and signed in. Then create a new project by opening the command palette and running the Quary: Initialise Project command. This should open a new window as follows:
After clicking on the BigQuery button, you will be prompted through a sign-in flow to connect your Google Cloud Platform account to Quary. The steps are as follows:
Click on Next to continue.
Allow Visual Studio Code to open a browser window.
Sign in to your Google Cloud Platform account.
Click on Allow to allow Quary to access your BigQuery project.
Copy the code shown to you.
Press Proceed in Visual Studio Code in the bottom right and paste the code into the input field.
After completing the steps above, you should be able to select your BigQuery project and dataset from the dropdown. Select the project and dataset you created in the previous steps.
Once you hit Create Project, Quary will create some files in your directory. These are the configuration files for your project. In addition to this, Quary will also open the Import Sources tab.
The Import Sources tab allows you to import tables and views from your BigQuery project into Quary. Once a table or view has been imported, you can use it in your queries by referencing it by name, prefixed with q., just as you would reference other models. To import sources, select the ones you want to import and click the Import Sources button. The path at the top is the path the source will be imported to; you can change this to your liking.
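As an illustration, assuming you imported a source whose staging model ended up named stg_orders (a hypothetical name), a downstream model could reference it like this:

```sql
-- orders_summary.sql — references the imported staging model by name,
-- prefixed with q., exactly as you would reference any other model.
SELECT
    customer_id,
    COUNT(*) AS order_count
FROM q.stg_orders
GROUP BY customer_id
```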
Importing sources will add the following for each source:
A source reference in the schema.yaml file.
A stg_source.sql model in the chosen directory that acts as a staging layer between the source and your other models.
Documentation for that model, also in the schema.yaml file.
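As a rough sketch, the generated entries in schema.yaml look something like the following; the table, path, and column names are placeholders, and the exact layout may differ between Quary versions:

```yaml
# schema.yaml — source reference plus documentation for its staging model
sources:
  - name: raw_orders
    path: my-gcp-project.raw_dataset.orders
models:
  - name: stg_raw_orders
    columns:
      - name: order_id
        description: Primary key of the orders table.
```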
Create your first model
Once you have those sources imported, you should be able to query them in your models, just as documented in the first model guide.
GCP Keyless authentication
There are many ways to authenticate from GitHub Actions to BigQuery. The simplest would be to generate a service account key and store it as a secret in your GitHub repository; however, long-lived keys like this are not recommended. Instead, we will use the keyless authentication method described here. You can either follow that guide or use the following Terraform config, with the local variables replaced.
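A minimal sketch of the resources such a config involves is below. The resource names, the repository in github_repository, and the granted BigQuery roles are assumptions to adapt to your own setup:

```hcl
locals {
  project_id        = "my-gcp-project"   # replace with your GCP project
  github_repository = "your-org/your-repo" # replace with your repository
}

resource "google_iam_workload_identity_pool" "github" {
  project                   = local.project_id
  workload_identity_pool_id = "github-pool"
}

resource "google_iam_workload_identity_pool_provider" "github" {
  project                            = local.project_id
  workload_identity_pool_id          = google_iam_workload_identity_pool.github.workload_identity_pool_id
  workload_identity_pool_provider_id = "github-provider"
  attribute_mapping = {
    "google.subject"       = "assertion.sub"
    "attribute.repository" = "assertion.repository"
  }
  # Restricts which repositories (and, if extended, branches) may authenticate.
  attribute_condition = "assertion.repository == \"${local.github_repository}\""
  oidc {
    issuer_uri = "https://token.actions.githubusercontent.com"
  }
}

resource "google_service_account" "github_actions" {
  project    = local.project_id
  account_id = "github-actions-quary"
}

# Lets workflows from the repository impersonate the service account.
resource "google_service_account_iam_member" "workload_identity" {
  service_account_id = google_service_account.github_actions.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.github.name}/attribute.repository/${local.github_repository}"
}

# BigQuery permissions for the service account; adjust roles to your needs.
resource "google_project_iam_member" "bigquery_data" {
  project = local.project_id
  role    = "roles/bigquery.dataEditor"
  member  = "serviceAccount:${google_service_account.github_actions.email}"
}

resource "google_project_iam_member" "bigquery_jobs" {
  project = local.project_id
  role    = "roles/bigquery.jobUser"
  member  = "serviceAccount:${google_service_account.github_actions.email}"
}
```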
This should create the required resources in your GCP project so that the GitHub Actions service account can authenticate to BigQuery. Depending on your use case, you may need to grant the service account additional permissions, such as access to the other datasets in your BigQuery project that you import sources from.
Finally, you may also want to restrict the branches that have privileges to deploy to your BigQuery dataset, by modifying the attribute_condition in the google_iam_workload_identity_pool_provider resource.
Now that we have the required resources in GCP, we can set up GitHub Actions to deploy and test changes to our BigQuery dataset.
First, we will set up a workflow to run tests on pull requests. This will ensure that any changes to the dataset are tested before they are deployed.
Create a new file in your GitHub repository at .github/workflows/pr.yml with the following contents. Note: Remember to change the branch if your default branch is not main.
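A sketch of what such a workflow might contain is below. The Quary CLI subcommands and flags, the secret names, and the install step are assumptions; check the Quary documentation for the exact commands for your version:

```yaml
name: pr
on:
  pull_request:

permissions:
  contents: read
  id-token: write # required for keyless (OIDC) authentication to GCP

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Authenticate to GCP via the workload identity pool created above.
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: ${{ secrets.GCP_WORKLOAD_IDENTITY_PROVIDER }}
          service_account: ${{ secrets.GCP_SERVICE_ACCOUNT_EMAIL }}
      # Install the Quary CLI here; see the Quary docs for your platform.
      - name: Compile
        run: quary compile
      - name: Dry-run build
        run: quary build --dry-run
      - name: Test
        run: quary test
```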
This adds three jobs:
A quick compile step
A dry-run build step which outputs the required SQL
A test step which runs the tests
Next, we will set up a workflow to deploy changes to the dataset whenever changes are merged to your default branch. Note: Remember to change the branch if your default branch is not main. Create a new file in your GitHub repository at .github/workflows/main.yml with the following contents.
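As with the pull-request workflow, the sketch below shows the general shape; the Quary CLI subcommand, secret names, and install step are assumptions to adapt:

```yaml
name: main
on:
  push:
    branches: [main] # change if your default branch differs

permissions:
  contents: read
  id-token: write # required for keyless (OIDC) authentication to GCP

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: ${{ secrets.GCP_WORKLOAD_IDENTITY_PROVIDER }}
          service_account: ${{ secrets.GCP_SERVICE_ACCOUNT_EMAIL }}
      # Install the Quary CLI here, then deploy the views to the dataset.
      - name: Build
        run: quary build
```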