Automatically load data from Google Cloud Storage files into BigQuery tables
Note: Documentation is currently not in sync with the rewrite, updating soon…
A Google Cloud Function providing a simple and configurable way to automatically load data from GCS files into BigQuery tables.
It features a convention-over-configuration approach and provides a sensible default configuration for common file formats (CSV, JSON, Avro, ORC, Parquet).
If the default behaviour does not suit your needs, it can be overridden for all files or for specific ones through mapping files or custom metadata.
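For instance, per-file overrides can be attached as custom metadata when uploading a file. This is a hypothetical sketch: the metadata key and value below are illustrative assumptions, not the project's confirmed key names; check the wiki for the keys the function actually recognises.

```shell
# Hypothetical: attach a load-configuration override as custom metadata
# (x-goog-meta-* headers become custom metadata on the GCS object).
# The "bigquery.configuration.load.writeDisposition" key is an assumption.
gsutil -h "x-goog-meta-bigquery.configuration.load.writeDisposition:WRITE_TRUNCATE" \
       cp "sales.csv" "gs://bq-autoload/"
```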
Create a new bq-autoload Google Cloud Storage bucket:
$> gsutil mb -c regional -l europe-west1 "gs://bq-autoload"
Create a new Staging BigQuery dataset:
$> bq mk --dataset "Staging"
Clone and deploy this repository as a Cloud Function triggered by changes on this GCS bucket (do not forget to replace the project id):
$> git clone "https://github.com/tfabien/bigquery-autoload/" \
&& cd "bigquery-autoload" \
&& npm install -g typescript \
&& npm install \
&& npm run build \
&& gcloud functions deploy "bq-autoload" \
--entry-point autoload \
--trigger-bucket "bq-autoload" \
--set-env-vars "PROJECT_ID={{YOUR_GCP_PROJECT_ID}}" \
--runtime "nodejs10" \
--memory "128MB" \
--region europe-west1
That's it!
Any file you upload to the bq-autoload
GCS bucket will now automatically be loaded into a BigQuery table within seconds.
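To try it out, upload a small CSV and query the result. Note the assumption here that the destination table name is derived from the file name ("persons"); the actual naming convention is described in the wiki.

```shell
# Create and upload a sample CSV file
printf "id,name\n1,Alice\n2,Bob\n" > "persons.csv"
gsutil cp "persons.csv" "gs://bq-autoload/"

# After a few seconds, inspect the loaded table
# (the "Staging.persons" table name is an assumption based on the file name)
bq query --nouse_legacy_sql 'SELECT * FROM Staging.persons'
```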
See the wiki for usage samples and advanced configuration.