Stream Firestore to BigQuery

Made by Firebase

Sends realtime, incremental updates from a specified Cloud Firestore collection to BigQuery.

47.7K+ installs
Works with: Cloud Firestore
Version: 0.1.31
License: Apache-2.0
Publisher: Firebase

How this extension works

Use this extension to export the documents in a Cloud Firestore collection to BigQuery. Exports are realtime and incremental, so the data in BigQuery is a mirror of your content in Cloud Firestore.

The extension creates and updates a dataset containing the following two BigQuery resources:

  • A table of raw data that stores a full change history of the documents within your collection. This table includes a number of metadata fields so that BigQuery can display the current state of your data. The principal metadata fields are timestamp, document_name, and the operation for the document change.
  • A view which represents the current state of the data within your collection. It also shows a log of the latest operation for each document (CREATE, UPDATE, or IMPORT).
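The relationship between the two resources can be illustrated with a small in-memory analogue of the current-state view. This is a sketch, not the extension's actual view SQL: it assumes each change row carries timestamp, document_name, operation, and a JSON-encoded data string, keeps the newest row per document, and treats a document whose latest operation is DELETE as no longer existing. The ChangelogRow type and the latestState function are hypothetical names.

```typescript
// Simplified change row, using the metadata fields named above.
type Operation = "CREATE" | "UPDATE" | "DELETE" | "IMPORT";

interface ChangelogRow {
  timestamp: number; // assumption: milliseconds since epoch
  document_name: string;
  operation: Operation;
  data: string | null; // JSON-encoded document, null for deletes
}

// Reduce a full change history to the current state of the collection:
// keep only the newest row per document, then drop deleted documents.
function latestState(rows: ChangelogRow[]): Map<string, ChangelogRow> {
  const latest = new Map<string, ChangelogRow>();
  for (const row of rows) {
    const seen = latest.get(row.document_name);
    if (!seen || row.timestamp > seen.timestamp) {
      latest.set(row.document_name, row);
    }
  }
  // A document whose most recent change is a DELETE no longer exists.
  const current = new Map<string, ChangelogRow>();
  latest.forEach((row, name) => {
    if (row.operation !== "DELETE") current.set(name, row);
  });
  return current;
}
```

The raw table only ever grows; the current state is always derived from it, which is why the history is preserved even after documents are updated or deleted.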

If you create, update, delete, or import a document in the specified collection, this extension sends that update to BigQuery. You can then run queries on this mirrored dataset.

Note that this extension listens only for document changes in the specified collection, not for changes in any of its subcollections. You can, however, install additional instances of this extension to listen to a subcollection or to other collections in your database. Alternatively, if the same subcollection exists across documents in a given collection, you can use {wildcard} notation to listen to all of those subcollections (for example: chats/{chatid}/posts).

Enabling wildcard references adds an extra STRING-based column. Its value is a JSON string that records the values matched by any wildcards included in ${param:COLLECTION_PATH}. You can extract individual values with BigQuery's JSON_EXTRACT_SCALAR function.
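A minimal sketch of how wildcard values relate to a concrete document path, assuming the column stores a flat JSON object keyed by wildcard name (for chats/{chatid}/posts, something like {"chatid":"abc"}). matchWildcards and wildcardQuery are hypothetical helpers, and the column name path_params and the table name used below are assumptions, not taken from this page. The matcher also reflects the listening rule above: a document in a deeper subcollection does not match.

```typescript
// Match a document path against a collection path template such as
// "chats/{chatid}/posts" and collect the wildcard values. Returns null when
// the document is not directly inside a matching collection.
function matchWildcards(template: string, documentPath: string): Record<string, string> | null {
  const tpl = template.split("/");
  const parts = documentPath.split("/");
  // A document directly in the collection has exactly one extra segment: its ID.
  if (parts.length !== tpl.length + 1) return null;
  const params: Record<string, string> = {};
  for (let i = 0; i < tpl.length; i++) {
    const m = /^\{(\w+)\}$/.exec(tpl[i]);
    if (m) {
      params[m[1]] = parts[i]; // wildcard segment: record its concrete value
    } else if (tpl[i] !== parts[i]) {
      return null; // literal segment must match exactly
    }
  }
  return params;
}

// Hypothetical query: the column name path_params is an assumption.
function wildcardQuery(table: string, wildcard: string): string {
  return (
    `SELECT document_name, JSON_EXTRACT_SCALAR(path_params, '$.${wildcard}') AS ${wildcard} ` +
    `FROM \`${table}\``
  );
}
```

For example, matchWildcards("chats/{chatid}/posts", "chats/abc/posts/p1") yields { chatid: "abc" }, while a path under chats/abc/posts/p1/comments yields null.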

Partition settings cannot be updated on a pre-existing table. If you need these options, you must create a new table.

Clustering, by contrast, does not require you to create or modify a table yourself: when you add clustering options, the existing table is updated automatically.
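The two rules above can be summarized as a small decision function. This is an illustrative sketch: TableOptions, planTableChange, and the option names are hypothetical, and only the behavior (partition changes need a new table, clustering changes apply in place) comes from this page. HOUR, DAY, MONTH, and YEAR are BigQuery's time-partitioning granularities.

```typescript
type PartitionType = "HOUR" | "DAY" | "MONTH" | "YEAR";

// Hypothetical option names, for illustration only.
interface TableOptions {
  partitionType?: PartitionType;
  partitionField?: string;
  clustering?: string[];
}

type TableAction = "no-change" | "update-in-place" | "create-new-table";

function planTableChange(current: TableOptions, desired: TableOptions): TableAction {
  // Any change to partitioning means the existing table cannot be reused.
  const partitionChanged =
    current.partitionType !== desired.partitionType ||
    current.partitionField !== desired.partitionField;
  if (partitionChanged) return "create-new-table";

  // Clustering columns can be applied to the existing table.
  const clusteringChanged =
    (current.clustering ?? []).join(",") !== (desired.clustering ?? []).join(",");
  return clusteringChanged ? "update-in-place" : "no-change";
}
```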

Additional setup

Before installing this extension, you’ll need to:

  • Set up Cloud Firestore in your Firebase project.
  • Link your Firebase project to BigQuery.

Transform function

Before the extension sends a document change to BigQuery, you can transform the data with an HTTP function. The payload sent to that function contains the following:

{ 
  data: [{
    insertId: int;
    json: {
      timestamp: int;
      event_id: string;
      document_name: string;
      document_id: string;
      operation: ChangeType;
      data: string;
    },
  }]
}

The response should be identical in structure to the payload.
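A sketch of such an HTTP transform function in TypeScript, using the types from Node's built-in http module. The field types are kept loose because the payload above is informal; the transformation itself (removing a hypothetical email field from the document data) is only an example, and transformPayload and handler are illustrative names. Each row keeps its insertId and metadata so that the response has the same structure as the request.

```typescript
import type { IncomingMessage, ServerResponse } from "http";

// Payload shape as described above; number/string types are kept loose.
interface TransformRow {
  insertId: number | string;
  json: {
    timestamp: number | string;
    event_id: number | string;
    document_name: string;
    document_id: number | string;
    operation: string; // a ChangeType value such as "CREATE" or "DELETE"
    data: string | null; // JSON-encoded document data
  };
}

interface TransformPayload {
  data: TransformRow[];
}

// Hypothetical transformation: strip an "email" field before it reaches BigQuery.
function transformPayload(payload: TransformPayload): TransformPayload {
  return {
    data: payload.data.map((row) => {
      if (!row.json.data) return row; // nothing to transform (e.g. some deletes)
      const doc = JSON.parse(row.json.data);
      if (doc !== null && typeof doc === "object") delete doc.email;
      return { ...row, json: { ...row.json, data: JSON.stringify(doc) } };
    }),
  };
}

// Minimal Node.js request handler: read the JSON body, transform it, send it back.
function handler(req: IncomingMessage, res: ServerResponse): void {
  let body = "";
  req.on("data", (chunk: Buffer) => {
    body += chunk.toString();
  });
  req.on("end", () => {
    try {
      const result = transformPayload(JSON.parse(body));
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify(result));
    } catch (err) {
      res.writeHead(400, { "Content-Type": "text/plain" });
      res.end("Invalid payload");
    }
  });
}
```

To try it locally, you could serve it with http.createServer(handler).listen(8080). In production it would run behind an HTTPS endpoint (for example, a Cloud Function) whose URL you supply when configuring the extension's transform function.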

Backfill your BigQuery dataset

This extension only sends the content of documents that have been changed – it does not export your full dataset of existing documents into BigQuery. So, to backfill your BigQuery dataset with all the documents in your collection, you can run the import script provided by this extension.

Important: Run the import script over the entire collection after installing this extension; otherwise, any writes to your database during the import might be lost.
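Conceptually, a backfill reads the existing documents and writes them to the raw table as IMPORT rows, in batches. The sketch below only illustrates that idea and is not the import script: SourceDoc, ImportRow, and toImportBatches are hypothetical, reading from Cloud Firestore and writing to BigQuery are left out, and document_name is simplified to a collection-relative path.

```typescript
// Hypothetical stand-in for an existing document: its path plus its data.
interface SourceDoc {
  path: string; // e.g. "users/u1"
  data: Record<string, unknown>;
}

interface ImportRow {
  timestamp: string;
  document_name: string;
  document_id: string;
  operation: "IMPORT";
  data: string;
}

// Convert existing documents into IMPORT rows, grouped into fixed-size batches
// so each batch could be written to BigQuery in a single request.
function toImportBatches(docs: SourceDoc[], batchSize: number, now: Date): ImportRow[][] {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");
  const rows = docs.map((doc): ImportRow => ({
    timestamp: now.toISOString(),
    document_name: doc.path,
    document_id: doc.path.split("/").pop() ?? "",
    operation: "IMPORT",
    data: JSON.stringify(doc.data),
  }));
  const batches: ImportRow[][] = [];
  for (let i = 0; i < rows.length; i += batchSize) {
    batches.push(rows.slice(i, i + batchSize));
  }
  return batches;
}
```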

Generate schema views

After your data is in BigQuery, you can run the schema-views script (provided by this extension) to create views that make it easier to query relevant data. You only need to provide a JSON schema file that describes your data structure, and the schema-views script will create the views.
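To give a sense of what a schema-driven view does, the sketch below turns a list of fields into SELECT-list expressions that pull typed values out of the JSON-encoded data column. It assumes a schema file shaped like {"fields": [{"name": "age", "type": "number"}]} and covers only three scalar types; viewColumns is a hypothetical name, and this is not the SQL that the schema-views script actually generates.

```typescript
// Assumed schema-file shape: top-level fields, each with a name and a type.
interface SchemaField {
  name: string;
  type: "string" | "number" | "boolean";
}

interface SchemaFile {
  fields: SchemaField[];
}

// Build one SELECT-list expression per field, extracting the value from the
// JSON `data` column and casting it to a matching BigQuery type.
function viewColumns(schema: SchemaFile): string[] {
  return schema.fields.map((field) => {
    const raw = `JSON_EXTRACT_SCALAR(data, '$.${field.name}')`;
    switch (field.type) {
      case "number":
        return `SAFE_CAST(${raw} AS FLOAT64) AS ${field.name}`;
      case "boolean":
        return `SAFE_CAST(${raw} AS BOOL) AS ${field.name}`;
      default:
        return `${raw} AS ${field.name}`; // strings need no cast
    }
  });
}
```

SAFE_CAST returns NULL instead of failing when a document holds an unexpected value, which keeps a view queryable even if some documents do not match the schema.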

Billing

To install an extension, your project must be on the Blaze (pay as you go) plan.

  • This extension uses other Firebase and Google Cloud Platform services, which have associated charges if you exceed each service’s no-cost tier:
    • BigQuery (this extension writes to BigQuery with streaming inserts)
    • Cloud Firestore
    • Cloud Functions (Node.js 10+ runtime. See FAQs)