
Stream Firestore to BigQuery
Made by Firebase
Sends realtime, incremental updates from a specified Cloud Firestore collection to BigQuery.
- Works with: Cloud Firestore
- Version: 0.1.31
- License: Apache-2.0
- Publisher: Firebase
How this extension works
Use this extension to export the documents in a Cloud Firestore collection to BigQuery. Exports are realtime and incremental, so the data in BigQuery is a mirror of your content in Cloud Firestore.
The extension creates and updates a dataset containing the following two BigQuery resources:
- A table of raw data that stores a full change history of the documents within your collection. This table includes a number of metadata fields so that BigQuery can display the current state of your data. The principal metadata fields are timestamp, document_name, and the operation for the document change.
- A view which represents the current state of the data within your collection. It also shows a log of the latest operation for each document (CREATE, UPDATE, or IMPORT).
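To make the relationship between the raw table and the view concrete, the following is a minimal TypeScript sketch of how a "current state" can be derived from a change history. It is not the extension's implementation: the ChangeRow shape, the millisecond timestamps, the DELETE operation name, and the rule that a document whose latest change is a delete drops out of the result are all assumptions made for illustration.

```typescript
// Hypothetical row shape, limited to the metadata fields named above.
type Operation = "CREATE" | "UPDATE" | "IMPORT" | "DELETE";

interface ChangeRow {
  timestamp: number; // assumed: milliseconds since epoch
  document_name: string;
  operation: Operation;
  data: string; // assumed: JSON-encoded document contents
}

// Reduce a change history to the most recent row per document.
// Assumption: a document whose latest operation is DELETE is omitted.
function latestState(rows: ChangeRow[]): Map<string, ChangeRow> {
  const latest = new Map<string, ChangeRow>();
  for (const row of rows) {
    const seen = latest.get(row.document_name);
    if (seen === undefined || row.timestamp >= seen.timestamp) {
      latest.set(row.document_name, row);
    }
  }
  const current = new Map<string, ChangeRow>();
  latest.forEach((row, name) => {
    if (row.operation !== "DELETE") {
      current.set(name, row);
    }
  });
  return current;
}
```

The raw table keeps every row passed in here, while the view corresponds roughly to the returned map: one entry per surviving document, carrying its latest operation.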
If you create, update, delete, or import a document in the specified collection, this extension sends that update to BigQuery. You can then run queries on this mirrored dataset.
Note that this extension only listens for document changes in the collection, not for changes in any subcollection. You can, however, install additional instances of this extension to listen to a subcollection or to other collections in your database. Or, if you have the same subcollection across documents in a given collection, you can use {wildcard} notation to listen to all those subcollections (for example: chats/{chatid}/posts).
Enabling wildcard references adds a STRING column to the table. Its value is a JSON object that holds the values of any wildcards included in ${param:COLLECTION_PATH}. You can extract them using JSON_EXTRACT_SCALAR.
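As an illustration of what that JSON value contains, here is a small TypeScript sketch that matches a document path against a collection path with {wildcard} segments and collects the captured values. The function name extractWildcards and the sample paths are hypothetical, and this is not the extension's own code, only a model of the mapping it describes.

```typescript
// Match a document path against a collection path that may contain
// {wildcard} segments. Returns the captured values, or null when the
// document does not belong to the collection.
function extractWildcards(
  collectionPath: string,
  documentPath: string
): Record<string, string> | null {
  const template = collectionPath.split("/");
  const actual = documentPath.split("/");
  // A document path is its collection path plus one trailing document ID.
  if (actual.length !== template.length + 1) {
    return null;
  }
  const params: Record<string, string> = {};
  for (let i = 0; i < template.length; i++) {
    const match = /^\{(.+)\}$/.exec(template[i]);
    if (match) {
      params[match[1]] = actual[i]; // wildcard segment: record its value
    } else if (template[i] !== actual[i]) {
      return null; // literal segment must match exactly
    }
  }
  return params;
}

const params = extractWildcards("chats/{chatid}/posts", "chats/room1/posts/post42");
console.log(JSON.stringify(params)); // prints {"chatid":"room1"}
```

With a value like this stored as a string, a BigQuery expression along the lines of JSON_EXTRACT_SCALAR(column, '$.chatid') would return room1, where column stands for the wildcard column in your table.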
Partition settings cannot be updated on a pre-existing table. If these options are required, a new table must be created.
Clustering options do not require you to create or modify a table. When you add them, the table is updated automatically.
Additional setup
Before installing this extension, you’ll need to:
Transform function
Prior to sending the document change to BigQuery, you have an opportunity to transform the data with an HTTP function. The payload will contain the following:
{
  data: [{
    insertId: int;
    json: {
      timestamp: int;
      event_id: int;
      document_name: string;
      document_id: int;
      operation: ChangeType;
      data: string;
    },
  }]
}
The response should be identical in structure.
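The following TypeScript sketch shows one possible shape for the transform logic, kept as a pure function so it can be called from whatever HTTP handler you deploy. The interfaces mirror the payload listed above, with int mapped to number. ChangeType is treated as a plain string, the data field is assumed to hold a JSON-encoded document, and the email field being removed is purely hypothetical.

```typescript
// Assumed: ChangeType is a string such as "CREATE", "UPDATE", or "IMPORT".
type ChangeType = string;

interface Row {
  insertId: number;
  json: {
    timestamp: number;
    event_id: number;
    document_name: string;
    document_id: number;
    operation: ChangeType;
    data: string; // assumed: JSON-encoded document contents
  };
}

interface Payload {
  data: Row[];
}

// Hypothetical transform: strip an `email` field from every document
// while returning a payload with the same structure as the input.
function transformPayload(payload: Payload): Payload {
  return {
    data: payload.data.map((row) => {
      const doc = JSON.parse(row.json.data) as Record<string, unknown>;
      delete doc["email"];
      return { ...row, json: { ...row.json, data: JSON.stringify(doc) } };
    }),
  };
}
```

Only json.data is rewritten here. Every other field is copied through unchanged, which keeps the response identical in structure to the request.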
Backfill your BigQuery dataset
This extension only sends the content of documents that have been changed – it does not export your full dataset of existing documents into BigQuery. So, to backfill your BigQuery dataset with all the documents in your collection, you can run the import script provided by this extension.
Important: Run the import script over the entire collection after installing this extension. Otherwise, all writes to your database during the import might be lost.
Generate schema views
After your data is in BigQuery, you can run the schema-views script (provided by this extension) to create views that make it easier to query relevant data. You only need to provide a JSON schema file that describes your data structure, and the schema-views script will create the views.
Billing
To install an extension, your project must be on the Blaze (pay as you go) plan.
This extension uses other Firebase and Google Cloud Platform services, which have associated charges if you exceed the service’s no-cost tier:
- BigQuery (this extension writes to BigQuery with streaming inserts)
- Cloud Firestore
- Cloud Functions (Node.js 10+ runtime. See FAQs)