Data Infrastructure: Create ETL Pipeline for Friktion Data from Solana to GCP/Google BigQuery

This project will enable the batch transfer of data from the Solana blockchain to a data warehouse on Google Cloud Platform. In doing so, it will give the team (semi-)real-time data that is (1) easily queryable with SQL for self-service of basic questions and (2) usable for dashboard development and API creation (if desired).

Projected Scope
For Volts 1 and 2

  • Withdrawals per Epoch
  • Transaction Size
  • Unique Users
  • Deposits per Epoch
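The metrics above imply a rough fact-table shape for the warehouse. Below is a minimal sketch of a BigQuery JSON schema that could cover all four; the table and field names are illustrative placeholders, not Friktion's actual account layout (deposits/withdrawals per epoch and unique users would then be simple GROUP BY queries over this table).

```python
# Hypothetical schema for a per-transaction fact table. This is the JSON schema
# format BigQuery accepts for table creation; field names are assumptions.
volt_transactions_schema = [
    {"name": "tx_signature", "type": "STRING",    "mode": "REQUIRED"},  # Solana transaction signature
    {"name": "block_time",   "type": "TIMESTAMP", "mode": "REQUIRED"},
    {"name": "epoch",        "type": "INT64",     "mode": "REQUIRED"},  # Volt epoch, not Solana epoch
    {"name": "volt",         "type": "STRING",    "mode": "REQUIRED"},  # e.g. "volt01", "volt02"
    {"name": "user_address", "type": "STRING",    "mode": "REQUIRED"},  # for unique-user counts
    {"name": "action",       "type": "STRING",    "mode": "REQUIRED"},  # "deposit" | "withdrawal"
    {"name": "amount",       "type": "NUMERIC",   "mode": "REQUIRED"},  # transaction size in underlying units
]
```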

Projected Tasks

  • Review the existing script to understand the data structure and the existing functions for pulling the data
  • Define a rough schema for the data warehouse for existing and potential future data to be stored
  • Spin up a Cloud VM and clone the ETL script onto the instance to run at set intervals via a cron job. The script will write incremental records to the data warehouse each day (or at some other interval).
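The third task hinges on the script knowing where the previous run stopped. A common pattern is a checkpoint file on the VM that records the last loaded block time; a minimal stdlib-only sketch follows, with the checkpoint path and the surrounding fetch/write functions as assumptions rather than the existing script's real API.

```python
# Sketch of the cron-driven incremental-load bookkeeping. The checkpoint path
# is illustrative; the actual fetch/load functions would come from the
# existing script.
import json
from datetime import datetime, timezone
from pathlib import Path

CHECKPOINT = Path("/var/lib/friktion-etl/checkpoint.json")  # hypothetical path

def read_checkpoint(path: Path) -> datetime:
    """Return the last successfully loaded block time (a fixed floor if none)."""
    if path.exists():
        return datetime.fromisoformat(json.loads(path.read_text())["last_loaded"])
    return datetime(2021, 1, 1, tzinfo=timezone.utc)  # before any Volt activity

def write_checkpoint(path: Path, loaded_until: datetime) -> None:
    """Persist the new high-water mark after a successful warehouse write."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"last_loaded": loaded_until.isoformat()}))

# Crontab entry on the VM (illustrative): run the loader daily at 02:00 UTC
#   0 2 * * * /usr/bin/python3 /opt/friktion/etl.py >> /var/log/friktion-etl.log 2>&1
```

Each run would then query only records with block times in the half-open window (checkpoint, now], so reruns after a failure pick up exactly where the last success left off.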

Estimated Time
10-15 hours


This data will probably not be very large after a year (< 500 MB), so this solution might be a little overkill for this specific task, but building out the infrastructure for future, more data-intensive tasks would be very useful. I think we can use flatfiles for now until we start to get constrained, then migrate to a longer-term solution when the time comes.


Honestly, I might just do this “just because”. Even though the data volume is low, it will make sense to have all of Friktion’s data in one place.

That being said, this may be lower priority among all the data-related tasks. Will prioritize analysis- and entropy-related work first.


Agree with this - good to have robust infrastructure built out for when the time comes.