Using Power BI to analyze blockchain transactions on Ethereum

Blockchain is THE tech to watch in 2018 – you can’t escape it!  As more organizations look to leverage this technology, there will be an increased need to capture blockchain transaction data and integrate it with other data sources.  One of the tenants of blockchain technology is the immutable ledger – and access to this ledger is a key piece of the pie.

In this post, I’ll demonstrate how to ingest event activity from a DApp (or Decentralized Application) built on the Ethereum network into Power BI desktop and create visualizations using free, readily available software.  But first, some background:

What is Power BI?

A data visualization tool from Microsoft.  The desktop version is a free download for Windows – presently no Mac or Linux option, but they are working on it.  If you want to publish reports to a wider audience or schedule nightly data refresh, then it costs $10/month per user to upgrade to the Power BI Pro Service.  Power BI has both data transformation capabilities as well as report/dashboard creation capabilities.  We’ll need both to grab the JSON payload from the Etherscan API and visualize the data.  It is also worth noting that Power BI gets monthly updates from the product team – it’s very possible my screen shots will be out of date soon.

What is Ethereum?

A platform for building/hosting apps on the blockchain (such as smart contracts) that was developed by the Ethereum Foundation in 2014.  We won’t get into the “What is blockchain?” or “What is cryptocurrency?” as there are tons of articles and videos out there that do a much better job at explaining this than I do.  However, Ethereum seems to be the platform that has the most developer support and its currency, Ether, is a top-5 cryptocurrency.

What is Votethereum?

A DApp created by Kris Bennett as a proof-of-concept to re-envision how voting could work on the blockchain.  Each participant is allocated some number of votes (imagine voting for multiple candidates but splitting the pot 70/30 between your top two candidates) and each smart contract represents an election.  This DApp is deployed to the Ropsten test network, so we’ll be using the Ropsten Etherscan.io API to grab the transactions (in our case, votes by election for a particular user).

Now let’s get to it…

Step 1: Register for an Etherscan.io API key

Create an account on Etherscan.io and within the account, you’ll see the ability create up to 3 API keys.  Note that Etherscan limits GET/POST requests to 5/sec.

For this demo, we’re using the Ropsten network, but here is a reference to API documentation across the various networks:

  1. Ropsten: https://ropsten.etherscan.io/apis

  2. Rinkeby: https://rinkeby.etherscan.io/apis

  3. Etherscan: https://etherscan.io/apis

Step 2: Install Power BI and Configure a New Connection

Install Power BI Desktop from Microsoft.  Once the application is launched, choose Get Data –> JSON (the API returns a JSON payload).

Now, since my use case is viewing all voting activity for a specific user (account), I’m using the GET request to return all normal transactions by ACCOUNT.  In VoteEthereum, user “Alice” is at address:

0x16F3ef207C48b17E0fBbC59Eb2c217eB139eff88

And combining this with my API key and the ropsten network, my request will be something like this (where the startblock, endblock and sort parameters are optional):

http://ropsten.etherscan.io/api?module=account&action=txlist&address=0x16F3ef207C48b17E0fBbC59Eb2c217eB139eff88&startblock=0&endblock=99999999&sort=asc&apikey={MY-API-KEY}

Paste this into the “Open JSON” dialog and you’ll get what is called the the Query Editor in Power BI. This feature allows you to perform and track all transformations performed on the source data.  The initial load of your Etherscan data will look a little funny – we’ll need to drill in to get to the table of information.  Notice the “result = List”, where the List is a hyperlink to the child records.

Step 3: Convert JSON to a Table

Click on the “List” link referenced above which brings us to the correct level of grain we’re looking for – each record represents a user (account) transaction.  BTW, special thanks to Scott Murray for providing this Power BI JSON tutorial on MSSQLtips.com .  Now that we have our list of records, we can convert this list to a table, using the Convert Into Table function.

In this case, there are no delimiters, so go ahead and click OK to perform the conversion.  Initially, it won’t look like anything happened…but now notice the double arrows near the column header – click this to expand the other columns.

This will allow you to select which columns to include in your table – I just grabbed everything.  Notice the API provides good info on the Ethereum transaction including gas used, cost, blockNumber, etc. that could be useful depending on your needs.

Step 4: Convert Unix Timestamp to Something Better

Now that we can preview the data in a table format, you’ll see that the Timestamp used is Unix, which is the number of seconds since Epoch (January 1, 1970) in UTC.  This is a nice, clean format for an event log, not so much for an analytical data store.  So, before we apply all of the steps in our Query Editor, let’s create a column that converts Unix Timestamp to a standard MM/DD/YYYY HH:MM:SS format.  First, the Column1.timeStamp is a TEXT data type – we need this to be numeric.  Highlight the column, and choose the data type Whole Number.

Now we can add our custom column.  In the ribbon, choose the Add Column tab, then select Custom Column.

The syntax for this new column is as follows:

#datetime(1970, 1, 1, 0, 0, 0) + #duration(0, 0, 0, [Column1.Timestamp])

Keep in mind this will convert to UTC time, so if you want to change to a different time zone, place the offset in the 3rd parameter of duration.  I’m in Mountain Time, so the offset is (-7):

Now, we have the data where we want it – go ahead and Close and Apply the Query Editor.

Step 5: Visualize Your Data

The heavy lifting is done – now for the fun part.  Power BI has a growing library of visuals but often the simplest visuals are the most effective (line, bar charts, etc.).  Here’s a quick dashboard presenting votes by day and votes by contract.

In a production scenario, you’d probably want to first bring the data into a relational data store like Amazon RDS or Azure SQL DB.  Shameless plug – this is where Auptimal can help (now accepting BTC or ETH 🙂 ).  Here are some additional considerations that will help smooth out the data reporting component of your DApp:

  1. Add a date dimension to your analytical data store to enable slicing and dicing across days, weeks, months, quarters, etc.
  2. Keep a reference table handy for your Contracts as well as all Versions of those contracts
  3. Stagger API calls in your load procedure so that you don’t come anywhere near the 5 separate calls per second limit.  This way, you’re not taxing the network (and getting kicked off).