Always On

The Datorama blog on recent marketing trends,
product updates, and industry thought leadership.


Tips & Tricks: Python Retrieval to Easily Extend APIs, and Much More

Gora Sudindranath | 05.01.2018

This post is part of our Tips & Tricks series, highlighting some of Datorama’s most interesting or useful features. This series features guest authors from our Client Success team, who are hard at work every day helping Datorama customers with everything from data prep to visualization.

Datorama offers many ways to easily integrate data into the platform: an extensive set of pre-built API connectors, TotalConnect for going beyond API-based integration, and LiteConnect for instantly integrating non-marketing data sets “as-is”.

But what if you needed to integrate data from an API that did not have a pre-built connection? Or what if you needed to pull more or different fields from an API than what Datorama supports out of the box? Datorama’s new Python Retrieval feature is perfect for this.

Python Retrieval runs a Python script on a scheduled interval and stores the script’s output in Datorama’s Data Model. The script itself can connect to anything you can reach from Python, from REST APIs to databases and big data stores. This lets you extend the range of metrics you can pull in, using Python, a popular and user-friendly language.

As an example of how to do this, in this blog post I’ll show you how to use Python Retrieval to build a simple YouTube connector for integrating public data about non-owned channels on YouTube. To make the example even more concrete, let’s say I’m working on behalf of a fictitious organisation, the “International Mech Racing League” (or IMRL). It’s the ultimate nerd pursuit. 😉

How to Use Python Retrieval

The IMRL would like to analyze the YouTube profiles of a number of its competitors in spectator sport: WWE, F1, PGA, etc. The following steps show how to add this Data Stream.

1) Add a new Python Data Stream

Click “Add New” Data Stream, then select the Python icon.

2) Click on the code editor to input the Python script.

Clicking the code area opens a full-screen Python code editor.

3) Enter your code

I suggest writing your code in a text editor of your choice and then copying and pasting it into the editor. Here is the code for my YouTube Public API connector. A quick tour of the code:

The first line imports the Datorama Python module, which allows your script to save data in the platform. The second line imports the requests library, which allows the script to make HTTP requests to the YouTube REST API. NB: requests and other commonly used libraries are pre-loaded in Datorama, but if you need a library that isn’t loaded, raise a ticket and our fabulous support team will make sure it gets installed.

Here is where the magic happens. We loop through each competitor’s YouTube account (e.g. WWE, Formula 1, etc.) and call the YouTube API endpoint to get the statistics for that competitor’s channel (the actual call is the “requests.get…” line, third from the top). Once we have the statistics for each competitor’s channel, we create a CSV string with the following format:

Date, id (competitor name), viewCount, commentCount, subscriberCount, videoCount

NB: choose CSV column headers that match the names of the attributes in the data model you want to map them into. That way, Datorama automatically maps all of your data into the data model.
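Here is a minimal sketch of the loop, API call and CSV construction described above; it is an illustration, not the author's exact script. It assumes the public YouTube Data API v3 `channels` endpoint, queried with `part=statistics`, a channel `id` and your own API `key`. The channel IDs are placeholders, and the names `fetch_statistics`, `build_csv` and `http_get` are hypothetical. The `http_get` parameter exists only so the loop can be exercised without network access; by default it uses `requests.get`.

```python
import csv
import io

# Public YouTube Data API v3 endpoint for channel resources.
CHANNELS_URL = "https://www.googleapis.com/youtube/v3/channels"

# Hypothetical competitor list: name written to the "id" column -> channel ID.
# The IDs below are placeholders, not real channel IDs.
COMPETITORS = {
    "WWE": "CHANNEL_ID_FOR_WWE",
    "F1": "CHANNEL_ID_FOR_F1",
    "PGA": "CHANNEL_ID_FOR_PGA",
}

# Column headers named to match the data model attributes, so they auto-map.
HEADER = ["Date", "id", "viewCount", "commentCount", "subscriberCount", "videoCount"]


def fetch_statistics(channel_id, api_key, http_get=None):
    """Return one channel's 'statistics' dict, or None if YouTube sent no items."""
    if http_get is None:
        import requests  # pre-loaded on Datorama, per the post
        http_get = requests.get
    response = http_get(
        CHANNELS_URL,
        params={"part": "statistics", "id": channel_id, "key": api_key},
        timeout=30,
    )
    response.raise_for_status()
    items = response.json().get("items", [])
    return items[0]["statistics"] if items else None


def build_csv(competitors, api_key, day, http_get=None):
    """Build the CSV text for one run; returns (csv_text, data_row_count)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    count = 0
    for name, channel_id in competitors.items():
        stats = fetch_statistics(channel_id, api_key, http_get)
        if stats is None:
            continue  # no data for this channel; skip it
        writer.writerow([
            day,  # ISO date string, e.g. "2018-05-01"
            name,
            stats.get("viewCount", ""),
            stats.get("commentCount", ""),  # may be absent; left blank if so
            stats.get("subscriberCount", ""),
            stats.get("videoCount", ""),
        ])
        count += 1
    return buffer.getvalue(), count
```

On the platform you might call something like `csv_text, rows = build_csv(COMPETITORS, API_KEY, datetime.date.today().isoformat())`, where `API_KEY` is your own key. Writing through the `csv` module rather than joining strings by hand means a competitor name containing a comma is quoted correctly.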

Finally, we run a sanity check to make sure YouTube actually returned some data, and then save the CSV string to Datorama with the “Datorama.save_csv(csv)” line.
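The sanity check and save step might look like the following minimal sketch. `save_if_data` is a hypothetical helper, and its `save` argument stands in for the platform call named above (`Datorama.save_csv`); passing it in lets you try the check locally.

```python
def has_data_rows(csv_text):
    """True if the CSV holds at least one non-empty line after the header."""
    lines = [line for line in csv_text.splitlines() if line.strip()]
    return len(lines) > 1


def save_if_data(csv_text, save):
    """Call save(csv_text) only when YouTube actually returned some data."""
    if not has_data_rows(csv_text):
        return False  # sanity check failed: header only, nothing to save
    save(csv_text)
    return True
```

Inside Datorama this would be called as `save_if_data(csv_text, Datorama.save_csv)`, with `Datorama` imported exactly as in the first line of your script.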

4) Perform a sanity check

4a) Validate the script

Datorama automatically validates the script to make sure it will actually run. This mainly checks for syntax errors, dependencies on third-party libraries, and the presence of the all-important datorama.save line of code.
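Since the script is written in a separate editor first, you can catch the most common validation failures before pasting it in. The sketch below is a rough local approximation of the three checks listed above, not Datorama's actual validator. It uses Python's `ast` module to find syntax errors, `importlib.util.find_spec` to see whether each imported module is installed, and a scan for a `.save_csv(...)` call. The function name `precheck` and its `ignore` parameter are hypothetical.

```python
import ast
import importlib.util


def precheck(source, save_name="save_csv", ignore=()):
    """Return a list of likely validation problems in a Retrieval script.

    A rough local approximation of the checks described above; it is
    not Datorama's own validator.
    """
    # 1) Syntax: ast.parse raises SyntaxError on invalid code.
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]

    problems = []
    # 2) Dependencies: every absolute import should resolve locally.
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules = [node.module.split(".")[0]]
        else:
            continue
        for module in modules:
            if module not in ignore and importlib.util.find_spec(module) is None:
                problems.append(f"module not installed locally: {module}")

    # 3) Save call: look for something like Datorama.save_csv(...).
    has_save = any(
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == save_name
        for node in ast.walk(tree)
    )
    if not has_save:
        problems.append(f"no call to .{save_name}(...) found")
    return problems
```

Datorama's own module will not be installed on your machine, so pass its import name through `ignore`, for example `precheck(source, ignore=("datorama",))` if that is the name your script imports.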

4b) Configure the frequency for running the script

Running the script daily is usually fine, but if you want it to run more or less frequently, you can configure this just below the code editor.

4c) Configure standard optional parameters like Parent Data Stream, Custom Attributes, etc.

4d) Click “Next” to edit mappings

If you followed my advice above about aligning CSV headers with the data model, this step should be dead easy and all of your columns should be auto-mapped.

4e) Finally, click “Save” to start processing the data stream

You have now integrated a YouTube data source tracking competitor data. Pat yourself on the back!

Your new Python Retrieval Data Stream will run as often as you schedule it. A common extension to what we have done here is a script that needs to “remember” the last row of data it pulled from an API so that it retrieves only new rows in subsequent runs. This is easily achieved by calling the Datorama Query API from within your Python script to return the max date of the rows already stored.
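Here is a minimal sketch of that incremental pattern, assuming your source API returns dated rows (the YouTube statistics above are point-in-time snapshots, so this applies to other APIs). `last_stored` is the max date you would obtain from the Datorama Query API; that call is not shown here. `next_start_date` and `rows_newer_than` are hypothetical helpers.

```python
from datetime import date, timedelta


def next_start_date(last_stored, default_start):
    """First date to request from the source API on this run.

    `last_stored` is the max date already in Datorama (e.g. looked up via
    the Query API, not shown here); None means nothing is stored yet.
    """
    if last_stored is None:
        return default_start
    return last_stored + timedelta(days=1)


def rows_newer_than(rows, last_stored):
    """Drop rows whose ISO 'Date' is on or before the last stored date."""
    if last_stored is None:
        return list(rows)
    return [row for row in rows if date.fromisoformat(row["Date"]) > last_stored]
```

Using both helpers is deliberate: `next_start_date` keeps the API request small, while `rows_newer_than` guards against APIs that ignore the start date and return overlapping rows anyway.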
