Tips & Tricks: Python Retrieval to Easily Extend APIs, and Much More
This post is part of our Tips & Tricks series, highlighting some of Datorama’s most interesting or useful features. This series features guest authors from our Client Success team, who are hard at work every day helping Datorama customers with everything from data prep to visualization.
Datorama offers many ways to easily integrate data into the platform: an extensive set of pre-built APIs, TotalConnect for going beyond API-based integration, and LiteConnect to instantly integrate non-marketing data sets “as-is”.
But what if you needed to integrate data from an API that doesn’t have a pre-built connection? Or what if you needed more fields, or different ones, than Datorama pulls from an API out of the box? Datorama’s new Python Retrieval feature is perfect for both cases.
Python Retrieval runs a Python script on a scheduled interval and stores the script’s output in Datorama’s Data Model. The script itself can connect to anything you can reach with Python, from REST APIs to databases and big data stores. This lets you expand the range of metrics you can pull in using Python, a popular and user-friendly language.
As an example of how to do this, in this blog post I’ll show you how to use Python Retrieval to build a simple YouTube connector for integrating public data about non-owned channels on YouTube. To make the example even more concrete, let’s say I’m working on behalf of a fictitious organisation, the “International Mech Racing League” (or IMRL). It’s the ultimate nerd pursuit. 😉
How to Use Python Retrieval
The IMRL would like to analyze the YouTube presence of several of their competitors in spectator sport: WWE, F1, PGA, etc. The following steps show how to add this Data Stream.
1) Add a new Python Data Stream
Click “Add New” Data Stream, then select the Python icon.
2) Click the code editor to enter the Python script
Clicking the code area opens a full-screen Python code editor.
3) Enter your code
I suggest writing your code in a text editor of your choice and then copying and pasting it into the code editor. Here is the code for my YouTube Public API connector. A quick tour of the code:
The first line imports the Datorama Python module, which allows your script to save data in the platform. The second line imports the requests library, which lets the script make HTTP requests to the YouTube REST API. NB: requests and other commonly used libraries are pre-loaded in Datorama, but if you need a library that isn’t available, raise a ticket and our fabulous support team will make sure it gets installed.
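Because the post recommends drafting the script in your own editor, it helps if the script can also run outside Datorama. Below is a minimal sketch of the script's opening lines, assuming the module is imported as `Datorama` and exposes `save_csv` (the name used in the save line later in this walkthrough). `_LocalDatorama` is a hypothetical stand-in written for this example, not part of Datorama; the `import requests` line stays as in the original script.

```python
import sys

try:
    # Inside Datorama the platform provides this module (name as used in this post).
    import Datorama
except ImportError:
    class _LocalDatorama:
        """Hypothetical local stand-in that mimics only save_csv.

        It keeps each CSV string in memory and echoes it to stdout
        instead of storing it in a data model.
        """

        def __init__(self):
            self.saved = []

        def save_csv(self, csv_text):
            self.saved.append(csv_text)
            sys.stdout.write(csv_text)

    Datorama = _LocalDatorama()
```

Inside Datorama the real module is imported and the fallback never runs; on your own machine, `Datorama.save_csv(...)` simply echoes the CSV so you can eyeball it before pasting the script in.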
Here is where the magic happens. We loop through each competitor’s YouTube account (e.g. WWE, Formula 1, etc.) and call the YouTube API endpoint to get the statistics for that competitor’s channel (the actual call is the “requests.get…” line, third from the top). Once we have the statistics for each competitor’s channel, we create a CSV string with the following format:
Date, id (competitor name), viewCount, commentCount, subscriberCount, videoCount
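The loop and CSV construction described above can be sketched as follows. This is a minimal sketch that assumes the public YouTube Data API v3 `channels` endpoint with `part=statistics`, looked up by channel ID with an API key; the `COMPETITORS` mapping, its channel IDs, and all function names are hypothetical placeholders, and the original script may look channels up differently.

```python
import csv
import datetime
import io

# Column order from the post: Date, id (competitor name), viewCount,
# commentCount, subscriberCount, videoCount.
HEADER = ["Date", "id", "viewCount", "commentCount", "subscriberCount", "videoCount"]

# Hypothetical competitor list: display name -> YouTube channel ID (placeholders).
COMPETITORS = {
    "WWE": "CHANNEL_ID_FOR_WWE",
    "Formula 1": "CHANNEL_ID_FOR_F1",
}

API_URL = "https://www.googleapis.com/youtube/v3/channels"


def fetch_channel_statistics(channel_id, api_key):
    """Call the YouTube Data API v3 channels endpoint and return the parsed JSON."""
    import requests  # imported here so the parsing helpers below work without it

    response = requests.get(
        API_URL,
        params={"part": "statistics", "id": channel_id, "key": api_key},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


def statistics_to_row(date, name, payload):
    """Turn one channels response into a CSV row, or None if no channel came back."""
    items = payload.get("items") or []
    if not items:
        return None
    stats = items[0].get("statistics", {})
    return [
        date.isoformat(),
        name,
        stats.get("viewCount", ""),
        stats.get("commentCount", ""),  # may be absent from current API responses
        stats.get("subscriberCount", ""),
        stats.get("videoCount", ""),
    ]


def rows_to_csv(rows):
    """Serialise the header plus data rows into a single CSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buffer.getvalue()


def build_csv(api_key, competitors=COMPETITORS, today=None):
    """Fetch statistics for every competitor and return today's CSV string."""
    today = today or datetime.date.today()
    rows = []
    for name, channel_id in competitors.items():
        row = statistics_to_row(today, name, fetch_channel_statistics(channel_id, api_key))
        if row is not None:
            rows.append(row)
    return rows_to_csv(rows)
```

Keeping the HTTP call (`fetch_channel_statistics`) separate from the parsing (`statistics_to_row`) means the parsing can be checked against a saved response without network access. The `csv` module also handles quoting, which matters if a competitor name ever contains a comma.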
NB: give your CSV columns the same names as the data model attributes you want to map them to; that way Datorama maps all of your data into the data model automatically.
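To catch header typos before they reach the mapping step, you could compare the header row against your data model's attribute names. A minimal sketch: `MODEL_ATTRIBUTES` is a hypothetical list you would copy from your own data model, and the comparison is exact and case-sensitive because that is the safest reading of “the same as”.

```python
def unmapped_columns(header, model_attributes):
    """Return the CSV headers that have no identically named data model attribute."""
    known = set(model_attributes)
    return [column for column in header if column not in known]


# Hypothetical attribute names; copy the real ones from your data model.
MODEL_ATTRIBUTES = ["Date", "id", "viewCount", "commentCount", "subscriberCount", "videoCount"]
```

An empty result means every column should find a matching attribute.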
Finally, we run a sanity check to make sure YouTube actually returned some data, and then save the CSV string to Datorama with the “Datorama.save_csv(csv)” line.
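One way to write that sanity check, assuming the CSV string starts with a header row, is to call the save function only when at least one data row follows the header. `save_if_data` is a hypothetical helper; passing the save function in as an argument keeps it testable outside Datorama.

```python
def save_if_data(csv_text, save_csv):
    """Pass csv_text to save_csv only if it has a header plus at least one data row.

    Returns True when the data was handed over, False when the check failed.
    """
    non_empty_lines = [line for line in csv_text.splitlines() if line.strip()]
    if len(non_empty_lines) < 2:
        return False
    save_csv(csv_text)
    return True
```

Inside the Datorama script you would call it as `save_if_data(csv_text, Datorama.save_csv)`.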
4) Validate, configure, and save
4a) Validate the script
Datorama automatically validates the script to ensure it will actually run. This mainly checks for syntax errors, dependencies on third-party libraries, and the presence of the all-important Datorama save line (Datorama.save_csv in this script).
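You can approximate part of this validation locally before pasting your script in. Here is a minimal sketch using Python's standard `ast` module: it reports syntax errors and flags scripts with no `Datorama.save...` call. It is not Datorama's validator, and it does not check third-party dependencies.

```python
import ast


def _is_datorama_save(node):
    """True for calls such as Datorama.save_csv(...) or datorama.save(...)."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and isinstance(node.func.value, ast.Name)
        and node.func.value.id.lower() == "datorama"
        and node.func.attr.startswith("save")
    )


def precheck_script(source):
    """Return a list of problems; an empty list means both local checks passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]
    if not any(_is_datorama_save(node) for node in ast.walk(tree)):
        return ["no Datorama save call found"]
    return []
```

For example, `precheck_script(open("youtube_connector.py").read())` (the file name is hypothetical) returns an empty list when the script parses and contains a save call.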
4b) Configure the frequency for running the script
Daily is fine most of the time, but if you want to run the script more or less often, you can configure this just below the code editor.
4c) Configure standard optional parameters like Parent Data Stream, Custom Attributes, etc.
4d) Click “Next” to edit mappings
If you followed my advice above about aligning CSV headers with the data model, this step should be dead easy and all of your columns should be auto-mapped.
4e) Finally, click “Save” to start processing the data stream
You have now integrated a YouTube data source tracking competitor data. Pat yourself on the back!
Your new Python Retrieval Data Stream will run as often as you schedule it. A common extension to what we have done here is a script that needs to “remember” the last row of data it pulled from an API, for example so that subsequent runs retrieve only new rows. This is easily achieved by calling the Datorama Query API from within your Python script to return the max date of the rows already stored.
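The date arithmetic for such an incremental run might look like the minimal sketch below. It assumes you have already fetched the latest stored date; `query_max_date` in the usage comment is a hypothetical wrapper, since this post does not cover the Query API's request format. Requesting data only up to yesterday is a design choice made here so that partial days are not stored.

```python
import datetime


def next_date_range(last_stored, today=None):
    """Return the (start, end) dates still to fetch, or None when already up to date.

    last_stored: max date already stored in Datorama, or None on the first run.
    end is yesterday, so only complete days are requested.
    """
    today = today or datetime.date.today()
    end = today - datetime.timedelta(days=1)
    if last_stored is None:
        start = end  # first run: fetch just the most recent complete day
    else:
        start = last_stored + datetime.timedelta(days=1)
    if start > end:
        return None
    return start, end


# Example usage (query_max_date is hypothetical):
# date_range = next_date_range(query_max_date())
# if date_range is not None:
#     start, end = date_range
#     # fetch rows for start..end from the API, then save them
```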