This post is part of our Tips & Tricks series, highlighting some of Datorama’s most interesting or useful features. This series features guest authors from our Client Success team, who are hard at work every day helping Datorama customers with everything from data prep to visualization.
This post is by Caroline Kangpan, Client Success Sr. Manager.
As a technical client success lead, one of the most common questions I get from my clients is: ‘What is the best way to harmonize my data?’
First, what is harmonization?
Data harmonization is the concept of bringing together your data from different data sources with potentially varying file formats, naming conventions, and columns, and transforming it into one cohesive data set. It also happens to be one of the things Datorama helps marketers handle most easily. If “centralizing” your marketing data is putting it into one place, “unifying” your marketing data is connecting it all together into a seamless whole for analysis. When data gets unified, it gets harmonized. It’s here where data classification occurs– giving you the power to get all of your data into an “apples to apples” format and to supplement your analysis with dimensions that don’t exist in your source data.
Note: I’ll use classification in this blog post as our guiding term, but data normalization and data cleansing are often used as well.
What are some examples of classifications in harmonization?
Before we dive in, understand that classifications generally fall into one of two overarching buckets: business and marketing and are typically the driving force enabling you to slice and dice your data to uncover insights. They will often materialize as filters, drill downs, or segmentations in your dashboard visuals and reports.
Here are some examples of common business and marketing classifications:
- Business Classifications
- Business Unit
- Product Group
- Product SKU
- Marketing Classifications
- Creative (or Content)
- Audience/Customer Segment
Now let’s take a look at some of the Datorama feature sets for classification in the context of some questions you might have about your data.
Example 1: The Dimensions Explorer
Question: What’s the best way to simply “write in” a small number of classification changes as needed?
The Dimensions Explorer is designed for these simple on-the-fly changes, allowing you to drill down into your data model and see every key and their corresponding attributes that are in the system. For instance, you can go to your campaign entity and update a campaign name manually (just be sure to hit “save”).
Example 2: Transformation Groups
Question: Our conversion events or action tags are more granular than we require for our Executive dashboard. Can we create a more high-level classification in Datorama from our source data?
Yes. Transformation Groups offer the ideal approach when you want to aggregate various dimension values into a higher order classification.
Using the example in the question, let’s say you’re an insurance company measuring the performance of your campaigns. The action tags you have are great for a deep-dive analysis but they cover every stage in the funnel: from landing on the homepage to starting a request-for-quote summary, to submitting said request, to becoming a customer. Ultimately for your CMO dashboard, you wish to show a topline summary of the action events that convey your campaign’s overall effectiveness: email sign ups, app downloads, landing page visits, request-a-quotes, and purchases. With Groups, you can easily create new classifications for the sign ups, downloads, purchases, etc. derived from the more granular action tags by simply highlighting the set of tags that belong to a group, naming the selected set, saving, and repeating until all tags are accounted for. As new action tags run, you can simply add them to an existing group by clicking on the tag and attributing it to the pre-defined category. The end result is analysis at the action tag and action tag group level for this example.
Example 3: Transformation Rules
Question: Our campaign data sources contain our “Summer 2017” campaign data in different naming conventions across various channels and platforms, but they all end in “SS17.” How do we align all of these into one “campaign type”?
Transformation Rules support use of one or more conditional rules that govern what to do if the dimensional value “contains,” “equals,” “starts with,” “ends with”, or matches a certain nomenclature pattern – i.e. a pre-defined taxonomy of say 10 positions in the campaign name each separated by underscores. We can simply create a rule that says for every campaign name in our source data that ends with “SS17,” set the “campaign type” to “Summer 2017.” If you have more than one pattern or logic-based condition to manage, you can include additional criteria that can either have relationships of AND or OR between them – like, if campaign ends with “SS17” OR campaign contains “Summer 17” somewhere else within the campaign. Finally, if in our scenario, “SS17” always appears in the second position of the campaign name, these rules can also be instructed to look into that position, extract the information, and relabel it as “Summer 2017” in a brand new column without modifying the existing structure of the campaign name. Transformation rules like Groups are simple to use, but with a little more flexibility.
Example 4: Calculated Dimensions
Question: Our campaign naming conventions are very different across our data sources. Is there a more flexible way for us to capture all of the different rules in one editable statement?
Like Transformation Groups and Transformation Rules, Calculated dimensions let you classify data– but with even more flexibility. They give you a formula expression box that supports a variety of common functions – IF, EXTRACT, REPLACE, GROUPCONCAT, SELECT – and operators – contains, equals, etc. so that you can create multiple rules at once. For instance, let’s say for your “Summer 2017” campaigns, your Facebook data uses “SS17” somewhere in the campaign name, your Twitter data stores the information in the second position of the nomenclature which is delimited by hyphens, and your YouTube data, like Twitter, also stores the information in the second position of the nomenclature but is separated by underscores instead of hyphens. In one piece of the expression, you can orchestrate a multi-step process, first using the REPLACE formula to normalize the Twitter and YouTube delimiters to always exist as underscores and then proceeding to EXTRACT position 2. In the second piece of the expression you can check if Facebook campaigns ‘contain’ “SS17” and if it does, set the calculated dimension to return “Summer 2017”. Best of all, calculated dimensions update every time you look at your data, so any edits to your raw data will automatically flow into the calculated dimension.
Additional Classification Approaches
In addition to the classification methods outlined above, there are two more ways I’ll share today that are slightly more advanced.
Example 5: Dimension-only Generic Data Stream Types and VLOOKUP
Question: We have a set table of classifications. Can we simply reference these when data loads to pull in the right classification values?
Yes. In addition to all of the standard data stream types in Datorama (social, web analytics, ads, email, etc.), you can create a generic data stream to house a traditional “lookup” table. You can then have other data streams explicitly reference this lookup table using Datorama’s VLOOKUP function. This method is handy for changing classifications on data load within a specific data stream. As an example, your dimension only data stream could have campaign names and their corresponding normalized, client-friendly campaign names. Your ad server data can then use the live campaign names as a reference point, find them in the dimension only lookup table and instead return the client-friendly version of the name.
Example 6: Dimension-only data streams using parent/child relationships
Question: We like the dimensions we get through our API connection, but we want to pull in more fields that are only available through an exported report. Can we easily blend the API and report data?
Yes. This approach lets you conveniently expand upon an API’s offering. Using a “parent/child relationship”, you can append the report’s additional dimension columns to the API’s dimensions based on a common key. This is essentially using Datorama’s native parent/child feature to perform the VLOOKUP for you, so the new column(s) of data get added to your data automatically. The result is a “custom” hybrid data source that blends your API data with additional classifications in your exported report data via TotalConnect.
Pro Tip: In cases where, placements/campaigns have nomenclature issues, a supplementary child stream can be helpful to make changes. You can simply run reports filtered to make edits to these entities, and then re-upload via the child stream.
Thanks for reading. If you want to learn more about classifications, check out the Knowledge Base in Datorama on Classifications, Groups, Transformations, and Calculated Dimensions — or send a note to your Client Success Manager or support with any specific questions you have.
Make sure you stay up-to-date with all the latest AdTech, MarTech, Data and Datorama-related news by following us on Twitter, Facebook and LinkedIn.