A data source is the starting point for all you do in Harmoni. A source is a collection of data that is either in an imported file or a connected data store.
A project contains one or more data sources.
Harmoni can use sources in any language, including languages requiring multi-byte storage (e.g., traditional Chinese).
Harmoni automatically maps source variables to Harmoni types when a data source contains an inherent dictionary, i.e. metadata that guides interpretation of the data. Variables in sources without an inherent dictionary need to be defined.
1. Source Types
a. Imported into Harmoni (Upload)
Imported or uploaded files capture data for a moment in time.
| Source Type | Description | Source Dictionary |
| --- | --- | --- |
| SPSS data | | Contains an inherent source dictionary |
| Excel files | | Needs to be defined |
| Comma-delimited | | Needs to be defined |
| Tab-delimited | | Needs to be defined |
| Dimensions data (currently available as a service) | | Contains an inherent source dictionary |
b. Direct Connections to Harmoni (Connect)
Direct connections allow updated data to flow into Harmoni in real time.
- Direct connections are achieved using APIs (Application Programming Interfaces).
- If an API is available in a data collection system, we can potentially develop a direct connection to Harmoni.
Harmoni currently supports direct connections to:
- Tables and Views in SQL Server
- Projects in Decipher
- Projects in Voxco
- Projects in Qualtrics
- Projects in SurveyMonkey
2. Source Structures
Harmoni supports both record level/respondent level data and summary or aggregate level data.
a. Record Level/Respondent Level Data
- Each record represents a respondent, a log entry, a sales transaction, etc.
- Analyzing the count of records is relevant - i.e., How many respondents?
b. Summary Level/Aggregate Level Data
- Summed and/or categorized data that can answer research questions about populations or groups of organizations.
- The data has been compiled from record-level data.
- Analyzing the count of records is irrelevant.
- The analysis only has meaning when a measure is applied, e.g., scores, GRPs, Spend, etc.
Each record in a source corresponds to a count in Harmoni.
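To make the difference between the two structures concrete, here is a minimal sketch in plain Python (not Harmoni code). The `transactions` list, the field names `brand` and `spend`, and the helper `aggregate_by` are all hypothetical. It compiles record-level rows into summary rows and shows why the record count stops being meaningful once the data is aggregated.

```python
from collections import defaultdict

# Hypothetical record-level source: one row per sales transaction.
transactions = [
    {"brand": "Brand 1", "spend": 120.0},
    {"brand": "Brand 1", "spend": 80.0},
    {"brand": "Brand 2", "spend": 50.0},
    {"brand": "Brand 2", "spend": 70.0},
    {"brand": "Brand 2", "spend": 30.0},
]


def aggregate_by(records, key, measure):
    """Compile record-level rows into one summary row per key value."""
    totals = defaultdict(float)
    for row in records:
        totals[row[key]] += row[measure]
    return [{key: k, measure: v} for k, v in sorted(totals.items())]


summary = aggregate_by(transactions, "brand", "spend")

# Record level: the row count answers a real question (how many transactions?).
record_count = len(transactions)

# Aggregate level: the row count only says how many summary rows exist,
# so the measure (spend) is what carries the information.
summary_count = len(summary)

print(record_count, summary_count, summary)
```

In the aggregate version, a count of 2 says nothing about sales; only the summed `spend` per brand is worth reporting.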
The process of importing aggregate data into Harmoni is the same as for respondent-level data. If the respondent-level data and the aggregate data are in the same project, you need to ensure that all analyses use appropriate bases, e.g. ensure the base for the survey data doesn’t include the aggregate record counts.
Dashboards can include analyses from different projects, so if the data is feeding a separate analysis on the same page, it doesn’t have to be in the same project. Please note that if you want to apply page level filters, they will need to be available in all analyses on the page, so you would need to create ‘dummy’ filters (i.e. have the same variable and element labels, but include all records in them) to apply the filters to the aggregate data.
You need a field in the data for each value you want to report, and each cut you want to report it by.
The data drives the possibilities, so if you want to have the filters change the data, you need to have the results for each of the filters within the aggregate source.
For example, if you are reporting the Share of Voice for five brands, the source data could look as simple as:
In Harmoni, Brand would be a standard categorical axis, and you want the Value as a Measure. You could then drag Brand into the down and Value into the measure drop zone. The Brand 3 row would then show 900 in the counts (123), and when you select ∑%, you’d see the Share of Voice, i.e., 45% (900/2000), where 2000 is the sum of all the values. Learn more about calculation types.
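The arithmetic behind the ∑% figure can be sketched in plain Python (this is not Harmoni code). Only the Brand 3 value of 900 and the total of 2000 come from the example; the other four brand values are invented so that the total matches, and the helper name `share_of_total` is hypothetical.

```python
# Hypothetical aggregate source: one row per brand.
# Only Brand 3 = 900 and the overall total of 2000 are taken from the article;
# the remaining values are made up for illustration.
rows = [
    {"Brand": "Brand 1", "Value": 400},
    {"Brand": "Brand 2", "Value": 300},
    {"Brand": "Brand 3", "Value": 900},
    {"Brand": "Brand 4", "Value": 250},
    {"Brand": "Brand 5", "Value": 150},
]


def share_of_total(rows, category, measure):
    """Return each category's measure as a percentage of the column total."""
    total = sum(r[measure] for r in rows)
    return {r[category]: 100 * r[measure] / total for r in rows}


total_value = sum(r["Value"] for r in rows)
shares = share_of_total(rows, "Brand", "Value")

print(total_value)               # 2000
print(round(shares["Brand 3"]))  # 45
```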
If you also need to slice the result by another dimension, it is just a case of adding that to the data, e.g., Month.
In Harmoni, you would now have two standard axes and the Value as a Measure. You could then drag Brand into the down, Month into the across, and Value into the measure drop zone. The Brand 3 row would then show:
- Total Column
  - ∑ = 1400
  - ∑% = 38% (1400/3700)
- Jan 2021 Column
  - ∑ = 900
  - ∑% = 45% (900/2000)
- Feb 2021 Column
  - ∑ = 500
  - ∑% = 29% (500/1700)
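The Brand by Month figures can be reproduced with a short plain-Python sketch (not Harmoni code). The Brand 3 values (900 and 500) and the column totals (2000, 1700, 3700) come from the example; every other value, and the helper names `sigma_table` and `sigma_pct`, are assumptions chosen so the totals agree.

```python
from collections import defaultdict

# Hypothetical aggregate source: one row per brand per month.
# Brand 3 values and the column totals match the article; other values are invented.
rows = [
    ("Brand 1", "Jan 2021", 400), ("Brand 1", "Feb 2021", 350),
    ("Brand 2", "Jan 2021", 300), ("Brand 2", "Feb 2021", 300),
    ("Brand 3", "Jan 2021", 900), ("Brand 3", "Feb 2021", 500),
    ("Brand 4", "Jan 2021", 250), ("Brand 4", "Feb 2021", 300),
    ("Brand 5", "Jan 2021", 150), ("Brand 5", "Feb 2021", 250),
]


def sigma_table(rows):
    """Sum the measure per (brand, column) cell and per column, adding a Total column."""
    cell = defaultdict(int)
    col_total = defaultdict(int)
    for brand, month, value in rows:
        for column in (month, "Total"):
            cell[(brand, column)] += value
            col_total[column] += value
    return cell, col_total


def sigma_pct(cell, col_total, brand, column):
    """Column percentage of the summed measure for one brand."""
    return 100 * cell[(brand, column)] / col_total[column]


cell, col_total = sigma_table(rows)
for column in ("Total", "Jan 2021", "Feb 2021"):
    pct = round(sigma_pct(cell, col_total, "Brand 3", column))
    print(column, cell[("Brand 3", column)], f"{pct}%")
```

Note that the aggregate source needs one row for every Brand and Month combination you want to report, which is the point made above about the data driving the possibilities.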
Keep in mind that in an analysis of aggregate data, the counts (and therefore the percentages) are counts of records, which are typically meaningless. Only the calculation types that use the values are useful, i.e. ∑, ∑%, AVG, etc. Learn more about calculation types.
3. Data Sources - Project Size and Performance
In Harmoni, the size of the project has no technical limitations. However, the number of records and the number of sources influence the time required for analysis.
We recommend taking the following considerations into account.
Project
The number of variables in the dictionary, and particularly the number of user-created variables, is the biggest influencer on response time.
We recommend keeping the dictionary to fewer than 500,000 items. The following are considered to be items in the dictionary:
- Each continuous variable = 1 item, e.g. Number of Shoes owned
- Each category in categorical variables = 1 item, e.g. Gender.Male
- Each cell in a yes/no matrix variable = 1 item, e.g. Brand.Attribute
- Each scale-point in a scale matrix variable = 1 item, e.g. Brand.Attribute.ScalePt
- Each verbatim = 1 item
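As a rough way to estimate dictionary size before importing, the counting rules above can be sketched in plain Python. This is not a Harmoni API: the variable descriptions, the `kind` labels, and the function `count_items` are hypothetical, and the sketch assumes that "each verbatim" means one verbatim variable.

```python
RECOMMENDED_MAX_ITEMS = 500_000  # recommendation stated in this article


def count_items(variable):
    """Estimate the number of dictionary items for one variable description."""
    kind = variable["kind"]
    if kind in ("continuous", "verbatim"):
        return 1
    if kind == "categorical":
        return len(variable["categories"])
    if kind == "yes_no_matrix":
        # One item per cell, e.g. Brand.Attribute
        return len(variable["rows"]) * len(variable["columns"])
    if kind == "scale_matrix":
        # One item per scale point per cell, e.g. Brand.Attribute.ScalePt
        cells = len(variable["rows"]) * len(variable["columns"])
        return cells * len(variable["scale_points"])
    raise ValueError(f"Unknown variable kind: {kind!r}")


# Hypothetical dictionary used only to illustrate the arithmetic.
brands = [f"Brand {i}" for i in range(1, 6)]            # 5 brands
attributes = [f"Attribute {i}" for i in range(1, 11)]   # 10 attributes
dictionary = [
    {"kind": "continuous", "name": "Number of Shoes owned"},
    {"kind": "categorical", "name": "Gender", "categories": ["Male", "Female"]},
    {"kind": "yes_no_matrix", "rows": brands, "columns": attributes},
    {"kind": "scale_matrix", "rows": brands, "columns": attributes,
     "scale_points": list(range(1, 8))},                # 7-point scale
    {"kind": "verbatim", "name": "Other comments"},
]

total_items = sum(count_items(v) for v in dictionary)
print(total_items, total_items < RECOMMENDED_MAX_ITEMS)  # 404 True
```

Matrix variables dominate the count (50 and 350 of the 404 items here), so they are the first place to look when a dictionary grows large.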
Analysis
The more items included in the analysis (cube), the longer it takes to run and render to the screen. For ease of interpretation, we recommend keeping the analysis under 10,000 cells (e.g., 100 rows × 100 columns). However, performance degradation will not occur until well beyond that.