A data source is the starting point for everything you do in Harmoni. A source is a collection of data held either in an imported file or in a connected data store.
A project contains one or more data sources.
Harmoni can use sources in any language, including languages that require multi-byte storage (e.g., Traditional Chinese).
When a data source contains an inherent dictionary, Harmoni automatically maps the source variables to Harmoni types. A dictionary contains metadata that guides how the data is interpreted. Variables in sources without an inherent dictionary must be defined.
In this article
- Source types
- Supported source structures
- Data Sources - Project Size and Performance
1. Source Types
a. Imported into Harmoni (Upload)
Imported (uploaded) files capture the data as it was at a single point in time.
Each source type is listed below with its description and whether it has an inherent source dictionary.

SPSS data
- SPSS .sav format.
- Can be generated by a variety of data collection applications.
- Typically contains record-level data, with each record representing a respondent.
- Source dictionary: contains an inherent source dictionary.

Excel files
- XLSX (Excel) files.
- Primarily created in Excel for Office 2007 or later.
- Files must be in XLSX format.
- Source dictionary: needs to be defined.

Comma-delimited
- CSV files.
- Can be generated by a variety of systems.
- Do not have market-research-specific structures.
- Source dictionary: needs to be defined.

Tab-delimited
- TXT files.
- Can be generated by a variety of systems.
- Do not have market-research-specific structures.
- Source dictionary: needs to be defined.

Dimensions data (currently available as a service)
- Dimensions .mdd and .ddf format.
- Typically created by Dimensions.
- Typically contains record-level data, with each record representing a respondent.
- Source dictionary: contains an inherent source dictionary.

b. Direct Connections to Harmoni (Connect)
Direct connections allow updated data to flow into Harmoni in real time.
- Connections are made through APIs (Application Programming Interfaces).
- If a data collection system provides an API, we can potentially develop a direct connection to Harmoni.
Harmoni currently supports direct connections to:
- Tables and Views in SQL Server.
- Projects in Decipher.
- Projects in Voxco.
- Projects in Qualtrics.
2. Source Structures
Harmoni supports both record-level data and summary (aggregate-level) data.
a. Record-Level Data
- Each record represents a respondent, a log entry, a sales transaction, etc.
- Counting records is meaningful, e.g., How many respondents?
b. Summary-Level/Aggregate-Level Data
- Summed and/or categorized data that can answer research questions about populations or groups of organizations.
- The data has been compiled from record-level data.
- Counting records is not meaningful.
- The analysis only has meaning when a measure is applied, e.g., scores, GRPs, spend, etc.
Each record in a source corresponds to a count of one in Harmoni.
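To illustrate the difference between the two structures, here is a minimal sketch in plain Python. It is not Harmoni's API, and the records are made-up, hypothetical examples. Counting rows answers a question for record-level data. For aggregate data the row count says nothing useful, and a measure has to be applied.

```python
# Hypothetical record-level data: one record per respondent.
respondents = [
    {"id": 1, "gender": "Male"},
    {"id": 2, "gender": "Female"},
    {"id": 3, "gender": "Female"},
]

# Hypothetical aggregate-level data: one record per region, carrying a measure.
spend_by_region = [
    {"region": "North", "spend": 1200.0},
    {"region": "South", "spend": 800.0},
]

def count_records(records):
    """Each record contributes a count of one."""
    return len(records)

def sum_measure(records, measure):
    """Apply a measure by summing one field across all records."""
    return sum(r[measure] for r in records)

print(count_records(respondents))              # 3 respondents: a meaningful answer
print(count_records(spend_by_region))          # 2 rows: not a meaningful answer
print(sum_measure(spend_by_region, "spend"))   # 2000.0 total spend
```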
3. Data Sources - Project Size and Performance
Harmoni places no technical limit on project size. However, the number of records and the number of sources affect how long analysis takes.
We recommend keeping the following considerations in mind.
Project
The number of variables in the dictionary, and particularly the number of user-created variables, has the biggest influence on response time.
We recommend keeping the dictionary to fewer than 500,000 items. The following count as items in the dictionary:
- Each continuous variable = 1 item, e.g., Number of Shoes Owned
- Each category in a categorical variable = 1 item, e.g., Gender.Male
- Each cell in a yes/no matrix variable = 1 item, e.g., Brand.Attribute
- Each scale point in a scale matrix variable = 1 item, e.g., Brand.Attribute.ScalePt
- Each verbatim = 1 item
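The counting rules above can be turned into a rough estimate of dictionary size. The following is a minimal Python sketch under stated assumptions. The variable classes are hypothetical and not part of Harmoni. A yes/no matrix is assumed to have rows × columns cells, and a scale matrix is assumed to have rows × columns × scale points items. Each verbatim variable is assumed to count as one item.

```python
from dataclasses import dataclass

# Hypothetical variable descriptions for estimating dictionary size;
# these classes are illustrative only and are not part of Harmoni.
@dataclass
class Continuous:
    name: str

@dataclass
class Categorical:
    name: str
    categories: int

@dataclass
class YesNoMatrix:
    name: str
    rows: int      # e.g., brands
    columns: int   # e.g., attributes

@dataclass
class ScaleMatrix:
    name: str
    rows: int
    columns: int
    scale_points: int

@dataclass
class Verbatim:
    name: str  # assumption: one verbatim variable = 1 item

RECOMMENDED_MAX_ITEMS = 500_000  # guideline from the text

def dictionary_items(var) -> int:
    """Count dictionary items for one variable using the rules above."""
    if isinstance(var, (Continuous, Verbatim)):
        return 1
    if isinstance(var, Categorical):
        return var.categories
    if isinstance(var, YesNoMatrix):
        return var.rows * var.columns
    if isinstance(var, ScaleMatrix):
        return var.rows * var.columns * var.scale_points
    raise TypeError(f"unknown variable type: {type(var).__name__}")

def total_items(variables) -> int:
    return sum(dictionary_items(v) for v in variables)

variables = [
    Continuous("ShoesOwned"),               # 1 item
    Categorical("Gender", 3),               # 3 items
    YesNoMatrix("BrandAttr", 20, 15),       # 300 items
    ScaleMatrix("BrandRating", 20, 15, 5),  # 1,500 items
    Verbatim("Comments"),                   # 1 item
]

total = total_items(variables)
print(total, total < RECOMMENDED_MAX_ITEMS)  # 1805 True
```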
Analysis
The more items an analysis (cube) includes, the longer it takes to run and render on screen. For ease of interpretation, we recommend keeping an analysis to no more than about 10,000 cells (e.g., 100 rows x 100 columns). However, performance does not degrade until well beyond that size.
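Here is a minimal Python sketch of the cell arithmetic for checking a planned analysis against the guideline. It assumes the cell count is the product of the number of items on each axis. The function names are hypothetical and not part of Harmoni. The 10,000-cell figure is a readability guideline, not a hard limit.

```python
from math import prod

RECOMMENDED_MAX_CELLS = 10_000  # readability guideline from the text, not a hard limit

def analysis_cells(*dimension_sizes: int) -> int:
    """Cells in a table: the product of the item counts on each axis."""
    return prod(dimension_sizes)

def within_guideline(*dimension_sizes: int) -> bool:
    return analysis_cells(*dimension_sizes) <= RECOMMENDED_MAX_CELLS

print(analysis_cells(100, 100), within_guideline(100, 100))  # 10000 True
print(analysis_cells(250, 60), within_guideline(250, 60))    # 15000 False
```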