In Harmoni, you can use data from a range of different file types as the source for new projects, visualizations, and dashboards.
Learn a few tips and tricks on how to best prepare your data before importing or connecting to Harmoni.
Imported Data Sources
1. SPSS Data - SPSS .sav format
Variable types
Ensure that the variable type assigned to the variable in SPSS is aligned with how the variable is to be analyzed within Harmoni.
Standard Axes
- To see named elements appear in Harmoni, at least one value in the variable must have a value label in SPSS.
- SPSS variables that have a label assigned to at least one value and are a numeric type will appear as standard axes in Harmoni.
- Please ensure values have labels unless they are variables with true numeric values, e.g., exact age, volume, number of people, etc.
Measures
- SPSS variables that have no value labels and are a numeric type will appear as measures in Harmoni.
- SPSS variables that have no value labels and are a date type will appear as measures in Harmoni.
- SPSS stores dates in a particular way. SPSS date variables are numeric variables; their actual values are just numbers, and these values are what Harmoni stores for the date measure. In SPSS, a date value is the number of seconds between midnight on 14 October 1582 and the start (midnight) of the given date. Note that one day is 60 (seconds) * 60 (minutes) * 24 (hours) = 86400 seconds.
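To make the date arithmetic concrete, here is a minimal Python sketch that converts between a raw SPSS date value and a calendar date. It assumes the SPSS reference point of midnight on 14 October 1582; the function names are hypothetical and are not part of Harmoni or SPSS.

```python
from datetime import datetime, timedelta

# SPSS counts date values in seconds from midnight on 14 October 1582.
SPSS_EPOCH = datetime(1582, 10, 14)
SECONDS_PER_DAY = 60 * 60 * 24  # 86400


def spss_seconds_to_datetime(seconds: float) -> datetime:
    """Convert a raw SPSS date value (in seconds) to a Python datetime."""
    return SPSS_EPOCH + timedelta(seconds=seconds)


def datetime_to_spss_seconds(value: datetime) -> float:
    """Convert a Python datetime back to the raw SPSS date value."""
    return (value - SPSS_EPOCH).total_seconds()
```

For example, `spss_seconds_to_datetime(SECONDS_PER_DAY)` gives 15 October 1582, one day after the reference point.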
Verbatims
- SPSS variables that have no value labels and are a string type will appear as text items in Harmoni, e.g., open-ended questions with the actual text.
- String variables that share the same label are combined if the first 5 characters of their variable names are the same.
Variable labels
For ease of use and to minimize relabeling work in Harmoni, consideration should be given to labeling options within the sav file.
- Harmoni presents the variable labels in the sav file within the data tree.
- Variable labels should, therefore, be meaningful and include the question number from the questionnaire.
- Duplicate labels that don’t appear sequentially will include their variable name in the Harmoni label.
Automation
- To automate the creation of a gridded item, the variables in SPSS need to have the same (or at least 80% similar) element labels.
- To automate the creation of a combined item for multiple response variables, the variables in SPSS need to appear sequentially and have the same variable label. The value labels can vary across the variables.
- When automation is applied to the source data load, multiple response sets constructed in SPSS appear as multiple response variables within Harmoni. Any variables used in a multiple response set are not displayed in their original state within Harmoni.
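Before loading a file, it can help to check which variables meet the "sequential and same variable label" condition for combined items. The Python sketch below is only a pre-check under that stated rule, not Harmoni's own automation logic; the function name and the `(name, label)` input format are assumptions made for illustration.

```python
from itertools import groupby


def sequential_label_groups(variables):
    """Group consecutive (name, label) pairs that share the same label.

    `variables` must be in file order. Only runs of two or more
    variables are returned, as (label, [names]) tuples.
    """
    groups = []
    for label, run in groupby(variables, key=lambda pair: pair[1]):
        names = [name for name, _ in run]
        if len(names) > 1:
            groups.append((label, names))
    return groups
```

A variable that shares a label with an earlier group but does not sit next to it in the file is left out of that group, which mirrors the sequential requirement.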
Multiple response sets
A multiple response set can be defined in a sav file, which will result in multiple response variables being available within Harmoni. Multiple response sets:
- will appear at the bottom of the Harmoni Project.
- are automatically recognized and generate multiple response items in Harmoni when the import automation option is on.
Multiple response sets are defined using either a Dichotomy or a Category.
a) Dichotomy
- A multiple dichotomy set typically consists of multiple dichotomous variables.
- Used if the source variables identify the ‘brand’, with the responses Yes and No.
- In the multiple dichotomy set, the Counted Value is 1.
- Multiple response variables that have been created using a dichotomy definition bring through only the Yes responses.
b) Category
- Used if the source variables identify the 1st response, 2nd response, etc. and the responses identify the ‘brand’. Note, it’s not always actually a brand, but hopefully, the ‘brand’ example is one that resonates!
- Multiple response variables that have been created using a category definition bring all value labels through into the multiple response items. However, if all the variables being brought together do not have identical value labels and codes, the Multiple Response Set merges based on values and just assigns the labels from the first variable in the group. This can cause items to merge that really shouldn’t. Please ensure that any multiple response sets identified using the category definition have identical value codes and labels across the variables being included in the set.
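One way to catch mismatched codes and labels before defining a category set is to compare each variable's value labels against the first variable in the group. The sketch below assumes you have already extracted the value labels into plain Python dictionaries; the function name and input shape are hypothetical.

```python
def find_label_mismatches(value_labels):
    """Compare each variable's {code: label} mapping with the first one.

    `value_labels` maps variable name -> {code: label}, in set order.
    Returns {variable name: sorted list of codes} for every code whose
    label differs from the first variable's, or exists in only one of them.
    """
    names = list(value_labels)
    if not names:
        return {}
    reference = value_labels[names[0]]
    mismatches = {}
    for name in names[1:]:
        current = value_labels[name]
        codes = set(reference) | set(current)
        bad = sorted(c for c in codes if reference.get(c) != current.get(c))
        if bad:
            mismatches[name] = bad
    return mismatches
```

An empty result means every variable in the group carries identical codes and labels.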
Defining a Multiple Response Set in SPSS
- In SPSS, from the menus, choose: Data > Define Multiple Response Sets...
- Select two or more variables. If your variables are coded as dichotomies, indicate which value you want to have counted, e.g., 1 for Yes (checked).
- Under SetName: Enter a unique name for each multiple response set. The name can be up to 63 bytes long. A dollar sign is automatically added to the beginning of the set name.
- Under SetLabel: Enter a descriptive label for the set. (This is optional.)
- Click Add to add the multiple response set to the list of defined sets.
For multiple response sets, the set name must be unique and must not be the same as any variable label.
Other considerations
- Coded responses must have labels.
- Looped questions must have a separate variable for each loop combination.
- Data should be cleaned and include only valid respondents. All respondents who were asked a question should have a response, and respondents who were not asked the question should not have any response.
- Ensure files are not encrypted.
- One reason SAV files may increase in size is that their string variables are set to be far longer than they need to be. Below is an SPSS script you can use to shrink each string variable to the length of its longest value; you need to specify the location and name of the input file as well as the location and name of the output file.
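* Open the input file.
GET FILE = 'Drive:\Folder\Filename.sav'.
* Resize every string variable to the minimum width needed for its longest value.
ALTER TYPE ALL (A=AMIN).
* Save the result as a new file.
SAVE OUTFILE = 'Drive:\Folder\NewFileName.sav'.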
2. Excel, Comma-Delimited, and Tab-Delimited Data - XLSX, CSV, or TXT format
- Excel files must be in XLSX format. When you add an Excel source to a project in Harmoni, only the first worksheet is imported.
- Comma-delimited files must have a CSV extension.
- Tab-delimited files must have a TXT extension.
- The first row must contain headers with unique, non-blank descriptions.
- Data types can be specified/overwritten in the import process. These are the defaults:
  - Heading labels that start with $ default to a measure.
  - Heading labels that start with $weight default to a weight.
  - Heading labels that start with & default to a verbatim.
  - Everything else defaults to a standard axis.
- Fields should contain labels, not codes. If codes are used, they will need to be renamed to labels after importing into Harmoni.
- Cells that contain text characters in a field flagged as a measure are not imported.
- Fields cannot contain line feeds, carriage returns, non-printable characters, or | characters.
To remove line feeds in Excel:
- Press Ctrl+H to open the Find & Replace dialog box.
- In the Find What field, press Ctrl+J. The field will look empty, but you will see a tiny dot.
- In the Replace With field, enter any value. Usually this is a space, to avoid two words being joined accidentally. If all you need is to delete the line breaks, leave the Replace With field empty.
- Press the Replace All button.
- If you want to remove all non-printable characters from text, including carriage returns, you can use the CLEAN function in Excel.
=CLEAN(B2)
Notepad++ is a faster solution for source files that have numerous columns containing carriage returns. The general steps are:
- Save your file in .csv format.
- Open the .csv file in Notepad++.
- Press Ctrl+F to open the Find and Replace window.
- Find "\r" and replace it with a blank space. Set Search Mode to Extended so that "\r" is treated as a carriage return.
This will remove the carriage returns from your file, and the data will appear properly in line. You can then load this .csv file into Harmoni.
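If you prefer to script the clean-up, the same idea can be sketched in Python using the standard `csv` module. This is an illustrative sketch only: the function names are hypothetical, it assumes a comma-delimited file, and it replaces line feeds, carriage returns, other control characters, and | with a space, then collapses repeated whitespace.

```python
import csv
import io
import re

# Line feeds, carriage returns, tabs and other control characters, plus "|".
_UNWANTED = re.compile(r"[\x00-\x1f\x7f|]")


def clean_field(value: str) -> str:
    """Replace unwanted characters with a space and collapse whitespace."""
    return " ".join(_UNWANTED.sub(" ", value).split())


def clean_csv_text(text: str) -> str:
    """Return CSV text with every field cleaned, one record per line."""
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([clean_field(cell) for cell in row])
    return out.getvalue()
```

To apply it to a file, read the text with `open(path, newline="", encoding="utf-8")` and write the result back out with the same arguments, so that line breaks inside quoted fields are parsed correctly.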
Other considerations
- Fields containing commas or tabs in comma- and tab-delimited files must be enclosed in double quotes.
- Fields should contain labels, not codes. If codes are used, they will need to be renamed to labels after importing.
- After the heading row, each row constitutes an unweighted count in Harmoni.
- Blank rows are read as an unweighted count, so they are included in the base.
- Time/Date fields should contain descriptions, not numbers.
- Numeric values should not contain text, e.g. $, commas, %.
- If your files contain any foreign or special characters, encode them in UTF-8 format to ensure anything you upload into Harmoni can be read and displayed properly.
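The heading rules for this file type (unique, non-blank headers, and default types driven by the $, $weight, and & prefixes) can be checked before upload. The Python sketch below is a rough pre-check, not Harmoni's import logic; the function names are hypothetical, and treating the prefixes as case-sensitive is an assumption.

```python
def default_type(heading: str) -> str:
    """Return the default type implied by a heading label's prefix."""
    # "$weight" is checked before "$" because it is the more specific prefix.
    # Assumption: prefixes are matched case-sensitively.
    if heading.startswith("$weight"):
        return "weight"
    if heading.startswith("$"):
        return "measure"
    if heading.startswith("&"):
        return "verbatim"
    return "standard axis"


def check_headings(headings):
    """Return a list of problems: blank or duplicated heading labels."""
    problems = []
    seen = set()
    for position, heading in enumerate(headings, start=1):
        label = heading.strip()
        if not label:
            problems.append(f"column {position}: blank heading")
        elif label in seen:
            problems.append(f"column {position}: duplicate heading {label!r}")
        seen.add(label)
    return problems
```

An empty list from `check_headings` means the header row has no blank or repeated descriptions.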
3. Dimensions Data - Dimensions .mdd and .ddf format
Currently available as a service.
Multiple response variables don’t require user intervention. They will import into Harmoni as combined axes directly from the data file.
Dimensions data files need to be run through a converter before loading into Harmoni. The converter will be incorporated directly within Harmoni in the near future. At this stage, Dimensions data needs to be converted by Infotools before it can be loaded.
Harmoni displays the description labels from the Dimensions file.
4. Infotools X-files
- X-files are typically already designed.
- Xbf, Xdf, and Xef must be uploaded (Xlk is optional).
- Vaxes must be in an indexless format (achieved using SUP -Clean).
- Each item within a common type needs a unique label.
- Headers need to be new style, i.e., version 4+.
- Matrices, yngrids, and vaxes with predefined bases are not supported.
Direct Connections
Each Direct Connection is developed to map the data types in the data collection system with the data types available in Harmoni. Direct Connections allow market research specific structures, such as grids and multiple responses, to retain their internal relationships.
1. SQL Structure
- All relevant data exists in a single view or table.
- All data associated with a unique record is reported in a single table or view.
- Key identifiers can be added to field names to advise Harmoni on the default type to import.
- Fields contain labels, not codes. If codes are used, they will need to be renamed to labels after importing.
- Keep in mind that field names in SQL are limited to 128 characters, and cannot contain |.
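The two field-name constraints above (at most 128 characters, no | character) are easy to check in advance. The following Python sketch uses a hypothetical helper name and assumes you already have the list of field names from your view or table.

```python
MAX_FIELD_NAME_LENGTH = 128


def invalid_field_names(field_names):
    """Return {name: reason} for names that are too long or contain "|"."""
    problems = {}
    for name in field_names:
        if len(name) > MAX_FIELD_NAME_LENGTH:
            problems[name] = f"longer than {MAX_FIELD_NAME_LENGTH} characters"
        elif "|" in name:
            problems[name] = 'contains "|"'
    return problems
```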
2. Decipher
- To connect to a Decipher source, you’ll need to know the server the data resides on and the 64-character API key of a user that has access to the required source.
- You can find information on how to generate the API key here:
https://decipher.zendesk.com/hc/en-us/articles/360010277813-Decipher-REST-API-2-0
- Before connecting, you can select the variables you want to include in your project: Qualifiers, system and additional variables as well as pipe variables.
- Once you add the sources to the project, each variable in Decipher will appear in the project tree using the Harmoni data type that best aligns with the Decipher data type.
Adding new variable types to an existing project
To incorporate additional variable types (e.g., system or pipe) into an existing project:
- Select view/add sources.
- Select Add/Remove.
- This will open the sources browse area and display the data sources included in your project.
- Choose to Connect.
- Click on the edit icon and then include the desired variable types.
- Once ready select the Update option.
Append sources
With Harmoni, you can append data sources. Append allows you to add new variables to respondents or cases within a project when information of common respondents is captured in separate data sources.
It is also possible to append data sources in Decipher. We only recommend this option when the number of sources becomes too difficult to manage (i.e. instead of 200 sources you end up with 400).
- Decipher can only handle about 150 columns at a time, so if your file has more than 150 columns, you’ll need to break it into smaller files. Each file must contain the “key variable” that you use, e.g. UUID, psid, source, etc.
- You can prepare a clean Excel file (with all unneeded variables/columns removed) rather than clicking on the little box next to each “imported variable” that you need. With a clean Excel file, after you pick the “key variable”, you only need to click on the one little box next to “imported variable” and it will automatically select all the variables in the file.
- Make sure your Excel variables are free from unusual characters, e.g. #NULL (for no response), which can appear when a .sav file is saved as an Excel file. These need to be replaced by a space.
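Splitting a wide file by hand is error-prone, so here is a minimal Python sketch of the column split and the #NULL replacement. The function names are hypothetical, the 150 figure is the approximate limit quoted above, and counting the key variable towards that limit is an assumption.

```python
MAX_COLUMNS = 150  # approximate limit; assumed to include the key variable


def split_columns(columns, key, max_columns=MAX_COLUMNS):
    """Split column names into groups that each start with the key variable."""
    if key not in columns:
        raise ValueError(f"key variable {key!r} not found")
    if max_columns < 2:
        raise ValueError("max_columns must leave room for the key and one column")
    others = [c for c in columns if c != key]
    step = max_columns - 1  # leave room for the key in every group
    return [[key] + others[i:i + step] for i in range(0, len(others), step)]


def blank_nulls(value: str) -> str:
    """Replace the #NULL placeholder with a single space."""
    # "#NULL!" is included as an assumed variant of the placeholder.
    return " " if value.strip() in ("#NULL", "#NULL!") else value
```

Each group returned by `split_columns` lists the columns for one smaller file, with the key variable first so the files can be appended on it.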
3. Voxco
- To connect to a Voxco source, you’ll need to know the server the data resides on and the 140-character API key of a user that has access to the required source.
- Once you connect to the source, each variable in Voxco will appear in the project using the Harmoni data type that best aligns with the Voxco data type.
4. Qualtrics
- To connect to a Qualtrics source, you’ll need to know your key, login, and password. These will be supplied by Qualtrics to you/your customer directly.
- You can find information on how to generate the API key here:
https://www.qualtrics.com/support/integrations/api-integration/overview/#GeneratingAnAPIToken
- Once you connect to the source, you will see your projects in Qualtrics and be able to select which to connect to directly.
- Please note the System Variable type from Qualtrics is not currently supported in Harmoni.
5. SurveyMonkey
- To connect to a SurveyMonkey source, you need to enter the Host name and Access Token.
- Once you establish your connection to SurveyMonkey, you will see your projects/surveys. You can then select the sources you want to use for your Harmoni project.