With Infotools Harmoni you can use data from a range of different file types as the source for new projects, visualizations, and dashboards.
Learn a few tips and tricks on how to best prepare your data before importing or connecting to Infotools Harmoni.
In this article
- SPSS Data
- Excel, Comma and Tab Delimited Files
- Dimensions Data
- Infotools X-files
- Direct Connections
Currently, there is a 2 GB limit on the size of files uploaded into Harmoni.
1. SPSS Data - SPSS .sav format
Ensure the variable type assigned to the variable is aligned with how the variable is to be analyzed within Infotools Harmoni:
To see named elements appear in Harmoni, at least one value in the variable must have a value label in SPSS.
SPSS variables that have a label assigned to at least one value and are a numeric type will appear as standard axes in Infotools Harmoni.
- Please ensure values have labels unless they are variables with true numeric values, e.g., exact age, volume, number of people, etc.
SPSS variables that have no value labels and are a numeric type will appear as measures in Infotools Harmoni.
SPSS variables that have no value labels and are a date type will appear as measures in Infotools Harmoni.
- SPSS stores dates in a particular way. SPSS date variables are numeric variables; their actual values are just numbers, and these values are what Infotools Harmoni stores for the date measure. In SPSS, a date value is the number of seconds between midnight on October 14, 1582 and the start (midnight) of the given date. Note that one day is 60 (seconds) * 60 (minutes) * 24 (hours) = 86400 seconds.
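If you need to read a date measure back as a calendar date outside SPSS, the arithmetic can be sketched in Python as follows. This is only an illustration of the seconds-since-1582 rule; the function names are hypothetical and are not part of Harmoni or SPSS.

```python
from datetime import datetime, timedelta

# SPSS counts seconds from midnight at the start of 14 October 1582.
SPSS_EPOCH = datetime(1582, 10, 14)
SECONDS_PER_DAY = 60 * 60 * 24  # 86400


def spss_to_datetime(spss_seconds: float) -> datetime:
    """Convert a raw SPSS date value (seconds) to a Python datetime."""
    return SPSS_EPOCH + timedelta(seconds=spss_seconds)


def datetime_to_spss(moment: datetime) -> float:
    """Convert a Python datetime back to the raw SPSS number of seconds."""
    return (moment - SPSS_EPOCH).total_seconds()


# Round trip for an example date.
raw = datetime_to_spss(datetime(2024, 1, 1))
print(raw, spss_to_datetime(raw).date())
```

Dividing a raw value by 86400 gives the number of whole days since the SPSS starting point, which can be a quick sanity check on a date measure.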
SPSS variables that have no value labels and are a string type will appear as text items in Infotools Harmoni e.g., open-ended questions with the actual text.
For ease of use and to minimize relabeling work in Infotools Harmoni, consideration should be given to labeling options within the sav file.
- Infotools Harmoni presents the variable labels in the sav file within the data tree.
- Variable labels should, therefore, be meaningful and include the question number from the questionnaire.
To automate the creation of a gridded item, the variables in SPSS need to have the same (or at least 80% similar) element labels.
To automate the creation of a combined item for multiple response variables, the variables in SPSS need to appear sequentially and have the same variable label; the value labels can vary across the variables.
When automation is applied to the source data load, multiple response sets constructed in SPSS appear as multiple response variables within Infotools Harmoni. Any variables used in a multiple response set are not displayed in their original state within Infotools Harmoni.
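Before loading, it can help to check which variables actually meet the "sequential and same variable label" condition. The sketch below is a pre-check you could run on your own variable list; it is not how Harmoni detects combined items, and the variable names and labels are made up for illustration.

```python
from itertools import groupby

# Hypothetical variable list in file order: (variable name, variable label).
variables = [
    ("q5_1", "Q5. Brands purchased"),
    ("q5_2", "Q5. Brands purchased"),
    ("q5_3", "Q5. Brands purchased"),
    ("q6", "Q6. Overall satisfaction"),
    ("q5_4", "Q5. Brands purchased"),  # same label, but not adjacent
]


def sequential_label_groups(items):
    """Return runs of two or more adjacent variables that share a label."""
    groups = []
    for label, run in groupby(items, key=lambda item: item[1]):
        names = [name for name, _ in run]
        if len(names) > 1:
            groups.append((label, names))
    return groups


def stranded_variables(items):
    """Return variables whose label matches a group but that sit outside it."""
    grouped = sequential_label_groups(items)
    in_group = {name for _, names in grouped for name in names}
    group_labels = {label for label, _ in grouped}
    return [name for name, label in items
            if label in group_labels and name not in in_group]


print(sequential_label_groups(variables))
print(stranded_variables(variables))
```

A variable reported as stranded shares a label with a group but is separated from it in the file, so it would need to be moved next to the others in SPSS first.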
Multiple response sets
A multiple response set can be defined in a sav file, which will result in multiple response variables being available within Infotools Harmoni. Multiple response sets:
- will appear at the bottom of the Infotools Harmoni Project.
- are automatically recognized and generate multiple response items in Infotools Harmoni when the import automation option is on.
Multiple response sets are defined using either a Dichotomy or a Category definition.
- A multiple dichotomy set typically consists of multiple dichotomous variables.
- Used if the source variables identify the ‘brand’, with responses Yes and No.
- In the multiple dichotomy set, the Counted Value is 1.
- Multiple response variables that have been created using a dichotomy definition bring through only the Yes responses.
- A multiple category set typically consists of multiple variables that share the same coding.
- Used if the source variables identify the 1st response, 2nd response, etc. and the responses identify the ‘brand’. Note that it’s not always actually a brand, but hopefully the ‘brand’ example is one that resonates!
- Multiple response variables that have been created using a category definition bring all value labels through into the multiple response items. However, if the variables being brought together do not all have identical value labels and codes, the Multiple Response Set merges based on values and assigns the labels from the first variable in the group. This can cause items to merge that really shouldn’t. Please ensure that any multiple response sets identified using the category definition have identical value codes and labels across the variables included in the set.
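The "identical value codes and labels" requirement for category sets can be checked before import. The sketch below assumes you have already extracted each variable's value labels into a plain dictionary (for example from the SPSS Variable View); the variable names and labels are hypothetical.

```python
# Hypothetical value labels per variable: {variable name: {code: label}}.
value_labels = {
    "brand_1st": {1: "Brand A", 2: "Brand B", 3: "Brand C"},
    "brand_2nd": {1: "Brand A", 2: "Brand B", 3: "Brand C"},
    "brand_3rd": {1: "Brand A", 2: "Brand C", 3: "Brand B"},  # codes 2 and 3 swapped
}


def mismatched_variables(labels_by_variable):
    """Compare every variable's code-to-label map with the first variable's."""
    names = list(labels_by_variable)
    reference = labels_by_variable[names[0]]
    problems = {}
    for name in names[1:]:
        current = labels_by_variable[name]
        # A code is a problem if its label differs or it is missing on one side.
        differing = sorted(
            code for code in set(reference) | set(current)
            if reference.get(code) != current.get(code)
        )
        if differing:
            problems[name] = differing
    return problems


print(mismatched_variables(value_labels))
```

The first variable is used as the reference because, as described above, its labels are the ones assigned when the set is merged.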
Defining a Multiple Response Set in SPSS
- In SPSS, from the menus, choose: Data > Define Multiple Response Sets...
- Select two or more variables. If your variables are coded as dichotomies, indicate which value you want to have counted, e.g., 1 for Yes (checked).
- Under SetName: Enter a unique name for each multiple response set. The name can be up to 63 bytes long. A dollar sign is automatically added to the beginning of the set name.
- Under SetLabel: Enter a descriptive label for the set. (This is optional.)
- Click Add to add the multiple response set to the list of defined sets.
For multiple response sets, the Set Name must be unique and must not match any of the variable labels.
Coded responses must have labels.
Looped questions must have a separate variable for each loop combination.
Data should be cleaned and contain only valid respondents. All respondents who were asked a question should have a response; respondents who were not asked the question should not have any response.
Ensure files are not encrypted.
Currently, there is a 2 GB limit on the size of files uploaded into Harmoni. One reason SAV files may exceed this limit is that their string variables are defined much longer than they need to be. Below is an SPSS script you can use to shrink each string variable to the length of its longest value; you need to specify the location and name of the input file as well as the location and name of the output file.
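* Open the oversized file.
GET FILE = 'Drive:\Folder\Filename.sav'.
* Resize every string variable to the width of its longest value.
ALTER TYPE ALL (A=AMIN).
* Save the result under a new name.
SAVE OUTFILE = 'Drive:\Folder\NewFileName.sav'.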
2. Excel, Comma and Tab Delimited - XLSX, CSV or TXT files
Excel files must be in XLSX format. When you add an Excel source to a project in Infotools Harmoni, only the first worksheet is imported.
Comma-delimited files must have a CSV extension.
Tab-delimited files must have a TXT extension.
The first row must contain headers with unique, non-blank descriptions.
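A quick way to check the header rule before uploading is sketched below in Python. It assumes a comma-delimited file and treats headers as duplicates only when they match exactly after trimming spaces; whether Harmoni compares headers case-sensitively is not stated here, so that is an assumption. The function name is hypothetical.

```python
import csv
import io
from collections import Counter


def header_problems(csv_text: str) -> list[str]:
    """Return a list of problems found in the first (header) row."""
    rows = csv.reader(io.StringIO(csv_text))
    headers = [h.strip() for h in next(rows, [])]
    if not headers:
        return ["file has no header row"]
    problems = []
    # Report blank headers by their 1-based column position.
    for position, header in enumerate(headers, start=1):
        if not header:
            problems.append(f"column {position}: blank header")
    # Report any non-blank header that occurs more than once.
    counts = Counter(h for h in headers if h)
    for header, count in counts.items():
        if count > 1:
            problems.append(f"'{header}' appears {count} times")
    return problems


sample = "Region,Age,,Region\nNorth,34,x,y\n"
print(header_problems(sample))
```

For a tab-delimited TXT file, passing `delimiter="\t"` to `csv.reader` would be the equivalent adjustment.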
Data types can be specified/overwritten in the import process. These are the defaults:
Heading labels that start with $ default to a measure.
Heading labels that start with $weight default to a weight.
Heading labels that start with & default to a verbatim.
Everything else defaults to a standard axis.
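The default rules above can be expressed as a small lookup, which may be useful for previewing how a file's headings will be typed. This is a sketch of the rules as written, not Harmoni's own code; it assumes the prefixes are matched case-sensitively, which the rules do not state.

```python
def default_type(heading: str) -> str:
    """Map a heading label to the default type described above."""
    # "$weight" must be tested before "$", because it also starts with "$".
    if heading.startswith("$weight"):
        return "weight"
    if heading.startswith("$"):
        return "measure"
    if heading.startswith("&"):
        return "verbatim"
    return "standard axis"


for heading in ["$weight_final", "$spend", "&comments", "Region"]:
    print(heading, "->", default_type(heading))
```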
Fields should contain labels, not codes. If codes are used, they will need to be renamed to labels after importing into Infotools Harmoni.
Cells that contain text characters in a field flagged as a measure are not imported.
Fields cannot contain line feeds, carriage returns, non-printable characters, or | characters.
To remove line feeds in Excel:
- Press Ctrl+H to open the Find & Replace dialog box.
- In the Find What field, press Ctrl+J. The field will look empty, but you will see a tiny dot.
- In the Replace With field, enter any value. Usually this is a space, to avoid two words being joined accidentally. If you only need to delete the line breaks, leave the "Replace With" field empty.
- Press the Replace All button.
- If you want to remove all non-printable characters from text, including carriage returns, you can use the CLEAN function in Excel.
Notepad++ is a faster solution for source files that have many columns containing carriage returns. General steps for this:
- Save your file in .csv format
- Open .csv file using Notepad++
- Press Ctrl+F to open the Find and Replace window
- Find "\r" and replace it with a blank space. Set the Search Mode to Extended so that "\r" is treated as a carriage return.
This will remove the carriage returns from your file, and each record will sit on a single line. You can then load this .csv file to Infotools Harmoni.
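For files that are too large to clean comfortably by hand, the same clean-up can be sketched in Python. The example below parses the CSV properly (so line breaks inside quoted fields are found), replaces them with a space, and drops | and other non-printable characters. The function names are hypothetical, and Python's `str.isprintable` is used as the definition of "printable", which is an assumption about what Harmoni accepts.

```python
import csv
import io


def clean_field(value: str) -> str:
    """Replace line breaks with a space, then drop pipes and control characters."""
    value = value.replace("\r\n", " ").replace("\r", " ").replace("\n", " ")
    return "".join(ch for ch in value if ch.isprintable() and ch != "|")


def clean_csv(text: str) -> str:
    """Return the CSV text with every field cleaned."""
    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    # newline="" keeps embedded line breaks intact so the csv module can see them.
    for row in csv.reader(io.StringIO(text, newline="")):
        writer.writerow([clean_field(field) for field in row])
    return output.getvalue()


dirty = 'Name,Comment\nAnn,"Good\r\nvalue | overall"\n'
print(clean_csv(dirty))
```

When reading from disk, open the file with `newline=""` as well, as the `csv` module documentation recommends.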
- Fields containing commas or tabs must be contained in double-quotes.
- Fields contain labels, not codes. If codes are used, they will need to be renamed to labels after importing.
- After the heading row, each row constitutes an unweighted count in Infotools Harmoni.
- Blank rows are read as an unweighted count, so are included in the base.
- Time/Date fields contain descriptions, not numbers.
- Numeric values don’t contain text, e.g. $, commas, %.
- If your files contain any foreign or special characters, encode them in UTF-8 format to ensure anything you upload into Infotools Harmoni can be read and displayed properly.
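Two of the points in the list above, numeric values without text and UTF-8 encoding, lend themselves to a short script. The sketch below is illustrative only: it assumes commas are thousands separators (not decimal commas), keeps a percentage such as 45% as 45 rather than 0.45, and assumes you know the file's current encoding. The function names are hypothetical.

```python
def to_number(raw: str) -> float:
    """Strip currency signs, thousands separators and percent signs, then parse."""
    cleaned = raw.strip()
    for symbol in ("$", ",", "%"):
        cleaned = cleaned.replace(symbol, "")
    return float(cleaned)


def reencode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a known source encoding and re-encode them as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


print(to_number("$1,250.50"), to_number("45%"))

# Example: text saved as Latin-1 converted to UTF-8 bytes.
latin1_bytes = "Café".encode("latin-1")
print(reencode_to_utf8(latin1_bytes, "latin-1"))
```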
3. Dimensions Data - Dimensions .mdd and .ddf format
Currently available as a service.
Multiple response variables don’t require user intervention. They will import into Infotools Harmoni as combined axes directly from the data file.
Dimensions data files need to be run through a converter before loading into Infotools Harmoni. The converter will be incorporated directly within Harmoni in the near future. At this stage, Dimensions data needs to be converted by Infotools before it can be loaded.
4. Infotools X-files
- X-files are typically already designed.
- Xbf, Xdf, and Xef must be uploaded (Xlk is optional).
- Vaxes must be in an indexless format (achieved using SUP -Clean).
- Each item within a common type needs a unique label.
- Headers need to be new style, i.e., version 4+.
- Matrices, yngrids, and vaxes with predefined bases are not supported.
5. Direct Connections
Each Direct Connection is developed to map the data types in the data collection system with the data types available in Infotools Harmoni. Direct Connections allow market research specific structures, such as grids and multiple responses, to retain their internal relationships.
All relevant data exists in a single view or table.
All data associated with a unique record is reported in a single table or view.
Key identifiers can be added to field names to advise Infotools Harmoni on the default type to import.
Fields contain labels, not codes. If codes are used, they will need to be renamed to labels after importing.
Keep in mind that field names in SQL are limited to 128 characters, and cannot contain |.
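The two field-name constraints just mentioned can be checked before connecting. This is a minimal sketch; the function name and the example field names are hypothetical.

```python
MAX_FIELD_NAME_LENGTH = 128


def field_name_problems(names):
    """Return {field name: reason} for names that break either rule."""
    problems = {}
    for name in names:
        if len(name) > MAX_FIELD_NAME_LENGTH:
            problems[name] = f"{len(name)} characters (limit {MAX_FIELD_NAME_LENGTH})"
        elif "|" in name:
            problems[name] = "contains |"
    return problems


fields = ["region", "q1|brand", "x" * 130]
print(field_name_problems(fields))
```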
To connect to a Decipher source you’ll need to know the server the data resides on, and the 64-character API key of a user that has access to the required source.
- You can find information on how to generate the API key here:
Once you add the sources to the project, each variable in Decipher will appear in the project tree using the Harmoni data type that best aligns with the Decipher data type.
To connect to a Voxco source you’ll need to know the server the data resides on, and the 140-character API key of a user that has access to the required source.
Once you connect to the source, each variable in Voxco will appear in the project, using the Harmoni data type that best aligns with the Voxco data type.
To connect to a Qualtrics source you’ll need to know your key, login, and password. Qualtrics will supply these directly to you or your customer.
You can find information on how to generate the API key here:
Once you connect to the source you will see your projects in Qualtrics and be able to select to connect directly.