The data in column-based file formats are organized by field, keeping all the data associated with a field next to each other. Column based files are the most common file formats and can provide performance advantages when querying data.
Harmoni accepts column-based data files for uploads or direct connections.
In this article
1. Data Files
A column-based data set can be stored as a single file in tabular form. Harmoni accepts a variety of file types, including SPSS files, Dimensions files, text files, CSV files, Excel files, SQL databases, AWS, Azure, GCP, NoSQL databases, etc.
The first line of the file, or headers of the columns, determine the variable labels. These labels must be unique and non-blank. All other rows are treated as records, with a response in each column.
Example of a column-based data set:
ID | Gender | Age | Comment |
---|---|---|---|
120 | Male | 23 | |
121 | Female | 42 | It was great |
122 | Female | 43 | |
123 | Female | 48 | |
124 | Female | 29 | |
125 | Male | 58 | |
126 | Not Specified | 52 | |
127 | Female | 66 | Too expensive |
128 | Male | 40 |
Variable Order
Variable order in the Harmoni project tree depends on the order of the columns and cannot be determined any other way.
File Formats
In the case of a text file, the columns need to be delimited by either commas or tabs. Comma delimited files must have a CSV extension and tab delimited files must have a TXT extension.
Excel files must be in XLSX format, and only the first worksheet will be imported into Harmoni.
Variable Mapping
An inherent dictionary in file formats such as SPSS and Dimensions convert variables to Harmoni variable types.
In text file formats, there is no dictionary, and the response labels themselves form the dictionary. As an option, variable types can be specified by adding the following prefixes to the variable labels:
- $Weight - weight type
- $ - measure type
- & - verbatim type
These variable types can also be specified in Harmoni when the data is added to a project. Learn more about defining sources in Harmoni.
2. Multi-Level Linking
When dealing with a hierarchical structure with multiple levels, where each level has its own dataset, a linking variable (e.g., ID) must connect each level.
For example the following data set about visits could be linked with the example shown above using the the ID field.
ID | VisitID | Store visited | Time of visit |
---|---|---|---|
120 | 1 | Supermarket | Morning |
120 | 2 | Convenience Store | Morning |
122 | 3 | Convenience Store | Night |
123 | 4 | Convenience Store | Morning |
124 | 5 | Supermarket | Morning |
124 | 6 | Convenience Store | Night |
124 | 7 | Supermarket | Night |
125 | 8 | Supermarket | Day |
128 | 9 | Convenience Store | Day |
And then the following dataset about purchases could be linked to the visit dataset on VisitID.
ID | VisitID | Purchase ID | Beverage purchased |
---|---|---|---|
120 | 1 | 11 | Flavored Milk |
120 | 1 | 12 | Bottled Water |
122 | 3 | 13 | Tea |
123 | 3 | 14 | Flavored Milk |
124 | 5 | 15 | Tea |
124 | 6 | 16 | Coffee |
124 | 7 | 17 | Flavored Milk |
125 | 8 | 18 | Bottled Water |
128 | 8 | 19 | Tea |
Where to from here?
Learn more about Harmoni data sources