Data may be collected at multiple levels. For example, one file may contain information about respondents while a separate file contains information about their consumption occasions. In this case, you can append data across multiple levels by specifying the unit of count.
Append allows you to add new variables to respondents or cases within a project when information about the same respondents is captured in separate data sources.
In this article
- Link Sources
- Define sources
- Append - select a unique identifier
1. Link Sources
You can append data sources when you create a project for the first time, or append to a project that already contains the primary source. Either way, the first step is to upload or connect to the data sources that your project requires.
Once the sources are loaded, identify the primary (parent) source and select the Link option from the three-dot menu on its source tile.
Selecting the Link option opens the append data wizard.
The first step is to link your sources based on their hierarchy. The source you have identified as the primary or parent source appears in the right column of the wizard. All other sources appear in the left column.
Select each of the secondary sources you want to append to your primary source. As you select them, the right column shows the linking hierarchy of the sources.
The selected secondary sources appear underneath the parent source as children.
In a hierarchical database model:
- The primary source or parent can have multiple child records.
- At the same time, each child can become the parent source for another child.
- However, each child record can only have one parent.
To create a different hierarchy, if the data set requires one, drag and drop the child source onto its new parent. In the example below, Occasions becomes the parent of Drinks.
- Ticking the orange box of the second child source (Drinks, in the left column) moves it to the right column at the same level as the first child source: it sits directly under Occasions and is a child of Main.
- Dragging the second child source (Drinks) and dropping it onto the first child source (Occasions) creates another level in the hierarchy: Occasions becomes the parent of Drinks, and Drinks sits slightly indented below Occasions to indicate the relationship.
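The two arrangements can be pictured as a mapping from each source to its single parent. The following is a minimal sketch for illustration only; it is not how Harmoni stores the hierarchy, and the names `flat`, `nested`, `depth` and `children` are hypothetical.

```python
# Illustrative only: each source records its one parent (None for the primary source).
# Because a dictionary key holds a single value, a child cannot have two parents.
flat = {"Main": None, "Occasions": "Main", "Drinks": "Main"}         # both boxes ticked
nested = {"Main": None, "Occasions": "Main", "Drinks": "Occasions"}  # Drinks dropped onto Occasions


def depth(parents, source):
    """Count how many levels sit above a source in the hierarchy."""
    level = 0
    while parents[source] is not None:
        source = parents[source]
        level += 1
    return level


def children(parents, source):
    """List the sources whose parent is the given source."""
    return [child for child, parent in parents.items() if parent == source]
```

In the `flat` arrangement Drinks sits one level below Main, alongside Occasions; in the `nested` arrangement it sits two levels below Main, and Occasions is both a child (of Main) and a parent (of Drinks).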
Unit of Count
When using append to link your data sources, you need to take into account what each record in each data file represents, such as a respondent (person) or an occasion.
If the unit of count of the secondary source is:
- The same as the primary source, usually respondents, you don't need to enter any information and can proceed to the next step. Learn more.
- Different from the primary source (e.g., occasions), you need to define the unit of count for each of the sources before proceeding to the next step.
To define the unit of count, enter the word that best describes the counts for each level. You can enter these counts in the plural; the option to provide a singular version of the count becomes available within the project tree when you load the project. Learn more.
In this example:
- People - for the main file which contains information about respondents.
- Occasions - for the file which contains the information about consumption occasions.
- Drinks - for the file which contains the information about the drinks consumed on each occasion.
If you have three levels of data, the append wizard won't allow you to continue to the next step until you define the relevant units of count.
With two levels of data, however, the wizard allows you to proceed without entering the unit of count. If you are creating a multi-level database, it is critical to enter the unit of count even though the wizard does not enforce it.
Link Sources Example
In this example, data has been collected at multiple levels.
- The main file contains information about the respondents (people). This is the primary or parent source.
- The occasions file contains information about the consumption occasions (occasions) of these people. This is a secondary or child source, where the parent is the main file.
- The drinks file contains information about the drinks consumed (drinks) in each of the consumption occasions. This is also a secondary or child source, but in this case, the parent is the occasions file.
After selecting the link option on the primary file, work through the wizard.
- Select the secondary source(s).
- Ensure they are in the correct hierarchy. If you need to create an additional level in the hierarchy, drag and drop the child source onto the relevant parent.
- Define the unit of count for each of the sources.
- Proceed to the Next step.
2. Define - data types for delimited sources
Harmoni automatically maps variables when data sources contain an inherent dictionary (metadata that guides interpretation). There are six variable types in Harmoni: Headings, Axes, Grids, Measures, Weights and Verbatims. Learn more about Harmoni Variable Types. Learn more about source dictionaries.
When this is the case, the append wizard displays a message indicating there are no delimited sources and that you can carry on to the next step.
3. Append - Select a unique identifier
When appending variables to existing respondents with the same unit of count (e.g., records in all sources represent respondents), you can match records based on the order or a unique identifier. Learn more.
However, when you have multiple units of count (i.e., the data is multi-level and you have correctly filled in the units of count), you can only match on unique identifiers, which define how the parent source is linked to the child source.
Harmoni will ask you to select the identifier in the primary (parent) source and the identifier in the secondary (child) source. Harmoni will then match the records based on these identifiers.
- A unique common identifier must exist across the sources.
- The unique identifier:
  - Can be either a measure (numeric) or a verbatim (text), but it must be the same type across the sources you want to append.
  - Must not contain blanks; if more than one blank is found, Harmoni flags them as duplicates.
- If the identifier is unique and can be matched, the sources will append.
- If the identifier is not unique (i.e., repeated across multiple records), Harmoni displays the warning: "We have found duplicated records identifiers in your sources." In this case, you will need to insert unique identifiers in both your primary and secondary sources.
- Orphaned records are ignored.
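Checking the identifier columns before uploading can catch these problems early. The sketch below is not part of Harmoni; it is a hypothetical helper (`check_identifiers`) that applies the rules above to two plain lists of identifier values. It assumes that a blank is an empty string or `None`, that uniqueness is checked on the parent side, and that an orphan is a child record whose identifier matches no parent record.

```python
from collections import Counter

BLANKS = ("", None)  # assumption: how a blank identifier appears once the file is read


def check_identifiers(parent_ids, child_ids):
    """Report identifier problems and the child identifiers that have no parent."""
    problems = []

    # A blank is counted like any other value, so two blanks show up as duplicates.
    repeated = sorted((v for v, n in Counter(parent_ids).items() if n > 1), key=str)
    if repeated:
        problems.append(f"duplicated parent identifiers: {repeated}")

    # The identifier must be the same type (numeric or text) in both sources.
    kinds = {type(v).__name__ for v in list(parent_ids) + list(child_ids) if v not in BLANKS}
    if len(kinds) > 1:
        problems.append(f"mixed identifier types: {sorted(kinds)}")

    # Assumption: an orphan is a child record whose identifier matches no parent.
    known = set(parent_ids)
    orphans = [v for v in child_ids if v not in known]
    return problems, orphans
```

For example, `check_identifiers([1, 2, 3], [1, 1, 2, 9])` reports no problems and lists `9` as an orphan, while a parent list containing two empty strings is reported as duplicated.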
Selecting unique identifiers
The source you have identified as the primary or parent source appears on the left column of the wizard. Child sources display on the right.
When you have more than two levels (i.e., a secondary source becomes the parent of another source), you can use the parent source drop-down to choose the relevant source. Similarly, if you have multiple children to link to a parent source, you can use the drop-down to choose the relevant source.
An * in either the parent or the child source columns indicates there are sources that still need to be linked. The wizard won't allow you to continue unless all unique identifiers have been defined.
- The parent source (Main) will link to the child source (Occasions) through the unique identifier LinkID. LinkID exists in the occasions file to link back to its corresponding respondent.
- The parent source (Occasions) will link to the child source (Drinks) through the unique identifier LinkID2. LinkID2 exists in the drinks file to link back to its corresponding consumption occasion.
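The two links can be sketched outside Harmoni as follows. This is an illustration under stated assumptions, not Harmoni's matching logic: the sample records, the `Age`, `Place` and `Drink` fields, and the `attach` helper are all hypothetical, and the sketch assumes that Main carries LinkID and that Occasions carries both LinkID and LinkID2.

```python
# Hypothetical sample records for the three sources.
main = [{"LinkID": 1, "Age": 34}, {"LinkID": 2, "Age": 51}]
occasions = [
    {"LinkID": 1, "LinkID2": 10, "Place": "Home"},
    {"LinkID": 1, "LinkID2": 11, "Place": "Bar"},
    {"LinkID": 2, "LinkID2": 20, "Place": "Home"},
]
drinks = [
    {"LinkID2": 10, "Drink": "Tea"},
    {"LinkID2": 11, "Drink": "Beer"},
    {"LinkID2": 11, "Drink": "Wine"},
    {"LinkID2": 20, "Drink": "Coffee"},
]


def attach(parents, children, key, name):
    """Nest each child record under the parent record that shares its identifier."""
    by_id = {parent[key]: parent for parent in parents}
    for parent in parents:
        parent[name] = []
    for child in children:
        parent = by_id.get(child[key])
        if parent is not None:  # a child with no matching parent is skipped
            parent[name].append(child)


attach(occasions, drinks, "LinkID2", "Drinks")  # Occasions is the parent of Drinks
attach(main, occasions, "LinkID", "Occasions")  # Main is the parent of Occasions

# The same data counted in each unit of count.
counts = {"People": len(main), "Occasions": len(occasions), "Drinks": len(drinks)}
drinks_per_person = {
    person["LinkID"]: sum(len(occasion["Drinks"]) for occasion in person["Occasions"])
    for person in main
}
```

Once the records are nested this way, a question such as "how many drinks did each person have?" can be answered by walking from a person through their occasions to the drinks, which is why each level needs its own identifier.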
If the identifier is unique and can be matched, the sources will append. Once the append process is complete, you will be taken back to the PROJECTS area.