Project Owner

Row-based file formats, also known as Entity-attribute-value data (EAV), organize data by record, keeping all the data associated with a record in the same row. Row based files are the traditional way of organizing data.

1. Data Files

Row-based formats have three data sets that are stored in tabular form in your media of choice, such as text files, SQL databases, AWS, Azure, GCP, NoSQL databases, to name a few.

The three data sets are:

a. Respondent table - contains the respondent IDs

b. Dictionary table - defines the attributes

c. Data table - contains the data values

This section provides information about the three data sets and general guidance about Harmoni requirements for row-based tables.

a. Respondent table

A tabular form of the respondents included in the data, listing all ids. It is important that the data is sorted.

b. Dictionary table

A tabular form of the dictionary that determines the order in which you should have the questions and pre-coded answers.

In addition, it specifies the type of the questions, for example, questions with...

numerical answers (e.g., exact age)
text answers (e.g., email address)
weight (numerical value but marked as weight)
single-response questions with pre-coded answers
multiple-response questions with pre-coded answers.

The table should be in the form:

<Question Label, Answer Label, Question Type, [Question Label Desc.], [Answer Label Desc.]>

(The last two fields are optional)

When there are no answer labels (for those questions with no pre-coded answers, for instance in the case of a question with a numerical answer), the answer label should be null or empty.

Following is a sample table illustrating a dictionary with all possible question types. In this example, question types with numerical answers (e.g., exact age), text answers (e.g., email address), weight (numerical value but marked as weight), single-response questions with pre-coded answers, and multiple-response questions with pre-coded answers, are coded as nv, tv, wt, sr, and mr, respectively. Any user-chosen codes are acceptable, so long as the codes are unique.

The dictionary table needs to be sorted by Question Label then Answer Label in the preferred order.

Question Label	Answer Label	Question Type	Question Label Description (optional)	Answer Label Description (optional)
Respondent ID (text)		tv
Q11	Male	sr	Gender	Males
Q11	Female	sr		Females
Marital Status	Married	sr
Marital Status	Widowed	sr
Marital Status	Divorced	sr
Marital Status	Separated	sr
Marital Status	Never married	sr
Weight		wt	Main Weight
Exact Age		nv
News Sources	Internet	mr	Sources Used
News Sources	Newspapers	mr	Sources Used
News Sources	News magazines	mr	Sources Used
News Sources	Television	mr	Sources Used
News Sources	Radio	mr	Sources Used

Using the two optional columns will produce different labels in Harmoni. For Q11 in the table above, here is an example of the axis and element labels, with and without the optional fields.

Row-based Dictionary table example.png

c. Data Table

A tabular form of the data, this is a long list of answers for each question asked of a respondent. The table should be in the form:

<Respondent ID, Question Label, Answer>

The table should be sorted by respondent ID then by question label.

Respondent ID	Question Label	Answer
0	Respondent ID (text)	0
1	Respondent ID (text)	1
0	Gender	Male
1	Gender	Male
0	Marital Status	Divorced
1	Marital Status	Never married

So as long as the respondent, dictionary, and data tables can be obtained, the actual data can be stored in any format the user chooses.

Data Accessibility

Typically, we envisage the data to be in a storage accessible via SQL commands. The following examples illustrate some sample SQL commands to obtain the respondent count, dictionary table, and data table.

1) caseIDs (respondent IDs)
SELECT distinct CaseIdCol FROM OurDataTable ORDER BY CaseIdCol;

2) Dictionary
SELECT DISTINCT ItemLabelCol, ValueLabelCol, ItemTypeCol, ItemLabelColDesc, ValueLabelColDesc FROM OurDictionaryTable order by ItemLabelColOrder, ElementLabelColOrder;

3) Data
SELECT DISTINCT CaseIdCol, ItemLabelCol, ValueLabelCol FROM OurDataTable ORDER BY CaseIdCol, ItemLabelCol;

Row-based Data storage.png

2. Multi-Level Linking

Continuing on with the example in the section above, the caseIDs are represented in this data as Respondent ID and determine the link to other levels.

Respondent ID	Question Label	Answer
0	Respondent ID (text)	0
1	Respondent ID (text)	1
0	Gender	Male
1	Gender	Male
0	Marital Status	Divorced
1	Marital Status	Never married

The next level dataset down (child) will have its own caseIDs. These caseIDs are represented in the data as Visit ID. The child data will link with the parent data on Respondent ID.

Visit ID	Question Label	Answer
11	Respondent ID	0
12	Respondent ID	0
13	Respondent ID	1
11	Visit ID	11
12	Visit ID	12
13	Visit ID	13
11	Location	Supermarket
12	Location	Convenience Store
13	Location	Supermarket

If more levels are required, for example a purchase level, the same logic applies. The grandchild data will link with the child data on Visit ID.

Purchase ID	Question Label	Answer
0	Visit ID	11
1	Visit ID	11
2	Visit ID	12
3	Visit ID	12
4	Visit ID	12
5	Visit ID	13
0	Purchase ID	0
1	Purchase ID	1
2	Purchase ID	2
3	Purchase ID	3
4	Purchase ID	4
5	Purchase ID	5
0	Beverage	Coffee
1	Beverage	Tea
2	Beverage	Energy Drink
3	Beverage	Flavored Milk
4	Beverage	Sports Drink
5	Beverage	Bottled Water

3. Multiple Datasets on the Same Level

It is also possible to link datasets so that they are created at the same level. As with multi-level datasets, these must be linked by caseIDs. For example, in a dataset for visit levels that is measuring transport methods, this transport method data will act as a child to the existing visit level data, and will link on Visit ID. In this file the caseIDs are 11, 12 and 13.

Visit ID	Question Label	Answer
11	Visit ID	11
12	Visit ID	12
13	Visit ID	13
11	Transport method	Bus
12	Transport method	Walk
13	Transport method	Car

4. Useful Information

Any ID that does not exist in the CaseIDs table will be ignored, even if it appears in the data table.
Any variable not found in the dictionary table will be ignored, even if it exists in the data table.
Any variable listed in the dictionary table that has no corresponding response in the data table will be output with a count of zero.

Where to from here?

Learn more about Harmoni

Row-based data

In this article

1. Data Files

a. Respondent table

b. Dictionary table

c. Data Table

Data Accessibility

2. Multi-Level Linking

3. Multiple Datasets on the Same Level

4. Useful Information

Where to from here?

In this article

1. Data Files

a. Respondent table

b. Dictionary table

c. Data Table

Data Accessibility

2. Multi-Level Linking

3. Multiple Datasets on the Same Level

4. Useful Information

Where to from here?

Related articles