Chapter 11 The ROCK standard

The ROCK standard has been developed as an open standard for qualitative data analysis. It follows the principles that also guided the development of the Markdown and YAML standards: prioritizing human-readability while retaining machine-readability. The aim of the ROCK is to provide a standard that enables researchers to exchange data and analyses in a format that is readable even without running any specific software. In other words, coded transcripts should be readable as is.

This open standard enables development of programs or scripts to perform specific functions that are not yet present in any of the existing applications that support the ROCK format. In addition, this enables all existing qualitative data analysis programs to import data files in this format and to export to this format.

In this chapter, the vocabulary explained in Chapter 10 is used to describe the ROCK standard. Qualitative data files that implement the ROCK standard can be recognized by their extension: .rock. These files normally follow the conventions set out in this chapter.

11.0.1 Sources, utterances, and sections

Sources are plain-text files that contain qualitative data. These data are segmented into the smallest codable units, which are called utterances. In the ROCK standard, utterances are separated by a newline character (“\n”), which means that every line in a source is an utterance. Often, utterances will be sentences, but not necessarily.

A source will often, but not necessarily, contain data that is somehow related. For example, different interview transcripts may be stored in different sources (i.e. different plain-text files). It is also possible to split up data over several sources, or combine data from multiple data collections in one source. A source can be seen as a logistical necessity, but sources do not have meaning: in the ROCK standard, whether two utterances are in the same source or not is not relevant.

There are two approaches to represent meaningful grouping of utterances. One is using persistent class instance identifiers, which will be discussed below. The other is using sections: sections segment the data. Section breaks occur between two utterances, separated from those utterances by newline characters. They start with three dashes and a smaller-than sign, followed by an identifier for the section break, and end with a greater-than sign and three dashes. For example, valid section breaks are ---<turn_of_talk>--- and ---<new_question>---.
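As a rough illustration of the two conventions described so far (one utterance per line, and section breaks between utterances), the following Python sketch splits a source into utterances and records where section breaks occur. It is not part of the rock package. The function name split_source, the choice to skip blank lines, and the assumption that a section identifier is any run of characters other than angle brackets are all made up for this example.

```python
import re

# Assumed pattern for a section break line, following the description above:
# three dashes, "<", an identifier, ">", three dashes. The set of characters
# allowed in the identifier is an assumption of this sketch.
SECTION_BREAK = re.compile(r"^---<([^<>]+)>---$")


def split_source(text):
    """Split a source into utterances and section breaks.

    Returns (utterances, breaks). Each entry in breaks is a tuple of the
    section identifier and the index of the utterance that follows the break.
    """
    utterances = []
    breaks = []
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are not treated as utterances
        match = SECTION_BREAK.match(line)
        if match:
            breaks.append((match.group(1), len(utterances)))
        else:
            utterances.append(line)
    return utterances, breaks


example = (
    "So what went right\n"
    "What went wrong\n"
    "---<new_question>---\n"
    "Was it a story"
)
utterances, breaks = split_source(example)
print(utterances)
print(breaks)
```

Here the break is stored as ("new_question", 2), meaning that it sits just before the third utterance.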

11.0.2 Codes

Codes are identified by code identifiers. Code identifiers are unique strings of characters (specifically, lower or uppercase Latin letters, Arabic numerals, and underscores) to represent a code in sources. These identifiers are placed in between two pairs of square brackets ([[ and ]]). Codes are designated per utterance, or in other words, per line. As many codes can be specified per line as one wishes. For example, see these two lines (utterances):

So what went right [[reflection_positive]]
What went wrong [[reflection_negative]]

The first line is coded with reflection_positive, and the second line with reflection_negative.
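Because codes are plain text between double square brackets, they can be picked up with a short regular expression. The sketch below is only an illustration, not the parser used by rock. The pattern is derived from the character set listed above (Latin letters, Arabic numerals, and underscores), and the function name parse_utterance is hypothetical.

```python
import re

# Code identifiers: letters, digits, and underscores between [[ and ]].
CODE = re.compile(r"\[\[([a-zA-Z0-9_]+)\]\]")


def parse_utterance(line):
    """Return the utterance text without its codes, plus the list of codes."""
    codes = CODE.findall(line)
    text = CODE.sub("", line).strip()
    return text, codes


text, codes = parse_utterance("So what went right [[reflection_positive]]")
print(text)   # So what went right
print(codes)  # ['reflection_positive']
```

An utterance with several codes, such as "Was it a story [[question]] [[narrative]]", yields both identifiers in the order in which they appear.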

11.0.3 Structuring inductive codes

When engaging in inductive coding (i.e. when not working with a prespecified code structure, but instead developing the code structure as one goes along; see the section on deductive coding below), a greater-than sign (“>”) can be used to represent the hierarchical structure of the codes. For example, a researcher may want to specify a parent code such as reflection with two child codes such as positive and negative. This helps to identify patterns in the data, and makes it easy to extract all utterances coded as any type of reflection. For example, see the same fragment, now coded at two levels:

So what went right [[reflection>positive]]
What went wrong [[reflection>negative]]

When this source is parsed by rock, it will recognize these codes and their structure, and it will generate the corresponding hierarchical coding structure, as illustrated in the more extensive example below.
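To make the idea of a generated coding structure concrete, here is a minimal Python sketch that builds a nested tree from codes containing the “>” marker and selects the utterances coded under a given parent. The function names build_code_tree and coded_under are invented for this example, and the nested-dictionary representation is an assumption; rock itself may represent the tree differently.

```python
import re

# Like a plain code identifier, but also allowing ">" between levels.
HIER_CODE = re.compile(r"\[\[([a-zA-Z0-9_>]+)\]\]")


def build_code_tree(lines):
    """Build a nested dict in which each key is a code and each value its children."""
    tree = {}
    for line in lines:
        for code in HIER_CODE.findall(line):
            node = tree
            for part in code.split(">"):
                node = node.setdefault(part, {})
    return tree


def coded_under(lines, parent):
    """Return the lines that carry a code with `parent` anywhere in its path."""
    return [
        line
        for line in lines
        if any(parent in code.split(">") for code in HIER_CODE.findall(line))
    ]


lines = [
    "So what went right [[reflection>positive]]",
    "What went wrong [[reflection>negative]]",
]
tree = build_code_tree(lines)
print(tree)  # {'reflection': {'positive': {}, 'negative': {}}}
print(coded_under(lines, "reflection"))
```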

11.0.4 Class instance identifiers

It is often desirable to attach specific attributes to utterances. For example, one may want to compare the patterns in codes between different categories of participants, such as those who do and do not own a car, or those who listen to progressive metal versus those who listen to psychedelic trance. Instead of coding every utterance with all relevant attributes, it is possible to specify class instance identifiers that link utterances to characteristics of the data provision. Such characteristics include the data providers (for example, participants), the moment of data collection (for example, daytime or nighttime, or winter or summer), and the location of data collection (for example, a busy place or a silent office).

By default, three types of class instance identifiers are specified: case identifiers, coder identifiers, and item identifiers. They are again specified using two pairs of square brackets, but this time, the opening brackets are immediately followed by a string of identifying characters (the class instance identifier), followed by an equals sign, and then by the unique identifier. This may seem a bit abstract; it will become clearer as we look at the first example.

11.0.4.1 Case identifiers

Case identifiers can be used to link utterances to data providers, such as participants. Their class instance identifier is cid, and by default, their full regular expression is \[\[cid=([a-zA-Z0-9_]+)\]\]. A source excerpt coded with only case identifiers may look like this:

CAIAPHAS: No, wait! We need a more permanent solution to our problem. [[cid=1]]

ANNAS: What then to do about Jesus of Nazareth? Miracle wonderman, hero of fools. [[cid=2]]

PRIEST THREE: No riots, no army, no fighting, no slogans. [[cid=3]]

CAIAPHAS: One thing I'll say for him -- Jesus is cool. [[cid=1]]

ANNAS: We dare not leave him to his own devices. His half-witted fans will get out of control. [[cid=2]]

(Note that in this example, the names of the participants were retained; normally, the researcher would anonymize the transcripts so as to allow publication of the coded transcripts.)

When rock parses this source, it will know that the first and fourth utterances belong to the same case, as do the second and fifth. The attributes specified for these cases will then be attached to these utterances (see the section about attributes below).

Class instance identifiers have a shorthand alias, which is used in the codes themselves (in this example, cid), and a longer version, which for case identifiers, is caseId. This longer version is used when specifying the attributes (see the section below).
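The default regular expression for case identifiers given above can be tried out directly. The following Python sketch groups utterances by case; the function name group_by_case is hypothetical and the approach is only an illustration of how the identifiers link utterances to cases.

```python
import re
from collections import defaultdict

# Default regular expression for case identifiers, as given in this chapter.
CASE_ID = re.compile(r"\[\[cid=([a-zA-Z0-9_]+)\]\]")


def group_by_case(lines):
    """Map each case identifier to the utterances that carry it."""
    by_case = defaultdict(list)
    for line in lines:
        match = CASE_ID.search(line)
        if match:
            by_case[match.group(1)].append(CASE_ID.sub("", line).strip())
    return dict(by_case)


lines = [
    "CAIAPHAS: No, wait! We need a more permanent solution to our problem. [[cid=1]]",
    "ANNAS: What then to do about Jesus of Nazareth? [[cid=2]]",
    "CAIAPHAS: One thing I'll say for him -- Jesus is cool. [[cid=1]]",
]
cases = group_by_case(lines)
print(sorted(cases))    # ['1', '2']
print(len(cases["1"]))  # 2
```

Note that the identifiers come out as strings ("1", not 1), which matters when they are later matched against attributes.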

11.0.4.2 Stanza identifiers

A stanza is a unit of analysis in Epistemic Network Analysis (ENA; see the glossary for the exact definition).

11.0.5 Specifying deductive coding structures

When a researcher works with a prespecified coding structure (i.e. engages in deductive coding), they only use codes that were determined a priori. Like in inductive coding, there are often multiple levels in such a coding structure, with the codes organised hierarchically. To efficiently be able to collapse codes to higher levels, rock needs to know the deductive coding structure. This can be specified using YAML fragments in the sources. YAML fragments are, by default, delimited by two lines that each contain only three dashes (---). Between those delimiters, YAML (a recursive acronym that stands for ‘YAML ain’t markup language’) can be specified. Specifically, in YAML terminology, each fragment should be a sequence of mappings that is named codes.
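The delimiter convention can be sketched in a few lines of Python. This is an assumption-laden illustration rather than rock's own implementation: the function name extract_yaml_fragments is made up, and the sketch only returns the raw text between pairs of delimiter lines. Turning that text into data would still require a YAML parser (for instance yaml.safe_load from the PyYAML package, which is not part of the Python standard library).

```python
def extract_yaml_fragments(text):
    """Return the text between each pair of lines that contain only '---'."""
    fragments = []
    current = None  # None while outside a fragment, a list of lines inside one
    for line in text.split("\n"):
        if line.strip() == "---":
            if current is None:
                current = []                            # opening delimiter
            else:
                fragments.append("\n".join(current))    # closing delimiter
                current = None
        elif current is not None:
            current.append(line)
    return fragments


source = (
    "Some utterance\n"
    "---\n"
    "codes:\n"
    "  -\n"
    "    id: reflection\n"
    "---\n"
    "Another utterance"
)
fragments = extract_yaml_fragments(source)
print(fragments[0])
```

Because only lines consisting of exactly three dashes count as delimiters, lines that merely begin with three dashes are left alone by this sketch.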

The code tree specified in the section on inductive coding, for example, can be efficiently specified as a deductive coding structure like this:

---
codes:
  -
    id: reflection
    children:
      -
        id: positive
      -
        id: negative
---

If all children of a code are so-called ‘leaves’ (i.e. in the code tree, they have no children of their own), they can be specified more efficiently:

---
codes:
  -
    id: reflection
    children: ["positive", "negative"]
---

When rock parses the sources, it will collect all such code specifications and combine them into one coding tree using each code’s identifier. It is possible to specify a parent that is defined in another code specification fragment by adding the field parentId. For example, in another source, we could add this fragment:

---
codes:
  -
    id: neutral
    parentId: reflection
---

This would add neutral as a sibling to positive and negative.
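How such fragments might be combined can be sketched as follows. The example assumes the YAML fragments have already been parsed into Python dictionaries and lists (for example by a YAML library), and it handles the three forms shown above: nested children, the shorthand list of leaf identifiers, and parentId. The function names collect_codes and children_of are hypothetical, and the child-to-parent mapping is just one possible representation of the tree.

```python
def collect_codes(fragments):
    """Flatten parsed code fragments into a mapping from code id to parent id.

    Root codes are mapped to None.
    """
    parent_of = {}

    def visit(spec, parent):
        if isinstance(spec, str):  # shorthand leaf, e.g. "positive"
            spec = {"id": spec}
        code_id = spec["id"]
        # An explicit parentId takes precedence over the position in the tree.
        parent_of[code_id] = spec.get("parentId", parent)
        for child in spec.get("children", []):
            visit(child, code_id)

    for fragment in fragments:
        for spec in fragment["codes"]:
            visit(spec, None)
    return parent_of


def children_of(parent_of, code_id):
    """Return the identifiers of the direct children of a code, sorted by name."""
    return sorted(c for c, p in parent_of.items() if p == code_id)


# The two fragments from this section, written as already-parsed data.
fragments = [
    {"codes": [{"id": "reflection", "children": ["positive", "negative"]}]},
    {"codes": [{"id": "neutral", "parentId": "reflection"}]},
]
parent_of = collect_codes(fragments)
print(children_of(parent_of, "reflection"))  # ['negative', 'neutral', 'positive']
```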

11.0.6 Specifying attributes

Attributes are also specified using YAML fragments in one or more sources. These fragments have to start with ROCK_attributes, and have to contain the long version of the class instance identifiers. By default, the long version of the case identifier is “caseId” (the shorthand alias, used when coding, is “cid”). Each of the attributes that is specified will appear in the qualitative data table as a column.

---
ROCK_attributes:
  -
    caseId: 1
    hair_color: grey
    age: 50
  -
    caseId: 2
    hair_color: brown
    age: 40
  -
    caseId: 3
    hair_color: red
    age: 45
---
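The way attributes end up as columns can be illustrated with a small join on the case identifier. The sketch below assumes the ROCK_attributes fragment has already been parsed into a list of dictionaries and that utterances have already been paired with their case identifiers. The function name attach_attributes and the row layout are assumptions made for this example, not the actual structure of the qualitative data table that rock produces.

```python
# The attributes from the fragment above, as already-parsed data.
attributes = [
    {"caseId": 1, "hair_color": "grey", "age": 50},
    {"caseId": 2, "hair_color": "brown", "age": 40},
    {"caseId": 3, "hair_color": "red", "age": 45},
]

# Utterances paired with the case identifier found in their [[cid=...]] code.
coded_utterances = [
    ("No, wait! We need a more permanent solution to our problem.", "1"),
    ("What then to do about Jesus of Nazareth?", "2"),
]


def attach_attributes(coded_utterances, attributes):
    """Return one row per utterance, with the attributes of its case as columns."""
    # A YAML parser would typically read `caseId: 1` as an integer, while
    # identifiers taken from the source text are strings, so compare as strings.
    by_case = {str(a["caseId"]): a for a in attributes}
    rows = []
    for text, case_id in coded_utterances:
        row = {"utterance": text, "caseId": case_id}
        extra = by_case.get(case_id, {})
        row.update({k: v for k, v in extra.items() if k != "caseId"})
        rows.append(row)
    return rows


rows = attach_attributes(coded_utterances, attributes)
print(rows[0]["hair_color"])  # grey
print(rows[1]["age"])         # 40
```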

11.1 Examples

11.1.1 Section breaks

So what went right
What went wrong
---<paragraph-break>---
Was it a story
or was it a song
---<paragraph-break>---
Was it over night
Or did it take you long
---<paragraph-break>---
Was knowing your weakness
what made you strong

Source excerpt as example of section breaks (lyrics from Smiley Faces by Gnarls Barkley)

11.1.2 Identifiers

CAIAPHAS
No, wait!   We need a more permanent solution to our problem.

ANNAS
What then to do about Jesus of Nazareth?   Miracle wonderman, hero of fools.

PRIEST THREE
No riots, no army, no fighting, no slogans.

CAIAPHAS
One thing I'll say for him -- Jesus is cool.

ANNAS
We dare not leave him to his own devices.   His half-witted fans will get out of control.

PRIESTS
But how can we stop him?   His glamour increases by leaps every moment; he's top of the poll.

CAIAPHAS
I see bad things arising.   The crowd crown him king; which the Romans would ban.
I see blood and destruction,   Our elimination because of one man.   Blood and destruction because of one man.

ALL (inside)
Because, because, because of one man.

CAIAPHAS
Our elimination because of one man.

ALL (inside)
Because, because, because of one, 'cause of one, 'cause of one man.

PRIEST THREE
What then to do about this Jesus-mania?

ANNAS
How do we deal with a carpenter king?

PRIESTS
Where do we start with a man who is bigger   Than John was when John did his baptism thing?

CAIAPHAS
Fools, you have no perception!   The stakes we are gambling are frighteningly high!
We must crush him completely,   So like John before him, this Jesus must die.   For the sake of the nation, this Jesus must die.

This Jesus Must Die by Andrew Lloyd Webber
