stype (pronounced stipe) is an R package for statistical data types. It depends heavily upon the vctrs package.
The stype package provides classes that enforce (run-time) safety for types common to many statistical analyses, such as v_binary, v_continuous, v_count, v_nominal, and v_event_time. For example, binary data can be represented in R in at least three ways: a logical, a factor with two levels, or a numeric using just 0 and 1. Which representation should one use? The latter two do not guarantee that certain binary operations are closed in a mathematical sense; e.g., c(0, 1, 0, 1) + 1:4 returns c(1, 3, 3, 5). Such behavior is not possible with v_binary. Similarly, count data can be represented by an integer in R, but without the restriction of being non-negative. The v_count constructor enforces non-negativity.
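The closure problem described above can be reproduced in base R alone. The sketch below does not call any stype function; is_binary and check_count are hypothetical helpers written only for this illustration, and they are not a description of how v_binary or v_count are implemented.

```r
# Base R only; no stype functions are called here.

# A 0/1 numeric vector stops being binary after ordinary arithmetic.
x_num <- c(0, 1, 0, 1)
shifted <- x_num + 1:4
print(shifted)  # [1] 1 3 3 5

# Hypothetical helper: TRUE when every value is 0 or 1.
is_binary <- function(x) all(x %in% c(0, 1))
is_binary(x_num)    # TRUE
is_binary(shifted)  # FALSE

# An integer vector happily stores a negative "count".
counts <- c(3L, 0L, -2L)
is.integer(counts)  # TRUE

# Hypothetical run-time check in the spirit of a validating constructor.
check_count <- function(x) {
  stopifnot(is.integer(x), all(x >= 0L, na.rm = TRUE))
  x
}

ok  <- check_count(c(3L, 0L, 7L))
bad <- tryCatch(check_count(counts), error = function(e) "rejected")
```

A validating constructor moves this kind of check to the point where the vector is created, so downstream code does not have to repeat it.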
Each stype object contains two attributes that users may find useful: context and data_summary. A context can be used to specify project-specific metadata. It is an S4 object containing slots such as short_label, long_label, description, security_type, tags, and purpose. A purpose, for example, can be used to define a variable’s role in a study design such as “outcome”, “identifier”, “covariate”, or “exposure”. This kind of contextual information is invaluable in data pipelines.
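To make the idea of an S4 metadata object concrete, here is a minimal sketch of a class with the slots named above. The class name demo_context is hypothetical and this is not stype's actual definition: the slot types are all assumed to be character for simplicity, and the real package may use richer types for some of them.

```r
library(methods)

# Hypothetical stand-in for a context class; NOT stype's actual definition.
# All slot types are assumptions made for this illustration.
setClass(
  "demo_context",
  representation(
    short_label   = "character",
    long_label    = "character",
    description   = "character",
    security_type = "character",
    tags          = "character",
    purpose       = "character"
  )
)

ctx <- new(
  "demo_context",
  short_label   = "sbp",
  long_label    = "Systolic blood pressure",
  description   = "Measured at the baseline visit",
  security_type = "restricted",
  tags          = c("vitals", "baseline"),
  purpose       = "covariate"
)

# Slots are read with @; a pipeline could branch on the variable's role.
ctx@purpose
is_covariate <- identical(ctx@purpose, "covariate")
```

Keeping the role of a variable next to the data means that a modeling step can, for instance, select every covariate without relying on column-name conventions.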
A stype vector also contains a data_summary object, which is automatically generated and contains summary statistics about the data. All objects contain the following statistics:
- n: the number of observations
- has_missing: an indicator of whether the variable has missing data
- n_nonmissing: the number of nonmissing values
- n_missing: the number of missing values
- proportion_missing: the proportion of values that are missing
- is_constant: an indicator of whether all the values are the same

Each type has additional summary statistics relevant to its data. For example, the data_summary for v_continuous contains the mean, standard deviation, min, max, and various quantiles. The data_summary is updated whenever a variable is subset or two vectors of the type are combined.
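The common statistics listed above are simple to compute by hand, which may help clarify what each one means. The function summarize_common below is a hypothetical base R sketch, not the code stype uses. In particular, treating is_constant as a property of the non-missing values only is an assumption.

```r
# Hypothetical re-implementation of the common summary statistics.
# Not stype's code; is_constant ignoring NA is an assumption.
summarize_common <- function(x) {
  n         <- length(x)
  n_missing <- sum(is.na(x))
  observed  <- x[!is.na(x)]
  list(
    n                  = n,
    has_missing        = n_missing > 0,
    n_nonmissing       = n - n_missing,
    n_missing          = n_missing,
    proportion_missing = if (n > 0) n_missing / n else NA_real_,
    is_constant        = length(unique(observed)) <= 1
  )
}

s <- summarize_common(c(1.5, 2.0, NA, 2.0))
s$n                   # 4
s$n_missing           # 1
s$proportion_missing  # 0.25
s$is_constant         # FALSE

# Subsetting changes the answer, which is why a stored summary
# has to be refreshed whenever the vector is subset or combined.
s_sub <- summarize_common(c(1.5, 2.0, NA, 2.0)[c(2, 4)])
s_sub$is_constant     # TRUE
```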
The package also prints certain attributes, for example:
> stype::v_binary(c(TRUE, FALSE, TRUE))
<binary[3]>
[1] 1 0 1
Proportion = 0.667
> stype::v_binary(c(TRUE, FALSE, TRUE, NA))
<binary[4]>
[1] 1 0 1 NA
Proportion = 0.667; Missing = 1.000