Title: | Composite Indicator Construction and Analysis |
---|---|
Description: | A comprehensive high-level package for composite indicator construction and analysis. It is a "development environment" for composite indicators and scoreboards, which includes utilities for construction (indicator selection, denomination, imputation, data treatment, normalisation, weighting and aggregation) and analysis (multivariate analysis, correlation plotting, shortcuts for principal component analysis, global sensitivity analysis, and more). A composite indicator is completely encapsulated inside a single hierarchical list called a "coin". This allows a fast and efficient workflow, as well as making quick copies, testing methodological variations and making comparisons. It also includes many plotting options, both statistical (scatter plots, distribution plots) and for presenting results. |
Authors: | William Becker [aut, cre, cph] |
Maintainer: | William Becker <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.1.14 |
Built: | 2025-01-28 11:33:07 UTC |
Source: | https://github.com/bluefoxr/coinr |
Weighted arithmetic mean of a vector. NA values are skipped by default. The vector of weights w
is relative, since the formula is:
y = sum(w * x) / sum(w)
a_amean(x, w)
x |
A numeric vector. |
w |
A vector of numeric weights of the same length as |
If x contains NAs, these x values and the corresponding w values are removed before applying the
formula above.
The weighted mean as a scalar value
x <- c(1:10)
w <- c(10:1)
a_amean(x, w)
Aggregates a data frame of indicator values into a single column using the Copeland method.
This function calls outrankMatrix()
.
a_copeland(X, w = NULL)
X |
A numeric data frame or matrix of indicator data, with observations as rows and indicators as columns. No other columns should be present (e.g. label columns). |
w |
A numeric vector of weights, which should have length equal to |
The outranking matrix is transformed as follows:
values > 0.5 are replaced by 1
values < 0.5 are replaced by -1
values == 0.5 are replaced by 0
the diagonal of the matrix is all zeros
The Copeland scores are calculated as the row sums of this transformed matrix.
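As an illustrative sketch (not the package's internal code), the transformation and scoring steps described above could be written as follows, where M is an outranking matrix with entries in [0, 1], e.g. as produced by outrankMatrix():
# sketch only: convert an outranking matrix M into Copeland scores
copeland_scores <- function(M){
  C <- matrix(0, nrow(M), ncol(M))
  C[M > 0.5] <- 1
  C[M < 0.5] <- -1
  diag(C) <- 0   # entries equal to 0.5, and the diagonal, remain zero
  rowSums(C)     # Copeland scores
}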
This function replaces the now-defunct copeland()
from COINr < v1.0.
Numeric vector of Copeland scores.
# some example data
ind_data <- COINr::ASEM_iData[12:16]
# aggregate with equal weights (w = NULL)
a_copeland(ind_data)
Weighted generalised mean of a vector. NA
are skipped by default.
a_genmean(x, w = NULL, p)
x |
A numeric vector of positive values. |
w |
A vector of weights, which should have length equal to |
p |
Coefficient - see details. |
The generalised mean is calculated as:
y = ( sum(w * x^p) / sum(w) )^(1/p)
where p is a coefficient specified in the function argument here. Note that:
For negative p, all x values must be positive.
Setting p = 0 will result in an error because the exponent 1/p is undefined. This case
is equivalent to the geometric mean in the limit, so use a_gmean()
instead.
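As a quick illustrative check (not from the package documentation) of the special cases mentioned above:
# p = 1 recovers the weighted arithmetic mean, p = -1 the weighted harmonic mean
x <- 1:10
w <- rep(1, 10)
all.equal(a_genmean(x, w, p = 1), a_amean(x, w))
all.equal(a_genmean(x, w, p = -1), a_hmean(x, w))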
Weighted harmonic mean, as a numeric value.
# a vector of values
x <- 1:10
# a vector of weights
w <- runif(10)
# generalised (power) mean with p = 2
a_genmean(x, w, p = 2)
Weighted geometric mean of a vector. NA
are skipped by default.
a_gmean(x, w = NULL)
x |
A numeric vector of positive values. |
w |
A vector of weights, which should have length equal to |
This function replaces the now-defunct geoMean()
from COINr < v1.0.
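For reference, the weighted geometric mean is conventionally defined as below; this is the textbook formula and is assumed, not verified here, to match the internal calculation:
# weighted geometric mean (textbook definition)
exp(sum(w * log(x)) / sum(w))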
The geometric mean, as a numeric value.
# a vector of values
x <- 1:10
# a vector of weights
w <- runif(10)
# weighted geometric mean
a_gmean(x, w)
Weighted harmonic mean of a vector. NA
are skipped by default.
a_hmean(x, w = NULL)
x |
A numeric vector of positive values. |
w |
A vector of weights, which should have length equal to |
This function replaces the now-defunct harMean()
from COINr < v1.0.
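For reference, the weighted harmonic mean is conventionally defined as below (textbook formula, assumed to match the internal calculation):
# weighted harmonic mean (textbook definition)
sum(w) / sum(w / x)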
Weighted harmonic mean, as a numeric value.
# a vector of values
x <- 1:10
# a vector of weights
w <- runif(10)
# weighted harmonic mean
a_hmean(x, w)
Methods for aggregating numeric vectors, data frames, coins and purses. See individual method documentation for more details:
Aggregate(x, ...)
x |
Object to be aggregated |
... |
Further arguments to be passed to methods. |
An object similar to the input
# see individual method documentation
Aggregates a named data set specified by dset
using aggregation function(s) f_ag
, weights w
, and optional
function parameters f_ag_para
. Note that COINr has a number of aggregation functions built in,
all of which are of the form a_*()
, e.g. a_amean()
, a_gmean()
and friends.
## S3 method for class 'coin' Aggregate( x, dset, f_ag = NULL, w = NULL, f_ag_para = NULL, dat_thresh = NULL, by_df = FALSE, out2 = "coin", write_to = NULL, ... )
x |
A coin class object. |
dset |
The name of the data set to apply the function to, which should be accessible in |
f_ag |
The name of an aggregation function, a string. This can either be a single string naming
a function to use for all aggregation levels, or else a character vector of function names of length |
w |
An optional data frame of weights. If |
f_ag_para |
Optional parameters to pass to |
dat_thresh |
An optional data availability threshold, specified as a number between 0 and 1. If a row
within an aggregation group has data availability lower than this threshold, the aggregated value for that row will be
|
by_df |
Controls whether to send a numeric vector to |
out2 |
Either |
write_to |
If specified, writes the aggregated data to |
... |
arguments passed to or from other methods. |
When by_df = FALSE
, aggregation is performed row-wise using the function f_ag
, such that for each row x_row
, the output is
f_ag(x_row, f_ag_para)
, and for the whole data frame, it outputs a numeric vector. Otherwise if by_df = TRUE
,
the entire data frame of each indicator group is passed to f_ag
.
The function f_ag
must be supplied as a string, e.g. "a_amean"
, and it must take as a minimum an input
x
which is either a numeric vector (if by_df = FALSE
), or a data frame (if by_df = TRUE
). In the former
case f_ag
should return a single numeric value (i.e. the result of aggregating x
), or in the latter case
a numeric vector (the result of aggregating the whole data frame in one go).
Weights are passed to the function f_ag
as an argument named w
. This means that the function should have
arguments that look like f_ag(x, w, ...)
, where ...
are possibly other input arguments to the function. If the
aggregation function doesn't use weights, you can set w = "none"
, and no weights will be passed to it.
f_ag
can optionally have other parameters, apart from x
and w
, specified as a list in f_ag_para
.
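As a sketch based on the requirements above, a hypothetical unweighted aggregation function (a_median is not part of COINr) could be passed by name, with weights switched off via w = "none":
# hypothetical custom aggregation function: unweighted median of each row
a_median <- function(x){
  stats::median(x, na.rm = TRUE)
}
coin <- build_example_coin(up_to = "Normalise", quietly = TRUE)
coin <- Aggregate(coin, dset = "Normalised", f_ag = "a_median", w = "none")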
The aggregation specifications can be set to be different for each level of aggregation: the arguments f_ag
,
f_ag_para
, dat_thresh
, w
and by_df
can all be optionally specified as vectors or lists of length n-1, where
n is the number of levels in the index. In this case, the first value in each vector/list will be used for the first
round of aggregation, i.e. from indicators to the aggregates at level 2. The next will be used to aggregate from
level 2 to level 3, and so on.
When different functions are used for different levels, it is important to get the list syntax correct. For example, in a case with
three aggregations using different functions, say we want to use a_amean()
for the first two levels, then a custom
function f_cust()
for the last. f_cust()
has some additional parameters a
and b
. In this case, we would specify e.g.
f_ag_para = list(NULL, NULL, list(a = 2, b = 3))
- this is because a_amean()
requires no additional parameters, so
we pass NULL
.
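Putting this together, a sketch of the three-aggregation case described above might look as follows. This assumes a four-level index such as the built-in example coin with a normalised data set, and f_cust (with extra parameters a and b) is the hypothetical custom function from the text, defined trivially here just to make the template runnable:
# hypothetical custom function with extra parameters a and b
f_cust <- function(x, w, a, b){
  a * a_amean(x, w) + b
}
coin <- Aggregate(coin, dset = "Normalised",
                  f_ag = c("a_amean", "a_amean", "f_cust"),
                  f_ag_para = list(NULL, NULL, list(a = 2, b = 3)))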
Note that COINr has a number of aggregation functions built in,
all of which are of the form a_*()
, e.g. a_amean()
, a_gmean()
and friends. To see a list browse COINr functions alphabetically or
type a_
in the R Studio console and press the tab key (after loading COINr), or see the online documentation.
Optionally, a data availability threshold can be assigned below which the aggregated value will return
NA
(see dat_thresh
argument). If by_df = TRUE
, this will however be ignored because aggregation is not
done on individual rows. Note that more complex constraints could be built into f_ag
if needed.
An updated coin with aggregated data set added at .$Data[[write_to]]
if out2 = "coin"
,
else if out2 = "df"
outputs the aggregated data set as a data frame.
# build example up to normalised data set
coin <- build_example_coin(up_to = "Normalise")
# aggregate normalised data set
coin <- Aggregate(coin, dset = "Normalised")
Aggregates a data frame into a single column using a specified function. Note that COINr has a number of aggregation functions built in,
all of which are of the form a_*()
, e.g. a_amean()
, a_gmean()
and friends.
## S3 method for class 'data.frame' Aggregate( x, f_ag = NULL, f_ag_para = NULL, dat_thresh = NULL, by_df = FALSE, ... )
x |
Data frame to be aggregated |
f_ag |
The name of an aggregation function, as a string. |
f_ag_para |
Any additional parameters to pass to |
dat_thresh |
An optional data availability threshold, specified as a number between 0 and 1. If a row
of |
by_df |
Controls whether to send a numeric vector to |
... |
arguments passed to or from other methods. |
Aggregation is performed row-wise using the function f_ag
, such that for each row x_row
, the output is
f_ag(x_row, f_ag_para)
, and for the whole data frame, it outputs a numeric vector. The data frame x
must
only contain numeric columns.
The function f_ag
must be supplied as a string, e.g. "a_amean"
, and it must take as a minimum an input
x
which is either a numeric vector (if by_df = FALSE
), or a data frame (if by_df = TRUE
). In the former
case f_ag
should return a single numeric value (i.e. the result of aggregating x
), or in the latter case
a numeric vector (the result of aggregating the whole data frame in one go).
f_ag
can optionally have other parameters, e.g. weights, specified as a list in f_ag_para
.
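As a sketch of passing extra parameters, a hypothetical trimmed-mean aggregation function (a_trim is not part of COINr) could have its trim argument supplied via f_ag_para; this assumes, per the requirements above, that a function without a w argument is acceptable here:
a_trim <- function(x, trim = 0){
  mean(x, trim = trim, na.rm = TRUE)
}
X <- ASEM_iData[12:15]
y <- Aggregate(X, f_ag = "a_trim", f_ag_para = list(trim = 0.1))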
Note that COINr has a number of aggregation functions built in,
all of which are of the form a_*()
, e.g. a_amean()
, a_gmean()
and friends. To see a list browse COINr functions alphabetically or
type a_
in the R Studio console and press the tab key (after loading COINr), or see the online documentation.
Optionally, a data availability threshold can be assigned below which the aggregated value will return
NA
(see dat_thresh
argument). If by_df = TRUE
, this will however be ignored because aggregation is not
done on individual rows. Note that more complex constraints could be built into f_ag
if needed.
A numeric vector
# get some indicator data - take a few columns from built in data set
X <- ASEM_iData[12:15]
# normalise to avoid zeros - min max between 1 and 100
X <- Normalise(X, global_specs = list(f_n = "n_minmax",
                                      f_n_para = list(l_u = c(1,100))))
# aggregate using harmonic mean, with some weights
y <- Aggregate(X, f_ag = "a_hmean", f_ag_para = list(w = c(1, 1, 2, 1)))
Aggregates indicators following the structure specified in iMeta
, for each coin inside the purse.
See Aggregate.coin()
, which is applied to each coin, for more information.
## S3 method for class 'purse' Aggregate( x, dset, f_ag = NULL, w = NULL, f_ag_para = NULL, dat_thresh = NULL, write_to = NULL, by_df = FALSE, ... )
x |
A purse-class object |
dset |
The name of the data set to apply the function to, which should be accessible in |
f_ag |
The name of an aggregation function, a string. This can either be a single string naming
a function to use for all aggregation levels, or else a character vector of function names of length |
w |
An optional data frame of weights. If |
f_ag_para |
Optional parameters to pass to |
dat_thresh |
An optional data availability threshold, specified as a number between 0 and 1. If a row
within an aggregation group has data availability lower than this threshold, the aggregated value for that row will be
|
write_to |
If specified, writes the aggregated data to |
by_df |
Controls whether to send a numeric vector to |
... |
arguments passed to or from other methods. |
An updated purse with new treated data sets added at .$Data[[write_to]]
in each coin.
# build example purse up to normalised data set
purse <- build_example_purse(up_to = "Normalise", quietly = TRUE)
# aggregate using defaults
purse <- Aggregate(purse, dset = "Normalised")
Given a numeric data frame Y
with rows indexed by a time vector tt
, interpolates at time values
specified by the vector tt_est
. If tt_est
is not in tt
, will create new rows in the data frame
corresponding to these interpolated points.
approx_df(Y, tt, tt_est = NULL, ...)
Y |
A data frame with all numeric columns |
tt |
A time vector with length equal to |
tt_est |
A time vector of points to interpolate in |
... |
Further arguments to pass to |
This is a wrapper for stats::approx()
, with some differences. In the first place, stats::approx()
is
applied to each column of Y
, using tt
each time as the corresponding time vector indexing Y
. Interpolated
values are generated at points specified in tt_est
but these are appended to the existing data (whereas
stats::approx()
will only return the interpolated points and nothing else). Further arguments to
stats::approx()
can be passed using the ...
argument.
A list with:
.$tt
the vector of time points, including time values of interpolated points
.$Y
the corresponding interpolated data frame
Both outputs are sorted by tt
.
# a time vector
tt <- 2011:2020
# two random vectors with some missing values
y1 <- runif(10)
y2 <- runif(10)
y1[2] <- y1[5] <- NA
y2[3] <- y2[5] <- NA
# make into df
Y <- data.frame(y1, y2)
# interpolate for time = 2012
Y_int <- approx_df(Y, tt, 2012)
Y_int$Y
# notice Y_int$y2 is unchanged since at 2012 it did not have NA value
stopifnot(identical(Y_int$Y$y2, y2))
# interpolate at value not in tt
approx_df(Y, tt, 2015.5)
This is an "old format" "COIN" object which is stored for testing purposes.
It is generated using the COINr6 package (only available on GitHub) using
COINr6::build_ASEM().
ASEM_COIN
A "COIN" class object
https://github.com/bluefoxr/COINr6
A data set containing raw values of indicators for 51 countries, groups and denominators. See the ASEM Portal
for further information and detailed description of each indicator. See also vignette("coins")
for the format
of this data.
ASEM_iData
A data frame with 51 rows and 60 variables.
This data set is in the new v1.0 format.
https://composite-indicators.jrc.ec.europa.eu/asem-sustainable-connectivity/repository
This is an artificially-generated set of panel data (multiple observations of indicators over time) that is included to build the example "purse" class, i.e. to build composite indicators over time. This will eventually be replaced with a better example, i.e. a real data set.
ASEM_iData_p
A data frame with 255 rows and 60 variables.
This data set is in the new v1.0 format.
https://composite-indicators.jrc.ec.europa.eu/asem-sustainable-connectivity/repository
This contains all metadata for ASEM indicators, including names, weights, directions, etc. See the ASEM Portal
for further information and detailed description of each indicator.
See also vignette("coins")
for the format
of this data.
ASEM_iMeta
A data frame with 68 rows and 9 variables
This data set is in the new v1.0 format.
https://bluefoxr.github.io/COINrDoc/coins-the-currency-of-coinr.html#aggregation-metadata
Simple Box Cox, with no optimisation of lambda.
boxcox(x, lambda, makepos = TRUE, na.rm = FALSE)
x |
A vector or column of data to transform |
lambda |
The lambda parameter of the Box Cox transform |
makepos |
If |
na.rm |
If |
This function replaces the now-defunct BoxCox()
from COINr < v1.0.
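For reference, the standard Box Cox transform that this function is based on is sketched below (textbook definition; the exact handling of makepos and na.rm is as described in the arguments above and is not reproduced here):
# standard Box Cox transform for positive x
boxcox_ref <- function(x, lambda){
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}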
A vector of length length(x)
with transformed values.
# example data
x <- runif(30)
# Apply Box Cox
xBox <- boxcox(x, lambda = 2)
# plot one against the other
plot(x, xBox)
Shortcut function to build the ASEM example coin, using inbuilt example data. This can be useful for testing and also
for building reproducible examples. To see the underlying commands run edit(build_example_coin)
. See also
vignette("coins")
.
build_example_coin(up_to = NULL, quietly = FALSE)
up_to |
The point up to which to build the index. If |
quietly |
If |
This function replaces the now-defunct build_ASEM()
from COINr < v1.0.
coin class object
# build example coin up to data treatment step
coin <- build_example_coin(up_to = "Treat")
coin
Shortcut function to build an example purse. This is currently an "artificial" example, in that it takes the ASEM data set
used in build_example_coin()
and replicates it for five years, adding artificial noise to simulate year-on-year variation.
This was done simply to demonstrate the functionality of purses, and will at some point be replaced with a real example.
See also vignette("coins")
.
build_example_purse(up_to = NULL, quietly = FALSE)
up_to |
The point up to which to build the index. If |
quietly |
If |
purse class object
# build example purse up to unit screening step
purse <- build_example_purse(up_to = "Screen")
purse
Given a variable y
indexed by a time vector x
, calculates the compound annual growth rate. Note that CAGR assumes
that the values in x refer to years. Also, it is calculated using only the first and latest observed values.
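For reference, the standard compound annual growth rate formula implied by the description above is sketched below (an assumption that the first and latest non-missing observations are the ones used):
# first and latest observed (non-NA) values
i1 <- min(which(!is.na(y)))
i2 <- max(which(!is.na(y)))
# CAGR as a fractional growth rate per year
(y[i2] / y[i1])^(1 / (x[i2] - x[i1])) - 1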
CAGR(y, x)
y |
A numeric vector |
x |
A numeric vector of the same length as |
A scalar value (CAGR)
# random points over 10 years
x <- 2011:2020
y <- runif(10)
CAGR(y, x)
A shortcut function to add and remove indicators. This will make the relevant changes
and recalculate the index if asked. Adding and removing is done relative to the current set of
indicators used in calculating the index results. Any indicators that are added must of course be
present in the original iData
and iMeta
that were input to new_coin()
.
change_ind(coin, add = NULL, drop = NULL, regen = FALSE)
coin |
coin object |
add |
A character vector of indicator codes to add (must be present in the original input data) |
drop |
A character vector of indicator codes to remove (must be present in the original input data) |
regen |
Logical (default): if |
See also vignette("adjustments")
.
This function replaces the now-defunct indChange()
from COINr < v1.0.
An updated coin, with regenerated results if regen = TRUE
.
# build full example coin
coin <- build_example_coin(quietly = TRUE)
# remove two indicators and regenerate the coin
coin_remove <- change_ind(coin, drop = c("LPI", "Forest"), regen = TRUE)
coin_remove
Checks the format of iData
input to new_coin()
. This check must be passed to successfully build a new
coin.
check_iData(iData, quietly = FALSE)
iData |
A data frame of indicator data. |
quietly |
Set |
The restrictions on iData
are not extensive. It should be a data frame with only one required column
uCode
which gives the code assigned to each unit (alphanumeric, not starting with a number). All other
columns are defined by corresponding entries in iMeta
, with the following special exceptions:
Time
is an optional column which allows panel data to be input, consisting of e.g. multiple rows for
each uCode
: one for each Time
value. This can be used to split a set of panel data into multiple coins
(a so-called "purse") which can be input to COINr functions. See new_coin()
for more details.
uName
is an optional column which specifies a longer name for each unit. If this column is not included,
unit codes (uCode
) will be used as unit names where required.
No column names should contain blank spaces.
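A minimal iData table consistent with the rules above might look like this (hypothetical unit and indicator codes, for illustration only):
iData_min <- data.frame(
  uCode = c("AUT", "BEL", "CHE"),
  Ind1  = c(1.2, 3.4, 2.5),
  Ind2  = c(10, 20, 15)
)
check_iData(iData_min)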
Message if everything ok, else error messages.
check_iData(ASEM_iData)
Checks the format of iMeta
input to new_coin()
. This performs a series of thorough checks to make sure
that iMeta
agrees with the specifications. This also includes checks to make sure the structure makes
sense, there are no duplicates, and other things. iMeta
must pass this check to build a new coin.
check_iMeta(iMeta, quietly = FALSE)
iMeta |
A data frame of indicator metadata. See details. |
quietly |
Set |
Required columns for iMeta
are:
Level
: Level in aggregation, where 1 is indicator level, 2 is the level resulting from aggregating
indicators, 3 is the result of aggregating level 2, and so on. Set to NA
for entries that are not included
in the index (groups, denominators, etc).
iCode
: Indicator code, alphanumeric. Must not start with a number or contain blank spaces.
Parent
: Group (iCode
) to which indicator/aggregate belongs in level immediately above.
Each entry here should also be found in iCode
. Set to NA
only
for the highest (Index) level (no parent), or for entries that are not included
in the index (groups, denominators, etc).
Direction
: Numeric, either -1 or 1
Weight
: Numeric weight, will be rescaled to sum to 1 within aggregation group. Set to NA
for entries that are not included
in the index (groups, denominators, etc).
Type
: The type, corresponding to iCode
. Can be either Indicator
, Aggregate
, Group
, Denominator
,
or Other
.
Optional columns that are recognised in certain functions are:
iName
: Name of the indicator: a longer name which is used in some plotting functions.
Unit
: the unit of the indicator, e.g. USD, thousands, score, etc. Used in some plots if available.
Target
: a target for the indicator. Used if normalisation type is distance-to-target.
The iMeta
data frame essentially gives details about each of the columns found in iData
, as well as
details about additional data columns eventually created by aggregating indicators. This means that the
entries in iMeta
must include all columns in iData
, except the three special column names: uCode
,
uName
, and Time
. In other words, all column names of iData
should appear in iMeta$iCode
, except
the three special cases mentioned. The iName
column optionally can be used to give longer names to each indicator
which can be used for display in plots.
iMeta
also specifies the structure of the index, by specifying the parent of each indicator and aggregate.
The Parent
column must refer to entries that can be found in iCode
. Try View(ASEM_iMeta)
for an example
of how this works.
Level
is the "vertical" level in the hierarchy, where 1 is the bottom level (indicators), and each successive
level is created by aggregating the level below according to its specified groups.
Direction
is set to 1 if higher values of the indicator should result in higher values of the index, and
-1 in the opposite case.
The Type
column specifies the type of the entry: Indicator
should be used for indicators at level 1.
Aggregate
for aggregates created by aggregating indicators or other aggregates. Otherwise set to Group
if the variable is not used for building the index but instead is for defining groups of units. Set to
Denominator
if the variable is to be used for scaling (denominating) other indicators. Finally, set to
Other
if the variable should be ignored but passed through. Any other entries here will cause an error.
Note: this function requires the columns above as specified, but extra columns can also be added without causing errors.
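A minimal iMeta table consistent with the column descriptions above might look like this (hypothetical codes; two indicators aggregated directly into an index; illustrative only and may need adjustment for a real index):
iMeta_min <- data.frame(
  Level     = c(1, 1, 2),
  iCode     = c("Ind1", "Ind2", "Index"),
  Parent    = c("Index", "Index", NA),
  Direction = c(1, -1, 1),
  Weight    = c(1, 1, 1),
  Type      = c("Indicator", "Indicator", "Aggregate")
)
check_iMeta(iMeta_min)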
Message if everything ok, else error messages.
check_iMeta(ASEM_iMeta)
Logical test: if abs(skewness) < skew_thresh
OR kurtosis < kurt_thresh
, returns TRUE
, else FALSE
check_SkewKurt(x, na.rm = FALSE, skew_thresh = 2, kurt_thresh = 3.5)
x |
A numeric vector. |
na.rm |
Set |
skew_thresh |
A threshold for absolute skewness (positive). Default 2. |
kurt_thresh |
A threshold for kurtosis. Default 3.5. |
A list with .$Pass
is a Logical, where TRUE
is pass, FALSE
is fail, and .$Details
is a
sub-list with skew and kurtosis values.
set.seed(100)
x <- runif(20)
# this passes
check_SkewKurt(x)
# if we add an outlier, doesn't pass
check_SkewKurt(c(x, 1000))
Converts an older COIN class to the newer coin class. Note that there are some limitations to this. First,
the function arguments used to create the COIN will not be passed to the coin, since the function arguments
are different. This means that any data sets beyond "Raw" cannot be regenerated. The second limitation is
that anything from the .$Analysis
folder will not be passed on.
COIN_to_coin(COIN, recover_dsets = FALSE, out2 = "coin")
COIN |
A COIN class object, generated by COINr version <= 0.6.1, OR a list containing IndData, IndMeta and AggMeta entries. |
recover_dsets |
Logical: if |
out2 |
If |
This function works by building the iData
and iMeta
arguments to new_coin()
, using information from
the COIN. It then uses these to build a coin if out2 = "coin"
or else outputs both data frames in a list.
If recover_dsets = TRUE
, any data sets found in COIN$Data
(except "Raw") will also be put in coin$Data
,
in the correct format. These can be used to inspect the data but not to regenerate.
Note that if you want to exclude any indicators, you will have to set out2 = "list"
and build the coin
in a separate step with exclude
specified. Any exclusions/inclusions from the COIN are not passed on
automatically.
A coin class object if out2 = "coin"
, else a list of data frames if out2 = "list"
.
# see vignette("other_functions")
# see vignette("other_functions")
Compares two coin class objects using a specified iCode
(column of data) from specified data sets.
compare_coins( coin1, coin2, dset, iCode, also_get = NULL, compare_by = "ranks", sort_by = NULL, decreasing = FALSE )
coin1 |
A coin class object |
coin2 |
A coin class object |
dset |
A data set that is found in |
iCode |
The name of a column that is found in |
also_get |
Optional metadata columns to attach to the table: see |
compare_by |
Either |
sort_by |
Optionally, a column name of the output data frame to sort rows by. Can be either
|
decreasing |
Argument to pass to |
This function replaces the now-defunct compTable()
from COINr < v1.0.
A data frame of comparison information.
# build full example coin
coin <- build_example_coin(quietly = TRUE)
# copy coin
coin2 <- coin
# change to prank function (percentile ranks)
# we don't need to specify any additional parameters (f_n_para) here
coin2$Log$Normalise$global_specs <- list(f_n = "n_prank")
# regenerate
coin2 <- Regen(coin2)
# compare index, sort by absolute rank difference
compare_coins(coin, coin2, dset = "Aggregated", iCode = "Index",
              sort_by = "Abs.diff", decreasing = TRUE)
Given two coins, this function returns the correlation between the two coins,
for target datset dset
and target indicator code(s) iCodes
. Correlation
is calculated as the Pearson correlation coefficient, but if compare_by = "Ranks"
then this is the correlation coefficient of the ranks, which amounts to the
Spearman rank correlation. Set compare_by = "Scores"
to return the Pearson
correlation between scores.
compare_coins_corr(coin1, coin2, dset, iCodes, compare_by = "ranks")
coin1 |
A coin |
coin2 |
A coin, with possibly alternative methodology. This should share at
least two units in common with |
dset |
Target data set, must be present in both |
iCodes |
Character vector of indicator codes to correlate between the two coins. |
compare_by |
Either |
A list containing a correlation table and a list of comparison data frames.
# build example
coin <- build_example_coin()
# copy coin
coin2 <- coin
# change to prank function (percentile ranks)
# we don't need to specify any additional parameters (f_n_para) here
coin2$Log$Normalise$global_specs <- list(f_n = "n_prank")
# regenerate
coin2 <- Regen(coin2)
# iCodes to compare: all at level 3 and 4
iCodes <- coin$Meta$Ind$iCode[which(coin$Meta$Ind$Level > 2)]
# get rank correlations between the two coins
l_comp <- compare_coins_corr(coin, coin2, dset = "Aggregated", iCodes = iCodes)
# see df
l_comp$df_corr
Given multiple coins as a list, generates a rank comparison of a single indicator or aggregate which is specified
by the dset
and iCode
arguments (passed to get_data()
). The indicator or aggregate targeted must be available
in all the coins in coins
.
compare_coins_multi( coins, dset, iCode, also_get = NULL, tabtype = "Values", ibase = 1, sort_table = TRUE, compare_by = "ranks" )
coins |
A list of coins. If names are provided, these will be used in the tables returned by this function. |
dset |
The name of a data set found in |
iCode |
A column name of the data set targeted by |
also_get |
Optional metadata columns to attach to the table: see |
tabtype |
The type of table to generate. One of:
|
ibase |
The index of the coin to use as a base comparison (default first coin in list) |
sort_table |
If TRUE, sorts by the base COIN ( |
compare_by |
Either |
By default, the ranks of the target indicator/aggregate of each coin will be merged using the uCode
s within each coin.
Optionally, specifying also_get
(passed to get_data()
) will additionally merge using the metadata columns.
This means that coins must share the same metadata columns that are returned as a result of also_get
.
This function replaces the now-defunct compTableMulti()
from COINr < v1.0.
Data frame unless tabtype = "All"
, in which case a list of three data frames is returned.
# see vignette("adjustments")
# see vignette("adjustments")
A custom function for comparing two data frames of indicator data, to see whether they match up, at a specified number of
significant figures. Specifically, this is intended to compare two data frames, without regard to row or column ordering.
Rows are matched by the required matchcol
argument. Hence, it is different from e.g. all.equal()
which requires rows
to be ordered. In COINr, typically matchcol
is the uCode
column, for example.
compare_df(df1, df2, matchcol, sigfigs = 5)
df1 |
A data frame |
df2 |
Another data frame |
matchcol |
A common column name that is used to match row order. E.g. this might be |
sigfigs |
The number of significant figures to use for matching numerical columns |
This function compares numerical and non-numerical columns to see if they match. Rows and columns can be in any order. The function performs the following checks:
Checks that the two data frames are the same size
Checks that column names are the same, and that the matching column has the same entries
Checks column by column that the elements are the same, after sorting according to the matching column
It then summarises for each column whether there are any differences, and also what the differences are, if any.
This is intended to cross-check results. For example, if you run something in COINr and want to check indicator results against external calculations.
This function replaces the now-defunct compareDF()
from COINr < v1.0.
A list with comparison results. List contains:
.$Same
: overall summary: if TRUE
the data frames are the same according to the rules specified, otherwise FALSE
.
.$Details
: details of each column as a data frame. Each row summarises a column of the data frame, saying whether
the column is the same as its equivalent, and the number of differences, if any. In case the two data frames have differing
numbers of columns and rows, or have differing column names or entries in matchcol
, .$Details
will simply contain a
message to this effect.
.$Differences
: a list with one entry for every column which contains different entries. Differences are summarised as
a data frame with one row for each difference, reporting the value from df1
and its equivalent from df2
.
# take a sample of indicator data (including the uCode column)
data1 <- ASEM_iData[c(2,12:15)]
# copy the data
data2 <- data1
# make a change: replace one value in data2 by NA
data2[1,2] <- NA
# compare data frames
compare_df(data1, data2, matchcol = "uCode")
Allows a custom data operation on coins or purses.
Custom(x, ...)
x |
Object to be operated on (coin or purse) |
... |
arguments passed to or from other methods. |
Modified object.
Custom operation on a coin. This is an experimental new feature so please check the results carefully.
## S3 method for class 'coin' Custom( x, dset, f_cust, f_cust_para = NULL, write_to = NULL, write2log = TRUE, ... )
x |
A coin |
dset |
Target data set |
f_cust |
Function to apply to the data set. See details. |
f_cust_para |
Optional additional parameters to pass to the function defined
by |
write_to |
Name of data set to write to |
write2log |
Logical: whether or not to write to the log. |
... |
Arguments to pass to/from other methods. |
In this function, the data set named dset
is extracted from the coin using
get_dset(coin, dset)
. It is passed to the function f_cust
, which is required
to return an equivalent but modified data frame, which is then written as a new
data set with name write_to
. This is intended to allow arbitrary operations
on coin data sets while staying within the COINr framework, which means that if
Regen()
is used, these operations will be re-run, allowing them to be included
in things like sensitivity analysis.
The format of f_cust
is important. It must be a function whose first argument
is called x
: this will be the argument that the data is passed to. The data will
be in the same format as extracted via get_dset(coin, dset)
, which means it will
have a uCode
column. f_cust
can have other arguments which are passed
to it via f_cust_para
. The function should return a data frame similar to the data
that was passed to it. It must have the same column names (meaning you can't
remove indicators), but otherwise is flexible - this means some caution is necessary
to ensure that subsequent operations don't fail. Be careful, for example, to ensure
that there are no duplicates in uCode
, and that indicator columns are numeric.
The function assigned to f_cust
is passed to base::do.call()
, therefore it can
be passed either as a string naming the function, or as the function itself. Depending
on the context, the latter option may be preferable because this stores the function
within the coin, which makes it portable. Otherwise, if the function is simply
named as a string, you must make sure it is available to access in the environment.
A coin
# build example coin coin <- build_example_coin(up_to = "new_coin") # create function - replaces suspected unreliable point with NA f_NA <- function(x){ x[3, 10] <- NA; return(x)} # call function from Custom() coin <- Custom(coin, dset = "Raw", f_cust = f_NA) stopifnot(is.na(coin$Data$Custom[3,10]))
# build example coin coin <- build_example_coin(up_to = "new_coin") # create function - replaces suspected unreliable point with NA f_NA <- function(x){ x[3, 10] <- NA; return(x)} # call function from Custom() coin <- Custom(coin, dset = "Raw", f_cust = f_NA) stopifnot(is.na(coin$Data$Custom[3,10]))
Custom operation on a purse. This is an experimental new feature.
## S3 method for class 'purse' Custom( x, dset, f_cust, f_cust_para = NULL, global = FALSE, write_to = NULL, ... )
x |
A purse object |
dset |
The data set to apply the operation to. |
f_cust |
Function to apply to the data set. See details. |
f_cust_para |
Optional additional parameters to pass to the function defined
by |
global |
Logical: if |
write_to |
Name of data set to write to |
... |
Arguments to pass to/from other methods. |
In this function, the data set named dset
is extracted from the coin using
get_dset(purse, dset)
. It is passed to the function f_cust
, which is required
to return an equivalent but modified data frame, which is then written as a new
data set with name write_to
. This is intended to allow arbitrary operations
on coin data sets while staying within the COINr framework, which means that if
Regen()
is used, these operations will be re-run, allowing them to be included
in things like sensitivity analysis.
The format of f_cust
is important. It must be a function whose first argument
is called x
: this will be the argument that the data is passed to. The data will
be in the same format as extracted via get_dset(purse, dset)
, which means it will
have uCode
and Time
columns. f_cust
can have other arguments which are passed
to it via f_cust_para
. The function should return a data frame similar to the data
that was passed to it. It must have the same column names (meaning you can't
remove indicators), but otherwise is flexible - this means some caution is necessary
to ensure that subsequent operations don't fail. Be careful, for example, to ensure
that there are no duplicates in uCode
, and that indicator columns are numeric.
The function assigned to f_cust
is passed to base::do.call()
, therefore it can
be passed either as a string naming the function, or as the function itself. Depending
on the context, the latter option may be preferable because this stores the function
within the coin, which makes it portable. Otherwise, if the function is simply
named as a string, you must make sure it is available to access in the environment.
An updated purse.
# build example purse
purse <- build_example_purse(up_to = "new_coin")
# custom function - set points before 2020 to NA for BEL in FDI due to a
# break in the series
f_cust <- function(x){x[(x$uCode == "BEL") & (x$Time < 2020), "FDI"] <- NA; return(x)}
# apply the custom operation to the "Raw" data set
purse <- Custom(purse, dset = "Raw", f_cust = f_cust)
"Denominates" or "scales" variables by other variables. Typically this is done by dividing extensive variables such as GDP by a scaling variable such as population, to give an intensive variable (GDP per capita).
Denominate(x, ...)
x |
Object to be denominated |
... |
arguments passed to or from other methods |
See documentation for individual methods:
This function replaces the now-defunct denominate()
from COINr < v1.0.
See individual method documentation
# See individual method documentation
"Denominates" or "scales" indicators by other variables. Typically this is done by dividing extensive variables such as GDP by a scaling variable such as population, to give an intensive variable (GDP per capita).
## S3 method for class 'coin' Denominate( x, dset, denoms = NULL, denomby = NULL, denoms_ID = NULL, f_denom = NULL, write_to = NULL, out2 = "coin", ... )
x |
A coin class object |
dset |
The name of the data set to apply the function to, which should be accessible in |
denoms |
An optional data frame of denominator data. Columns should be denominator data, with column names corresponding
to entries in |
denomby |
Optional data frame which specifies which denominators to use for each indicator, and any scaling factors
to apply. Should have columns |
denoms_ID |
An ID column for matching |
f_denom |
A function which takes two numeric vector arguments and is used to perform the denomination for each
column. By default, this is division, i.e. |
write_to |
If specified, writes the aggregated data to |
out2 |
Either |
... |
arguments passed to or from other methods |
This function denominates a data set dset
inside the coin. By default, denominating variables are taken from
the coin, specifically as variables in iData
with Type = "Denominator"
in iMeta
(input to new_coin()
).
Specifications to map denominators to indicators are also taken by default from iMeta$Denominator
, if it exists.
These specifications can be overridden using the denoms
and denomby
arguments. The operator for denomination
can also be changed using the f_denom
argument.
See also documentation for Denominate.data.frame()
which is called by this method.
An updated coin if out2 = "coin"
, else a data frame of denominated data if out2 = "df"
.
# build example coin coin <- build_example_coin(up_to = "new_coin", quietly = TRUE) # denominate (here, we only need to say which dset to use, takes # specs and denominators from within the coin) coin <- Denominate(coin, dset = "Raw")
# build example coin coin <- build_example_coin(up_to = "new_coin", quietly = TRUE) # denominate (here, we only need to say which dset to use, takes # specs and denominators from within the coin) coin <- Denominate(coin, dset = "Raw")
"Denominates" or "scales" variables by other variables. Typically this is done by dividing extensive variables such as GDP by a scaling variable such as population, to give an intensive variable (GDP per capita).
## S3 method for class 'data.frame' Denominate( x, denoms, denomby, x_ID = NULL, denoms_ID = NULL, f_denom = NULL, ... )
x |
A data frame of data to be denominated. Columns to be denominated must be numeric, but any columns not
specified in |
denoms |
A data frame of denominator data. Columns should be denominator data, with column names corresponding
to entries in |
denomby |
A data frame which specifies which denominators to use for each indicator, and any scaling factors
to apply. Should have columns |
x_ID |
A column name of |
denoms_ID |
A column name of |
f_denom |
A function which takes two numeric vector arguments and is used to perform the denomination for each
column. By default, this is division, i.e. |
... |
arguments passed to or from other methods. |
A data frame x
is denominated by variables found in another data frame denoms
, according to specifications in
denomby
. denomby
specifies which columns in x
are to be denominated, and by which columns in denoms
, and
any scaling factors to apply to each denomination.
Both x
and denomby
must contain an ID column which matches the rows of x
to denomby
. If not specified, this
is assumed to be uCode
, but can also be specified using the x_ID
and denoms_ID
arguments. All entries in
x[[x_ID]]
must be present in denoms[[denoms_ID]]
, although extra rows are allowed in denoms
. This is because
the rows of x
are matched to the rows of denoms
using these ID columns, to ensure that units (rows) are correctly
denominated.
By default, columns of x
are divided by columns of denoms
. This can be generalised by setting f_denom
to another
function which takes two numeric vector arguments. For example, setting f_denom = `*` will multiply columns of x and denoms together.
A data frame of the same size as x
, with any specified columns denominated according to specifications.
See also: WorldDenoms, a data set of some common national-level denominators.
# Get a sample of indicator data (note must be indicators plus a "uCode" column)
iData <- ASEM_iData[c("uCode", "Goods", "Flights", "LPI")]
# Also get some denominator data
denoms <- ASEM_iData[c("uCode", "GDP", "Population")]
# specify how to denominate
denomby <- data.frame(iCode = c("Goods", "Flights"),
                      Denominator = c("GDP", "Population"),
                      ScaleFactor = c(1, 1000))
# Denominate one by the other
iData_den <- Denominate(iData, denoms, denomby)
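As a follow-up sketch, the f_denom argument described above can change the operator, for example multiplying instead of dividing (reusing the iData, denoms and denomby objects from the example above):
iData_mult <- Denominate(iData, denoms, denomby, f_denom = `*`)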
This works in almost exactly the same way as Denominate.coin()
. The only point of care is that the
denoms
argument here cannot take time-indexed data, but only a single value for each unit. It is
therefore recommended to pass the time-dependent denominator data as part of iData
when calling
new_coin()
. In this way, denominators can vary with time. See vignette("denomination")
.
## S3 method for class 'purse' Denominate( x, dset, denoms = NULL, denomby = NULL, denoms_ID = NULL, f_denom = NULL, write_to = NULL, ... )
x |
A purse class object |
dset |
The name of the data set to apply the function to, which should be accessible in |
denoms |
An optional data frame of denominator data. Columns should be denominator data, with column names corresponding
to entries in |
denomby |
Optional data frame which specifies which denominators to use for each indicator, and any scaling factors
to apply. Should have columns |
denoms_ID |
An ID column for matching |
f_denom |
A function which takes two numeric vector arguments and is used to perform the denomination for each
column. By default, this is division, i.e. |
write_to |
If specified, writes the aggregated data to |
... |
arguments passed to or from other methods. |
An updated purse
# build example purse purse <- build_example_purse(up_to = "new_coin", quietly = TRUE) # denominate using data/specs already included in coin purse <- Denominate(purse, dset = "Raw")
# build example purse purse <- build_example_purse(up_to = "new_coin", quietly = TRUE) # denominate using data/specs already included in coin purse <- Denominate(purse, dset = "Raw")
Writes coins and purses to Excel. See individual method documentation:
export_to_excel(x, fname, ...)
x |
A coin or purse |
fname |
The file name to write to |
... |
Arguments passed to/from methods |
This function replaces the now-defunct coin2Excel()
from COINr < v1.0.
An Excel spreadsheet.
# see individual method documentation
Exports the contents of the coin to Excel. This writes all data frames inside the coin to Excel, with each data frame on a separate tab. Tabs are named according to the position in the coin object. You can write other data frames by simply attaching them to the coin object somewhere.
## S3 method for class 'coin' export_to_excel(x, fname = "coin_export.xlsx", include_log = FALSE, ...)
x |
A coin class object |
fname |
The file name/path to write to, as a character string |
include_log |
Logical: if |
... |
arguments passed to or from other methods. |
.xlsx file at specified path
## Here we write a COIN to Excel, but this is done to a temporary directory
## to avoid "polluting" the working directory when running automatic tests.
## In a real case, set fname to a directory of your choice.
# build example coin up to data treatment step
coin <- build_example_coin(up_to = "Treat")
# write to Excel in temporary directory
export_to_excel(coin, fname = paste0(tempdir(), "\\ASEM_results.xlsx"))
# spreadsheet is at:
print(paste0(tempdir(), "\\ASEM_results.xlsx"))
# now delete temporary file to keep things tidy in testing
unlink(paste0(tempdir(),"\\ASEM_results.xlsx"))
Exports the contents of the purse to Excel. This is similar to the coin method export_to_excel.coin()
,
but combines data sets from various time points. It also selectively writes metadata since this may be
spread across multiple coins.
## S3 method for class 'purse' export_to_excel(x, fname = "coin_export.xlsx", include_log = FALSE, ...)
x |
A purse class object |
fname |
The file name/path to write to, as a character string |
include_log |
Logical: if |
... |
arguments passed to or from other methods. |
.xlsx file at specified path
#
Helper function for getting correlations between indicators and aggregates. This retrieves subsets of correlation
matrices between different aggregation levels, in different formats. By default, it will return a
long-form data frame, unless make_long = FALSE
. By default, any correlations with a p-value less than 0.05 are
replaced with NA
. See pval
argument to adjust this.
get_corr( coin, dset, iCodes = NULL, Levels = NULL, ..., cortype = "pearson", pval = 0.05, withparent = FALSE, grouplev = NULL, make_long = TRUE, use_directions = FALSE )
coin |
A coin class coin object |
dset |
The name of the data set to apply the function to, which should be accessible in |
iCodes |
An optional list of character vectors where the first entry specifies the indicator/aggregate
codes to correlate against the second entry (also a specification of indicator/aggregate codes). If this is specified as a character vector
it will be coerced to the first entry of a list, i.e. |
Levels |
The aggregation levels to take the two groups of indicators from. See |
... |
Further arguments to be passed to |
cortype |
The type of correlation to calculate, either |
pval |
The significance level for including correlations. Correlations with |
withparent |
If |
grouplev |
The aggregation level to group correlations by if |
make_long |
Logical: if |
use_directions |
Logical: if |
This function allows you to obtain correlations between any subset of indicators or aggregates, from
any data set present in a coin. Indicator selection is performed using get_data()
. Two different
indicator sets can be correlated against each other by specifying iCodes
and Levels
as vectors.
The correlation type can be specified by the cortype
argument, which is passed to stats::cor()
.
The withparent
argument will optionally only return correlations which correspond to the structure
of the index. For example, if Levels = c(1,2)
(i.e. we wish to correlate indicators from Level 1 with
aggregates from Level 2), and we set withparent = TRUE
, only the correlations between each indicator
and its parent group will be returned (not correlations between indicators and other aggregates to which
it does not belong). This can be useful to check whether correlations of an indicator/aggregate with
any of its parent groups exceeds or falls below thresholds.
Similarly, the grouplev
argument can be used to restrict correlations to within groups corresponding
to the index structure. Setting e.g. grouplev = 2
will only return correlations within the groups
defined at Level 2.
The grouplev
and withparent
options are disabled if make_long = FALSE
.
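As a usage sketch of the withparent option described above (assuming a fully built example coin, so that level-2 aggregate scores exist in the "Aggregated" data set, and assuming that leaving iCodes unspecified selects all codes at each level):
coin_full <- build_example_coin(quietly = TRUE)
cmat_parents <- get_corr(coin_full, dset = "Aggregated",
                         Levels = c(1, 2), withparent = TRUE)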
Note that this function can only calculate correlations within the same data set (i.e. using only one data set in .$Data).
This function replaces the now-defunct getCorr()
from COINr < v1.0.
A data frame of pairwise correlation values in wide or long format (see make_long
).
Correlations which fail the significance check specified by pval will be returned as NA.
plot_corr()
Plot correlation matrices of indicator subsets
# build example coin coin <- build_example_coin(up_to = "new_coin", quietly = TRUE) # get correlations cmat <- get_corr(coin, dset = "Raw", iCodes = list("Environ"), Levels = 1, make_long = FALSE)
# build example coin coin <- build_example_coin(up_to = "new_coin", quietly = TRUE) # get correlations cmat <- get_corr(coin, dset = "Raw", iCodes = list("Environ"), Levels = 1, make_long = FALSE)
This returns a data frame of any highly correlated indicators within the same aggregation group. The level of the aggregation
grouping can be controlled by the grouplev
argument.
get_corr_flags( coin, dset, cor_thresh = 0.9, thresh_type = "high", cortype = "pearson", grouplev = NULL, roundto = 3, use_directions = FALSE )
coin |
A coin class object |
dset |
The name of the data set to apply the function to, which should be accessible in |
cor_thresh |
A threshold to flag high correlation. Default 0.9. |
thresh_type |
Either |
cortype |
The type of correlation, either |
grouplev |
The level to group indicators in. E.g. if |
roundto |
Number of decimal places to round correlations to. Default 3. Set |
use_directions |
Logical: if |
This function is motivated by the idea that having very highly-correlated indicators within the same group may amount to double counting, or possibly redundancy in the framework.
This function replaces the now-defunct hicorrSP()
from COINr < v1.0.
A data frame with one entry for every indicator pair that is highly correlated within the same group, at the specified level. Pairs are only reported once, i.e. only uses the upper triangle of the correlation matrix.
# build example coin
coin <- build_example_coin(up_to = "Normalise", quietly = TRUE)

# get correlations between indicators over 0.75 within level 2 groups
get_corr_flags(coin, dset = "Normalised", cor_thresh = 0.75, thresh_type = "high", grouplev = 2)
Calculates Cronbach's alpha, a measure of statistical reliability. Cronbach's alpha is a simple measure
of "consistency" of a data set, where a high value implies higher reliability/consistency. The
selection of indicators via get_data()
allows the measure to be calculated on any group of
indicators or aggregates.
get_cronbach(coin, dset, iCodes, Level, ..., use = "pairwise.complete.obs")
coin |
A coin or a data frame containing only numerical columns of data. |
dset |
The name of the data set to apply the function to, which should be accessible in |
iCodes |
Indicator codes to retrieve. If |
Level |
The level in the hierarchy to extract data from. See |
... |
Further arguments passed to |
use |
Argument to pass to stats::cor to calculate the covariance matrix. Default |
This function simply returns Cronbach's alpha. If you want a lot more details on reliability, the 'psych' package has a much more detailed analysis.
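For reference, a minimal sketch of the standard formula behind Cronbach's alpha, computed directly from a covariance matrix (the indicator subset below is an assumption taken from the bundled example data; this is not the package's internal code):

# alpha = k/(k-1) * (1 - sum of item variances / variance of the total)
X <- COINr::ASEM_iData[c("LPI", "Flights", "Lang", "Forest")]  # hypothetical subset
C <- cov(X, use = "pairwise.complete.obs")
k <- ncol(X)
alpha <- (k / (k - 1)) * (1 - sum(diag(C)) / sum(C))
alpha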
This function replaces the now-defunct getCronbach()
from COINr < v1.0.
Cronbach alpha as a numerical value.
# build example coin
coin <- build_example_coin(up_to = "new_coin", quietly = TRUE)

# Cronbach's alpha for the "P2P" group
get_cronbach(coin, dset = "Raw", iCodes = "P2P", Level = 1)
A helper function to retrieve a named data set from coin or purse objects. See individual method documentation:
get_data(x, ...)
x |
A coin or purse |
... |
Arguments passed to methods |
This function replaces the now-defunct getIn()
from COINr < v1.0.
Data frame of indicator data, indexed also by time if input is a purse.
# see individual method documentation
Generic function for getting the data availability of each unit (row).
get_data_avail(x, ...)
x |
Either a coin or a data frame |
... |
Arguments passed to other methods |
See method documentation:
See also vignettes: vignette("analysis")
and vignette("imputation")
.
Returns a list of data frames: the data availability of each unit (row) in a given data set, as well as percentage of zeros. A second data frame gives data availability by aggregation (indicator) groups.
## S3 method for class 'coin'
get_data_avail(x, dset, out2 = "coin", ...)
x |
A coin |
dset |
String indicating name of data set in |
out2 |
Either |
... |
arguments passed to or from other methods. |
This function ignores any non-numeric columns, and returns a data availability table of numeric columns with non-numeric columns appended at the beginning.
See also vignettes: vignette("analysis")
and vignette("imputation")
.
An updated coin with data availability tables written in .$Analysis[[dset]]
, or a
list of data availability tables.
# build example coin
coin <- build_example_coin(up_to = "new_coin", quietly = TRUE)

# get data availability of Raw dset
l_dat <- get_data_avail(coin, dset = "Raw", out2 = "list")
head(l_dat$Summary, 5)
Returns a data frame of the data availability of each unit (row), as well as percentage of zeros. This function ignores any non-numeric columns, and returns a data availability table with non-numeric columns appended at the beginning.
## S3 method for class 'data.frame'
get_data_avail(x, ...)
x |
A data frame |
... |
arguments passed to or from other methods. |
See also vignettes: vignette("analysis")
and vignette("imputation")
.
A data frame of data availability statistics for each column of x
.
# data availability of "airquality" data set
get_data_avail(airquality)
A flexible function for retrieving data from a coin, from a specified data set. Subsets of data can
be returned based on selection of columns, using the iCodes
and Level
arguments, and by filtering
rowwise using the uCodes
and use_group
arguments. The also_get
argument also allows unit metadata
columns to be attached, such as names, groups, and denominators.
## S3 method for class 'coin'
get_data(x, dset, iCodes = NULL, Level = NULL, uCodes = NULL, use_group = NULL, also_get = NULL, ...)
x |
A coin class object |
dset |
The name of the data set to apply the function to, which should be accessible in |
iCodes |
Optional indicator codes to retrieve. If |
Level |
Optionally, the level in the hierarchy to extract data from. See details. |
uCodes |
Optional unit codes to filter rows of the resulting data set. Can also be used in conjunction with groups. See details. |
use_group |
Optional group to filter rows of the data set. Specified as |
also_get |
A character vector specifying any columns to attach to the data set that are not
indicators or aggregates. These will be e.g. |
... |
arguments passed to or from other methods. |
The iCodes
argument can be used to directly select named indicators, i.e. setting iCodes = c("a", "b")
will select indicators "a" and "b", attaching any extra columns specified by also_get
. However,
using this in conjunction with the Level
argument returns named groups of indicators. For example,
setting iCodes = "Group1"
(for e.g. an aggregation group in Level 2) and Level = 1
will return
all indicators in Level 1, belonging to "Group1".
Rows can also be subsetted. The uCodes
argument can be used to select specified units in the same
way as iCodes
. Additionally, the use_group
argument filters to specified groups. If uCodes
is
specified, and use_group
refers to a named group column, then it will return all units in the
groups that the uCodes
belong to. This is useful for putting a unit into context with its peers
based on some grouping variable.
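For example, a hypothetical sketch of putting one unit in context with its group peers (assuming "Flights" is an indicator code, "ESP" a unit code and "GDP_group" a group column in the example metadata):

coin <- build_example_coin(up_to = "new_coin", quietly = TRUE)

# returns "Flights" for all units in the same GDP group(s) as Spain
x <- get_data(coin, dset = "Raw", iCodes = "Flights",
              uCodes = "ESP", use_group = "GDP_group")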
Note that if you want to retrieve a whole data set (with no column/row subsetting), use the
get_dset()
function which should be slightly faster.
A data frame of indicator data according to specifications.
# build full example coin
coin <- build_example_coin(up_to = "new_coin", quietly = TRUE)

# get all indicators in "Political" group
x <- get_data(coin, dset = "Raw", iCodes = "Political", Level = 1)
head(x, 5)

# see vignette("data_selection") for more examples
This retrieves data from a purse. It functions in a similar way to get_data.coin()
but has the
additional Time
argument to allow selection based on the point(s) in time.
## S3 method for class 'purse'
get_data(x, dset, iCodes = NULL, Level = NULL, uCodes = NULL, use_group = NULL, Time = NULL, also_get = NULL, ...)
x |
A purse class object |
dset |
The name of the data set to apply the function to, which should be accessible in |
iCodes |
Optional indicator codes to retrieve. If |
Level |
Optionally, the level in the hierarchy to extract data from. See details. |
uCodes |
Optional unit codes to filter rows of the resulting data set. Can also be used in conjunction with groups. See details. |
use_group |
Optional group to filter rows of the data set. Specified as |
Time |
Optional time index to extract from a subset of the coins present in the purse. Should be a
vector containing one or more entries in |
also_get |
A character vector specifying any columns to attach to the data set that are not
indicators or aggregates. These will be e.g. |
... |
arguments passed to or from other methods. |
A data frame of indicator data indexed by a "Time" column.
# build full example purse
purse <- build_example_purse(up_to = "new_coin", quietly = TRUE)

# get specified indicators for specific years, for specified units
get_data(purse, dset = "Raw", iCodes = c("Lang", "Forest"),
         uCodes = c("AUT", "CHN", "DNK"), Time = c(2019, 2020))
Get a data frame containing any correlations between indicators and denominators that exceed a given threshold. This can be useful when it is not obvious whether to denominate an indicator, and by what. If an indicator is strongly correlated with a denominator, this may suggest denominating it by that denominator.
get_denom_corr( coin, dset, cor_thresh = 0.6, cortype = "pearson", nround = 2, use_directions = FALSE )
coin |
A coin class object. |
dset |
The name of the data set to apply the function to, which should be accessible in |
cor_thresh |
A correlation threshold: the absolute value of any correlations between indicator-denominator pairs above this threshold will be flagged. |
cortype |
The type of correlation: to be passed to the |
nround |
Optional number of decimal places to round correlation values to. Default 2, set |
use_directions |
Logical: if |
A data frame of pairwise correlations that exceed the threshold.
# build example coin
coin <- build_example_coin(up_to = "new_coin", quietly = TRUE)

# get correlations > 0.7 of any indicator with denominators
get_denom_corr(coin, dset = "Raw", cor_thresh = 0.7)
A helper function to retrieve a named data set from coin or purse objects. See individual documentation on:
get_dset(x, dset, ...)
x |
A coin or purse |
dset |
A character string corresponding to a named data set within |
... |
arguments passed to or from other methods. |
Data frame of indicator data, indexed also by time if input is a purse.
# see examples for methods
A helper function to retrieve a named data set from the coin object. Also performs input checks at the same time.
## S3 method for class 'coin'
get_dset(x, dset, also_get = NULL, ...)
x |
A coin class object |
dset |
A character string corresponding to a named data set within |
also_get |
A character vector specifying any columns to attach to the data set that are not
indicators or aggregates. These will be e.g. |
... |
arguments passed to or from other methods. |
If also_get
is not specified, this will return the indicator columns with the uCode
identifiers
in the first column. Optionally, also_get
can be specified to attach other metadata columns, or
to only return the numeric (indicator) columns with no identifiers. This latter option might be useful
for e.g. examining correlations.
Data frame of indicator data.
# build example coin, just up to raw dset for speed
coin <- build_example_coin(up_to = "new_coin", quietly = TRUE)

# retrieve raw data set with added cols
get_dset(coin, dset = "Raw", also_get = c("uName", "GDP_group"))
A helper function to retrieve a named data set from a purse object. Retrieves the specified data set
from each coin in the purse and joins them together in a single data frame using rbind()
, indexed
with a Time
column.
## S3 method for class 'purse'
get_dset(x, dset, Time = NULL, also_get = NULL, ...)
x |
A purse class object |
dset |
A character string corresponding to a named data set within each coin |
Time |
Optional time index to extract from a subset of the coins present in the purse. Should be a
vector containing one or more entries in |
also_get |
A character vector specifying any columns to attach to the data set that are not
indicators or aggregates. These will be e.g. |
... |
arguments passed to or from other methods. |
Data frame of indicator data.
# build example purse
purse <- build_example_purse(up_to = "new_coin", quietly = TRUE)

# get raw data set
df1 <- get_dset(purse, dset = "Raw")
Calculates the "effective weight" of each indicator and aggregate at the index level. The effective weight is calculated
as the final weight of each component in the index, and this is due not just to its own weight, but also to the weights of
each aggregation that it is involved in, plus the number of indicators/aggregates in each group. The effective weight
is one way of understanding the final contribution of each indicator to the index. See also vignette("weights")
.
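As a minimal arithmetic sketch of the idea (a hypothetical two-level structure; not the package's internal code):

# an indicator with weight 1 in a group of three equally-weighted indicators,
# whose parent group has weight 2 out of a total of 4 at the level above
w_within_group <- 1 / (1 + 1 + 1)   # the indicator's share of its own group
w_group_in_index <- 2 / 4           # the parent group's share of the index
w_effective <- w_within_group * w_group_in_index
w_effective                         # approx 0.167: final contribution to the index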
get_eff_weights(coin, out2 = "df")
coin |
A coin class object |
out2 |
Either |
This function replaces the now-defunct effectiveWeight()
from COINr < v1.0.
Either an iMeta data frame with effective weights as an added column, or an updated coin with effective
weights added to .$Meta$Ind
.
# build example coin
coin <- build_example_coin(up_to = "new_coin", quietly = TRUE)

# get effective weights as data frame
w_eff <- get_eff_weights(coin, out2 = "df")
head(w_eff)
Given a data frame of weights, this function returns multiple replicates of the weights, with added noise. This is intended for use in uncertainty and sensitivity analysis.
get_noisy_weights(w, noise_specs, Nrep)
w |
A data frame of weights, in the format found in |
noise_specs |
a data frame with columns:
|
Nrep |
The number of weight replications to generate. |
Weights are expected to be in a data frame format with columns Level
, iCode
and Weight
, as
used in iMeta
. Note that no NA
s are allowed anywhere in the data frame.
Noise is added using the noise_specs
argument, which is specified by a data frame with columns
Level
and NoiseFactor
. The Level entry gives the number of the aggregation level to target,
while the NoiseFactor
refers to the size of the perturbation. If e.g. a row is Level = 1
and
NoiseFactor = 0.2
, this will allow the weights in aggregation level 1 to deviate by +/- 20% of their
nominal values (the values in w
).
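As a sketch of what this perturbation means for a single weight (sampling uniformly here is an assumption for illustration, not necessarily the internal implementation):

w_nominal <- 0.5
noise_factor <- 0.2

# allowed to deviate by +/- 20% of the nominal value
w_noisy <- runif(1, min = w_nominal * (1 - noise_factor),
                    max = w_nominal * (1 + noise_factor))
w_noisy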
This function replaces the now-defunct noisyWeights()
from COINr < v1.0.
A list of Nrep
sets of weights (data frames).
get_sensitivity()
Perform global sensitivity or uncertainty analysis on a COIN
# build example coin
coin <- build_example_coin(up_to = "new_coin", quietly = TRUE)

# get nominal weights
w_nom <- coin$Meta$Weights$Original

# build data frame specifying the levels to apply the noise at
# here we vary at levels 2 and 3
noise_specs <- data.frame(Level = c(2, 3), NoiseFactor = c(0.25, 0.25))

# get 100 replications
noisy_wts <- get_noisy_weights(w = w_nom, noise_specs = noise_specs, Nrep = 100)

# examine one of the noisy weight sets, last few rows
tail(noisy_wts[[1]])
This function provides optimised weights to agree with a pre-specified vector of "target importances".
get_opt_weights( coin, itarg = NULL, dset, Level, cortype = "pearson", optype = "balance", toler = NULL, maxiter = NULL, weights_to = NULL, out2 = "list" )
coin |
coin object |
itarg |
a vector of (relative) target importances. For example, |
dset |
Name of the aggregated data set found in |
Level |
The aggregation level to apply the weight adjustment to. This can only be one level. |
cortype |
The type of correlation to use - can be either |
optype |
The optimisation type. Either |
toler |
Tolerance for convergence. Defaults to 0.1 (decrease for more accuracy, increase if convergence problems). |
maxiter |
Maximum number of iterations. Default 500. |
weights_to |
Name to write the optimised weight set to, if |
out2 |
Where to output the results. If |
This is a linear version of the weight optimisation proposed in this paper: doi:10.1016/j.ecolind.2017.03.056. Weights are optimised to agree with a pre-specified vector of "importances". The optimised weights are returned back to the coin.
See vignette("weights")
for more details on the usage of this function and an explanation of the underlying
method. Note that this function calculates correlations without considering statistical significance.
This function replaces the now-defunct weightOpt()
from COINr < v1.0.
If out2 = "coin"
returns an updated coin object with a new set of weights in .$Meta$Weights
, plus
details of the optimisation in .$Analysis
.
Else if out2 = "list"
the same outputs (new weights plus details of optimisation) are wrapped in a list.
# build example coin
coin <- build_example_coin(quietly = TRUE)

# check correlations between level 3 and index
get_corr(coin, dset = "Aggregated", Levels = c(3, 4))

# optimise weights at level 3
l_opt <- get_opt_weights(coin, itarg = "equal", dset = "Aggregated", Level = 3,
                         weights_to = "OptLev3", out2 = "list")

# view results
tail(l_opt$WeightsOpt)
l_opt$CorrResultsNorm
Performs Principal Component Analysis (PCA) on a specified data set and subset of indicators or aggregation groups.
This function has two main outputs: the output(s) of stats::prcomp()
, and optionally the weights resulting from
the PCA. Therefore it can be used as an analysis tool and/or a weighting tool. For the weighting aspect, please
see the details below.
get_PCA( coin, dset = "Raw", iCodes = NULL, Level = NULL, by_groups = TRUE, nowarnings = FALSE, weights_to = NULL, out2 = "list" )
coin |
A coin |
dset |
The name of the data set in |
iCodes |
An optional character vector of indicator codes to subset the indicator data, passed to |
Level |
The aggregation level to take indicator data from. Integer from 1 (indicator level) to N (top aggregation level, typically the index). |
by_groups |
If |
nowarnings |
If |
weights_to |
A string to name the resulting set of weights. If this is specified, and |
out2 |
If the input is a coin object, this controls where to send the output. If |
PCA must be approached with care and an understanding of what is going on. First, let's consider the PCA excluding the weighting component. PCA takes a set of data consisting of variables (indicators) and observations. It then rotates the coordinate system such that in the new coordinate system, the first axis (called the first principal component (PC)) aligns with the direction of maximum variance of the data set. The amount of variance explained by the first PC, and by the next several PCs, can help to understand whether the data can be explained by a simpler set of variables. PCA is often used for dimensionality reduction in modelling, for example.
In the context of composite indicators, PCA can be used first as an analysis tool. We can check, for example, whether the indicators within an aggregation group can mostly be explained by one PC. If so, this gives a little extra justification to aggregating the indicators, because the information lost in aggregation will be less. We can also check this over the entire set of indicators.
The complication is that in a composite indicator, the indicators are grouped and arranged into a hierarchy. This means
that when performing a PCA, we have to decide which level to perform it at, and which groupings to use, if any. The get_PCA()
function, using the by_groups
argument, allows PCA to be applied automatically by group if this is required.
The output of get_PCA()
is a PCA object for each of the groups specified, which can then be examined using existing
tools in R, see vignette("analysis")
.
The other output of get_PCA()
is a set of "PCA weights" if the weights_to
argument is specified. Here we also need
to say some words of caution. First, what constitutes "PCA weights" in composite indicators is not very well-defined.
In COINr, a simple option is adopted. That is, the loadings of the first principal component are taken as the weights.
The logic here is that these loadings should maximise the explained variance - the implication being that if we use
these as weights in an aggregation, we should maximise the explained variance and hence the information passed from
the indicators to the aggregate value. This is a nice property in a composite indicator, where one of the aims is to
represent many indicators by a single composite. See doi:10.1016/j.envsoft.2021.105208 for a
discussion on this.
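A minimal sketch of this weighting idea, using stats::prcomp() directly (the indicator subset is an assumption taken from the example data; this is not the package's internal code):

X <- na.omit(COINr::ASEM_iData[c("LPI", "Flights", "Lang", "Forest")])
pca <- stats::prcomp(X, center = TRUE, scale. = TRUE)

# loadings of the first principal component, taken as candidate weights
w_pca <- pca$rotation[, 1]
w_pca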
However, the weights that result from PCA have a number of downsides. First, they can often include negative weights, which can be hard to justify. Also, PCA may arbitrarily flip the axes (since from a variance point of view the direction is not important). In the quest for maximum variance, PCA will also weight the strongest-correlating indicators the highest, which means that other indicators may be neglected. In short, it often results in a very unbalanced set of weights. Moreover, PCA can only be performed on one level at a time.
All these considerations point to the fact: while PCA as an analysis tool is well-established, please use PCA weights with care and understanding of what is going on.
This function replaces the now-defunct getPCA()
from COINr < v1.0.
If out2 = "coin"
, results are appended to the coin object. Specifically:
A list is added to .$Analysis
containing PCA weights (loadings) of the first principal component, and the output of stats::prcomp(), for each
aggregation group found in the targeted level.
If weights_to
is specified, a new set of PCA weights is added to .$Meta$Weights
If out2 = "list"
the same outputs are contained in a list.
stats::prcomp Principal component analysis
# build example coin
coin <- build_example_coin(up_to = "new_coin", quietly = TRUE)

# PCA on "Sust" group of indicators
l_pca <- get_PCA(coin, dset = "Raw", iCodes = "Sust", out2 = "list", nowarnings = TRUE)

# Summary of results for one of the sub-groups
summary(l_pca$PCAresults$Social$PCAres)
This is a stripped down version of the "cor.mtest()" function from the "corrplot" package. It uses
the stats::cor.test()
function to calculate pairwise p-values. Unlike the corrplot version, this
only calculates p-values, and not confidence intervals. Credit to corrplot for this code, I only
replicate it here to avoid depending on their package for a single function.
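As a rough illustration of the idea (not the function's code verbatim), pairwise p-values can be obtained by looping stats::cor.test() over column pairs:

pvals_sketch <- function(X) {
  n <- ncol(X)
  P <- matrix(NA, n, n)
  diag(P) <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      P[i, j] <- P[j, i] <- stats::cor.test(X[, i], X[, j])$p.value
    }
  }
  P
}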
get_pvals(X, ...)
X |
A numeric matrix or data frame |
... |
Additional arguments passed to function |
Matrix of p-values
# a matrix of random numbers, 3 cols
x <- matrix(runif(30), 10, 3)

# get correlations between cols
cor(x)

# get p values of correlations between cols
get_pvals(x)
Generates fast results tables, either attached to the coin or as a data frame.
get_results( coin, dset, tab_type = "Summ", also_get = NULL, use = "scores", order_by = NULL, nround = 2, use_group = NULL, dset_indicators = NULL, out2 = "df" )
coin |
The coin object, or a data frame of indicator data |
dset |
Name of data set in |
tab_type |
The type of table to generate. Either |
also_get |
Names of further columns to attach to table. |
use |
Either |
order_by |
A code of the indicator or aggregate to sort the table by. If not specified, defaults to the highest
aggregate level, i.e. the index in most cases. If |
nround |
The number of decimal places to round numerical values to. Defaults to 2. |
use_group |
An optional grouping variable. If specified, the results table includes this group column,
and if |
dset_indicators |
Optional data set from which to take only indicator (level 1) data. This can be set to |
out2 |
If |
Although results are available in a coin in .$Data
, the format makes it difficult to quickly present results. This function
generates results tables that are suitable for immediate presentation, i.e. sorted by index or other indicators, and only including
relevant columns. Scores are also rounded by default, and there is the option to present scores or ranks.
See also vignette("results")
for more info.
This function replaces the now-defunct getResults()
from COINr < v1.0.
If out2 = "df"
, the results table is returned as a data frame. If out2 = "coin"
, this function returns an updated
coin with the results table attached to .$Results
.
# build full example coin
coin <- build_example_coin(quietly = TRUE)

# get results table
df_results <- get_results(coin, dset = "Aggregated", tab_type = "Aggs")
head(df_results)
This function performs global sensitivity and uncertainty analysis of a coin. You must specify which parameters of the coin to vary, and the alternatives/distributions for those parameters.
get_sensitivity( coin, SA_specs, N, SA_type = "UA", dset, iCode, Nboot = NULL, quietly = FALSE, check_addresses = TRUE, diagnostic_mode = FALSE )
coin |
A coin |
SA_specs |
Specifications of the input uncertainties |
N |
The number of regenerations |
SA_type |
The type of analysis to run. |
dset |
The data set to extract the target variable from (passed to |
iCode |
The variable within |
Nboot |
Number of bootstrap samples to take when estimating confidence intervals on sensitivity indices. |
quietly |
Set to |
check_addresses |
Logical: if |
diagnostic_mode |
Logical: if |
COINr implements a flexible variance-based global sensitivity analysis approach, which allows almost any assumption to be varied, as long as the distribution of alternative values can be described. Variance-based "sensitivity indices" are estimated using a Monte Carlo design (running the composite indicator many times with a particular combination of input values). This follows the methodology described in doi:10.1111/j.1467-985X.2005.00350.x.
To understand how this function works, please see vignette("sensitivity")
. Here, we briefly recap the main input
arguments.
First, you can select whether to run an uncertainty analysis SA_type = "UA"
or sensitivity analysis SA_type = "SA"
.
The number of replications (regenerations of the coin) is specified by N
. Keep in mind that the total number of
replications is N
for an uncertainty analysis but is N*(d + 2)
for a sensitivity analysis due to the experimental
design used.
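A quick arithmetic sketch for budgeting runs (illustrative numbers only):

N <- 500   # requested sample size
d <- 3     # number of uncertain assumptions varied
N_UA <- N            # coin regenerations for an uncertainty analysis
N_SA <- N * (d + 2)  # coin regenerations for a sensitivity analysis: 2500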
To run either type of analysis, you must specify which parts of the coin to vary and what the distributions/alternatives are.
This is done using SA_specs
, a structured list. See vignette("sensitivity")
for details and examples.
You also need to specify the target of the sensitivity analysis. This should be an indicator or aggregate that can be
found in one of the data sets of the coin, and is specified using the dset
and iCode
arguments.
If SA_type = "SA"
, it is advisable to set Nboot
to e.g. 100 or more, which is the number of bootstrap samples
to take when estimating confidence intervals on sensitivity indices. This does not perform extra regenerations of the
coin, so setting this to a higher number shouldn't have much impact on computational time.
If you want to understand what is going on more deeply in the regenerated coins in the sensitivity analysis, set
diagnostic_mode = TRUE
in get_sensitivity()
. This will additionally output a list containing every coin that was generated
as part of the sensitivity analysis, and allows you to check in detail whether the coins are generated as you expect.
Clearly it is better to run this on a low sample size as the output can potentially become quite large.
This function replaces the now-defunct sensitivity()
from COINr < v1.0.
Sensitivity analysis results as a list, containing:
.$Scores
a data frame with a row for each unit, and columns are the scores for each replication.
.$Ranks
as .$Scores
but for unit ranks
.$RankStats
summary statistics for ranks of each unit
.$Para
a list containing parameter values for each run
.$Nominal
the nominal scores and ranks of each unit (i.e. from the original COIN)
.$Sensitivity
(only if SA_type = "SA"
) sensitivity indices for each parameter. Also confidence intervals if Nboot
.$coins
(only if diagnostic_mode = TRUE
) a list of all coins generated during the sensitivity analysis
Some information on the time elapsed, average time, and the parameters perturbed.
Depending on the setting of store_results
, it may also contain a list of Methods or a list of COINs for each replication.
# for examples, see `vignette("sensitivity")`
# (this is because package examples are run automatically and this function can
# take a few minutes to run at realistic settings)
Generic function for reporting various statistics from a data frame or coin. See method documentation:
get_stats(x, ...)
x |
Object (data frame or coin) |
... |
Further arguments to be passed to methods. |
See also vignette("analysis")
.
This function replaces the now-defunct getStats()
from COINr < v1.0.
A data frame of statistics for each column
# see individual method documentation
Given a coin and a specified data set (dset
), returns a table of statistics with entries for each column.
## S3 method for class 'coin'
get_stats(x, dset, t_skew = 2, t_kurt = 3.5, t_avail = 0.65, t_zero = 0.5, t_unq = 0.5, nsignif = 3, out2 = "df", ...)
x |
A coin |
dset |
A data set present in |
t_skew |
Absolute skewness threshold. See details. |
t_kurt |
Kurtosis threshold. See details. |
t_avail |
Data availability threshold. See details. |
t_zero |
A threshold between 0 and 1 for flagging indicators with high proportion of zeroes. See details. |
t_unq |
A threshold between 0 and 1 for flagging indicators with low proportion of unique values. See details. |
nsignif |
Number of significant figures to round the output table to. |
out2 |
Either |
... |
arguments passed to or from other methods. |
The statistics (columns in the output table) are as follows (entries correspond to each column):
Min
: the minimum
Max
: the maximum
Mean
: the (arithmetic) mean
Median
: the median
Std
: the standard deviation
Skew
: the skew
Kurt
: the kurtosis
N.Avail
: the number of non-NA
values
N.NonZero
: the number of non-zero values
N.Unique
: the number of unique values
Frc.Avail
: the fraction of non-NA
values
Frc.NonZero
: the fraction of non-zero values
Frc.Unique
: the fraction of unique values
Flag.Avail
: a data availability flag - columns with Frc.Avail < t_avail
will be flagged as "LOW"
, else "ok"
.
Flag.NonZero
: a flag for columns with a high proportion of zeros. Any columns with Frc.NonZero < t_zero
are
flagged as "LOW"
, otherwise "ok"
.
Flag.Unique
: a unique value flag - any columns with Frc.Unique < t_unq
are flagged as "LOW"
, otherwise "ok"
.
Flag.SkewKurt
: a skew and kurtosis flag which is an indication of possible outliers. Any columns with
abs(Skew) > t_skew
AND Kurt > t_kurt
are flagged as "OUT"
, otherwise "ok"
.
The aim of this table, among other things, is to check the basic statistics of each column/indicator, and identify
any possible issues for each indicator. For example, low data availability, having a high proportion of zeros and/or
a low proportion of unique values. Further, the combination of skew and kurtosis (i.e. the Flag.SkewKurt
column)
is a simple test for possible outliers, which may require treatment using Treat()
.
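As a rough sketch of the Flag.SkewKurt logic on a single column (the exact skew and kurtosis estimators used internally may differ; the thresholds are the defaults above):

x <- c(rnorm(50), 20)          # a column with one clear outlier
z <- (x - mean(x)) / sd(x)
skew <- mean(z^3)
kurt <- mean(z^4) - 3          # excess kurtosis, one common convention
flag <- if (abs(skew) > 2 && kurt > 3.5) "OUT" else "ok"
flag                           # typically "OUT" for this example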
The table can be returned either to the coin or as a standalone data frame - see out2
.
See also vignette("analysis")
.
Either a data frame or updated coin - see out2
.
# build example coin
coin <- build_example_coin(up_to = "new_coin", quietly = TRUE)

# get table of indicator statistics for raw data set
get_stats(coin, dset = "Raw", out2 = "df")
Takes a data frame and returns a table of statistics with entries for each column.
## S3 method for class 'data.frame'
get_stats(x, t_skew = 2, t_kurt = 3.5, t_avail = 0.65, t_zero = 0.5, t_unq = 0.5, nsignif = 3, ...)
x |
A data frame with only numeric columns. |
t_skew |
Absolute skewness threshold. See details. |
t_kurt |
Kurtosis threshold. See details. |
t_avail |
Data availability threshold. See details. |
t_zero |
A threshold between 0 and 1 for flagging indicators with high proportion of zeroes. See details. |
t_unq |
A threshold between 0 and 1 for flagging indicators with low proportion of unique values. See details. |
nsignif |
Number of significant figures to round the output table to. |
... |
arguments passed to or from other methods. |
The statistics (columns in the output table) are as follows (entries correspond to each column):
Min
: the minimum
Max
: the maximum
Mean
: the (arithmetic) mean
Median
: the median
Std
: the standard deviation
Skew
: the skew
Kurt
: the kurtosis
N.Avail
: the number of non-NA
values
N.NonZero
: the number of non-zero values
N.Unique
: the number of unique values
Frc.Avail
: the fraction of non-NA
values
Frc.NonZero
: the fraction of non-zero values
Frc.Unique
: the fraction of unique values
Flag.Avail
: a data availability flag - columns with Frc.Avail < t_avail
will be flagged as "LOW"
, else "ok"
.
Flag.NonZero
: a flag for columns with a high proportion of zeros. Any columns with Frc.NonZero < t_zero
are
flagged as "LOW"
, otherwise "ok"
.
Flag.Unique
: a unique value flag - any columns with Frc.Unique < t_unq
are flagged as "LOW"
, otherwise "ok"
.
Flag.SkewKurt
: a skew and kurtosis flag which is an indication of possible outliers. Any columns with
abs(Skew) > t_skew
AND Kurt > t_kurt
are flagged as "OUT"
, otherwise "ok"
.
The aim of this table, among other things, is to check the basic statistics of each column/indicator, and identify
any possible issues for each indicator. For example, low data availability, having a high proportion of zeros and/or
a low proportion of unique values. Further, the combination of skew and kurtosis (i.e. the Flag.SkewKurt
column)
is a simple test for possible outliers, which may require treatment using Treat()
.
See also vignette("analysis")
.
A data frame of statistics for each column
# stats of mtcars
get_stats(mtcars)
Generates a table of strengths and weaknesses for a selected unit, based on ranks, or ranks within a specified grouping variable.
get_str_weak( coin, dset, usel = NULL, topN = 5, bottomN = 5, withcodes = TRUE, use_group = NULL, unq_discard = NULL, min_discard = TRUE, report_level = NULL, with_units = TRUE, adjust_direction = NULL, sig_figs = 3 )
coin |
A coin |
dset |
The data set to extract indicator data from, to use as strengths and weaknesses. |
usel |
A selected unit code |
topN |
The top N indicators to report |
bottomN |
The bottom N indicators to report |
withcodes |
If |
use_group |
An optional grouping variable to use for reporting
in-group ranks. Specifying this will report the ranks of the selected unit within the group of |
unq_discard |
Optional parameter for handling discrete indicators. Some indicators may be binary
variables of the type "yes = 1", "no = 0". These may be picked up as strengths or weaknesses, when they
may not be desirable to highlight, since e.g. half of the units may have a zero or a one. This argument
takes a number between 0 and 1 specifying a unique value threshold for ignoring indicators as strengths. E.g.
setting |
min_discard |
If |
report_level |
Aggregation level to report parent codes from. For example, setting
|
with_units |
If |
adjust_direction |
If |
sig_figs |
Number of significant figures to round values to. If |
This currently only works at the indicator level. Indicators with NA
values for the selected unit are ignored.
Strengths and weaknesses are the topN best-ranked and bottomN worst-ranked indicators for the selected unit. Effectively, this takes the rank that the selected unit has in each indicator, sorts the ranks, and takes the top N and bottom N.
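A minimal sketch of that idea with made-up ranks for one unit (the indicator codes are purely hypothetical):

ranks <- c(Ind1 = 3, Ind2 = 47, Ind3 = 12, Ind4 = 1, Ind5 = 50)
topN <- 2
bottomN <- 2

strengths  <- head(sort(ranks), topN)    # best (lowest) ranks
weaknesses <- tail(sort(ranks), bottomN) # worst (highest) ranks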
This function must be used with a little care: indicators should be adjusted for their directions before use,
otherwise a weakness might be counted as a strength, and vice versa. Use the adjust_direction
parameter
to help here.
A further useful parameter is unq_discard
, which also filters out any indicators with a low number of
unique values, based on a specified threshold. Also min_discard
which filters out any indicators which
have the minimum rank.
The best way to use this function is to play around with the settings a little bit. The reason being that
in practice, indicators have very different distributions and these can sometimes lead to unexpected
outcomes. An example is if you have an indicator with 50% zero values, and the rest non-zero (but unique).
Using the sport ranking system, all units with zero values will receive a rank which is equal to the number
of units divided by two. This then might be counted as a "strength" for some units with overall low scores.
But a zero value can hardly be called a strength. This is where the min_discard
argument can help out. Problems such as these mainly arise when e.g. generating a large number of country profiles.
Problems such as these mainly arise when e.g. generating a large number of country profiles.
This function replaces the now-defunct getStrengthNWeak()
from COINr < v1.0.
A list containing a data frame .$Strengths
, and a data frame .$Weaknesses
.
Each data frame has columns with indicator code, name, rank and value (for the selected unit).
# build example coin
coin <- build_example_coin(up_to = "new_coin", quietly = TRUE)

# get strengths and weaknesses for ESP
get_str_weak(coin, dset = "Raw", usel = "ESP")
Get time trends from a purse object. This function extracts a panel data set from a purse, and calculates trends
for each indicator/unit pair using a specified function f_trend
. For example, if f_trend = "CAGR"
, this extracts
the time series for each indicator/unit pair and passes it to CAGR()
.
get_trends( purse, dset, uCodes = NULL, iCodes = NULL, Time = NULL, use_latest = NULL, f_trend = "CAGR", interp_at = NULL, adjust_directions = FALSE )
purse |
A purse object |
dset |
Name of the data set to extract, passed to |
uCodes |
Optional subset of unit codes to extract, passed to |
iCodes |
Optional subset of indicator/aggregate codes to extract, passed to |
Time |
Optional vector of time points to extract, passed to |
use_latest |
A positive integer which specifies to use only the latest "n" data points. If this is specified, it
overrides |
f_trend |
Function that returns a metric describing the trend of the time series. See details. |
interp_at |
Option to linearly interpolate missing data points in each time series. Must be specified as a vector
of time values at which to apply interpolation. If |
adjust_directions |
Logical: if |
This function requires a purse object as an input. The data set is selected using get_data()
, such that a subset
of the data set can be analysed using the uCodes
, iCodes
and Time
arguments. The latter is useful especially
if only a subset of the time series should be analysed.
The function f_trend
is a function that, given a time series, returns a trend metric. This must follow a
specific format. It must of course be available to call, and must have arguments y
and x
, which are
respectively a vector of values and a vector indexing the values in time. See prc_change()
and CAGR()
for examples. The function must return a single value (not a vector with multiple entries, or a list).
The function can return either numeric or character values.
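For instance, a hypothetical custom trend function following this format, returning the slope of a linear fit (the function name is just an example and is not part of the package):

f_slope <- function(y, x) {
  unname(coef(lm(y ~ x))[2])
}

# could then be passed by name, e.g.
# get_trends(purse, dset = "Raw", f_trend = "f_slope")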
A data frame in long format, with trend metrics for each indicator/unit pair, plus data availability statistics.
Generates a summary table for a single unit. This is mostly useful in unit reports.
get_unit_summary(coin, usel, Levels, dset = "Aggregated", nround = 2)
coin |
A coin |
usel |
A selected unit code |
Levels |
The aggregation levels to display results from. |
dset |
The data set within the coin to extract scores and ranks from |
nround |
Number of decimals to round scores to, default 2. Set to |
This returns the scores and ranks for each indicator/aggregate as specified in Levels
. It orders the table so that
the highest aggregation levels are first. This means that if the index level is included, it will be first.
This function replaces the now-defunct getUnitSummary()
from COINr < v1.0.
A summary table as a data frame, containing scores and ranks for specified indicators/aggregates.
# build full example coin
coin <- build_example_coin(quietly = TRUE)

# summary of scores for IND at levels 4, 3 and 2
get_unit_summary(coin, usel = "IND", Levels = c(4, 3, 2), dset = "Aggregated")
Replaces NA
s in a numeric vector with the mean of the non-NA
values.
i_mean(x)
x |
A numeric vector |
A numeric vector
x <- c(1, 2, 3, 4, NA)
i_mean(x)
Replaces NA
s in a numeric vector with the grouped arithmetic means of the non-NA
values.
Groups are defined by the f
argument.
i_mean_grp(x, f, skip_f_na = TRUE)
x |
A numeric vector |
f |
A grouping variable, of the same length of |
skip_f_na |
If |
A numeric vector
x <- c(NA, runif(10), NA)
f <- c(rep("a", 6), rep("b", 6))
i_mean_grp(x, f)
Replaces NA
s in a numeric vector with the median of the non-NA
values.
i_median(x)
x |
A numeric vector |
A numeric vector
x <- c(1, 2, 3, 4, NA)
i_median(x)
Replaces NA
s in a numeric vector with the grouped medians of the non-NA
values.
Groups are defined by the f
argument.
i_median_grp(x, f, skip_f_na = TRUE)
x |
A numeric vector |
f |
A grouping variable, of the same length of |
skip_f_na |
If |
A numeric vector
x <- c(NA, runif(10), NA)
f <- c(rep("a", 6), rep("b", 6))
i_median_grp(x, f)
Convert iCodes to iNames
icodes_to_inames(coin, iCodes)
coin |
A coin |
iCodes |
A vector of iCodes |
Vector of iNames
The COIN Tool is an Excel-based tool for building composite indicators. This function provides a direct interface for reading a COIN Tool input deck and converting it to COINr. You need to provide a COIN Tool file, with the "Database" sheet properly compiled.
import_coin_tool(fname, makecodes = FALSE, oldtool = FALSE, out2 = "list")
fname |
The file name and path to read, e.g. |
makecodes |
Logical: if |
oldtool |
Logical: if |
out2 |
Either |
This function replaces the now-defunct COINToolIn()
from COINr < v1.0.
Either a list or a coin, depending on out2
## Not run:
## This example downloads a COIN Tool spreadsheet containing example data,
## saves it to a temporary directory, unzips, and reads into R. Finally it
## assembles it into a COIN.

# Make temp zip filename in temporary directory
tmpz <- tempfile(fileext = ".zip")

# Download an example COIN Tool file to temporary directory
# NOTE: the download.file() command may need its "method" option set to a
# specific value depending on the platform you run this on. You can also
# choose to download/unzip this file manually.
download.file("https://knowledge4policy.ec.europa.eu/sites/default/files/coin_tool_v1_lite_exampledata.zip", tmpz)

# Unzip
CTpath <- unzip(tmpz, exdir = tempdir())

# Read COIN Tool into R
l <- import_coin_tool(CTpath, makecodes = TRUE)

## End(Not run)
This is a generic function with the following methods:
Impute(x, ...)
x |
Object to be imputed |
... |
arguments passed to or from other methods. |
See those methods for individual documentation.
This function replaces the now-defunct impute()
from COINr < v1.0.
An object of the same class as x
, but imputed.
# See individual method documentation
Given a data frame of panel data, with a time-index column time_col
and a unit ID column unit_col
, imputes other
columns using the entry from the latest available time point.
impute_panel( iData, time_col = NULL, unit_col = NULL, cols = NULL, imp_type = NULL, max_time = NULL )
iData |
A data frame of indicator data, containing a time index column |
time_col |
The name of a column found in |
unit_col |
The name of a column found in |
cols |
Optionally, a character vector of names of columns to impute. If |
imp_type |
One of |
max_time |
The maximum number of time points to look backwards to impute from. E.g. if |
This presumes that there are multiple observations for each unit code, i.e. one per time point. It then searches for any missing values in the target year, and replaces them with the equivalent points
from previous time points. It will replace using the most recently available point or using linear interpolation: see imp_type
argument.
A list containing:
.$iData_imp
: An iData
format data frame with missing data imputed using previous time points (where possible).
.$DataT
: A data frame in the same format as iData
, where each entry shows which time point each data point
came from.
# Copy example panel data
iData_p <- ASEM_iData_p

# we introduce two NAs: one for NZ in 2022 in LPI indicator
iData_p$LPI[iData_p$uCode == "NZ" & iData_p$Time == 2022] <- NA
# one for AT, also in 2022, but for Flights indicator
iData_p$Flights[iData_p$uCode == "AT" & iData_p$Time == 2022] <- NA

# impute: target only the two columns where NAs introduced
l_imp <- impute_panel(iData_p, cols = c("LPI", "Flights"))

# get imputed df
iData_imp <- l_imp$iData_imp

# check the output is what we expect: both NAs introduced should now have 2021 values
iData_imp$LPI[iData_imp$uCode == "NZ" & iData_imp$Time == 2022] ==
  ASEM_iData_p$LPI[ASEM_iData_p$uCode == "NZ" & ASEM_iData_p$Time == 2021]
iData_imp$Flights[iData_imp$uCode == "AT" & iData_imp$Time == 2022] ==
  ASEM_iData_p$Flights[ASEM_iData_p$uCode == "AT" & ASEM_iData_p$Time == 2021]
This imputes any NA
s in the data set specified by dset
by invoking the function f_i
and any optional arguments f_i_para
on each column at a time (if
impute_by = "column"
), or on each row at a time (if impute_by = "row"
), or by passing the entire
data frame to f_i
if impute_by = "df"
.
## S3 method for class 'coin'
Impute(x, dset, f_i = NULL, f_i_para = NULL, impute_by = "column", use_group = NULL, group_level = NULL, normalise_first = NULL, out2 = "coin", write_to = NULL, disable = FALSE, warn_on_NAs = TRUE, ...)
x |
A coin class object |
dset |
The name of the data set to apply the function to, which should be accessible in |
f_i |
An imputation function. See details. |
f_i_para |
Further arguments to pass to |
impute_by |
Specifies how to impute: if |
use_group |
Optional grouping variable name to pass to imputation function if this supports group imputation. |
group_level |
A level of the framework to use for grouping indicators. This is only
relevant if |
normalise_first |
Logical: if |
out2 |
Either |
write_to |
Optional character string for naming the data set in the coin. Data will be written to
|
disable |
Logical: if |
warn_on_NAs |
Logical: if |
... |
arguments passed to or from other methods. |
Clearly, the function f_i
needs to be able to accept the data class passed to it - if
impute_by
is "row"
or "column"
this will be a numeric vector, or if "df"
it will be a data
frame. Moreover, this function should return a vector or data frame identical to the vector/data frame passed to
it except for NA
values, which can be replaced. The function f_i
is not required to replace all NA
values.
COINr has several built-in imputation functions of the form i_*()
for vectors which can be called by Impute()
. See the
online documentation for more details.
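As a sketch, a hypothetical custom imputation function following this format (it returns its input unchanged apart from the NA values, as required; the name is illustrative only and not part of the package):

i_zero <- function(x) {
  x[is.na(x)] <- 0
  x
}

# could then be passed by name, e.g.
# Impute(coin, dset = "Raw", f_i = "i_zero", impute_by = "column")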
When imputing row-wise, prior normalisation of the data is recommended. This is because imputation
will use e.g. the mean of the unit values over all indicators (columns). If the indicators are on
very different scales, the result will likely make no sense. If the indicators are normalised first,
more sensible results can be obtained. There are two options to pre-normalise: first is by setting
normalise_first = TRUE
- this is anyway the default if impute_by = "row"
. In this case, you also
need to supply a vector of directions. The data will then be normalised using a min-max approach
before imputation, followed by the inverse operation to return the data to the original scales.
Another approach which gives more control is to simply run Normalise()
first, and work with the
normalised data from that point onwards. In that case it is better to set normalise_first = FALSE
,
since by default if impute_by = "row"
it will be set to TRUE
.
Checks are made on the format of the data returned by imputation functions, to ensure the
type and that non-NA
values have not been inadvertently altered. This latter check is allowed
a degree of tolerance for numerical precision, controlled by the sfigs
argument. This is because
if the data frame is normalised, and/or depending on the imputation function, there may be a very
small differences. By default sfigs = 9
, meaning that the non-NA
values pre and post-imputation
are compared to 9 significant figures.
See also documentation for Impute.data.frame()
and Impute.numeric()
which are called by this function.
An updated coin with imputed data set at .$Data[[write_to]]
# build coin coin <- build_example_coin(up_to = "new_coin") # impute raw data set using population groups # output to data frame directly Impute(coin, dset = "Raw", f_i = "i_mean_grp", use_group = "Pop_group", out2 = "df")
# build coin coin <- build_example_coin(up_to = "new_coin") # impute raw data set using population groups # output to data frame directly Impute(coin, dset = "Raw", f_i = "i_mean_grp", use_group = "Pop_group", out2 = "df")
Impute a data frame using any function, either column-wise, row-wise or by the whole data frame in one shot.
## S3 method for class 'data.frame' Impute( x, f_i = NULL, f_i_para = NULL, impute_by = "column", normalise_first = NULL, directions = NULL, warn_on_NAs = TRUE, ... )
## S3 method for class 'data.frame' Impute( x, f_i = NULL, f_i_para = NULL, impute_by = "column", normalise_first = NULL, directions = NULL, warn_on_NAs = TRUE, ... )
x |
A data frame with only numeric columns. |
f_i |
A function to use for imputation. By default, imputation is performed by simply substituting
the mean of non- |
f_i_para |
Any additional parameters to pass to |
impute_by |
Specifies how to impute: if |
normalise_first |
Logical: if |
directions |
A vector of directions: either -1 or 1 to indicate the direction of each column
of |
warn_on_NAs |
Logical: if |
... |
arguments passed to or from other methods. |
This function only accepts data frames with all numeric columns. It imputes any NA
s in the data frame
by invoking the function f_i
and any optional arguments f_i_para
on each column at a time (if
impute_by = "column"
), or on each row at a time (if impute_by = "row"
), or by passing the entire
data frame to f_i
if impute_by = "df"
.
Clearly, the function f_i
needs to be able to accept the data class passed to it - if
impute_by
is "row"
or "column"
this will be a numeric vector, or if "df"
it will be a data
frame. Moreover, this function should return a vector or data frame identical to the vector/data frame passed to
it except for NA
values, which can be replaced. The function f_i
is not required to replace all NA
values.
COINr has several built-in imputation functions of the form i_*()
for vectors which can be called by Impute()
. See the
online documentation for more details.
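As an illustration, any function of this form can in principle be passed by name; i_zero() below is a hypothetical user-defined function, not part of COINr:

# a hypothetical imputation function: replace NAs with zero
i_zero <- function(x){
  x[is.na(x)] <- 0
  x
}

# a small data frame with a missing value
X <- data.frame(a = c(1, NA, 3), b = c(4, 5, 6))

# impute each column using the custom function
Impute(X, f_i = "i_zero", impute_by = "column")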
When imputing row-wise, prior normalisation of the data is recommended. This is because imputation
will use e.g. the mean of the unit values over all indicators (columns). If the indicators are on
very different scales, the result will likely make no sense. If the indicators are normalised first,
more sensible results can be obtained. There are two options to pre-normalise: first is by setting
normalise_first = TRUE
- this is anyway the default if impute_by = "row"
. In this case, you also
need to supply a vector of directions. The data will then be normalised using a min-max approach
before imputation, followed by the inverse operation to return the data to the original scales.
Another approach which gives more control is to simply run Normalise()
first, and work with the
normalised data from that point onwards. In that case it is better to set normalise_first = FALSE
,
since by default if impute_by = "row"
it will be set to TRUE
.
Checks are made on the format of the data returned by imputation functions, to ensure the
type and that non-NA
values have not been inadvertently altered. This latter check is allowed
a degree of tolerance for numerical precision, controlled by the sfigs
argument. This is because
if the data frame is normalised, and/or depending on the imputation function, there may be very
small differences. By default sfigs = 9
, meaning that the non-NA
values pre and post-imputation
are compared to 9 significant figures.
An imputed data frame
# a df of random numbers X <- as.data.frame(matrix(runif(50), 10, 5)) # introduce NAs (2 in 3 of 5 cols) X[sample(1:10, 2), 1] <- NA X[sample(1:10, 2), 3] <- NA X[sample(1:10, 2), 5] <- NA # impute using column mean Impute(X, f_i = "i_mean") # impute using row median (no normalisation) Impute(X, f_i = "i_median", impute_by = "row", normalise_first = FALSE)
# a df of random numbers X <- as.data.frame(matrix(runif(50), 10, 5)) # introduce NAs (2 in 3 of 5 cols) X[sample(1:10, 2), 1] <- NA X[sample(1:10, 2), 3] <- NA X[sample(1:10, 2), 5] <- NA # impute using column mean Impute(X, f_i = "i_mean") # impute using row median (no normalisation) Impute(X, f_i = "i_median", impute_by = "row", normalise_first = FALSE)
Imputes missing values in a numeric vector using a function f_i
. This function should return a vector identical
to x
except for NA
values, which can be replaced. The function f_i
is not required to replace all NA
values.
## S3 method for class 'numeric' Impute(x, f_i = NULL, f_i_para = NULL, ...)
## S3 method for class 'numeric' Impute(x, f_i = NULL, f_i_para = NULL, ...)
x |
A numeric vector, possibly with |
f_i |
A function that imputes missing values in a numeric vector. See description and details. |
f_i_para |
Optional further arguments to be passed to |
... |
arguments passed to or from other methods. |
This calls the function f_i()
, with optionally further arguments f_i_para
, to impute any missing
values found in x
. By default, f_i = "i_mean()"
, which simply imputes NA
s with the mean of the
non-NA
values in x
.
COINr has several built-in imputation functions of the form i_*()
for vectors which can be called by Impute()
. See the
online documentation for more details.
You could also use one of the imputation functions directly (such as i_mean()
). However, this
function offers a few extra advantages, such as checking the input and output formats, and making
sure the resulting imputed vector agrees with the input. It will also skip imputation entirely if
there are no NA
s at all.
An imputed numeric vector of the same length of x
.
# a vector with a missing value x <- 1:10 x[3] <- NA x # impute using median # this calls COINr's i_median() function Impute(x, f_i = "i_median")
# a vector with a missing value x <- 1:10 x[3] <- NA x # impute using median # this calls COINr's i_median() function Impute(x, f_i = "i_median")
This function imputes the target data set dset
in each coin using the imputation function f_i
, and optionally by specifying
parameters via f_i_para
. This is performed in the same way as the coin method Impute.coin()
, i.e. each time point is imputed separately,
unless f_i = "impute_panel"
. See details for more information.
## S3 method for class 'purse' Impute( x, dset, f_i = NULL, f_i_para = NULL, impute_by = "column", group_level = NULL, use_group = NULL, normalise_first = NULL, write_to = NULL, warn_on_NAs = TRUE, ... )
## S3 method for class 'purse' Impute( x, dset, f_i = NULL, f_i_para = NULL, impute_by = "column", group_level = NULL, use_group = NULL, normalise_first = NULL, write_to = NULL, warn_on_NAs = TRUE, ... )
x |
A purse object |
dset |
The name of the data set to apply the function to, which should be accessible in |
f_i |
An imputation function. For the "purse" class, if |
f_i_para |
Further arguments to pass to |
impute_by |
Specifies how to impute: if |
group_level |
A level of the framework to use for grouping indicators. This is only
relevant if |
use_group |
Optional grouping variable name to pass to imputation function if this supports group imputation. |
normalise_first |
Logical: if |
write_to |
Optional character string for naming the resulting data set in each coin. Data will be written to
|
warn_on_NAs |
Logical: if |
... |
arguments passed to or from other methods. |
If f_i = "impute_panel"
this is treated as a special case, and the data sets inside the purse are imputed using the impute_panel()
function, which allows imputation of time series, using past/future values as information for imputation.
In this case, coins are not imputed individually, but treated as a single data set. To do this, set f_i = "impute_panel"
and pass further parameters to impute_panel()
using the f_i_para
argument. Note that as this is a special case,
the supported parameters of impute_panel()
to specify through Impute()
are "imp_type"
and "max_time"
(see Impute()
for details on these). No further arguments need to be passed to impute_panel()
. See vignette("imputation")
for more
details.
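A sketch of panel imputation over the built-in example purse (the max_time value here is purely illustrative):

# build example purse
purse <- build_example_purse(up_to = "new_coin", quietly = TRUE)

# impute the raw data across the whole panel, passing the max_time
# parameter through to impute_panel()
purse <- Impute(purse, dset = "Raw", f_i = "impute_panel",
                f_i_para = list(max_time = 2))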
An updated purse with imputed data sets added to each coin.
# see vignette("imputation")
# see vignette("imputation")
Check if object is coin class
is.coin(x)
is.coin(x)
x |
An object to be checked. |
Logical
Check if object is purse class
is.purse(x)
is.purse(x)
x |
An object to be checked. |
Logical
Calculates kurtosis of the values of a numeric vector. This uses the same definition of kurtosis
as the "kurtosis()" function in the e1071 package, where type == 2
, which is equivalent to the definition of kurtosis used in Excel.
kurt(x, na.rm = FALSE)
kurt(x, na.rm = FALSE)
x |
A numeric vector. |
na.rm |
Set |
A kurtosis value (scalar).
x <- runif(20) kurt(x)
x <- runif(20) kurt(x)
Performs a log transform on a numeric vector.
log_CT(x, na.rm = FALSE)
log_CT(x, na.rm = FALSE)
x |
A numeric vector. |
na.rm |
Set |
Specifically, this performs a modified "COIN Tool log" transform: log(x-min(x) + a)
, where
a <- 0.01*(max(x)-min(x))
.
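The transformed values can equivalently be computed by hand (a sketch of the formula only; log_CT() itself also returns treatment details):

x <- runif(20)
a <- 0.01 * (max(x) - min(x))
log(x - min(x) + a)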
A log-transformed vector of data, and treatment details wrapped in a list.
x <- runif(20) log_CT(x)
x <- runif(20) log_CT(x)
Performs a log transform on a numeric vector.
log_CT_orig(x, na.rm = FALSE)
log_CT_orig(x, na.rm = FALSE)
x |
A numeric vector. |
na.rm |
Set |
Specifically, this performs a "COIN Tool log" transform: log(x-min(x) + 1)
.
A log-transformed vector of data, and treatment details wrapped in a list.
x <- runif(20) log_CT_orig(x)
x <- runif(20) log_CT_orig(x)
Performs a log transform on a numeric vector, but with consideration for the direction of the skew. The aim here is to reduce the absolute value of skew, regardless of its direction.
log_CT_plus(x, na.rm = FALSE)
log_CT_plus(x, na.rm = FALSE)
x |
A numeric vector |
na.rm |
Set |
Specifically:
If the skew of x
is positive, this performs a modified "COIN Tool log" transform: log(x-min(x) + a)
, where
a <- 0.01*(max(x)-min(x))
.
If the skew of x
is negative, it performs an equivalent transformation -log(max(x) + a - x).
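A sketch of this logic, assuming a sample skewness function such as COINr's skew() is available:

x <- runif(20)
a <- 0.01 * (max(x) - min(x))
if (skew(x) >= 0) {
  log(x - min(x) + a)    # positive skew: modified COIN Tool log
} else {
  -log(max(x) + a - x)   # negative skew: mirrored transform
}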
A log-transformed vector of data, and treatment details wrapped in a list.
x <- runif(20) log_CT_plus(x)
x <- runif(20) log_CT_plus(x)
Performs a log transform on a numeric vector. This function is currently not recommended - see comments below.
log_GII(x, na.rm = FALSE)
log_GII(x, na.rm = FALSE)
x |
A numeric vector. |
na.rm |
Set |
Specifically, this performs a "GII log" transform, which is what was encoded in the GII2020 spreadsheet.
Note that this transformation is currently NOT recommended because it seems quite volatile and can flip the direction of the indicator. If the maximum value of the indicator is less than one, this reverses the direction.
A log-transformed vector of data.
x <- runif(20) log_GII(x)
x <- runif(20) log_GII(x)
Calculates Borda scores as rank(x) - 1
.
n_borda(x, ties.method = "min")
n_borda(x, ties.method = "min")
x |
A numeric vector |
ties.method |
This argument is passed to |
Numeric vector
x <- runif(20) n_borda(x)
x <- runif(20) n_borda(x)
A measure of the distance to the maximum value, where the maximum value is the highest-scoring value. The formula used is 1 - (max(x) - x)/(max(x) - min(x)).
n_dist2max(x)
n_dist2max(x)
x |
A numeric vector |
This means that the closer a value is to the maximum, the higher its score will be. Scores will be in the range of 0 to 1.
Numeric vector
x <- runif(20) n_dist2max(x)
x <- runif(20) n_dist2max(x)
A measure of the distance to a specific value found in x
, specified by iref
. The formula is 1 - (x_ref - x)/(x_ref - min(x)), where x_ref = x[iref].
n_dist2ref(x, iref, cap_max = FALSE)
n_dist2ref(x, iref, cap_max = FALSE)
x |
A numeric vector |
iref |
An integer which indexes |
cap_max |
If |
Values exceeding x_ref
can be optionally capped at 1 if cap_max = TRUE
.
Numeric vector
x <- runif(20) n_dist2ref(x, 5)
x <- runif(20) n_dist2ref(x, 5)
A measure of the distance of each value of x
to a specified target which can be a high or low target depending on direction
. See details below.
n_dist2targ(x, targ, direction = 1, cap_max = FALSE)
n_dist2targ(x, targ, direction = 1, cap_max = FALSE)
x |
A numeric vector |
targ |
A target value |
direction |
Either 1 (default) or -1. In the former case, the indicator is assumed to be "positive" so that the target is at the higher end of the range. In the latter, the indicator is "negative" so that the target is typically at the low end of the range. |
cap_max |
If |
If direction = 1
, the formula is (x - min(x))/(x_targ - min(x)), else if direction = -1
it is (max(x) - x)/(max(x) - x_targ), where x_targ is the specified target targ.
Values surpassing x_targ
in either case can be optionally capped at 1 if cap_max = TRUE
.
This function also supports parameter specification in iMeta
for the Normalise.coin()
method.
To do this, add columns Target
, and dist2targ_cap_max
to the iMeta
table, which correspond
to the targ
and cap_max
parameters respectively. Then set f_n_para = "use_iMeta"
within the
global_specs
list. See also examples in the normalisation vignette.
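A sketch of this usage, assuming the target columns were added to iMeta before building the coin:

# before construction (sketch):
# iMeta$Target <- ...               # one target value per indicator
# iMeta$dist2targ_cap_max <- TRUE   # cap scores at 1
# coin <- new_coin(iData, iMeta)

coin <- Normalise(coin, dset = "Raw",
                  global_specs = list(f_n = "n_dist2targ",
                                      f_n_para = "use_iMeta"))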
Numeric vector
x <- runif(20) n_dist2targ(x, 0.8, cap_max = TRUE)
x <- runif(20) n_dist2targ(x, 0.8, cap_max = TRUE)
The ratio of each value of x
to max(x)
.
n_fracmax(x)
n_fracmax(x)
x |
A numeric vector |
Numeric vector
x <- runif(20) n_fracmax(x)
x <- runif(20) n_fracmax(x)
The fraction of the distance of each value of x
from the lower "goalpost" to the upper one. Goalposts are specified by
gposts = c(l, u, a)
, where l
is the lower bound, u
is the upper bound, and a
is a scaling parameter.
n_goalposts(x, gposts, direction = 1, trunc2posts = TRUE)
n_goalposts(x, gposts, direction = 1, trunc2posts = TRUE)
x |
A numeric vector |
gposts |
A numeric vector |
direction |
Either 1 or -1. Set to -1 to flip goalposts. |
trunc2posts |
If |
Specify direction = -1
to "flip" the goalposts. In this case, the fraction from the upper to the lower goalpost is
measured.
The goalposts equations are:
and for a negative directionality indicator:
This function also supports parameter specification in iMeta
for the Normalise.coin()
method.
To do this, add columns:
goalpost_lower
: the lower goalpost
goalpost_upper
: the upper goalpost
goalpost_scale
: the scaling parameter
goalpost_trunc2posts
: corresponds to the trunc2posts
argument
to the iMeta
table. Then set f_n_para = "use_iMeta"
within the
global_specs
list. See also examples in the normalisation vignette.
Numeric vector
# positive direction n_goalposts(1, gposts = c(0, 10, 1)) # negative direction n_goalposts(1, gposts = c(0, 10, 1), direction = -1)
# positive direction n_goalposts(1, gposts = c(0, 10, 1)) # negative direction n_goalposts(1, gposts = c(0, 10, 1), direction = -1)
Scales a vector using min-max method.
n_minmax(x, l_u = c(0, 100))
n_minmax(x, l_u = c(0, 100))
x |
A numeric vector |
l_u |
A vector |
This function also supports parameter specification in iMeta
for the Normalise.coin()
method.
To do this, add columns minmax_lower
, and minmax_upper
to the iMeta
table, which specify the
lower and upper bounds to scale each indicator to. Then set f_n_para = "use_iMeta"
within the
global_specs
list. See also examples in the normalisation vignette.
Normalised vector
x <- runif(20) n_minmax(x)
x <- runif(20) n_minmax(x)
Calculates percentile ranks of a numeric vector using "sport" ranking. Ranks are calculated by base::rank()
and converted to percentile ranks. The ties.method
can be changed - this is directly passed to
base::rank()
.
n_prank(x, ties.method = "min")
n_prank(x, ties.method = "min")
x |
A numeric vector |
ties.method |
This argument is passed to |
Numeric vector
x <- runif(20) n_prank(x)
x <- runif(20) n_prank(x)
This is simply a wrapper for base::rank()
. Higher scores will give higher ranks.
n_rank(x, ties.method = "min")
n_rank(x, ties.method = "min")
x |
A numeric vector |
ties.method |
This argument is passed to |
Numeric vector
x <- runif(20) n_rank(x)
x <- runif(20) n_rank(x)
Scales a vector for normalisation using the method applied in the GII2020 for some indicators. This
does x_scaled <- (x-l)/(u-l) * scale_factor
. Note this is not the minmax transformation (see n_minmax()
).
This is a linear transformation with shift l
and scaling factor u-l.
n_scaled(x, npara = c(0, 100), scale_factor = 100)
n_scaled(x, npara = c(0, 100), scale_factor = 100)
x |
A numeric vector |
npara |
Parameters as a vector |
scale_factor |
Optional scaling factor to apply to the result. Default 100. |
This function also supports parameter specification in iMeta
for the Normalise.coin()
method.
To do this, add columns scaled_lower
, scaled_upper
and scale_factor
to the iMeta
table, which specify the
first and second elements of npara
and the scale_factor argument, respectively. Then set f_n_para = "use_iMeta"
within the
global_specs
list. See also examples in the normalisation vignette.
Scaled vector
x <- runif(20) n_scaled(x, npara = c(1,10))
x <- runif(20) n_scaled(x, npara = c(1,10))
Standardises a vector x
by scaling it to have a mean and standard deviation specified by m_sd
.
n_zscore(x, m_sd = c(0, 1))
n_zscore(x, m_sd = c(0, 1))
x |
A numeric vector |
m_sd |
A vector |
This function also supports parameter specification in iMeta
for the Normalise.coin()
method.
To do this, add columns zscore_mean
, and zscore_sd
to the iMeta
table, which specify the
mean and standard deviation to scale each indicator to, respectively. Then set f_n_para = "use_iMeta"
within the
global_specs
list. See also examples in the normalisation vignette.
Numeric vector
x <- runif(20) n_zscore(x)
x <- runif(20) n_zscore(x)
Given a character vector of long names (probably with spaces), generates short codes. Intended for use when importing from the COIN Tool.
names_to_codes(cvec, maxword = 2, maxlet = 4)
names_to_codes(cvec, maxword = 2, maxlet = 4)
cvec |
A character vector of names |
maxword |
The maximum number of words to use in building a short name (default 2) |
maxlet |
The number of letters to take from each word (default 4) |
This function replaces the now-defunct names2Codes()
from COINr < v1.0.
A corresponding character vector, but with short codes, and no duplicates.
import_coin_tool()
Import data from the COIN Tool (Excel).
# get names from example data iNames <- ASEM_iMeta$iName # convert to codes names_to_codes(iNames)
# get names from example data iNames <- ASEM_iMeta$iName # convert to codes names_to_codes(iNames)
Creates a new "coin" class object, or a "purse" class object (time-indexed collection of coins). A purse class object is created if panel data is supplied. Coins and purses are the main object classes used in COINr, although a number of functions also support other classes such as data frames and vectors.
new_coin( iData, iMeta, exclude = NULL, split_to = NULL, level_names = NULL, retain_all_uCodes_on_split = FALSE, quietly = FALSE )
new_coin( iData, iMeta, exclude = NULL, split_to = NULL, level_names = NULL, retain_all_uCodes_on_split = FALSE, quietly = FALSE )
iData |
The indicator data and metadata of each unit |
iMeta |
Indicator metadata |
exclude |
Optional character vector of any indicator codes ( |
split_to |
This is used to split panel data into multiple coins, a so-called "purse". Should be either
|
level_names |
Optional character vector of names of levels. Must have length equal to the number of
levels in the hierarchy ( |
retain_all_uCodes_on_split |
Logical: if panel data is input and split to a purse using |
quietly |
If |
A coin object is fundamentally created by passing two data frames to new_coin()
:
iData
which specifies the data points for each unit and indicator, as well as other optional
variables; and iMeta
which specifies details about each indicator/variable found in iData
,
including its type, name, position in the index, units, and other properties.
These data frames need to follow fairly strict requirements regarding their format and consistency.
Run check_iData()
and check_iMeta()
to validate your data frames, and these should generate helpful
error messages when things go wrong.
It is worth reading a little about coins and purses to use COINr. See vignette("coins")
for more details.
iData
iData
should be a data frame with required column
uCode
which gives the code assigned to each unit (alphanumeric, not starting with a number). All other
columns are defined by corresponding entries in iMeta
, with the following special exceptions:
Time
is an optional column which allows panel data to be input, consisting of e.g. multiple rows for
each uCode
: one for each Time
value. This can be used to split a set of panel data into multiple coins
(a so-called "purse") which can be input to COINr functions.
uName
is an optional column which specifies a longer name for each unit. If this column is not included,
unit codes (uCode
) will be used as unit names where required.
iMeta
Required columns for iMeta
are:
Level
: Level in aggregation, where 1 is indicator level, 2 is the level resulting from aggregating
indicators, 3 is the result of aggregating level 2, and so on. Set to NA
for entries that are not included
in the index (groups, denominators, etc).
iCode
: Indicator code, alphanumeric. Must not start with a number.
Parent
: Group (iCode
) to which indicator/aggregate belongs in level immediately above.
Each entry here should also be found in iCode
. Set to NA
only
for the highest (Index) level (no parent), or for entries that are not included
in the index (groups, denominators, etc).
Direction
: Numeric, either -1 or 1
Weight
: Numeric weight, will be rescaled to sum to 1 within aggregation group. Set to NA
for entries that are not included
in the index (groups, denominators, etc).
Type
: The type, corresponding to iCode
. Can be either Indicator
, Aggregate
, Group
, Denominator
,
or Other
.
Optional columns that are recognised in certain functions are:
iName
: Name of the indicator: a longer name which is used in some plotting functions.
Unit
: the unit of the indicator, e.g. USD, thousands, score, etc. Used in some plots if available.
Target
: a target for the indicator. Used if normalisation type is distance-to-target.
The iMeta
data frame essentially gives details about each of the columns found in iData
, as well as
details about additional data columns eventually created by aggregating indicators. This means that the
entries in iMeta
must include all columns in iData
, except the three special column names: uCode
,
uName
, and Time
. In other words, all column names of iData
should appear in iMeta$iCode
, except
the three special cases mentioned. The iName
column optionally can be used to give longer names to each indicator
which can be used for display in plots.
iMeta
also specifies the structure of the index, by specifying the parent of each indicator and aggregate.
The Parent
column must refer to entries that can be found in iCode
. Try View(ASEM_iMeta)
for an example
of how this works.
Level
is the "vertical" level in the hierarchy, where 1 is the bottom level (indicators), and each successive
level is created by aggregating the level below according to its specified groups.
Direction
is set to 1 if higher values of the indicator should result in higher values of the index, and
-1 in the opposite case.
The Type
column specifies the type of the entry: Indicator
should be used for indicators at level 1.
Aggregate
for aggregates created by aggregating indicators or other aggregates. Otherwise set to Group
if the variable is not used for building the index but instead is for defining groups of units. Set to
Denominator
if the variable is to be used for scaling (denominating) other indicators. Finally, set to
Other
if the variable should be ignored but passed through. Any other entries here will cause an error.
Note: this function requires the columns above as specified, but extra columns can also be added without causing errors.
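A minimal sketch of the expected format, with hypothetical unit and indicator codes and two indicators aggregated directly into an index:

iData <- data.frame(
  uCode = c("AAA", "BBB", "CCC"),
  ind1  = c(1, 2, 3),
  ind2  = c(10, 20, 30)
)

iMeta <- data.frame(
  iCode     = c("ind1", "ind2", "Index"),
  Level     = c(1, 1, 2),
  Parent    = c("Index", "Index", NA),
  Direction = c(1, 1, 1),
  Weight    = c(1, 1, 1),
  Type      = c("Indicator", "Indicator", "Aggregate")
)

coin <- new_coin(iData, iMeta)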
The exclude
argument can be used to exclude specified indicators. If this is specified, .$Data$Raw
will be built excluding these indicators, as will all subsequent build operations. However the full data set
will still be stored in .$Log$new_coin
. The codes here should correspond to entries in the iMeta$iCode
.
This option is useful e.g. in generating alternative coins with different indicator sets, and can be included
as a variable in a sensitivity analysis.
The split_to
argument allows panel data to be used. Panel data must have a Time
column in iData
, which
consists of some numerical time variable, such as a year. Panel data has multiple observations for each uCode
,
one for each unique entry in Time
. The Time
column is required to be numerical, because it needs to be
possible to order it. To split panel data, specify split_to = "all"
to split to a single coin for each
of the unique entries in Time
. Alternatively, you can pass a vector of entries in Time
which allows
splitting to a subset of the entries in Time.
Splitting panel data results in a so-called "purse" class, which is a data frame of COINs, indexed by Time
.
See vignette("coins")
for more details.
This function replaces the now-defunct assemble()
from COINr < v1.0.
A "coin" object or a "purse" object.
# build a coin using example data frames ASEM_coin <- new_coin(iData = ASEM_iData, iMeta = ASEM_iMeta, level_names = c("Indicator", "Pillar", "Sub-index", "Index")) # view coin contents ASEM_coin # build example purse class ASEM_purse <- new_coin(iData = ASEM_iData_p, iMeta = ASEM_iMeta, split_to = "all", quietly = TRUE) # view purse contents ASEM_purse # see vignette("coins") for further info
# build a coin using example data frames ASEM_coin <- new_coin(iData = ASEM_iData, iMeta = ASEM_iMeta, level_names = c("Indicator", "Pillar", "Sub-index", "Index")) # view coin contents ASEM_coin # build example purse class ASEM_purse <- new_coin(iData = ASEM_iData_p, iMeta = ASEM_iMeta, split_to = "all", quietly = TRUE) # view purse contents ASEM_purse # see vignette("coins") for further info
This is a generic function for normalising variables and indicators, i.e. bringing them onto a common scale. Please see the individual method documentation depending on your data class: Normalise.coin(), Normalise.data.frame(), Normalise.numeric() and Normalise.purse().
Normalise(x, ...)
Normalise(x, ...)
x |
Object to be normalised |
... |
Further arguments to be passed to methods. |
See also vignette("normalise")
for more details.
This function replaces the now-defunct normalise()
from COINr < v1.0.
# See individual method documentation.
# See individual method documentation.
Creates a normalised data set using specifications given in global_specs
. Columns of dset
can also optionally be
normalised with individual specifications using the indiv_specs
argument. If indicators should have their
directions reversed, this can be specified using the directions
argument. Non-numeric columns are ignored
automatically by this function. By default, this function normalises each indicator using the "min-max" method, scaling indicators to lie between
0 and 100. This calls the n_minmax()
function. COINr has a number of built-in normalisation functions of the form n_*()
. See online documentation
for details.
## S3 method for class 'coin' Normalise( x, dset, global_specs = NULL, indiv_specs = NULL, directions = NULL, out2 = "coin", write_to = NULL, write2log = TRUE, ... )
## S3 method for class 'coin' Normalise( x, dset, global_specs = NULL, indiv_specs = NULL, directions = NULL, out2 = "coin", write_to = NULL, write2log = TRUE, ... )
x |
A coin |
dset |
A named data set found in |
global_specs |
Specifications to apply to all columns, apart from those specified by |
indiv_specs |
Specifications applied to specific columns, overriding those specified in |
directions |
An optional data frame containing the following columns:
|
out2 |
Either |
write_to |
Optional character string for naming the data set in the coin. Data will be written to
|
write2log |
Logical: if |
... |
arguments passed to or from other methods. |
The global_specs
argument is a list which specifies the normalisation function and any function parameters
that should be used to normalise the indicators found in the data set. Unless indiv_specs
is specified, this will be applied
to all indicators. The list should have two entries:
.$f_n
: the name of the function to use to normalise each indicator
.$f_n_para
: any further parameters to pass to f_n
, apart from the numeric vector (each column of the data set)
In this list, f_n
should be a character string which is the name of a normalisation
function. For example, f_n = "n_minmax"
calls the n_minmax()
function. f_n_para
is a list of any
further arguments to f_n
. This means that any function can be passed to Normalise()
, as long as its
first argument is x
, a numeric vector, and it returns a numeric vector of the same length. See n_minmax()
for an example.
f_n_para
is required to be a named list. So e.g. if we define a function f1(x, arg1, arg2)
then we should
specify f_n = "f1"
, and f_n_para = list(arg1 = val1, arg2 = val2)
, where val1
and val2
are the
values assigned to the arguments arg1
and arg2
respectively.
The default list for global_specs
is: list(f_n = "n_minmax", f_n_para = list(l_u = c(0,100)))
, i.e.
min-max normalisation between 0 and 100.
Note, all COINr normalisation functions (passed to f_n
) are of the form n_*()
. Type n_
in the RStudio console and press the Tab key to see a list.
For some normalisation methods we may use the same function for all indicators but use different parameters - for example, using
distance to target normalisation or goalpost normalisation. COINr now supports specifying these parameters in the iMeta
table.
To enable this, set f_n_para = "use_iMeta"
within the global_specs
list.
For this to work you will also need to add the correctly-named columns to the iMeta
table. To see which column names to add, check the
function documentation of the normalisation function you wish to use (e.g. n_goalposts()
). See also examples in the
normalisation vignette. These columns should be added before construction of
the coin.
To give full individual control, indicators can be normalised with different normalisation functions and parameters using the
indiv_specs
argument. This must be specified as a named list e.g. list(i1 = specs1, i2 = specs2)
where
i1
and i2
are iCode
s to apply individual normalisation to, and specs1
and specs2
are
respectively lists of the same format as global_specs
(see above). In other words, indiv_specs
is a big
list wrapping together global_specs
-style lists. Any iCode
s not named in indiv_specs
(
i.e. those not in names(indiv_specs)
) are normalised using the specifications from global_specs
. So
indiv_specs
lists the exceptions to global_specs
.
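For example, a sketch using a z-score default while ranking one indicator individually (LPI is an indicator code in the built-in example coin):

coin <- build_example_coin(up_to = "new_coin", quietly = TRUE)

coin <- Normalise(coin, dset = "Raw",
                  global_specs = list(f_n = "n_zscore",
                                      f_n_para = list(m_sd = c(10, 2))),
                  indiv_specs = list(LPI = list(f_n = "n_rank")))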
See also vignette("normalise")
for more details.
An updated coin
# build example coin coin <- build_example_coin(up_to = "new_coin") # normalise the raw data set coin <- Normalise(coin, dset = "Raw")
# build example coin coin <- build_example_coin(up_to = "new_coin") # normalise the raw data set coin <- Normalise(coin, dset = "Raw")
Normalises a data frame using specifications given in global_specs
. Columns can also optionally be
normalised with individual specifications using the indiv_specs
argument. If variables should have their
directions reversed, this can be specified using the directions
argument. Non-numeric columns are ignored
automatically by this function. By default, this function normalises each indicator using the "min-max" method, scaling indicators to lie between
0 and 100. This calls the n_minmax()
function. COINr has a number of built-in normalisation functions of the form n_*()
. See online documentation
for details.
## S3 method for class 'data.frame' Normalise(x, global_specs = NULL, indiv_specs = NULL, directions = NULL, ...)
## S3 method for class 'data.frame' Normalise(x, global_specs = NULL, indiv_specs = NULL, directions = NULL, ...)
x |
A data frame |
global_specs |
Specifications to apply to all columns, apart from those specified by |
indiv_specs |
Specifications applied to specific columns, overriding those specified in |
directions |
An optional data frame containing the following columns:
|
... |
arguments passed to or from other methods. |
The global_specs
argument is a list which specifies the normalisation function and any function parameters
that should be used to normalise the columns of x
. Unless indiv_specs
is specified, this will be applied
to all numeric columns of x
. The list should have two entries:
.$f_n
: the name of the function to use to normalise each column
.$f_n_para
: any further parameters to pass to f_n
, apart from the numeric vector (each column of x
)
In this list, f_n
should be a character string which is the name of a normalisation
function. For example, f_n = "n_minmax"
calls the n_minmax()
function. f_n_para
is a list of any
further arguments to f_n
. This means that any function can be passed to Normalise()
, as long as its
first argument is x
, a numeric vector, and it returns a numeric vector of the same length. See n_minmax()
for an example.
f_n_para
is required to be a named list. So e.g. if we define a function f1(x, arg1, arg2)
then we should
specify f_n = "f1"
, and f_n_para = list(arg1 = val1, arg2 = val2)
, where val1
and val2
are the
values assigned to the arguments arg1
and arg2
respectively.
The default list for global_specs
is: list(f_n = "n_minmax", f_n_para = list(l_u = c(0,100)))
.
Note, all COINr normalisation functions (passed to f_n
) are of the form n_*()
. Type n_
in the RStudio console and press the Tab key to see a list.
Optionally, columns of x
can be normalised with different normalisation functions and parameters using the
indiv_specs
argument. This must be specified as a named list e.g. list(i1 = specs1, i2 = specs2)
where
i1
and i2
are column names of x
to apply individual normalisation to, and specs1
and specs2
are
respectively lists of the same format as global_specs
(see above). In other words, indiv_specs
is a big
list wrapping together global_specs
-style lists. Any numeric columns of x
not named in indiv_specs
(
i.e. those not in names(indiv_specs)
) are normalised using the specifications from global_specs
. So
indiv_specs
lists the exceptions to global_specs
.
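For example, a sketch normalising all numeric columns of a data frame to the unit interval rather than the default 0-100 range:

X <- data.frame(a = runif(10), b = runif(10) * 100)

Normalise(X, global_specs = list(f_n = "n_minmax",
                                 f_n_para = list(l_u = c(0, 1))))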
See also vignette("normalise")
for more details.
A normalised data frame
iris_norm <- Normalise(iris) head(iris_norm)
iris_norm <- Normalise(iris) head(iris_norm)
Normalise a numeric vector using a specified function f_n
, with possible reversal of direction
using direction
.
## S3 method for class 'numeric' Normalise(x, f_n = NULL, f_n_para = NULL, direction = 1, ...)
## S3 method for class 'numeric' Normalise(x, f_n = NULL, f_n_para = NULL, direction = 1, ...)
x |
Object to be normalised |
f_n |
The normalisation method, specified as string which refers to a function of the form |
f_n_para |
Supporting list of arguments for |
direction |
If |
... |
arguments passed to or from other methods. |
Normalisation is specified using the f_n
and f_n_para
arguments. In these, f_n
should be a character
string which is the name of a normalisation
function. For example, f_n = "n_minmax"
calls the n_minmax()
function. f_n_para
is a list of any
further arguments to f_n
. This means that any function can be passed to Normalise()
, as long as its
first argument is x
, a numeric vector, and it returns a numeric vector of the same length. See n_minmax()
for an example.
COINr has a number of built-in normalisation functions of the form n_*()
. See online documentation
for details.
f_n_para
is required to be a named list. So e.g. if we define a function f1(x, arg1, arg2)
then we should
specify f_n = "f1"
, and f_n_para = list(arg1 = val1, arg2 = val2)
, where val1
and val2
are the
values assigned to the arguments arg1
and arg2
respectively.
See also vignette("normalise")
for more details.
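As an illustration of passing a user-defined function, n_to_max() below is hypothetical and not part of COINr:

# a hypothetical normalisation function: fraction of a supplied maximum
n_to_max <- function(x, x_max){
  x / x_max
}

x <- runif(10)
Normalise(x, f_n = "n_to_max", f_n_para = list(x_max = max(x)))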
A normalised numeric vector
# example vector x <- runif(10) # normalise using distance to reference (5th data point) x_norm <- Normalise(x, f_n = "n_dist2ref", f_n_para = list(iref = 5)) # view side by side data.frame(x, x_norm)
# example vector x <- runif(10) # normalise using distance to reference (5th data point) x_norm <- Normalise(x, f_n = "n_dist2ref", f_n_para = list(iref = 5)) # view side by side data.frame(x, x_norm)
This creates normalised data sets for each coin in the purse. In most respects, this works in a similar way
to normalising on a coin, for which reason please see Normalise.coin()
for most documentation. There is however
a special case in terms of operating on a purse of coins. This is because, when
dealing with time series data, it is often desirable to normalise over the whole panel data set at once
rather than independently for each time point. This makes the resulting index and aggregates comparable
over time. Here, the global
argument controls whether to normalise each coin independently or to normalise
across all data at once. In other respects, this function behaves the same as Normalise.coin()
.
## S3 method for class 'purse' Normalise( x, dset, global_specs = NULL, indiv_specs = NULL, directions = NULL, global = TRUE, write_to = NULL, ... )
## S3 method for class 'purse' Normalise( x, dset, global_specs = NULL, indiv_specs = NULL, directions = NULL, global = TRUE, write_to = NULL, ... )
x |
A purse object |
dset |
The data set to normalise in each coin |
global_specs |
Default specifications |
indiv_specs |
Individual specifications |
directions |
An optional data frame containing the following columns:
|
global |
Logical: if |
write_to |
Optional character string for naming the data set in each coin. Data will be written to
|
... |
arguments passed to or from other methods. |
The same specifications are passed to each coin in the purse. This means that each coin is normalised using the same set of specifications and directions. If you need control over individual coins, you will have to normalise coins individually.
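For example, to normalise each time point independently rather than across the whole panel (a sketch using the example purse):

purse <- build_example_purse(up_to = "new_coin", quietly = TRUE)

# normalise each coin separately rather than across all time points at once
purse <- Normalise(purse, dset = "Raw", global = FALSE)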
An updated purse with new normalised data sets added at .$Data$Normalised
in each coin
# build example purse purse <- build_example_purse(up_to = "new_coin", quietly = TRUE) # normalise raw data set purse <- Normalise(purse, dset = "Raw", global = TRUE)
# build example purse purse <- build_example_purse(up_to = "new_coin", quietly = TRUE) # normalise raw data set purse <- Normalise(purse, dset = "Raw", global = TRUE)
Constructs an outranking matrix based on a data frame of indicator data and corresponding weights.
outrankMatrix(X, w = NULL)
outrankMatrix(X, w = NULL)
X |
A data frame or matrix of indicator data, with observations as rows and indicators as columns. No other columns should be present (e.g. label columns). |
w |
A vector of weights, which should have length equal to |
A list with:
.$OutRankMatrix
the outranking matrix with nrow(X)
rows and columns (matrix class).
.$nDominant
the number of dominance/robust pairs
.$fracDominant
the percentage of dominance/robust pairs
# get a sample of a few indicators ind_data <- COINr::ASEM_iData[12:16] # calculate outranking matrix outlist <- outrankMatrix(ind_data) # see fraction of dominant pairs (robustness) outlist$fracDominant
# get a sample of a few indicators ind_data <- COINr::ASEM_iData[12:16] # calculate outranking matrix outlist <- outrankMatrix(ind_data) # see fraction of dominant pairs (robustness) outlist$fracDominant
Plot bar charts of single indicators. Bar charts can be coloured by an optional grouping variable by_group
, or if
iCode
points to an aggregate, setting stack_children = TRUE
will plot iCode
coloured by its underlying scores.
plot_bar( coin, dset, iCode, ..., uLabel = "uCode", axes_label = "iCode", by_group = NULL, filter_to_ends = NULL, dset_label = FALSE, log_scale = FALSE, stack_children = FALSE, bar_colours = NULL, flip_coords = FALSE )
plot_bar( coin, dset, iCode, ..., uLabel = "uCode", axes_label = "iCode", by_group = NULL, filter_to_ends = NULL, dset_label = FALSE, log_scale = FALSE, stack_children = FALSE, bar_colours = NULL, flip_coords = FALSE )
coin |
A coin object. |
dset |
Data set from which to extract the variable to plot. Passed to |
iCode |
Code of variable or indicator to plot. Passed to |
... |
Further arguments to pass to |
uLabel |
How to label units: either |
axes_label |
How to label the y axis and group legend: either |
by_group |
Optional group variable to use to colour bars. Cannot be used if |
filter_to_ends |
Optional way to filter the bar chart to only display the top/bottom N units. This is useful in cases
where the number of units is large. Specify as e.g. |
dset_label |
Logical: whether to include the data set in the y axis label. |
log_scale |
Logical: if |
stack_children |
Logical: if |
bar_colours |
Optional vector of colour codes for colouring bars. |
flip_coords |
Logical; if |
This function uses ggplot2 to generate plots, so the plot can be further manipulated using ggplot2 commands.
See vignette("visualisation") for more details on plotting.
A ggplot2 plot object.
# build example coin coin <- build_example_coin(up_to = "new_coin", quietly = TRUE) # bar plot of CO2 by GDP per capita group plot_bar(coin, dset = "Raw", iCode = "CO2", by_group = "GDPpc_group", axes_label = "iName")
# build example coin coin <- build_example_coin(up_to = "new_coin", quietly = TRUE) # bar plot of CO2 by GDP per capita group plot_bar(coin, dset = "Raw", iCode = "CO2", by_group = "GDPpc_group", axes_label = "iName")
Generates heatmaps of correlation matrices using ggplot2, which can be tailored according to the grouping and structure
of the index. This enables correlating any set of indicators against any other,
and supports calling named aggregation groups of indicators. The withparent
argument generates tables of correlations only with
parents of each indicator. Also supports discrete colour maps using flagcolours
, different types of correlation, and groups
plots by higher aggregation levels.
plot_corr( coin, dset, iCodes = NULL, Levels = 1, ..., cortype = "pearson", withparent = FALSE, grouplev = NULL, box_level = NULL, showvals = TRUE, flagcolours = FALSE, flagthresh = NULL, pval = 0.05, insig_colour = "#F0F0F0", text_colour = NULL, discrete_colours = NULL, box_colour = NULL, order_as = NULL, use_directions = FALSE )
plot_corr( coin, dset, iCodes = NULL, Levels = 1, ..., cortype = "pearson", withparent = FALSE, grouplev = NULL, box_level = NULL, showvals = TRUE, flagcolours = FALSE, flagthresh = NULL, pval = 0.05, insig_colour = "#F0F0F0", text_colour = NULL, discrete_colours = NULL, box_colour = NULL, order_as = NULL, use_directions = FALSE )
coin |
The coin object |
dset |
The target data set. |
iCodes |
An optional list of character vectors where the first entry specifies the indicator/aggregate codes to correlate against the second entry (also a specification of indicator/aggregate codes) |
Levels |
The aggregation levels to take the two groups of indicators from. See |
... |
Optional further arguments to pass to |
cortype |
The type of correlation to calculate, either |
withparent |
If |
grouplev |
The aggregation level to group correlations by if |
box_level |
The aggregation level to draw boxes around if |
showvals |
If |
flagcolours |
If |
flagthresh |
A 3-length vector of thresholds for highlighting correlations, if |
pval |
The significance level for plotting correlations. Correlations with |
insig_colour |
The colour to plot insignificant correlations. Defaults to a light grey. |
text_colour |
The colour of the correlation value text (default white). |
discrete_colours |
An optional 4-length character vector of colour codes or names to define the discrete
colour map if |
box_colour |
The line colour of grouping boxes, default black. |
order_as |
Optional list for ordering the plotting of variables. If specified, this must be a list of length 2, where each entry of the list is
a character vector of the iCodes plotted on the x and y axes of the plot. The plot will then follow the order of these character vectors. Note this must
be used with care because the |
use_directions |
Logical: if |
This function calls get_corr()
.
Note that this function can only call correlations within the same data set (i.e. only one data set in .$Data
).
This function uses ggplot2 to generate plots, so the plot can be further manipulated using ggplot2 commands.
See vignette("visualisation")
for more details on plotting.
This function replaces the now-defunct plotCorr()
from COINr < v1.0.
A plot object generated with ggplot2, which can be edited further with ggplot2 commands.
# build example coin coin <- build_example_coin(up_to = "Normalise", quietly = TRUE) # plot correlations between indicators in Sust group, using Normalised dset plot_corr(coin, dset = "Normalised", iCodes = list("Sust"), grouplev = 2, flagcolours = TRUE)
# build example coin coin <- build_example_coin(up_to = "Normalise", quietly = TRUE) # plot correlations between indicators in Sust group, using Normalised dset plot_corr(coin, dset = "Normalised", iCodes = list("Sust"), grouplev = 2, flagcolours = TRUE)
Plots indicator distributions using box plots, dot plots, violin plots, violin-dot plots, and histograms. Supports plotting multiple indicators by calling aggregation groups.
plot_dist( coin, dset, iCodes, ..., type = "Box", normalise = FALSE, global_specs = NULL )
plot_dist( coin, dset, iCodes, ..., type = "Box", normalise = FALSE, global_specs = NULL )
coin |
The coin object, or a data frame of indicator data |
dset |
The name of the data set to apply the function to, which should be accessible in |
iCodes |
Indicator code(s) to plot. See details. |
... |
Further arguments passed to |
type |
The type of plot. Currently supported |
normalise |
Logical: if |
global_specs |
Specifications for normalising data if |
This function uses ggplot2 to generate plots, so the plot can be further manipulated using ggplot2 commands.
See vignette("visualisation") for more details on plotting.
This function replaces the now-defunct plotIndDist()
from COINr < v1.0.
A ggplot2 plot object.
# build example coin coin <- build_example_coin(up_to = "new_coin") # plot all indicators in P2P group plot_dist(coin, dset = "Raw", iCodes = "P2P", Level = 1, type = "Violindot")
# build example coin coin <- build_example_coin(up_to = "new_coin") # plot all indicators in P2P group plot_dist(coin, dset = "Raw", iCodes = "P2P", Level = 1, type = "Violindot")
Plots a single indicator as a line of dots, and optionally highlights selected units and statistics.
This is intended for showing the relative position of units to other units, rather than as a statistical
plot. For the latter, use plot_dist()
.
plot_dot( coin, dset, iCode, Level = NULL, ..., usel = NULL, marker_type = "circle", add_stat = NULL, stat_label = NULL, show_ticks = TRUE, plabel = NULL, usel_label = TRUE, vert_adjust = 0.5 )
plot_dot( coin, dset, iCode, Level = NULL, ..., usel = NULL, marker_type = "circle", add_stat = NULL, stat_label = NULL, show_ticks = TRUE, plabel = NULL, usel_label = TRUE, vert_adjust = 0.5 )
coin |
The coin |
dset |
The name of the data set to apply the function to, which should be accessible in |
iCode |
Code of indicator or aggregate found in |
Level |
The level in the hierarchy to extract data from. See |
... |
Further arguments to pass to |
usel |
A subset of units to highlight. |
marker_type |
The type of marker, either |
add_stat |
A statistic to overlay, either |
stat_label |
An optional string to use as label at the point specified by |
show_ticks |
Set |
plabel |
Controls the labelling of the indicator. If |
usel_label |
If |
vert_adjust |
Adjusts the vertical height of text labels and stat lines, which matters depending on plot size. Takes a value between 0 to 2 (higher will probably remove the label from the axis space). |
This function uses ggplot2 to generate plots, so the plot can be further manipulated using ggplot2 commands.
See vignette("visualisation") for more details on plotting.
This function replaces the now-defunct plotIndDot()
from COINr < v1.0.
A ggplot2 plot object.
# build example coin coin <- build_example_coin(up_to = "new_coin") # dot plot of LPI, highlighting two countries and with median shown plot_dot(coin, dset = "Raw", iCode = "LPI", usel = c("JPN", "ESP"), add_stat = "median", stat_label = "Median", plabel = "iName+unit")
# build example coin coin <- build_example_coin(up_to = "new_coin") # dot plot of LPI, highlighting two countries and with median shown plot_dot(coin, dset = "Raw", iCode = "LPI", usel = c("JPN", "ESP"), add_stat = "median", stat_label = "Median", plabel = "iName+unit")
Plots the hierarchical indicator framework. If type = "sunburst"
(default), the framework is plotted as a
sunburst plot. If type = "stack"
it is plotted as a linear stack. In both cases, the size of each component
is reflected by its weight and the weight of its parent, i.e. its "effective weight" in the framework.
plot_framework( coin, type = "sunburst", colour_level = NULL, text_colour = NULL, text_size = NULL, transparency = TRUE, text_label = "iCode" )
plot_framework( coin, type = "sunburst", colour_level = NULL, text_colour = NULL, text_size = NULL, transparency = TRUE, text_label = "iCode" )
coin |
A coin class object |
type |
Either |
colour_level |
The framework level, as an integer, to colour from. See details. |
text_colour |
Colour of label text - default |
text_size |
Text size of labels, default 2.5 |
transparency |
If |
text_label |
Text labelling of segments: either |
The colouring of the plot is defined to some extent by the colour_level
argument. This should be specified
as an integer between 1 and the highest level in the framework (i.e. the maximum of the iMeta$Level
column).
Levels higher than and including colour_level
are coloured with individual colours from the standard colour
palette. Any levels below colour_level
are coloured with the same colours as their parents, to emphasise
that they belong to the same group, and also to avoid repeating the colour palette. Levels below colour_level
can be additionally differentiated by setting transparency = TRUE
which will apply increasing transparency
to lower levels.
This function returns a ggplot2 class object. If you want more control over the appearance of the plot, assign
the output of this function to a variable, and manipulate it further with ggplot2 commands to e.g.
change the colour palette, individual colours, add titles, etc.
See vignette("visualisation") for more details on plotting.
This function replaces the now-defunct plotframework()
from COINr < v1.0.
A ggplot2 plot object
# build example coin coin <- build_example_coin(up_to = "new_coin", quietly = TRUE) # plot framework as sunburst, colouring at level 2 upwards plot_framework(coin, colour_level = 2, transparency = TRUE)
# build example coin coin <- build_example_coin(up_to = "new_coin", quietly = TRUE) # plot framework as sunburst, colouring at level 2 upwards plot_framework(coin, colour_level = 2, transparency = TRUE)
This is a convenient quick scatter plot function for plotting any two variables x and y in a coin against each other.
At a minimum, you must specify the data set and iCode of both x and y using the dsets
and iCodes
arguments.
plot_scatter( coin, dsets, iCodes, ..., by_group = NULL, alpha = 0.5, axes_label = "iCode", dset_label = TRUE, point_label = NULL, check_overlap = TRUE, nudge_y = 5, log_scale = c(FALSE, FALSE) )
plot_scatter( coin, dsets, iCodes, ..., by_group = NULL, alpha = 0.5, axes_label = "iCode", dset_label = TRUE, point_label = NULL, check_overlap = TRUE, nudge_y = 5, log_scale = c(FALSE, FALSE) )
coin |
A coin object |
dsets |
A 2-length character vector specifying the data sets to extract v1 and v2 from,
respectively (passed as |
iCodes |
A 2-length character vector specifying the |
... |
Optional further arguments to be passed to |
by_group |
A string specifying an optional group variable. If specified, the plot will be coloured by this grouping variable. |
alpha |
Transparency value for points between 0 and 1, passed to ggplot2. |
axes_label |
A string specifying how to label axes and legend. Either |
dset_label |
Logical: if |
point_label |
Specifies whether and how to label points. If |
check_overlap |
Logical: if |
nudge_y |
Parameter passed to ggplot which controls the vertical adjustment of the text labels if present. |
log_scale |
A 2-length logical vector specifying whether to use log axes for x and y respectively: if |
Optionally, the scatter plot can be coloured by grouping variables specified in the coin (see by_group
). Points
and axes can be labelled using other arguments.
This function is powered by ggplot2 and outputs a ggplot2 object. To further customise the plot, assign the output
of this function to a variable and use ggplot2 commands to further edit. See vignette("visualisation") for more details on plotting.
A ggplot2 object.
# build example coin coin <- build_example_coin(up_to = "new_coin") # scatter plot of Flights against Population # coloured by GDP per capita # log scale applied to population plot_scatter(coin, dsets = c("uMeta", "Raw"), iCodes = c("Population", "Flights"), by_group = "GDPpc_group", log_scale = c(TRUE, FALSE))
# build example coin coin <- build_example_coin(up_to = "new_coin") # scatter plot of Flights against Population # coloured by GDP per capita # log scale applied to population plot_scatter(coin, dsets = c("uMeta", "Raw"), iCodes = c("Population", "Flights"), by_group = "GDPpc_group", log_scale = c(TRUE, FALSE))
Plots sensitivity indices as bar or pie charts.
plot_sensitivity(SAresults, ptype = "bar")
plot_sensitivity(SAresults, ptype = "bar")
SAresults |
A list of sensitivity/uncertainty analysis results from |
ptype |
Type of plot to generate - either |
To use this function you first need to run get_sensitivity()
. Then enter the resulting list as the
SAresults
argument here.
See vignette("sensitivity")
.
This function replaces the now-defunct plotSA()
from COINr < v1.0.
A plot of sensitivity indices generated by ggplot2.
get_sensitivity()
Perform global sensitivity or uncertainty analysis on a COIN
plot_uncertainty()
Plot confidence intervals on ranks following a sensitivity analysis
# for examples, see `vignette("sensitivity")` # (this is because package examples are run automatically and sensitivity analysis # can take a few minutes to run at realistic settings)
# for examples, see `vignette("sensitivity")` # (this is because package examples are run automatically and sensitivity analysis # can take a few minutes to run at realistic settings)
Plots the ranks resulting from an uncertainty and sensitivity analysis, in particular plots the median, and 5th/95th percentiles of ranks.
plot_uncertainty( SAresults, plot_units = NULL, order_by = "nominal", dot_colour = NULL, line_colour = NULL )
plot_uncertainty( SAresults, plot_units = NULL, order_by = "nominal", dot_colour = NULL, line_colour = NULL )
SAresults |
A list of sensitivity/uncertainty analysis results from |
plot_units |
A character vector of units to plot. Defaults to all units. You can also set
to |
order_by |
If set to |
dot_colour |
Colour of dots representing median ranks. |
line_colour |
Colour of lines connecting 5th and 95th percentiles. |
To use this function you first need to run get_sensitivity()
. Then enter the resulting list as the
SAresults
argument here.
See vignette("sensitivity")
.
This function replaces the now-defunct plotSARanks()
from COINr < v1.0.
A plot of rank confidence intervals, generated by 'ggplot2'.
get_sensitivity()
Perform global sensitivity or uncertainty analysis on a coin
plot_sensitivity()
Plot sensitivity indices following a sensitivity analysis.
# for examples, see `vignette("sensitivity")` # (this is because package examples are run automatically and sensitivity analysis # can take a few minutes to run at realistic settings)
# for examples, see `vignette("sensitivity")` # (this is because package examples are run automatically and sensitivity analysis # can take a few minutes to run at realistic settings)
Calculates the percentage change in a time series from the initial value. The time series is defined by
y
, the response variable, indexed by x
, the time variable. The per
argument can optionally be used
to scale the result according to a period of time. E.g. if the units of x
are years, setting per = 10
will measure the percentage change per decade.
prc_change(y, x, per = 1)
prc_change(y, x, per = 1)
y |
A numeric vector |
x |
A numeric vector of the same length as |
per |
Numeric value to scale the change according to a period of time. See description. |
This function operates in two ways, depending on the number of data points. If x
and y
have two non-NA
observations, percentage change is calculated using the first and last values. If three or more points are
available, a linear regression is used to estimate the average percentage change. If fewer than two points
are available, the percentage change cannot be estimated and NA
is returned.
If all y
values are equal, it will return a change of zero.
Percentage change as a scalar value.
# a time vector
x <- 2011:2020
# some random points
y <- runif(10)
# find percentage change per decade
prc_change(y, x, 10)
Some details about the coin
## S3 method for class 'coin'
print(x, ...)
x |
A coin |
... |
Arguments to be passed to or from other methods. |
Text output
Some details about the purse
## S3 method for class 'purse'
print(x, ...)
x |
A purse |
... |
Arguments to be passed to or from other methods. |
Text output
This is a generic wrapper function for Normalise()
, which offers a simpler syntax but less flexibility.
qNormalise(x, ...)
x |
Object to be normalised |
... |
arguments passed to or from other methods. |
See individual method documentation:
A normalised object
This is a wrapper function for Normalise()
, which offers a simpler syntax but less flexibility. It
normalises a data set within a coin using a specified function f_n
which is used to normalise each indicator, with
additional function arguments passed by f_n_para
. By default, f_n = "n_minmax"
and f_n_para
is
set so that the indicators are normalised using the min-max method, between 0 and 100.
## S3 method for class 'coin'
qNormalise(
  x,
  dset,
  f_n = "n_minmax",
  f_n_para = list(l_u = c(0, 100)),
  directions = NULL,
  ...
)
x |
A coin |
dset |
Name of data set to normalise |
f_n |
Name of a normalisation function (as a string) to apply to each indicator. Default |
f_n_para |
Any further arguments to pass to |
directions |
An optional data frame containing the following columns:
|
... |
arguments passed to or from other methods. |
Essentially, this function is similar to Normalise()
but brings parameters into the function arguments
rather than being wrapped in a list. It also does not allow individual normalisation.
See Normalise()
documentation for more details, and vignette("normalise")
.
An updated coin with normalised data set.
# build example coin
coin <- build_example_coin(up_to = "new_coin", quietly = TRUE)

# normalise raw data set using min max, but change to scale 1-10
coin <- qNormalise(coin, dset = "Raw", f_n = "n_minmax",
                   f_n_para = list(l_u = c(1, 10)))
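A different normalisation function can be swapped in via f_n, with its parameters in f_n_para. For instance, a sketch assuming the n_zscore() function and its m_sd argument:

# normalise raw data to z-scores (mean 0, standard deviation 1)
coin <- qNormalise(coin, dset = "Raw", f_n = "n_zscore", f_n_para = list(m_sd = c(0, 1)))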
This is a wrapper function for Normalise()
, which offers a simpler syntax but less flexibility. It
normalises a data frame using a specified function f_n
which is used to normalise each column, with
additional function arguments passed by f_n_para
. By default, f_n = "n_minmax"
and f_n_para
is
set so that the columns of x
are normalised using the min-max method, between 0 and 100.
## S3 method for class 'data.frame'
qNormalise(x, f_n = "n_minmax", f_n_para = NULL, directions = NULL, ...)
x |
A numeric data frame |
f_n |
Name of a normalisation function (as a string) to apply to each column of |
f_n_para |
Any further arguments to pass to |
directions |
An optional data frame containing the following columns:
|
... |
arguments passed to or from other methods. |
Essentially, this function is similar to Normalise()
but brings parameters into the function arguments
rather than being wrapped in a list. It also does not allow individual normalisation.
See Normalise()
documentation for more details, and vignette("normalise")
.
A normalised data frame
# some made up data
X <- data.frame(uCode = letters[1:10], a = runif(10), b = runif(10)*100)
# normalise (defaults to min-max)
qNormalise(X)
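The target range can be changed by adjusting f_n_para, e.g. rescaling to the unit interval instead of 0-100 (a small variation on the example above):

# min-max normalisation to [0, 1]
qNormalise(X, f_n_para = list(l_u = c(0, 1)))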
This is a wrapper function for Normalise()
, which offers a simpler syntax but less flexibility. It
normalises data sets within a purse using a specified function f_n
which is used to normalise each indicator, with
additional function arguments passed by f_n_para
. By default, f_n = "n_minmax"
and f_n_para
is
set so that the indicators are normalised using the min-max method, between 0 and 100.
## S3 method for class 'purse'
qNormalise(
  x,
  dset,
  f_n = "n_minmax",
  f_n_para = list(l_u = c(0, 100)),
  directions = NULL,
  global = TRUE,
  ...
)
x |
A purse |
dset |
Name of data set to normalise |
f_n |
Name of a normalisation function (as a string) to apply to each indicator. Default |
f_n_para |
Any further arguments to pass to |
directions |
An optional data frame containing the following columns:
|
global |
Logical: if |
... |
arguments passed to or from other methods. |
Essentially, this function is similar to Normalise()
but brings parameters into the function arguments
rather than being wrapped in a list. It also does not allow individual normalisation.
Normalisation can either be performed independently on each coin, or over the entire panel data set
simultaneously. See the discussion in Normalise.purse()
and vignette("normalise")
.
An updated purse with normalised data sets
# build example purse
purse <- build_example_purse(up_to = "new_coin", quietly = TRUE)

# normalise using min-max, globally
purse <- qNormalise(purse, dset = "Raw", global = TRUE)
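Setting global = FALSE instead normalises each coin (time point) independently, as described above. As a variation on the example (this overwrites the previously-written normalised data set):

# normalise each time point separately
purse <- qNormalise(purse, dset = "Raw", global = FALSE)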
This is a generic wrapper function for Treat()
. It offers a simpler syntax but less flexibility.
qTreat(x, ...)
x |
Object to be treated. |
... |
arguments passed to or from other methods. |
See individual method documentation:
A treated object
# See individual method examples
A simplified version of Treat()
which allows direct access to the default parameters. This has less flexibility,
but is an easier interface and probably more convenient if the objective is to use the default treatment process
but with some minor adjustments.
## S3 method for class 'coin'
qTreat(
  x,
  dset,
  winmax = 5,
  skew_thresh = 2,
  kurt_thresh = 3.5,
  f2 = "log_CT",
  ...
)
x |
A coin |
dset |
Name of data set to treat for outliers |
winmax |
Maximum number of points to Winsorise for each indicator. Default 5. |
skew_thresh |
Absolute skew threshold - default 2. |
kurt_thresh |
Kurtosis threshold - default 3.5. |
f2 |
Function to call if Winsorisation does not bring skew and kurtosis within limits. Default |
... |
arguments passed to or from other methods. |
This function treats each indicator in the data set targeted by dset
using the following process:
First, it checks whether skew and kurtosis are within the specified limits of skew_thresh
and kurt_thresh
If the indicator is not within the limits, it applies the winsorise()
function, with maximum number of winsorised
points specified by winmax
.
If winsorisation does not bring the indicator within the skew/kurtosis limits, it is instead passed to f2
, which is
a second outlier treatment function, default log_CT()
.
The arguments of qTreat()
are passed to Treat()
.
See Treat()
documentation for more details, and vignette("treat")
.
An updated coin with treated data set at .$Data$Treated
.
# build example coin
coin <- build_example_coin(up_to = "new_coin", quietly = TRUE)

# treat with winmax = 3
coin <- qTreat(coin, dset = "Raw", winmax = 3)
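The skew and kurtosis limits can be adjusted in the same call, for example (re-treating the same data set with a stricter skew threshold):

# treat with a stricter skew threshold
coin <- qTreat(coin, dset = "Raw", winmax = 3, skew_thresh = 1.5)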
A simplified version of Treat()
which allows direct access to the default parameters. This has less flexibility,
but is an easier interface and probably more convenient if the objective is to use the default treatment process
but with some minor adjustments.
## S3 method for class 'data.frame'
qTreat(x, winmax = 5, skew_thresh = 2, kurt_thresh = 3.5, f2 = "log_CT", ...)
x |
A numeric data frame |
winmax |
Maximum number of points to Winsorise for each column. Default 5. |
skew_thresh |
Absolute skew threshold - default 2. |
kurt_thresh |
Kurtosis threshold - default 3.5. |
f2 |
Function to call if Winsorisation does not bring skew and kurtosis within limits. Default |
... |
arguments passed to or from other methods. |
This function treats each column in x
using the following process:
First, it checks whether skew and kurtosis are within the specified limits of skew_thresh
and kurt_thresh
If the column is not within the limits, it applies the winsorise()
function, with maximum number of winsorised
points specified by winmax
.
If winsorisation does not bring the column within the skew/kurtosis limits, it is instead passed to f2
, which is
a second outlier treatment function, default log_CT()
.
The arguments of qTreat()
are passed to Treat()
.
See Treat()
documentation for more details, and vignette("treat")
.
A list
# select three indicators
df1 <- ASEM_iData[c("Flights", "Goods", "Services")]

# treat data frame, changing winmax and skew/kurtosis limits
l_treat <- qTreat(df1, winmax = 1, skew_thresh = 1.5, kurt_thresh = 3)

# now we check what the results are:
l_treat$Dets_Table
A simplified version of Treat()
which allows direct access to the default parameters. This has less flexibility,
but is an easier interface and probably more convenient if the objective is to use the default treatment process
but with some minor adjustments.
## S3 method for class 'purse'
qTreat(
  x,
  dset,
  winmax = 5,
  skew_thresh = 2,
  kurt_thresh = 3.5,
  f2 = "log_CT",
  ...
)
x |
A purse |
dset |
Name of data set to treat for outliers in each coin |
winmax |
Maximum number of points to Winsorise for each indicator. Default 5. |
skew_thresh |
Absolute skew threshold - default 2. |
kurt_thresh |
Kurtosis threshold - default 3.5. |
f2 |
Function to call if Winsorisation does not bring skew and kurtosis within limits. Default |
... |
arguments passed to or from other methods. |
This function simply applies the same data treatment to each coin. See documentation for Treat.coin()
,
qTreat.coin()
and vignette("treat")
.
An updated purse
#
Replaces all numerical columns of a data frame with their ranks. Uses sport ranking, i.e. ties
share the highest rank place. Ignores non-numerical columns. See rank()
. Optionally, returns in-group ranks
using a specified grouping column.
rank_df(df, use_group = NULL)
df |
A data frame |
use_group |
An optional column of df (specified as a string) to use as a grouping variable. If specified, returns ranks inside each group present in this column. |
This function replaces the now-defunct rankDF()
from COINr < v1.0.
A data frame equal to the data frame that was input, but with any numerical columns replaced with ranks.
# some random data, with a column of characters
df <- data.frame(RName = c("A", "B", "C"), Score1 = runif(3), Score2 = runif(3))
# convert to ranks
rank_df(df)

# grouped ranking - use some example data
df1 <- ASEM_iData[c("uCode", "GDP_group", "Goods", "LPI")]
rank_df(df1, use_group = "GDP_group")
Methods for regenerating coins and purses. Regeneration is re-running all the functions used to build
the coin/purse, using the order and parameters found in the .$Log
list of the coin.
Regen(x, from = NULL, quietly = TRUE)
x |
A coin or purse object to be regenerated |
from |
Optional: a construction function name. If specified, regeneration begins from this function, rather than re-running all functions. |
quietly |
If |
Please see individual method documentation:
See also vignette("adjustments")
.
This function replaces the now-defunct regen()
from COINr < v1.0.
A regenerated object
# see individual method examples
Regenerates the .$Data
entries in a coin by rerunning the construction functions according to the specifications in .$Log
.
This effectively regenerates the results. Different variations of coins can be quickly achieved by editing the
saved arguments in .$Log
and regenerating.
## S3 method for class 'coin'
Regen(x, from = NULL, quietly = TRUE, ...)
x |
A coin class object |
from |
Optional: a construction function name. If specified, regeneration begins from this function, rather than re-running all functions. |
quietly |
If |
... |
arguments passed to or from other methods. |
The from
argument allows partial regeneration, starting from a
specified function. This can be helpful to speed up regeneration in some cases. However, keep in mind that
if you change a .$Log
argument from a function that is run before the point that you choose to start running
from, it will not affect the results.
Note that while sets of weights will be passed to the regenerated COIN, anything in .$Analysis
will be removed
and will have to be recalculated.
See also vignette("adjustments")
for more info on regeneration.
Updated coin object with regenerated results (data sets).
# build full example coin
coin <- build_example_coin(quietly = TRUE)

# copy coin
coin2 <- coin

# change to prank function (percentile ranks)
# we don't need to specify any additional parameters (f_n_para) here
coin2$Log$Normalise$global_specs <- list(f_n = "n_prank")

# regenerate
coin2 <- Regen(coin2)

# compare index, sort by absolute rank difference
compare_coins(coin, coin2, dset = "Aggregated", iCode = "Index",
              sort_by = "Abs.diff", decreasing = TRUE)
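Partial regeneration via the from argument might look as follows (a sketch, appropriate when only arguments of Normalise() or later construction steps have been changed):

# regenerate only from the normalisation step onwards
coin2 <- Regen(coin2, from = "Normalise")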
Regenerates the .$Data
entries in all coins by rerunning the construction functions according to the specifications in
.$Log
, for each coin in the purse. This effectively regenerates the results.
## S3 method for class 'purse'
Regen(x, from = NULL, quietly = TRUE, ...)
x |
A purse class object |
from |
Optional: a construction function name. If specified, regeneration begins from this function, rather than re-running all functions. |
quietly |
If |
... |
arguments passed to or from other methods. |
The from
argument allows partial regeneration, starting from a
specified function. This can be helpful to speed up regeneration in some cases. However, keep in mind that
if you change a .$Log
argument from a function that is run before the point that you choose to start running
from, it will not affect the results.
Note that for the moment, regeneration of purses is only partially supported. This is because usually, in the
normalisation step, it is necessary to normalise across the full panel data set (see the global
argument in
Normalise()
). At the moment, purse regeneration is performed by regenerating each coin individually, but this
does not allow for global normalisation which has to be done at the purse level. This may be fixed in future
releases.
See also documentation for Regen.coin()
and vignette("adjustments")
.
Updated purse object with regenerated results.
# see examples from Regen.coin() and vignette("adjustments")
This is an analysis function for seeing what happens when elements of the composite indicator are removed. This can help with "what if" experiments and acts as a different measure of the influence of each indicator or aggregate.
remove_elements(coin, Level, dset, iCode, quietly = FALSE)
coin |
A coin class object, which must be constructed up to and including the aggregation step, i.e. using |
Level |
The level at which to remove elements. For example, |
dset |
The name of the data set to take |
iCode |
A character string indicating the indicator or aggregate code to extract from each iteration. I.e. normally this would be set to
the index code to compare the ranks of the index upon removing each indicator or aggregate. But it can be any code that is present in
|
quietly |
Logical: if |
One way of looking at indicator "importance" in a composite indicator is via correlations. A different way is to see what happens if we remove the indicator completely from the framework. If removing an indicator or a whole aggregation of indicators results in very little rank change, it is one indication that perhaps it is not necessary to include it. Emphasis on one: there may be many other things to take into account.
This function works by successively setting the weight of each indicator or aggregate to zero. If the analysis is performed at the indicator level, it creates a copy of the coin, sets the weight of the first indicator to zero, regenerates the results, and compares to the nominal results (results when no weights are set to zero). It repeats this for each indicator in turn, such that each time one indicator is set to zero weights, and the others retain their original weights. The output is a series of tables comparing scores and ranks (see Value).
Note that "removing the indicator" here means more precisely "setting its weight to zero". In most cases the first implies the second, but check that the aggregation method that you are using satisfies this relationship. For example, if the aggregation method does not use any weights, then setting the weight to zero will have no effect.
This function replaces the now-defunct removeElements()
from COINr < v1.0.
A list with elements as follows:
.$Scores
: a data frame where each column is the scores for each unit, with indicator/aggregate corresponding to the column name removed.
E.g. .$Scores$Ind1
gives the scores resulting from removing "Ind1".
.$Ranks
: as above but ranks
.$RankDiffs
: as above but difference between nominal rank and rank on removing each indicator/aggregate
.$RankAbsDiffs
: as above but absolute rank differences
.$MeanAbsDiffs
: as above, but the mean of each column. So it is the mean (over units) absolute rank change resulting from removing each
indicator or aggregate.
# build example coin
coin <- build_example_coin(quietly = TRUE)

# run function removing elements in level 3
l_res <- remove_elements(coin, Level = 3, dset = "Aggregated", iCode = "Index")

# get summary of rank changes
l_res$MeanAbsDiff
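The other list entries described above can be inspected in the same way, for example the per-unit rank differences:

# rank change of each unit when each component is removed
head(l_res$RankDiffs)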
Given a data frame (or vector), this function replaces values according to a look up table or dictionary. In COINr this may be useful for exchanging categorical data with numeric scores, prior to assembly. Or for changing codes.
replace_df(df, lookup)
df |
A data frame or a vector |
lookup |
A data frame with columns |
The lookup data frame must not have any duplicated values in the old
column. This function looks for exact matches of
elements of the old
column and replaces them with the corresponding value in the new
column. For each row of lookup
,
the class of the old value must match the class of the new value. This is to keep classes of data frames columns consistent.
If you wish to replace with a different class, you should convert classes in your data frame before using this function.
This function replaces the now-defunct replaceDF()
from COINr < v1.0.
A data frame with replaced values
# replace sub-pillar codes in ASEM indicator metadata
codeswap <- data.frame(old = c("Conn", "Sust"), new = c("SI1", "SI2"))
# swap codes in iMeta
replace_df(ASEM_iMeta, codeswap)
Tiny function just to round down a data frame for display in a table, ignoring non-numeric columns.
round_df(df, decimals = 2)
df |
A data frame to input |
decimals |
The number of decimal places to round to (default 2) |
This function replaces the now-defunct roundDF()
from COINr < v1.0.
A data frame, with any numeric columns rounded to the specified amount.
round_df(as.data.frame(matrix(runif(20), 10, 2)), decimals = 3)
Post process a sample to obtain sensitivity indices. This function takes a univariate output
which is generated as a result of running a Monte Carlo sample from SA_sample()
through a system.
Then it estimates sensitivity indices using this sample.
SA_estimate(yy, N, d, Nboot = NULL)
yy |
A vector of model output values, as a result of a |
N |
The number of sample points per dimension. |
d |
The dimensionality of the sample |
Nboot |
Number of bootstrap draws for estimates of confidence intervals on sensitivity indices. If this is not specified, bootstrapping is not applied. |
This function is built to be used inside get_sensitivity()
.
A list with the output variance, plus a data frame of first order and total order sensitivity indices for
each variable, as well as bootstrapped confidence intervals if !is.null(Nboot)
.
get_sensitivity()
Perform global sensitivity or uncertainty analysis on a COIN
SA_sample()
Input design for estimating sensitivity indices
# This is a generic example rather than applied to a COIN (for reasons of speed)

# A simple test function
testfunc <- function(x){
  x[1] + 2*x[2] + 3*x[3]
}

# First, generate a sample
X <- SA_sample(500, 3)

# Run sample through test function to get corresponding output for each row
y <- apply(X, 1, testfunc)

# Estimate sensitivity indices using sample
SAinds <- SA_estimate(y, N = 500, d = 3, Nboot = 1000)
SAinds$SensInd

# Notice that total order indices have narrower confidence intervals than first order.
Generates an input sample for a Monte Carlo estimation of global sensitivity indices. Used in
the get_sensitivity()
function. The total sample size will be N(d + 2).
SA_sample(N, d)
N |
The number of sample points per dimension. |
d |
The dimensionality of the sample |
This function generates a Monte Carlo sample as described e.g. in the Global Sensitivity Analysis: The Primer book.
A matrix with N(d + 2) rows and d columns.
get_sensitivity()
Perform global sensitivity or uncertainty analysis on a COIN.
SA_estimate()
Estimate sensitivity indices from system output, as a result of input design from SA_sample().
# sensitivity analysis sample for 3 dimensions with 100 points per dimension
X <- SA_sample(100, 3)
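The number of rows follows the N(d + 2) formula given above; as a quick check (expected size noted as a comment):

# expected: 100 * (3 + 2) = 500 rows and 3 columns
dim(X)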
This is a generic function for screening units/rows based on data availability. See method documentation for more details:
Screen(x, ...)
x |
Object to be screened |
... |
arguments passed to or from other methods. |
This function replaces the now-defunct checkData()
from COINr < v1.0.
An object of the same class as x
Screens units based on a data availability threshold and presence of zeros. Units can be optionally "forced" to be included or excluded, making exceptions for the data availability threshold.
## S3 method for class 'coin'
Screen(
  x,
  dset,
  unit_screen,
  dat_thresh = NULL,
  nonzero_thresh = NULL,
  Force = NULL,
  out2 = "coin",
  write_to = NULL,
  ...
)
x |
A coin |
dset |
The data set to be checked/screened |
unit_screen |
Specifies whether and how to screen units based on data availability or zero values.
|
dat_thresh |
A data availability threshold ( |
nonzero_thresh |
As |
Force |
A data frame with any additional countries to force inclusion or exclusion. Required columns |
out2 |
Where to output the results. If |
write_to |
If specified, writes the aggregated data to |
... |
arguments passed to or from other methods. |
The two main criteria of interest are NA
values, and zeros. The summary table gives percentages of
NA
values for each unit, across indicators, and percentage zero values (as a percentage of non-NA
values).
Each unit is flagged as having low data or too many zeros based on thresholds.
See also vignette("screening")
.
An updated coin with data frames showing missing data in .$Analysis
, and a new data set .$Data$Screened
.
If out2 = "list"
wraps missing data stats and screened data set into a list.
# build example coin
coin <- build_example_coin(up_to = "new_coin", quietly = TRUE)

# screen units from raw dset
coin <- Screen(coin, dset = "Raw", unit_screen = "byNA", dat_thresh = 0.85,
               write_to = "Filtered_85pc")

# some details about the coin by calling its print method
coin
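With out2 = "list" the same screening returns the missing-data statistics and screened data set directly, rather than updating the coin:

# return screening results as a list instead of an updated coin
l_scr <- Screen(coin, dset = "Raw", unit_screen = "byNA", dat_thresh = 0.85,
                out2 = "list")
names(l_scr)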
Screens units (rows) based on a data availability threshold and presence of zeros. Units can be optionally "forced" to be included or excluded, making exceptions for the data availability threshold.
## S3 method for class 'data.frame'
Screen(
  x,
  id_col = NULL,
  unit_screen,
  dat_thresh = NULL,
  nonzero_thresh = NULL,
  Force = NULL,
  ...
)
x |
A data frame |
id_col |
Name of column of the data frame to be used as the identifier, e.g. normally this would be |
unit_screen |
Specifies whether and how to screen units based on data availability or zero values.
|
dat_thresh |
A data availability threshold ( |
nonzero_thresh |
As |
Force |
A data frame with any additional units to force inclusion or exclusion. Required columns |
... |
arguments passed to or from other methods. |
The two main criteria of interest are NA
values, and zeros. The summary table gives percentages of
NA
values for each unit, across indicators, and percentage zero values (as a percentage of non-NA
values).
Each unit is flagged as having low data or too many zeros based on thresholds.
See also vignette("screening")
.
Missing data stats and screened data as a list.
# example data
iData <- ASEM_iData[40:51, c("uCode", "Research", "Pat", "CultServ", "CultGood")]

# screen to 75% data availability (by row)
l_scr <- Screen(iData, unit_screen = "byNA", dat_thresh = 0.75)

# summary of screening
head(l_scr$DataSummary)
Screens units based on a data availability threshold and presence of zeros. Units can be optionally "forced" to be included or excluded, making exceptions for the data availability threshold.
## S3 method for class 'purse'
Screen(
  x,
  dset,
  unit_screen,
  dat_thresh = NULL,
  nonzero_thresh = NULL,
  Force = NULL,
  write_to = NULL,
  ...
)
x |
A purse object |
dset |
The data set to be checked/screened |
unit_screen |
Specifies whether and how to screen units based on data availability or zero values.
|
dat_thresh |
A data availability threshold ( |
nonzero_thresh |
As |
Force |
A data frame with any additional countries to force inclusion or exclusion. Required columns |
write_to |
If specified, writes the aggregated data to |
... |
arguments passed to or from other methods. |
The two main criteria of interest are NA
values, and zeros. The summary table gives percentages of
NA
values for each unit, across indicators, and percentage zero values (as a percentage of non-NA
values).
Each unit is flagged as having low data or too many zeros based on thresholds.
See also vignette("screening")
.
An updated purse with coins screened and updated.
# see vignette("screening") for an example.
Tiny function just to round down a data frame by significant figures for display in a table, ignoring non-numeric columns.
signif_df(df, digits = 3)
df |
A data frame to input |
digits |
The number of significant figures to round to (default 3) |
A data frame, with any numeric columns rounded to the specified number of significant figures.
signif_df(as.data.frame(matrix(runif(20), 10, 2)), digits = 3)
Calculates skewness of the values of a numeric vector. This uses the same definition of skewness as
the "skewness()" function in the "e1071" package where type == 2
, which is equivalent to the definition of skewness used in Excel.
skew(x, na.rm = FALSE)
x |
A numeric vector. |
na.rm |
Set |
A skewness value (scalar).
x <- runif(20)
skew(x)
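Missing values can be skipped using the na.rm argument shown in the usage above:

# skewness ignoring NAs
skew(c(x, NA), na.rm = TRUE)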
Generic function for treating outliers using a two-step process. See individual method documentation:
Treat(x, ...)
x |
Object to be treated |
... |
arguments passed to or from other methods. |
See also vignette("treat")
.
This function replaces the now-defunct treat()
from COINr < v1.0.
Treated object plus details.
Operates a two-stage data treatment process on the data set specified by dset
, based on two data treatment functions, and a pass/fail
function which detects outliers. The method of data treatment can be either specified by the global_specs
argument (which applies
the same specifications to all indicators in the specified data set), or else (additionally) by the indiv_specs
argument which allows different
methods to be applied for each indicator. See details. For a simpler function for data treatment, see the wrapper function qTreat()
.
## S3 method for class 'coin'
Treat(
  x,
  dset,
  global_specs = NULL,
  indiv_specs = NULL,
  combine_treat = FALSE,
  out2 = "coin",
  write_to = NULL,
  write2log = TRUE,
  disable = FALSE,
  ...
)
x |
A coin |
dset |
A named data set available in |
global_specs |
A list specifying the treatment to apply to all columns. This will be applied to all columns, except any
that are specified in the |
indiv_specs |
A list specifying any individual treatment to apply to specific columns, overriding |
combine_treat |
By default, if |
out2 |
The type of function output: either |
write_to |
If specified, writes the aggregated data to |
write2log |
Logical: if |
disable |
Logical: if |
... |
arguments passed to or from other methods. |
An updated coin with a new data set .Data$Treated
added, plus analysis information in
.$Analysis$Treated
.
If the same method of data treatment should be applied to all indicators, use the global_specs
argument. This argument takes a structured
list which looks like this:
global_specs = list(f1 = ., f1_para = list(.), f2 = ., f2_para = list(.), f_pass = ., f_pass_para = list() )
The entries in this list correspond to arguments in Treat.numeric()
, and the meanings of each are also described in more detail here
below. In brief, f1
is the name of a function to apply at the first round of data treatment, f1_para
is a list of any additional
parameters to pass to f1
, f2
and f2_para
are equivalently the function name and parameters of the second round of data treatment, and
f_pass
and f_pass_para
are the function and additional arguments to check for the existence of outliers.
The default values for global_specs
are as follows:
global_specs = list(f1 = "winsorise",
                    f1_para = list(na.rm = TRUE,
                                   winmax = 5,
                                   skew_thresh = 2,
                                   kurt_thresh = 3.5,
                                   force_win = FALSE),
                    f2 = "log_CT",
                    f2_para = list(na.rm = TRUE),
                    f_pass = "check_SkewKurt",
                    f_pass_para = list(na.rm = TRUE,
                                       skew_thresh = 2,
                                       kurt_thresh = 3.5))
This shows that by default (i.e. if global_specs
is not specified), each indicator is checked for outliers by the check_SkewKurt()
function, which
uses skew and kurtosis thresholds as its parameters. Then, if outliers exist, the first function winsorise()
is applied, which also
uses skew and kurtosis parameters, as well as a maximum number of winsorised points. If the Winsorisation function does not satisfy
f_pass
, the log_CT()
function is invoked.
To change the global specifications, you don't have to supply the whole list. If, for example, you are happy with all the defaults but
want to simply change the maximum number of Winsorised points, you could specify e.g. global_specs = list(f1_para = list(winmax = 3))
.
In other words, a subset of the list can be specified, as long as the structure of the list is correct.
The indiv_specs
argument allows different specifications for each indicator. This is done by wrapping multiple lists of the format of the
list described in global_specs
into one single list, named according to the column names of x
. For example, if the data set has indicators with codes
"x1", "x2" and "x3", we could specify individual treatment as follows:
indiv_specs = list(x1 = list(.), x2 = list(.), x3 = list(.))
where each list(.)
is a specifications list of the same format as global_specs
. Any indicators that are not named in indiv_specs
are
treated using the specifications from global_specs
(which will be the defaults if it is not specified). As with global_specs
,
a subset of the global_specs
list may be specified for
each entry. Additionally, as a special case, specifying a list entry as e.g. x1 = "none"
will apply no data treatment to the indicator "x1". See
vignette("treat")
for examples of individual treatment.
This function is set up to allow any functions to be passed as the
data treatment functions (f1
and f2
), as well as any function to be passed as the outlier detection
function f_pass
, as specified in the global_specs
and indiv_specs
arguments.
The arrangement of this function is inspired by a fairly standard data treatment process applied to indicators, which consists of checking skew and kurtosis, then if the criteria are not met, applying Winsorisation up to a specified limit. Then if Winsorisation still does not bring skew and kurtosis within limits, applying a nonlinear transformation such as log or Box-Cox.
This function generalises this process by using the following general steps:
Check if variable passes or fails using f_pass
If f_pass
returns FALSE
, apply f1
, else return x
unmodified
Check again using f_pass
If f_pass
still returns FALSE
, apply f2
Return the modified x
as well as other information.
For the "typical" case described above f1
is a Winsorisation function, f2
is a nonlinear transformation
and f_pass
is a skew and kurtosis check. Parameters can be passed to each of these three functions in
a named list, for example to specify a maximum number of points to Winsorise, or Box-Cox parameters, or anything
else. The constraints are that:
All of f1
, f2
and f_pass
must follow the format function(x, f_para)
, where x
is a
numerical vector, and f_para
is a list of other function parameters to be passed to the function, which
is specified by f1_para
for f1
and similarly for the other functions. If the function has no parameters
other than x
, then f_para
can be omitted.
f1
and f2
should return either a list with .$x
as the modified numerical vector, and any other information
to be attached to the list, OR, simply x
as the only output.
f_pass
must return a logical value, where TRUE
indicates that the x
passes the criteria (and
therefore doesn't need any (more) treatment), and FALSE
means that it fails to meet the criteria.
See also vignette("treat")
.
# build example coin
coin <- build_example_coin(up_to = "new_coin")

# treat raw data set
coin <- Treat(coin, dset = "Raw")

# summary of treatment for each indicator
head(coin$Analysis$Treated$Dets_Table)
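As a sketch of the indiv_specs special case described above, a single indicator can be exempted from treatment by setting its entry to "none" (here "Flights", an indicator code in the built-in example data):

# exempt one indicator from data treatment
coin <- Treat(coin, dset = "Raw", indiv_specs = list(Flights = "none"))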
Operates a two-stage data treatment process, based on two data treatment functions, and a pass/fail
function which detects outliers. The method of data treatment can be either specified by the global_specs
argument (which applies
the same specifications to all columns in x
), or else (additionally) by the indiv_specs
argument which allows different
methods to be applied for each column. See details. For a simpler function for data treatment, see the wrapper function qTreat()
.
## S3 method for class 'data.frame'
Treat(x, global_specs = NULL, indiv_specs = NULL, combine_treat = FALSE, ...)
x |
A data frame. Can have both numeric and non-numeric columns. |
global_specs |
A list specifying the treatment to apply to all columns. This will be applied to all columns, except any
that are specified in the |
indiv_specs |
A list specifying any individual treatment to apply to specific columns, overriding |
combine_treat |
By default, if |
... |
arguments passed to or from other methods. |
A treated data frame of data
If the same method of data treatment should be applied to all the columns, use the global_specs
argument. This argument takes a structured
list which looks like this:
global_specs = list(f1 = ., f1_para = list(.), f2 = ., f2_para = list(.), f_pass = ., f_pass_para = list() )
The entries in this list correspond to arguments in Treat.numeric()
, and the meanings of each are also described in more detail here
below. In brief, f1
is the name of a function to apply at the first round of data treatment, f1_para
is a list of any additional
parameters to pass to f1
, f2
and f2_para
are equivalently the function name and parameters of the second round of data treatment, and
f_pass
and f_pass_para
are the function and additional arguments to check for the existence of outliers.
The default values for global_specs
are as follows:
global_specs = list(f1 = "winsorise",
                    f1_para = list(na.rm = TRUE,
                                   winmax = 5,
                                   skew_thresh = 2,
                                   kurt_thresh = 3.5,
                                   force_win = FALSE),
                    f2 = "log_CT",
                    f2_para = list(na.rm = TRUE),
                    f_pass = "check_SkewKurt",
                    f_pass_para = list(na.rm = TRUE,
                                       skew_thresh = 2,
                                       kurt_thresh = 3.5))
This shows that by default (i.e. if global_specs
is not specified), each column is checked for outliers by the check_SkewKurt()
function, which
uses skew and kurtosis thresholds as its parameters. Then, if outliers exist, the first function winsorise()
is applied, which also
uses skew and kurtosis parameters, as well as a maximum number of winsorised points. If the Winsorisation function does not satisfy
f_pass
, the log_CT()
function is invoked.
To change the global specifications, you don't have to supply the whole list. If, for example, you are happy with all the defaults but
want to simply change the maximum number of Winsorised points, you could specify e.g. global_specs = list(f1_para = list(winmax = 3))
.
In other words, a subset of the list can be specified, as long as the structure of the list is correct.
The indiv_specs
argument allows different specifications for each column in x
. This is done by wrapping multiple lists of the format of the
list described in global_specs
into one single list, named according to the column names of x
. For example, if x
has column names
"x1", "x2" and "x3", we could specify individual treatment as follows:
indiv_specs = list(x1 = list(.), x2 = list(.), x3 = list(.))
where each list(.)
is a specifications list of the same format as global_specs
. Any columns that are not named in indiv_specs
are
treated using the specifications from global_specs
(which will be the defaults if it is not specified). As with global_specs
,
a subset of the global_specs
list may be specified for
each entry. Additionally, as a special case, specifying a list entry as e.g. x1 = "none"
will apply no data treatment to the column "x1". See
vignette("treat")
for examples of individual treatment.
This function is set up to allow any functions to be passed as the
data treatment functions (f1
and f2
), as well as any function to be passed as the outlier detection
function f_pass
, as specified in the global_specs
and indiv_specs
arguments.
The arrangement of this function is inspired by a fairly standard data treatment process applied to indicators, which consists of checking skew and kurtosis, then if the criteria are not met, applying Winsorisation up to a specified limit. Then if Winsorisation still does not bring skew and kurtosis within limits, applying a nonlinear transformation such as log or Box-Cox.
This function generalises this process by using the following general steps:
Check if variable passes or fails using f_pass
If f_pass
returns FALSE
, apply f1
, else return x
unmodified
Check again using f_pass
If f_pass
still returns FALSE
, apply f2
Return the modified x
as well as other information.
For the "typical" case described above f1
is a Winsorisation function, f2
is a nonlinear transformation
and f_pass
is a skew and kurtosis check. Parameters can be passed to each of these three functions in
a named list, for example to specify a maximum number of points to Winsorise, or Box-Cox parameters, or anything
else. The constraints are that:
All of f1
, f2
and f_pass
must follow the format function(x, f_para)
, where x
is a
numerical vector, and f_para
is a list of other function parameters to be passed to the function, which
is specified by f1_para
for f1
and similarly for the other functions. If the function has no parameters
other than x
, then f_para
can be omitted.
f1
and f2
should return either a list with .$x
as the modified numerical vector, and any other information
to be attached to the list, OR, simply x
as the only output.
f_pass
must return a logical value, where TRUE
indicates that the x
passes the criteria (and
therefore doesn't need any (more) treatment), and FALSE
means that it fails to meet the criteria.
See also vignette("treat")
.
# select three indicators
df1 <- ASEM_iData[c("Flights", "Goods", "Services")]

# treat the data frame using defaults
l_treat <- Treat(df1)

# details of data treatment for each column
l_treat$Dets_Table
Operates a two-stage data treatment process, based on two data treatment functions, and a pass/fail
function which detects outliers. This function is set up to allow any functions to be passed as the
data treatment functions (f1
and f2
), as well as any function to be passed as the outlier detection
function f_pass
.
## S3 method for class 'numeric'
Treat(
  x,
  f1,
  f1_para = NULL,
  f2 = NULL,
  f2_para = NULL,
  f_pass,
  f_pass_para = NULL,
  combine_treat = FALSE,
  ...
)
x |
A numeric vector. |
f1 |
First stage data treatment function e.g. as a string. |
f1_para |
First stage data treatment function parameters as a named list. |
f2 |
Second stage data treatment function as a string. |
f2_para |
Second stage data treatment function parameters as a named list. |
f_pass |
A string specifying an outlier detection function - see details. Default |
f_pass_para |
Any further arguments to pass to |
combine_treat |
By default, if |
... |
arguments passed to or from other methods. |
The arrangement of this function is inspired by a fairly standard data treatment process applied to indicators, which consists of checking skew and kurtosis, then if the criteria are not met, applying Winsorisation up to a specified limit. Then if Winsorisation still does not bring skew and kurtosis within limits, applying a nonlinear transformation such as log or Box-Cox.
This function generalises this process by using the following general steps:
Check if variable passes or fails using f_pass
If f_pass
returns FALSE
, apply f1
, else return x
unmodified
Check again using f_pass
If f_pass
still returns FALSE
, apply f2
(by default to the original x
, see combine_treat
parameter)
Return the modified x
as well as other information.
For the "typical" case described above f1
is a Winsorisation function, f2
is a nonlinear transformation
and f_pass
is a skew and kurtosis check. Parameters can be passed to each of these three functions in
a named list, for example to specify a maximum number of points to Winsorise, or Box-Cox parameters, or anything
else. The constraints are that:
All of f1
, f2
and f_pass
must follow the format function(x, f_para)
, where x
is a
numerical vector, and f_para
is a list of other function parameters to be passed to the function, which
is specified by f1_para
for f1
and similarly for the other functions. If the function has no parameters
other than x
, then f_para
can be omitted.
f1
and f2
should return either a list with .$x
as the modified numerical vector, and any other information
to be attached to the list, OR, simply x
as the only output.
f_pass
must return a logical value, where TRUE
indicates that the x
passes the criteria (and
therefore doesn't need any (more) treatment), and FALSE
means that it fails to meet the criteria.
See also vignette("treat")
.
A treated vector of data.
# numbers between 1 and 10
x <- 1:10
# two outliers
x <- c(x, 30, 100)

# check whether passes skew/kurt test
check_SkewKurt(x)

# treat using winsorisation
l_treat <- Treat(x, f1 = "winsorise", f1_para = list(winmax = 2),
                 f_pass = "check_SkewKurt")

# plot original against treated
plot(x, l_treat$x)
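A minimal sketch of a custom pass/fail function following the format described above; check_range() is hypothetical and only illustrates the interface (it must return a single logical value and be visible by name when Treat() is called):

# custom check: passes if the range of x is no wider than max_range
check_range <- function(x, max_range = 20){
  diff(range(x, na.rm = TRUE)) <= max_range
}

# use it as the outlier check, passing its extra argument via f_pass_para
l_treat2 <- Treat(x, f1 = "winsorise", f1_para = list(winmax = 2),
                  f_pass = "check_range", f_pass_para = list(max_range = 20))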
This function calls Treat.coin()
for each coin in the purse. See the documentation of that function for
details. See also vignette("treat")
.
## S3 method for class 'purse'
Treat(
  x,
  dset,
  global_specs = NULL,
  indiv_specs = NULL,
  combine_treat = FALSE,
  write_to = NULL,
  disable = FALSE,
  ...
)
x |
A purse object |
dset |
The data set to treat in each coin. |
global_specs |
Default specifications. See details in |
indiv_specs |
Individual specifications. See details in |
combine_treat |
By default, if |
write_to |
If specified, writes the aggregated data to |
disable |
Logical: if |
... |
arguments passed to or from other methods. |
An updated purse with new treated data sets added at .$Data$Treated
in each coin, plus
analysis information at .$Analysis$Treated
# See `vignette("treat")`.
Convert uCodes to uNames
ucodes_to_unames(coin, uCodes)
coin |
A coin |
uCodes |
A vector of uCodes |
Vector of uNames
Follows a "standard" Winsorisation approach: points are successively Winsorised in order to bring skew and kurtosis within specified limits. Specifically, it aims to bring absolute skew below a threshold (default 2) and kurtosis below another threshold (default 3.5).
winsorise(
  x,
  na.rm = FALSE,
  winmax = 5,
  skew_thresh = 2,
  kurt_thresh = 3.5,
  force_win = FALSE
)
x |
A numeric vector. |
na.rm |
Set |
winmax |
Maximum number of points to Winsorise. Default 5. Set |
skew_thresh |
A threshold for absolute skewness (positive). Default 2. |
kurt_thresh |
A threshold for kurtosis. Default 3.5. |
force_win |
Logical: if |
Winsorisation here is defined as reassigning the point with the highest/lowest value with the value of the
next highest/lowest point. Whether to Winsorise at the high or low end of the scale is decided by the direction
of the skewness of x
.
This function replaces the now-defunct coin_win()
from COINr < v1.0.
A list containing winsorised data, number of winsorised points, and the individual points that were treated.
# numbers between 1 and 10
x <- 1:10
# two outliers
x <- c(x, 30, 100)

# winsorise
l_win <- winsorise(x, skew_thresh = 2, kurt_thresh = 3.5)

# see treated vector, number of winsorised points and details
l_win
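The winmax argument caps how many points can be treated; tightening it shows its effect on the same data (a small variation on the example above):

# allow at most one point to be Winsorised
l_win1 <- winsorise(x, winmax = 1)
l_win1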
A small selection of common denominator indicators, which includes GDP, Population, Area, GDP per capita and income group. All data sourced from the World Bank as of Feb 2021 (data is typically from 2019). Note that this is intended as example data, and it would be a good idea to use updated data from the World Bank when needed. In this data set, country names have been altered slightly so as to include no accents - this is simply to make it more portable between distributions.
WorldDenoms
A data frame with 249 rows and 7 variables.