---
title: "Unit Screening"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Unit Screening}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

Unit screening filters units according to data availability rules. Just as with indicators (columns), when a unit (row) has very few data points available, it may make sense to remove it. This avoids drawing conclusions about units from very little data. Removing such units also increases the percentage data availability of each indicator.

The COINr function `Screen()` is a generic function with methods for data frames, coins and purses. It is a *building* function in that it creates a new data set in `$.Data` as its output.

# Data frames

We begin with data frames. Let's take a subset of the inbuilt example data for demonstration. I cherry-pick some rows and columns which have some missing values.

```{r}
library(COINr)

# example data
iData <- ASEM_iData[40:51, c("uCode", "Research", "Pat", "CultServ", "CultGood")]

iData
```

The data has four indicators, plus an identifier column "uCode". Data availability varies from unit to unit. We have 12 units in total.

Now let's use `Screen()` to screen out some of these units. Specifically, we will remove any units that have less than 75% data availability, so a unit is kept only if at least 3 of its 4 indicators have non-`NA` values:

```{r}
l_scr <- Screen(iData, unit_screen = "byNA", dat_thresh = 0.75)
```

The output of `Screen()` is a list:

```{r}
str(l_scr, max.level = 1)
```

We can see already that the "RemovedUnits" entry tells us that three units were removed based on our specifications.
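
To make the 75% rule concrete, here is a minimal base-R sketch of the same idea on a small toy data frame. The data and all object names (`toy`, `dat_avail`, `keep`) are hypothetical and chosen only for illustration. The sketch mirrors the rule described above and is not COINr's internal code.

```{r}
# hypothetical toy data: four indicators plus an identifier column
toy <- data.frame(
  uCode = c("A", "B", "C", "D"),
  x1 = c(1.2, NA, 3.1, NA),
  x2 = c(0.5, NA, 2.2, 4.0),
  x3 = c(NA, 7.0, 1.0, NA),
  x4 = c(2.0, NA, 0.3, NA),
  stringsAsFactors = FALSE
)

# fraction of non-NA indicator values in each row (unit)
ind_cols <- setdiff(names(toy), "uCode")
dat_avail <- rowMeans(!is.na(toy[ind_cols]))

# keep units that meet or exceed the 75% threshold
keep <- dat_avail >= 0.75
toy_screened <- toy[keep, ]
removed <- toy$uCode[!keep]

removed
```

Unit "A" has exactly 3 of 4 values present, so it sits on the threshold and is retained, while "B" and "D" fall below it.
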
We now have our new screened data set:

```{r}
l_scr$ScreenedData
```

We also get a summary table of data availability for each unit:

```{r}
head(l_scr$DataSummary)
```

This table is in fact generated by `get_data_avail()` - some more details can be found in the [Analysis](analysis.html) vignette.

Other than data availability, units can also be screened based on the presence of zeros, or on both - this is specified by the `unit_screen` argument. Use the `Force`^[Luke. Sorry.] argument to override the screening rules for specified units if required (either to force inclusion or force exclusion).

# Coins

Screening on coins is very similar to screening on data frames, because the coin method extracts the relevant data set, passes it to the data frame method, and then puts the output back as a new data set. This means the arguments are almost the same. The only differences are that we specify which data set to screen, the name to give the new data set, and whether to output a coin or a list.

We'll build the example coin, then screen the raw data set with a threshold of 85% data availability. We also give the new data set a name other than the default, "Screened":

```{r}
# build example coin
coin <- build_example_coin(up_to = "new_coin", quietly = TRUE)

# screen units from raw dset
coin <- Screen(coin, dset = "Raw", unit_screen = "byNA", dat_thresh = 0.85, write_to = "Filtered_85pc")

# some details about the coin by calling its print method
coin
```

The printed summary shows that the new data set has only 48 units, compared to 51 in the raw data set. We can find out which units were removed because this is stored in the coin's "Analysis" sub-list:

```{r}
coin$Analysis$Filtered_85pc$RemovedUnits
```

The Analysis sub-list also contains the data availability table that is output by `Screen()`.

As with the data frame method, we can also choose to screen units by presence of zeros, or by a combination of zeros and missing values.
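
As a rough illustration of screening on zeros and missing values together, with a manual override in the spirit of `Force`, the following base-R sketch may help. It assumes that the non-zero fraction is computed over observed (non-`NA`) values only and that a unit must pass both thresholds. COINr's internal definitions may differ. The toy data, the thresholds and the `force_toy` override table are all hypothetical.

```{r}
# hypothetical toy data with both zeros and missing values
toy_z <- data.frame(
  uCode = c("A", "B", "C", "D"),
  x1 = c(1, 0, 3, NA),
  x2 = c(2, 0, 0, NA),
  x3 = c(3, 0, 1, 5),
  x4 = c(NA, 4, 2, 6),
  stringsAsFactors = FALSE
)
vals <- as.matrix(toy_z[setdiff(names(toy_z), "uCode")])

n_obs <- rowSums(!is.na(vals))                       # observed values per unit
avail <- n_obs / ncol(vals)                          # fraction of non-NA values
nonzero <- rowSums(vals != 0, na.rm = TRUE) / n_obs  # non-zero share of observed values

# assumed combined rule: a unit must pass both thresholds
keep_z <- avail >= 0.75 & nonzero >= 0.5

# hypothetical override: keep unit "B" regardless of the rules
force_toy <- data.frame(uCode = "B", Include = TRUE, stringsAsFactors = FALSE)
keep_z[match(force_toy$uCode, toy_z$uCode)] <- force_toy$Include

toy_z$uCode[keep_z]
```

Here "B" fails the zeros rule and "D" fails the availability rule, but the override brings "B" back in.
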
# Purses

For completeness, we also demonstrate the purse method. Like most purse methods, this simply applies the coin method to each coin in the purse, without any special features. Here, we repeat the example from the coin section, but on a purse of coins:

```{r}
# build example purse
purse <- build_example_purse(up_to = "new_coin", quietly = TRUE)

# screen units in all coins to 85% data availability
purse <- Screen(purse, dset = "Raw", unit_screen = "byNA", dat_thresh = 0.85, write_to = "Filtered_85pc")
```
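
The pattern of applying one screening rule to each coin in turn can be sketched in base R, with a plain list of data frames standing in for a purse. Everything here (the `screen_toy()` helper and the yearly toy data) is hypothetical. It mirrors the pattern only, not COINr's internals.

```{r}
# hypothetical helper: keep rows whose share of non-NA indicator values meets the threshold
screen_toy <- function(df, thresh, id_col = "uCode") {
  ind <- setdiff(names(df), id_col)
  df[rowMeans(!is.na(df[ind])) >= thresh, , drop = FALSE]
}

# hypothetical stand-in for a purse: one data frame per year
toy_purse <- list(
  `2020` = data.frame(uCode = c("A", "B"), x1 = c(1, NA), x2 = c(2, NA),
                      stringsAsFactors = FALSE),
  `2021` = data.frame(uCode = c("A", "B"), x1 = c(1, 3), x2 = c(NA, 4),
                      stringsAsFactors = FALSE)
)

# apply the same rule to every element of the list
toy_purse_scr <- lapply(toy_purse, screen_toy, thresh = 0.85)

# units retained in each year
lapply(toy_purse_scr, function(df) df$uCode)
```

Because each element is screened independently, the set of retained units can differ from one year to the next.
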