fullmatch {optmatch} R Documentation

## Optimal full matching

### Description

Given two groups, such as a treatment and a control group, and a treatment-by-control discrepancy matrix indicating desirability and permissibility of potential matches, create optimal full matches of members of the groups. Optionally, incorporate restrictions on matched sets' ratios of treatment to control units.

### Usage

```
fullmatch(distance, min.controls = 0, max.controls = Inf,
          omit.fraction = NULL, tol = 0.001, subclass.indices = NULL)
```

### Arguments

`distance`: A matrix of nonnegative discrepancies, each indicating the permissibility and desirability of matching the unit corresponding to its row (a 'treatment') to the unit corresponding to its column (a 'control'); or a list of such matrices. Finite discrepancies indicate permissible matches, with smaller discrepancies indicating more desirable matches. Matrix `distance`, or the matrix elements of `distance`, must have row and column names.

`min.controls`: The minimum ratio of controls to treatments that is to be permitted within a matched set: should be nonnegative and finite. If `min.controls` is not a whole number, the reciprocal of a whole number, or zero, then it is rounded down to the nearest whole number or reciprocal of a whole number. When `distance` is a list of matrices (or `subclass.indices` is given), `min.controls` may be a named numeric vector separately specifying the minimum permissible ratio of controls to treatments for each subclass. The names of this vector should include names of all matrices in the list `distance`.

`max.controls`: The maximum ratio of controls to treatments that is to be permitted within a matched set: should be positive and numeric. If `max.controls` is not a whole number, the reciprocal of a whole number, or `Inf`, then it is rounded up to the nearest whole number or reciprocal of a whole number. When `distance` is a list of matrices (or `subclass.indices` is given), `max.controls` may be a named numeric vector separately specifying the maximum permissible ratio of controls to treatments in each subclass.

`omit.fraction`: Optionally, specify what fraction of controls or treated subjects are to be rejected. If `omit.fraction` is a positive fraction less than one, then `fullmatch` leaves up to that fraction of the control reservoir unmatched. If `omit.fraction` is a negative number greater than -1, then `fullmatch` leaves up to |`omit.fraction`| of the treated group unmatched. Positive values are only accepted if `max.controls` >= 1; negative values, only if `min.controls` <= 1. If `omit.fraction` is not specified, then only those treated and control subjects without permissible matches among the control and treated subjects, respectively, are omitted. When `distance` is a list of matrices (or `subclass.indices` has been given), `omit.fraction` specifies the fraction of controls to be rejected in each subproblem, a parameter that can be made to differ by subclass by setting `omit.fraction` equal to a named numeric vector of fractions.

`tol`: Because of internal rounding, `fullmatch` may solve a slightly different matching problem than the one specified, in which the match generated by `fullmatch` may not coincide with an optimal solution of the specified problem. `tol` times the number of subjects to be matched specifies the extent to which `fullmatch`'s output is permitted to differ from an optimal solution to the original problem, as measured by the sum of discrepancies for all treatments and controls placed into the same matched sets.

`subclass.indices`: An old argument included for back-compatibility; no longer needed.
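The rounding applied to `min.controls` and `max.controls` can be illustrated in base R. The sketch below only restates the rule as described above; it is not `fullmatch`'s internal code, and the helper names `round_min_controls` and `round_max_controls` and the guard value `eps` are hypothetical.

```r
# Hypothetical helpers illustrating the documented rounding rules;
# they are not part of optmatch.

# min.controls: round DOWN to a whole number (values >= 1) or to the
# reciprocal of a whole number (values between 0 and 1); 0 stays 0.
round_min_controls <- function(x, eps = 1e-8) {
  stopifnot(is.numeric(x), all(is.finite(x)), all(x >= 0))
  # eps guards against floating-point error in 1 / x (e.g. x = 1/49)
  ifelse(x >= 1, floor(x + eps), 1 / ceiling(1 / x - eps))
}

# max.controls: round UP to a whole number (values >= 1) or to the
# reciprocal of a whole number (values between 0 and 1); Inf stays Inf.
round_max_controls <- function(x, eps = 1e-8) {
  stopifnot(is.numeric(x), !anyNA(x), all(x > 0))
  ifelse(x >= 1, ceiling(x - eps), 1 / floor(1 / x + eps))
}

round_min_controls(c(0, 0.4, 0.5, 2.7))    # 0, 1/3, 1/2, 2
round_max_controls(c(0.4, 0.5, 2.2, Inf))  # 1/2, 1/2, 3, Inf
```

For instance, a requested minimum ratio of 0.4 controls per treatment becomes 1/3 (at most three treatments per control), while a requested maximum of 0.4 becomes 1/2.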

### Details

Consider using `makedist` to generate the distances, particularly on large problems.

If `distance` is a list of matrices, each matrix is treated as a separate matching problem unto itself. In this case, one has to give names to the matrices in order to specify arguments `min.controls` or `max.controls`.
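As a minimal sketch of the naming requirement, the following builds a named list of toy discrepancy matrices and per-subclass ratio vectors keyed by the same names. The subclass names `a` and `b`, the unit labels, the helper `make_dist`, and the random discrepancies are all made up for illustration; the `fullmatch` call is shown as a comment and is not run.

```r
# Toy discrepancy matrices for two hypothetical subclasses, "a" and "b".
# Rows are treatments, columns are controls; both carry names, as required.
set.seed(1)
make_dist <- function(trt, ctl) {
  matrix(round(runif(length(trt) * length(ctl)), 2),
         nrow = length(trt), dimnames = list(trt, ctl))
}
distance <- list(
  a = make_dist(c("t1", "t2"), c("c1", "c2", "c3", "c4")),
  b = make_dist(c("t3", "t4", "t5"), c("c5", "c6", "c7"))
)

# Per-subclass ratio restrictions, keyed by the names of the list.
min_ctl <- c(a = 1, b = 1/2)
max_ctl <- c(a = 3, b = 2)

# Check that every matrix in the list has a corresponding entry.
stopifnot(all(names(distance) %in% names(min_ctl)),
          all(names(distance) %in% names(max_ctl)))

# The call described on this page would then be (not run here):
# fullmatch(distance, min.controls = min_ctl, max.controls = max_ctl)
```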

`fullmatch` tries to guess the order in which units would have been given in a data frame, and orders the factor that it returns accordingly. If the dimnames of `distance`, or of the matrices it lists, are not simply row numbers of the data frame you are working with, compare the names of `fullmatch`'s output to your row names to be sure that things are in the proper order. Using `makedist` to produce the distances avoids this concern, since it passes the ordering of units to `fullmatch`, which then uses it to order its output.
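One way to make the name comparison concrete is to align the returned vector to the data frame by name before combining them. The sketch below uses a hand-made named factor `fm` as a stand-in for `fullmatch`'s output; the data frame `dat`, its column `z`, and the unit labels are hypothetical.

```r
# Hypothetical data frame whose row names are unit labels, not row numbers.
dat <- data.frame(z = c(1, 0, 0, 1, 0),
                  row.names = c("u3", "u1", "u5", "u2", "u4"))

# Stand-in for fullmatch's output: a named factor in some other order.
fm <- factor(c(u1 = "m.1", u2 = "m.2", u3 = "m.1", u4 = "m.2", u5 = "m.1"))

# Compare names before combining: are they already in row order?
identical(names(fm), rownames(dat))   # FALSE here

# Reorder the match vector to follow the rows of the data frame.
idx <- match(rownames(dat), names(fm))
stopifnot(!anyNA(idx))                # every row must appear in names(fm)
dat$matched_set <- fm[idx]
dat
```

Indexing by `match()` on names, rather than relying on position, keeps the assignment correct whatever order the output happens to be in.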

The value of `tol` can have a substantial effect on computation time; with smaller values, computation takes longer. Not every tolerance can be met, and how small a tolerance is too small varies with the machine and with the details of the problem. If `fullmatch` can't guarantee that the tolerance is as small as the given value of argument `tol`, then matching proceeds but a warning is issued.
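To see what a given `tol` permits in absolute terms, it can be multiplied by the problem size. The sketch below is simple arithmetic under the assumption that "the number of subjects to be matched" means treatments plus controls; the group sizes are hypothetical.

```r
# Hypothetical problem size: 7 treatments and 19 controls.
n_treated  <- 7
n_controls <- 19
tol <- 0.001                  # the default shown under Usage

# Assumption: "number of subjects" counts treatments plus controls.
# The result is the permitted excess, over an optimal solution, in the
# sum of discrepancies within matched sets.
max_excess <- tol * (n_treated + n_controls)
max_excess
```

Since the permitted excess grows with the number of subjects, keeping the same absolute bound on a larger problem requires a proportionally smaller `tol`, at the cost of longer computation.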

### Value

Primarily, a named vector of class `c('optmatch', 'factor')`. Elements of this vector correspond to members of the treatment and control groups in reference to which the matching problem was posed, and are named accordingly; the names are taken from the row and column names of `distance`. Each element of the vector is the concatenation of: (i) a character abbreviation of `subclass.indices`, if that argument was given, or the string `m` if it was not; (ii) the string `.`; and (iii) a nonnegative integer or the string `NA`. In this last place, positive whole numbers indicate placement of the unit into a matched set, zero indicates a unit that was not matched, and `NA` indicates that all or part of the matching problem given to `fullmatch` was found to be infeasible.

In some cases, only proper subsets of the initial treatment and/or control groups will be represented in the value of `fullmatch`. Whether this occurs is determined by the status of argument `subclass.indices`. If `subclass.indices` is null, then all elements of the treatment and control groups, i.e. the rows and columns of `distance`, are represented in the value of `fullmatch`. Otherwise, the vector has an element for each unit represented in `subclass.indices`, i.e. for each element of factor `subclass.indices` or for each row of data frame `subclass.indices`.

Secondarily, `fullmatch` returns various data about the matching process and its result, stored as attributes of the named vector which is its primary output. In particular, the `exceedances` attribute gives upper bounds, not necessarily sharp, for the amount by which the sum of distances between matched units in the result of `fullmatch` exceeds the least possible sum of distances between matched units in a feasible solution to the matching problem given to `fullmatch`. Such a bound is also printed by `print.optmatch`.
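The label format described above can be decoded with base R string functions. The sketch below works on a hand-made character vector `fm` standing in for `fullmatch`'s value (the real value is a factor, which `sub` coerces to character); the unit names and the mix of labels are made up for illustration.

```r
# Stand-in for fullmatch output: labels of the form <prefix>.<code>.
fm <- c(t1 = "m.1", c1 = "m.1", c2 = "m.1", t2 = "m.2", c3 = "m.2",
        c4 = "m.0", t3 = "m.NA")

# Split each label at its last "." into prefix and set code.
prefix <- sub("\\.[^.]*$", "", fm)
code   <- sub("^.*\\.", "", fm)

# Positive integer: matched; "0": unmatched; "NA": infeasible problem.
status <- ifelse(code == "NA", "infeasible",
          ifelse(code == "0",  "unmatched", "matched"))

# Unit names grouped by status.
split(names(fm), status)
```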

### Note

`fullmatch` is based on an algorithm developed by Stephanie Olsen Klopfer. Her algorithm translates full matching problems into network flow problems; in the present implementation, the latter are handled by Bertsekas and Tseng's RELAX-IV codes.

### Author(s)

Ben Hansen

### References

Hansen, B.B. (2004), ‘Full Matching in an Observational Study of Coaching for the SAT’, Journal of the American Statistical Association, 99, 609–618. (Cf. especially the Appendix.)

Rosenbaum, P. (1991), ‘A Characterization of Optimal Designs for Observational Studies’, Journal of the Royal Statistical Society, Series B, 53, 597–610.

### See Also

`matched`, `makedist`

### Examples

```
data(plantdist)
plantsfm <- fullmatch(plantdist)  # A full match with unrestricted
                                  # treatment-control balance
pr <- logical(26)
pr[match(dimnames(plantdist)[[1]], names(plantsfm))] <- TRUE

table(plantsfm,                          # treatment-control balance,
      ifelse(pr, 'treated', 'control'))  # by matched set

tapply(names(plantsfm),                  # largest treatment-control
       plantsfm,                         # distances, by matched set
       FUN = function(x, dmat) {
         max(dmat[match(x, dimnames(dmat)[[1]]),
                  match(x, dimnames(dmat)[[2]])],
             na.rm = TRUE)
       }, dmat = plantdist)

plantsfm1 <- fullmatch(plantdist,        # A full match with
                       min.controls = 2, # restrictions on matched sets'
                       max.controls = 3) # treatment-control balance

table(plantsfm1,                         # treatment-control balance is
      ifelse(pr, 'treated', 'control'))  # improved by restrictions

tapply(names(plantsfm1),                 # but distances between
       plantsfm1,                        # matched units increase
       FUN = function(x, dmat) {         # slightly
         max(dmat[match(x, dimnames(dmat)[[1]]),
                  match(x, dimnames(dmat)[[2]])],
             na.rm = TRUE)
       }, dmat = plantdist)
```

[Package optmatch version 0.2-3 Index]