Meta-analysis is a type of research in which the data being analyzed are obtained from previous studies. These studies, taken individually, may be somewhat ambiguous. The goal of a meta-analysis is to synthesize them so as to arrive at a more definitive result. Sophisticated statistical methods have been developed for conducting meta-analyses, but a simple meta-analysis can be carried out using elementary statistical techniques. Meta-analyses usually involve a substantial amount of data management in addition to the statistical analysis.

We will illustrate a meta-analysis using the data from the article "A Meta-Analysis of Probiotic Efficacy for Gastrointestinal Diseases," which appeared in the journal *PLoS ONE* in 2012. This meta-analysis considered the effectiveness of probiotic preparations for preventing or treating gastrointestinal (GI) diseases. The authors identified 84 published studies addressing this question. Each of these studies followed a treatment group (who took the probiotic) and a control group (who did not). The number and proportion of GI disease events in each of the two groups were used to judge the effectiveness of the probiotic treatment. To the extent that the rate of disease incidence is lower in the probiotic treatment group than in the control group, this suggests that taking the probiotic protects subjects from disease.

Besides the crude disease rate, each study in the meta-analysis was also characterized in terms of other factors, including the specific GI disease that was tracked, the specific probiotic type that was given, the year of publication, and the probiotic dose. A meta-analysis typically looks at the average effectiveness of the treatment over all studies, and also within certain subsets of the studies. The range of treatment effectiveness over the studies is also of interest.

The data are available from the journal's web site as an Excel spreadsheet. This spreadsheet can be converted to a CSV file (using Excel, Google Docs, LibreOffice, etc.). Here we will use Python and NumPy to read the data, but it would also be possible to read these data using pandas, which would likely require fewer lines of code. It is also possible to read the data directly into Python using the openpyxl library, but we will not cover that here.
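For comparison, here is a sketch of what the pandas version of the read step might look like. To keep the sketch self-contained, a tiny in-memory CSV stands in for the real spreadsheet; its column names mimic (but are only illustrative of) the ones used below, including the stray whitespace that is common in the actual file.

```python
import io
import pandas as pd

## A tiny in-memory stand-in for the real CSV file; note the stray
## whitespace in the header, mimicking the actual spreadsheet.
text = """Event (probiotic) ,Group Size (probiotic)
4,100
7,120
"""
df = pd.read_csv(io.StringIO(text))
df.columns = [c.strip() for c in df.columns]  ## clean up the titles
PE = df["Event (probiotic)"].to_numpy(dtype=float)
```

The header cleanup and the numeric conversion each take one line here; pandas infers the column types for us.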

The data have mixed types (numbers and text). So first we read everything as strings (text) into a list of lists. Later we will convert certain columns to numeric values as needed.

```
import numpy as np
import csv
import os

fname = "journal.pone.0034938.s001 - Supplementary material.csv"
with open(os.path.join("../Data", fname)) as fid:
    R = csv.reader(fid)
    H = next(R)                  ## The column titles
    H = [x.strip() for x in H]   ## Clean up the titles
    D = [x for x in R]           ## The data (a list of lists)
```

Next, we create NumPy arrays of the numeric values that determine the event rates.

```
## The number of events in the probiotic arm
ii = H.index("Event (probiotic)")
PE = [float(x[ii]) for x in D]
PE = np.array(PE)
## The sample size in the probiotic arm
ii = H.index("Group Size (probiotic)")
PN = [float(x[ii]) for x in D]
PN = np.array(PN)
## The number of events in the control arm
ii = H.index("Event (control)")
CE = [float(x[ii]) for x in D]
CE = np.array(CE)
## The sample size in the control arm
ii = H.index("Group size (control)")
CN = [float(x[ii]) for x in D]
CN = np.array(CN)
```

This can be accomplished more compactly as follows.

```
VN = ["Event (probiotic)", "Group Size (probiotic)",
      "Event (control)", "Group size (control)"]
Q = []
for v in VN:
    ii = H.index(v)
    U = np.array([float(x[ii]) for x in D])
    Q.append(U)
PE, PN, CE, CN = tuple(Q)
```

The event rates are proportions, and the standard error of a sample
proportion can be estimated using the fact that the variance of a
binomial proportion with success probability *p* and sample size *n*
is *p*(1-*p*)/*n*.

```
## Event rates in the two groups
PR = PE / PN
CR = CE / CN
## Standard errors of event rates
PSD = np.sqrt(PR*(1-PR) / PN)
CSD = np.sqrt(CR*(1-CR) / CN)
## Standard error of the difference between event rates in the
## treatment and control arms
SED = np.sqrt(PSD**2 + CSD**2)
```

We can now check how well the treatment is performing. We see that in around 85% of the studies, the event rate is higher in the control arm than in the treatment arm.

```
In [17]: np.mean(CR > PR)
Out[17]: 0.84523809523809523
```

We can also see that a typical study has a standard error around 0.06-0.07 for the event rate within each of the two arms, and the standard error of the difference is around 0.1.

```
In [18]: np.mean(PSD)
Out[18]: 0.060905295362373935
In [19]: np.mean(CSD)
Out[19]: 0.071003924046456168
In [20]: np.mean(SED)
Out[20]: 0.095087329115036853
```

Next we can look at the treatment effects in terms of the individual studies. First we form a Z-score for the difference in event rates between the treatment and control arms.

```
In [21]: Z = (PR - CR) / SED
```

A Z-score exceeding 2 in magnitude is usually taken to indicate that an effect may be different from zero. Around 43% of these 84 studies (36 studies) have such a Z-score, and every study with a Z-score exceeding 2 in magnitude has a lower event rate in the treatment arm.

```
In [22]: np.mean(np.abs(Z) > 2)
Out[22]: 0.42857142857142855
In [23]: np.sign(Z[np.abs(Z) > 2])
Out[23]:
array([-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
-1., -1., -1., -1., -1., -1., -1., -1., -1., -1.])
```

A typical individual study has only marginal evidence for an effect.

```
In [24]: np.mean(Z)
Out[24]: -1.7715397180895727
In [25]: np.mean(np.abs(Z))
Out[25]: 1.968623497123593
```

The effects also appear to be heterogeneous. This means that the estimated treatment effects are so different from each other that something other than sampling variation must be influencing their values. If the studies are perfectly homogeneous, the Z-scores for the treatment effects should have standard deviation 1 across the studies. The greater this standard deviation, the greater the evidence for heterogeneity.

```
In [26]: Z.std()
Out[26]: 1.9453720802552095
```
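As a quick sanity check on this benchmark, here is a small simulation (not part of the original analysis; the sample size and event rate are made up) of 84 perfectly homogeneous studies. Both arms share the same true event rate, so the only source of variation in the Z-scores is sampling, and their standard deviation should come out close to 1, well below the 1.95 seen in the real data.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 200                                ## per-arm sample size (made up)
p = 0.3                                ## common true event rate (made up)
pe = rng.binomial(n, p, size=84) / n   ## treatment-arm event rates
ce = rng.binomial(n, p, size=84) / n   ## control-arm event rates
sed = np.sqrt(pe*(1 - pe)/n + ce*(1 - ce)/n)
z = (pe - ce) / sed
## z.std() should be close to 1 under homogeneity
```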

One way to summarize the treatment effects across the studies is to use the inverse variance weighted average of the within-study treatment effect estimates.

```
## Inverse variance weights
W = 1/SED**2
W /= W.sum()
## The inverse variance weighted average treatment effect
AVG = np.sum(W * (PR - CR))
## The standard error of the average treatment effect
SE_AVG = np.sqrt(np.sum(W**2 * SED**2))
## 95% confidence interval for the average treatment effect
LCB = AVG - 2*SE_AVG
UCB = AVG + 2*SE_AVG
```

This analysis shows that the rate of GI disease is about 8 percentage points lower in the treatment arm. This average effect is very precisely estimated, with a standard error around 0.006.

```
In [29]: AVG
Out[29]: -0.079022268154595926
In [30]: SE_AVG
Out[30]: 0.0057189326751776563
In [31]: LCB,UCB
Out[31]: (-0.090460133504951246, -0.067584402804240606)
```
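Incidentally, once the inverse variance weights are normalized, the expression for the standard error of the weighted average simplifies to one over the square root of the summed inverse variances. A minimal sketch verifying this identity on made-up numbers:

```python
import numpy as np

SED = np.array([0.05, 0.10, 0.08])   ## made-up within-study standard errors
W = 1/SED**2
W /= W.sum()
SE_AVG = np.sqrt(np.sum(W**2 * SED**2))
## The equivalent closed form
SE_CLOSED = 1/np.sqrt(np.sum(1/SED**2))
```

The closed form makes it clear why adding more studies always shrinks the standard error of the pooled estimate.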

We can repeat the analysis of the average treatment effect on subgroups of the studies. For example, different doses of probiotic were used in different studies. The doses were coded as 1, 2, 3, and 4. Some of the studies do not report dose information, so we need to be prepared for that.

```
def float1(x):
    """
    Convert a string x to a float.  If the conversion fails,
    return nan.
    """
    try:
        return float(x)
    except ValueError:
        return float("nan")

## Get the dose information
ii = H.index("Dose")
U = np.array([float1(x[ii]) for x in D])

## Estimate the overall treatment effect for each dose level
for j in 1, 2, 3, 4:
    ## The indices of the studies with dose level j
    ix = np.flatnonzero(U == j)
    ## Restrict the data to the studies with dose level j
    PR1 = PR[ix]
    CR1 = CR[ix]
    SED1 = SED[ix]
    ## Inverse variance weights
    W = 1/SED1**2
    W /= W.sum()
    ## The inverse variance weighted average treatment effect
    AVG = np.sum(W * (PR1 - CR1))
    ## The standard error of the average treatment effect
    SE_AVG = np.sqrt(np.sum(W**2 * SED1**2))
    ## 95% confidence interval for the average treatment effect
    LCB = AVG - 2*SE_AVG
    UCB = AVG + 2*SE_AVG
    ## Z-scores for the individual studies at this dose level
    Z = (PR1 - CR1) / SED1
    print("%4d %4d %5.2f (%5.2f,%5.2f) %5.2f" %
          (j, len(ix), AVG, LCB, UCB, Z.std()))
```

Here is the result of running this analysis.

```
In [30]: run probiotic_meta_analysis.py
1 20 -0.14 (-0.17,-0.11) 1.50
2 25 -0.05 (-0.06,-0.03) 1.17
3 12 -0.11 (-0.15,-0.07) 2.01
4 7 -0.07 (-0.13,-0.00) 1.57
```

We can do several subgroup analyses together without bloating the code by using a loop over the variables that can be used to define subgroups.

```
## Variables to subset on
VN = ['Patients Disease', 'Probiotic Species', 'Probiotic (single or group)',
      'Dose', 'Treatment lengths', 'Age group']

print()
for vn in VN:
    ii = H.index(vn)
    ## These are the values of the variable we will subset on.  Note
    ## that trailing whitespace is common, so we use 'strip' to remove
    ## it.  If we didn't do this, 'children' and 'children ' would
    ## belong to different groups.
    U = np.array([x[ii].strip() for x in D])
    ## The unique values of the grouping variable
    S = sorted(set(U))
    ## Estimate the overall treatment effect for each subgroup
    for s in S:
        ## The indices of the studies in the current subgroup
        ix = np.flatnonzero(U == s)
        ## Don't analyze very small subgroups
        if len(ix) < 5:
            continue
        ## Restrict the data to the studies in the subgroup
        PR1 = PR[ix]
        CR1 = CR[ix]
        SED1 = SED[ix]
        ## Inverse variance weights
        W = 1/SED1**2
        W /= W.sum()
        ## The inverse variance weighted average treatment effect
        AVG = np.sum(W * (PR1 - CR1))
        ## The standard error of the average treatment effect
        SE_AVG = np.sqrt(np.sum(W**2 * SED1**2))
        ## 95% confidence interval for the average treatment effect
        LCB = AVG - 2*SE_AVG
        UCB = AVG + 2*SE_AVG
        ## Z-scores for the individual studies in the subgroup
        Z = (PR1 - CR1) / SED1
        print("%-30s %4d %5.2f (%5.2f,%5.2f) %5.2f" %
              ("%s=%s" % (vn, s), len(ix), AVG, LCB, UCB, Z.std()))
```

Here is part of the output from running this part of the script.

```
Treatment lengths=-- 9 -0.03 (-0.05,-0.01) 1.33
Treatment lengths=1-2 weeks 30 -0.11 (-0.14,-0.09) 1.57
Treatment lengths=3-4 weeks 21 -0.08 (-0.11,-0.04) 1.08
Treatment lengths=5-8 weeks 17 -0.09 (-0.12,-0.07) 1.71
Treatment lengths=>2 months 7 -0.34 (-0.40,-0.29) 3.28
Age group=-- 6 -0.12 (-0.16,-0.07) 1.82
Age group=adults 53 -0.12 (-0.14,-0.10) 2.10
Age group=children 14 -0.18 (-0.21,-0.14) 1.45
Age group=infants 8 -0.03 (-0.05,-0.01) 0.98
```