
Wading Through the Golden Grapes

This is a follow-up post to Purulent Seas.

Looking at the full data set, it is clear that the number of infections is related to the number of patient days. The following figures present naive linear, quadratic and exponential models with 95% prediction intervals. The adjusted R^2 values are 0.82, 0.83 and 0.71 respectively. There can be no dispute about the positive relationship. Ranking is a more difficult story.

[Figure: Linear Model]

[Figure: Quadratic Model]

[Figure: Exponential Model]
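For anyone wanting to try this kind of comparison, here is a minimal Python sketch using NumPy that fits the three naive models and reports adjusted R^2 for each. The data are synthetic placeholders, not the published data set; `adjusted_r2` and `fit_models` are hypothetical names; and fitting the exponential model as a straight line on log(infections), then scoring it on the original scale, is an assumption about how such a model might be fitted. Prediction intervals are left out to keep it short.

```python
import numpy as np


def adjusted_r2(y, y_hat, n_predictors):
    """Adjusted R^2; n_predictors excludes the intercept."""
    y = np.asarray(y, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    n = len(y)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


def fit_models(patient_days, infections):
    """Fit naive linear, quadratic and exponential models; return adjusted R^2 for each."""
    x = np.asarray(patient_days, dtype=float)
    y = np.asarray(infections, dtype=float)
    results = {}

    # Linear: infections = b0 + b1 * days
    lin = np.polyfit(x, y, 1)
    results["linear"] = adjusted_r2(y, np.polyval(lin, x), 1)

    # Quadratic: infections = b0 + b1 * days + b2 * days^2
    quad = np.polyfit(x, y, 2)
    results["quadratic"] = adjusted_r2(y, np.polyval(quad, x), 2)

    # Exponential (assumed form): log(infections) = b0 + b1 * days,
    # fitted on hospitals with at least one infection, scored on the original scale
    pos = y > 0
    expo = np.polyfit(x[pos], np.log(y[pos]), 1)
    results["exponential"] = adjusted_r2(y[pos], np.exp(np.polyval(expo, x[pos])), 1)
    return results


# Hypothetical data: 60 hospitals, roughly 3 infections per 10,000 patient days
rng = np.random.default_rng(0)
days = rng.uniform(5_000, 100_000, size=60)
infections = rng.poisson(days * 3e-4)

for name, value in fit_models(days, infections).items():
    print(f"{name:12s} adjusted R^2 = {value:.2f}")
```

Adjusted R^2 penalises the extra quadratic term, which is why a small gain over the linear fit (as in 0.82 versus 0.83) says little on its own.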

If the process is modeled as an underlying Poisson distribution (correcting for peer group size), with hospital peer group as an independent explanatory variable, the following emerges:

[Figure: Generalized linear model: Poisson]
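One way to write this model down is a Poisson GLM with a log link, one indicator per peer group, and log(patient days) as an offset; treating the offset as the "correction for size" is an assumption. The sketch below fits it by iteratively reweighted least squares (IRLS) in plain NumPy rather than with a statistics package; `poisson_glm_irls` and the data are hypothetical. With one indicator per group and no intercept, exp(coefficient) is the fitted number of infections per patient day for that group. It assumes every group has at least one infection, otherwise the estimate diverges.

```python
import numpy as np


def poisson_glm_irls(X, y, offset, max_iter=100, tol=1e-10):
    """Fit log(E[y]) = X @ beta + offset by iteratively reweighted least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    offset = np.asarray(offset, dtype=float)

    mu = y + 0.5                     # starting fitted values (avoids log(0))
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta - offset + (y - mu) / mu      # working response
        XtW = X.T * mu                        # Poisson weights equal mu under the log link
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new + offset
        mu = np.exp(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


# Hypothetical hospitals: peer group label, patient days, infection count
groups = np.array(["A", "A", "A", "B", "B", "C", "C", "C", "C"])
days = np.array([12_000, 30_000, 45_000, 8_000, 20_000, 60_000, 75_000, 52_000, 90_000])
infections = np.array([2, 7, 11, 4, 9, 14, 30, 12, 25])

labels = np.unique(groups)
X = (groups[:, None] == labels[None, :]).astype(float)   # one indicator per peer group
beta = poisson_glm_irls(X, infections, offset=np.log(days))

for label, b in zip(labels, beta):
    print(f"peer group {label}: {np.exp(b) * 10_000:.2f} infections per 10,000 patient days")
```

For this design the fitted rate of each group equals its pooled infections divided by its pooled patient days, so the GLM and the aggregate peer-group rates agree.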

We can also model each peer group as Poisson distributed, with the rate per 10,000 patient days estimated from the aggregate for that peer group.
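A minimal sketch of that estimate, assuming the rate is per 10,000 patient days: pool the infections and patient days within each peer group, then scale the pooled rate by each hospital's own patient days to get its expected (Poisson mean) count. The function names and data are hypothetical.

```python
import numpy as np


def peer_group_rates(groups, days, infections, per=10_000):
    """Pooled infection rate per `per` patient days for each peer group."""
    groups = np.asarray(groups)
    days = np.asarray(days, dtype=float)
    infections = np.asarray(infections, dtype=float)
    return {
        str(g): float(infections[groups == g].sum() / days[groups == g].sum() * per)
        for g in np.unique(groups)
    }


def expected_counts(groups, days, rates, per=10_000):
    """Poisson mean for each hospital: its peer-group rate times its patient days."""
    return np.array([rates[g] * d / per for g, d in zip(groups, days)])


# Hypothetical data
groups = ["A", "A", "B", "B", "B"]
days = [10_000, 30_000, 20_000, 20_000, 60_000]
infections = [3, 5, 2, 6, 8]

rates = peer_group_rates(groups, days, infections)
print(rates)
print(expected_counts(groups, days, rates))
```

By construction the expected counts within a peer group sum to that group's observed total, so hospitals are only compared against their own peers.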


Simulating each peer group 1,000 times gives insight into how likely the observed rates are within that peer group:

[Figure: Histograms of simulated peer groups (1000 in each group)]
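The simulation step might look like the following. It assumes each hospital's count is drawn 1,000 times from a Poisson distribution whose mean is its expected count under the peer-group rate, and that the likelihood of an observed rate is measured as the share of simulated counts at or above the observed one. The function name, seed and data are hypothetical.

```python
import numpy as np


def simulated_tail_probability(observed, expected, n_sims=1000, seed=0):
    """Share of simulated Poisson counts at least as large as each observed count."""
    rng = np.random.default_rng(seed)
    observed = np.asarray(observed)
    expected = np.asarray(expected, dtype=float)
    # One row of n_sims draws per hospital, each with its own Poisson mean
    sims = rng.poisson(lam=expected[:, None], size=(len(expected), n_sims))
    return (sims >= observed[:, None]).mean(axis=1)


# Hypothetical hospitals from one peer group: observed and expected counts
observed = np.array([2, 6, 3, 12])
expected = np.array([3.2, 3.2, 4.0, 5.5])
print(simulated_tail_probability(observed, expected))
```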

The probability of the observed data given the model can be used to rank hospitals. The graphic displays the three lowest ranked.
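Under a Poisson model the same probability can also be computed exactly rather than by simulation. The sketch below assumes the ranking metric is the upper-tail probability P(X >= observed) given each hospital's expected count, so the smallest values flag the counts least consistent with the model; that choice of metric, the function names and the data are assumptions for illustration.

```python
import math


def poisson_upper_tail(k, lam):
    """Exact P(X >= k) for X ~ Poisson(lam), by summing the pmf below k."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)        # P(X = 0)
    cdf = term
    for j in range(1, k):
        term *= lam / j          # P(X = j) from P(X = j - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def lowest_probabilities(hospitals, n=3):
    """Return the n hospitals whose observed counts are least likely under the model."""
    scored = [(poisson_upper_tail(obs, lam), name) for name, obs, lam in hospitals]
    return sorted(scored)[:n]


# Hypothetical (name, observed count, expected count) triples
hospitals = [
    ("H1", 2, 3.2),
    ("H2", 6, 3.2),
    ("H3", 3, 4.0),
    ("H4", 12, 5.5),
    ("H5", 9, 9.6),
]
for prob, name in lowest_probabilities(hospitals):
    print(f"{name}: P(X >= observed) = {prob:.4f}")
```

Direct summation is adequate for the small expected counts typical of hospital infections; for very large means `math.exp(-lam)` underflows and a library survival function would be safer.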


This “ranking” hides a number of issues. The Poisson distribution is discrete, so the metric used here to rank takes exactly the same value for many hospitals.
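Because the counts are integers, hospitals with the same observed and expected counts get identical probabilities. The sketch below (plain Python, hypothetical metric values) contrasts standard competition ranking, where tied values share a rank and the following ranks are skipped, with dense ranking, where tied values collapse into a single rank. Reading "tied values labeled as one" as dense ranking is an assumption.

```python
def competition_rank(values):
    """Standard competition ranking ("1224"): ties share the lowest rank, gaps follow."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def dense_rank(values):
    """Dense ranking ("1223"): tied values share one rank and no ranks are skipped."""
    distinct = sorted(set(values))
    position = {v: i + 1 for i, v in enumerate(distinct)}
    return [position[v] for v in values]


# Hypothetical metric values; three hospitals share exactly the same probability
metric = [0.105, 0.105, 0.762, 0.0088, 0.105]
print(competition_rank(metric))
print(dense_rank(metric))
```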

If tied values are given a single shared rank:

[Figure: “Ranking”: ties accounted for]


Further, the peer groups matter. The following figure unpacks the peer groups, with tied metrics given the same rank. The outlier points still ‘pop out’.

[Figure: “Rankings” by peer group]
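Ranking within peer groups amounts to the same dense ranking applied separately to each group. A minimal sketch, with hypothetical records of (hospital, peer group, metric) and the assumption that a lower metric means a less likely observed count:

```python
from collections import defaultdict


def dense_rank_by_group(records):
    """Dense-rank a metric separately within each peer group.

    records: iterable of (hospital, peer_group, metric); the lowest metric gets rank 1.
    Returns {hospital: (peer_group, rank)}.
    """
    by_group = defaultdict(list)
    for hospital, group, metric in records:
        by_group[group].append((hospital, metric))

    ranks = {}
    for group, members in by_group.items():
        distinct = sorted({metric for _, metric in members})
        position = {m: i + 1 for i, m in enumerate(distinct)}
        for hospital, metric in members:
            ranks[hospital] = (group, position[metric])
    return ranks


# Hypothetical records
records = [
    ("H1", "A", 0.83), ("H2", "A", 0.11), ("H3", "A", 0.11),
    ("H4", "B", 0.009), ("H5", "B", 0.62), ("H6", "B", 0.62),
]
for hospital, (group, rank) in sorted(dense_rank_by_group(records).items()):
    print(f"{hospital} (peer group {group}): rank {rank}")
```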

I have made these posts to motivate deeper consideration of data, and particularly to argue against facile conclusions based on point estimates. There are some simple (expected) findings: larger volumes are associated with higher numbers of infections, and hospitals with more vulnerable patients have higher rates of infection. Ranking is a more complex exercise. I am merely advocating a deep, robust and thoughtful assessment of data to generate testable hypotheses rather than reflexive responses. I am not drawing any conclusions. I do not have all the data, or a full understanding of the methodologies and particularly their limitations, especially in data collection (and any potential selection biases). I merely present a number of ways (without endorsing any of them) to look at what was publicly available, to discourage naive, pejorative but provocative headlines that may influence those in power.
