I agree with AbideWithMe. I’m not seeing any obvious bias or any methodological flaws.
Whether there is bias or misrepresentation or not, the experimental method used is seriously flawed. They made certain assumptions during the collection of their data that really influence the results. Primarily, the results they plotted assumed that all churches respond to inquiries. By sending only a single inquiry to each church, you cannot infer that the church was biased in any way. They could simply be very poor at responding to emails, or the emails may go to a black hole.
I’m not sure how you came to these conclusions, given that the article explains why drawing a conclusion from a single datum (one email to one church) wouldn’t suffice, and even mentions the very alternative explanation for a lack of response that you raised:
Here’s the logic of our study. Each church would receive one email. Some churches would reply and some would not. We couldn’t know if the actions of any single church were discriminatory. For example, suppose Jose Hernandez sent a letter to St. Theodore’s in Des Moines and a church staff member didn’t reply. What would this mean? It could mean that St. Theodore’s had a really busy week launching their mixed martial arts–themed children’s ministry, and they forgot to write back. Or maybe they never reply to anyone. Or maybe they were put off by Jose’s apparent Hispanic ethnicity. We don’t know.
So no single email response tells us if a particular church harbors implicit racial bias. Where we can detect churches’ bias is in the aggregate. Since each church got the same letter at about the same time, if churches replied differently to different names, it had to be due to the names themselves. And if Jong Soo Kim’s letters received significantly fewer replies than Greg Murphy’s, the disparity reveals a difference in the status conferred upon them.
This is how all experiments work, no matter what field of study you’re in. Individual data points mean nothing on their own. It’s the aggregate that matters. When you send over 3000 emails to randomly selected churches all across the country, representing every congressional district, the gross differences in response can indeed be chalked up to the lone independent variable: the ethnicity of the name of the email sender. This is why the researchers can be confident that Mainline Protestant churches in America are 15% less likely to respond to an email from a Hispanic person than from a White person (see the first bar graph in the article) because the sender is Hispanic, and not because of, say, carelessness, forgetfulness, or some other reason. Given the large sample size and the random assignment, the effects of such extraneous factors average out across all groups.
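To make the "noise averages out" point concrete, here is a minimal simulation sketch. It is not the researchers' analysis. The function name, the uniform spread of church reply habits, and the way the penalty is applied are all my own assumptions for illustration. Each simulated church has its own baseline chance of replying (some almost never reply, which is the "black hole" case), gets exactly one email, and is randomly assigned a sender group.

```python
import random

def simulate_reply_rates(n_churches, relative_penalty, seed=0):
    """Return the aggregate reply rate per sender group.

    Hypothetical model, for illustration only:
    - each church has its own baseline reply probability, drawn uniformly
      from [0, 1), so some churches almost never answer anyone;
    - each church receives ONE email from a randomly assigned group;
    - the "treatment" group's reply probability is scaled down by
      relative_penalty (0.0 means no bias at all).
    """
    rng = random.Random(seed)
    sent = {"control": 0, "treatment": 0}
    replied = {"control": 0, "treatment": 0}
    for _ in range(n_churches):
        base_p = rng.random()                        # church-specific habit
        group = rng.choice(["control", "treatment"])  # random assignment
        p = base_p * (1 - relative_penalty) if group == "treatment" else base_p
        sent[group] += 1
        if rng.random() < p:
            replied[group] += 1
    return {g: replied[g] / sent[g] for g in sent}

# No bias: sloppy or silent churches land in both groups about equally,
# so the two aggregate rates should come out close to each other.
print(simulate_reply_rates(20000, 0.0, seed=1))

# A 15% relative penalty: the gap shows up in the aggregate even though
# no single church's silence could be interpreted on its own.
print(simulate_reply_rates(20000, 0.15, seed=1))
```

The design point is that church-level sloppiness is drawn independently of the sender group, so it cannot systematically favor one name over another.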
They wanted a quick result and took some shortcuts to achieve it. A valid sample would have included emails from multiple “races” to the same church. You really cannot assume just because a church didn’t respond to a particular email that it indicates any sort of preference. I don’t deny that preferences exist but I think the experiment leads to inaccurate conclusions.
This is absolutely how not to perform an experiment of any kind. The race differential among the emails is the independent variable. It’s the one and only thing that is supposed to be different for each subject. Sending each and every church an email from each of the represented races would mean that the race differential is no longer variable. The whole purpose of experimentation is to see how some expected result (the dependent variable) varies as the independent variable varies.
Using your logic in biomedical research (which I’m more familiar with), you’d have us give both a placebo and the drug of interest to all patients. This would obviously give us useless data since we’d have no way of knowing whether the experimental drug had any effect. If I’m testing the efficacy of three different new cancer drugs against a placebo, and relative to one another, I wouldn’t give all four pills to every patient in the study. Similarly, you wouldn’t send an email from each of the three “experimental races” (Black, Hispanic, and Asian) along with an email from the “control race” (White) to each and every church.
By doing so you’ve inadvertently introduced another variable to the experiment: sequence. What if the church secretary only has time to answer one or two emails and doesn’t get around to the other two or three? What if the church secretary grows suspicious because she has just received four seemingly identical requests in quick succession, when it’s quite possible this church rarely if ever gets such requests by email? This would be an even worse result since the test subject has been unblinded and you’d no longer see the effects of implicit racial bias, which if you recall is the very object of this experiment!
Furthermore, suppose a particular church responds to some but not all of the emails, and you can rule out a sequence effect because the first and last emails were answered but one or both of the middle two were not. Even then, the best we’d be able to conclude is that there’s evidence of explicit racial bias, which isn’t the objective of the experiment.
Take this lesson to your stats students: We don’t control for extraneous variables by administering every single level of the independent variable to each and every one of the test subjects. Doing so would render the independent variable nonvariable! Instead we make sure that our sample is sufficiently large, and that the assignment of those levels is sufficiently randomized among all test subjects.
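For the stats students, random assignment can be sketched in a few lines. This is only an illustration under my own assumptions: the function name, the church IDs, and the shuffle-then-deal scheme are hypothetical, and I’m not claiming this is how the researchers actually assigned sender names. The four group labels are the ones discussed above.

```python
import random

def assign_senders(church_ids, sender_groups, seed=0):
    """Randomly assign each church exactly ONE sender group.

    Hypothetical helper: shuffle the churches, then deal the groups out
    in rotation so that group sizes differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(church_ids)
    rng.shuffle(shuffled)  # the random order is what breaks any link
                           # between a church's habits and its group
    return {
        church: sender_groups[i % len(sender_groups)]
        for i, church in enumerate(shuffled)
    }

groups = ["White", "Black", "Hispanic", "Asian"]
churches = [f"church_{n}" for n in range(10)]  # made-up IDs
assignment = assign_senders(churches, groups, seed=42)
for church in churches:
    print(church, "->", assignment[church])
```

Each church sees a single email, so nothing tips it off that it is part of a study, and the comparison between groups happens only in the aggregate.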