A cautionary tale: Don't knowingly omit data

Jun 02 2011 | Published under Education & Careers

There can be a lot of gray areas in research, where decisions can be made that seem okay at the time, but in retrospect look pretty questionable. One of those cases is the act of just leaving available data out of an analysis. I'm not talking about leaving outliers out of an analysis or getting a little over-zealous in reducing "noise" in the dataset. No, I mean "forgetting" to include relevant data that are publicly available.

In a lot of fields there are data repositories of some kind, which house published data. Certainly GenBank is one of the biggest examples, but many others are out there. If such a database is available to you, it is essential that you comb those data for anything that might directly impact your findings - especially if you are aware those exact data exist because you cite the paper they come from in the text of your manuscript.

Sure, you can leave the data out, write your paper, submit it to a small journal and hope that your transgression goes unnoticed. But sometimes editors choose reviewers based on the data you have in your analyses, and that manuscript lands on the desk of the person whose data you "forgot".

This is not good for you, my friend.

It suggests that either you are unaware of the recent literature with direct implications for your own work, or that you left the data out because they mess up the story you want to tell. Not exactly a good impression on the reviewer or the editor. Not only will the reviewer take a steaming hot dump on your manuscript, but if your first submission was to a small journal in a small field and you just pulled that kind of shit, pretty soon you'll be submitting it as a note to the Journal of Plow Science.

Sometimes the story is messy and you can't always trust data in public repositories, but you can't ignore them. Especially. If. You. Cite. The. Paper. At best it makes you look like you don't know what you're doing, and at worst, like you're trying to pull a fast one.

6 responses so far

  • DrugMonkey says:

    How is this any different from failing to cite a paper that may "complicate" the story you wish to tell? Or from mischaracterizing the data in such a paper? Doesn't this happen all the time?

  • Ragamuffin says:

    What seems to be most common in my field is that labs will omit some of their own findings which don't complement the rest of the story. People tend to make decent arguments in rebuttal letters if the hole(s) is blatant and work is "rejected pending revisions", because I see these types of papers published all the time. This is why so many secrets of the trade are known only between collaborators, which I've always thought was kind of unfair to the scientific community. The data are the data, and a story will rise from them whatever they are; publish everything you've got that isn't extraneous so that we can all move forward.

  • GMP says:

    My eyes, they burn -- there's a humongous *cuationary* typo in the title ...
    Sorry, I couldn't help it. ' Tis all.

  • proflikesubstance says:

    DM - It may not be. I suppose the difference in my mind is that when one has access to raw data it can be incorporated into the analysis and treated exactly the same as the new data, giving it more power. Data from a different study with different methods could plausibly, in some cases, be discounted based on those differences.

    Ragamuffin - That's a different problem altogether and extremely shady, IMO.

  • namnezia says:

    Cautionary because... it happened to you?

  • proflikesubstance says:

    "Cautionary because... it happened to you?"

    If it did, you can be damn sure it wasn't from the perspective of the manuscript writer.
