Research has a lot of gray areas, where decisions that seem okay at the time look pretty questionable in retrospect. One of those cases is the act of simply leaving available data out of an analysis. I'm not talking about excluding outliers or getting a little over-zealous in reducing "noise" in the dataset. No, I mean "forgetting" to include relevant data that are publicly available.
In a lot of fields there are data repositories of some kind, which house published data. GenBank is certainly one of the biggest examples, but many others are out there. If such a database is available to you, it is essential that you comb those data for anything that might directly impact your findings - especially if you are aware those exact data exist because you cite the paper they come from in the text of your manuscript.
Sure, you can leave the data out, write your paper, submit it to a small journal, and hope that your transgression goes unnoticed. But sometimes editors choose reviewers based on the data in your analyses, and that manuscript lands on the desk of the person whose data you "forgot".
This is not good for you, my friend.
It suggests that either you are unaware of the recent literature with direct implications for your own work, or that you left the data out because they mess up the story you want to tell. Not exactly a good impression on the reviewer or the editor. Not only will the reviewer take a steaming hot dump on your manuscript, but if your first submission was to a small journal in a small field and you just pulled that kind of shit, pretty soon you'll be submitting it as a note to the Journal of Plow Science.
Sometimes the story is messy, and you can't always trust data in public repositories, but you can't ignore them. Especially. If. You. Cite. The. Paper. At best it makes you look like you don't know what you're doing, and at worst, like you're trying to pull a fast one.