I was recently talking to a colleague who works in genomics, and we got on the topic of data release policies. I learned something interesting that I didn't know about the sharing of genomic data: almost all major genomics centers are going to a zero-embargo data release policy. Essentially, once the sequencing is done and the annotation has been run, the data is on the web in a searchable and downloadable format.
How many other fields put their data directly on the web before those who produced it have the opportunity to analyze it? Now, obviously no one is going to yank a genome paper right out from under the group working on it, but what about comparative studies? What about searching out specific genes for multi-gene phylogenetics? Where is the line for what is permissible to use before the genome is published? How much of a grace period do people get with data that has gone public, but that they* paid for?
It seems to me this is a very slippery slope because every genome paper has a different focus and it is no longer Glamour Mag worthy to just describe the genome of an organism. There has to be a hook and that hook is almost always related to the interesting biology of an organism or to resolution of a broader long-standing question based on the new data from the genome. However, these are the very things that people who are not part of the genome project would be interested in once the data are released.
The colleague I was talking to had the opinion that the (in her mind) small risk of someone scooping a major theme of the resulting paper was outweighed by the benefit to the community of having the data fresh off the machine. However, she is a tenured prof with an impressive CV and a name that might scare off the vultures, and I wondered whether she would have the same opinion if she were untenured.
Having my data pitched onto the internet the second I had it in my own hands would make me exceedingly nervous, even if my data were on the scale of a full genome. Stories abound of unscrupulous researchers more than willing to snap up any data they can find, and I have seen blatant cases of it myself. Are the genomics community and everyone who can benefit from its data just that much more principled? Somehow I find that a hard sell. And how does one make a complaint about someone else publishing your* data if it is sitting in a public database?
I will be interested to see whether there are any high-profile dust-ups over this in the near future or whether a genome really is big enough for the whole community.
*Obviously we are talking about grant-funded projects, so the money is taxpayer money, not any one person's. Nevertheless, someone came up with the idea and got it funded, so there is some ownership there.