Data management is one of the most important skills we learn as researchers. Without it we are well and truly screwed, and can spend days slogging through a mess of our own making, or have to redo experiments altogether. But aside from the occasional training on database programs, or a small section of a Unix course, where does training in good data management actually come from?
In my own experience, data management was something learned on the job that became more complex as the amount and breadth of data grew. Often this was a gradual process that allowed the trainee to scale up from a base, and each person came up with a system that worked well for them. An obvious issue is that multiple systems within a lab can complicate data sharing, but as long as the final product is in a "sharable" format, this is not a major problem in most circumstances.
In my field, however, things have changed. Yes, my students no longer have to walk to work uphill both ways in the snow like I did, and nothing costs a nickel anymore. But the sheer volume of data we now produce for a project is enormous compared with what I dealt with as a student, so the gradual building of a database is not so gradual these days. Data management is more critical than ever, but it never occurred to me until recently that simply saying "Make sure you keep your datasets organized and well labeled" about 100 times isn't enough.
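To make "organized and well labeled" a little more concrete, here is one minimal sketch of the kind of convention a lab might agree on: every new dataset gets a dated, self-describing folder plus a README stub. The layout, names, and function below are my own illustrative assumptions, not a standard or anything from the discussion above.

```python
from datetime import date
from pathlib import Path
from typing import Optional

def new_dataset_dir(root: Path, project: str, label: str,
                    when: Optional[date] = None) -> Path:
    """Create root/project/YYYY-MM-DD_label/ with raw/ and processed/
    subfolders and a README stub describing the dataset."""
    when = when or date.today()
    d = root / project / f"{when.isoformat()}_{label}"
    # Keep untouched instrument output separate from anything derived.
    (d / "raw").mkdir(parents=True, exist_ok=True)
    (d / "processed").mkdir(parents=True, exist_ok=True)
    readme = d / "README.txt"
    if not readme.exists():
        readme.write_text(
            f"Project: {project}\nLabel: {label}\nCollected: {when}\n"
            "Describe instruments, units, and file naming here.\n"
        )
    return d
```

The point is less the specific script than the principle: if the convention lives in code (or even a written checklist) rather than in each student's head, everyone's data ends up sharable by construction.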
So, as part of my supervisory tasks in the coming year I'm going to start sitting down with all my students to go over their data management and ensure they have a good system in place. It may take me some time, but ultimately it'll save a lot of person-hours down the road.
I'm sure others have dealt with this in the past, either as the teacher or the trainee. Any particularly effective strategies? I realize that the type of data matters, but I think it's worth discussing.