Thursday, February 07, 2008

The Role of Blogging in Open Notebook Science

Transcript of "The Role of Blogging in Open Notebook Science" at the North Carolina Science Blogging Conference January 2008.

Jean-Claude Bradley: OK. So I'd like to talk to you about what we're doing in my lab at Drexel University in recording the raw data from the experiments that my students are performing.

I'll be referring to this term "open notebook science." What that means is basically everything that we do is put up in real time -- and we use a number of tools to do that, wikis are one but there's a number of other ones -- and I'll show you how all of that is sort of interconnected.

So just one slide to give you the big picture of why we're doing this. If you notice, in the past couple of years, there's been a trend from closed information sources to open information sources. Your traditional closed research would be a traditional lab notebook.

Basically, no one ever gets to see that. It's not published after the student leaves, and all the failed experiments are not usually reported to anyone; so it basically stays there -- unless part of it gets converted to a traditional journal article.

Of course, one level off from that is the traditional journal article. That's still not completely open because you don't have all the failed experiments. You also have selected data sets that the researcher put together to write this specific article to make a point.

The problem here in terms of the closed/open is that, of course, the traditional journal articles are not free, typically. We discussed that extensively this morning -- how open access journal articles are even more open.

But I'd like to focus here this afternoon on this open notebook science concept, which is where we're aiming for full transparency in the research process.

So how does a researcher use Web 2.0 tools to also do science? Well, I will basically start with the blog aspect -- because this is the Scientific Blogging Conference after all -- and I will show you how the rest of it connects with the blogging. We will see wikis, Google Docs and [indecipherable].

What I tried to do is look, for the past couple of months, at the things that I'd actually recorded in my blog, to show you what it is that I'm thinking about and what's relevant to my lab. So of course, recently I'm looking for funding, so I put up this blog post and uploaded the proposal to "Nature Precedings", which is a fantastic resource, if you haven't come across it, to basically pre-publish pretty much any document that you want.

Another thing that's happened here is I supported another researcher in his efforts for getting funding -- this is actually Cameron Neylon, I'll show you his blog a little bit later -- and he's trying to get some funding to have people travel to talk about open science.

But it's kind of neat -- the way that we all have these blogs and we can support each other, really with minimal efforts.

Another thing that I blog about is if we get some media coverage, or if there's somebody who talks about Science 2.0 stuff. So there's a recent "Scientific American" article which is really interesting.

The other thing that I can blog about is when we get referred to from the peer-reviewed literature. I see basically the peer-reviewed literature being paralleled by these open source initiatives, but there's going to be a lot of cross-talk. So I try to make a point of it when I do see that cross-talk...

And of course how do I know? Because I see all of a sudden the link someone has referred to me -- through, in this case, Chemistry Central. So I think that that's going to be more and more important as we move on.

Man: I have a question. This goes back to where [indecipherable] back from one source to another source. In your experience, [indecipherable] people linking to you and finding those links and what they've taken so that the link connectivity between blogs and other sources [indecipherable].

Jean-Claude: OK. I think that the idea of trying to quantify the importance or the relevance of a particular post through citation is very misleading. I think though what you can say is, "will you find a link somewhere in your blog, somewhere in your wiki that leads you to meet somebody?" I think that's pretty solid.

Man: [indecipherable] finding someone.

Jean-Claude: Then here's a case right here. Gus Rosania, he's a faculty member who basically went on my blog and was interested in open notebook science and thought, "Hey, I should put my lab online!"

He has, in fact. In the past few weeks he's created wikis; all of the students are uploading stuff. So that's exactly how it happens, and it only takes one event for that to actually transpire.

So people ask me, "Why do you bother doing all this?" The bottom line is it produces results. And the main result is collaborators. I managed to find some people who are willing to work openly as well. There are not a lot of people who are willing to do that, but there's enough that we can actually do constructive science.

We have an x-ray crystallographer, Matthew Zeller; he has a nice crystal structure he did for [indecipherable].

It's also a way for me just to announce all these collaborations. Phil Rosenthal is a researcher who's [indecipherable] enzyme. I work in malaria, trying to make anti-malarial compounds, and basically he is willing to test our compounds to tell us if they're active or not. So that's how these collaborations actually happen.

What else do I use my blog for? To discuss presentations in real life. So I'm talking here obviously. And as will most of you, we're going to blog about this event. I'm also recording it and I will put it on this separate blog here that has links to the PowerPoints and the recordings and all of that.

The other thing that I discuss is presentations that are not in real life. I think I recognize a few of you who were at this SciFoo Lives On meeting. This was on Second Life, and there was a session in October on science blogging.

I don't have a lot of time here. But just to give you a rough idea, Second Life is a virtual world where you have an avatar. This is me here with the blue cap, and some of you probably recognize yourselves here. So we got posters here, and we've been talking about open science, and all of the transcripts for these meetings are actually on a wiki. So if you're interested in looking it up, I can point you to that.

What else can I do as a scientist? I can basically showcase science in new media. Here's an example of... I was talking to you about the enzymes that we're trying to inhibit with our anti-malarial work. This is Peter Miller's work, this is an enzyme represented in Second Life, and the enzyme is much bigger than you are. You can climb it, you can sit on it, you can go through the receptor site, you can do all kinds of things that you can't do with a lot of other software. So when I see stuff like that I like to blog about it.

In chemistry last term I had my students do some extra credit work in Second Life, so this is one of my students sitting on the molecule of camphor, and you'd become the molecule and actually fly around. I'm actually also here flying one. So basically, this is not enough for me to write a full paper and send it out to tell the world. This is a blog post: "Hey, this is what I do with my students; maybe you want to do something similar. If so, here's the information to do it and contact me." So that's really what a blog can be used for.

Other things I'd never even anticipated: people contact me because they know I work in malaria, and here's a Run for Malaria in Philly where it was very simple for me to simply put that announcement on my blog, and maybe that helped attract a bit more interest. Sometimes I like to talk about the philosophy of science, so yes, I do talk about my actual research in malaria. But I also think that some of the things that I'm doing have broader applications and I do like to talk about that occasionally as well.

Now this is really the bottom line as far as I'm concerned in terms of doing the science. Here's a post about the fact that we just shipped these targets to Phil Rosenthal and it goes into some detail. This is about this molecule that we made, that's quite beautiful, it crystallizes like a snowflake and [indecipherable] in the flask, and this is all well and good, but the thing is why would you trust me? You have no reason to.

So what we do is we link here to the actual experiment where this compound was made. That's just the link; if you click on it, it takes you to the lab notebook page of this particular experiment. So this is an actual lab notebook page and it looks very similar to what it would on paper. I can't show the whole thing here on PowerPoint because there was not enough room, so I'm going to show it in sections.

The top part is the objective, so this tells you what we're doing, this here tells you why we're doing it, and you see you've got a link here to a compound, you've got links to an experimental plan, you've got links to libraries of compounds. So let me take you through each one of those links to show you how much you can draw a [indecipherable] each of this information.

So if I click on that first molecule link, you end up in ChemSpider. ChemSpider is a company that basically--there's about 20 million molecules that they're currently managing. They predict all kinds of properties for them, and what they enable me to do is to upload my molecules there and manage the searches, manage the archiving so I don't have to worry about it.

So most of the things that I'm showing you, if I can at all I'm going to get it off my server, put it into the cloud, have other people worry about it and optimally, do it in a redundant way. So if something goes wrong, I still have access to it. So this is a perfect example, ChemSpider, I can link to it and now if you want to know the molecular weights, whatever, there's also a bunch of information underneath here.

Now, there was one of these little links to an experimental plan, so there are pages on UsefulChem where this is me basically outlining a protocol for my students to follow. Now it may turn out that they won't follow it exactly, and that's why they have to record a log every single time they do it. But this is a way of streamlining that so I can ask them to do something.

Then there was a link to the molecules that we're trying to make. So why are we trying to make them? Because we have a collaborator, in this case Rajarshi Guha at Indiana, and he did docking studies. So he tries to take some of our molecules and fit them in the enzyme of the malarial parasite, and then he told us which molecules are most likely to be active. So he returned the list back to us.

Well, why should we trust him? We have no reason to. So here's the procedure he's using and there's further links you can keep digging down to see the receptor areas that he used, why he used them. Over here, you see these links; these are links to the actual molecules. All right, some of you in chemistry will recognize these molecule codes. This enables you to represent a molecule just using text, so you don't have to draw it out.
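
As a rough illustration of representing a molecule with nothing but text, the sketch below assumes the codes are SMILES-style strings; the transcript does not name the notation, so that is an assumption. The helper heavy_atom_count is hypothetical and only handles simple strings: it counts each bracketed atom as one, so an explicit [H] would be miscounted.

```python
import re

# Assumption: molecule codes are SMILES-style strings (not confirmed by the talk).
# Matches bracketed atoms, two-letter halogens, then one-letter organic atoms
# in upper case (aliphatic) or lower case (aromatic).
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def heavy_atom_count(smiles: str) -> int:
    """Count atom tokens in a simple SMILES string (hydrogens are implicit)."""
    return len(ATOM_TOKEN.findall(smiles))


# A few well-known molecules written as plain text.
MOLECULES = {
    "ethanol": "CCO",
    "acetic acid": "CC(=O)O",
    "benzene": "c1ccccc1",
}

for name, smiles in MOLECULES.items():
    print(f"{name}: {smiles} has {heavy_atom_count(smiles)} heavy atoms")
```

Because the whole structure lives in a short string, it can sit in a wiki page or a link like any other text.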

So you can see here, molecule 1, this is what Rajarshi thought was the most useful molecule for us to make. So you can drill down and access the information as deeply as you wish. There's a procedure section which looks like what you would have on paper. And we can link to our data, such as spectra. All right, so in chemistry we take [indecipherable] which is basically a spectrum that helps you identify the molecule and helps you identify your [indecipherable].

All we do is we put a link here and the person really doesn't even need to have any software installed; this is all browser-based. You click this link, it pops up the spectrum and then you can expand any of these areas to look for the fine detail.

That's something that's actually been a little bit annoying: if you're trying to repeat something, you may, in the supplementary section of the paper, actually get this as a PDF. But it's kind of convenient because if you were to expand it, you'd see that there's a lot more [indecipherable] there than it shows in the full picture. So that finally, when you see this conclusion--that the [indecipherable] product is obtained with 39% yield--you can actually see what evidence I'm bringing forth to make that statement. You can see all the way.

All right. The other thing that we do is we index the experiments with the molecules using different kinds of tags. There's something called an InChI, which is a way of representing a molecule. There's also something called an InChIKey which, for a number of reasons, I'm going to be switching more and more towards--I can discuss that with the chemists at some other time--but the point is that all these tags are indexed in Google, so if somebody is searching for that molecule, the experiments where we've used it will show up.
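
The tagging idea can be sketched as a small inverted index from molecule tag to notebook pages. This is only an illustration under stated assumptions: the page names (EXP001 and so on) and the function names are hypothetical, and a search engine does the real indexing in the workflow described. The two InChIKeys are the standard ones for ethanol and water, used here as stand-ins rather than compounds from the lab.

```python
import re
from collections import defaultdict

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")

ETHANOL = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"
WATER = "XLYOFNOQVPJJNP-UHFFFAOYSA-N"

# Hypothetical notebook pages and the molecule tags placed on each one.
PAGES = {
    "EXP001": [ETHANOL],
    "EXP002": [ETHANOL, WATER],
    "EXP003": [WATER],
}


def build_index(pages):
    """Map each InChIKey tag to the sorted list of pages that carry it."""
    index = defaultdict(list)
    for page in sorted(pages):
        for tag in pages[page]:
            if not INCHIKEY_PATTERN.match(tag):
                raise ValueError(f"not a well-formed InChIKey: {tag!r}")
            index[tag].append(page)
    return dict(index)


def experiments_using(index, inchikey):
    """Return the pages tagged with the given molecule, or an empty list."""
    return index.get(inchikey, [])


INDEX = build_index(PAGES)
print(experiments_using(INDEX, ETHANOL))
```

A fixed-length key like this is convenient as a search term because it contains no characters that a search engine would split on, apart from the hyphens.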

OK, comparing experiments. So a bunch of pages is not really that handy to compare experiments. So one of the things that we started to do is to put similar experiments in a master table, and this was using Google Docs, which you saw earlier on, very convenient. You can share this Google Doc with anybody, it's a spreadsheet, you can collaborate, you can make it public, you can make it not public, you can have some people be able to edit it and some people not.

What we're basically recording here is whether or not we get a precipitate from this reaction. So if we get a precipitate, it means that it's really easy to do the reaction, so we're trying to find reactions where we do get a precipitate, and actually, this table is a lot wider than I'm able to show you here. But you can try to find patterns in here, and that's another collaboration we're having with people who are trying to fit models to this data to predict which compounds will precipitate. So everything here is up for grabs in terms of contributing to this project.
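
Looking for patterns in such a master table could start with something as simple as the sketch below. The rows, the column names ("solvent", "precipitate") and the function name are all hypothetical; the real spreadsheet is wider and its columns are not given in the talk.

```python
from collections import defaultdict

# Hypothetical rows from a master table; one row per experiment.
ROWS = [
    {"experiment": "EXP010", "solvent": "methanol", "precipitate": True},
    {"experiment": "EXP011", "solvent": "methanol", "precipitate": True},
    {"experiment": "EXP012", "solvent": "methanol", "precipitate": False},
    {"experiment": "EXP013", "solvent": "acetonitrile", "precipitate": False},
]


def precipitate_rate_by(rows, column):
    """Fraction of experiments giving a precipitate, grouped by one column."""
    counts = defaultdict(lambda: [0, 0])  # value -> [precipitates, total]
    for row in rows:
        tally = counts[row[column]]
        if row["precipitate"]:
            tally[0] += 1
        tally[1] += 1
    return {value: hits / total for value, (hits, total) in counts.items()}


RATES = precipitate_rate_by(ROWS, "solvent")
for solvent, rate in sorted(RATES.items()):
    print(f"{solvent}: {rate:.0%} of runs gave a precipitate")
```

A predictive model would go well beyond this kind of tally, but a grouped summary is often the first check on whether a column is worth modelling at all.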

The most important section is the log section, because if you don't have this, you can't really be completely certain that you really did all the things that you were claiming you did in the other sections. So that's something that I try to really reiterate with my students: you have to have this. Now, the log, the way that it looks right here, certainly a human being can read this, but if we're going to be doing lots and lots of these experiments, in order to be really searchable in an intelligent way, it'd be nice to represent it in a better way, as we were mentioning.

So there are a number of ways, I'll show you two right here. One of them is to basically define a series of actions and to write a script with parameters for each one of these actions. So instead of having all the stuff typed out in a narrative, you basically have a list of actions where you're allowed to add a compound, you're allowed to weigh, you're allowed to vortex, you're allowed to centrifuge. There are about 10 actions that you can take, and you certainly could have a machine read this and have it imported into whatever database was available. So we can do the same thing but do it in a table format.
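
A minimal sketch of the action-script idea follows. The line syntax (an action name followed by key=value parameters), the parameter names, and the Step and parse_script names are all assumptions made for illustration; only four of the roughly ten actions are named in the talk, so the vocabulary here is incomplete.

```python
from dataclasses import dataclass, field

# Only the four actions named in the talk; the full list is not given.
ALLOWED_ACTIONS = {"add", "weigh", "vortex", "centrifuge"}


@dataclass(frozen=True)
class Step:
    """One lab action with its parameters."""
    action: str
    params: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {self.action!r}")


def parse_script(text):
    """Parse lines such as 'vortex seconds=15' into a list of Step objects."""
    steps = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        action, *pairs = line.split()
        params = dict(pair.split("=", 1) for pair in pairs)
        steps.append(Step(action, params))
    return steps


# Hypothetical log written as a script instead of a narrative.
SCRIPT = """
add compound=amine amount_mmol=0.5
vortex seconds=15
centrifuge minutes=5
"""

STEPS = parse_script(SCRIPT)
for step in STEPS:
    print(step.action, step.params)
```

Restricting the log to a closed set of actions is what makes it machine-checkable: a typo or an unsupported action fails at parse time instead of hiding in free text.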

OK, so here what we're doing is we're taking the experiment and we're stripping out each individual result. So let's say we did an experiment and after the first day we take a picture. Now, we wait another day and we take another picture. And then on the third day, you drop the vial and you lose the compound. OK, so normally you would just X out that whole experiment. But by recording each individual result, every result that was obtained up to the point of failure is usable if it's represented properly. So that's why we're taking a result-centric view here, and also table content is very familiar than you want [indecipherable] have a lab notebook.

Just a couple of things on why we're using the Wiki to do all this. For one thing, I can just hit the Recent Changes button on the Wiki and I can see exactly who's been contributing, which experiments they've been working on, exactly what they've done, and I can compare any two versions. So if I'm looking at an experiment, these experiments can last for months and months and months as new information comes in, and so I can basically compare two versions.

The stuff that's new will appear in green and the stuff that got deleted will appear in red in these version comparisons. So in terms of proving that you knew something at a specific point in time, or that you changed your mind about something, you can prove that in a [indecipherable] usually using a Wiki.
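
The result-centric view can be sketched as one record per observation rather than one record per experiment. The field names, the experiment label and the helper function are hypothetical; the three days mirror the dropped-vial example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """A single observation, stored independently of the experiment's fate."""
    experiment: str
    day: int
    observation: str
    usable: bool


# Hypothetical records for the dropped-vial example.
RESULTS = [
    Result("EXP020", 1, "photo taken", True),
    Result("EXP020", 2, "photo taken", True),
    Result("EXP020", 3, "vial dropped, compound lost", False),
]


def usable_results(results, experiment):
    """Return every usable result for an experiment, even one that later failed."""
    return [r for r in results if r.experiment == experiment and r.usable]


for result in usable_results(RESULTS, "EXP020"):
    print(f"day {result.day}: {result.observation}")
```

Because each row stands alone, the failure on day three is recorded as data and does not invalidate the first two rows.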
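
A version comparison of this kind can be approximated with Python's standard difflib module. This is a sketch, not how any particular wiki implements it; the two page versions are invented, and the added and deleted lists stand in for the green and red highlighting.

```python
import difflib


def compare_versions(old_text, new_text):
    """Return (added, deleted) lines between two versions of a page."""
    added, deleted = [], []
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    for line in diff:
        if line.startswith("+ "):
            added.append(line[2:])    # would be shown in green
        elif line.startswith("- "):
            deleted.append(line[2:])  # would be shown in red
    return added, deleted


# Hypothetical earlier and later versions of one notebook page.
OLD = "Objective: make compound 1\nYield: not yet measured"
NEW = "Objective: make compound 1\nYield: 39%\nNMR spectrum linked"

ADDED, DELETED = compare_versions(OLD, NEW)
print("added:", ADDED)
print("deleted:", DELETED)
```

The proof-of-priority argument additionally depends on the wiki storing a timestamp with each version, which this sketch does not model.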

We use other free services like Site Meter to tell how people are finding these experiments. So I have some idea basically what people are using this information for. A lot of it is for debugging their reactions. They're looking for problems with reactions, stuff that you wouldn't necessarily find in a journal article from a traditional search. Of course, this enables me to tell the story of our failures.

If I were to write a paper that was all about how we failed, it would be very difficult to get it accepted. But here, I can actually lay this out, writing as one chemist to another chemist and saying, "Look, this is what we're trying to do and it took us these 50 attempts, but this is why, this is where we ran into problems." We also use mailing lists; that's useful for lots and lots of small activities where you need to go back and forth. You wouldn't use a blog for that, you wouldn't necessarily use a Wiki, and it's just that other component that works well.

So when you look at the whole story, together basically, we're using all of these tools and my lab is right here in the middle, Drexel [indecipherable] Chemistry Group, so we know how to make compounds. But we need help from people who can do [indecipherable] so who could do theoretical studies to tell us which compounds to make and we definitely need help with testing. So right now, I have Phil Rosenthal who's testing for malaria and National Cancer Institute's Dan Zaharevitz who's testing for anti-tumor activity.

So this was another very nice development, basically Dan was seeing the kinds of things we were doing online and he said, "Why don't you send me a couple of compounds and we'll test them for anti-tumor activity?" So this is what's happening and this is what's possible, and until you actually do it, you won't know what kind of people you actually find that will agree to collaborate with you.

And I guess sort of to leave you with the big picture here, I've shown you mainly--this is about communicating what's done in my lab to a fellow chemist, and certainly that's our first priority. But as you saw at the end, we're trying to format these results in such a way that machines will be able to read them. Ultimately, I see we're going to go towards a world where it will be machines talking to machines, basically formulating hypotheses, executing experiments, and having other machines analyze them.

I think a way to get to that is to take the lab notebook down to the absolute fundamental unit of action in a lab and make sure that we can record those in a way that can be read by anyone who needs that access.

I think that's it. This is the blog of another person, Cameron Neylon, who basically made his lab fully open recently. So there are a couple of examples of people who try to do this.
