Wednesday, November 26, 2008

Open Notebook Science in 15 minutes

Jean-Claude Bradley: All right. So, I will try to explain to you these two concepts of the synthesis of anti-malarial compounds and Open Notebook Science in the next 15 minutes. Well, this is actually a pretty good time to give this talk. This week we actually got our Wikipedia entry for Open Notebook Science. And it turns out that it required a lot of people's coordinated efforts. And it required a body of work before we were able to do this. But, if you go on Wikipedia, you can learn a lot more that I don't have time to explain today.

The idea of Open Notebook Science is basically to report the work that you do in the laboratory in real time or as close as you can to real time, so that the entire world knows as much as you do about your research. Like I said, there are a number of references here that you can look at for the background on this. But, the motivation is that - well, it should be self-evident that it's a way to do faster science compared to either not disclosing some things or significantly delaying them.

And I think, it's also a way of doing better science, which is not immediately obvious, but hopefully I will show you some examples of how that can be.

OK, now to the synthesis part. So, we are a synthetic organic chemistry group and our target is malaria, specifically Falcipain-2. Malaria, as you should know, is a disease that is spread by mosquitoes. And here's actually the malarial parasite inside of the red blood cell. And it uses the enzyme Falcipain-2 to metabolize hemoglobin. So, if we can inhibit that enzyme, it could be a way to basically stop the process of it replicating.

And so, what we have done is we have collaborated with a group at Indiana University, Rajarshi Guha. And he does the docking, which basically means that he takes the Falcipain-2 in the computer and tries to dock molecules and see if they fit or not. If they fit, there's a chance that they might inhibit it. And so, he tells us which compounds to make. And then we make them and we ship them off to UCSF where Phil Rosenthal does the testing.

So, this is a collaboration done completely openly, and people can join in or they can follow what we are doing well before the publications come out. So, I am not going to talk too much about the nuts and bolts of it, but, suffice it to say that we use blogs, wikis and all these different social networking sites to try to make this system fully hosted and fully replicatable by anyone else in the world who might want to do a similar thing. And that's happened. And I can surely talk to you if you are interested in seeing the different groups that have done that.

I was telling you that it's a way of doing better science, and it really comes down to "where's the beef?" when you talk about your experiments. So, this is a blog post here, where we are talking about doing different things. And it says "See experiment 150 for more information." So, this is the Ugi reaction that I will be mentioning over and over in this talk.

And if you click on that link, it takes you to the lab notebook page for experiment 150. And this actually looks very similar to what it would look like in a paper notebook. And that's on purpose. We wanted to make things as easy as possible for people to get involved with Open Notebook Science.

So, you have an objective, and you have all these different hyperlinks. So, one of the things that you can link to - and then this is a pretty long page, I am just going to skip through it giving you examples. You can hit that Ugi edit link and it takes you to an entry in ChemSpider. ChemSpider is a free database. It has over 21 million compounds. You can do such sorts of searching. You can do all kinds of things for free. And I don't have to worry about that on my server.

So, that's what we are trying to achieve here. We are trying to get high quality information processing without having to become computer scientists to do it. And it's becoming really possible to do.

We also link to the docking procedure that our collaborator Rajarshi uses. Again, here the idea is that this is replicatable. Someone who has done docking before should be able to get enough information from this page to generate the same compounds in the same order; all right? So, these are called SMILES codes and they are convenient ways of representing molecular structures, and you can just dump them in spreadsheets. So, it's a pretty convenient way.
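As a minimal sketch of the "dump them in spreadsheets" idea, the Python below writes a few SMILES strings to CSV text that any spreadsheet can open. The compounds chosen and the column names are illustrative assumptions; this is not the docking list referred to in the talk.

```python
import csv
import io

# Illustrative compounds only (common solvents and an amino acid),
# not the compounds from the docking procedure.
COMPOUNDS = [
    ("methanol", "CO"),
    ("ethanol", "CCO"),
    ("acetonitrile", "CC#N"),
    ("glycine", "NCC(=O)O"),
]


def smiles_to_csv(compounds):
    """Return CSV text with one compound per row: name, SMILES."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "smiles"])  # hypothetical column names
    writer.writerows(compounds)
    return buffer.getvalue()


csv_text = smiles_to_csv(COMPOUNDS)
print(csv_text)
```

Because a SMILES string is plain text, each structure fits in a single spreadsheet cell, which is what makes sharing and sorting a compound list straightforward.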

Again, this is all made explicit, so you don't have to ask the researcher for permission. You can just go and look at the results.

Another very helpful thing is our spectra. If you know anything about organic chemistry you know that the basis of it is spectra, especially NMR spectra. And there's actually a very neat way - if you have your NMR spectrum in JCAMP format, you can run JSpecView so that someone who does not know anything about Java or anything just hits this link and this spectrum pops up, and it's actually interactive.

So, you can use your mouse and drag across any peak and it will expand. Again, here - this is what I am talking about with doing better science, you know. Maybe you didn't expand that peak in your paper. Maybe you didn't talk about it. But, if I am trying to replicate this or I am trying to extend your research, maybe I am interested in that peak. Maybe I want to measure it. And so there are just more details.

So, by the time we end up with the final conclusions and it says "This Ugi product was within 59% yield," you don't have to take our word for it. It's all backed up - either well or poorly - but it's all backed up, exactly what's supporting our statement.

If you are not familiar with wikis, the reason that we use one for a live notebook is that every time there's a change made, it tells you who made the change and exactly when. And we have a third party time stamp for it. So, we can claim that we knew what we knew exactly when. And we are not running the time stamp. It's run by a third party that's well respected. So, that could be interesting down the road to settle claims.

We can compare any two versions, and Wikispaces basically shows you that the stuff in green is the stuff that was added, and the stuff in red was deleted. So, it's a really nice way to understand what people are good at, right? Because this is a collaboration, with many people in the lab working together, certain people are good at some things and other people are good at other things. And this is a really good way to keep track of all that.
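The added-versus-deleted comparison can be sketched with Python's standard difflib module. This is only a stand-in to illustrate the idea, not how the wiki itself computes its comparison, and the two notebook versions below are made up for the example.

```python
import difflib


def compare_versions(old_text, new_text):
    """Return (added, deleted) lists of lines between two page versions."""
    added, deleted = [], []
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        if line.startswith("+ "):
            added.append(line[2:])    # would be shown in green
        elif line.startswith("- "):
            deleted.append(line[2:])  # would be shown in red
    return added, deleted


# Hypothetical notebook text, invented for this example.
old = "Objective: run the Ugi reaction\nSolvent: methanol"
new = "Objective: run the Ugi reaction\nSolvent: methanol\nResult: precipitate formed"

added, deleted = compare_versions(old, new)
print("added:", added)
print("deleted:", deleted)
```

Pairing each such comparison with the editor's name and the time stamp gives the "who changed what, and when" record described above.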

Now, to find information, that's actually a big issue. Obviously, if we just left it in the wiki like that - I mean we have tags. We have ways of searching for the information. But, you don't want to have to do that if you are interested in seeing the collection of experiments that we have run.

So, we've run this Ugi reaction several times and we have modified the conditions. So, we have used different starting materials, different solvent amounts, and different concentrations. And we have sometimes gotten a nice precipitate that was pure product and sometimes we haven't. So, we are trying to understand that. And we are using these Google Docs as a way of sharing that information in a very convenient way.

So, this is a spreadsheet. It works very similarly to Excel but it's free and it's hosted. So, I will show you an example of an opportunity we had recently to use a robot from Mettler-Toledo. And we were able to actually automate the optimization of this reaction. This was done in collaboration with Dr. Owens. He did some statistical analysis, which I won't have time to get into. But, the idea here is that we wanted to find the highest yields - the conditions for the highest yields.

So, we modified the concentration, we modified the solvent, and we modified the excess of some of the reagents. So, we actually did these reactions in little tubes that had a filter at the tip. So, the robot added the four different components. And then it precipitated or it didn't. And if it did, we just washed it and then weighed the result. And of course took an NMR to make sure that we actually got the compound.
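A rough sketch of how such a set of runs could be laid out: every combination of concentration, solvent, and reagent excess becomes one tube, and the weighed precipitate is converted to a percent yield. The levels below are hypothetical placeholders, since the talk does not give the actual values, and the function names are made up for illustration.

```python
from itertools import product

# Hypothetical levels; the actual values used with the robot are not stated.
CONCENTRATIONS_M = [0.2, 0.4]
SOLVENTS = ["methanol", "ethanol"]
REAGENT_EXCESS_EQUIV = [1.0, 1.2]


def build_grid(concentrations, solvents, excesses):
    """Return one dict of conditions per tube, covering every combination."""
    return [
        {"concentration_M": c, "solvent": s, "excess_equiv": e}
        for c, s, e in product(concentrations, solvents, excesses)
    ]


def percent_yield(isolated_mg, theoretical_mg):
    """Percent yield from the weighed product and the theoretical mass."""
    return 100.0 * isolated_mg / theoretical_mg


grid = build_grid(CONCENTRATIONS_M, SOLVENTS, REAGENT_EXCESS_EQUIV)
print(len(grid), "tubes")
for run in grid:
    print(run)
```

A full grid like this grows quickly with each added variable, which is one reason automating the liquid handling is attractive.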

So, this is a picture of the robot. And it's basically just a syringe that goes and takes the liquids out and dispenses them. An interesting thing about using a robot is that you automatically get the log of what the robot did. And it pays attention all the time. So, it will record what it thinks it did. That's a double-edged sword. It gives you a lot of information. If you want to debug things, yeah, you absolutely have some good data to look at.

But, it also means because you are able to do so many more experiments, you have to be even more vigilant about systematic errors. And we've had that problem. And so, you end up doing a thousand experiments before you find the problem, all right.

But, once you get it working, actually, this can be extremely useful. So, just to go to the final results here. So, we did these experiments and we had enough material to publish a paper. So, here's another use of the wiki where we actually wrote the paper in the wiki. So, every single draft was saved. And we can go back and see exactly how the paper was written.

And the really nice thing about having a notebook to point to is... See, I can have reference nine to 11 be the melting point of the compound, and I can specify the batch that it was taken from - from experiment 99, whereas the proton NMR was taken from experiment 203, sample A 11. So, that information is typically not part of a typical publication. You assume that the guy knows what he is doing and that he actually characterizes his compounds properly. Well, that is not always the case, as we find out painfully. So here we can actually go and see if there is a problem with the specific batch if we are not getting the same information.
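One way to picture this kind of traceability is a small record that ties each reported measurement to the notebook experiment and sample it came from. The sketch below uses the two examples just mentioned; the class and field names are assumptions made for illustration, not part of the actual notebook.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    """Links one reported measurement to its notebook source (hypothetical schema)."""
    measurement: str
    experiment: int
    sample: Optional[str] = None

    def label(self):
        base = f"{self.measurement}: experiment {self.experiment}"
        return f"{base}, sample {self.sample}" if self.sample else base


records = [
    Provenance("melting point", 99),
    Provenance("proton NMR", 203, "A 11"),
]

for record in records:
    print(record.label())
```

With the batch recorded alongside each value, a reader who cannot reproduce a number knows exactly which experiment page to inspect.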

Now, where we actually submitted this paper is kind of interesting. It's called the Journal of Visualized Experiments, JoVE. So, there is a written part to this that I just showed you; that is what we wrote on the wiki. And they actually sent some camera people to record our experiments. And so, this is now under peer review. And we should hear back shortly about this. And I don't see any problems and I don't expect any problems.

So, this will be a nice way to communicate with video as well. So, there are so many tools now that make communicating your science faster without losing anything. Another thing that the physicists have been using for a while is pre-print servers. So, chemistry really didn't have a good pre-print server - well, they did, but that's a whole other story; it's no longer working. So, Nature actually recently came up with this Nature Precedings, which is a pre-print server and it's backed with the editorial filter of the Nature Publishing Group. If you are not familiar with Nature, it's one of the most well-respected publishers out there.

So, if they basically say that this has good scientific quality, it's probably true. And so, before publication in JoVE or any peer-reviewed journal that we choose to publish in, we can actually link to this document. People can comment on this document. They can vote on it. They can give us feedback. You can have versioning on here. All kinds of things you can do.

Normally, when you have a paper coming out, you just tell people "Well, it's going to come out next week," and when it does, "Here's the link." Well here, now, you can actually give the link and you can have [inaudible 10:39].

So, the bottom line here is we did find a maximum yield - 66%. We went in with a yield of about 49 to 50% - we got some increase. But, the major result of this was really to prove that we could optimize the reactions in robotics.

Now, as far as the malaria project goes, that's actually important because that's how we make our compounds, with the Ugi reaction. Recently, we've actually gotten some results about this. We have four compounds that actually are active in inhibiting the enzyme, and they are also effective in inhibiting the infection of Plasmodium falciparum. And these are in the micromolar range. So, it's not bad. I mean, it's definitely publishable stuff.

And there are different stories here. We used one receptor area on the enzyme here. We used another receptor here. I don't have time to get into it, but it's kind of interesting the results that are coming out of this. And again, this is out into the open. And we never know who is going to stop by and collaborate.

A last little story. I recently did a little trip in the UK. And my friend here, Cameron Neylon, who also does open notebook science - although he uses a different system than I do - and I had the chance to spend a day in the lab doing experiments. And one of the things that evolved from my trip is a very simple project using open notebooks. And we spent the day measuring solubilities. So, we took a bunch of compounds and we took a bunch of organic solvents, and measured the solubilities. And then, we reported these solubilities in a Google Doc.

Now, this is actually very interesting. So, for Boc-glycine and methanol, we are measuring 4.4 molar. And you notice that that's in green. And down here, for D-glucose and methanol, we do get a number, right, and it's 0.05. But, I put it in red and I don't actually include that number in my final results, because I am not satisfied that I am going to stand behind these. I don't think that 1.8 milligrams in the way that we were measuring it is good enough to report this.
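The green-versus-red idea can be sketched as a calculation that always reports the number but flags it when the weighed mass is too small to trust. In the example below, the 1.8 mg is taken to be the weighed dissolved solute, the 0.2 mL sample volume is an assumption chosen for illustration, and the 5 mg cutoff is hypothetical; none of these details beyond the 1.8 mg figure is stated in the talk. The molar mass of D-glucose is 180.16 g/mol.

```python
MIN_RELIABLE_MASS_MG = 5.0  # hypothetical cutoff, not from the talk


def molar_solubility(mass_mg, molar_mass_g_per_mol, volume_ml):
    """Concentration in mol/L from dissolved mass (mg) and solution volume (mL)."""
    moles = (mass_mg / 1000.0) / molar_mass_g_per_mol
    return moles / (volume_ml / 1000.0)


def report(mass_mg, molar_mass_g_per_mol, volume_ml):
    """Return (molarity rounded to 3 places, 'green' or 'red' flag)."""
    value = molar_solubility(mass_mg, molar_mass_g_per_mol, volume_ml)
    flag = "green" if mass_mg >= MIN_RELIABLE_MASS_MG else "red"
    return round(value, 3), flag


# D-glucose in methanol: 1.8 mg weighed, assumed 0.2 mL sample volume.
print(report(1.8, 180.16, 0.2))
```

Keeping the flagged value in the table, rather than deleting it, is what lets someone else still use it as a ballpark estimate.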

But, what if you want a ballpark estimate? You can still access my number and you have all the details of the context in which it was taken. So, again, that's better science, I think. And what we are trying to do with this project - it's actually related to the malarial project in the sense that we can measure solubilities, report them publicly, and then build models; and Rajarshi Guha is going to help us build models of solubility - we should be able to predict the yields of these Ugi reactions in different solvents.

So, the idea is, for this Ugi product, you should do it in 51% methanol, 4% ethanol and the rest is acetonitrile. So, that will be a very powerful thing that can be used not just for our project, but really anyone could. And sort of to get this ball rolling, I set up this Open Notebook Science Challenge. And what it is, it is essentially we are asking people from around the world to contribute their measurements so long as they link them to a well maintained notebook. And if they do that then we can use these results, and we can publish with them, and we can do everything that we do as scientists.

And we have a sponsor. Aldrich has actually volunteered to ship compounds anywhere in the world to encourage people to do this. So, I am very excited about this. It's a new initiative. And I think it has a good chance of working.

And there are so many people to thank here. Khalid is my grad student. Kevin Owens you just heard from. Tim Bohinsksy is an undergrad who just started working in my lab this term, measuring solubilities. James is also an undergrad. Tom Osborne is the Mettler-Toledo rep who was very patient and took a lot of time to bring us the robot for us to get these results.

Antony Williams is the guy who runs ChemSpider, the database that I showed you for molecules. Andrew Lang actually put our results into Second Life. Because of the briefness of this talk, I wasn't able to get into that. But, you can visualize the optimization of the reaction using 3D plots. You can rotate them in Second Life. So, Andy did that. And of course, Cameron from Southampton.

So, that's it. Any questions?


