UsefulChem Drexel MiniSymp
Jean-Claude Bradley: All right! I want to talk for about 15 minutes, so I'd like to give you an idea of the kind of things that are going on in my lab right now.
As you'll see, everything on this PowerPoint is actually available online, so you can click through and read in much more detail about what's going on. The first thing is that we are doing open-source chemical research. If you have been paying attention to C&E News, there was an article in July that you can see here. This is me, and there is Khalid, and James. And this is actually methylene blue, which has nothing to do with our project; the photographer wanted something colorful. But this is something I think is very exciting: it involves sharing all your experimental results and your thoughts with the open community as you get them. I'll show you how we can do that.
So, everybody has a different motivation with respect to Open Source Science. Mine is that I think we're heading towards a world where, instead of humans collaborating with humans, it's going to be machines interacting with machines. And on the way to that, I think we're going to have to get comfortable with humans interacting with machines. I don't think we're there at this point, especially in chemistry. I think we're on the way, but chemistry is lagging behind molecular biology. This, for example, is a robot scientist. It's Ross King's invention, and it will actually generate hypotheses, design experiments, execute the experiments, and then redesign an experiment based on the results. Now, this is what's happening in molecular biology, and there is no reason the same thing cannot happen in chemistry. It's just that chemistry is more conservative and there are different ways of doing things. But I can show you how we can start to approach that. I don't have a lot of time to go into all the details of this; the bottom line is that I think this is going to happen through a bottom-up process, as opposed to a top-down one where somebody decides how things are going to happen. I think it's going to happen because right now we live in an era where you can create automated agents that participate in the world at zero, or near zero, cost. And all the services that I'll be showing you today are free and hosted for you, so anyone in the world can do this if they have the desire.
So the first part of this is: if you want to interact with machines, how will they know what to do? I think this is one of the hardest problems to solve, and the answer is "ask the humans." So a year ago, I started this UsefulChem project by asking a simple question. I searched articles that appeared in 2005 for phrases like "what is needed now," "a pressing need," and various other terms, to see what humans were saying is important to do in chemistry. A number of things came up. One of the things that really impressed me was this: there is a pressing need for identifying new drug-based anti-malarial therapies. That's a recurrent theme, and it's actually something I came across among people who are doing Open Source Science in general. The other thing that happened later in the summer last year is that I found this site called Find a Drug. This is a non-profit that looked at enzymes involved in various diseases and had tens of thousands of people run computations on 500 million theoretical molecules to see if they could fit. So they used a distributed processing approach. I contacted them, and they sent me their library of 220 compounds that are predicted to be enoyl reductase inhibitors for malaria. And this is what they look like: they are diketopiperazines. The interesting thing is that, because the whole point of this project is to do everything out in the open, the very design of the synthesis was done in a blog, and you can go back and see how those ideas developed. Initially, I was going to do a solid-support synthesis, but I realized its limitations, and then I came across this very simple Ugi synthesis followed by cyclization, which is pretty general and works here because all the compounds were diketopiperazines.
So I'm going to be introducing various components as they evolved over the course of the year. One of the first things we did was set up a Molecules blog, which is just a normal blog on Blogger, free and hosted, where you can put in the SMILES code of a molecule you are interested in. This can be a molecule that we want to make, a molecule that we need to purchase, or an intermediate. Basically, any molecule that has anything to do with our research group is put in there automatically, and it gets its UC number, its UsefulChem number. Then we set up an Experiments blog that links to the Molecules blog, so every time we used adrenaline, we linked to the entry for adrenaline in the Molecules blog. One of the intermediates that we needed to make is DOPAL, this catecholaldehyde, and you can't actually purchase it commercially; you have to make it. So we looked into the literature, and that is also fully detailed. If you go back you can see how we developed that, and it turns out that you can actually make it from adrenaline by heating it in acid. I don't have enough time to go into the full detail, but it is actually kind of interesting. One of the things that happened as we started this project is that, because this is all done live and because blogs, especially on something like Blogger, are indexed very quickly by Google, other people can find out what you are doing very, very quickly. And so we started to get comments from other chemists. Matt Todd is a chemist at the University of Sydney, and he was commenting on the concentration we were using in this reaction. The interesting thing is that the reaction had not even finished and we were already getting comments on it. That is really where I see the power of this kind of Open Source chemistry.
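The Molecules blog works as a simple registry: each new molecule gets the next UC number, and experiment entries link to that number instead of repeating the structure. The Python below is a minimal sketch of that bookkeeping, not the group's actual script. The class and function names, the `UC-<n>` label format, and the example SMILES strings are illustrative assumptions, and the sketch treats SMILES as plain strings; a real system would canonicalize them with a cheminformatics toolkit so two spellings of the same molecule receive the same number.

```python
from dataclasses import dataclass, field


@dataclass
class MoleculeRegistry:
    """Hypothetical registry: assigns sequential UC numbers to SMILES strings."""
    next_number: int = 1
    by_smiles: dict = field(default_factory=dict)

    def register(self, smiles: str) -> str:
        # Reuse the existing identifier if this exact SMILES string was seen before.
        if smiles in self.by_smiles:
            return self.by_smiles[smiles]
        uc_id = f"UC-{self.next_number}"  # hypothetical label format
        self.by_smiles[smiles] = uc_id
        self.next_number += 1
        return uc_id


def link_experiment(registry: MoleculeRegistry, reagents: list) -> list:
    """Return the UC identifiers an experiment entry would link to."""
    return [registry.register(smiles) for smiles in reagents]


registry = MoleculeRegistry()
adrenaline = "CNCC(O)c1ccc(O)c(O)c1"  # stereochemistry omitted
dopal = "O=CCc1ccc(O)c(O)c1"          # 3,4-dihydroxyphenylacetaldehyde
print(link_experiment(registry, [adrenaline, dopal]))  # ['UC-1', 'UC-2']
print(registry.register(adrenaline))                   # UC-1 (already registered)
```

Keying on the SMILES string is what makes repeated links, such as every experiment that uses adrenaline, resolve to the same entry.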
Now, eventually it turned out that a blog becomes kind of limiting once you start to accumulate a lot of information. So I created this wiki, which is just a website that anyone can modify very quickly, to organize what was happening in the various blogs we were using. Here I can detail the history of what I just told you and have links to the actual blog entries.
Now, the really nice thing as far as I'm concerned is that almost everything we've done so far has failed, which is actually typical for a research lab. But because we are recording absolutely everything we do, we can actually use those failures to tell a story. If you go on the wiki, you'll see that we eventually did make DOPAL successfully, but there were a lot of problems; some of the experimental data in the literature was incorrect. We didn't know that until we finally figured it out, but that whole story is available for anyone to benefit from, and normally it would never make it into a standard journal article. Then, as we were using the wiki, we realized that a wiki is actually a better way to manage raw experimental data than a blog, because you can do things like this: you can click on a page and get a history of who contributed what, and when. You can revert to any of these versions, so if something bad happens, you can go back and nothing's lost. And you can see exactly what was done in each edit. In this example, I guess this is Khalid, who ended up adding how many millimoles of the material he had. So the thing is never quite done; it's always in flux. But the information is always available to anyone who wants it.
So, the really nice thing about the wiki is that it has a third-party timestamp that the blog doesn't. That means you can refer to a specific version. So if you claim that you've done something first, there is a third-party timestamp and a link you can give to somebody to say "I did that on this day," showing exactly what it was. I think that's very, very powerful as more people get involved with this.
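The wiki behaviour described here (full history, reverting without losing anything, and citing one dated version) comes down to an append-only list of revisions. The sketch below is a hypothetical model, not how any particular wiki engine stores pages: the `WikiPage` and `Revision` names, the page title, authors, and timestamps are all invented for illustration, and the timestamp is passed in explicitly only to keep the example deterministic, whereas a real wiki server assigns it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    number: int
    author: str
    timestamp: datetime  # in a real wiki, assigned by the server (the third party)
    text: str


@dataclass
class WikiPage:
    """Hypothetical page model: every save appends a revision; nothing is overwritten."""
    title: str
    revisions: list = field(default_factory=list)

    def save(self, author: str, text: str, when: datetime) -> Revision:
        rev = Revision(len(self.revisions) + 1, author, when, text)
        self.revisions.append(rev)
        return rev

    def current(self) -> str:
        return self.revisions[-1].text

    def revert(self, number: int, author: str, when: datetime) -> Revision:
        # A revert is itself a new revision, so the bad edit stays visible in the history.
        return self.save(author, self.revisions[number - 1].text, when)

    def cite(self, number: int) -> str:
        # A stable reference to one specific version, with its timestamp.
        rev = self.revisions[number - 1]
        return f"{self.title} (revision {rev.number}, {rev.author}, {rev.timestamp.isoformat()})"


def at(hour: int) -> datetime:
    return datetime(2006, 9, 1, hour, tzinfo=timezone.utc)  # illustrative timestamps


page = WikiPage("EXP-example")
page.save("alice", "Procedure draft.", at(9))
page.save("bob", "Procedure draft. Amount in mmol added.", at(10))
page.save("mallory", "", at(11))  # an accidental blanking of the page
page.revert(2, "alice", at(12))
print(page.current())  # Procedure draft. Amount in mmol added.
print(page.cite(2))    # EXP-example (revision 2, bob, 2006-09-01T10:00:00+00:00)
```

Because reverting appends rather than deletes, the blanked version remains in the history, which is what lets anyone audit exactly what happened and when.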
You can do all kinds of things. If you go on our wiki and click on "Recent Changes," you can look across all the experiments at what everybody did, and follow up to see when an NMR was done or what happened.
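A "Recent Changes" view is essentially a merge of every page's edit log into one newest-first stream. The sketch below shows that idea with Python's `heapq.merge`; the `Change` record, page names, authors, and times are hypothetical, and a real wiki would more likely query its revision table sorted by timestamp than merge lists in memory.

```python
import heapq
import itertools
from datetime import datetime
from typing import NamedTuple


class Change(NamedTuple):
    timestamp: datetime
    page: str
    author: str
    summary: str


def recent_changes(histories: dict, limit: int = 10) -> list:
    """Merge per-page edit logs into one newest-first list, like a wiki's Recent Changes."""
    # heapq.merge needs each input already sorted in the same (here: descending) order.
    logs = [sorted(log, key=lambda c: c.timestamp, reverse=True) for log in histories.values()]
    merged = heapq.merge(*logs, key=lambda c: c.timestamp, reverse=True)
    return list(itertools.islice(merged, limit))


histories = {  # hypothetical page names, authors and times
    "EXP001": [
        Change(datetime(2006, 9, 1, 9), "EXP001", "alice", "created page"),
        Change(datetime(2006, 9, 3, 14), "EXP001", "alice", "added NMR"),
    ],
    "EXP002": [
        Change(datetime(2006, 9, 2, 11), "EXP002", "bob", "created page"),
    ],
}
for change in recent_changes(histories):
    print(change.timestamp.isoformat(), change.page, change.author, change.summary)
```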
There are a lot of other interesting things; again, remember, I'm using only free and open hosted systems here. There is a little site meter that you can put on your wiki, or on your blogs, that will tell you how people are finding your site. This is something I check every day. It's very interesting to see how people are actually finding our experiments. For example, in this one, someone searched on Chmoogle, which is a chemical search engine. Somebody else searched for "chemistry" or "protease inhibitors." We're also linked from other blogs, so we can track who is actually finding us. This is a very important component of understanding how your research is being disseminated.
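A site meter's most useful data is the referrer URL of each visit, which shows whether the visitor came from a search (and what they typed) or from a link on another site. The sketch below tallies both from a list of referrer URLs. It assumes you can export such a list from the site meter, and it assumes search engines pass the query in a `q` parameter (true for Google's classic result URLs, not necessarily for others, hence the configurable `param`). The log entries are hypothetical.

```python
from collections import Counter
from typing import Optional
from urllib.parse import parse_qs, urlparse


def search_terms(referrer: str, param: str = "q") -> Optional[str]:
    """Return the search query carried in a referrer URL, or None.

    Assumes the engine passes the query in a `q` parameter; other engines
    may use a different name, which is why `param` is configurable.
    """
    values = parse_qs(urlparse(referrer).query).get(param)
    return values[0] if values else None


def summarize_referrers(referrers):
    """Split visits into search queries and linking sites, with counts."""
    searches, sites = Counter(), Counter()
    for ref in referrers:
        term = search_terms(ref)
        if term:
            searches[term.lower()] += 1
        else:
            sites[urlparse(ref).netloc] += 1
    return searches, sites


log = [  # hypothetical referrer log exported from a site meter
    "https://www.google.com/search?q=protease+inhibitors",
    "https://www.google.com/search?hl=en&q=Protease+Inhibitors",
    "https://chemistry-blog.example.org/2006/09/some-post.html",
]
searches, sites = summarize_referrers(log)
print(searches.most_common())  # [('protease inhibitors', 2)]
print(sites.most_common())     # [('chemistry-blog.example.org', 1)]
```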
The other thing is that on the Molecules blog we use various representations for molecules. One of them is InChI, a new way of representing molecules that has the advantage of being indexed by Google in a way that is unique for each molecule, so people can find it very quickly. Here, for example, is a search on an InChI code that finds our UsefulChem Molecules blog. Now, Dave Strumfels in our group is doing a lot of automation work, and because we have all these feeds available at all times, we can actually run automation on them. For example, each entry in the Molecules blog has, at the very minimum, a SMILES code. Every day a script runs that takes that SMILES code, calculates the InChI, figures out the molecular weight, goes online to find potential suppliers, and converts all of that into various feeds; this is one of them. The advantage here is that you can fully systematize the way research is done: someone entering a molecule may have no idea how to find a chemical supplier, but they just dump the molecule in there. You can see how we can start to do automation. I would like to talk more about automation, but I don't have much time. There is something called CMLRSS, which is very, very new: a way of representing chemical information in a blog-like feed format that retains the chemical information, so it's not just a picture.
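The last step of the daily script described above, turning computed molecule data into a feed, can be sketched with the standard library's `xml.etree.ElementTree`. This is a minimal, hypothetical illustration that emits plain RSS 2.0, not CMLRSS and not the group's actual script. The record keys, identifier, and URLs are assumptions, and the InChI, molecular weight, and supplier lookup are assumed to be computed upstream (for example with a cheminformatics toolkit). The molecular weight shown is the standard value for DOPAL, C8H8O3.

```python
import xml.etree.ElementTree as ET


def molecules_feed(records, feed_title: str, feed_link: str) -> str:
    """Build an RSS 2.0 document with one <item> per molecule record.

    Each record is a dict with hypothetical keys 'uc_id', 'smiles', 'mw',
    'suppliers', and 'link'; the values are assumed to have been computed
    upstream (for example by a cheminformatics toolkit and a supplier lookup).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Computed data for registered molecules"
    for rec in records:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = rec["uc_id"]
        ET.SubElement(item, "link").text = rec["link"]
        ET.SubElement(item, "guid").text = rec["link"]
        suppliers = ", ".join(rec["suppliers"]) or "none found"
        ET.SubElement(item, "description").text = (
            f"SMILES: {rec['smiles']}; MW: {rec['mw']:.2f}; suppliers: {suppliers}"
        )
    return ET.tostring(rss, encoding="unicode")


record = {
    "uc_id": "UC-2",                               # hypothetical identifier
    "smiles": "O=CCc1ccc(O)c(O)c1",                # DOPAL
    "mw": 152.15,                                  # C8H8O3
    "suppliers": [],                               # DOPAL is not sold commercially
    "link": "https://example.org/molecules/uc-2",  # placeholder URL
}
xml_text = molecules_feed([record], "Molecules (example feed)", "https://example.org/molecules")
print(xml_text)
```

Note that a plain RSS description like this carries the chemistry only as text. Formats like CMLRSS exist precisely to keep the structure machine-readable inside the feed.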
So, this has been a very interesting project. What we've found by doing this is that we automatically got connected with other Open Science people out there, such as the Synaptic Leap. A lot of these people are working on diseases that don't attract much commercial interest, and malaria is one of them: most of the people who have it can't afford expensive therapy.
Another really exciting thing is that, again because we are making this fully open, we can start to collaborate with people who are not even in science. Here is a collaboration with Beth Ritter-Guth's students at Lehigh Carbon Community College. She has English students and technical writing students who are going on our wikis and blogs and writing about how that work connects with what people who don't understand chemistry would want to know about malaria. So her students are interviewing my students and trying to understand what we're doing in chemistry. They are also putting material up on wikis and blogs, so everything is being shared in real time with everyone. This is really the power. Remember, if we had waited until we had enough information to publish in a regular journal, none of this would ever have been made public.
So, as for our next steps, we basically want to continue to extend our automation. There is a website called eMolecules that catalogs molecules; they have about five million molecules in their database. We've just submitted our molecules, so they will automatically end up in that public database. The nice thing about that is that if you do a substructure search there, it will find our compounds.
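A substructure search asks whether a query fragment's atoms and bonds can be mapped onto a molecule's atoms and bonds (a subgraph match). The sketch below is a bare backtracking version of that idea on heavy-atom graphs, with hydrogens and bond orders ignored. It is only an illustration of the technique: it is not how eMolecules implements its search, and production systems add fingerprint pre-screening and much faster matchers. The hand-built graphs for catechol, DOPAL, and adrenaline are numbered in the order of the SMILES shown in the comments.

```python
def adjacency(n_atoms, bonds):
    """Neighbour sets for an undirected bond list."""
    adj = [set() for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def has_substructure(pattern, target) -> bool:
    """True if every atom and bond of `pattern` maps onto `target` (subgraph monomorphism).

    Molecules are (elements, bonds) tuples with hydrogens and bond orders ignored.
    """
    p_atoms, p_bonds = pattern
    t_atoms, t_bonds = target
    p_adj = adjacency(len(p_atoms), p_bonds)
    t_adj = adjacency(len(t_atoms), t_bonds)
    mapping = {}  # pattern atom index -> target atom index

    def extend(p):
        if p == len(p_atoms):
            return True
        used = set(mapping.values())
        for t in range(len(t_atoms)):
            if t in used or t_atoms[t] != p_atoms[p]:
                continue
            # Every already-mapped neighbour of p must be bonded to t in the target.
            if all(mapping[q] in t_adj[t] for q in p_adj[p] if q in mapping):
                mapping[p] = t
                if extend(p + 1):
                    return True
                del mapping[p]  # backtrack
        return False

    return extend(0)


CATECHOL = (["C", "C", "C", "C", "C", "C", "O", "O"],
            [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (1, 7)])
NITROGEN = (["N"], [])
DOPAL = (["O", "C", "C", "C", "C", "C", "C", "O", "C", "O", "C"],  # O=CCc1ccc(O)c(O)c1
         [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (6, 8), (8, 9), (8, 10), (10, 3)])
ADRENALINE = (["C", "N", "C", "C", "O", "C", "C", "C", "C", "O", "C", "O", "C"],  # CNCC(O)c1ccc(O)c(O)c1
              [(0, 1), (1, 2), (2, 3), (3, 4), (3, 5), (5, 6), (6, 7), (7, 8), (8, 9),
               (8, 10), (10, 11), (10, 12), (12, 5)])

print(has_substructure(CATECHOL, DOPAL))       # True
print(has_substructure(NITROGEN, DOPAL))       # False
print(has_substructure(CATECHOL, ADRENALINE))  # True
```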
The other thing is that we are moving our spectra to JCAMP format so that, for example, if you put up an NMR spectrum, instead of having a picture you can actually expand the range, expand the peaks, and perhaps even redo the integration. Ultimately, though, we want to make these anti-malarial compounds and have them tested. A number of students are working on this project: Khalid and Alicia are both grad students doing experimental work.
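Storing spectra as JCAMP-DX text instead of images is what makes re-integration possible: the data points themselves are in the file. The sketch below reads only the simplest case, an uncompressed `(X++(Y..Y))` table of whitespace-separated numbers scaled by `XFACTOR` and `YFACTOR`, with point spacing derived from `FIRSTX`, `LASTX`, and `NPOINTS`. It then integrates a chosen region with the trapezoidal rule. This is an assumption-laden sketch: compressed encodings (SQZ/DIF/DUP), multi-block files, and other JCAMP-DX features are deliberately not handled, and the sample spectrum is a toy, not real data.

```python
def parse_jcamp_xy(text: str):
    """Parse a minimal JCAMP-DX spectrum whose data table is (X++(Y..Y)) in plain AFFN.

    Returns (labels, points) where points is a list of (x, y) tuples.
    Compressed encodings (SQZ/DIF/DUP) and multi-block files are NOT handled.
    """
    labels, data_lines, in_data = {}, [], False
    for raw in text.splitlines():
        line = raw.split("$$", 1)[0].strip()  # "$$" starts a comment
        if not line:
            continue
        if line.startswith("##"):
            label, _, value = line[2:].partition("=")
            label = label.strip().upper().replace(" ", "")
            labels[label] = value.strip()
            in_data = label == "XYDATA"
        elif in_data:
            data_lines.append(line)

    if labels.get("XYDATA") != "(X++(Y..Y))":
        raise ValueError("this sketch only reads (X++(Y..Y)) tables")
    first_x, last_x = float(labels["FIRSTX"]), float(labels["LASTX"])
    npoints = int(labels["NPOINTS"])
    x_factor = float(labels.get("XFACTOR", "1"))
    y_factor = float(labels.get("YFACTOR", "1"))
    delta_x = (last_x - first_x) / (npoints - 1)

    points = []
    for line in data_lines:
        fields = line.replace(",", " ").split()
        x_start = float(fields[0]) * x_factor  # abscissa of the first Y on this line
        for k, y in enumerate(fields[1:]):
            points.append((x_start + k * delta_x, float(y) * y_factor))
    if len(points) != npoints:
        raise ValueError(f"expected {npoints} points, parsed {len(points)}")
    return labels, points


def integrate(points, x_lo: float, x_hi: float) -> float:
    """Trapezoidal area under the curve for x_lo <= x <= x_hi (works for either x direction)."""
    inside = sorted(p for p in points if x_lo <= p[0] <= x_hi)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(inside, inside[1:]))


SAMPLE = """##TITLE=toy spectrum (not real data)
##JCAMP-DX=4.24
##DATATYPE=NMR SPECTRUM
##XUNITS=PPM
##YUNITS=ARBITRARY UNITS
##FIRSTX=0
##LASTX=4
##NPOINTS=5
##XFACTOR=1
##YFACTOR=1
##XYDATA=(X++(Y..Y))
0 0 0 2
3 0 0
##END="""

labels, points = parse_jcamp_xy(SAMPLE)
print(integrate(points, 0.0, 4.0))  # 2.0
print(integrate(points, 1.0, 2.0))  # 1.0
```

Choosing a different `x_lo`/`x_hi` window is the programmatic equivalent of "expanding" a region and redoing the integration by hand.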
Dave Strumfels is doing the cheminformatics component. Remember, cheminformatics is very different from bioinformatics; don't get them confused. Bioinformatics has been around for a while, and it has its own standards that work because the information is very structured. In organic chemistry it's a little bit different, and you need another system. And there are a couple of undergrads: James, Lin, and Brett a while ago. I'd also like to thank all the bloggers who contributed to our work, both code and ideas. We definitely will continue that, so if that's of any interest, come talk to me.
Transcription by CastingWords