SwarmCast

Amit Weiss from Numenos.ai – Development of foundational AI models to advance drug discoveries

In this episode of SwarmCast, Harel Boren, our CEO, is joined by our valued customer Amit Weiss, co-founder and CTO of Numenos.

Amit leads the development of foundational AI models to advance drug discoveries.

Join us to hear about some of Numenos’s latest discoveries.

Read the Full transcript

Harel Boren: Welcome to SwarmCast, a podcast dedicated to exploring key AI and data science topics with industry leaders. SwarmOne.ai is the autonomous AI infrastructure platform for all AI workloads, from training through evaluation and all the way to deployment, self-setting and self-optimizing in all compute environments, whether on premise or in the cloud. I’d like to introduce today’s guest, Amit Weiss. Amit is the co-founder and CTO of Numenos AI, where he leads the development of foundational AI models to advance drug development. Today we’ll be talking about the AI revolution in biotech and drug development, and we’ll seek to get from Amit some insights and recommendations about how to do it right. So let’s kick off, Amit, with a quick introduction about your own background and expertise.

Guest introduction

Amit Weiss: Thank you. Okay, so I’ll do the short version of my intro. I’m originally a physicist. I loved physics, and I still think it’s the most interesting area of knowledge we have gained as human beings. But I enjoy more applicable research as my profession, and I moved quite fast to doing machine learning in the days when it was not called machine learning. I wrote my first neural network in 2011. Yeah, it was pre-today’s revolution, the last one. And it was quite interesting, because we were exploring the idea of a fully connected neural network as a toy example of statistical mechanics and entropy. It was all quite theoretical, but we actually implemented it in C, backpropagation and everything, ourselves, and tested the capacity of the networks to learn random noise, and it was quite fascinating. And that code base, which I wrote myself, I used again and again later on.

Harel Boren: A little masochistic, I would say.

Amit Weiss: But again, I think the packages you had back then were also a bit masochistic. So I’m not blaming myself for holding on to that code base until, I think, PyTorch came out.

Harel Boren: Wow.

Amit Weiss: Yeah, yeah. But it ran quite fast. I was running it all on CPUs and I would wait a month for the results. I had a lot of patience back then. Being a physicist, a computational physicist, you develop some stamina and patience as well. I finished my master’s degree in physics and I did a bachelor’s in computer science in parallel, and that was a few years after I had already written my first neural network. Then I moved officially to doing computer science and algorithms in deep learning. I established a data science team in the Israeli Air Force, and then I was asked to establish a machine learning group in the Israeli equivalent of DARPA. Fascinating time. It was kind of like having your own startup in seven different areas, where I had the personnel and the funding to do whatever I wanted to solve hard problems.

What I found myself doing for most of my career was dealing with the case where you don’t have a lot of labeled data but you have tons of unlabeled data. I found myself building foundation models for defense needs long before they were called foundation models. We called them base models: we first trained them using self-supervised learning and then fine-tuned them on a given small cohort of labeled data. We used that successfully and deployed numerous models in this framework of mine that are running operationally today and saving lives.

So this is my background: coming out of the defense industry and starting a PhD in biology. I was curious about how these ideas and methodologies I’ve developed can be utilized to help humanity in a broader sense. After a little exploration of maybe doing a tech transfer with my PhD, I decided to look for something that is more on the industry side, and fully.

And then I found my amazing co-founder, Vitaly Fohman, a long-time veteran of the pharma industry, who surprisingly laid out that the exact data problems I was dealing with in the defense industry were very prevalent and hindering success in the pharma industry as well. So we just copy-pasted the way I know how to solve these kinds of problems, applied it to the pharma industry, and started building our own foundation models for the type of data you can find on patients in clinical trials. It was quite a fascinating journey.

Harel Boren: Yeah, let’s share more later as well.

Amit Weiss: Yeah.

Harel Boren: What I really like in what you said is the ability to take one set of models and one way of thinking and implement it in another field. It seems you brought from physics the pervasiveness of how things work. So here’s one materializing in front of our eyes.

Question:

Harel Boren: What is your connection with SwarmOne?

Which, of course, I know, but I’d love to hear it from you: how did it help you, and how did it perform in your hands?

Amit Weiss: So, coming from a very protective and privacy-led industry, I’m used to training things on actual bare metal, and I was not that familiar with all the DevOps needed to actually utilize and automatically scale training jobs. So when I was facing building this outside of a private organization, choosing the right machines, splitting the jobs, and handling all the little DevOps that a lot of engineers have worked hard on was a challenge I was happy to get rid of. SwarmOne just swept me off my feet, allowing me to really deal with the actual research and not with the MLOps or engineering needed around it. And I think I managed to scale my research as a single researcher in a way that I could not have imagined without SwarmOne.

Harel Boren: Thousands of architectures that you managed to.

Amit Weiss: Yeah, that we’ve explored and tried. It really was an enabler for us. We were very scrappy at the beginning. We bootstrapped the company for more than a year, wanting to reach actual validation from clients before exploding, making sure we explode on the right problem. So it was really an enabler for the things we’ve accomplished just by ourselves. We actually got to a paying customer before raising our first money, thanks to the models we trained on SwarmOne. So it was quite the achievement.

Harel Boren: Yeah, I recall it also gave us a lot of pride that you went such a long way on your own and managed to accomplish such breathtaking achievements. So if we cycle back a little bit to what you said earlier: what inspired your journey into biotech specifically? You said something very, very general. If you can be more granular and add some more color about it.

Amit Weiss: Well, it’s hard to talk about it without talking about what drove me into a defense career and, you know, the baggage I have accumulated over the years. It’s very present in my way of thinking that I would want my work to touch as many people as possible and to make the world a better place. It was very present in my day-to-day in my years in the defense industry. Just waking up with this sense of mission, and leading people with this sense of mission towards making the world a better place with our skill set, felt amazing and kept me going in the hard times, and there were a lot of hard times. I was coming out of this industry into the startup world very much aware of how hard it can be to start your own startup, move on, achieve all the milestones, and deal with all the shit that hits the fan. And it was very obvious that the only thing that would keep me going, that would justify suffering all these hurdles, would be knowing that when I wake up, what I’m doing in the day-to-day will save people and will make people’s lives better. I just couldn’t see any other way for me.

Question:

Harel Boren: So it was very clear. Now I’m all curious: what is the problem that you are solving at Numenos.ai? The very essence of it. And I believe it’s a big, big essence, taking a guy like you and putting his brain into it 24/7.

Amit Weiss: Yeah. I felt like it’s worth my time, let’s put it this way. And I was very happy to discover this problem. I would say the essence is that we currently have no idea whether a patient is going to respond to a specific treatment or not. That is true in early development, in later development, and in the clinic with already approved drugs today. People are just too unpredictable in the way they will respond to a given treatment. This creates tremendous hurdles and suffering and money spent and waste and death, and it is just horrible. It’s a horrible state for the health industry, and regulators are trying to do whatever they can to solve it. But we haven’t. Even with all this new AI, even with all the data that is being accumulated, we have not moved the needle, not in development and not in the clinic. And many have tried. People used to say that the problem is data, that we don’t have enough data to reach the right conclusion. I really don’t think we are missing data at this point. Some companies claim that we haven’t measured the right things and that is why we haven’t found the responders. But there is so much data being generated, measured, and collected today on individuals in the clinic and in clinical trials that claiming we haven’t measured the right thing is sometimes absurd, given what data they have. So we really need to think about it from a different perspective and really think about the origin of the problem. If you look at HMOs on the clinic side, with millions of patients, the data being measured on them is quite sparse. It’s on the opportunistic side and not thought through. The only place where data is being collected in a very consistent and thoughtful way is clinical trials.

Harel Boren: Okay.

Amit Weiss: So many have tried to reach this understanding of response to treatment from the clinical side of the data, where you have a lot of patients but not a lot of thought-through data being collected. On the other side, each clinical trial is extremely small, and the number of dimensions you have on each patient just doesn’t match up with the number of patients you have in each clinical trial. So this was crying out, for me at least, for the approach of building a foundation model that is right for the task, one that can be trained on everything there is: all clinical trials and all the data collected from the clinic. You can put it all together, learn whatever you can from it, represent patients well in a much smaller latent space, and then use this “word2vec on patients” for whatever you need: predicting specific treatment response, predicting benefit, and picking the right patients for a new drug in development.
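A minimal sketch of the downstream step described above, assuming a pretrained encoder has already mapped each patient to a small embedding vector. It is not Numenos’s model: the three-dimensional embeddings, the labels, and the nearest-centroid rule are all hypothetical choices made for illustration. The point is only that, once patients live in a compact latent space, even a handful of labeled patients can drive a simple predictor.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Hypothetical embeddings, standing in for the output of a frozen encoder.
responders = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
non_responders = [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]]

centroids = {
    "responder": centroid(responders),
    "non_responder": centroid(non_responders),
}


def predict(embedding):
    """Return the label whose centroid is most similar to the embedding."""
    return max(centroids, key=lambda label: cosine(embedding, centroids[label]))


new_patient = [0.7, 0.3, 0.1]  # hypothetical embedding of an unseen patient
prediction = predict(new_patient)
print(prediction)
```

A real pipeline would replace the nearest-centroid rule with a properly validated model; the sketch only shows where the learned representation plugs in.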

Harel Boren: So this seems so all this. Yeah, I’m sorry, please, please proceed.

Amit Weiss: So all this can be possible if you train the right thing, I would claim. We have pretty much proved it on numerous use cases by now, and it was quite amazing to discover and see how well it performs once you accumulate enough patients, once you use the right architecture, once you use the right loss functions.

Harel Boren: So this is an out-of-this-world achievement: taking data which, as I understand, comes from clinical trials, is rather sparse, and is not directed in a particular direction, and building a foundational model that can encompass all this data as valid input and derive the right output, making predictions in medicine of a kind that does not exist at all today. Do I get it right? Just for our audience, I hope they get it right.

Amit Weiss: You got close. I would say that the hardest part is predicting for a drug in development. Only 100 patients have ever received this new drug in the world, because it’s a new drug and they only recruited 100 patients to receive it. So taking any biological signal, out of everything there is about the patient, that can be attributed to whether you are benefiting from the new drug compared to what would happen to you and your outcomes on standard of care is extremely hard. It’s an extremely challenging problem that can only be solved if you have a good foundation model that represents patients. And I think by this point, and I’m surprised it didn’t come up until now, I should say that the foundation model we’ve built is not an LLM.

Question:

Harel Boren: Oh, are there any models that are not LLMs?

Amit Weiss: Apparently there are, and there used to be a lot more before the LLM community stole the AI world from us. But we’ve trained a completely different architecture that fits the type of data you have on a patient. The data on each patient can be the EHR data, right, the file with all the records and blood tests and whatnot. And it can be genomic sequencing. Again, these are not text. It’s not language. Large language models are not fit for this task, even though LLMs have been built on biological data, including DNA, RNA, and proteins. I think the best analogy I have is, for example, ESM, made by Meta, or AlphaFold, which learn proteins. Such a model can represent and characterize well a protein, or a sequence of amino acids, or a sequence of nucleotides that represents a gene or a specific biological function. And it can do it pretty well, even though the sequence can be tens of thousands of letters. But it is good for representing a protein or a gene. So it’s like a gene2vec or a protein2vec.

Harel Boren: Right.

Amit Weiss: It’s not patient2vec. We need a much larger view of the entire human, that is, the patient, rather than the specific gene that he’s expressing right now. It is like taking an LLM that represents sentences or phrases or paragraphs, even with a long context window, and using it to represent the differences between different libraries, with all the different books each library has. So I’m saying you need a different tool to aggregate an entire library and get a useful representation of the library, as opposed to representing a paragraph in a book, or a question, or a chat. So we are dealing with a different problem, and for that problem we had to develop and invent new architectures and new ways to deal with the data.

Question:

Harel Boren: So I kind of understand that there was actually no way to fine-tune anything in existence, or ensemble, or whatever. The only path, and not the elephant-trail path, was to actually go for a foundational model.

Amit Weiss: Yep.

Harel Boren: Yeah, this task is sky high. And how are you doing in your progress? And maybe I’ll add to that: what do AI professionals such as you normally face when they address a problem of this magnitude?

Amit Weiss: So a lot of it is about the engineering of how to do it right, which SwarmOne helps tremendously with. Moving forward, it’s a lot of trial and error. It’s a lot of properly doing experiments. As opposed to other companies I’m seeing in the field, we have a very large research group compared to other companies at our stage. I was even advised to keep more engineers than researchers. But the amount of research we have and need is just so large that at the moment we have more researchers than any other profession in the company. I think that makes us quite a special kind of startup, especially at our stage. And it’s just, you know, doing it diligently and realizing what is missing. We invent a lot of what we need, and we cannot find, you know, a paper describing how to solve this or that problem, because we are the first ones facing it. And it’s fascinating. I enjoy every second of it, and I enjoy working with the people that came to the call to help lift this with me.

Harel Boren: That sounds fascinating. You actually made one of my questions redundant, which was whether you have a recipe to grant. But I understand that this is research at its rawest level, actually starting from pieces of land, so to speak, where the solution may or may not be. And I guess it also includes being exasperated from time to time and not finding the solution where you feel it should be. So if you were to integrate your own insights and grant some to professionals at your stage, facing such high objectives and challenges, what would be your insights into their workflow, into their thinking flow, anything you think is relevant?

Amit Weiss: So I would say, you know, good research is done by good researchers, and there is a way to conduct good research. I really feel my time as a physicist helped me build the structure of how to conduct good research. I would say, and I’m sure physicists in the audience would agree, that first you need to identify your assumptions and where they might be wrong, and then come up with a way of testing those hypotheses. Then you do it very methodically: you have all the assumptions in a list, you go over them one by one, and you plan an experiment that tests each hypothesis. Doing everything very methodically removes a lot of the problems of how to navigate forward and what works and what doesn’t. I see a lot of researchers today, especially in the machine learning world or the LLM world, where you just brute-force everything, and it might work, it might not work, or it might work and cost a fortune. Doing this cost-effectively requires, I would say, good research. And even without considering the cost, if you brute-force things in machine learning it is so easy to overfit and not generalize that even this approach is limited at some point, especially when you don’t have as much data as you want, especially on the labeled side, where the data sets are usually not that big. So just be careful with brute force, even in today’s realm of endless possibilities, and be thoughtful about each experiment. What are you testing? What are the different possible outcomes of the experiment? Are they informative? Can they disprove a hypothesis you have? It is so important to do these things properly; if not, you will just waste a lot of time and money.
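The “assumptions in a list, one experiment each” workflow can be sketched as a small checklist runner. This is only an illustration of the bookkeeping, not a tool used by Numenos or SwarmOne: the `Assumption` class, the `run_checklist` helper, and every number below are hypothetical. Each experiment is written so that it can actually disprove its assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Assumption:
    statement: str
    experiment: Callable[[], bool]  # returns True if the assumption survives
    holds: Optional[bool] = None    # None until the experiment has been run


def run_checklist(assumptions: List[Assumption]) -> List[str]:
    """Run every experiment in order and return the disproved statements."""
    for assumption in assumptions:
        assumption.holds = assumption.experiment()
    return [a.statement for a in assumptions if not a.holds]


# Hypothetical measurements, invented purely for illustration.
val_score = {"relu": 0.81, "gelu": 0.84}
train_loss, val_loss = 0.10, 0.45
n_labeled = 120

checklist = [
    Assumption("The default activation is the best choice for this data",
               lambda: val_score["relu"] >= val_score["gelu"]),
    Assumption("The model generalizes (train/validation gap under 0.2)",
               lambda: val_loss - train_loss < 0.2),
    Assumption("The labeled cohort is large enough for a held-out split",
               lambda: n_labeled >= 50),
]

disproved = run_checklist(checklist)
for statement in disproved:
    print("Disproved:", statement)
```

With these made-up numbers the first two assumptions are disproved and the third survives, which is the kind of informative outcome the list is meant to produce.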

Question:

Harel Boren: Do you feel that your background in implementing neural networks in C also provides you with insights into the actual neural models that you are creating today in a higher-level language? I assume that you’re using Python.

Amit Weiss: Yeah. Today I would really not envy people doing machine learning and neural networks in anything but Python. You don’t want to go back to the old days. But understanding these components in detail is, I wouldn’t say crucial, but it does help me tremendously in thinking of new layers, in feeling comfortable playing with current layers and adapting layers to my needs, in understanding each component and its added value to what’s going on, and in picking random points in the network and just seeing what’s going on there, because I’ve done it at a very low level.

Harel Boren: So you would feel completely comfortable tinkering with activation functions and whatnot.

Amit Weiss: And I have.

Harel Boren: Yes. And you have. You’re actually echoing one of my own personal feelings, which is that everything has become so chunked when you go all the way up to PyTorch and, you know, the higher-level languages and frameworks, that it almost puts a hurdle in the way of true innovation, insofar as it is associated with neural networks, because some things are just there. These are the activation functions you have; choose one and go ahead. That’s the optimizer you’ve got. That’s it. And so on and so forth. I have a sense, and it seems that you kind of agree with it, that enabling yourself to tinker at any level within a neural network actually opens new horizons for innovation.

Amit Weiss: I completely agree, and it’s a great example of the hypotheses and assumptions I was talking about. It’s an assumption that this activation function is what’s best for this problem. You want a list of all these assumptions, including the ones that are trivial, that everybody is using in LLMs, but that do not necessarily match my problem and my architecture. I shouldn’t just use the best known practice for LLMs when I’m training a different model on different data for different problems. So it’s about feeling comfortable questioning everything and prioritizing which question is probably the most useful to ask now. That is the way I think good research is done. I think I was the first to ask you guys to implement gradient clipping, for example, and to this day I was the only one using gradient clipping, as far as I understand from your CDO. It’s just an example of things that I needed, and felt I needed, that are not the standard in today’s models that you take and download from Hugging Face. And, you know, this is part of doing basic research.
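For readers unfamiliar with gradient clipping, here is a minimal sketch of clipping by global L2 norm, written in plain Python so that every step is visible. It is not SwarmOne’s or Numenos’s implementation; the function name `clip_grad_norm` and the flat list of gradient values are simplifications chosen for illustration. In PyTorch the same idea is available as `torch.nn.utils.clip_grad_norm_`, and PyTorch Lightning exposes it through the `gradient_clip_val` argument of its `Trainer`.

```python
import math


def clip_grad_norm(grads, max_norm):
    """Rescale grads in place so their global L2 norm is at most max_norm.

    Returns the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        for i in range(len(grads)):
            grads[i] *= scale
    return total_norm


# A gradient with norm 5 is scaled down to norm 1; its direction is unchanged.
large = [3.0, 4.0]
pre_clip_norm = clip_grad_norm(large, max_norm=1.0)
post_clip_norm = math.sqrt(sum(g * g for g in large))

# A gradient already inside the limit is left untouched.
small = [0.3, 0.4]
clip_grad_norm(small, max_norm=1.0)
```

Clipping caps the size of a single update without changing its direction, which is why it is often used when occasional very large gradients would otherwise destabilize training.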

Harel Boren: Yes, yeah.

Amit Weiss: And actually, PyTorch Lightning is also a higher-level implementation that sometimes prevents you from doing very meticulous stuff down below. So I had a bit of trouble at the beginning turning everything I’m doing into PyTorch Lightning, but I found my way, and I understand what it enables on the MLOps side and on your side.

Harel Boren: Yes, yeah. You mean the implementation of PyTorch Lightning as the tool to support PyTorch within the SwarmOne autonomous AI infrastructure platform. Yeah, certainly. By the way, some of the people working with the platform do use clipping, but they don’t even know it, because now it’s performed by the platform on its own, at its own decision.

Amit Weiss: Amazing.

Question:

Harel Boren: Yes, yes. So before I wrap up our very interesting conversation, I’d like to ask you one question that I believe everyone has in the back of their minds. Are there any ethical considerations that professionals in the biotech industry, including yourself, should keep in mind? This stems from the fact that you’re not taking one book or one text and trying to project several libraries from it. You are actually trying to project those libraries themselves, and not only trying but, as far as I can hear, succeeding. So I think you have pushed the ethical envelope to become even more interesting. In the way you treat it, are there any considerations you’d like to share with us?

Amit Weiss: So we are not making our models freely available for whatever use people want. Of course there are a lot of possibilities once you have this way of looking at people, and it can be used not for people’s benefit. A famous example I like to use: I remember a company, I forgot its name, that was doing gen AI on molecules, trying to help drug development find molecule candidates, drug candidates, that would target a given protein and not be toxic. They had their way of generating more and more molecules, and they had a loss function that was adding to the context and pushing this gen AI model to generate more and more non-toxic candidates.

Harel Boren: I can see where this is going. Proceed.

Amit Weiss: And I remember them giving a lecture and a DARPA guy approaching them after the lecture asking if they can put a minus in the toxicity.

Harel Boren: Reverse loss.

Amit Weiss: Exactly. And then I remember them generating around 1,000 molecules that are, you know, among the most toxic known to man. And out of those 1,000, 200 were novel, like novel mechanisms of toxicity. This is a great example of how these tools can be misused, and this is not even on a patient representation level; that’s on a molecule level.

Harel Boren: Right.

Amit Weiss: So obviously there are bad uses for this technology, and that is why we don’t feel comfortable sharing the model with whoever is asking. We are extremely focused on making people’s lives better, improving drugs, and improving clinically relevant endpoints, whether it’s longevity or quality of life, in numerous disease areas at the moment. So we are very clear about what we are doing, and of course this can be used in a less preferable way. We feel the responsibility lying on our shoulders whenever we touch it. And again, look at the data we are using. These are people who have died while getting some treatment. Almost all the rows in my data sets are people who have lived, who have been doing something in their lives, and then had their lives taken away from them, I think, before we wanted it.

Question:

Harel Boren: I think this puts your research on an even higher level, because all those people who contributed the information about their own reaction to whatever they’ve been undergoing are, indirectly but in your case very directly, contributing to a much higher end, and that is the result of your work. So yes, I can certainly feel the burden of responsibility that you’re carrying, and also the pride and satisfaction in the results that you are getting. Thank you very much, Amit. I would love to share with our audience any resources or insights you can recommend that have brought you to where you are. And I think one particular issue is where the line is drawn between fine-tuning a model, or an ensemble of models, and crossing the line and moving to a foundational model, something that is indeed simpler to do with SwarmOne but is still a high hurdle. So from your point of view, where does that line lie?

Amit Weiss: I guess if you can avoid training a foundation model from scratch for your task, I would highly recommend that you avoid it. It’s a much bigger challenge than taking an existing model that performs quite well and making it work better on your task. But the way I think about it is that if you have a new type of data, you need a new type of representation to use this data. I see a lot of work done pushing different data sets into becoming a sequence of language tokens and trying to use that. I think 40% of the papers at the NeurIPS I attended this year were, hey, look how cool this thing we’ve achieved is, taking non-sequence, non-tokenized data and pushing it through an LLM. And it’s fine. It can even be interesting sometimes. But I don’t think this is what the industry and the research area need right now. But, and that’s a big but, because I’m not talking about academia: of the needs of industry research, a lot can be solved using existing models available for fine-tuning. This has been the case in computer vision for a long time, with amazing things done by YOLO fine-tuned for different tasks. And I think NLP as an area of research has reached the same level as taking YOLO and fine-tuning it for detecting objects. It is quite amazing, and this already solves tons of real-world problems that no longer require inventing a new model for sentiment analysis or for understanding human needs. So the potential is vast, but it doesn’t translate to all areas. I was working on novel sensors for different tasks, and whenever you invent a new sensor, you need something that would digest its output. Just taking an LLM and hoping it will digest a new type of sensor makes less sense to me. But again, there are tons of opportunities, especially today, for people without training foundation models from scratch.

And I am happy that some real-world problems do need a foundation model trained from scratch. I enjoy it very much, and I’m really glad this aligns with, you know, helping humanity. So I’m having a blast.

Question:

Harel Boren: Yeah, I envy you. Actually, if I can draw a line with physics: what you’re telling our audience is that wherever Newtonian physics is enough, use it. No need to go to relativity. But in some cases there’s no other way. Was that a good equivalent?

Amit Weiss: Right to the point. Right to the point.

Harel Boren: Wonderful. Okay.

Amit Weiss: Unless you’re building satellites, don’t go relativistic. That won’t be something you will enjoy or need.

Question:

Harel Boren: Yeah. So where can our audience follow your work? Any social media or blogs or any publications?

Amit Weiss: So we just published our first publication at Numenos.

Harel Boren: Okay.

Amit Weiss: It is currently available on arXiv, and I can share that in the links below the episode. We will start having an active blog soon on our website, and until then you can keep updated through our LinkedIn. My co-founder is very present at numerous conferences in the States and in Europe, and I will probably start soon, this year.

Harel Boren: Wonderful. Amit, I should thank you dearly for spending the time and the attention with us. You left me completely envious but nevertheless stunned by some of the insights. Thank you very much. Again, we will note for our audience the name of your website and where your work can be followed. And we look forward to touching base in, say, a year or two and seeing what you guys have accomplished, because you are now climbing Everest, with very good results, but climbing Everest. So thank you very much.

Amit Weiss: Thank you very much, and for all the help and opportunity this has given us to climb this Everest that we have decided to climb.

Harel Boren: We feel blessed to have taken even a small part in your effort. Thank you.

Amit Weiss: You’ve taken a big one.

Harel Boren: Okay. Thank you very much. Take good care and see you.
