Computing with the Building Blocks of Life, with James Banal

The building blocks of life, deoxyribonucleic acid (DNA), can be used for computational advantage, posits Dr. James Banal, postdoctoral research associate at the Massachusetts Institute of Technology (MIT), Department of Biological Engineering, in the Bathe Lab.

“I work on the wackiest things in computing and storage right now, which is quantum computing and DNA data storage,” says James.

From ultra-dense, ultra-long storage of digital data (think: storing exabytes for fifty years) to building a ‘frozen zoo’ or ‘species time capsule’ to preserve living components of our planet in case of catastrophe, DNA storage and computing leverage the life within all of us to improve not only our lives, but also the lives of those who will inherit our future Earth.

Listen:

Tune in wherever podcasts are available. (Buzzsprout)

Read:

  • Show notes

James Banal  00:01

I almost quit on this project, like, a lot of times, you know, because it wasn’t my comfort zone, right? I had to learn a lot of stuff. I had to fail a lot. So I was in the flow cytometry facility at MIT, at the Koch Institute, and it’s like, this is it. You know, it’s like Michael Jordan making a shot with 10 seconds on the clock… The flow cytometry happened, the populations are right, and we went to sequence; that’s the final check. And if there are any errors in the sequencing, because we were not thinking about error correction back in the day, any error on a base in the sequence there, we won’t get the picture of Lincoln… And so I saw the picture of Lincoln, and I tell him, are you sure this is…? I couldn’t believe it. Like, yeah, we just pulled out Lincoln. That was the beginning of, like, let’s just finish this, and I think we got something.

Announcer  00:59

Welcome to Tough Tech Today with Meyen and Miller. This is the premier show featuring trailblazers who are building technologies today to solve tomorrow’s toughest challenges.

Jonathan Miller  01:16

I am jmill of Tough Tech Today with Meyen and Miller. I run a research and venture company and so work very closely with Airbus Ventures on assessing the new technologies, frontier tech, that’s coming down the pipeline and seeing how that… what the world’s gonna be and what kind of teams are really building that future. And so, from autonomous systems to space and other forms of, like, satellite systems and propulsion, and everywhere in between.

Forrest Meyen  01:52

I’m Forrest Meyen, the other co-host. Just two seconds on my background. I’m a senior member of the technical staff at Draper Labs in Cambridge, Massachusetts. And I’m also part of our basically, Space Systems Engineering Group. And I also do our startup outreach. I’m a program manager with our Sembler startup outreach office.

James Banal  02:16

Yeah, I’m James Banal, a postdoc in the Department of Biological Engineering at MIT, in Professor Mark Bathe’s lab. I work on, like, the wackiest things in computing and storage right now, which is quantum computing and DNA data storage. So those are the things I’m fascinated by right now, what keeps me busy and keeps me awake.

Jonathan Miller  02:39

DNA storage is certainly a really interesting area, because it’s unconventional computing, unconventional storage. And so looking at this, this is a couple years, several years, a decade maybe, in terms of the future that you’re building, James, but you’ve got the front-row seat on this. Like, you’re the maker of this. So I think it would be helpful to paint a picture of what this world could look like as a result of what you’re working on right now.

James Banal  03:16

It’s a pretty exciting area. So actually, just real quick, stepping back a little bit to how I was introduced to DNA data storage: I came to MIT in 2016, and basically my PI told me, you’re going to work on this, you’re going to work on data storage. I remember I had a Trello board where I put DNA data storage as, like, my back burner, because it’s too wacky; I didn’t know if there was anything useful there. I didn’t really believe it. And then, you know, I did a little bit of reading, following up on a lot of the papers, and what I realized is that the data density is so immense that you can literally replace a Facebook data center that occupies a huge, massive area somewhere in Oregon with something the size of a sugar cube. All that data squeezes into the size of a sugar cube. And that really fascinated me because, you know, in those data centers, most of it is just empty space. And there are energy constraints, energy use. I came from a solar cell background, so for me, energy is something I care about. And so when I heard of DNA data storage and what it can do as a passive data storage facility, I got super interested in building that. And so what I imagine the future would be is, like, you can take all the data that you have on Facebook and put it in your pocket, you know, petabytes of data. Basically, you don’t have to delete anything again, right? So that’s the promise of DNA data storage. I don’t know if you guys watched the show Silicon Valley. I was a big fan of that show, and in one of the episodes they were talking about compression and, like, datageddon. And there was this white paper from the International Data Centre that predicts we’re gonna run out of silicon by, like, 2025, which is pretty close, and then we’re not gonna have silicon to store all the data in the world, so we need to start looking into alternative data storage approaches.
So that show spoke to me about, you know, datageddon. I don’t want to have a future where we’re going to have data rations and all that. And so, yeah, for me the future would be, like, democratized: everyone will have their own data center in their pocket. So that, to me, is worth investing my time and energy in.

Forrest Meyen  06:06

It’s super dense storage of data, but are there challenges with like, read and write speed? Like, how do you actually read the data off DNA?

James Banal  06:15

Yeah. So right now, the way to do it is you basically write data first. We use a very old chemistry, like maybe 70s or 80s chemistry, and then miniaturize that: instead of flasks, it’s miniaturized into droplets and microarrays, really. And you start writing one letter at a time. But there’s a limit to the length of data you can write, so you take multiple strands of DNA and write them simultaneously, up to 200 base pairs. Beyond that, you have diminishing returns on the amount of DNA you get. It’s still a very slow process, right?
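
To make "writing one letter at a time" across many short strands a little more concrete, here is a minimal Python sketch. It assumes a textbook two-bits-per-base mapping (A, C, G, T standing for 00, 01, 10, 11), which is an illustration only and not the encoding used by any lab or company mentioned in this conversation; the function names and the 200-base cap are likewise assumptions taken from the rough figure quoted above.

```python
# Illustrative sketch only: a simple 2-bits-per-base mapping, NOT the scheme
# used by any group named in this transcript. All names are hypothetical.
BASES = "ACGT"          # A=00, C=01, G=10, T=11 (assumed mapping)
MAX_STRAND_LEN = 200    # approximate per-strand limit quoted in the conversation


def bytes_to_bases(data: bytes) -> str:
    """Turn each byte into four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def bases_to_bytes(seq: str) -> bytes:
    """Inverse of bytes_to_bases; expects a length that is a multiple of 4."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def split_into_strands(seq: str, max_len: int = MAX_STRAND_LEN) -> list[str]:
    """Chop one long sequence into strands short enough to synthesize."""
    return [seq[i:i + max_len] for i in range(0, len(seq), max_len)]


message = b"Lincoln" * 20                      # 140 bytes -> 560 bases
strands = split_into_strands(bytes_to_bases(message))
print(len(strands), [len(s) for s in strands])
```

A real system would also need addressing, error correction, and constraints on the sequences themselves; none of that is modeled here.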

Forrest Meyen  07:02

So, you know, like, how slow if I want to save a 10 megapixel photo?

James Banal  07:07

Yeah. So there are a lot of different approaches. The numbers I’ve seen right now are, like, 1000 terabits a day of writing is now feasible. That’s some news from Catalog DNA, that they’re printing DNA at, like, 1000 terabits a day, which, you know, compared to modern CPU computing architectures is really not yet comparable. But we’re still at the early stages, so I’m hesitant to say that it won’t get faster. It won’t necessarily be faster, because, you know, you can’t be faster than electrons at some point. But chemistry and very hard engineering, like miniaturizing the volume significantly to squeeze a lot more unique DNA into tiny, tiny spaces, would be the next thing, to go to parallelization. So it’s really, really slow. A thousand terabits a day is, I think, what’s feasible right now.
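
For a rough sense of scale, the quoted figure can be turned into a per-second rate and applied to the 10-megapixel photo from the question. This is a back-of-the-envelope sketch, not a benchmark: it takes the 1000 terabits a day at face value, assumes decimal terabits, and assumes an uncompressed photo at 24 bits per pixel. It also treats the rate as sustained throughput, which says nothing about how long a single synthesis run actually takes.

```python
# Back-of-the-envelope arithmetic using the figure quoted in the conversation.
# Assumptions (not from the source): decimal terabits, an uncompressed photo
# at 24 bits per pixel, and throughput treated as a steady per-second rate.
SECONDS_PER_DAY = 24 * 60 * 60

write_rate_bits_per_day = 1000 * 10**12            # "1000 terabits a day"
write_rate_bits_per_second = write_rate_bits_per_day / SECONDS_PER_DAY

photo_bits = 10_000_000 * 24                        # 10 megapixels, 24 bits each
seconds_for_photo = photo_bits / write_rate_bits_per_second

print(f"{write_rate_bits_per_second:.3e} bits per second")
print(f"{seconds_for_photo:.4f} seconds of aggregate throughput for one photo")
```

The small number is a statement about aggregate parallel throughput; the latency of the chemistry for any one strand is a separate matter that this arithmetic does not capture.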

Jonathan Miller  08:17

The upside to that, though, despite the slow write speed, is that the evidence shows DNA could last as a storage medium for a span measured in decades, right? Like, half a century or so, with not as much energy input as you would need with, say, the magnetic disks that we commonly use, or perhaps even comparable to tape, right?

James Banal  08:50

Yeah, so that’s the promise of DNA data storage; it’s really for archival. It’s probably still first in class for archival storage, right? You know, I don’t know if you’ve watched Jurassic Park, but that’s an analogy I usually use with people who want to get a grasp of how long DNA can last. Like, you can get DNA out of fossils. So that’s how long DNA can last. That’s one of the biggest advantages of DNA, really: the longevity. That’s the upside of it as a storage platform. So yeah, definitely archival. There are not a lot of storage approaches right now that can even come close to how long data can be stored on DNA. So that’s basically a different category of data storage.

Forrest Meyen  09:49

Where do you store the DNA like is it in like a big vat of DNA fluid or is it like laid out on like a silicon wafer, like, what does that even look like?

James Banal  10:02

Yeah, so that’s a good question. There are a lot of different flavors. To answer that question, we have to start with how you first access the data, because that’s actually how you figure out how you’re going to store it. Right now, the most common way to access data in DNA data storage is the polymerase chain reaction, or PCR. You basically have a homing molecule that targets a specific ID on the specific data you want; that ID could be the metadata of the data you want to access. Then you do a polymerase chain reaction, you amplify that target many, many times, and then you put it in the sequencer. That process of the molecule homing in on the thing you want to access is based on Watson-Crick base pairing, you know, the A-T, G-C sort of base pairing that we all know from high school biology, I guess. And there’s a certain limit to the specificity of that. You can imagine a large vat with a large amount of DNA, and you put that homing molecule in: there will be some words that partially stick together even though they’re not perfectly complementary. In other words, the base pairings aren’t always perfect, but there will be some partial interactions, and that actually throws your retrieval system out of whack. So the number of molecules you can put in that vat is limited because of that. One way, like the Microsoft Research team, who have really pioneered a lot of this area, is to use droplets of DNA with a certain amount of data and barcoding in there, separating them into really tiny, tiny droplets, and then merging. So it’s, like, microfluidics approaches. And the way we’re doing it right now in our lab is different.
So what if we limit the interaction of the target molecule and the homing molecule to just the metadata itself? The interaction with the internal data is limited, and therefore the probability of finding partial interactions becomes lower. That’s our sort of innovation recently in the lab that we’ve been pushing for. And so what it looks like now, all of a sudden, instead of a large number of tiny, tiny pools, you can have a very small number of tiny pools. But there’s still a limit, you know, on how much unique data you can have in that tiny pool. So that’s how it will look in the future. It’s basically a wet computer; it’s not going to be a dry computer.
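
The retrieval problem described here, a homing molecule that should find one address but can also stick partially to near matches, can be mimicked with a toy model in Python. Everything in it is a simplifying assumption: addresses are short made-up strings, "binding" is modeled as a mismatch count between a probe and an address (real primers pair with the complementary sequence, and real binding depends on thermodynamics), and the `max_mismatches` knob stands in for imperfect specificity.

```python
# Toy model of address-based retrieval from a DNA pool. The sequences,
# payload labels, and mismatch threshold are all hypothetical.

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def retrieve(pool, probe, max_mismatches=0):
    """Return payloads whose address 'binds' the probe within the tolerance."""
    return [payload for address, payload in pool
            if hamming(address, probe) <= max_mismatches]


pool = [
    ("ACGTACGTAC", "payload: Lincoln"),
    ("ACGTACGTAA", "payload: Washington"),   # one base away from Lincoln's address
    ("TTGCAATGCC", "payload: cat photo"),
]

exact = retrieve(pool, "ACGTACGTAC")                       # perfect specificity
sloppy = retrieve(pool, "ACGTACGTAC", max_mismatches=1)    # partial binding allowed

print(exact)
print(sloppy)
```

With zero tolerance only the intended file comes back; allowing a single mismatch also pulls out the neighbor. That is the crosstalk being described, and it is why well-separated addresses and limits on what the probe can touch matter as the pool grows.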

Jonathan Miller  13:16

I almost picture, like, an aquarium sort of thing full of sea monkeys, or like an ant colony, almost like a zoo. But I guess it’s not quite as animated as that. Is it E. coli ultimately, or is it even more fundamental than that?

James Banal  13:38

Like, in terms of writing, what’s the DNA you want to write on? So yeah, it’s highly customized. The way the Microsoft team has done it, and I’m just gonna talk about them because they’re really the pioneers in this: what they did was highly customized sequences. Like I was telling you earlier about microarrays, it’s literally a silicon wafer with tiny, tiny strands of DNA poking out, synthesized through microarray synthesis. So it’s highly customized. It doesn’t encode specifically for an organism. The way we did it is we use the bacterial machinery, because the beauty of DNA data storage is that it’s not just archival: you can also make millions and billions of copies with the aid of the bacterial machinery, right?

Forrest Meyen  14:31

If you want to, like, you know, share some music really fast and rip a bunch of CDs, you could use a DNA method to make just billions of DNA CDs really quick.

James Banal  14:44

Yeah. For example, if I’m, like, the Large Hadron Collider, you know, they have serious amounts of data, probably close to exabytes right now, coming out of those experiments. Say I want to make copies of this, like hundreds of copies. Try doing that with today’s technology. I can’t even imagine the difficulty of doing that. But biology makes millions of copies like it’s nobody’s business, with very low energy. Try to do that with current technology; it’s gonna take a lot of energy.

Forrest Meyen  15:21

Awesome.

Jonathan Miller  15:22

The Pirate Bay of DNA.

James Banal  15:24

Exactly!

Jonathan Miller  15:28

James, what’s driving you to work on this? Because this is awesome.

James Banal  15:34

Oh, it’s because it’s super hard. You know, I think the common theme at MIT is you do very hard problems, where the solutions are not very obvious. And it’s not like you need a Nobel Prize-winning idea to solve the problem. No, it’s not necessarily that. It’s just a very hard engineering problem right now, I think, for data storage to get into the market. It’s really just figuring out: how do you miniaturize things? How do you make reaction volumes tiny enough that we’re not using up all the reagents, and stuff like that? So there’s nothing Nobel Prize-winning about it; it’s just really hard engineering. And that’s what keeps me interested in the area and what keeps me wanting to keep pushing, getting this out into the market. I think there’s a lot of potential here. It’s just, you know, fairly hard engineering. Yeah.

Forrest Meyen  16:30

So, so you mentioned pushing to get it out in the market. So you’re trying to form this or you’ve already formed this into a venture and you’re pursuing commercialization of this technology?

James Banal  16:42

Yeah, we’re definitely thinking about that, you know, making a venture out of it. I think the next step for data storage is not just, like you talked about earlier, reading and writing. There are players that are working on that. But really, I think it needs to get to a point where we start thinking about the full-stack approach, where you have not only the reading and writing but also random access, right? If it’s just reading and writing, that’s going to be bad, because even if you can write an exabyte of data, if you need to find a kilobyte of data out of an exabyte by reading the entire exabyte, that’s going to be super bad and wasteful. So you need to be able to somehow randomly access some of that, and building that sort of interface, to me, is an interesting problem. The other interesting thing that needs to be worked out, and I think that’s why we’re thinking about a venture, is: how will a customer like yourself interface with it? Will it be the same as Amazon Web Services? Will it be something like Dropbox? Those are the things that I think are worthwhile. It’s not going to be interesting for, you know, an academic lab, but it will be interesting for a venture like what we’re planning to launch.

Jonathan Miller  18:06

Yeah. When you came to MIT in 2016, it sounds like there were the early stages of this project, which was sort of like, okay, James, you’re gonna work on this. How is it different from what your application, your research statement, was?

James Banal  18:24

Yeah, it sure evolved, really. When I was given this project, it was definitely out of my comfort zone, because I came from solar photovoltaics. But, you know, I’m always willing to go out of my way and try to figure out something new. The funniest thing is that I started to learn about computer science a little bit, and started to look into synthetic biology. I had really good lab mates who were very patient with me on this project, who worked with me and were basically my co-authors on the paper, who guided me on this journey we’re on so far. Basically, I was an organic chemist and a physical chemist who was always looking at lasers and, you know, shooting stuff at glass that has some colored stuff on it, and I went to someone who’s now messing with biology and trying to integrate it with some new materials that we thought could be useful. So it definitely evolved, the research statement, from just gonna solve this one problem to, like, let’s try to solve the entire full-stack problem of data storage.

Jonathan Miller  19:51

What does 10-year-old you think about it? 10-year-old James.

James Banal  19:57

Definitely, I think 10-year-old James would say, yeah, you’re crazy. You know, I’ve always been on the physics and chemistry side of things, and I’m going into biology now, and it’s like, you’re crazy. Why would you attempt that? But it’s been a fun journey. It’s been good.

Forrest Meyen  20:16

When you were 10, you were on the physics and chemistry side of things?

James Banal  20:22

I was, definitely. My parents were in the government, the science side of government. Okay, they were working… Funny enough, my dad was a zoologist and my mom is a microbiologist, and I hated it. They’d always bring me into their lab, and I didn’t like doing it. But there was…

Forrest Meyen  20:42

And now you’re going in on the weekend.

James Banal  20:45

Like, my mom would ask me to count the number of cells, or the number of colonies on a petri dish, or something like that. For free. I missed your lab, obviously. … in their office that I would go up to, and there was a physicist in there, a volcanologist really, who I would talk to. And yeah, I was always fascinated by physics and chemistry. And biology was, like, my dad would show me how to kill a rat for, like, medical stuff. So I think my friends would be very proud that I’ve conquered the dark side.

Forrest Meyen  21:33

But what a childhood. Wait, so how old would you say you were when you first started counting cells?

James Banal  21:42

I would say I was seven years old. The reason I remember this vividly is because that’s when I started to be really fascinated by science. My dad gave me this World Almanac for Kids 1997, and there were, like, so many things in it. And then my aunt gave me this book about space, about dinosaurs, and that’s where my journey began. I got fascinated by space, actually. So that’s basically how I got into science early on, but definitely not, like, as a two-year-old, like some of those prodigious people.

Forrest Meyen  22:23

I don’t know that sounds pretty early to be working in a bio lab. I know I wasn’t doing that when I was seven.

James Banal  22:30

Yeah, I hated it. Like, you know, when you count… I hope my mom doesn’t listen to this. Because she would ask me to count colonies, and there’s this code, like, too many to count, TMTC, that you would put if it’s just so hard to count, and I would always want to put TMTC. It was so, so boring, and I can’t believe we were doing this. I think it’s automated now, and I wish I had learned about machine learning back in the day; I think I would have automated it. But no, I was a kid. So that’s what it is. I was cheap labor.

Forrest Meyen  23:13

Exactly. That’s what kids are for your own kids. I have a two year old so I’m gonna start putting them to work

Jonathan Miller  23:26

You said that this is such a challenge, a big, big problem area with so many unknowns. Is there a time when you felt like, you know, this is hard, I kind of feel like quitting? Have you felt like, you know what, maybe there’s something else I should be working on besides this big problem? And then secondly, since you’re still working on it, what helps bring you back to the lab to do this hard stuff?

James Banal  23:57

Yeah, I mean, for sure. I almost quit on this project, like, a lot of times, you know, because it wasn’t my comfort zone, right? I had to learn a lot of stuff, I had to fail a lot. Because this is not my area, I didn’t know the tools, I didn’t know, like, oh, you shouldn’t do this. For example, very long DNA you shouldn’t pipette very vigorously, or it just breaks up. I didn’t know that. My PCRs were failing, and basically all my experiments were failing. There was a point, I think in 2018, when I went to my PI’s office and basically told him, I’ve tried everything I can; I don’t think I’m up for it. I gave him slides of all my progress, and I asked him, maybe we can contact this professor who probably has better people to do it. It was a Friday afternoon, and he said, yeah, let’s mull it over on the weekend. Send me those slides, and let’s meet up on Monday again to see if we should send it or not. Then the weekend came, and then Monday came, and we had a meeting again, and he basically said, no, you can do it. I know you can. So he basically told me to, like, do the project again. And I give him a lot of credit for really believing in me even though I was not the right person for the task. I failed a lot; for two years I had negative data, until I got my first positive data in 2019, around July, when he went to Croatia. And I was like, this is my last chance now, like Michael Jordan with 40 seconds on the clock. I need to make something. I’ve learned so much.
I have to figure out something. And so I just kept pushing. And, you know, if I had given up in 2018, if my PI had just said, yeah, you know, you’re not capable of it, it would have been a different story. But it’s all credit to him, believing in me and trusting that I would try my best to figure it out. So yeah, for those challenging years of failures, two or three years, what kept me coming back into the lab, you know, even now, is sometimes just trying a new idea. There’s this quote from, like, Thomas Edison: there are 10,000 ways to fail; you only need one way to make it work. And that is sort of the thing that’s always in my mind, a very important thing that scientists, or anyone, entrepreneurs or whatever, need to have. You will face 10,000 failed attempts, 10,000 ways to fail, but you just need one way to make it work. And so, if there is a good story behind where we are right now, it would be a lot of failures, until we found that magic sauce that made it work. Yeah.

Forrest Meyen  27:22

So explain how you felt when you saw that positive data. Like, what were you doing? What does the positive data even look like?

James Banal  27:32

Yeah, so I was in the flow cytometry facility at MIT, at the Koch Institute, and it’s like, this is it. You know, it’s like Michael Jordan making a shot with 10 seconds on the clock. And then the signal on this flow cytometer, you know, the data points came in, and I was like, ah, it’s exactly what it should be. My first reaction was that I was skeptical. If there’s one skeptic of my work, I’m probably the biggest skeptic of mine. The people who worked with me probably don’t know that, but I would spend hours and hours trying to convince myself this is true. You know, I’m always a skeptic when something just works. If you’ve been failing for two years and something works, you start to be skeptical. This is unusual; this thing becomes, like, unusual.

28:29

not supposed to work.

James Banal  28:33

So yeah, I looked at the data, repeated it again, repeated it again, until, like, okay, I need to show this to my PI when he comes back from Croatia. And basically, I was hoping someone would tell me, you’re an idiot, you’re biased, but everyone said, wow, it actually works. So that was a sigh of relief, to hear someone say, wow, it actually works, you know? Yeah, I’m the biggest pessimist, the biggest critic of my own work. Were you trying to save, like, a particular message, like hello world? No, no. What we were doing back then was to get out a picture of Abraham Lincoln from a pool of data. Cool. Yeah, exactly. So I was trying to build, like, the Google search engine. Like, I would, I would

Forrest Meyen  29:34

So you pull it up, and it’s just a picture of Abraham Lincoln, and that’s how you knew it was real. And you’re like, actually, that doesn’t look like Lincoln. It looks like Washington instead.

James Banal  29:44

Oh, yeah. I remember seeing that data. So the flow cytometry happened, the populations are right, and we went to sequence; that’s the final check. And if there are any errors in the sequencing, because we were not thinking about error correction back in the day, any error on a base in the sequence there, we won’t get the picture of Lincoln. And so I saw the picture of Lincoln, and I tell him, are you sure this is…? I couldn’t believe it. Like, yeah, we just pulled out Lincoln. And it wasn’t like we were pulling the name Lincoln out of the pool. We were doing it like how you would do it in Google, like “president AND NOT eighteenth century,” for example, because we had Washington also in that pool. We wanted to pull out Lincoln using a Boolean search query, and we got out Lincoln. And basically, that was the beginning of, like, let’s just finish this, and I think we got something. So that was an interesting time. So 2019 is definitely up there, you know. 2018 was definitely just not going too well. Yeah. So that feeling was definitely something that I really like.
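
The logic of a Boolean query that separates Lincoln from Washington can be sketched in ordinary software terms. The Python below is only an in-silico analogy for what the lab did with molecules: the file names and tags are hypothetical, and the `search` helper is invented for illustration rather than taken from the team’s work.

```python
# In-silico analogy for a Boolean metadata query over a pool of files.
# File names, tags, and the search() helper are all hypothetical.
files = {
    "lincoln.jpg": {"president", "19th century"},
    "washington.jpg": {"president", "18th century"},
    "cat.jpg": {"animal"},
}


def search(files, required, excluded):
    """Return names whose tags include every required tag and no excluded tag."""
    return sorted(
        name for name, tags in files.items()
        if required <= tags and not (excluded & tags)
    )


# Equivalent in spirit to: president AND NOT "18th century"
hits = search(files, required={"president"}, excluded={"18th century"})
print(hits)
```

In the molecular version, each tag would correspond to a label that a probe can select for or against, so the query is answered by which molecules get pulled out of the pool rather than by a loop over a dictionary.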

Jonathan Miller  31:08

You mentioned that you were able to use Boolean operators on this data and get the search result that you expected, or did not expect but were hoping for deep down. So is one of the next phases somewhere in the pipeline to be able to apply machine learning algorithms on top of this large, massive data that you have stored?

James Banal  31:33

Yeah, I mean, that’s the beauty of DNA as well. One thing that people should remember is that DNA strands are molecules, and like all other molecules, they react in a 3D volume. And that’s a very huge interaction space. The whole concept of DNA computing back in the day started as this massive degree of parallelization. You can think about it as having an Avogadro’s number, 6 times 10 to the 23rd, of CPUs that you can use to do some computation. And I don’t think any computer right now has that amount of CPU power. So that’s the next big thing: how can we use this platform? Boolean logic, cool. But then I think the next step would be, can we apply some of those machine learning operations and stuff like that to data storage, where it becomes not just static storage but a computing platform? And again, this is coming from the Microsoft group: they recently put out a preprint wherein they were able to do machine learning on a data set of images. Basically, they showed how an image search engine like Google’s would work on a DNA data storage system. For example, when you search for a black cat, you’ll get an array of different results from Google Images, and they were showing a similar operation on a DNA data system. So that is an interesting direction for the field, and I think that’s going to be an important asset for data storage as we move forward. I mean, with all these capabilities, who knows what the future could be?

Forrest Meyen  33:35

So when you’re building up your company and trying to spin this out into a venture, who’s gonna buy this? Who’s your first customer? Who’s the person that really needs the solution and is ready to, you know, give you some money to store and multiply massive amounts of data?

James Banal  33:56

Yeah. I mean, so assuming, you know, we’ve solved DNA write costs, we’ve solved DNA read costs, and everything’s becoming hunky-dory. So I’m gonna start with that assumption. I would say the biggest ones would be those who have large amounts of archival data, like Facebook or Twitter. I think social media companies are the biggest culprits for why we’re heading towards this datageddon. It’s like we’re using a lot of data right now on social media. You know, I just learned about TikTok because of coronavirus. TikTok, yeah. I was like, oh my god, there are so many videos, and they all go to the archive. And people are just copying other people’s dances. Yeah, I’m not gonna even start that discussion.

Forrest Meyen  34:47

Oh, well, we’ll make a Tough Tech Today TikTok. Before we let you go, we’ll do one of the dances.

Jonathan Miller  34:56

Yeah. James, you mentioned the term datageddon. What is it? Walk our audience through what that could be.

James Banal  35:07

Yeah, it’s actually a coined word I got from the show Silicon Valley, from HBO. It’s basically a point where we’re running out of storage that we can use to store data. The HBO show is a parody of Silicon Valley, really, but it’s an idea that is actually interesting to me: when we get to a point where the amount of data we’ve generated is reaching the capacity of what’s possible right now with silicon-based storage, do we start rationing data? Do we ration data for every person in the world, so you can only use 500 megabytes for this day?

Forrest Meyen  35:56

So you’re telling me that the internet’s about to fill up? What’s the situation? How do we actually run out of data storage? Right? Just make more hard drives? Or have we got no materials to make them?

James Banal  36:12

Yeah, I mean, most of the hot data we use right now, you know, we delete them, right? So some of them get deleted, and we don’t really care. But there are some data that we generate where we don’t want to delete that, and they’re occupying, you know, some space in the silicon. And so how we are going to run out of data storage is that the amount of data that we’re not deleting is increasing. We’re going to get to a point where there’s not enough silicon left in the world to make these hard drives. So that’s the scary scenario, the worst-case scenario. Compression can only do so much, right? Some people would argue there are a lot of ways; Dropbox has some very interesting compression algorithms. But compression can only get you so far. And so from the hard drive, or sorry, the hardware side of things, you start thinking about how much silicon we have in the world to actually accommodate all this data. Someone did the math in a paper in Nature Materials: we’re going to run out of silicon very, very soon.

Forrest Meyen  37:39

Like, years? Decades?

James Banal  37:42

I believe the claim, if I recall the report, is like in 10 years, or five to 10 years. That’s what they claim. But I don’t think that’s going to happen anytime soon. You know, maybe there will be some other technologies that start to come in and say, let’s have a band-aid solution to this, while the other storage technologies… Maybe I would be conservative and say it’s going to be decades, maybe 20 years or something like that. Yeah.

Forrest Meyen  38:18

But in the long run, that means there’s tremendous demand for technologies like yours or even your technology.

James Banal  38:25

Yeah, definitely. Because, you know, the amount of data we generate is not gonna stop anytime soon. Oh, no. Yeah, no doubt.

Forrest Meyen  38:34

Is it exponential? It feels like it’s exponential, at least on my hard drive.

James Banal  38:42

I think especially with

38:44

all these podcast videos.

Jonathan Miller  38:48

Yeah. 1080p to 4K to 8K, but

38:51

yeah.

James Banal  38:53

I mean, just the movie industry itself, right? They’re moving to 4K now: Blu-ray to 4K to 8K quality. So it’s just gonna keep going up and up. And then there are the social media apps for videos. PB, for example; I’d never heard of that, but a friend told me about it. It’s another TikTok sort of app. So it’s just gonna keep going. I don’t think it’s gonna stop. So definitely, we’re heading towards that path. And how long it will take depends on what kind of band-aid solutions we have until we figure out alternative technologies to store data in a much more sustainable, scalable way.

Jonathan Miller  39:44

My understanding is that DNA storage would be one of the media that would help us get through this datageddon into the sort of post-silicon storage era. And that, aside from support systems, cooling or whatever else is needed, DNA itself could store the amount of data generated globally for an entire year in approximately a one-meter cube. Is that right? Is that consistent with some of your findings?

James Banal  40:14

A one-meter cube? Like, the data generated today?

Jonathan Miller  40:23

Yeah, a cubic meter. Say, I don’t know, modified E. coli or whatever, to store that much data.

James Banal  40:31

That’s probably close to that. Yeah, I’d say something close to that. And the reason why I’m hesitating is because, are we talking about dried DNA or wet DNA? But yeah, so

Jonathan Miller  40:45

yeah, so fingernails James How many fingers

James Banal  40:50

So yeah, I think the one-meter cube is a fair number. That’s still pretty dense, right? If I think about it, how many Facebook data centers would you need? They’re probably exabyte data centers. So if one exabyte data center is equivalent to four football fields, approximately, that’s one exabyte. And how many exabytes have we generated right now? Probably 200 exabytes or something like that. I’m just putting numbers out there; I don’t really know what the number is. So four football fields times 200, that’s the amount of space we need to keep making, and then, on top of that, keep making, keep making, until we accommodate all the data we have generated right now. And then there’s the heating. There’s management required, you know; you still have to control the environment for hard disks and magnetic tapes, right? Because they would degrade with very harsh temperature or humidity. So there’s definitely some energy factor in there. So if you factor all of that in, then definitely it’s not just the density, the archival quality and the ability to make multiple copies; there’s also a sustainability and scalability argument you can make for DNA data storage.
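
The footprint arithmetic in this answer can be written out as a short sketch. Both inputs are the rough figures quoted in the conversation (about four football fields per exabyte-scale data center, and about 200 exabytes of data), which James himself describes as numbers he is just putting out there; the function and constant names are illustrative only.

```python
# Back-of-envelope sketch of the footprint estimate quoted in the conversation.
# Both inputs are the speaker's self-described rough guesses, not measured values.

FIELDS_PER_EXABYTE = 4   # "one exabyte data center ~ four football fields"
EXABYTES_STORED = 200    # "probably 200 exabytes or something like that"


def football_fields_needed(exabytes: float,
                           fields_per_exabyte: float = FIELDS_PER_EXABYTE) -> float:
    """Return the data-center footprint, in football fields, for a data volume."""
    return exabytes * fields_per_exabyte


if __name__ == "__main__":
    fields = football_fields_needed(EXABYTES_STORED)
    print(f"{EXABYTES_STORED} EB at {FIELDS_PER_EXABYTE} fields/EB "
          f"-> about {fields:.0f} football fields")
```

With those two guesses the estimate comes to roughly 800 football fields of conventional data centers, which is the contrast being drawn with a one-meter cube of DNA.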

Forrest Meyen  42:17

So how many years until you can get a data storage center online? What do you think?

James Banal  42:26

Yeah, I mean, what I’m hoping really is like five to 10 years. We think that’s a conservative estimate of how long it will take to figure out DNA write so that it becomes really, really cheap. We need, I don’t know, I think six orders of magnitude of drop in the cost of DNA synthesis to make it viable. So the DNA cost right now: it takes about, you know, hundreds of billions of dollars to store a petabyte of data. So we need to drop that cost significantly, so that the average Joe can use that sort of data storage platform. So that’s five to 10 years. And for DNA, that’s quick. I mean, it’s very hard engineering. The thing with this area, I think, is there’s not a lot of funding for this space. There’s some funding, but…
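
To make the "six orders of magnitude" target concrete, here is a minimal sketch of the arithmetic. It assumes a starting cost of 100 billion US dollars per petabyte purely as an illustrative low end of the "hundreds of billions" figure mentioned above, and decimal units (1 PB = 10^6 GB); the function names are hypothetical.

```python
# Sketch of the cost target implied by a six-orders-of-magnitude drop.
# ASSUMPTION: 100e9 USD per petabyte is an illustrative low end of the
# "hundreds of billions of dollars" figure quoted in the conversation.

GB_PER_PB = 1_000_000  # decimal units: 1 PB = 10**6 GB


def cost_after_drop(cost_per_pb_usd: float, orders_of_magnitude: int) -> float:
    """Cost per petabyte after reducing it by the given powers of ten."""
    return cost_per_pb_usd / 10 ** orders_of_magnitude


def cost_per_gb(cost_per_pb_usd: float) -> float:
    """Convert a per-petabyte cost into a per-gigabyte cost."""
    return cost_per_pb_usd / GB_PER_PB


if __name__ == "__main__":
    today = 100e9                       # assumed starting point, USD per PB
    target = cost_after_drop(today, 6)  # six orders of magnitude cheaper
    print(f"Target: ${target:,.0f} per PB, about ${cost_per_gb(target):.2f} per GB")
```

Under that assumed starting point, a six-orders-of-magnitude drop lands at about 100,000 dollars per petabyte, or roughly ten cents per gigabyte.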

Forrest Meyen  43:36

It’s so new. I mean, a lot of people haven’t heard of it. And I didn’t know we were gonna run out of data storage.

James Banal  43:44

Yeah, there’s good reason to invest some money into this technology. I mean, there’s the Semiconductor Research Corporation, which is a conglomerate of some of the semiconductor folks, and they are looking at DNA storage. They have this SemiSynBio program where they outline their roadmap of some of the technologies that they’re interested in, and DNA data storage is one of them, definitely. And the big players like Amazon, Google and Microsoft… Microsoft is already doing some R&D on this. Obviously there’s Karin Strauss and Luis Ceze doing really a lot of the work early, and really funding a lot of the directions in this area. I don’t like the word moonshot; I think it’s become cliché. But I think if one treats DNA storage as a moonshot, there are a lot of verticals that could emerge from this. For example, if they spend some billions of dollars trying to figure out DNA write, it’s not just DNA storage; all of a sudden synthetic biology gets something out of it. Because, you know, to write really, really long strands of DNA for synthetic biology, for therapeutics, there’s this crazy gene editing now. Writing very, very long DNA at very high purity and low error rates would be a boon for that industry. So it’s not just the DNA storage centers; there’s this whole area that it’s affecting. Then there’s the random access point: if you can use DNA as a storage device, all of a sudden you can take a snapshot of all the DNA of everyone in the world, all the species in the world right now. With current technology that is just too expensive, right?
And it is almost impossible to do with the energy required to store all of that. All of a sudden you can do that, which we can’t do right now. And then on the read side, with sequencing: if we drop the cost of sequencing to pennies, that’s going to be huge for all of this. You know, personal genomics businesses are booming right now. So that’s the way I think about DNA data storage: putting a lot of money in here, it’s not just going to be for the semiconductor industry; there are other verticals where there’s going to be a lot of boon.

Forrest Meyen  46:19

Now, there’s just tremendous spin off potential. Exactly.

Jonathan Miller  46:23

If you talk to a kid, presumably not part of your family, since your family seems to get it, like your traditional kid playing with sidewalk chalk, how would you explain what you do?

James Banal  46:43

Yeah, it’s pretty hard. Like, what age? A five-year-old would have no concept of data yet. I don’t think they’ve picked that up yet. So, for example, to explain it to a five-year-old… I have a [unclear] in California who’s probably like three years old. He loves the movie Cars. The way I would say it is: you know that movie Cars, and all those other YouTube videos (he has an iPad, and he gets glued to all those videos), all of that is stored somewhere, in a warehouse that is as big as your house. And what I’m trying to do is shrink that house into something that can fit in the palm of your hand, so that you can watch all of those movies for free. Democratizing, probably; that’s what a five-year-old would understand. But for someone older, who understands a little bit about the concept of data, I would say: the fact that I have average height, black hair and brown eyes is because that information is encoded in my DNA. And that’s a lot of data, and that’s just my appearance; a lot of the things happening in my body are encoded in DNA. So there’s that tiny, tiny molecule inside our body that contains a lot of information already. And what if we start putting the data that we generate, like on our mobile phones, into that tiny, tiny piece of DNA? That’s what I want to achieve: basically put all that data you generate in DNA so that you never have to delete anything again. You know, I don’t want to pay Apple any more premiums just to buy a 512-gigabyte phone. That’s just annoying.

Jonathan Miller  48:53

That sounds pretty simple. We’d almost literally have thumb drives,

James Banal  48:58

Right? Yeah. Literally thumb drives. Maybe a fingernail drive is probably what we’re trying to do with DNA data storage.

Jonathan Miller  49:08

That’s an amazing story, James.

Forrest Meyen  49:12

I like the idea of just putting the DNA for all the animals into like, a little cube because then we can send it to Mars, you know, just in case.

James Banal  49:21

Yeah, definitely. One of the passion projects I have… I’m from Australia, and so the Australian bushfires were an eye-opener for me. The amount of damage they did to the Australian wildlife is just insane. And, you know, I was there in 2019 and I was working on DNA data storage, and I was like, what if I take my technology and do something about storing a snapshot of those animals somehow and, you know, send it to the moon? But I don’t know who’s gonna pay for it. Should I ask the delegation, at their expense? Who’s gonna pay for that? But it’s something that we as a…

Forrest Meyen  50:02

column up

James Banal  50:05

Yeah, maybe I’m on

Forrest Meyen  50:07

my bike, we’re going back to the moon maybe in 2024.

James Banal  50:10

Yeah. You know, we have what’s actually called the Frozen Zoo and the Frozen Ark: the Frozen Zoo in San Diego County in the US, the Frozen Ark in London, and there’s another, the Svalbard seed bank in Norway, somewhere where it’s always cold. And basically, we’re storing a lot of our diversity in there, taking snapshots of that. But to me, again, one of the verticals that I see for DNA data storage is just figuring out a way to store a lot of the snapshots that we have of our society right now. And maybe 20, 50 or 100 years from now we’ll want to look back and see what we were as a society, its diversity, some record of that. Because if you try to do it now with current technology, you can’t just put all the sequences in the cloud; we’re gonna run out of data storage. It becomes a chicken-and-egg problem: I have to store all these sequences, because when you do sequencing you get a very big pile of data, and all of a sudden you run out of data storage. But if you can just store the DNA molecule that codes for that organism, take a snapshot of that, and store an archive of our society in 2020 or 2021, I think that would be an important thing to do right now, considering climate change and stuff like that. So I don’t know who’s gonna fund that. But it’s definitely a passion project of mine. I’ve been pitching it to other folks, and they thought it was an interesting idea, but, you know, where to get the money…

Forrest Meyen  51:55

Would you volunteer to be in the database?

James Banal  51:58

Oh, yeah, definitely. I mean, there’s always going to be, you know, people who are like, I don’t like Big Brother.

Forrest Meyen  52:08

There’ll be little clones of you in the future when we need more humans.

James Banal  52:13

Yeah, I don’t have any issue with that. But yeah, I mean, that’s for sure; that’s definitely an argument against it: surveillance and, you know, your DNA being used to discriminate against folks. So there are definitely some ethical arguments against that.

Jonathan Miller  52:35

James, it’s an inspirational picture you paint of how we can get through some of the shadows of, maybe, a lack of atoms that we have in terms of silicon, to be able to get past that. And of the way that we may be able to preserve, just as an insurance policy against us humans kind of screwing things up sometimes, as a way to help protect nature in one way or the other.

James Banal  53:07

Yeah, at least have a picture of it, you know, like a snapshot, so that we can go back: oh, that’s how a koala used to look, how a woolly mammoth looked. It’s just funny. We’re now looking at what the woolly mammoth and the dinosaurs looked like; that’s gonna be us, probably 50 years from now, or a hundred years, looking back: hey, that’s what we were.

Forrest Meyen  53:27

way longer than that.

James Banal  53:29

Look at 2020

Forrest Meyen  53:32

I think we’re gonna last longer because there’s, there’s, you know, people like you, you know, working on stuff that’s gonna solve some really serious problems that aren’t even on most people’s radar yet. Yeah.

Jonathan Miller  53:44

Yeah, we’ve just got to keep getting you to the lab, James.

James Banal  53:48

Actually, my PI knows that. Like, yeah, definitely he’s supporting me going back to the lab, and there’s COVID, so, yeah, I appreciate that.

Jonathan Miller  53:59

Well, James, it’s been really great. We’ve come up on time, and I want to be respectful of that. Do you have any final points that you’d like to tell us or the audience?

James Banal  54:14

Uh, yeah. I mean, I think there are a lot of challenges there for DNA data storage. It’s a quick little story if you go back to how it started, from humble beginnings to now. The next step is DNA digital data storage, but it’s gonna take a long time, and I think some people are skeptical about it. I just want to tell people that something like this takes time. You know, integrated circuits: without integrated circuits, we wouldn’t have our computers right now, and that took a long time. So I think people who are really excited about the field are like, am I going to get this 10 years from now? I think we need to be a little more patient. Technology just takes a long time, and I think everyone’s trying to get this technology out as quickly as possible. And that’s it. If you’re interested in learning about data storage, I’m always happy to, you know, schedule a call and stuff like that.