Episode 6 - AI Flops and Failures

Andy B: 0:00

All right.

Kiran V: 0:00

Are you ready? Yes, welcome everyone to today's podcast, AI FYI. Today we're going to be talking about some AI flops and failures. So, as we've all probably experienced in many ways, AI does some really awesome things. It's able to mimic human language with things like ChatGPT. It's able to identify all sorts of objects and images, track things through videos, and do things like speech to text when you're talking to Siri or with assistants like Amazon Alexa. But today we're actually going to expose some of the flops and failures in AI and try to expose some of the weaknesses of this powerful technology. So I'm here with Andy and Joe, and today we're going to talk to you about some of these classic failures in AI. And I wanted to say at the top —

Joe C: 1:04

You know, this is the sort of thing that companies and whoever's making AI should probably be figuring out as they train models and evaluate them, before releasing them out into the world, or into production, as we say. So I'm going to guess that most of our examples are examples of where something's been released out into the world and then it goes south.

Andy B: 1:23

I'm going to go first because I'm coming in hot on this episode. So one of my favorite movies from the 80s is called Trading Places. It's Dan Aykroyd, Eddie Murphy, Jamie Lee Curtis, and it's about commodities exchange markets. Other things aside, the whole plot of the movie is that if you can know the orange futures a day before the orange futures markets open, you win the market, and that is essentially the premise of robo trading. So the economy really invested in the stock market, the S&P 500, for better or for worse, and sometime in the 80s, around the time that this movie came out, robo trading was introduced into the stock market. And what robo trading is is basically these regression models that are built out of big, big cohorts of scraped data and try to predict a little bit ahead of time what the changes in prices are going to be, and then make those trades faster than a human trader on a floor could. So this is an example of a model built off of previous data. These regressions are essentially just mirroring the decisions and judgment that humans have made before, but on much, much more detailed data. And, to be honest, robo trading has oopsied the stock market many, many times. For example, in May of this year, somebody released a fake photo of the Pentagon with a fire or an explosion. It was AI generated, and it dropped on Facebook and then, like, immediately went viral. And then somebody copied it over to Twitter and it went even more viral, and within a few minutes human beings were like, okay, well, that's probably not correct, right, the Pentagon did not just explode, this is an AI-generated image. But robo traders actually tanked the stock market for like 10 minutes. It did recover, but that shows you the speed at which these things are functioning, right. These hedge funds and trading organizations that run the stock market — and robo trading is, by some estimates, between 50 and 90% of all the trades made in the stock market — the sooner they can get data from the real world, put it through their algorithms, and then affect what they're buying and selling, the better. So they're obviously scraping social media in real time and then, based off of that, running it through some regressions and deciding to buy or sell certain types of assets. So, yeah, the regression didn't realize it was a fake, AI-generated image, and the S&P 500 tanked. It recovered pretty quickly, but that just shows you how super brittle these programs are. And the head of the SEC, Gary Gensler, actually wrote a paper two years ago about deep learning and financial stability, and he wasn't even talking about these regressions that run the stock market. He was just saying, we're going to have a lot of these models built on a couple of foundational models, and if everybody's running on very similar data, these systems become very similarly fragile. Part of the reason why the robo traders are able to tank the stock market so quickly and recover so quickly is because they're also watching each other. So if one of them starts doing something crazy, the other ones are like, what does that robo trader know that I don't? and start following the pattern, and they become this kind of echo chamber. And when that's 90% of the trades in the stock market, that can cause all kinds of wacky effects.

Joe C: 4:59

Oh, I can imagine that we're probably going to see more situations like this. I just read a really interesting article on just how untrustworthy the internet could become because of AI. There are going to be more images and videos and whatever else data that's going to be fake, and if we have models like these, listening to what's out there in the world, it's going to be compromised.

Kiran V: 5:21

I can actually attest to a use case in the stock market as well. I worked at a previous company called Icentium where we were actually analyzing sentiment in social media. So we were ingesting a whole bunch of tweet data, and what we did is — Twitter has hashtags, but they also have cashtags. And I guess I should call it X now, Twitter slash X. They have these cashtags that people post. So when you write dollar sign AAPL, it will actually identify that as the Apple stock. And so what we were doing is we were ingesting all these tweets and analyzing the sentiment per cashtag within each tweet. So, for example, if someone said "I love $AAPL and I'm selling all of my $MSFT stock," we would actually predict that as a positive sentiment towards Apple and a negative sentiment towards Microsoft. And I built the simulation tooling to actually run trades and visualize returns based on the sentiment found in these tweets, and I can tell you it was pretty crazy, the types of things that we were predicting in certain cases. And while we were claiming we're 3% ahead of the market, as someone that was actually working there, I can tell you it was very much human moderated. The machines were not given the ability to make decisions because of how volatile those predictions were at certain times. That's something I keep thinking about, and I think we're going to keep thinking about it as we go through these examples.
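To make the cashtag idea concrete, here is a minimal sketch of per-cashtag sentiment scoring. The tiny keyword lexicon and the clause-splitting heuristic are illustrative assumptions, not the actual production pipeline Kiran describes.

```python
import re
from collections import defaultdict

# Toy keyword lexicon -- an illustrative stand-in for a real sentiment model.
POSITIVE = {"love", "buying", "bullish", "great"}
NEGATIVE = {"selling", "hate", "bearish", "dump"}

CASHTAG = re.compile(r"\$([A-Za-z]{1,5})\b")

def cashtag_sentiment(tweet: str) -> dict:
    """Assign a rough sentiment score to each cashtag mentioned in a tweet."""
    scores = defaultdict(int)
    # Split the tweet into clauses so each cashtag is scored on nearby words.
    for clause in re.split(r"[,.;]| and ", tweet.lower()):
        tags = CASHTAG.findall(clause)
        if not tags:
            continue
        words = clause.split()
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        for tag in tags:
            scores[tag.upper()] += score
    return dict(scores)

print(cashtag_sentiment("I love $AAPL and I'm selling all of my $MSFT stock"))
# {'AAPL': 1, 'MSFT': -1}
```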

Joe C: 7:07

But you do kind of have to stop and compare how this would go if it was just humans doing the stock trading or whatever it is. And a lot of times, I think, on net or in total, these robo traders, or algorithms, or machine learning models are probably doing the thing they're meant to do much better, but there are flops and failures along the way. So just a little perspective there. And it's important to note that these robo traders aren't just working on the US stock market.

Andy B: 7:37

They're working on all the major international stock exchanges as well as international commodities markets. You can actually download open source packages and run models yourself that try to look at the daily fluctuation of currency exchange values and buy and sell to make a profit. My parents have stories of doing this when hyperinflation was hitting Yugoslavia when I was a kid. But the fact that there's enough people out there to support a whole open source community trying to make this really profitable tells you that some people think of this as kind of a game, a little bit. When, in reality, can you imagine if enough people were running that and some nation's currency got tanked on accident? That would be not great, and it's absolutely within the realm of reality. This kind of stuff happens in the stock market all the time. There was a lot of news about this happening in like 2010, 2011, 2012 — that's when it really came out how many trades in the US stock market were AI driven — and since then it's been a little bit quieter, but know that this stuff hasn't gone away. This is the foundation of our money markets currently.

Kiran V: 8:47

And I do want to call out that this is commonplace. So this is regularly how the stock market works. And though these machines can be very volatile, we've also seen humans be very volatile, right? We saw this in the latest Silicon Valley Bank collapse, where a few people started taking their money out and then suddenly there was this massive outflow of money from the bank, which immediately caused them to freeze everything and actually close the bank down, and it eventually ended up being sold. So while machines are volatile and can be scary, this is totally a normal thing to happen in the stock market. So don't be frightened — it hasn't actually caused any depressions or, you know, recessions, anything like that.

Andy B: 9:36

These are often just a couple of minutes, sometimes a couple of hours of blips that tend to self-correct within the same market day, which is really fortunate.

Joe C: 9:48

I really love tying these use cases and these scenarios back to what would happen if just humans were in control — and humans disagree all the time too. So I think back in the day, if we had image fakes, a human, or a group of them, could also pick that up and make poor decisions. So I think a lot of machine learning is just sort of aggregating what humans think and trying to get to the right answer, and hopefully every time something like this happens, that company or the folks creating the AI will learn from it and do better next time. I hope that's maybe why we're hearing a little less of the stock market stuff these days, and maybe it's more of a thing of the past. But we'll see.

Andy B: 10:33

I wonder who the person was that made the AI-generated Pentagon-on-fire image and posted it, and if they had any idea that, within an hour, the stock market was going to go boop because they did that. There are bad actors, there are malicious people who are maybe doing this on purpose — maybe their goal really was to manipulate the stock market. Or it could just as easily have been somebody who didn't think about it and thought it was funny, and then realized how interconnected all these systems are, that social media feeds directly into these trading algorithms, right? Yeah, that interconnectedness is a really big part of this, I think.

Kiran V: 11:10

The crazy part is we're still looking at such a small amount of all of the data coming out. I'm willing to bet it's unlikely that any of these trading models actually looked at that image; they probably looked more at the text that people were posting about the image, because doing this sort of image analysis in real time and then making trades on it is actually incredibly difficult. And as much as we've built really cool AI and a lot of systems that talk together, there are still a lot of gaps in what we're able to do, and it's a computational problem — there's physically just not enough compute power being put towards these problems to be able to do those kinds of things. So, for example, a challenge we run into at work is again the ability to analyze images in real time, because it just takes so much power and it's so expensive for us to do. And so you can imagine, as we continue to increase our compute ability, more of these things are going to come online and you might actually start looking at actual images and translating that into some sort of stock market decision, which is exciting and scary at the same time, because of the types of things that we might start seeing in different data.

Andy B: 12:38

And actually, looking ahead, in machine learning these types of models — regressions — have what are called weights, basically, and features. Long story short, you have lots of different kinds of data and you let different amounts of it affect the algorithm differently, right? So the stock market is probably more influenced by a fake image about the government than it is by a fake image that goes viral about a celebrity. So that's just also something to keep in mind: the content matters. If a fake image or video drops about a government or world leader, that can influence the downstream tweets and texts about it, the messages and forums, which gets picked up by NLP algorithms, which are computationally much faster, and gets added to the trading and weighted differently based on what subject it's talking about.
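A hedged sketch of what "weights on features" means here: a linear model multiplies each input signal by a learned weight, so a government-related signal can move the output far more than a celebrity-related one. The feature names and weight values below are made up purely for illustration, not taken from any real trading system.

```python
# Illustrative only: hand-picked weights, not a real trading model.
features = {
    "viral_image_about_government": 1.0,   # e.g. the fake Pentagon photo
    "viral_image_about_celebrity": 1.0,
    "negative_news_sentiment": 0.4,
}

weights = {
    "viral_image_about_government": -3.0,  # moves the prediction a lot
    "viral_image_about_celebrity": -0.2,   # barely moves it
    "negative_news_sentiment": -1.0,
}

# A linear model is just a weighted sum of its features.
signal = sum(weights[name] * value for name, value in features.items())
print(f"predicted price move signal: {signal:.2f}")   # -3.60
```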

Joe C: 13:29

We should consider doing a companion episode where we look at where algorithms have seen real-time data out in the social world — looking at images coming in — and made really smart decisions, hopefully looking at data that is real. But you know, I think there's always the other side of this, where the AI is doing its job pretty well and taking in the right data to make the right decisions.

Kiran V: 13:53

Yeah, and I'll actually get into a little bit of this when we start to talk about NLP. But the crux of where we are is, while machines might be simulating human behavior, in some cases they don't actually understand what they're doing. The concept of understanding is still very much lacking in AI, and so when it is making these trades or making decisions based off of what it's seeing, it's still very much doing pattern recognition versus actually consciously thinking and making a decision like humans do. And so this is a really big gap between where humans are in our cognitive abilities and where a machine is, when it may be simulating that behavior without actually having the consciousness behind it.

Joe C: 14:47

Yeah, it really speaks to the importance of explainability too, and having some way to learn from these models and understand what happened when something does go wrong. In a lot of cases, I don't think that's possible — things happen and, you know, AI being sort of a black box, maybe we don't really know exactly what we would need to tweak to fix it. But often we do.

Andy B: 15:11

So the stock market is kind of a punchy thing to start with, but this has a very personal impact on people. One other way that these sorts of financial regression models get used is on credit lending applications for people around the world. So if you've ever applied for a credit card, for a mortgage, for insurance coverage — anything where traditionally a human being would look at you and decide, is this person trustworthy, should our company give them money — you've had your data run through a machine learning model. And humans can learn faster than models can, in some sense. So, for example, I think a lot of people became more self-aware of how racist they were and how poorly they were treating Black people in 2020, and they did their best to improve. But some of these algorithms are based off of data collected from maybe the 80s or the 90s or even the early 2000s, and they have a longer legacy. They're like your, you know, racist older relative who just won't learn anymore because they're very entrenched in their own habits. And what happens is everyone can do their best and still accidentally make a racist credit card application algorithm. They will take in a lot of information about you and then try to provide a score, kind of like a credit score, that determines whether you should be entitled to some creditworthiness. And it's illegal to discriminate based off of gender and race, but you can do it on accident really easily. A real example of this: when you apply for some of these things online, in addition to providing your information — your employment history, your home address, your education — you're also providing your browser fingerprints, so they can collect information like, are you running Windows or a Mac? How old is that computer? Do you have certain types of services running on it? And it's possible that their historical data is correlating something to race accidentally and then being racist. You might think that if you drop people's race identification and you don't use their names, you can't accidentally be racist. But if the applicant went to a historically Black college or university, and the historical data from the 90s and early 2000s reflects how applicants from those schools were treated, the model could just accidentally pick up that people who went to these schools should not be considered creditworthy. So unless the people building the models are very, very careful in their feature selection — how they input the data — they can accidentally select sexist or racist features downstream and then train the model to repeat the worst of humanity, the sexism or racism.
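One hedged way teams try to catch this kind of proxy: before training, check whether a candidate feature is strongly correlated with a protected attribute that was supposedly dropped. The toy applicant records below are invented for illustration; real fairness audits are far more involved than a single correlation check.

```python
import numpy as np

# Made-up applicant records: one column we want to use as a feature, and one
# protected attribute that was "dropped" from the model inputs but kept aside
# purely for auditing.
attended_hbcu = np.array([1, 1, 0, 0, 1, 0, 0, 1])   # candidate feature
race_is_black = np.array([1, 1, 0, 0, 1, 0, 0, 1])   # protected attribute

corr = np.corrcoef(attended_hbcu, race_is_black)[0, 1]
print(f"correlation with protected attribute: {corr:.2f}")

# A feature this strongly tied to race is a proxy; using it would let the
# model be "accidentally" racist even though race itself was removed.
if abs(corr) > 0.8:
    print("flag feature for review / exclusion")
```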

Kiran V: 18:10

This is like the unconscious bias of the AI world.

Joe C: 18:14

Okay, I wanted to say, just on that note, there are many examples of where a system ends up being racist or sexist. I think that's going to pop up a lot. It's something we really need to work on as an industry. I came across a couple of examples as well where the end result was something that was biased.

Andy B: 18:36

And for the exact same reason as these underwriting systems — where a human being used to judge the creditworthiness of another human being and now it's assisted or done completely by AI — the recruiting systems that help people get their foot in the door and interview do the exact same thing. So hopefully it's not a surprise to anyone, but when you apply for a job, most companies — anything from applying to a retail store at a big multinational brand all the way to applying to a fancy tech job — run it through an applicant scoring system where they're pattern matching things in your resume to the job and then ranking people, so that the human recruiter or the hiring manager can immediately find the best qualified people and spend less time going through the stack of resumes themselves. And, big surprise, lots of companies have gotten in trouble for building really problematic recruiting systems. Amazon, for a while, was running a system where they would rate people applying to their tech jobs on a one-to-five-star rating system, the way you'd rate a product, and it basically started picking up that they shouldn't hire women. And to this day, actually, Amazon is one of the only major tech companies that does not report how many women work on its tech team. Google, Apple, Meta — they all tell you exactly what benchmark they have of women working on the technology, and Amazon's real quiet about it. This was in 2014 when they started using it; they claim to have only used it slightly for a year, but the news story dropped in 2018, so who knows what they did. So this is just an example: from your ability to make money, to your ability to borrow money, to your ability to grow your money, there is brittle and risky AI that has made some pretty big oopsies at every step of the way, which I think is really interesting. And in this case, it's all regression models that are based on looking at previous data and then trying to predict what should happen next. And because the previous data is made by people, it's imperfect, and we replicate the biases.

Joe C: 20:56

Yeah, I want to drill down on that point. It's so important to get these things right because they do have waterfall effects. In that instance of the model being gender biased in hiring, the result is that maybe you have fewer women on your tech team. Diversity is so important — you can imagine what an all-male tech team is then going to produce as their next model: something that maybe isn't thinking about whether the model works for women. So it sort of echoes, or continues, the problem.

Andy B: 21:32

And that's just one example. It's important to note that it's not like any machine learning scientist is sitting around saying, I'm going to make a racist algorithm. It's that — imagine a spreadsheet where every feature the algorithm reads in is a column. There will be 150,000 columns or more, and some of them, if they're used, could make the algorithm biased. And it's on a person or a team to think through that and ask, which of these should we not use? And they don't. They make mistakes. It just happens, yeah.

Kiran V: 22:10

And I think this highlights kind of a broader challenge in AI. While there are a lot of instances of failures and successes that we see in AI every day, I think we should understand what the underlying challenge really is here. And the challenge is that these models only learn from what they see, right? If you think of any individual human being, they're only ever going to see a limited view of the world, right? No matter how much you might try to travel the world and experience different things, humans have a physical limitation in terms of how long they live and how much they can take in as an individual. And if we think about how these models are built, again, these are just individual humans training individual models. While we might think of AI as "the AI in the cloud" and all of these things as maybe being connected, they're very much disconnected, right? People are creating individual models that they might be using for a particular application or a certain type of task, and so that model might see a lot of instances of a very small subset of data. And for us to think about — what is it, the singularity? — as much as we might think about the singularity, that concept means a single AI machine would have needed to see a very, very broad spectrum of data from around the whole world. And we're nowhere near this. The ability to even chain two models together is still really challenging for us, and there's a lot of work and tuning that has to go into any models when we do want to chain them. And so to conceptualize how much information a model would have to learn in order for it to be completely unbiased, in order to do a diverse set of tasks, is so far away from where we are today. And this is really what leads to a lot of these failures: the limited capabilities of any one particular model and the amount of data available.

Joe C: 24:33

Okay, so we've touched on a few of what we call computer vision use cases already — and computer vision, just think of it as eyesight, so image or video related. So I wanted to start with a Google project where they wanted to be able to identify diabetic retinopathy, which can lead to blindness if not caught early. They wanted to identify it using AI, because experts who can identify this are few and far between, and you can identify it by looking at eyes, so photos of eyes. So Google spent time training this algorithm on photos to identify this disease and then released it into the wild, starting with some clinics in Thailand. In the lab, while they were training and evaluating it, it had 90% accuracy and it could get a result within like 10 minutes, and so everything looked really great. And then they put it in production and it started getting used in these Thai clinics, and it really didn't do so well. The success rate was far less, and the reason was that the images being taken and supplied by the clinics were of far lower quality. So that really speaks to how important it is that, when you're doing the training of a model, your training data should look as close as possible to the data that's coming in from the real world. Because, you know, we might be able to identify an eyeball in a grainy photo just as well as in a really high-definition photo, but this is a big factor in what a machine learning model can understand, and it just wasn't cutting it with these photos that were coming in.
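A hedged sketch of one lesson from this story: before shipping, evaluate on images degraded to look like what the clinics will actually capture. The resize factor and JPEG quality below are arbitrary stand-ins for "an old phone camera," and there's no claim this is what the Google team actually did.

```python
import io
from PIL import Image

def degrade(img: Image.Image, scale: float = 0.25, jpeg_quality: int = 30) -> Image.Image:
    """Roughly simulate a low-quality clinic camera: downscale, then re-compress."""
    w, h = img.size
    small = img.convert("RGB").resize((max(1, int(w * scale)), max(1, int(h * scale))))
    buf = io.BytesIO()
    small.save(buf, format="JPEG", quality=jpeg_quality)
    buf.seek(0)
    return Image.open(buf)

# Evaluate the model twice: once on the clean validation set, once on degraded
# copies made with degrade(). A big accuracy gap is a warning sign before
# deployment. (The model and accuracy metric are placeholders for whatever
# framework the team is using.)
```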

Andy B: 26:20

Oh, the researchers at Google were probably using fancy Pixel phones and they were like, this looks great. And then they put this in some maybe rural clinics, and they're using a 10-year-old phone that's kicking around for free in the clinic.

Joe C: 26:34

That's right.

Andy B: 26:35

That's so silly.

Kiran V: 26:37

And this goes back to what we talked about training data and the importance of quality and diversity of that data. If you're creating an image model that you want to be used by lots and lots of consumers, you'd also need to consider the quality of camera that those consumers will have when you're making predictions on those tasks. That's right.

Andy B: 26:58

And tactically, what this means is — the way computer vision models work is, if you think of an image that's really small, say 100 pixels by 100 pixels, the model goes through every single pixel, takes its color, and turns it into a number, and then it makes a line of numbers — that's called vectorization — and it does the AI on that vector. So in a high-definition image there are literally more pixels, which means there are more numbers, there's more data to predict on. That's why, of course, it would perform differently based on the quality of the camera.
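A minimal sketch of that "line of numbers" idea as Andy describes it: a tiny grayscale image becomes a flat vector of pixel values, and a higher-resolution image of the same scene becomes a much longer vector, which is part of why camera quality changes what the model has to work with. The random pixel values here are just placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

low_res  = rng.integers(0, 256, size=(100, 100))    # 100 x 100 pixels
high_res = rng.integers(0, 256, size=(1000, 1000))  # same scene, more detail

# "Vectorization": flatten the grid of pixel values into one long row of numbers.
low_vec  = low_res.reshape(-1)
high_vec = high_res.reshape(-1)

print(low_vec.shape)   # (10000,)   -> 10,000 numbers for the model to work with
print(high_vec.shape)  # (1000000,) -> 100x more information from the same scene
```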

Joe C: 27:33

That's right. And there was another aspect to this, and it's not exactly due to the training of the algorithm, but the internet speeds in Thailand were far less than in the lab. So this project came with a guarantee that results could be very fast, and they ended up being very slow or not really coming through, which didn't deliver on what these medical professionals needed. And that is a factor — again, it's not really an issue with the training of the model or its performance, but sometimes the processing power and what's available does matter in how these systems work and in getting inference back to the person who initiated it or is using the model.

Andy B: 28:20

Yeah, none of this works if you can't connect to the internet. So if you are somewhere that doesn't have a stable, high-speed internet connection, your alternative option is to get the compute power to run these predictions locally. And that's not going to happen — I mean, it could, you could buy, you know, like 80 gigs of RAM and shove it in a tower, but who's going to bother?

Joe C: 28:42

All in all, I do think this is really a wonderful use case of the power of AI. You know, imagine replicating the expertise of medical experts and being able to serve a global population in place of those few experts who maybe couldn't be there for everyone. So I hope that the Google team, or anyone that picked up this work or is creating the next iteration of it, will learn from this and move forward with the right training data.

Andy B: 29:08

There's something you can do with models that's called distillation, where you can actually take the performance of a model and make it smaller so it runs faster with less compute. And what I would hope for is that they could distill this model to run on even some old phones or with worse data, because how cool would it be if they could just publish an app to people around the world saying, take a picture of your eye and it'll tell you if you should go to the doctor and get it checked out.
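A hedged sketch of distillation as Andy describes it: a small "student" model is trained to match a large "teacher" model's softened outputs, so it can run on weaker hardware. This is a generic PyTorch-style distillation loss, not the retinopathy team's actual code; the temperature and blending weight are common defaults, not values from any real project.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend the usual label loss with a 'match the teacher' loss."""
    # Soft targets: the teacher's probabilities, smoothed by temperature T.
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    match_teacher = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)
    # Hard targets: the ordinary cross-entropy against the true labels.
    match_labels = F.cross_entropy(student_logits, labels)
    return alpha * match_teacher + (1 - alpha) * match_labels

# Usage sketch: inside the training loop, run both models on the same batch and
# backpropagate this loss through the (much smaller) student only.
```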

Joe C: 29:36

Right.

Andy B: 29:37

That'd be so sick.

Joe C: 29:38

That'd be a great first step, at least — yeah, just sort of a yes or no, something's amiss here. So I want to talk about another example, and this sort of gets back into the conversation of gender bias and racial bias. This is a little bit more recent — that Google project was, I believe, 2019. So just this year, Bloomberg published a really interesting exposé on Stable Diffusion's performance. Stable Diffusion is a generative model trained on large amounts of image data to produce images, and this group did a study to understand what Stable Diffusion is showing in its output, and they found that there were high levels of gender and racial bias. One of the examples they gave was, like, create me a photo of a tech CEO, and most of the results that were coming out were white males, and that extended to other queries for other professions — women were absent and people of color were absent. And so this group called it out, and I think this is a great example of — you know, we're all going very, very wild right now over large language models and generative models, and we need groups that can stop and sort of act as a third party to check and see how these are performing, because otherwise they might not get called out, and we might not know that these well-used models maybe have these biases hidden away. You know, I've used Stable Diffusion, and this is maybe something I wouldn't have picked up myself, and so someone's called them out for it and I hope that they're improving on it. And this is really a failure of training data again. Presumably they scraped the web, and the web can be an awful place that's full of bias and the worst of humanity, and the model just picked it up. Again, I'm going to guess it wasn't the intention of the team or the folks who trained this model, but more just the data set that they had.

Andy B: 31:50

And two comments on that. One: you can do something called balancing training data, where you go through and say, okay, for the examples of corporate leaders, let's make sure that we have it equally balanced and weighted — you know, 50% male, 50% female, or 40% identifying as one gender, 40% another, and 20% non-binary. You can decide the split that you want, do the same for race, and manually curate that data. And what that does is — if you think about the examples on the internet of pictures tagged "CEO", it's probably a bar chart that's a lot of men, fewer women, very few women of color, and then a very thin line. And if you want to have an equal amount, you chop off at the line of the lowest bar, so you have a lot less data. And what's an easy way to make models work better? Throw more data at them. So sometimes you have to make the conscious choice to feed the model unbalanced data in order to get it to, on aggregate, perform better. And the way these generative AI models work, it's not like they teach them one problem at a time, like, now we're going to learn tech CEOs. They're trying to teach it the entire world of images that have ever been tagged or described, and for any possible problem a person could imagine, there's just no way to balance that data, not realistically. Which brings me to point two: people in the industry will always say something like, AI is really dumb, AI is really stupid. And that's just true, because once you know how this works, you realize the best it's ever going to do is however much data we can pull together or generate somehow and show it, and we will never do that good of a job — this decade, at least. We've made a lot of strides in the last 20 years, but we've got a lot more to go.
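A minimal sketch of the trade-off Andy describes, assuming a labeled list of training images: balancing by downsampling every group to the size of the smallest one makes the groups even, but throws away a lot of data. The file names and group labels are invented for illustration.

```python
import random
from collections import defaultdict

def balance_by_downsampling(examples, seed=0):
    """examples: list of (image_path, group_label). Keep an equal number per group."""
    random.seed(seed)
    by_group = defaultdict(list)
    for item in examples:
        by_group[item[1]].append(item)

    smallest = min(len(items) for items in by_group.values())
    balanced = []
    for items in by_group.values():
        balanced.extend(random.sample(items, smallest))  # "chop off at the lowest bar"
    return balanced

data = [("img%d.jpg" % i, "white_man") for i in range(900)] + \
       [("img%d.jpg" % i, "white_woman") for i in range(900, 1000)] + \
       [("img%d.jpg" % i, "woman_of_color") for i in range(1000, 1020)]

balanced = balance_by_downsampling(data)
print(len(data), "->", len(balanced))   # 1020 -> 60: balanced groups, but far less data
```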

Joe C: 33:58

And I have heard of cases where, in order to make a model less problematic, data has been taken out, and other parts of the model — the ways it was more successful — suffered because of it. So this really is a balance between what is good data and what is enough data, and I think that's something we're going to be working on for a little while.

Kiran V: 34:26

And this is where synthetic data can actually be very useful, if you're able to synthesize relatively realistic examples for the classes that have a low amount of training data. Again, you have to be really careful when you're synthesizing this data because you don't want to bias the model in any particular way. And even in those subclasses where you have maybe less data, there might be still further ontologies within that class that aren't represented even in the real or synthetic data that you have. These levels go so deep. We take a lot of things for granted as humans, in our ability to distinguish different things that we see in the world, but a machine, again, doesn't have the understanding that we have about the real world, and so it can't make those fine-grained distinctions that we can make as humans, just naturally, because of our experience growing up and decades and decades of learning as an individual.

Joe C: 35:40

All our data in our heads.

Andy B: 35:43

And what happens in companies — at least this has been my experience, and I've witnessed this — is there's a team, and somebody on that team has been told to build a model that does something, and at some point the whole team is in a brainstorming session, like six, eight people in a conference room, and they're just whiteboarding: okay, what do we need to think about as we prep the training data? What are the labels we want labeled? And who's in that room matters, because different kinds of people, diverse people, think of different things. So that's why it's really important to have diversity working in AI. And also, they just never get it right the first time. When the three of us worked at a data labeling company and customers would define their ontology to get their images labeled, I can't think of a single time a customer, out of the hundreds we worked with, got their ontology right, labeled their data, and kept going. 100% of the time, they'd label some of their data, then go review the labels and be like, oops, we missed a major category.

Kiran V: 36:46

And there were cases where they were trying to be extremely comprehensive, and you look at the ontology and it's 50, 100, many hundreds of items that you have to label in a single image. And then the challenge falls to the humans actually labeling that data — humans aren't even able to 100% accurately model the things that they're seeing. Again, we have a lot of implicit biases and subconscious understanding that we aren't aware of that we're processing every day.

Joe C: 37:26

Yeah, this part of the process is so important, but it's very difficult. You know, it's a science. And one interesting thing about the data labeling company we were at was, I think a lot of us, particularly those who worked more closely with customers, became experts on sort of how to get the right training data and what to look for, and hopefully helped guide these teams that had maybe never done it before or really hadn't encountered this process before. All right, so one more thing I want to cover with computer vision. It wouldn't be a computer vision segment if we didn't talk a little bit about autonomous vehicles — self-driving cars — and I don't want to focus on any one instance, but you know these cars are out there now. Here in San Francisco, we see them. I've taken two already and I suspect a lot more in my future. I also see a lot of headlines about what's going on with autonomous vehicles, some of it good, some of it bad, but it's happening, folks, they're out there. And, you know, they've really turned a corner on the abilities of these things. But yeah, good one, I didn't even plan for that, no pun intended. So we hear different things: cars blocking emergency vehicles; a lot of situations where the car is acting up and no one knows who to talk to because there's no operator of the vehicle, so you can bang on the window, but nothing's really going to happen; cars generally malfunctioning and doing something weird; all the way to, you know, actual injuries and people getting killed. So this is one of those areas where it's really interesting to consider: is it really worse than human drivers, or will it remain worse than human drivers? That's an interesting aspect of self-driving cars.

Andy B: 39:21

We were talking about this while drunkenly walking around Golden Gate Park in the evening. But the data that the government is collecting from these self-driving car companies and making public is bad data. It just straight up is, and obviously that drives us crazy, because how are we supposed to make good decisions on bad data? We're also like models, right? We need good data. And it's not clear what the ratios are, like how much driving there is to how many oops incidents. So these self-driving cars have logged a lot of miles, and there's a detailed record of every time they got rear-ended or caused an incident, but what the ratios are isn't shared with the public. So, for example, I'm pretty sure there's data that most people get into a car accident within like four miles of home. There were all these theories as to why that was, and it turns out it's just because you do most of your driving within four miles of home — that's where your grocery store is, that's where you drop your kids off at school. If 90% of your driving is within four miles of home, that's where 90% of your car accidents are going to be, right? It's the exact same situation with these self-driving cars. Before you make up your mind about whether you are a big fan or a big detractor of them, or kind of neutral, take a look at the data that's being fed to you. I know that it's not being collected correctly, and so when you see a news piece that says self-driving cars are really good or self-driving cars are really bad, question how the reporter got that data or made that judgment, because I'm not sure the data exists to make a call yet.

Joe C: 41:05

And I have an example of that. Self-driving cars — it's so present in people's minds right now, it's being reported on a lot. I think it's maybe a new frontier in how much the public is learning about AI, because we're all familiar with cars and it's a really interesting manifestation of things we've dreamed about. But I saw a headline recently, I believe it was this past week: a woman was injured in a car accident, and the headline I saw said that a self-driving car was involved. It really highlighted the self-driving car aspect of this. But if you read into the details, what happened is that a non-self-driving car, a human-driven car, actually struck a woman, and she ended up in the other lane right in front of a self-driving car that instantly braked, and the human driver drove off — it was a hit and run. And so the actual role of the self-driving car in this was that it is supplying camera footage and telemetry to help police solve the crime of who actually struck this woman. And it's just an interesting case, because the article sort of had a spin on it that the self-driving car was involved in a bad way, and this is maybe a case where it's actually going to be helpful to the larger society that the self-driving car was there. So it's just an interesting thing, how politicized or controversial AI can become, and the growing pains of what it's like to live in a world with it.

Andy B: 42:44

I've got a whole tangent — Kiran's going to cut this out of the final episode — but I'm going to go on my tangent about how lazy journalism is when it comes to tech. Like, I get it: having to explain stats to yourself, understand it, and then explain it to a disparate audience is harder than writing a listicle, and we've seen the success of shorter-form content basically swallow and eat traditional reporting. But if you're currently working in journalism, grow up and adapt to these new technologies is kind of what I'm thinking, and figure out a way to communicate the reality of the situation in an unbiased manner. This might mean you have to learn a little bit of math yourself — not a lot, just a little, like high school level stats, right — and explain that to people, because it's important. I don't want the average person to have really biased and incorrect data on something as important as the future of self-driving cars, and then turn around and give their legislators different commandments, and be manipulated by people and by the news. So I really wish that the people reporting on this stuff were less lazy, that they were working harder to demystify this stuff — hence why we're doing this podcast. And we've seen this before, I'm just going to say it: all the COVID vaccine reporting. You get a lot of clicks if you're fuzzy about what you're reporting on, and we saw lots and lots of data trying to be clearly reported, like how many people actually were having adverse vaccine reactions. But it was a really heated thing, and no matter how much good reporting of various kinds there was, a little bit of bad reporting led to a lot of people not trusting COVID vaccines. This is all going to happen again in the world of AI, as AI grows in our lives, unless journalism decides it's going to do better. And speaking of reality —

Kiran V: 44:36

Going back to self-driving cars for a second — spoiler alert, autonomous vehicles have actually been around for decades. Airplanes have been flying themselves for a very, very long time, and airplanes are able to go literally from gate to gate without any human intervention. Now, when we're talking about efficiency, humans are still going to fly the plane more efficiently in many cases, which is part of why they have a pilot there and why the pilot is responsible — that's actually how pilots get a lot of their bonuses, by saving fuel, so airline pilots will make more money if they're able to use less gas on a particular trip, and that's why you still have that human intervention. But airplanes are self-driving; airplanes are able to go from point A to point B completely autonomously. And the reason why we've effectively solved it for air travel is because, when you're flying in the air, there are very few things that you can actually run into, so the number of obstacles is significantly smaller than for a self-driving car. But at the same time, even if we're talking about a road vehicle, it is largely a bounded problem. While the bounds of it are very, very large, for the most part you're still going to be driving on a road, you're driving in a lane, there are generally traffic rules, and so it is something we're able to make progress on, because you kind of have those bounds on your problem — versus something like a humanoid robot, where you've exponentially grown the number of obstacles and challenges that it would have to overcome. And so, again, there are many layers of AI, and many levels where things have largely been solved because the bounds are simply really small and you're able to fix a set of parameters that will work in those situations 99.99% of the time. But there are other situations, like self-driving cars or humanoid robots, where the bounds are significantly larger, and so it's just going to take a lot longer to be able to accommodate the variety of nuances that might occur in those situations.

Andy B: 46:56

Joe, I was trying to Google this the other day — maybe you know — but as I understand it, self-driving cars, like fully level five autonomous, haven't led to any human deaths that we know of yet. But the incidents that have led to people losing their lives in cars have all been with, like, Teslas, which are level two or three autonomous, and people trusting the car when they really shouldn't have been, right?

Joe C: 47:21

I remember one instance of a woman with a stroller crossing the street. It wasn't a Tesla, it was another company, and I believe it was in Phoenix. That's the only one I can think of. I'm not sure.

Andy B: 47:34

That's awful. And just so everybody's aware, these self-driving car companies and teams have incident report programs, and they watch every time there's an incident. There's a room full of people — program managers — watching the video of what happened, trying to figure out: why did the car behave this way? Was there something about the training data that was incorrect? Is there something about the weights in the model that is incorrect? And trying to prevent these mistakes from happening again. In some sense, when one self-driving car makes a mistake, all the self-driving cars learn from it, which is very different from people, right? If I have a car accident because I forgot to look in the right direction before making an unprotected turn, Kiran could make the exact same mistake the next weekend. He doesn't learn from my mistake. So that's an advantage of these distributed systems. Overall, I think I much prefer self-driving buses to self-driving cars — I think we've dumped way too much of society's money into self-driving cars — but the technology is really cool, and it's made some big oopsies. But don't let that scare you from trying these things out and having a little faith that it's going to keep getting better.

Joe C: 48:44

Definitely. I want to say, sort of as a blanket statement across all my research, I uncovered a few use cases. Some of the things getting reported on are instances of AI where it really matters — I mean, we're talking about autonomous vehicles, there's a huge safety factor; we're talking about healthcare and people's eyesight. There were a few examples that we can talk about in another episode of mistakes with AI being used in warfare, where lives are on the line. But there are probably also a lot of failures and flops in areas that maybe aren't as important, or things that we don't see a lot of. And something that comes to mind is that AI is largely powering ads, and the whole internet is basically run on ads, so there are probably a lot of flops happening there that aren't getting reported on but are probably resulting in strange bias or people losing money. So, something to think about: how much or how little of this is actually getting reported on.

Andy B: 49:50

And every single person listening to this has an AI flop at least once a day. You ever get a piece of spam that evades your spam filter, and then the email shows up and you're like, this is spam? That's just the model — it didn't catch that piece, it just messed up. Or if you know something exists and you're searching for it on Google and you just can't find it, that's an AI model that messed up: your intent was not correctly picked up, it didn't index things correctly in its search space, and it didn't get you what you needed. So you're having an AI flop at least once a day.

Joe C: 50:21

And some of them are, quote unquote, meaningless — meaning maybe it's not going to affect the larger picture, like some of the cases we're talking about.

Kiran V: 50:30

So we see that there is a broad spectrum of AI failures, right? When we think of something like an incorrect ad or a poorly placed ad being put in front of us, it's a fairly trivial mistake that an AI machine may have made. Then you get to some of the bigger things, where you might be misdiagnosing individuals for diseases or missing out on a diagnosis, again because of a failure of AI. Talking about the range of challenges that we've seen in NLP, I think some of the simpler ones, again, are less consequential to our lives. I'm sure most people, if not everyone listening, has experienced this: you call into an organization, or you're calling an airline, and you're initially directed to an automated answering system with an AI assistant on the other end, and it's asking you a bunch of questions — asking you to tell it your name, or to give your phone number or input your confirmation code. In a lot of these cases, you're able to talk directly to the system, it may or may not understand what you're saying, and then it directs your call accordingly. And so in the instances where you repeat your name over and over and over and it's like, I didn't get that, please repeat it, can you dial it on the phone pad so I can actually identify what you're saying — this is a classic failure of NLP. Essentially, NLP has a range of types of models, and the first model it has to run when you're using one of these is speech to text. It needs to convert what you're saying into text data so that it can do the next level of processing. And those speech-to-text models are often not trained on the variety of data that they come across, because of how unique and different all of our language and speech might be. So if you're someone with a speech impediment, or who has challenges speaking, then these machines might fail more often than for someone who has maybe a more traditional speech style — or even accents, right? If the machine has never heard a particular accent and suddenly someone with that accent is trying to call in, it's going to have a very, very tough time actually understanding what that individual is saying.
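A hedged sketch of that first step, speech to text, using the open-source speech_recognition package; the audio file name is a placeholder, and this isn't any particular airline's system. When the transcription fails — which, as Kiran notes, happens more often with accents or speech patterns the model never saw in training — a real phone tree falls back to "please enter it on the keypad."

```python
import speech_recognition as sr

recognizer = sr.Recognizer()

with sr.AudioFile("caller_response.wav") as source:   # placeholder recording
    audio = recognizer.record(source)

try:
    text = recognizer.recognize_google(audio)          # cloud speech-to-text
    print("Heard:", text)
except sr.UnknownValueError:
    # The model couldn't make sense of the speech -- the "I didn't get that,
    # please dial it on the keypad" path in the phone tree.
    print("Sorry, I didn't get that. Please enter it on the keypad.")
```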

Andy B: 53:18

The cutest example of this: just go on to TikTok and watch videos of toddlers trying to talk to their Alexa. Alexa does not understand toddler speak — none of us understand toddler speak, in Alexa's defense — and watching toddlers get frustrated trying to tell Alexa to play Bluey or whatever is really funny.

Joe C: 53:39

It really is. This is a fascinating area to me, because I do think probably a lot of teams go into an NLP project thinking that, oh, you know, language is language, English is English or whatever language it is, it's defined, it's what's in the dictionary — and it's just not true. Language evolves. Even, you know, we're using emoji so much today — is that being considered wherever text is being analyzed? It's really fascinating. It's not cut and dried.

Andy B: 54:11

For the record, NLP stands for natural language processing. This is the domain of machine learning and AI that's all to do with language: written language, spoken language.

Kiran V: 54:24

Exactly. And another aspect of NLP, or language processing in general, is called NLU, which is natural language understanding, and this is kind of what I alluded to earlier: taking the next step beyond just processing that data to actually understanding, maybe in more of a conscious sense, what is being said in that text or in that speech. I'll get into that in a second, but talking about the variation in failures — we have something that's maybe more trivial, like a voice assistant or an answering machine, and then I want to talk about one of the maybe more famous failures of NLP, which kind of had this doomsday vibe and was probably one of the more sensationalized things we saw in the news. In 2017, Facebook created an AI model that was able to converse, and they actually trained this model on negotiations. What they did is they had two AI models conversing with each other, trained on this negotiation data, and they just kind of let the system run. So one system would say something, then the other one would respond, and vice versa. And what actually happened is the AI created its own language and started talking about things that humans couldn't understand. And so this was largely sensationalized, right — people thinking, oh my God, the models are taking over, this is going to be the first Terminator, and we're all going to die.

Joe C: 56:09

It does get kind of scary when — I mean, on explainability — we at that point entered into a domain where we didn't really know exactly what was happening or what was being said, and that is, like, movie-level scary.

Kiran V: 56:24

Yeah, and I think that's the piece that we need to understand as humans interacting with these technologies. The machine came up with its own language, but really what happened is this was an online learning system, meaning that as the conversation was happening, it was actually able to learn from itself. And again, when we talk about an AI model, it's just a giant math problem, and when we talk about learning, what we're doing is adjusting the values in this math problem to accommodate an expected output, right? So, if you think back to the most basic, quote unquote, AI math problem, you think about y equals mx plus b. It's literally just a regression model, and in that case, if you're given a particular x and a particular y, then you go and solve for the optimal m and b to make that equation hold true. And this is essentially what you're doing in AI. When you're allowing an AI model to learn — whether it's from new data that we've collected in the world, from synthetic data, or from its own interactions — it's just constantly trying to optimize that equation, and these equations are much more complex and numerous than a single equation for a line, but it's essentially the same thing. And so what happened in this instance is these two machines were talking to each other, those model weights were being updated as the conversation was happening, and the system was trying to optimize for that conversation. But you can imagine, without that natural language understanding — which, again, is very, very limited in the world today — the model doesn't really have anything meaningful that it's optimizing for. It's optimizing for some ephemeral thing that it has chosen to optimize on. And so if you just let this model run free in a conversational AI bot, you're giving it input, which is a sequence of words, and it's producing an output, which again is a sequence of words. But because it doesn't understand that sequence of words, it doesn't know if it's outputting real words or complete garbage — it doesn't understand that there's a difference between those things. So all it's doing is, all right, well, you gave me a sequence in and here's some sequence out, and if you're updating those weights without any intervention by a human to make sure that the output is correct, then you're just going to end up with a bunch of garbage, and that's essentially what happened. So did the AI make up a new language? Maybe, sure. But all it did is start outputting an arbitrary sequence of characters, and when we humans tried to understand what the machine was doing, we interpreted that as: the AI has created its own language and now it's going to take over us.
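A minimal worked example of the "adjust m and b" idea Kiran describes: given some (x, y) pairs, gradient descent repeatedly nudges the two weights to reduce the error. The toy data, learning rate, and step count are made up for illustration.

```python
import numpy as np

# Toy data drawn from y = 2x + 1 with a little noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=x.shape)

m, b = 0.0, 0.0                      # start with arbitrary weights
lr = 0.01                            # learning rate

for _ in range(2000):                # "learning" = repeatedly adjusting m and b
    pred = m * x + b
    error = pred - y
    m -= lr * (2 * (error * x).mean())   # gradient of mean squared error w.r.t. m
    b -= lr * (2 * error.mean())         # gradient w.r.t. b

print(f"learned m={m:.2f}, b={b:.2f}")   # close to the true 2 and 1
```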

Joe C: 59:38

Yeah, I guess the failure and the flop here is that it wasn't really a guided experiment. I'm not going to say it's a failure — we really learned something from this, or that team did. But if this were to be put into production or used in some way in the world, it just goes to show that you probably really want to start a project with a goal in mind. What do you want to have done here? If you want, you know, AIs chatting together, what do you want them chatting about, and what are the parameters, what are the weights you're going to use?

Andy B: 1:00:14

And what are the examples of how these oopsies happen that we've been talking about? So one is essentially this process where a team decides to build a model, they get their training data, they train the model once or twice with that training data — literally somebody's got a script, they hit run, boop, out comes a model — then they put that model into something and go, go have fun, and the model doesn't do a good job. The underwriting models in financial stuff and the eye photo model are examples of that: at the time of creation, the data was just sampled incorrectly, so the model's just bad — or really, the humans sampled the data for the model incorrectly. The other option is some team gets really excited about online learning, where they're hooking up all these ways that the model will automatically collect its own data, run miniature training cycles, and update itself with new behaviors. And I can tell you, because this has been me: 1000% of the time you're like, this is going to be the coolest, and you get really excited thinking it's going to be over the top, it's going to be the best. And then what actually happens — and there are so many examples of this with chatbots, like Microsoft's Tay, we're going to talk about that — is it just spirals. You look away for an hour and a half and it's become a complete babbling idiot. You hooked it all up thinking it's going to be incredible, and it's just mid — or worse, it's gone racist.

Kiran V: 1:01:37

This stuff.

Andy B: 1:01:38

You know, it's how it happens.

Kiran V: 1:01:39

And again, there are many instances still emerging, but a lot of instances of online learning systems, and the ones that are most successful are those where the system is actually learning from human inputs rather than from itself, right? So in this case with Facebook, it was just two machines learning from each other, and you can understand how there's no real meaning or intent, because there was no moderation in that feedback loop where a human is validating whether something is correct or not. But think about Netflix, right? When I go to Netflix, if I start rating the shows that I like and don't like, it's actually going to produce a better output for me. It's going to recommend things that are more relevant to the types of stuff I'm interested in. And online versus offline, you can decide how much you want to go into the details of whether it's actually doing this as I'm clicking the buttons or whether it's a process that happens once a day or whatever. But the fact that these models are being retrained on something that a human has indicated, rather than something that a model is indicating, is going to help them get better over time, versus something like the Facebook model, which had nothing, no sense of better, because it was just learning from itself.

Andy B: 1:03:03

The Netflix example is something called a recommender system, if you want to Google or learn more about those, and this can work really well because there's a clear goal: show you what you like. And you can say I like it, I don't like it, and then it gets better at recommending things to you. If it's just two robots talking to each other, it's basically like you handed two typewriters to two monkeys and wished them luck. There's not a clear goal there.
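
For the curious, here is a minimal sketch of the recommender idea, with invented show titles, tags, and ratings rather than anything Netflix actually does; the point is just that the human's thumbs up or down gives the system a clear objective:

```python
# Toy recommender: score unwatched shows by overlap with what the user liked.
# Titles, tags, and ratings are all made up for illustration.
catalog = {
    "Space Docs":     {"documentary", "science"},
    "Robot Bake-Off": {"reality", "cooking"},
    "Deep Sea Live":  {"documentary", "nature"},
    "Laugh Factory":  {"comedy", "reality"},
}
user_ratings = {"Space Docs": +1, "Laugh Factory": -1}  # thumbs up / thumbs down

def jaccard(a, b):
    # overlap between two tag sets, from 0 (nothing shared) to 1 (identical)
    return len(a & b) / len(a | b)

def score(show):
    # similarity to liked shows counts for it, similarity to disliked counts against it
    return sum(rating * jaccard(catalog[show], catalog[seen])
               for seen, rating in user_ratings.items())

unseen = [s for s in catalog if s not in user_ratings]
for show in sorted(unseen, key=score, reverse=True):
    print(f"{show}: {score(show):+.2f}")

# "Deep Sea Live" outranks "Robot Bake-Off" because it shares a tag with a liked
# show; every thumbs up or down from the human sharpens the ranking.
```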

Joe C: 1:03:31

And this goal setting is such an important part at the start, just like selecting your data is, and doing that rigorous process of evaluation to make sure it's performing, you know, in the lab.

Kiran V: 1:03:44

So I was doing this research on NLP to understand flops and failures, and I think that Facebook one was the most hyped or sensationalized one that I was able to find. And then I started thinking, what are more recent cases of NLP? And the big sensational one has been ChatGPT in the last two years. So I started thinking about it: okay, well, now you can actually interact with ChatGPT, and I've rarely, actually never, experienced a case where it outputs completely nonsensical language. It has made a bunch of things up for me, though. Oh yeah, it'll make stuff up all the time.

Andy B: 1:04:32

But when I read it, it's like, oh yeah, this is a thing that a human could have said. It's a sentence.

Kiran V: 1:04:39

I can understand what it's saying, whether the information in it is correct or not; it's still very much hallucinating, and I'll get into that. But I started thinking, wait, is NLP solved? Like, have we figured out natural language processing? Because we have a machine that is broadly accessible to billions of people in the world that is able to have a human-like conversation, where you can go on for an hour and talk to this thing and feel like there's an actual human on the other side. And I think the takeaway I had here really is no, it's absolutely not solved. There are still a lot of problems with it. A couple of things to call out: we think about ChatGPT as a general understanding model, right? You can talk to it and it knows about so many different topics that exist in the world, because it has learned from a lot of them. And this again goes back to the bounds of the problem. The reason why it's able to simulate that human activity so well is the sheer volume of data it has been trained on, and what ChatGPT has learned is how to produce English, or another human language, based on what exists on the internet. So if you read it and interact with it, it's largely just another web page, and the text is going to look similar to web pages you've seen in the past. But if we now start to have it make decisions on real-world events, or even give it a math problem, there are cases where it's unable to produce something that is usable in our world, and the reason, again, is the sheer bounds of that problem. If you want a general language model that you can interact with, yeah, there's so much content on the internet that it was trained on that it can simulate that for most people, and it's good enough that a lot of people are able to use it and have a conversation with it. But when you actually start to dig into the details and have it focus on specific tasks, it will fail miserably in many, many cases.

Andy B: 1:07:08

My favorite example of this is when you try to get ChatGPT to make you recipes. Oh, it's bad. It's not a good innovator in the kitchen.

Joe C: 1:07:20

I saw an example, and I'm going to get some of the details wrong here, but someone asked ChatGPT to give them a crochet pattern of a monster, I think it was a dinosaur. And then they went and actually crocheted these, made them, and they were weird and freakish, and I think they sort of had to fill in the gaps because the model did so badly. It was like, well, this isn't even going to be a thing if I don't add my own judgment into it. But yeah, you get some wild stuff.

Kiran V: 1:07:52

Yeah, there are also, you can look up online, a lot of movies now that have been made where the AI wrote the script and then you have real humans acting it out, and you're like, wait a second, this makes no sense at all. And there's one, I think, that came out maybe around 2016, 2017, which was one of the first ones I saw, and it's hilarious. It's like a Star Wars-esque movie, kind of futuristic, and every line is just complete nonsense. I highly recommend that one.

Andy B: 1:08:27

I got really frustrated with ChatGPT the other day because, well, everybody already knows what KonMari is, right? You have these categories and you're supposed to go declutter them in your home. And I was like, I don't want to spend that much time in any one session, give me subcategories. I wanted the smallest, most granular subcategories of my possessions to go through. And it was clear the model just thought I was a man. It kept suggesting to me, in clothing, like, check your suits. Oh my God, I don't own a suit. And it kept missing bras, and I kept being like, no, be more specific and tailor it for women, and it kept not doing it. Like, I would like my jewelry to be subcategorized, my drop earrings, my stud earrings, et cetera, kind of girly things. It had no idea what girly things I was talking about. I got so annoyed.

Kiran V: 1:09:17

I guess in that case you might actually want to add a little bit of bias, to say, like, girly versus manly. And again, that's context, right? You don't want the model to hyper-fixate on these things, but for it to interact in a human world where we do have biases, the machine should, in an ideal world where the singularity is real, be able to identify when to actually acknowledge these biases and when to avoid them.

Andy B: 1:09:51

Stop telling me to declutter my neck ties. I don't own any neck ties. I don't know anybody who owns any neck ties.

Kiran V: 1:10:00

So I think this kind of leads into, again, what are some of the failures of NLP. Beyond individual examples of failures, I want to mention the challenges with NLP, some of the biggest challenges and problems that we face with NLP today. One is contextual words and phrases, and homonyms. Again, this goes back to natural language understanding: the machine doesn't understand what a chair is. It doesn't understand what a dog is. It has read this word in many places and may have a sense of when that word should appear in a particular phrase, but it doesn't know that a dog is a living thing that humans keep as pets, and that they love them and walk them and have them protect their home. That concept just doesn't exist, and we don't currently have an understanding of how we would code that into a model for it to truly have human understanding. Then homonyms: words that sound the same but mean different things. So "I ran to the store" is totally different from "I ran out of toilet paper", but in that context "ran" is just a word the model has seen, and if it has mostly seen it in the context of movement, then it's going to treat it as movement far more often than as running out of an item, right?

Andy B: 1:11:49

I ran to the store. You don't mean you literally jogged to the store. You probably hopped in a car or a bus or on a bike. "I ran to the store" does not at all refer to how you got to the store.
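
A toy example of the homonym problem: a made-up word-counting "model" trained on a handful of invented sentences ends up with exactly one profile for "ran", so it reads "ran out of toilet paper" as movement:

```python
# Toy illustration: a model with one representation per word can't tell
# "ran" (moved quickly) from "ran out" (depleted). Corpus and labels invented.
from collections import Counter, defaultdict

corpus = [
    ("i ran to the park", "movement"),
    ("she ran down the street", "movement"),
    ("he ran a marathon", "movement"),
    ("we ran out of milk", "depletion"),
]

# Count which label each word appears under; each word gets exactly one profile.
profiles = defaultdict(Counter)
for sentence, label in corpus:
    for word in sentence.split():
        profiles[word][label] += 1

def guess(sentence):
    votes = Counter()
    for word in sentence.split():
        votes.update(profiles.get(word, Counter()))
    return votes.most_common(1)[0][0] if votes else "unknown"

print(guess("i ran out of toilet paper"))
# Prints "movement": "ran" has mostly been seen in movement contexts, and the
# model has no concept of running, only counts of where the word showed up.
```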

Kiran V: 1:12:02

Another challenge that NLP has is with synonyms. Again, it doesn't have an understanding of any particular word, right? It just understands that this word appears next to these other words. And so if you ask it to give you synonyms of a word, it's not able to map the concept, the human understanding of that word, to some other word and go, oh, these things mean the same thing. It just knows that those words tend to appear in phrases around the same other words a certain amount of the time.
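
And a toy sketch of how "synonyms" fall out of co-occurrence counts rather than meaning, using an invented four-sentence corpus; "big" and "large" only look related because they show up next to the same neighbors:

```python
# Toy distributional "synonyms": two words look related only because they
# co-occur with similar neighbors, not because the model knows their meaning.
import math
from collections import Counter, defaultdict

corpus = ("the big dog barked . the large dog barked . "
          "the big cat slept . the large cat slept").split()

# Build co-occurrence vectors with a +/-1 word window.
cooc = defaultdict(Counter)
for i, word in enumerate(corpus):
    for j in (i - 1, i + 1):
        if 0 <= j < len(corpus):
            cooc[word][corpus[j]] += 1

def cosine(a, b):
    # similarity between two co-occurrence vectors
    keys = set(a) | set(b)
    dot = sum(a[k] * b[k] for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

print(f"big/large:  {cosine(cooc['big'], cooc['large']):.2f}")   # high: same neighbors
print(f"big/barked: {cosine(cooc['big'], cooc['barked']):.2f}")  # lower: different neighbors
```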

Andy B: 1:12:42

So a weird example of this, a risk I want to call out, is that people are so much smarter than models. You can post a picture of a violent grizzly bear to Reddit or something, and everybody in the comments will be like, I want to kiss it, I want to hug it, why is it so soft? And we know that we admire the animal, but no one should go up and actually try to kiss this grizzly bear. But if you take that image and text combo and feed it to a model, it's going to classify grizzly bears as kissable and huggable, which is obviously not the case. We wish they were, but no.

Kiran V: 1:13:22

Yeah, and that actually brings up, I think, the next one, which is irony and sarcasm. That is sarcasm, right? Like, I'm sarcastically saying this thing, and while I might feel that thing, I'm never actually going to do that thing. And a lot of times the basic premise with sarcasm is that what you are saying is generally the opposite of what you're intending. So if you're on a really shitty vacation, you might say, wow, this was the best vacation ever, and that's totally sarcastic because you know it sucked. But if a model sees that in text and it's trying to learn from it, it will associate a positive sentiment with that vacation, because, again, it doesn't see the body language, it doesn't hear the intonation of the voice, and those are characteristics of sarcasm.
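
Here is a minimal sketch of why sarcasm slips past a text-only model: a tiny, made-up sentiment lexicon scores the words it sees and nothing else:

```python
# Toy sentiment scorer: counts positive vs negative words from a small,
# invented lexicon. It has no access to tone, body language, or situation.
positive = {"best", "great", "love", "amazing", "wonderful"}
negative = {"worst", "terrible", "hate", "awful", "boring"}

def sentiment(text):
    words = text.lower().replace("!", "").replace(",", "").split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("Wow, this was the best vacation ever!"))
# Prints "positive", even if the speaker meant the exact opposite; the sarcasm
# lives in the intonation and context, neither of which reaches the model.
```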

Andy B: 1:14:16

If you're texting with somebody and you're like, wait, are they being for real right now? It's the exact same problem. You miss things when you text with people. People misunderstand you, you misunderstand them, and that's with you knowing that other person and having so much context. The model doesn't have anything other than the text.

Joe C: 1:14:34

I also think this stuff changes every day with culture, and we touched on this. But culture is so fast moving that new words and new acronyms are getting introduced all the time, new styles of humor and memes, and we need to be training models that can keep up with this and understand what we're talking about.

Kiran V: 1:14:58

And the other failures that we see in natural language processing generally have to do with low resources, or not enough training data. So, errors in text or speech: if you are training with labeled data and you have labeled that data poorly, you're not going to be able to build a model that is an accurate representation of that human understanding. Colloquialisms and slang: again, like we mentioned, new language is being created every day, whether it's Gen Z's "cheugy" or "hip" or "bitchin", these are things that aren't part of the classic, quote unquote, English dictionary, and until the model has seen enough examples of them, it's not going to be able to learn them. That's where you get model drift over time, and it's the reason why you want to continuously train these models and keep them updated, because the world is changing. Then there's domain-specific language. Again, ChatGPT is great at general knowledge, quote unquote; it doesn't actually understand anything it's saying, but it has seen a wide variety of examples of things from places like Wikipedia, so it can understand, maybe, that the concept of a person is someone with characteristics like age, date of birth, height, parents, spouse, et cetera, and if those characteristics exist on a page, it might think this is a human. But when you get into domain-specific things, which are some of the examples we talked about here, like Facebook training a model on negotiations, it's very specific. Or the computer vision example of looking at diseases in your eye. These are things that have very low amounts of training data, and so the model is unable to get a firm grasp and actually perform well in the real world. And then low-resource languages, and just the general nascency of natural language processing, where we don't have the ability to give the model true human understanding; rather, it sees a pattern or sequence of words and it is just predicting what the next thing should be, based on the bits and bytes it saw in that vector.
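
To make the "just predicting the next thing it has seen" point concrete, here is a toy bigram predictor over an invented corpus; it also shows what happens with slang or a low-resource language the model has never seen any data for:

```python
# Toy bigram model: predicts the next word purely from counts of what
# followed each word in a tiny, invented training corpus.
from collections import Counter, defaultdict

training_text = ("the dog chased the cat . the cat chased the mouse . "
                 "the cat ate the cheese")
tokens = training_text.split()

next_counts = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):
    next_counts[prev][nxt] += 1

def predict(word):
    if word not in next_counts:
        return "<no idea: never seen this word>"
    # pick the most frequent follower seen in training
    return next_counts[word].most_common(1)[0][0]

print(predict("the"))     # "cat": the most frequent follower in this corpus
print(predict("chased"))  # "the"
print(predict("cheugy"))  # "<no idea: never seen this word>": no data, no prediction
```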

Andy B: 1:17:28

And this is why probably most people are not really happy with the performance of their Alexa yet. I don't know about you guys, but my parents have accents, we speak multiple languages in our home, and we switch languages mid-sentence sometimes, and then if my sister-in-law or my uncle are there, there are new languages sometimes introduced. So even years ago, around 2018, I was on a phone call with a major consumer electronics brand, and they were asking me to try and find them English examples. They really, really wanted their electronics to perform better for Indian Americans, and I was just like, okay, so now I have to see where I can find volunteers to give me, or pay them for, audio samples in English. And what English means is totally different for different people who speak English and Hindi, different generations, you're mixing different kinds of language and there are different words where you're context switching, like code switching. A lot of these natural language processing and natural language understanding teams hire actual linguists, like PhD linguists, and even they can't keep up.

Kiran V: 1:18:42

So not a solved problem at all yet. Yeah, and again, this really goes back to the amount of training data. Even languages like Hindi or Thai or Vietnamese, that are spoken by hundreds of millions of people all over the world, so many people speak these languages, but the models haven't been trained on them to the same degree, because today a lot of this research is done in English, with people trying to create English language models solely because that's the most popular language on earth. And so these languages that are so common in our world don't perform well in a natural language model, simply because we haven't spent the time to label the same volume of data in Hindi as we have in English, and so it's just not going to perform the same.

Joe C: 1:19:39

Mm-hmm, all right. Well, were there any other examples of NLP flops you wanted to cover, Kiran?

Kiran V: 1:19:48

No, those are all the things.

Joe C: 1:19:51

I think we can go ahead and start to wrap it up. Maybe round robin: any final thoughts on flops and failures in AI?

Andy B: 1:20:00

Right now we're in this kind of golden era of AI where every second day you're hearing people talk about something incredible they were able to do with ChatGPT or Stable Diffusion or these generative models, and it is really exciting. But what's really bizarre to me is how many AI experts in the industry who have huge followings are going on, like, the Joe Rogan podcast and saying we are days away from general intelligence. Hopefully we made it clear that that is not at all going to happen. We're still in the very early days of making all this cool stuff happen, and while there are real risks already, hopefully you understand now why this is hard, why it moves slowly, where the risks are, and don't listen to anybody telling you that we are close to the singularity, because we are not. Those people get more clicks when they lie to you about that.

Kiran V: 1:21:06

The thing I want to close on, I think, is, again, these models have failures because we have failed to provide them with the context or the understanding, and that's not for lack of trying. It's because it is simply an incredibly difficult problem to give a machine the understanding that a human has, understanding we take for granted because we live in the world that we live in, and we're only trying to virtualize that world for machines. And I think the takeaway is, don't be so scared of AI. If you see an article that Facebook has created a new language with its AI and it's going to take over the world, it's not going to take over the world. Hopefully, at least, humans don't give machines the ability to make decisions for us for a very long time. Maybe some of those decisions will be really good and powerful, and we should have these systems surface recommendations, but we currently need humans in the loop at every step of the way to check these models, to make sure that the decisions coming out of them are indeed accurate, and when they're not, we need to be providing those machines with feedback so that we can get to a world where maybe more and more of these decisions can be made by AI. But we're still quite a long way from that.

Joe C: 1:22:37

My final thought: all this discussion actually made me hopeful, or made me realize I'm hopeful about this. A lot of AI work fits into the scientific method; this is trial and error. I hope that with all these flops, we're learning about the gaps in the data that we have, or the gaps in our own judgment, and making the next model even better. So I'm hopeful that in all these scenarios where there have been failures, the next time they're attempted, they're going to be that much better. And yeah, as for the people making these mistakes...

Andy B: 1:23:15

Nobody thinks this is cute. People are really embarrassed when their projects get into the public eye and they accidentally released a racist model or a sexist model, and many people working on this care deeply and they're trying their best. It's just a hard problem and we don't have enough people working on it.

Joe C: 1:23:38

All right. So with that, thanks for listening and tune in next time.

Kiran V: 1:23:42

Check us out on Instagram, check us out on Spotify and we will catch you guys with another episode.

Andy B: 1:23:50

And if you want to contact anybody, find our information at aiflyicom.