About this series
Join Leon Furze as he navigates the ever-evolving world of artificial intelligence, examining its origins and significance within education. Learn how AI can support personalised learning, encourage student engagement, and foster the development of future-ready learners. Unearth the potential of AI in your school and understand its capacity to transform teaching and learning experiences.
Key takeaways include: the history and significance of AI in education; the potential for AI to support personalised learning and student engagement; and strategies for implementing AI in your school to enhance teaching and learning experiences.
Book an upcoming session with ACMI's Education team today
This transcript was machine-generated and published for accessibility purposes. It may contain errors.
Zoe McDonald: Welcome everybody. I'd like to begin by acknowledging the traditional owners, the Wurundjeri and Boonwurrung people of the Kulin Nation on whose lands we meet, gather and work here at ACMI in the centre of Federation Square in Naarm, Melbourne. I'd also like to pay my respects to elders past, present, and emerging and acknowledge First Nations participants who may be joining us today.
My name's Zoe. I'm a producer of school programs here at ACMI, and it's my great pleasure to welcome you to our first session in Demystifying Artificial Intelligence. This session is What is AI and how will it affect my school? We'll have a few minutes for some questions at the end of the session, so I just ask you to use the Q and A function at the bottom of your screen if you have any questions along the way. You can also upvote by choosing the thumbs up icon if you feel that someone has a similar question to yours. As we have a short amount of time at the end, we'll not get to all questions, but we will endeavor to answer them for you in writing before the next session. So I'd like to introduce our speaker today, Leon Furze. Leon is an experienced consultant, author and speaker in education and leadership. He's currently studying for his PhD on the impact of artificial intelligence on writing instruction and education. Hailing from Stoke-on-Trent, Leon is passionate about education's capacity to empower people and has worked extensively in educational leadership and strategy. Welcome, Leon.
Leon Furze: Thank you very much Zoe, and thank you to ACMI for the opportunity to run these sessions. So I'm really excited about the next four sessions. I'll just begin with my own acknowledgement of country and acknowledge the Gunditjmara people, who are the traditional custodians of the land from which I'm presenting today, and pay my respects to their elders past and present. And extend that respect to all Aboriginal and Torres Strait Islander peoples here today. Gunditjmara country is down in the southwestern corner of Victoria near the SA border, and it's a lovely part of the world here, but a storm is just about to hit, so hopefully, fingers crossed, the internet holds out and we get through this session in one piece. As Zoe mentioned, we are going to have Q and A at the end and I'll try to get through as many of those as possible.
But please, by all means do leave questions in the chat so that these sessions can be as interactive as possible, and we'll come back to some of those questions in between sessions. So this is the overview for today's session, and really this is part introduction to artificial intelligence and ChatGPT, and part overview of some of the bigger ethical concerns as we get into these technologies. So I will briefly talk about AI as a concept and a technology, as an industry and a field of study. And then I'll focus in a little bit on ChatGPT, because that's obviously the most talked about application of these technologies at the moment.
As we go through the next few sessions, however, we're going to come across a variety of different applications of artificial intelligence, and particularly generative AI, or generative artificial intelligence, which is the type of AI that can produce text like ChatGPT, or images like Midjourney, Stable Diffusion and others. So most of the images that you see in this session this afternoon have been made in Midjourney, and I'll be talking about that in a later session. I'll also talk about AI in education: how it's already come into education, and maybe what some of the opportunities are, so that we can balance being critical about the technologies with some creative and practical uses of them.
So I'll begin with a very broad, high-level definition of artificial intelligence, because one of the biggest issues with the term AI is that nobody can actually agree on what it means. As a concept and a field of study, AI dates back to the 1950s, and there was a conference, the Dartmouth Conference, where the term AI was first coined. It was either going to be called artificial intelligence or possibly cybernetics. And we might have had fewer problems with the technology if we'd gone with cybernetics, because both words, artificial and intelligence, are actually a little bit problematic. So some people would say it's not artificial because it's made from a lot of human data, and many people would argue that it's not intelligent by any of our current definitions of the term, but it's what we're stuck with. So that's what we'll work with.
When I talk about artificial intelligence in the context of this presentation tonight and the next few sessions, really what I'm talking about is algorithms and data. So the algorithms that sit behind technologies like ChatGPT and similar models have existed to a certain extent for quite a few decades now. So we know that some of the algorithms behind these machines have existed since the 1980s, and as I mentioned, those concepts go all the way back to the 1950s.
So although it feels like a new technology that's kind of crept up on us, and it really exploded into the public consciousness in November last year when ChatGPT was released, as a field of study and a concept it's existed for a pretty long time. What hasn't existed for that whole time, however, is the vast amount of data that is used to train current models. And that data, which I'll talk about in more detail when we get to ChatGPT, is one of the biggest affordances and advantages of the technology, but also one of its most problematic areas, particularly when it comes to some of the ethical concerns that we'll talk about later in this session.
So essentially, with many forms of artificial intelligence, we have algorithms processing data, and that gives us some kind of output. But the really important thing to hold onto as we go through this is that it's not magic. The title of this series of sessions is Demystifying AI, and that's because I've got a bit of a personal gripe with the way that artificial intelligence is presented in the media and by some of the organizations that release these technologies. There's a lot of language around these technologies concerned with magic and mythology, and even godlike powers or superpowers. And I really want to bring it down to earth and start talking about these technologies for what they are, which is really just computation: algorithms and a lot of data. So although when ChatGPT first came out we saw a lot of pretty explosive headlines, things like the end of the high school essay or the end of English as we know it, these technologies are not going to end the world immediately.
And I think we need to take some of the hype with a pinch of salt. The other thing is that whilst these technologies aren't going to end the world immediately, you will probably have seen in the press a lot of discourse around the big, major threats of artificial intelligence. What's happened recently is that these huge advances in generative AI in particular, like ChatGPT, have sparked a new kind of frenzy within the field. And we're seeing now a lot of conversation from fairly well respected scientists and philosophers, and people who've worked for big organizations like Google, IBM and so on, coming out of the woodwork and saying, "We need to put the brakes on and we need to stop this technology from developing any further." Now, I'm of the opinion that the level of AI that we have at the moment, represented by things like ChatGPT, is not going to go anywhere.
It's not going to get regulated away, it's not going to get stopped in its tracks. Really, those discourses around the threat of AI are talking about the next level up, and we're not there yet. Some scientists will tell you that we'll be there in three years, some will say we'll be there in 10, and some will say that we'll never get there at all. But I do think that there are other threats and concerns associated with the current types of technology, and we'll talk about some of those this afternoon and in the future sessions. So in defining artificial intelligence, first of all, we can't, because we don't actually know what it is. But in terms of what we're talking about at the moment, really try to think of it as a complex and sophisticated algorithm, but an algorithm nonetheless. It's not magic, it's not myth, and at the moment at least, it's not going to be destroying the world.
So with that in mind, I want to bring it down another level and talk about a specific application of artificial intelligence. As I discussed, we'll be going through a few different forms of artificial intelligence, but they're all broadly in the category that I'm calling generative AI. And I'm calling it generative in a few senses of the word, but mainly because it generates new output based on whatever you put in. So any of these technologies where we put in a prompt and it gives us a response, or where we put in one form of data and it gives us another form of data, is generative AI. Although I'm going to talk about ChatGPT in some more detail in a moment, I will flag that we've got text-to-image generation: technologies like Midjourney, DALL-E and Stable Diffusion, where we can use a text prompt to generate an image.
We also now have image-to-text and image-to-image, where we use an image as the prompt and the machine reads the image and gives us an output. So for example, the next generation of ChatGPT will be able to take an image as a prompt: you give it an image and it will be able to describe what that image is. So image recognition technology. In Midjourney, which is image generation, you can already put in an image and it will give you a text description of what it thinks that image looks like, based again on its data. We're also seeing, and this will be discussed in the final session in this series, text-to-video and image-to-video multimodal technologies. These kinds of generative AI will take a text prompt and create a video from scratch, or perhaps take an existing video and run it through an image generation process to change it.
So like I say, we'll talk about those technologies more as we go through, particularly in the third and fourth sessions in this series. But it's well worth bearing in mind that generative AI is much, much broader than just ChatGPT. This is the first session, though, and it's a bit of an intro to these technologies, so we will start with ChatGPT, and we'll start with this analogy that I've been using for a little while now of the AI iceberg, because I think it captures a lot of how ChatGPT actually functions. Now, like any analogy it's got a few holes, but this one has been the most successful for me, and I really want you to start from the bottom up here and look at that big dataset that's sitting under the waterline. Just like an iceberg, the dataset is the unseen component of the large language model, which is the type of AI application that we're looking at at the moment.
So a large language model, or an LLM, contains this huge dataset, and that dataset comes from a variety of places. Now, some of that information is proprietary and we don't know where it comes from, and that's particularly true of language models owned by companies like Google. They've got the PaLM language model, and we don't know much about that training data. With OpenAI we know a little bit more, because they've included some of that information in previous releases of their documentation, but we don't know exactly what's in there. We do know that it has information from the Common Crawl, which is a repository of a vast swathe of the internet. So little bots go out and they crawl across the internet and scrape the text content, and that content is made publicly available, open source. The whole of Wikipedia is in there as part of that dataset.
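That crawl-and-scrape step can be sketched in a few lines. This is a toy illustration only, using Python's standard library and a hardcoded page rather than a live web request; real crawlers like the ones behind Common Crawl are vastly more sophisticated:

```python
from html.parser import HTMLParser

class TextScraper(HTMLParser):
    """Toy version of a crawler's text extractor: keeps visible text, drops markup."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = False  # True while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

# A page the crawler might have fetched (hardcoded here instead of a live request)
page = "<html><head><style>p{}</style></head><body><h1>AI</h1><p>Dates to the 1950s.</p></body></html>"
scraper = TextScraper()
scraper.feed(page)
scraped_text = " ".join(scraper.chunks)
print(scraped_text)  # "AI Dates to the 1950s."
```

The point is that only the raw text survives: everything else about the page, including who wrote it and why, is stripped away before it lands in the dataset.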
So if you can imagine how big Wikipedia is, and cast your mind back, if you've been in education as long as I have, to when Wikipedia first came out and that was going to destroy the world and destroy education. This technology now contains the whole of Wikipedia within it, and coding repositories like GitHub, and social media sites like Twitter and Reddit. So there's a whole chunk of text data in a model like ChatGPT. Other models have other datasets, so Stable Diffusion and Midjourney will have image data in their datasets, and some text data from labeled images. But the important thing to remember is that the datasets are tremendously large. We're talking billions and billions of instances of text in a GPT dataset. Now, when I get into the ethics part in a moment, we'll see why that dataset is problematic as well as beneficial.
But we'll start with the benefits. That dataset drives the language model, which in this analogy is the bit that's sitting above the waterline: the bit of the iceberg that we can see. A large language model is a neural network, a type of artificial intelligence that's trained on that huge amount of data sitting under the waterline. And it's trained to predict language and demonstrate knowledge based on the information in the dataset. So whatever's in that big chunk of iceberg under the waterline, the language model can draw on that information. It condenses and compresses that information, it looks for patterns, linguistic and syntactic patterns, in the data, and it can then do things with that data. And this is where that kind of magic analogy starts to creep in, because it does seem magical, but essentially it's working like a massively scaled-up version of the predictive text that you might be familiar with in phones and emails.
However, instead of looking at the text word by word, the way that these models work is by taking the context of the whole prompt, or the whole phrase, or the whole bit of text that you're working with. So it doesn't just look from one word to the next and predict the most likely next word. It actually looks at the whole context and predicts in chunks what the most likely parts of the text are. So it is using the neural network and the algorithm to make those predictions, and essentially it's totally probabilistic, totally derived from the algorithms that are in the model. Now, there is a bit of additional training that then goes on top to refine these models, and some of that training might result in applications like ChatGPT, which in my iceberg is the tiny little snowman on the top.
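That idea of whole-context, probabilistic prediction can be illustrated with a deliberately tiny sketch. The probability table below is invented for illustration; a real model learns distributions over tens of thousands of tokens from its training data rather than storing a lookup table, but the output is probabilistic in exactly this sense:

```python
import random

# Toy "language model": for a given context, a probability distribution
# over likely next words (numbers invented purely for illustration).
next_word_probs = {
    "the cat sat on the": {"mat": 0.7, "sofa": 0.2, "roof": 0.1},
    "once upon a":        {"time": 0.95, "midnight": 0.05},
}

def predict(context, rng):
    """Pick the next word probabilistically, conditioned on the whole context."""
    dist = next_word_probs[context]
    words, weights = zip(*dist.items())
    return rng.choices(words, weights=weights)[0]

print(predict("once upon a", random.Random(0)))  # "time" with this seed
```

Notice there is no notion of truth anywhere in this process: the model simply samples whatever is statistically likely given the context, which matters later when we talk about fabrication.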
Now the snowman actually came from ChatGPT itself. I'd used this analogy for a couple of months, and I just had a little flag sitting on top of the iceberg to mark where the ChatGPT application sits. And I actually put the analogy into ChatGPT and said, can you refine this analogy? And it suggested that it should itself be represented as a snowman. So take that as you will. But the rationale is that the little ChatGPT model is sort of carved out of the larger model as a refined version of the existing model. So in the case of ChatGPT, the iceberg above the waterline would be GPT-3 or GPT-3.5, and then ChatGPT is further refined and carved out of that language model. They do that through processes called fine-tuning, and they might employ humans to test some of that data, which I'll talk about shortly when we get onto the bias conversation. Or they might process it through various other machine learning tasks to train and refine that data.
So again, we're talking about demystifying AI, and that was a reasonably technical discussion of how these technologies work. But hopefully not too technical, because I do think it's really important that we acknowledge the technology that sits behind these models and start to get some understanding of them, particularly when we're talking about them with students. And there are a few features of this iceberg analogy which become really useful for explaining how these things work and why they work the way that they do. So you'll note that the sea there has got a little shark floating around in it. Now, that's not accidental. We're really going to stretch this analogy thin now, by the way, so bear with me. But if we take the ocean in this analogy to include the whole internet, we know that the internet is dark and full of horrors, we know that it's full of toxic and discriminatory content, or sharks, and we know that the internet really makes up the body of that dataset.
So that means that all of that, or much of that, harmful content from places like Twitter and Reddit and from the open parts of the internet makes its way into the dataset. It makes its way into that under-the-water iceberg level. So all of that dangerous and toxic content can be brought into the model. Now that creates all kinds of problems, as you can imagine. We know that the internet is a bit of a dark and terrifying place at times. We know that it's filled with content that is potentially discriminatory and harmful, and therefore these models end up containing a lot of that information. So that's my segue really into the ethical problems with some of these machines. But hold that iceberg analogy in your head as we're going forward, and it will help to explain some of these next slides.
So, on AI ethics: I've got a series of blog posts rolling out at the moment on my blog, which is just leonfurze.com/blog. I've been writing about nine areas of ethical concern with AI, and I probably could have written about 12 or 20 or 30 areas of concern, which is a worry. But in this session I'm going to focus on four areas: bias, environmental concerns, truth, and issues with copyright, because I think they're particularly pertinent to education and also to what ACMI does and promotes. The most pressing issue from my perspective is bias and discrimination. And the discrimination in these models, unfortunately, is baked into the dataset. So again, we'll come back to that iceberg analogy and the way that it scoops up or scrapes all of that information from the internet.
The majority of the web pages containing the data that gets scraped into the dataset are written from a white Western male perspective, and that's because the majority of the English language pages on the internet are written by white Western males. So the preponderance of the content on the internet that's written in English is written by people like me: white, middle-aged, English or American, or maybe Australian, New Zealand, Canadian, but the vast majority are white Western males. And that means that you've essentially trained a language model with a worldview where the majority of the content, and the estimates vary, but somewhere between 75 and 80%, comes from that perspective. Just like any worldview, just like a person's worldview, what goes in is what comes out. So all of the content that comes out of a model like GPT naturally comes from that perspective.
So that gives it a distorted worldview, a distorted perspective, which is geared to a particular voice, a particular point of view, a particular way of looking at the world. Now, that's one level of bias and potential discrimination, because we know that means the output from these machines is potentially going to discriminate against other races, potentially going to discriminate by gender, and might otherwise marginalize certain communities. The other part, and this goes back to the shark in that previous analogy, is that the deliberately harmful and toxic data on the internet also goes into the model. So whilst we have an inadvertent or accidental white Western male worldview, the dataset also contains deliberately toxic and harmful content. That might be Twitter threads or Reddit commentary in particular, those places on social media where people are being deliberately racist, deliberately antagonistic, sexist, ableist, misogynist and so on.
All of that data is in there as well. And in fact, so much of that data is in there that it can sway the output. Microsoft in 2016 trained a bot on the Twitter corpus, and within hours of release it was spouting extreme right-wing racist content, so they pulled the bot down and never tried the experiment again. So we have an inadvertent level of discrimination, we have an almost deliberate layer of discrimination, and then we've even got one more level of discrimination, unfortunately, which is that when the organizations behind these AIs try to remove or filter out some of that discrimination, they can actually make the problem even worse. In trying to filter out the bias, and in trying to make the worldview of these language models a little bit more equitable, companies like OpenAI have to put in guardrails. They have to train the machines specifically to act in a certain way.
One way that they do this is problematic: they use human labor to train and label toxic data. If you go online and search for a Time article that was published earlier this year, you'll see a case where companies like OpenAI had outsourced the labeling of toxic data to very low paid workers in the Global South, in countries like Kenya in particular. These workers were paid far below a Western minimum wage to manually flag and label data that was abusive and, in some cases, very traumatic, to the extent that some of those workers had to undergo counseling and couldn't continue in those roles. Now, companies like OpenAI and the outsourcing agency that provided those services will argue that they're providing fair and reasonable wages, that they're creating jobs, and all kinds of justifications, but at the end of the day we have humans labeling potentially harmful data in the interest of filtering that data out for the users of applications like ChatGPT.
Another problem with the way that these models are filtered is that they often use publicly available language filtering systems. One of them is called the List of Dirty, Naughty, Obscene, and Otherwise Bad Words, or something to that effect. They're not very imaginative with their names, but it's basically a publicly available list of censored words, curse words, swear words or improper words. But the people that compile these lists obviously have their own worldview and their own perspective on things, and we know that those word lists contain an inordinate number of words which would be used more in certain communities. So there are a lot of words in those banned word lists that might be used in chat rooms and social networks in the LGBTIQ communities that get flagged as inappropriate sexual language. We know that there are words associated with certain races and religions which get flagged as potentially terrorist or harmful language in those banned word lists.
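A toy sketch shows why context-free word lists cause this kind of over-flagging. The two "banned" words below are invented for illustration and not drawn from any real list, but the failure mode is the same: the filter matches words, not meanings or communities:

```python
# Naive filter in the style of a public banned-words list: it flags any
# message containing a listed word, with no sense of context or intent.
banned_words = {"attack", "bomb"}  # tiny illustrative list, not a real one

def is_flagged(message):
    """Return True if any banned word appears in the message."""
    return any(word in message.lower().split() for word in banned_words)

print(is_flagged("how to bomb a test"))      # True — an idiom, not a threat
print(is_flagged("heart attack first aid"))  # True — a medical question, flagged
print(is_flagged("genuinely harmless chat")) # False
```

Both of the flagged examples are perfectly innocent, which is exactly how vocabulary used legitimately within particular communities ends up censored.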
So even by trying deliberately to remove or filter out the bias and discrimination in these models, these companies can inadvertently further marginalize groups such as the LGBTIQ community or certain religious communities. And really, the problem is so deeply baked into these models that it's almost inescapable. We know that these models are informing technologies that are going to come into classrooms. We know that Microsoft bases its current iteration of technologies like Bing and Copilot on GPT-4. So the question is, what can we do about that in education? I don't want us to feel powerless in the face of this huge problem, and there are lots of things that we can do. So first of all, we have the conversation with students, and I'll come back to these recommendations later, but we talk to them about the fact that this bias exists.
We can be really clear about the worldview and the perspective that these technologies are coming from and the data they're drawing on. We warn them about the potential for harm that exists in these systems, and we can also start encouraging people to look for models which are more ethically minded from the ground up. So models do exist beyond GPT and PaLM, the big models by OpenAI and Google. One model is called BLOOM, and that is an open source model that's been compiled by researchers, data scientists, and the open source community.
And they've deliberately tried to use sources which don't infringe copyright, which aren't harmful, and which don't marginalize certain communities. So that research is happening right now: trying not just to mitigate the risks, and not just to put guardrails around the models, but to actually rebuild them from the ground up. So if you're interested in that angle, and in what I've just said around bias and discrimination, I would really encourage you to go and seek out models like BLOOM and open source communities like Hugging Face, which is a huge open source community, because there are people actively working on building better language models from the ground up.
The biggest risk that we have in education, I think, is that we will continue to use the well-established models that companies have spent all of this money training and have built into systems like Microsoft and Google, and that we won't get away from those models, which are inherently flawed. So I think we've got to come at it from a slightly different perspective.
Now, I'll go through the next three ethical areas at a slightly quicker clip, because they're not as pressing in education perhaps, but they are still very relevant. And students in particular, when I've worked with them, like discussing the environmental impact of AI. As an infrastructure and an industry, there's a huge cost to training a model like GPT. A model like GPT might consume as much energy as a small American town consumes in three months. Each use of something like ChatGPT can consume around a glass of fresh water in the process of cooling the equipment and running the infrastructure that sits on the servers behind these machines. They use an incredible amount of energy, in terms of compute, to train and run the models and to fine-tune and retrain them, which means that the graphics processors and the machinery and equipment running something like ChatGPT are really going madly, consuming a lot of energy and producing a lot of environmental waste in terms of carbon footprint.
So again, there's a lot of research out there that's starting to back this up now, but one really accessible resource that I highly recommend is Kate Crawford's Atlas of AI, which is a book from a couple of years ago. Kate Crawford is an Australian academic and researcher who has worked with Microsoft, and she's very knowledgeable and insightful on these technologies. What she talks about is not just the carbon footprint of AI and the cost of training the models, but the cost of the entire infrastructure behind them. So the rare earth minerals that go into the technologies that run the AI are often extracted from mines, again, in the Global South and the poorest parts of the world. Huge lithium mines across the world, including in Australia and America, are used to make the batteries that power the technologies that use these services.
So we've got a lot of artificial intelligence built into our technologies, like the iPhone, and all of the mining and extractive infrastructure that goes into making those technologies has a part to play in this conversation about AI. Again, keeping it attuned to what we do about this in education: we've got to promote a conscious use of artificial intelligence, not just a flippant or throwaway use. If we're using ChatGPT, we shouldn't just be using it to quickly knock off a couple of lesson plans because we're in a bit of a hurry or we've got an extra class to cover, so we get ChatGPT to do it for us. We should be consciously using these technologies to do really useful things. We know that they're powerful technologies, we know that they are useful, and I'm not suggesting that we stop using them or abandon them in some kind of Luddite fashion. But we do need to acknowledge that they have an impact, and we need, again, to look for ways to work against the impact of those technologies.
Now, just like BLOOM and the open source models that I mentioned with bias, there are things in the works to make these models more environmentally sustainable. There are activities by organizations like Google and OpenAI to neutralize the carbon footprint of these technologies, but there are also research communities working on making the models more efficient, which means they use less energy to run. So again, it's about balance. It's about finding ways to identify and work with the technologies in ways which are more ethical than perhaps we would otherwise do. In an education context, we've obviously spoken a lot since November about truth and academic integrity, around the idea that students are going to be cheating with these technologies. I tend to steer away from that narrative for a couple of reasons. First of all, I think it would be pointless to pretend that students aren't going to use these technologies.
I know that if I was a student, I would be using those technologies. If I was in secondary school and I was given a piece of homework that ChatGPT could do, I would probably get ChatGPT to do it for me. That means, really, that we have to reframe how we think about assessment with ChatGPT. We've got to look for opportunities for using ChatGPT that go beyond that, and I'll talk more in the second session in this series about how we can design assessment tasks which are a little bit more AI-proof, or maybe a bit more human-friendly if you want to frame it that way. But we know that the narrative around cheating has been really prevalent in the media, and that it's something that secondary, K-12 and higher education are really, really concerned about.
But in terms of truth, and I've subtitled this slide truth and academic integrity, truth has a few other meanings when it concerns artificial intelligence. First of all, there's this idea of hallucinations that you might have come across if you've read a bit around AI. A model like ChatGPT is designed to, as I mentioned earlier, take a chunk of text, process it against the dataset, and give the probabilistic outcome of what you are looking for as a user. What ChatGPT or a similar model doesn't have is something called ground truth. It's got no point to anchor itself to in reality, so it can completely spin out total gibberish. It will create facts, or things that seem like facts, and fabricate references, and that's because of how it's designed. Some people call this a glitch. I say that it's not a glitch; it's actually a design feature of these models, because they were never designed to provide truthful output.
So if you ask ChatGPT to write an essay and provide references, depending on which model you use, 3.5 or 4, some of the references might be accurate and some will be totally made up. What many people call hallucination, I like to just call fabrication, because I think "hallucination" anthropomorphizes a little bit too much and makes it sound like ChatGPT is thinking about something when it isn't. But those fabrications can be really compelling, and I think that's the biggest issue here, and that leads into the kind of fake news angle. The potential for these language models to create misinformation is huge, and I think that's a much more pressing concern than whether students are going to use these technologies to cheat. We've already seen these technologies, and I'm talking image, text, video and audio generation, being used in the context of things like deepfakes to create likenesses of political figures.
So you may have seen Donald Trump getting arrested; that was a Midjourney image. You may have seen a fiery explosion near the Pentagon, which very briefly knocked $5 billion off the US stock market, and which was again generated in Midjourney. But we've also got generative AI now that can really effectively emulate a person's voice. So if I was to record 10 or 15 minutes of my voice, or use the recording from this session, and put it into any of the applications which are commercially available at the moment (one called Descript springs to mind), it will generate a very, very accurate AI version of your voice. We've seen this already coming into play. We've seen right-wing political agitators in the US creating videos of a post-apocalyptic sort of Biden re-election future as propaganda, which have included deepfaked audio of Joe Biden saying things that he never said.
I've seen articles in the media where journalists have deliberately created clones of their own voices and then used them to access mobile phone and telephone banking services. So when the bank gives you voice identification, because your voice ID is like your fingerprint according to the bank, that's no longer true, because these deepfake AI voices can bypass those security features. In the run-up to the 2024 elections in the US next year, we are going to see a huge amount of deepfaked visuals, video, audio and text. And so as a society, almost, we're going to have to learn how to deal with that and how to recognize when these AI systems are being used to create this content.
Now, I'm not monitoring the chat at the moment, but I can see there's a couple of questions in there, and this may well be one of them. But one big question around all of this is: can AI content be detected? And from my point of view, it's a no. And that's now backed by some research which is starting to come out. Obviously this is very new territory, so it takes a little while for peer-reviewed research to come out, but there are a couple of articles coming out now that have tested software like Turnitin and GPTZero, which you may have heard of. And they're finding that these tools are easy to get around, particularly with GPT-4, the latest model, and that they also produce a lot of false positives. So I certainly wouldn't be encouraging any teachers to use something like Turnitin or GPTZero as a means of catching a student or of challenging a student's work.
Those technologies could be used as part of an academic integrity discussion. So I know that my university, through my PhD, uses Turnitin and encourages you to run your own content through it, to look at the likelihood of it being AI-generated and to reflect on that. But it's not used in a punitive way; it's used more as a way of saying, hey, look, this looks like it may have been generated by AI. Maybe it was, maybe it wasn't. If not, get on with your life. If it was, then maybe consider adjusting how you're using that technology.
And the last area that I want to talk about, and I'll return to this one in the final session around the future of video, the arts and video games, is the issue of copyright. So that huge dataset, the huge below-the-waterline part of the iceberg, contains a lot of data which hasn't necessarily got copyright clearance. Although the text data behind ChatGPT is ostensibly openly available data (the Common Crawl, for example, doesn't go behind paywalls), you could still very validly argue that it is intellectual property. For example, I've got a blog, like I mentioned. All of the text on my blog is publicly available, and that means it could be scraped by the Common Crawl and go into a dataset. Now, do I want my intellectual property as part of a dataset, and do I want a future model to be able to respond to "write me a blog post in the style of Leon Furze" by drawing on that? Possibly not, as a creator.
That scales up by an order of magnitude, I think, when we get to the visual generators, the image generation, because those datasets pull photography and art from places like Flickr and ArtStation with no recognition of the original artists. And because the images are labeled, that allows you to generate an image in the style of another artist, who may be living or dead, another photographer, a cartoonist, an animator, a computer graphics designer, anybody who's got data in that dataset.
Now there are a lot of copyright cases going on across the world. There are a lot of cases being brought against companies like OpenAI at various levels in various courts. As Australian copyright law currently stands, and to an extent European law, the collection of data by scraping is covered under research purposes and is seen as legitimate and not an infringement of copyright. As for whether the images that come out of a generator can themselves be copyrighted, that's also up for debate. Under Australian copyright law, you cannot copyright an image made by something like Midjourney, because it doesn't count as having significant enough human input. So that's where we are at the moment in Australia.
So just before the Q and A for this session, I'll wrap up by talking about AI and education again. Having given some of the ethical concerns and some of the lay of the land of how these technologies work, it's really important to acknowledge that AI is already present in lots of applications and in many forms. If you're using a learning management system, it will probably have an AI feature in there, or more than one. If you use an iPhone, you're using AI daily: in location services, in predictive algorithms, in music, in Spotify, in Netflix, all of those apps. We've also got integration coming next month, in fact in a few days, with Microsoft Copilot, which will be across the whole of the Windows 11 operating system. It will come into Office, it will come into Teams. And Google Workspace is getting Duet, which will go into Google Sheets, Google Docs, Gmail and other platforms.
So whether we like it or not, whether we think it's ethical or not, these technologies are coming into our classrooms. In the next sessions, we'll talk about some of the affordances and the potential of these technologies as well. I'd like to ground the discussion in the murky ethical side of things because I think it's good to start with a critical mindset, but we do need to also look for opportunities for ways to use these technologies positively. So we'll talk about teacher workload, we'll talk about personalization and the possibilities of brainstorming and creativity, and we'll talk about the ways that these could potentially reduce barriers to accessing creative processes for making visuals, text, videos, and all sorts of media.
So I will open the floor now to questions, and I think Zoe from ACMI is going to put a couple of questions to me. If you have questions, drop them into the Q and A. If we don't get to them, then I'll be addressing them in between sessions.
Zoe McDonald: I've got a few questions for you, Leon. The first one is: where did you source the graphics for your presentation?
Leon Furze: Yeah, absolutely. I'm not sure if I mentioned this briefly at the start, but most of the graphics were generated in Midjourney. This graphic here is a stock image from Canva, and these background graphics were generated in Midjourney, partially to demonstrate some of the capabilities of these technologies. Those images were generated as part of the series of blog posts that I mentioned, so if you check out the blog, you'll see some of the detail of what goes into generating those images. Zoe, I've just lost your audio there.
Zoe McDonald: Oops, sorry. Can you hear me now? Beautiful.
Leon Furze: Yes. Gotcha.
Zoe McDonald: Just another question: Bloom does not seem to be offering opportunities to use it; requesting a demo leads to a "page not found". She's popped the link in there too. Can you recommend any other more ethical options if you can't access that one?
Leon Furze: Yeah, Bloom's been hit or miss because it is mostly a research project. But Hugging Face, which is a community-led open source platform, has a whole hub where you can access various different models. There are actually hundreds of language models under development from different places, all built on top of different kinds of technologies. Now, Hugging Face, again, is not very user-friendly, and this is where we're going to fall into a trap in education, I think. Because ChatGPT is so easy to use, and because we're already using Microsoft and Google products, we will almost out of necessity end up using those much easier-to-use tools. I just think it's important to be aware that others are available. So Hugging Face, which has got a little hugging face emoji, is an open source community of people developing AI in all manner of modes.
Zoe McDonald: I have another question. How do you recommend referencing AI generated work?
Leon Furze: Yeah, lots of universities are now coming out with processes and policies, so I would look to higher education institutes. My university, Deakin, for example, says not to cite ChatGPT as a primary source, but to acknowledge if you've used it as a tool. And if you use it for research purposes and it gives you citations, check them, go to the original source and cite the primary source. APA 7, which is the referencing system that I use, has updated its guidelines, and Harvard and MLA have as well. So if you go to the APA, Harvard or MLA websites, they have guidance for citing as well.
Zoe McDonald: Thanks, Leon. Another question. Given how little students have typically cared about the biases and issues with info sources like Wikipedia, do you think there will be any difference with ChatGPT, even if educators do tell them about the problems?
Leon Furze: It's an interesting question, and I think sometimes with digital technologies we can bang the pot as loudly as we like, and some of it sinks in and some of it doesn't. I think that really it's our responsibility to put it out there, and then what the students choose to do with that knowledge is ultimately their decision; we can only do what's in our circle of influence. But certainly in my experience, and I've been working with a lot of schools, I do a lot of consulting with schools and I've been working on AI policy and things like that, so I've done a lot of student forums recently, and when I talk to students, they are interested once you give them a bit of information, particularly around the hegemony, the white Western male perspective that I mentioned, and some of the marginalization that these models can contribute to.
Zoe McDonald: We've got time for just one quick question, and this may be something you address next session, Leon. As writing teachers, how do you recommend we integrate these tools into our classes?
Leon Furze: Yeah, it's a great question, and it is the one that I'll focus on in the next session on June the 14th, so that's a good segue. I am doing a lot of work in this area because I'm a former English teacher and I'm on the Victorian Association for the Teaching of English council. My main recommendation is to get in there and experience using these tools for yourself as a teacher before going anywhere near using them in a classroom. And then in the future sessions we'll talk about some of those things like brainstorming, creativity, how they can be used for editing and how they can complement some of what we already do in the writing classroom.
Zoe McDonald: Wonderful. Thanks, Leon, and thank you so much to everyone for joining us this afternoon. It's been really eye-opening, and definitely a lot of pertinent issues are coming up for all of us. The next session, which Leon has up on his screen here, is Using ChatGPT in Education, on Wednesday the 14th of June, 3:45 to 4:35, so the same time slot. We'll be sending out a Zoom link closer to the date, a couple of days before, and then a reminder on the day so you'll be able to click straight in and join us. We look forward to seeing you then, and thanks so much, Leon, for your time as well.
Leon Furze: Thank you very much and thank you everybody in the crowd. We'll see you soon.