• 0 Posts
  • 42 Comments
Joined 1 year ago
cake
Cake day: July 8th, 2023

help-circle
  • If you actually read the article you will see that they tested both allowing the students to ask for answers from the LLM, and then limiting the students to just ask for guidance from the LLM. In the first case the students did significantly worse than their peers that didn’t use the LLM. In the second one they performed the same as students who didn’t use it. So, if the results of this study can be replicated, this shows that LLMs are at best useless for learning and most likely harmful. Most students are not going to limit their use of LLMs for guidance.

    You AI shills are just ridiculous, you defend this technology without even bothering to read the points under discussion. Or maybe you read an LLM generated summary? Hahahaha. In any case, do better man.



  • And that’s the whole point of my comment, did you even read it? To summarize, there is currently a loophole in law that allows these bullshit arguments about it being different than straight up copying shit (though this haven’t been litigated yet, so it’s not yet clear if these arguments are actually valid). This means that while a person reading my AGPL code and copying it (without following the license) is 100% illegal, doing the same through an LLM may be legal. So this means that open source licenses can be bypassed by first training an LLM with the code and then extracting the code from the LLM. This is terrible for open source, and in general for anyone who wants to make a living from creating copyrighted work. So we should close this loophole, and I’m glad there is a push to close this through better laws. Even if these laws are comming from Disney, Sony, and all those awful companies.

    So again, what’s the point you are trying to make here? That we shouldn’t make these laws stronger to prevent this bullshit? I honestly don’t understand what you are trying to argue here, nothing of what you have said has anything to do with this conversation.




  • Nah my man, you are either brainwashed or are being paid hahaha. Is copyright a mess? Of fucking course, I haven’t meet a single person (except crazy ass libertarians funnily enough hahaha) that likes copyright. Are big corporations using copyright to exploit artists, create monopolies, and generally being dicks? Again, of fucking course.

    But anyone, like you, saying that we should just let AIs destroy copyright effectively is a fucking prick, that simple. And your agruments are dissingenous at best or outright lies. For example, just as big copyright holder companies are pushing to strengthen copyright law, the big tech companies are pushing for effectively destroying copyright through AI models. I have seen you pushing in multiple thread for open source models like that’s a solution. But if you were a serious person researching about the software open source community you would see that pretty much no one there agrees with your position because it would effectively destroy the copyleft open source licenses. After all, if an “AI” model, open source or not, is allowed to just “train” on my AGPL code and spit it back (with minor modifications at best) to an engineer in AWS that’s it for my project. Amazon will do the Amazon thing and steal the project. So say goodbye to any software freedom we have.

    And let’s be 100% clear here, this is not being pushed by the evil copyright holders like you seem to imply (and they are totally evil just to be clear hahahah). This is being pushed by the big tech companies and people like you spreading their propaganda. The fact that the copyright holders happen to be in the right this time is just a broken clock being right and all that, but it’s still good that they are pushing back to big tech. I do agree we have to keep an eye on them, the objective here can’t be to make copyright bigger, just to close the “loophole” that big tech companies are exploting to steal everything.

    People like you who want to destroy copyright without offering any alternatives to allow creatives to work in a market are either missinformed or just assholes. Again, of fucking course it’s not an ideal system, but going full kamikaze and just destroying any possibility for artists and creatives of making a living with their work is the most evil thing goung on right now, so bad that the big copyright holders happen to fall on the less bad side this time hahaha. And all for what? So people can be lied to by dumb chatbots? Or so people can create mediocre derivative “art” without putting any effort? Or so we can get mediocre code autocomplete that is subtly wrong all the time? Is fucking ridiculous.







  • Probably too late, but just to complement what others have said. The UEFI is responsible for loading the boot software thst runs when the computer is turned on. In theory, some malware that wants to make itself persistent and avoid detection could replace/change the boot software to inject itself there.

    Secure boot is sold as a way to prevent this. The way it works, at high level, is that the UEFI has a set of trusted keys that it uses to verify the boot software it loads. So, on boot, the UEFI check that the boot software it’s loading is signed by one of these keys. If the siganture check fails, it will refuse to load the software since it was clearly tampered with.

    So far so good, so what’s the problem? The problem is, who picks the keys that the UEFI trusts? By default, the trusted keys are going to be the keys of the big tech companies. So you would get the keys from Microsoft, Apple, Google, Steam, Canonical, etc, i.e. of the big companies making OSes. The worry here is that this will lock users into a set of approved OSes and will prevent any new companies from entering the field. Just imagine telling a not very technical user that to install your esoteric distro they need to disable something called secure boot hahaha.

    And then you can start imagining what would happen if companies start abusing this, like Microsoft and/or Apple paying to make sure only their OSes load by default. To be clear, I’m not saying this is happening right now. But the point is that this is a technology with a huge potential for abuse. Some people, myself included, believe that this will result in personal computers moving towards a similar model to the one used in mobile devices and video game consoles where your device, by default, is limited to run only approved software which would be terrible for software freedom.

    Do note that, at least for now, you can disable the feature or add custom keys. So a technical user can bypass these restrictions. But this is yet another barrier a user has to bypass to get to use their own computer as they want. And even if we as technical users can bypass this, this will result in us being fucked indirectly. The best example of this are the current Attestation APIs in Android (and iOS, but iOS is such a closed environment that it’s just beating a dead horse hahahah). In theory, you can root and even degoogle (some) android devices. But in practice, this will result in several apps (banks in particular, but more apps too) to stop working because they detect a modified device/OS. So while my device can technically be opened, in practice I have no choice but to continue using Google’s bullshit. They can afford to do this because 99% of users will just run the default configuration they are provided, so they are ok with losing the remaining users.

    But at least we are stopping malware from corrupting boot right? Well, yes, assuming correct implementations. But as you can see from the article that’s not a given. But even if it works as advertised, we have to ask ourselves how much does this protect us in practice. For your average Joe, malware that can access user space is already enough to fuck you over. The most common example is ransonware that will just encrypt your personal files wothout needing to mess with the OS or UEFI at all. Similarly a keylogger can do its thing without messing with boot. Etc, etc. For an average user all this secure boot thing is just security theater, it doesn’t stop the real security problems you will encounter in practice. So, IMO it’s just not worth it given the potential for abuse and how useless it is.

    It’s worth mentioning that the equation changes for big companies and governments. In their case, other well funded agents are willing to invest a lot of resources to create very sofisticated malware. Like the malware used to attack the nuclear program plants in Iran. For them, all this may be worth it to lock down their software as much as possible. But they are playing and entirely different game than the rest of us. And their concerns should not infect our day to day lives.


  • I figure since big tech spent quite a bit of money building those datasets and since they were built before the law, they will be able to keep using them as long as they don’t add anything new but I can’t be certain.

    This is a very weird assumption you are making man. The quoted text you sent above pretty much says the opposite. It says everyone who wants to train their models wirh copyrigthed data needs to get permission from the copyright holders. That is great for me period. No one, not a big company nor the open source community, gets to steal the work of people producing art, code, etc. I honestly don’t get why you assume all the data scrapped before would be exempt. Again, very weird assumption.

    As for ML algorithms having use, of course they have. Hell, pretty much every company I have worked with has used them for decades. But take a look at the examples you provided. None of them requires you or your company scrapping a bunch of information from randoms on the internet. Specially not copyrighted art, literature, or code. And that’s the point here, you are acting like all of that stops with these laws but that’s ridiculous.


  • So you are saying that content scraped before the law is fair game to train new models? If so it’s fucking terrible. But again, I doubt this is the case since this would be against the interests of the big copyright holders. And if it’s not the case you are just creating a storm in glass of water since this affects the companies too.

    As a side point, I’m really curious about LLM uses. As a programmer the only useful product I have seen so far is copilot and similar tools. And I ended up disabling the fucking thing because it produces too much garbage hahaha. But I’m the first to admit I haven’t been following this hype cycle hahahaha, so I’m really curious what the big things will be. You clearly know so much, so want to enligten me?



  • My man, I think you are mixin a lot of things. Let’s go by parts.

    First, you are right that almost all websites get some copyright rights when you post on their platforms. At best, some license the content as Creative Commons or similar licenses. But that’s not new, that has been this way forever. If people are surprised that they are paying with their data at this point I don’t know what to say hahaha. The change with this law would be that no one, big tech companies or open source, gets to use this content for free to train new models right?

    Which brings me back to my previous question, this law applies to old data too right? You say “new data is not needed” (which is not true for chat LLMs that want to include new data for example), but old data is still needed to use the new methods or to curate the datasets. And most of this old data was acquired by ignoring copyright laws. What I get from this law is that no one, including these companies, gets to keep using this “illegaly” acquired data now right? I mean, I’m pretty sure this is the case since movie studios and similar are the ones pushing for this law, they will not go like “it’s ok you stole all our previous libraries, just don’t steal the new stuff” hahahaha.

    I do get your point that the most likely end result is that movie studios, record labels, social media platforms, etc, will just start selling the rights to train on their data and the only companies who will be able to afford this are the big tech companies. But still, I think this is a net possitive (weird times for me to be on the side of these awful companies hahaha).

    First of all, it means no one, including big tech companies, get to steal content that is not theirs or given to them willingly. I’m particularly interested in open source code, but the same applies to indie art and any other form of art outside of the big companies. When we say that we want to stop the plagiarism it’s not a joke. Tech companies are using LLMs to attack the open source community by stealing the code under the excuse of LLMs being transformative (bullshit of course). Any law that stops this is a possitive to me.

    And second of all, consider the 2 futures we have in front of us. Option one is we get laws like this, forcing AI to comply with copyright law. Which basically means we maintain the current status quo for intellectual property. Not great obviously, but the alrtenative is so much worse. Option two is we allow people to use LLMs to steal all the intellectual property they want, which puts an end to basically any market incentives to produce art by humans. Again, the current copyright system is awful. But why do you guys want a system were we as individuals have to keep complying with copyright but any company can bypass that with an LLM? Or how do you guys think this is going to pan out if we just don’t regulate AI?


  • Maybe I’m missing something, but I don’t understand what you guys mean by “the river cannot be dammed”. The LLM models need to be retrained all the time to include new data and in general to get them to change their behavior in any way. Wouldn’t this bill apply to all these companies as soon as they retrain their models?

    I mean, I get the point that old models would be exempt from the law since laws can’t be retroactive. But I don’t get how that’s such a big deal. These companies would be stuck with old models if they refuse to train new ones. And as much hype as there is around AI, current models are still shit for the most part.

    Also, can you explain why you guys think this would stop open source models? I have always though that the best solution to stop these fucking plagiarism machines was for the open source community to create an open source training set were people contribute their art/text/whatever. Does this law prevents this? Honestly to me this panic sounds like people without any artistic talent wanted to steal the work of artists and they are now mad they can’t do it.



  • And Apple has earned any trust? Jesus christ people, like less than 2 months ago they were caught restoring “deleted” photos from iCloud to user devices hahahahaha. Of course fans were excusing them talking about disk sectors like that has anything to do with cloud storage being available accidentally hahahaha.

    But yeah, Apple cult followers will find a way to justify surrendering even more freedom to Apple with this BS for sure. And they will be paying top dollar for the pleasure hahahaha.


  • What a load of BS hahahaha. LLMs are not conversation engines (wtf is that lol, more PR bullshit hahahaha). LLMs are just statistical autocomplete machine. Literally, they just predict the next token based on previous tokens and their training data. Stop trying to make them more than they are.

    You can make them autocomplete a conversation and use them as chatbots, but they are not designed to be conversation engines hahahaha. You literally have to provide everything in the conversation, including the LLM previous outputs to the LLM, to get them to autocomplete a coherent conversation. And it’s just coherent if you only care about shape. When you care about content they are pathetically wrong all the time. It’s just a hack to create smoke and mirrors, and it only works because humans are great at anthropomorphizing machines, and objects, and …

    Then you go to compare chatgpt to literally the worst search feature in google. Like, have you ever met someone using the I’m feeling lucky button in Google in the last 10 years? Don’t get me wrong, fuck google and their abysmal search quality. But chatgpt is not even close to be comparable to that, which is pathetic.

    And then you handwave the real issue with these stupid models when it comes to search results. Like getting 10 or so equally convincing, equally good looking, equally full of bullshit answers from an LLM is equivalent to getting 10 links in a search engine hahahaha. Come on man, the way I filter the search engine results is by reputation of the linked sites, by looking at the content surrounding the “matched” text that google/bing/whatever shows, etc. None of that is available in an LLM output. You would just get 10 equally plausible answers, good luck telling them apart.

    I’m stopping here, but jesus christ. What a bunch of BS you are saying.