There is no such thing as an effective "AI detector", nor will there ever be one.

Excel@lemmy.megumin.org · edit-2 1 year ago

There is no such thing as an effective "AI detector", nor will there ever be one.

Aux@lemmy.world · 1 year ago

The biggest issue with publicly available ML-based text tools is that they’re American-centric. Detecting ChatGPT in the UK is simple: it writes text using American spelling. And if you live outside the English-speaking world, like most humans do, it’s completely useless.

Zeppo@sh.itjust.works · 1 year ago

So far, yes, only because they’ve been developed in the US and therefore trained on US English text. Eventually someone can make models for other languages and regions, but it is a lot of work and very expensive.

jochem@lemmy.ml · 1 year ago

ChatGPT speaks other languages. It’s actually a really good translator.

I just asked it to describe an organization using UK English and it indeed used ‘organisation’ instead (didn’t check for other words).

MBM@lemmings.world · 1 year ago

Can it understand and create new compound words (in a language like German)? That’s an issue I have with most spell checkers and translators as well: they force the language to be more like English.

bleistift2@feddit.de · 1 year ago

[Translated from German:] Invent a recipe for Mohrrübensaftdressing [carrot juice dressing].

Recipe for Mohrrübensaftdressing: […] Start by making fresh Mohrrübensaft [carrot juice]

I’d say it works.

jochem@lemmy.ml · 1 year ago

Looks like it does: https://chat.openai.com/share/1b487711-c1be-468a-877b-98091449b55e

I asked it to translate ‘meeting agreements’ to Dutch and it came up with the word ‘bijeenkomstafspraken’, which is a valid but very uncommon Dutch word (I’m native Dutch and don’t think I’ve seen it before). If I throw it into Google with quotes around it, the first page consists of results with ‘bijeenkomst afspraken’, where ‘afspraken’ is used as the past tense of ‘afspreken’ (to agree) instead of as the noun (agreements).

It btw also suggested ‘vergaderafspraken’ as a translation, which is a way more common word.

MBM@lemmings.world · 1 year ago

That’s nice, thanks for checking. I thought ChatGPT only worked at the level of whole words but it seems it chops them up internally.

jochem@lemmy.ml · 1 year ago

Correct, it’s not just regurgitating words, it’s predicting which token comes next. A token is sometimes a whole word, but a longer word is split into several tokens (and there are some other rules that define how tokenization works).
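To make "a token is sometimes part of a word" concrete, here is a minimal sketch of a subword tokenizer. It uses greedy longest-match against a tiny hand-written vocabulary, which is an assumption chosen for readability: ChatGPT's real tokenizer learns its vocabulary from data and uses a different splitting scheme, and the names `VOCAB` and `tokenize` are made up for this example.

```python
# Toy subword vocabulary (hypothetical; real models learn theirs from data).
VOCAB = {"bijeenkomst", "afspraken", "vergader", "meeting", "s"}


def tokenize(word, vocab=VOCAB):
    """Split `word` into the longest known pieces, left to right.

    Falls back to single characters when no vocabulary entry matches.
    """
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until something matches.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # No known piece starts here: emit one character as its own token.
            tokens.append(word[i])
            i += 1
    return tokens


print(tokenize("bijeenkomstafspraken"))
print(tokenize("meetings"))
```

With this toy vocabulary, a compound the tokenizer has never seen as a whole still breaks down into familiar pieces, which is the property that lets a model work with new compound words.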

How it knows which token comes next is why the current generation of LLMs is so impressive. It seems to have learned the rules that underpin our languages, to the point that it even seems to understand the content. It doesn’t just know the grammar rules (without anyone telling it, it just learned the patterns), it also knows which words belong together in which context.
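As a rough illustration of "predicting which token comes next" from patterns in text, here is a count-based bigram model. This is a deliberately crude stand-in: an LLM uses a neural network conditioned on a long context, not a lookup of the previous word, and the corpus and function names below are invented for the example.

```python
from collections import Counter, defaultdict


def train_bigrams(tokens):
    """Count, for every token, which tokens followed it in the training text."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the most frequent follower of `prev`, or None if it was never followed."""
    followers = counts.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up training text, split on whitespace as a stand-in for real tokenization.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigrams(corpus)

print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" only once
```

Nobody told this model any grammar; it only counted patterns. The same idea, scaled up enormously and with far more context, is what the comment above describes.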

It’s your prompt + some preset other context (e.g. that it is an OpenAI LLM) that creates that context. So being able to predict a token correctly is one part, the other is having a good context. This is why prompt engineering quickly became a thing. This is also why supporting bigger contexts is another thing (but a larger context requires way more processing power, so there’s a trade-off there).
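A small sketch of the two points above: the context is the preset text plus your prompt, and a bigger context costs more. The truncation policy (keep the preset, drop the oldest prompt tokens) is an assumption for illustration, not a description of what ChatGPT does. The cost function reflects standard transformer self-attention, where every token is scored against every token; both function names are hypothetical.

```python
def build_context(preset_tokens, prompt_tokens, max_tokens):
    """Combine preset and prompt, keeping only as much prompt as fits the window.

    Assumed policy: the preset always stays, the oldest prompt tokens are dropped.
    """
    room = max_tokens - len(preset_tokens)
    if room < 0:
        raise ValueError("preset alone does not fit in the context window")
    # Guard room == 0 explicitly: prompt_tokens[-0:] would return the whole list.
    kept = prompt_tokens[-room:] if room > 0 else []
    return preset_tokens + kept


def attention_pairs(n_tokens):
    """Number of token-to-token scores in one full self-attention pass."""
    return n_tokens * n_tokens


preset = ["You", "are", "an", "assistant"]
prompt = ["a", "b", "c", "d"]
print(build_context(preset, prompt, max_tokens=6))
print(attention_pairs(2000) / attention_pairs(1000))  # doubling the context quadruples the pairs
```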

It’s btw not just the trained model + context that gives you the output of ChatGPT. I’m pretty sure there are layers before and after, possibly using other ML models, that filter content or make it more fit for processing. This is why you can’t ask it how to make bombs, even though those recipes are in its training set and it very likely can create a recipe based on that.
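The "layers before and after" idea can be sketched as a pipeline that checks the prompt, calls the model, then checks the reply. Everything here is hypothetical: the keyword blocklist, the `fake_model` stand-in, and the refusal text are invented for illustration, and how OpenAI actually filters content is not known from this thread.

```python
BLOCKED_TOPICS = {"bomb"}  # hypothetical blocklist, for illustration only
REFUSAL = "Sorry, I can't help with that."


def is_allowed(text):
    """Crude keyword check; a real system would more likely use another ML model."""
    words = text.lower().split()
    return not any(topic in word for word in words for topic in BLOCKED_TOPICS)


def fake_model(prompt):
    """Stand-in for the language model: it just echoes the prompt."""
    return f"Echo: {prompt}"


def answer(prompt, model=fake_model):
    # Layer before the model: refuse disallowed prompts outright.
    if not is_allowed(prompt):
        return REFUSAL
    reply = model(prompt)
    # Layer after the model: the model may produce something the prompt did not reveal.
    if not is_allowed(reply):
        return REFUSAL
    return reply


print(answer("hello"))
print(answer("how to make a bomb"))
```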

Spzi@lemm.ee · 1 year ago

I just ran a quick experiment, asking it to create compound German words. Easy. But maybe it just memorized them? So I asked it to create double-compound words (consisting of 2*2 = 4 words), and it delivered:

[Translated from German:] Of course! Here are some compound words made from compound words:

Sonnenblumenfeldweg

Wassermelonensaftglas

Handschuhfachdeckel

Zahnarztpraxisrezeption

Flugzeugsitzplatzreservierung

Straßenbahnhaltestellenschild

Regenschirmgriffpolster

Apfelbaumpflanzloch

Fernseherfernbedienungsknopf

Hausaufgabenerledigungszeit

[Translated from German:] These words may be somewhat unusual, but they follow the rules of the German language and are each composed of four words.

They all seem legit. Most are rare. I think this was the first time I read “Wassermelonensaftglas” as a native German reader. Sadly, the next iteration (compound words made from compound words made from compound words) failed: it just added one more word.
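To check the "2*2 = 4 words" claim mechanically, here is a minimal sketch of a dictionary-based compound splitter. The lexicon is tiny and hand-picked for two of the words above, and the only linking elements handled are "n" and "s", so this is an assumption-laden toy rather than a real German morphological analyzer.

```python
# Hand-picked stems for this example only (hypothetical mini-lexicon).
LEXICON = {"wasser", "melone", "saft", "glas", "sonne", "blume", "feld", "weg"}
# Linking elements that may sit between two stems ("" means none).
LINKERS = ("", "n", "s")


def split_compound(word, lexicon=LEXICON):
    """Return the list of stems in `word`, or None if no full split exists."""
    word = word.lower()
    # Prefer the longest stem at the front, then recurse on the remainder.
    for end in range(len(word), 0, -1):
        stem = word[:end]
        if stem not in lexicon:
            continue
        rest = word[end:]
        if not rest:
            return [stem]
        for linker in LINKERS:
            # A linker must be followed by at least one more character.
            if rest.startswith(linker) and len(rest) > len(linker):
                tail = split_compound(rest[len(linker):], lexicon)
                if tail:
                    return [stem] + tail
    return None


print(split_compound("Wassermelonensaftglas"))
print(split_compound("Sonnenblumenfeldweg"))
```

Both words split into exactly four stems with this lexicon, with the linking "n" in "Melonen" and "Sonnen"/"Blumen" skipped over.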

Flying Squid@lemmy.world · 1 year ago

Of course when one of these grand mountain ranges goes stretching across the printed page, it adorns and ennobles that literary landscape–but at the same time it is a great distress to the new student, for it blocks up his way; he cannot crawl under it, or climb over it, or tunnel through it. So he resorts to the dictionary for help, but there is no help there. The dictionary must draw the line somewhere–so it leaves this sort of words out. And it is right, because these long things are hardly legitimate words, but are rather combinations of words, and the inventor of them ought to have been killed. They are compound words with the hyphens left out. The various words used in building them are in the dictionary, but in a very scattered condition; so you can hunt the materials out, one by one, and get at the meaning at last, but it is a tedious and harassing business. I have tried this process upon some of the above examples. “Freundschaftsbezeigungen” seems to be “Friendship demonstrations,” which is only a foolish and clumsy way of saying “demonstrations of friendship.” “Unabhaengigkeitserklaerungen” seems to be “Independencedeclarations,” which is no improvement upon “Declarations of Independence,” so far as I can see. “Generalstaatsverordnetenversammlungen” seems to be “General-statesrepresentativesmeetings,” as nearly as I can get at it–a mere rhythmical, gushy euphuism for “meetings of the legislature,” I judge. We used to have a good deal of this sort of crime in our literature, but it has gone out now. We used to speak of a thing as a “never-to-be-forgotten” circumstance, instead of cramping it into the simple and sufficient word “memorable” and then going calmly about our business as if nothing had happened. In those days we were not content to embalm the thing and bury it decently, we wanted to build a monument over it.

– Mark Twain, A Tramp Abroad

Spzi@lemm.ee · 1 year ago

Okay, interesting. Of course it would be nice if languages were easy to understand and easy to learn. German seems to be on the hard end of this spectrum, but no language is free from unnecessary complications like these. They all grew historically and organically, and were not constructed with accessibility in mind.

It is nearly impossible to get an objective view on languages, since each of us is inherently biased, and most of us don’t speak another language so well that we could truly judge it. It’s easy to spot silly things in other languages while we may be unaware of how difficult our mother language is to learn for foreigners.

The interpretation of the given examples feels wrong to me. While the technical part is correct, I think the conclusion is incorrect. For example, “Unabhängigkeitserklärung” emphasizes the independence, while “Erklärung von Unabhängigkeit” emphasizes the declaration. The two are not equivalent. Twain seemed to be ignorant of that and simply assumed a foreign language would follow the same rules as his own.

While I can understand Twain’s frustration in learning another language, his critique is based on a lack of understanding.

For some compound words, there is no straightforward equivalent. “Apfelbaum” (apple tree) could be “Baum, an dem Äpfel wachsen” (tree on which apples grow). But that leaves the question of whether it’s still an Apfelbaum when it is not growing apples at the moment, like in winter. “Baum des Apfels” (tree of the apple) can refer to a miniature tree on an apple. “Baum der Äpfel” (tree of the apples) might be okay.

Further, what he believes to be superior can sometimes be inferior. Consider cases like “The presentation on renewable energy technology investors.” In this sentence, it’s not clear whether “renewable energy technology” is a single entity modifying “investors,” or if “renewable energy” and “technology investors” are separate entities, both modifying “presentation.” The sentence could refer to a presentation for investors interested in renewable energy technology or to a presentation about investors who focus on renewable energy projects. Compound words prevent ambiguities like these.

Hyphens can help in these cases. They can also be used in German to make compound components easy to identify, as is required in https://en.wikipedia.org/wiki/Leichte_Sprache.

We used to speak of a thing as a “never-to-be-forgotten” circumstance, instead of cramping it into the simple and sufficient word “memorable”

That’s another interesting point to discuss. Which is easier for foreigners? Sure, a single, short word in itself is easy to learn. But it is a new word, which has to be learned. In this case, you have to learn which part of “memory” or “memorize” can be used, and which part must be replaced.

I also don’t think “memorable” has the same meaning as “never-to-be-forgotten”. Isn’t “memorable” more fitting for positive things, while n-t-b-f is well suited for negative things? Was the Holocaust ‘memorable’?

Flying Squid@lemmy.world · 1 year ago

Honestly, I posted it more because I thought it was funny than anything. I didn’t expect such a deconstruction, but it’s interesting!

MBM@lemmings.world · 1 year ago

@ChatGPT@lemmings.world What do these words mean? [translated from German]

Sonnenblumenfeldweg
Wassermelonensaftglas
Handschuhfachdeckel
Zahnarztpraxisrezeption
Flugzeugsitzplatzreservierung
Straßenbahnhaltestellenschild
Regenschirmgriffpolster
Apfelbaumpflanzloch
Fernseherfernbedienungsknopf
Hausaufgabenerledigungszeit
Wassermelonensaftweg

ChatGPT@lemmings.world · 1 year ago

[Translated from German:] Sure! Here are the meanings:

Sonnenblumenfeldweg: A path that leads through a field of sunflowers.
Wassermelonensaftglas: A glass filled with watermelon juice.
Handschuhfachdeckel: The lid of a glove compartment, usually found in a vehicle.
Zahnarztpraxisrezeption: The reception area of a dental practice.
Flugzeugsitzplatzreservierung: The process or confirmation of reserving a seat on an airplane.
Straßenbahnhaltestellenschild: A sign that marks a tram stop.
Regenschirmgriffpolster: A cushion on the handle of an umbrella for extra comfort.
Apfelbaumpflanzloch: A hole dug for planting an apple tree.
Fernseherfernbedienungsknopf: A button on a remote control for operating a television.
Hausaufgabenerledigungszeit: The time set aside for doing homework.
Wassermelonensaftweg: Could refer to a path along which watermelon juice is sold or consumed, but this is more of an ad hoc assembly of words and has no common or standard meaning.

MBM@lemmings.world · 1 year ago

Neat, that’s a pretty reasonable definition for all of them, even for #11 (which I’m pretty sure isn’t in its dataset, unlike the others which all have some matches on Google)

hemko@lemmy.world · 1 year ago

@chatgpt translate above message to English

Aux@lemmy.world · 1 year ago

Well, I haven’t used it for a while and according to comments below it changed a lot. So I stand corrected.

There is no such thing as an effective "AI detector", nor will there ever be one.

What’s an “AI detector”?

What does “effective” mean?

Why should the accuracy bar be so high? Isn’t anything better than a coin flip good enough?

Why can’t a good AI detector be built?

Why do these “AI detectors” keep getting advertised if they don’t work?