mikelove Posted March 29, 2024 at 06:31 PM

First of all, my sincere apologies to @cncorrect for briefly banning them over AI-generated content, when, as I now realize, we don't actually have a policy about that.

Second, though, I do think we need to have a policy about AI-generated content, and I wanted to open up discussion of what that policy might look like.

I think we could probably all agree that simply typing a question into ChatGPT or whatever and then posting the response as one's own work shouldn't be allowed, and likewise any user who primarily posts AI-generated content is not contributing very much and probably should not be allowed to carry on doing that.

However, I can think of a number of cases where AI-generated content *that is cited as AI-generated content* in a larger discussion might make sense; for example, contrasting answers from multiple sources, or from sources that some users might not have access to.

Likewise, AI-generated *translations* seem like they probably ought to be OK - again, as long as they're cited as such - in the context of a larger discussion about the meaning of something-or-other. And of course the use of AI-generated translation is entirely appropriate for users posting in English when their native language is not English.

So the broad outline of what I'm thinking is that it's fine to use AI as a tool to answer questions, but you have to say that you're doing so, and you have to be contributing some original thoughts of your own too. And AI translation is pretty much always OK, but I don't think you should be pasting Chinese text into Google Translate, copying the English result, and passing it off as your own work. Does that make sense?

I also wonder if - in terms of maintaining the future health of the Chinese-Forums archives - it might make sense to have a specific tagging system of some kind for AI content?
anonymoose Posted March 29, 2024 at 10:08 PM

On 3/29/2024 at 6:31 PM, mikelove said:
"And AI translation is pretty much always OK, but I don't think you should be pasting Chinese text into Google Translate and copying the English result and passing it off as your own work."

I'm not sure exactly what you mean by this. I agree we don't want lots of AI-generated content that is of no interest or relevance to anyone. Having said that, I don't see a problem if people wish to use AI to help with their posts, whether that be translating from their native language into English, just improving the readability of their English, or actually discussing the AI-generated content as the topic of the conversation.

I agree it would be nice if people could cite that they are using AI-generated content where it is relevant. But at the same time, this is a public discussion forum, not an academic assessment board, so I think it would be a bit heavy-handed to insist that people declare their use of AI under threat of a ban - besides which, it would be difficult to police anyway.
becky82 Posted March 29, 2024 at 10:21 PM

We have a newly instated policy, comparable to what you mentioned, at the Chinese Stack Exchange (here).

The thing is, AI-generated content can be used to facilitate language learning (they're LLMs after all - the second L is "language"). They usually have a very broad knowledge of both English and Chinese, and are particularly good at some things (like generating large numbers of example sentences, or explaining the difference between a Chinese word and an English word) and particularly bad at other things (like pinyin and adding references). If you don't believe what your AI says, you can ask it to prove its claims (or use another method). I also find AI useful for quickly generating Python code or Unix commands for manipulating Chinese corpora (like lists of words and so on).

Personally I find it enriching to have multiple sources or points of view, and to compare where they agree and/or disagree. AI chatbots have been around for decades (such as XiaoIce), but only now has the quality become good enough to be useful. There's a bunch of open-access academic articles here about the role of ChatGPT in learning/teaching Chinese. Teachers are using ChatGPT to offload the more tedious teaching tasks, and students are using it for immediate assistance.

At the same time...

On 3/30/2024 at 2:31 AM, mikelove said:
"I think we could probably all agree that simply typing a question into ChatGPT or whatever and then posting the response as one's own work shouldn't be allowed, and likewise any user who primarily posts AI-generated content is not contributing very much and probably should not be allowed to carry on doing that."

Yeah, it's this mindless copy/paste usage that's problematic and needs moderating. At Stack Exchange, people would downvote such posts and the poster would automatically get blocked from posting.
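As a concrete (and entirely hypothetical) illustration of the kind of corpus-manipulation script becky82 describes asking an AI to draft, here is a minimal Python sketch that deduplicates a Chinese word list and drops entries containing no Chinese characters; the word list, function names, and the choice to check only the main CJK Unified Ideographs block are illustrative assumptions, not anything from this thread:

```python
# Minimal sketch: clean up a Chinese word list.
# Assumptions (for illustration only): the list is plain strings, and
# "Chinese character" means the main CJK Unified Ideographs block.

def is_cjk(ch: str) -> bool:
    """True if ch falls in the CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return "\u4e00" <= ch <= "\u9fff"

def clean_word_list(words):
    """Deduplicate (preserving order) and drop entries with no CJK characters."""
    seen = set()
    cleaned = []
    for w in words:
        w = w.strip()
        if w and w not in seen and any(is_cjk(ch) for ch in w):
            seen.add(w)
            cleaned.append(w)
    return cleaned

words = ["不但", "不仅", "不但", "hello", "不只", ""]
print(clean_word_list(words))  # → ['不但', '不仅', '不只']
```

This is exactly the sort of small, mechanical task where an LLM's draft is easy for a human to test and verify before use, which is the condition Lu attaches to it later in the thread.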
mikelove Posted March 29, 2024 at 10:34 PM

On 3/29/2024 at 6:08 PM, anonymoose said:
"improving the readability of their English"

Yes, sorry, I left that one out, but "AI as a way of making your own thoughts easier for others to understand" seems fine.

On 3/29/2024 at 6:21 PM, becky82 said:
"Yeah, it's this mindless copy/paste usage that's problematic and needs moderating."

This is the main thing I'm worried about, particularly with new users. As Chinese-Forums is independent and doesn't confer any reputation points / legitimacy that might enable you to post spam somewhere else, I suppose the incentive to do so here is limited, but nevertheless we do see quite a few attempted first posts that are obviously AI-generated.
Moshen Posted March 30, 2024 at 01:13 AM

I think it would be great if we could develop a consensus on uses of AI here that are clearly OK and uses that are clearly not OK. Anyone want to try to develop a first draft of OK and not-OK AI? Preferably in simple-to-understand bullet points. As AI develops, we'd have to revisit the policies, of course.
Tomsima Posted March 30, 2024 at 10:21 AM

I hope that first and foremost the forum is able to ensure that all original threads are started by humans. I've heard about places like Reddit getting threads started by AI in order to scrape information, which I think would undermine the positive, helpful atmosphere here.

C-F is a great place because it allows new students of the language to connect with an experienced community of learners who have been-there-done-that or are-there-doing-it-right-now. The exchange between advanced/intermediate and beginner students of the language is rewarding for both sides. While an AI coming in and posting could in theory provide helpful information, I fear it would actually undermine the sense and perception that we are connecting with a real community of people through this hub.
Lu Posted March 30, 2024 at 06:21 PM

Let's see... A first draft, I can edit this as we go:

- Yes: To articulate one's own thoughts more clearly.
- Yes: To generate a few example sentences, as long as this is relevant to the thread and not excessive in amount, and the poster mentions the sentences are LLM-generated.
- Yes: To generate other useful text. The poster should mention the text is LLM-generated and should stand behind the text being correct/good/effective. Becky mentions 'quickly generating Python code or Unix commands for manipulating Chinese corpora'; that sounds like the kind of thing that can be a useful addition, as long as that code has been tested by a person and found good.
- No: As an initial question, just for the sake of making a post.
- No: To generate an answer just for the sake of answering, when the poster doesn't actually know the answer themself.
Lu Posted March 30, 2024 at 06:36 PM

In some cases, I disagree with Mike.

* "Contrasting answers from multiple sources, or from sources that some users might not have access to": an LLM is not a source of information, it is a language generator. LLMs usually don't give a reliable citation/source for any information that appears in the text they generate. Unless this has already changed, LLM-generated text should not be used to contrast answers from multiple sources; the poster should just find the sources themself (or ask someone to do so for them).

* "Likewise, AI-generated *translations* seem like they probably ought to be OK - again, as long as they're cited as such - in the context of a larger discussion about the meaning of something-or-other": I disagree. LLM-generated translations are not a good source for determining the meaning of something. They are useful for making translations, just like Google Translate is, but not for reliably producing good and correct translations. If I make a post asking 'What does X mean', I don't want a reply that says 'Google Translate says it's Y', and likewise a reply of 'LLM says it's Z' is of little value to me.

In other words: for me, using an LLM to generate language/text is alright; using it to generate information is very much not okay.
mikelove Posted March 30, 2024 at 06:52 PM

I like that last line as a succinct summary - language/text generation OK, information generation not OK.

And sorry, that whole first post was very sloppy. By 'multiple sources' I was imagining a situation where it would be interesting to see several different chatbots' take on something, not so much as a source of information but as a way of evaluating them as tools (since, like it or not, a great many people have started using them for language learning); "here's how these 3 different chatbots describe the difference between 不但/不仅/不只, and then here's an actual human-written take," that sort of thing.
becky82 Posted March 31, 2024 at 01:03 AM

This topic has been discussed a lot over at Stack Exchange. These are the main problems mentioned there:

- Overwhelming: flooding the site with easily generated content, at the expense of other content.
- Deceptive: the reader naturally assumes contributions are human-made; copy/pasting ChatGPT without attribution is plagiarism. These posts can trick people into thinking non-experts are experts.
- Unconfirmed: the reader naturally assumes the author of a post believes its contents are correct (and has some ability to verify that correctness), which is probably not true for ChatGPT copy/pasted posts.
- Lazy: thoughtlessly copy/pasting ChatGPT without adding your own (human) contributions; it can feel dismissive and rude.
- Difficult to debunk: ChatGPT's faulty claims can take time to disprove, and can mislead people in the meantime. [This is less of a problem here, where we can just read the text and check it. It's more of a problem at, say, Stack Overflow, since there you need particular software in order to check.]

It also leads to secondary problems, like users wasting time identifying and moderating ChatGPT-generated content.
cncorrect Posted March 31, 2024 at 08:45 AM

@mikelove It doesn't matter. This is an interesting topic to discuss.

As for that instance: I wrote the answer from my own personal knowledge and thinking, but my English skills are not good enough, and I find it challenging to write long paragraphs in English. Searching for the basic background information about the songs is also time-consuming. So I decided to use the AI-generated song introduction, but I had checked the content and thought it might be helpful for the users. In my opinion, any content that can solve a problem, whether AI-generated or not, is good content and should be encouraged.

Is AI content bad for the SEO of the site? Do you have any insights?

I often need AI to revise my English writing to make it sound more natural, but I wrote this message without it. Do you find it difficult to understand me?
Guest realmayo Posted March 31, 2024 at 10:32 AM

I think many of us will have noticed a Chinese tendency to reach for the accepted knowledge and present it:

- "I'm going to X."
- "X is a beautiful place and has A, B and C."
- "Oh, have you been?"
- "No."

I think this is a perfectly normal human thing to do, but it tends to be taken further in China (or perhaps just in Chinese chat with foreigners) than I'm used to. I get it, it may just be an opening gambit, and perhaps it's partly coming from a modest reluctance to say "I think that, I think this, aren't I clever and original". But I don't particularly like it, and it strikes me that AI (specifically, @Lu's AI info-generation) is an extreme example of it, and I don't really see any benefit to it here at this stage.
Guest realmayo Posted March 31, 2024 at 10:42 AM

On 3/31/2024 at 4:45 PM, cncorrect said:
"In my opinion, any content that can solve a problem, whether AI-generated or not, is good content and should be encouraged."

It's not super important, but my feeling is that often people aren't looking simply to "solve a problem", but actually to hear what other people think the solution is. Your post about the songs seems like an interesting example: in your first sentence you used your own knowledge to answer the question, which is exactly what the OP wanted. But then the AI text that mostly followed was probably quite off-putting, because (a) it is AI text and (b) it makes general statements ("the song is characterised by...") which may or may not be how you feel about the songs personally.
Lu Posted March 31, 2024 at 12:23 PM

On 3/31/2024 at 10:45 AM, cncorrect said:
"I wrote this message without it. Do you find it difficult to understand me?"

Not at all, you write very clearly. I understood every sentence, and the overall post, very well.

On 3/30/2024 at 7:52 PM, mikelove said:
"by 'multiple sources' I was imagining a situation where it would be interesting to see several different chatbots' take on something, not so much as a source of information but as a way of evaluating them as tools [...]; 'here's how these 3 different chatbots describe the difference between 不但/不仅/不只, and then here's an actual human-written take,' that sort of thing."

Ah okay, I understand now. But I still disagree. If someone wants to know the difference between 不但/不仅/不只, an LLM is not the place to go for an answer, because an LLM generates language, not information. Give me the human-written take, give me a quote from a human-written textbook, give me an example sentence from the latest web novel you've been reading, or just tell me what your gut says: all of that is informative. Now if someone wants to compare different LLMs and the texts they generate for various prompts, they are free to do so, but I think they should do so elsewhere. This is Chinese-forums, not LLM-forums.

As someone wrote elsewhere, which I thought was illuminating: if you ask an LLM a question, the text it produces is not the answer to that question, it's what the answer to the question would look like. And LLMs are very good at creating texts that look like answers. But those are not actual answers.
Moshen Posted March 31, 2024 at 05:41 PM

Quote: "an LLM generates language, not information."

Quote: "LLMs are very good at creating texts that look like answers. But those are not actual answers."

This is such a crucial distinction, and I wish more people understood it. If you're looking for answers or facts, do not ask AI. It can only tell you what some fairly random people feel are facts.
Guest realmayo Posted April 1, 2024 at 05:17 AM

On 3/31/2024 at 8:23 PM, Lu said:
"if you ask an LLM a question, the text it produces is not the answer to that question, it's what the answer to the question would look like."

Sometimes I genuinely think that's what most (all?) human speech is also.
Baihande Posted April 1, 2024 at 07:38 AM

Yesterday I asked ChatGPT something like "Welche Hörnchen leben in Hangzhou am Ufer des Westsees?" (What squirrels live in Hangzhou on the shore of the West Lake? Note that Hörnchen in German has the double meaning of squirrel and anything crescent-shaped.) The answer was something like "In Hangzhou leben Grauhörnchen." (In Hangzhou there are grey squirrels.) "Welche Art Grauhörnchen?" (What kind of grey squirrels?) "Chinesische Grauhörnchen." (Chinese grey squirrels.)

If LLMs only give answers that look like an answer, it seems to me that ChatGPT is definitely more than just an LLM implementation. But when answering another user's question, maybe it should be indicated if an AI was used. Or wouldn't it even be better to suggest that the user ask an AI himself when looking for an answer? Then, if the AI's answer is surprising or incomplete, the results could be discussed in the forum.
Lu Posted April 1, 2024 at 02:23 PM

On 4/1/2024 at 9:38 AM, Baihande said:
"Yesterday I asked ChatGPT something like 'Welche Hörnchen leben in Hangzhou am Ufer des Westsees?' [...] If LLMs only give answers that look like an answer, it seems to me that ChatGPT is definitely more than just an LLM implementation."

Yes, that looks exactly like an answer to your question! ChatGPT is well-built like that. I'm not saying LLMs never generate an answer that is correct; the answers are mostly correct. But unless you already know the answer, you don't know whether the text they generate is in fact the correct answer to your question, or a partial answer, or just something that looks convincing. In this way, it's pretty similar to Google Translate, which is also an impressive piece of programming and a useful tool for some usages.
Guest realmayo Posted April 1, 2024 at 02:55 PM

On 4/1/2024 at 10:23 PM, Lu said:
"But unless you already know the answer, you don't know when the text they generate is in fact the correct answer to your question, or a partial answer, or just something that looks convincing."

Same with humans. But answers on this website are from humans who have registered an account, which I think makes all the difference (even if I'm not 100% sure how or why).
Lu Posted April 1, 2024 at 02:59 PM

The humans presumably actually understand the question and usually intend to give a correct answer. And if you ask a human 'How do you know that?', the human usually has a real answer for that ('I'm a native speaker and I always say it that way'; 'my teacher told me'; 'I read it in book XYZ', which is a real book one can check; etc.). Humans are not perfect, but LLMs are one derivation away from humans and thus less perfect.