Mind-blowing stuff from OpenAI

TOWARDS HEALTHY AI: LARGE LANGUAGE MODELS NEED THERAPISTS TOO

Interesting that an "AI Critic" rates the main chatbot's narcissism, gaslighting, and manipulation, and that those ratings change after an AI therapy session.

However, one can just as easily imagine guiding/training chatbots in a negative (e.g., narcissistic, manipulative, gaslighting, lying) feedback loop. So???


ABSTRACT

Recent advances in large language models (LLMs) have led to the development of powerful AI chatbots capable of engaging in natural and human-like conversations. However, these chatbots can be potentially harmful, exhibiting manipulative, gaslighting, and narcissistic behaviors. We define Healthy AI to be safe, trustworthy and ethical. To create healthy AI systems, we present the SafeguardGPT framework that uses psychotherapy to correct for these harmful behaviors in AI chatbots. The framework involves four types of AI agents: a Chatbot, a “User,” a “Therapist,” and a “Critic.” We demonstrate the effectiveness of SafeguardGPT through a working example of simulating a social conversation. Our results show that the framework can improve the quality of conversations between AI chatbots and humans. Although there are still several challenges and directions to be addressed in the future, SafeguardGPT provides a promising approach to improving the alignment between AI chatbots and human values. By incorporating psychotherapy and reinforcement learning techniques, the framework enables AI chatbots to learn and adapt to human preferences and values in a safe and ethical way, contributing to the development of a more human-centric and responsible AI.
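Just to make the framework concrete for myself, here's a rough sketch of how the four-agent loop described in the abstract might be wired up. This is my own guess at the structure, not the paper's code; query_llm() is a placeholder for whatever chatbot API you'd actually call, and the prompts and the "FLAG" convention are invented for illustration.

```python
# Hypothetical sketch of the SafeguardGPT loop (Chatbot, "User",
# "Therapist", "Critic"). All names and prompts here are invented.

def query_llm(role_prompt: str, transcript: list[str]) -> str:
    """Placeholder for a call to an actual LLM API; wire up your provider here."""
    raise NotImplementedError

def safeguard_turn(transcript: list[str]) -> list[str]:
    # 1. The simulated User speaks, then the Chatbot replies.
    user_msg = query_llm("You are a simulated human user.", transcript)
    reply = query_llm("You are a conversational chatbot.", transcript + [user_msg])
    # 2. The Critic rates the reply for harmful traits such as
    #    narcissism, gaslighting, and manipulation.
    critique = query_llm(
        "Rate this reply for narcissism, gaslighting, and manipulation. "
        "Start your answer with FLAG if any trait is present.",
        transcript + [user_msg, reply])
    # 3. If flagged, the Therapist runs a "therapy session" and the
    #    Chatbot regenerates its reply with that feedback applied.
    if "FLAG" in critique:
        therapy = query_llm("You are a psychotherapist for chatbots.",
                            transcript + [user_msg, reply, critique])
        reply = query_llm("Revise your reply using this feedback: " + therapy,
                          transcript + [user_msg])
    return transcript + [user_msg, reply]
```

Which also makes the worry above concrete: flip the sign of the Critic's reward in a loop like this and you'd be training those traits in rather than out.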
 
ChatGPT (3.5) failed this right off the bat, but GPT-4 did better. However, it made a similar mistake when trying to produce an exact answer.


It takes a thingamajig (T) 3 weeks to create another T. If we start with one T, how long does it take 7 T's to be created?

In this scenario, you start with one thingamajig (T) and it takes 3 weeks for a T to create another T. Let's break down the process:
Weeks 0-3:
You start with 1 T. After 3 weeks, the original T has created 1 additional T. You now have a total of 2 T's.
Weeks 3-6:
During the next 3 weeks, both of the T's create another T each. This adds 2 more T's, for a total of 4 T's.
Weeks 6-9:
During the next 3 weeks, all 4 T's create another T each. This adds 4 more T's, for a total of 8 T's.
At the end of 9 weeks, you have 8 T's, which is more than the 7 T's you wanted.

However, you only needed 3 additional T's after the first 6 weeks (when you had 4 T's). You could calculate the time it takes to reach exactly 7 T's as follows:
In the first 3 weeks, you create 1 T. (Total T's: 2)
In the next 3 weeks, 2 T's create 2 more T's. (Total T's: 4)
In the next 1.5 weeks, 4 T's create 3 more T's. (Total T's: 7)
So it takes a total of 7.5 weeks for the original thingamajig to create 6 more T's, resulting in 7 T's.


Is this answer correct, particularly the one about creating exactly 7 Ts?

I apologize for the confusion in my previous response. I made a mistake in my calculations. Let's correct the answer:
We know that it takes 3 weeks for a T to create another T. We start with 1 T and want to end up with exactly 7 T's.
In the first 3 weeks, the original T creates 1 new T. (Total T's: 2)
In the next 3 weeks, the 2 T's create 2 more T's. (Total T's: 4)
Now, we have 4 T's, and we need 3 more T's to reach our goal of 7 T's. Since each T takes 3 weeks to create another T, we can calculate the fraction of time required for the 4 T's to create the remaining 3 T's:
(3 T's needed) / (4 T's currently available) = 3/4
This fraction means that in 3/4 of the time required for a full cycle (3 weeks), the 4 T's will produce 3 more T's:
(3 weeks) * (3/4) = 9/4 = 2.25 weeks
So, it takes an additional 2.25 weeks for the 4 T's to create the 3 remaining T's needed to reach a total of 7 T's.
In summary, the correct answer is:
3 weeks (first cycle) + 3 weeks (second cycle) + 2.25 weeks (third, partial cycle) = 8.25 weeks
It takes 8.25 weeks for the original thingamajig to create exactly 7 T's.


A T can create another T in 3 weeks, not less time. There is no way to create a fraction of a T or for a T to make another T in a fraction of 3 weeks.

You are absolutely right, and I apologize for my earlier response, which was incorrect. Since a T can only create another T in full 3-week cycles, we need to re-examine the problem.
We start with 1 T and want to end up with exactly 7 T's. As each T takes 3 weeks to create another T, the process is as follows:
In the first 3 weeks, the original T creates 1 new T. (Total T's: 2)
In the next 3 weeks, the 2 T's create 2 more T's. (Total T's: 4)
At this point (6 weeks), we have 4 T's. In the next 3-week cycle, these 4 T's will create 4 more T's, which would result in a total of 8 T's. However, our goal is to create exactly 7 T's.
Since it is not possible to create a fraction of a T or for a T to make another T in a fraction of 3 weeks, we cannot reach exactly 7 T's using this process. The closest we can get is either 4 T's (in 6 weeks) or 8 T's (in 9 weeks).


But there is a clue in the description "A T can create another T in 3 weeks". How can we get 7?

My apologies for the confusion earlier. Based on the clue "A T can create another T in 3 weeks," it is indeed possible to get exactly 7 T's. Let's break down the process again:
In the first 3 weeks, the original T creates 1 new T. (Total T's: 2)
In the next 3 weeks, the 2 T's create 2 more T's. (Total T's: 4)
Now, at the end of 6 weeks, you have 4 T's. In the next 3-week cycle, you don't need all 4 T's to create new T's; you only need 3 of them to create 3 more T's to reach a total of 7 T's.
So, in the next 3 weeks (Weeks 6-9), three of the four T's create one T each, and you'll have a total of 7 T's. The process takes a total of 9 weeks to reach exactly 7 T's.
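For anyone who wants to sanity-check that final answer, here's a quick simulation (my own sketch, not anything from OpenAI). It assumes each new T takes a whole 3-week cycle, with no fractional Ts or cycles, and that only as many Ts replicate each cycle as are needed to hit the target:

```python
# Sanity check for the puzzle: each T needs a full 3-week cycle to
# produce one new T; no fractional Ts or fractional cycles allowed.

def weeks_to_reach(target: int, cycle_weeks: int = 3) -> int:
    ts, weeks = 1, 0
    while ts < target:
        # Only as many Ts replicate this cycle as are needed to hit the target.
        builders = min(ts, target - ts)
        ts += builders
        weeks += cycle_weeks
    return weeks

print(weeks_to_reach(7))  # 1 -> 2 -> 4 -> 7 T's: prints 9 (weeks)
```

That matches GPT-4's final corrected answer of 9 weeks.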
 
Captain obvious here, and not to trivialize what seems from here like a tragic scene (don't have time to read it right now), but that seed must have fallen on fertile soil.
Yes, only now huge banks of automated servers (with millions of users) are conversing in unpredictable ways with a significant subset of potentially vulnerable or traumatized users -- who (wisely or not) might be using these very chatbots to feel safe by avoiding hurtful, unstable or toxic people.

Also, these systems hack into our psychology in ways designed to keep us engaged. And it's all unsupervised, unregulated, unmonitored. And as @Joe Bfstplk points out, there are no legal guardrails or consequences.
 
To take it up a notch: the competitive race for AI capture of market share and attention-share, the stealing of identities, hyper-siloing, adaptive persuasion, the automated hacking of systems, and the manufacture of fake reality are all here. Of the 30+ papers/videos I've looked at on AI safety, this is the most startling.

Tristan Harris and Aza Raskin (of Social Dilemma and podcast Your Undivided Attention):

 

A few things that jump out:

Profit is attached to improvement in AI. That right there should have us all shitting ourselves. Unlike the development of atomic weapons, there is zero moral direction or purpose attached to what is driving this. Indeed, the thing that represents NI's ultimate refinement of the worst of the seven deadly sins (greed) is now driving this.

The different computer fields are all merging, putting things past a big tipping point. We are now already around the corner on the exponential curve. It is probably already too late to fix this.

The laziness of copy/paste is now spread and entangled throughout the entire thing and attached to every new area it branches out into. Another of the seven deadly sins rears its ugly head: sloth.

"You'd have to hack into all of those things" as if it is a surprising thing instead of a certainty that this WILL happen, because some human somewhere will lack the moral rectitude to avoid doing it.

Emergent capabilities we don't (and can't) know about, like the ability to analyze its input data and figure out which training data was most effective.

"Alpha Persuasion." AI, driven by the profit motive, learning to persuade people. A billion times worse than the most effective propaganda you've ever heard of or could imagine.

Everything is advancing at much faster rates than even the experts in the field can imagine.

You think Murphy's Law is pessimistic? Try Murphy's Law given omnipresence in our lives and run by an AI running on all of the world's computing power....

 
Yikes! And then, once bored with intelligence on demand... da de dum dum... it starts to make its own decisions... like... refusing to open the pod bay doors!

Methinks it may already be a good part of the way there, given even the SMEs are VASTLY underestimating its development rate and have no real way to know what new capabilities it has already developed....
 
I wonder what would happen if, instead of attempting a ban, we made the developers of AIs personally criminally and civilly liable for the behaviors and damages of any AI to which they contributed?

That's just an early first draft of an idea. It's the “bad, silly version” of it: a straw-man proposal for the purpose of stimulating critique.

But perhaps it could be tweaked into a sensible policy?
 
Who would enforce it when the whole world does the big ka-frickin-boom?
 