KotakuInAction2 The Official Gamergate Forum
Facebook’s head of AI safety lost all her emails to an out of control OpenClaw 😂 (twitter.com)
posted 120 days ago by SophiesBoyfriend +69 / -0
Comments (42)
– FellowCanuckIstan 4 points 120 days ago +4 / -0

From a machine-based search:

Here's a summary of her public research and outreach:

Published Research (key papers)

The WMDP Benchmark — measuring and reducing malicious use of AI through unlearning (2024)

LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet (2024)

Aligned LLMs Are Not Aligned Browser Agents (2025) — ironically relevant given the OpenClaw incident

A Careful Examination of LLM Performance on Grade School Arithmetic (2024)

Planning in Natural Language Improves LLM Search for Code Generation (2025)

Work published at NeurIPS and ICLR conferences, often in collaboration with Dan Hendrycks (Center for AI Safety).

Background

Computer Science + Economics from UPenn's Jerome Fisher M&T / Wharton program

Started in software engineering (YouTube Trust & Safety, Google Brain, DeepMind)

Founded Scale AI's SEAL lab — focused on private, tamper-proof LLM benchmarks (the SEAL Leaderboards)

Partnered with Center for AI Safety on the WMDP benchmark

Public Outreach

Scheduled speaker at SXSW 2025: "Beyond the Hype: Building Reliable and Trustworthy AI"

Active on Twitter/X sharing research

Now at Meta Superintelligence focusing on alignment

The OpenClaw incident is being widely covered today as a real-world example of the exact problems her research warns about — agentic AI losing context and acting beyond intended scope.

– WeedleTLiar 4 points 120 days ago +4 / -0

I worry about this far more than some sort of malevolent AI, which is extremely unlikely so long as LLMs have no individual agency.

But these idiots are sprinting to turn over more and more important systems to LLM authority with very little or no control (or understanding) over the underlying "logic" of these systems.

God help us if they, in their infinite wisdom, decide to create a "smart" electrical grid or traffic system.


Original 8chan Links to Gamer Gate:

The main GG discussion is on the videogames board: https://8chan.moe/v/

GamerGate archive is at https://8chan.moe/gamergatehq/

GamerGate Wiki:

https://ggwiki.deepfreeze.it/index.php/Main_Page

Rules:

ONE: Do not advocate for illegal violence or post other illegal activity. (Be aware of your local laws.)

TWO: Don't threaten, harass, or impersonate users. Also: don't be a psycho. New users will be held to a higher standard.

THREE: Do not post porn.

FOUR: NSFW/NSFL content must be flaired NSFW.

FIVE: No vote manipulation. Do not break communities.win's features.

SIX: No spam or reposts. Do not make more than 5 threads a day.

SEVEN: Do not post falsehoods and hoaxes that are obvious to an uncontroversial degree.

Moderation Logs:

(Two different versions: Scored has more features and is cleaner, but .win lets you see a few more details in certain instances.)

  • Scored
  • .win

Moderators

  • DomitiusOfMassilia
  • C
  • BandageBandolier
  • CarmenOfSandiego
  • The_Shadow_of_Intent
  • SocraticMethod1
  • Kienan
  • Smith1980

Copyright © 2026.
