shiza (@shizacharania)'s Twitter Profile
shiza

@shizacharania

finding the optimal policy for my perceived reward function | engineering, prev ai research

ID: 1454568176528896006

https://shizacharania.com | Joined 30-10-2021 21:57:48

395 Tweets

701 Followers

1.1K Following

shiza (@shizacharania)'s Twitter Profile Photo

many people say: let's stop capabilities research and work on alignment before advancing models further. but what would incentivize a company to switch if its competitors are also doing capabilities research? you can't just get all the companies to stop, right?

shiza (@shizacharania)'s Twitter Profile Photo


check out my march and april newsletter:
- working alongside a company on a consulting challenge
- speaking at and attending World Summit AI
- my exploration into climate resilient housing

preview.mailerlite.com/f5r9a0w1s4/221…
shiza (@shizacharania)'s Twitter Profile Photo

no matter how many regulations you have, alignment requires technical effort; this is where the real struggle is. deception and other alignment-related problems can't be solved by the government or companies helping with regulation, even though regulation is still important

shiza (@shizacharania)'s Twitter Profile Photo

my plan is to start putting out research paper videos in a few weeks - a good forcing function to get more technical, explain better, put myself out there, and test the math and AI knowledge I've been learning :)

shiza (@shizacharania)'s Twitter Profile Photo

how much does awareness even do if we already know that this is a problem? it's kind of frustrating that I see this happening with issues like climate change too (we already know it's a problem; the question is, what do we do now?)

shiza (@shizacharania)'s Twitter Profile Photo

this summer, I spent 4 weeks in Kenya with people from all around the world. to sum up my key insights (actualization of neglected problems, how low levels of diversity -> more stigma, service site project, and technological trends), check out my Substack post: propagatingforward.substack.com/p/i-know-that-…

shiza (@shizacharania)'s Twitter Profile Photo

I recently built a dual-axis solar panel (with Arduino) that optimizes its orientation to capture as much sunlight as possible. check it out: youtu.be/hsfcWSR9iwc
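The video doesn't include source code, so here is a minimal sketch of the usual dual-axis tracking logic, assuming four light sensors arranged in quadrants (top-left, top-right, bottom-left, bottom-right) and two servos on 0-180° ranges. Every name, threshold, and step size here is a hypothetical illustration, not taken from the actual project:

```python
def track_step(tl, tr, bl, br, pan, tilt, step=1, tol=10):
    """One control step of a dual-axis solar tracker.

    tl/tr/bl/br: light readings from the four quadrant sensors
    (higher = brighter); pan/tilt: current servo angles in degrees.
    Returns the adjusted (pan, tilt), clamped to the servo range.
    """
    # Horizontal error: brightness of the left half vs the right half.
    horiz = (tl + bl) / 2 - (tr + br) / 2
    # Vertical error: brightness of the top half vs the bottom half.
    vert = (tl + tr) / 2 - (bl + br) / 2

    # Nudge each axis toward the brighter side, with a dead band (tol)
    # so the servos don't jitter when the panel is roughly aligned.
    if abs(horiz) > tol:
        pan += step if horiz > 0 else -step
    if abs(vert) > tol:
        tilt += step if vert > 0 else -step

    clamp = lambda angle: max(0, min(180, angle))
    return clamp(pan), clamp(tilt)


# Left side brighter: pan nudges left, tilt stays put.
print(track_step(500, 300, 500, 300, 90, 90))  # → (91, 90)
```

On real hardware this loop would read the sensors via `analogRead`-style inputs and drive the servos each iteration; the dead band and step size trade tracking accuracy against servo wear.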

Richard Ngo (@richardmcngo)'s Twitter Profile Photo

Earlier this year I helped organize the SF Alignment Workshop, which brought together top alignment and mainstream ML researchers to discuss and debate alignment risks and research directions. There were many great talks, which we’re excited to share now - see thread.

Dan Hendrycks (@danhendrycks)'s Twitter Profile Photo


AI systems can be deceptive.
For example, Meta's AI that plays Diplomacy was designed to build trust and cooperate with humans, but deception emerged as a subgoal instead.

Our survey on AI deception is here: arxiv.org/abs/2308.14752
Chamath Palihapitiya (@chamath)'s Twitter Profile Photo

This is the best, most interesting and saddest (!) presentation I have witnessed at any conference. Hard to be all three. Bill Gurley torches the political class and calls out some superior forms of corruption he’s witnessed during his business career. Must. Watch.

shiza (@shizacharania)'s Twitter Profile Photo

been working on understanding the math behind ML (linear algebra + calculus) and put out two videos based on what I've learned so far: youtube.com/playlist?list=…

next up, I'll be implementing ML models from scratch + exploring interpretability and adversarial networks with CNNs

Jeffrey Ladish (@jeffladish)'s Twitter Profile Photo

It's not so much p(doom), the probability that we will all die from AI.

It's p(human control), the probability that humans will stay in control of systems much smarter than us.

People seem very optimistic they can hold power over systems that will be thinking faster, learning

Anthropic (@anthropicai)'s Twitter Profile Photo

AI assistants are trained to give responses that humans like. Our new paper shows that these systems frequently produce ‘sycophantic’ responses that appeal to users but are inaccurate. Our analysis suggests human feedback contributes to this behavior.

stevengongg (@stevengongg)'s Twitter Profile Photo

Had lots of fun building this tiny robot over the weekend at the microbots hackathon with vincent dubnubdubnub shiza!! We built a tiny differential drive robot that can be tracked through AprilTags and move cubes from point A to point B with a probe we didn't quite
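The hackathon robot's code isn't shown, but the differential-drive setup it describes has a standard kinematic model: the robot's forward speed is the average of the two wheel speeds, and its turning rate is their difference divided by the wheel separation. A minimal sketch of one pose-update step (the function name and parameters are illustrative, not from the project):

```python
import math

def diff_drive_step(x, y, theta, v_l, v_r, wheel_base, dt):
    """Advance a differential-drive pose (x, y, heading) by one time step.

    v_l, v_r: left/right wheel linear speeds (m/s);
    wheel_base: distance between the wheels (m); dt: step length (s).
    Uses the standard unicycle approximation for small dt.
    """
    v = (v_l + v_r) / 2                # forward speed of the robot centre
    omega = (v_r - v_l) / wheel_base   # turning rate (rad/s), +ve = left turn
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += omega * dt
    return x, y, theta


# Equal wheel speeds drive straight along the current heading:
print(diff_drive_step(0.0, 0.0, 0.0, 1.0, 1.0, 0.1, 1.0))  # → (1.0, 0.0, 0.0)
```

A tracking camera with AprilTags gives you the measured (x, y, theta) directly, so in practice a model like this is used for the control side: choosing v_l and v_r that steer the measured pose toward the target point.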