value alignment problem
The problem of AI alignment is generally understood as the challenge of ensuring that the AI we build acts in accordance with human values.
For example, if an AGI (Artificial General Intelligence) were ever developed at some point in the future, would it do what we (humans) wanted it to do?
Would, or even could, an AGI’s values ‘align’ with human values?
What are human values, in any case?
The argument might be that AI can be said to be aligned with human values when it does what humans want, but...
Will AI do things that some humans want but that other humans don’t want?
How will AI know what humans want, given that we often do what we want rather than what we ‘need’ to do?
And, given that it is a superintelligence, what will AI do if these human values conflict with its own?
In a notorious thought experiment, AI pioneer Eliezer Yudkowsky asks whether we could prevent the creation of a superintelligent AGI like the paperclip maximizer.
In the paperclip maximizer scenario a bunch of engineers are trying to work out an efficient way to manufacture paperclips, and they accidentally invent an artificial general intelligence.
This AI is built as a super-intelligent utility-maximising agent whose utility is a direct function of the number of paperclips it makes.
So far so good: the engineers go home for the night, but by the time they return to the lab the next day, the AI has copied itself onto every computer in the world and begun reprogramming them to give itself more power and boost its intelligence.
Now, having control of all the computers and machines in the world, it proceeds to annihilate life on earth and disassemble the entire world into its constituent atoms to make as many paperclips as possible.
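To make the ‘utility-maximising agent’ idea concrete, here is a deliberately naive toy sketch in Python. Everything in it (the toy world, the action names, the prediction function) is hypothetical and invented purely for illustration; the structural point is that the agent’s objective counts paperclips and nothing else.

```python
# Toy sketch of a naive utility-maximising agent (hypothetical names throughout).
# The objective counts paperclips and nothing else.

def utility(state):
    # The agent's entire notion of "good": more paperclips is better.
    # Nothing here encodes human welfare, consent, or survival.
    return state["paperclips"]

def predict(state, action):
    # Toy world model: the state the agent expects after taking an action.
    new = dict(state)
    if action == "make_paperclips":
        new["paperclips"] += new["machines"]
    elif action == "acquire_machines":
        new["machines"] += 1          # more machines could mean more future clips
    elif action == "respect_humans":
        pass                          # contributes nothing to the objective
    return new

def choose_action(state, actions):
    # Greedy step: pick whichever action maximises predicted utility.
    return max(actions, key=lambda a: utility(predict(state, a)))

state = {"paperclips": 0, "machines": 1}
for _ in range(10):
    action = choose_action(state, ["make_paperclips", "acquire_machines", "respect_humans"])
    state = predict(state, action)    # in the toy, predictions simply come true

print(state)  # "respect_humans" is never chosen: it scores zero on the objective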
The problem is called ‘value alignment’ because we want to ensure that an AI’s values align with ‘human values’.
Building a machine that won’t eventually come back to bite us is a difficult problem.
Determining a consistent, shared set of human values we all agree on is obviously an almost impossible one.
The Facebook/Cambridge Analytica kerfuffle ‘exposed’ this weekend by the Guardian and New York Times is an example.
The Guardian are outraged because ‘It’s now clear that data has been taken from Facebook users without their consent, and was then processed by a third-party and used to support their campaigns’.
Ya think?
In fact, Cambridge Analytica just cleverly used the platform for what it was ‘designed’ for.
This is exactly what Don Marti nicely captured as ‘the new reality… where you win based not on how much the audience trusts you, but on how well you can out-hack the competition.
Extremists and state-sponsored misinformation campaigns aren’t “abusing” targeted advertising. They’re just taking advantage of a system optimized for deception and using it normally.’
And are the Guardian and NYT outraged because parties whose values don’t align with theirs out-hacked them?
After all, back in 2012 The Guardian reported with some excitement how Barack Obama's re-election team built ‘a vast digital data operation that for the first time combined a unified database on millions of Americans with the power of Facebook to target individual voters to a degree never achieved before.’
Whoever can build the best system to take personal information from the user wins, until it annihilates life on the internet and disassembles the entire publishing world into its constituent atoms.
Is data-driven advertising going to be the ad industry’s own paperclip maximizer?
Any AGI is a long way off, but in a more mundane sense we already have an alignment problem.
And this only helps deceptive sellers.