Lecture 12: Learning by Reinforcement