(“good” or “bad”) of each channel evolves as independent and
identically distributed (i.i.d.) Markov processes. In each time slot, a user with limited
sensing capability chooses one channel to sense and, based on the sensing result,
decides whether to access it. A reward is obtained whenever the user senses and
accesses a “good” channel. The objective is to design a channel selection policy …