论文部分内容阅读
Formally, we provide a generic analysis on the opportunistic spectrum access problem by casting the problem into the restless multi-armed bandit (RMAB) problem, one of the most well-known generalizations of the classic multi-armed bandit (MAB) problem, which is of fun- damental importance in stochastic decision theory. Despite the significant research efforts in the field, the RMAB problem in its generic form still remains open. Until today,very little result is reported on the structure of the optimal policy. Obtaining the optimal policy for a general RMAB prCognitive radio, first envisioned by Mitola, is the key enabling technology for future genera- tions of wireless systems that addresses critical challenges in spectrum efficiency, interference management, and coexistence of heterogeneous networks. The core concept in c:ognitive radio networks is opportunistic spectrum access, whose objective is to solve the imbalance between spectrum scarcity and spectrum under-utilization.
In the thesis, we address the fundamental problem of opportunistic spectrum access in a multi-channel communication system. Specifically, we consider a conununication system in which a user has access to multiple channels, but is limited to sensing and transmitting only on part of them at a given time. We explore how the smart user should exploit past observations and the knowledge of the stochastic properties of these channels to maximize its transmission rate by switching channels opportunistically.oblem is often intractable due to the exponential computation complexity. Hence, a natural alternative is to seek a simple myopic policy maximizing the short-term reward.
We start by conducting a generic analysis in Chapter 3 on the optimality of the myopic sensing policy where the user senses more than one channel each time and gets one unit of reward if at least one of the sensed channels is in the good state. Through mathematical analysis, we show that the myopic sensing policy is optimal only for a small subset of cases where the user is allowed to sense two channels each slot. In the general case, we give counterexamples to illustrate that the myopic sensing policy is not optimal.
Motivated by the above analysis, we then study the following natural while fundamentally important, question in Chapter 4 (for the homogeneous system consisting of i.i.d. channels) and Chapter 5 (for the heterogeneous system consisting of non i.i.d. channels): under what conditions is the myopic policy guaranteed to be optimal? We answer the above posed question by performing an axiomatic study. More specifically, we develop three axioms characterizing a family of functions which we refer to as regular functions, which are generic and practically important. We then establish the optimality of the myopic policy when the reward function can be expressed as a regular function and the discount factor is bounded by a closed-form threshold determined by the reward function. We also illustrate how the derived results, generic in nature, are applied to analyze a class of RMAB problems arising from multi-channel opportunistic access.
In Chapter 6, we further investigate the more challenging problem where the user has to decide the number of channels to sense iri each slot in order to maximize its utility (e.g., throughput). We formulate t.he corresponding optimization problem which hinges on the fol- lowing tradeoff between exploit.ation and explorat.ion: sensing more channels can help learn and predict the future channel state, thus increasing the long-term reward, but at the price of sacrificing the reward at current slot as sensing more channels reduces the time for data t.ransmission, thus decreasing the throughput in the current slot. Therefore, to find the optimal uumber of channels to sense consists of striking a balance between the above exploitation and exploration. After showing the exponential complexity of the problem, we develop a heuristic v-step look-ahead strategy which consists of sensing channels in a myopic way and stopping sensing when the expected aggregated utility from the current slot t to slot t+v begins to de- crease. In the developed strategy, the parameter v allows to achieve a desired tradeoff between social efficiency and computation complexity. We demonstrate the benefits of the proposed strategy via numerical experiments on several typical settings.
Finally, Chapter 7 concludes the thesis and outlines several important future research di- rections in this field. Note that despite the focus of this thesis in the domain of opportunistic communication, the problem formulation is applicable in many other engineering fields such as communication jamming, scheduling and object tracking. Hence the results presented in t,his t.hesis are generically applicable in a large range of domains beyond the scope of opportunistic spectrum access.