next up previous
Next: About this document ...

23 March 2008

University of Cincinnati
Department of Electrical & Computer Engineering and Computer Science


20 ENFD 112 - Fundamentals of Programming

\framebox{\sc Laboratory 8: Random Variables, Functions}

Spring 2008


{\bb 1. Objective}


The objective is to practice the use of MATLAB random variables to better understand random variables, distributions, and their use in engineering problems.



{\bb 2. Background}


A random variable $x$ is a variable that can take any one of a set of values according to some probability distribution. In this lab we will make things very simple (no integration, for example) and suppose that random variables only take integer values so that in what follows, mention of $x$ without modifiers implicitly means $x$ is considered to be a random integer variable. We use the notation $Pr(x
= v)$ to denote the probability that $x$ has value $v$. If $V$ is the complete set of values that $x$ can take, we must have

$\displaystyle \sum_{v\in V}Pr(x = v) = 1$     (1)

which means it is certain that $x$ takes some value from its allowed set $V$. If $x$ takes all values in $V$ with equal probability, then we say that $x$ is uniformly distributed over $V$. For example, if $x$ is uniformly distributed over $V=\{1,2,...,1000\}$, then $Pr(x =
v) = 1/1000$ for any $1\leq v\leq 1000$ and $Pr(x = v) = 0$ for any $v < 1$ and any $v > 1000$.

Two important parameters of distributions are the mean and the standard deviation. The mean tells us something about the value we might expect for $x$ the next time it is assigned a value. The standard deviation tells us something about how close to the mean the next value is going to be. The mean of $x$ is defined as

\begin{displaymath}
\mu_x = \sum_{v\in V}v\cdot Pr(x = v)
\end{displaymath}

where we use $\mu_x$ as a shorthand representation of this sum in this and following labs. The standard deviation is defined as
$\displaystyle \sigma_x = \sqrt{\sum_{v\in V}(x-\mu_x)^2Pr(x = v)}.$     (2)

Think of $\sigma_x$ as the mean value of the random variable $(x-\mu_x)^2$. It is easy to see that if $x$ has a high probability of taking a value very close to $\mu_x$, then $\sigma_x$ is not going to be high.

It is straightforward to compute $\mu_x$ and $\sigma_x$ if $x$ is uniformly distributed between numbers $a$ and $b$ (that is, $V=\{a,(a+1),...,(b-1),b\}$). From equation (1) and the fact that there are $b-a+1$ numbers in $V$, $Pr(x = v) = 1/(b-a+1)$. Therefore, $\mu_x = \sum_{v\in V}v\cdot Pr(x=v) =
\sum_{i=a}^bi/(b-a+1)$. The sum $\sum_{i=a}^bi$ written out is

$\displaystyle a+(a+1)+(a+2)+...+(b-2)+(b-1)+b$     (3)

To see what this is, just pair the left $a$ with the right $b$ to get $(a+b)$, then pair the 2nd left $(a+1)$ with the 2nd right $(b-1)$ to get $(a+b)$ again. Pairing 3rd left and 3rd right terms gives $(a+b)$ again. If $V$ has an even number of numbers, We can do this pairing $(b-a+1)/2$ times, each time adding $(a+b)$. So, in total, we add $(a+b)*(b-a+1)/2$ if $\vert V\vert$ is even. If $\vert V\vert$ is odd, we get $(a+b)*(b-a)/2$ for the pairing total but there is a middle, unpaired term of $(a+b)/2$. Hence the total for $\vert V\vert$ odd is $(a-b)*(b-a+1)/2$ which is the same as for the case $\vert V\vert$ even. Therefore, the sum (3) is $(a+b)(b-a+1)/2$ and
$\displaystyle \mu_x = \sum_{v\in V}{i\over b-a+1} = {(a+b)\cdot (b-a+1)\over
2(b-a+1)} = {a+b\over 2}.$     (4)

In other words, $\mu_x$ is right in the middle of the range of values $x$ can take.

What about the standard deviation? If $x$ is uniformly distributed between $a$ and $b$ the standard deviation, from equations (2) and (4), is

\begin{eqnarray*}
\sigma_x = \sqrt{\sum_{i=1}^b\left(\left(i-{b+a\over 2}\right)^2\cdot {1\over b-a+1}\right)}
\end{eqnarray*}

A closed form equivalent expression can be worked out for $\sigma_x$ as was done for $\mu_x$. The result is $\sigma_x = \vert(b-a)\vert/\sqrt{12}$.

Now consider another distribution, called the triangle distribution. If $x$ has the triangle distribution from $a$ to $b$, then

\begin{eqnarray*}
Pr(x = v) = {2(x-a+1)\over (b-a+2)(b-a+1)}.
\end{eqnarray*}

In other words, the probability that $x$ has value $v$ rises linearly from $a$ until $b$ and is 0 outside that range. It is not hard to see that $\sum_{i=a}^bPr(x = i) = 1$ as required. It is also straightforward to compute $\mu_x = a+2(b-a)/3$. A value for $\sigma_x$ can be computed but the result is a rather complex expression and, since we will not need it, we will not show it.

Up to now we have considered integer random variables to avoid heavy duty math you may not have seen yet. But all the results above are similar to results obtained if $x$ were allowed to take non-integer values as well. So, from now on we lift the restriction of integer values.

For many applications the Gaussian distribution is the most important distribution available to us. Avoiding the very complicated definition of the Guassian distribution, it suffices for this lab to state the following amazing properties it has:

  1. It is completely characterized by its mean and its standard deviation. In other words, all I have to say is Gaussian with parameters $\mu_x$ and $\sigma_x$ and you know exactly what distribution I am talking about.
  2. In most cases, the sum of a collection of random variables taken from any distribution, integer or real-valued, tends to a Gaussian distribution.
It is the latter property, better known as the Central Limit Theorem, that makes the use and study of Gaussian distributions so important. We will investigate this more closely in this lab.

{\bb 3. Problems}


There are six problems.



{\bt 3.1 Uniform distribution}


Write an m-file that asks for a mean ($\mu_x$) and then generates $1000\cdot\mu_x$ random real values from the uniform distribution with $a=0$ and $b=2*\mu_x$. Use the MATLAB built-in function rand to generate the values. Create an array v of buckets so that, at the end, v(i) contains the number of values generated between the integer $i$, exclusive, and the integer $i+1$, inclusive. Then plot the result (the $y$ coordinates are the numbers in v(i) and the $x$ coordinates are bucket indices i). Make sure the $y$ axis takes a range of values from 0 to 1.2 times the maximum of v(i) over all i. See Figure 1 for an example output where the mean was set to 100.

\begin{figure}\centerline{\epsfysize=2.2in\epsfbox{Fig/sim1.ps}
\hspace*{6mm}\ep...
...t for problem 3.1.
\hspace*{22mm}Figure 2: Output for problem 3.2.}
\end{figure}



{\bt 3.2 Triangle distribution}


Repeat problem 3.1 for the triangle distribution. The distribution rises linearly to $(3/2)\cdot \mu_x$. Example output is shown in Figure 2 for a mean of 100. This is trickier than problem 3.1 because there is no triangle distribution generator of values as there was for the uniform distribution. But we can use the uniform generator to get triangle values using the following transformation:

\begin{displaymath}
v = (3/2)\cdot \mu_v\cdot \sqrt{x}
\end{displaymath}

where $x$ is uniformly distributed between 0 and 1, and $v$ is distributed according to the triangle distribution with mean $\mu_v$.

Major hint: Where you used $rand(1)*2*\mu_x$ in the first problem, now use $(3/2)*sqrt(rand(1))*\mu_v$.



{\bt 3.3 Gaussian distribution}


Repeat problem 3.1 for the Gaussian distribution except the input should include standard deviation. Example output is shown in Figure 3 for a mean of 100 and standard deviation 50. Use the MATLAB function randn which generates a normal distribution with mean 0 and standard deviation 1. To translate to a Gaussian distribution of mean $\mu_v$ and standard deviation $\sigma_v$ use:

\begin{displaymath}
v = \mu_v + \sigma_v\cdot x
\end{displaymath}

where $x$ has a normal distribution, and $v$ is distributed according to the Gaussian distribution with mean $\mu_v$ and standard deviation $\sigma_v$.

\begin{figure}\centerline{\epsfysize=2.2in\epsfbox{Fig/sim3.ps}
\hspace*{6mm}\ep...
...$.
\hspace*{8mm}Figure 6: Problem 3.5: 25 values, $\mu_{v_i}=400$.}
\end{figure}

Major hint: Just like before, where you used something like $rand(1)*2*\mu_x$, now use $\sigma_v*randn(1)+\mu_v$.



Major hint: $randn(1)$ gives a value from minus infinity to plus infinity. Therefore the vector you are keeping statistics in will never be able to hold all the values generated. The solution is an $if$ statement that check random values before you attempt to store them. For example,

   if rand_value > 2*mean
      high = high + 1;
   elseif rand_value < 1
      low = low + 1;
   else
      v(rand_value) = v(rand_value) + 1;  % This is where the stats are kept
   end
   ...
   disp('[High error=' num2str(high/npts) ' low error=' num2str(low/npts)]);



{\bt 3.4 Sum of uniformly distributed values}


Repeat problem 3.1 except this time each recorded value is the sum of values of uniformly distributed random values. Input from the user should be the number of values to sum and the mean of each of those values. Make several runs and generate graphs such as those shown in Figure 4, where the number of values summed is 100 and the mean of each value is 1, and Figure 5, where the number of values summed is 10 and the mean of each is 10. Observe that for larger means the distribution of the sum approaches that of a Gaussian distribution. Experiment to find the relationship between the mean and standard deviation of the resulting Gaussian distribution and the number of values summed and their means.



                                                                                                                              

Major hint: You might need an outside for loop to run the experiments and an inside for loop to sum a bunch of random variables.



{\bt 3.5 Sum of triangle distributed values}


Repeat problem 3.4 except this time the values to sum are distributed according to the triangle distribution. Input from the user should be the number of values to sum and the mean of each of those values. Figure 6 shows output when summing 100 values, each of mean 100. What is the relationship between the mean and standard deviation of the Gaussian distribution and the number of summed values and mean of each?



                                                                                                                              

Major hint: Just a straightforward change to the solution of the previous problem is needed.

{\bt 3.6 Craps}


Craps is played with 2 dice. Each die is a small cube, marked on its faces with spots from one to six. Craps is played as a sequence of betting rounds. The first roll of the dice in a betting round is called the come out roll. If the sum of the spots on the come out roll is 2, 3, 7, 11, or 12, the round is over and a new betting round begins. Otherwise, the sum (one of 4, 5, 6, 8, 9, 10) is called the point and subsequent rolls follow until either point is made again or a 7 is rolled. In either case the betting round ends.

The following enumerates some of the types of bets that can be placed in a betting round:

  1. Pass Line Bet: You win if the come out roll is a natural (7, 11) and lose if it is craps (2, 3, 12). If a point is rolled (4, 5, 6, 8, 9, 10) it must be repeated before a 7 is thrown in order to win. If 7 is rolled before the point you lose.
  2. Come Bet: It has the same rules as the Pass Line bet. However, you can make this bet only after the point on the pass line has been determined. After you place the bet, the first roll will set the come point. You win if it is a natural (7, 11) and lose if it is craps (2, 3, 12). Other rolls will make you a winner if the come point is repeated before a 7 is rolled. If a 7 is rolled first you lose.
  3. Don't Pass Line Bet: This is the reversed Pass Line bet. If the first roll of a dice is a natural (7, 11) you lose and if it is a 2 or a 3 you win. A dice roll of 12 means you have a tie or push with the casino. If the roll is a point (4, 5, 6, 8, 9, 10) a 7 must come out before that point is repeated to make you a winner. If the point is rolled again before the 7 you lose.
  4. Don't Come Bet: The reversed come bet. After the come point has been established you win if it is a 2 or 3 and lose for 7 or 11. A roll of 12 is a tie and other dice rolls will make you win only if a 7 appears before them on the following throws.
For each of these bet types, estimate the expected payoff (or loss) for repeatedly betting $1 in a round. Estimate the average gain or loss per round. Plot a history of gain/loss for each betting type. The plot should look something like that shown in Figure 7. Assume the dice are completely fair and that the uniform random number generator of MATLAB is perfect.

Suppose we wish to manufacture loaded dice that will result in a positive gain over a long period. We want the dice to be designed so it is difficult for a roller to tell the dice are loaded. Find a probability distribution on the numbers from 1 to 6 of a die that will do this for pass line bets. Assume the same probability change for all dice. If $p_{max}$ is the highest probability that a die lands on a particular number and $p_{min}$ is the lowest, make $p_{max}-p_{min}$ as small as possible. Figure 8 shows a history where the $p_{max}-p_{min}=.024$ See if you can beat this (you should have no problem). Write your probabilities in the table below:

Number Probability Number Probability Number Probability
1          2          3         
4          5          6         

\begin{figure}\centerline{\epsfysize=2.2in\epsfbox{Fig/sim7.ps}
\hspace*{6mm}\ep...
...fil
\hspace*{6mm}Figure 8: Loaded dice, $p_{max}-p_{min} = 0.024$.}
\end{figure}



{\bb 4. Coding Help}


At the MATLAB prompt type 'help rand' and 'help randn' to get help on using the uniform and Gaussian random number generators. The function rand returns a number between 0 and 1. If you need a single uniform random number, say r, between numbers a (low) and b (high) use 'r = (b-a)*rand(1)+a;'. But r will not be an integer. If r must be an integer between integers a and b, inclusive, use 'r = floor((b-a+1)*rand(1))+a;'. The function randn returns a Normally distributed random value with mean 0 and standard deviation 1. To create Gaussian values with mean m and standard deviation s use 'r = s*randn(1)+m;'.

In problems 3.1-3.5 you will need to keep track of the number of trials with values in particular intervals. This can be done with an array. You can initialize the array, say stats of n elements, to all zeros using something like this: 'stats = zeros(1,n);'. Then, if a trial has a real value v, you can do something like this: 'stats(floor(v)) = stats(floor(v)) + 1;'. When finished, you can 'plot(stats)'.

When plotting, you can set the limits on the $x$ and $y$ axis printed using the axis function. Use 'help axis' to see how to do this.

Although problems 3.1-3.5 require five separate pieces of code, there is little difference between them. Moreover, the number of lines of each piece, including all input, end, plot, axis, and so on, statements, should be no greater than 20.

For problem 3.6, you might want to develop a getRoll function that returns an integer from 1 to 6. This allows convenient adjustment of the probabilities of returning the numbers without affecting any other code that you write. The getRoll function is simply an if-then-else statement with six tests: each test looks, for example, like 'if 3/6 <= n & n < 4/6 + 0.023'.

The getRoll function can be called by a function called getRounds which returns a cell array, each cell containing a vector of integers representing rolls during a single legal betting round. Use the getRoll like this: 'roll = getRoll + getRoll;'. A round can be constructed using an array, say tmp which may be initialized like this: 'tmp = [];'. Then each roll is added to tmp like this: 'tmp = [tmp roll];'. If getRounds is to create n rounds, a round can be saved in a cell array like this: 'rndsi = tmp;' where i is a for index variable. Then rnds can be returned (that is, the function line of getRounds should look something like this: 'function rnds = getRounds(n);').

A function which plots a bet history need only take a cell array, say rnds, as input and implement a loop, for example, 'for i=1:length(rnds)', the body of which makes the appropriate betting checks. For example, in the case of pass line betting, let 'v = rnds(i);' and test like this:

   if v(1) == 7 | v(1) == 11 | length(v) > 1 & v(length(v)) == 7
All bet placing functions perform similar actions and make similar tests.



5. Submission


Submit six m files which solve the six problems of Section 3 on or before June 5 using blackboard. See the course webpage at

http://gauss.ececs.uc.edu/Courses/HTML/E112.html
for instructions.




next up previous
Next: About this document ...
John Franco 2008-05-29