Statistika For Life: 25 new messages in 8 topics

sci.stat.math
http://groups.google.com/group/sci.stat.math?hl=en

sci.stat.math@googlegroups.com

Today's topics:

==============================================================================
TOPIC: Combinatorial probability problem
http://groups.google.com/group/sci.stat.math/browse_thread/thread/d26220a7e3943029?hl=en
==============================================================================

== 1 of 3 ==
Date: Thurs, Nov 15 2007 1:09 am
From: Ray Koopman

On Nov 14, 4:08 am, John Uebersax <jsueber...@gmail.com> wrote:
> Would someone kindly tell me what is the *formula* to answer this
> question:
>
> An urn contains red, blue, and green balls, in equal proportions.
> Drawing four balls (with replacement), what is the probability
> that at least one color will not be represented among the four.
>
> Thanks in advance. (No, this is not a homework problem ;) )
> --
> John Uebersax PhD
> http://ourworld.compuserve.com/homepages/jsuebersax/agree.htm

Let k = the # of colors, let n = the # of draws, and let [x1,...,xk]
denote the number of times each color appeared. Sum xj = n.
Then the usual multinomial probability formula, with all pj = 1/k,
gives P[x1,...,xk] = n!/(k^n Prod xj!).

The probability that at least one color is not represented is the
sum of the probabilities of all possible outcome vectors that have
at least one zero; or, alternatively, one minus the sum of the
probabilities of all possible outcome vectors that have no zeros.

That's about as close as you're likely to get to a general *formula*.

Here's an example for k = 4, n = 6. The "patterns" are all possible
nonincreasing sequences of k non-negative integers that sum to n.

pattern    # of distinguishable permutations
6 0 0 0    4
5 1 0 0    4*3
4 2 0 0    4*3
4 1 1 0    4*3
3 3 0 0    4C2
3 2 1 0    4!
3 1 1 1    4
2 2 2 0    4
2 2 1 1    4C2

Only the patterns 3 1 1 1 and 2 2 1 1 contain no zeros, so

P[no zeros] = (6!/4^6) * (4/(3!*1!*1!*1!) + 4C2/(2!*2!*1!*1!)) = 195/512
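A minimal Python sketch of the enumeration described above, assuming every color has probability 1/k. It walks every outcome vector [x1,...,xk], applies n!/(k^n Prod xj!) with exact fractions, and sums the vectors that contain no zeros. The function names are illustrative only.

```python
from fractions import Fraction
from math import factorial, prod


def outcome_vectors(n, k):
    """Yield every k-tuple of non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in outcome_vectors(n - first, k - 1):
            yield (first,) + rest


def multinomial_prob(counts, k):
    """P[x1,...,xk] = n! / (k^n * Prod xj!) when every pj = 1/k."""
    n = sum(counts)
    denom = k ** n * prod(factorial(x) for x in counts)
    return Fraction(factorial(n), denom)


def prob_no_zeros(k, n):
    """Probability that all k colors appear at least once in n draws."""
    return sum(
        (multinomial_prob(c, k) for c in outcome_vectors(n, k) if min(c) > 0),
        Fraction(0),
    )


if __name__ == "__main__":
    print(prob_no_zeros(4, 6))      # 195/512, the k = 4, n = 6 example
    print(1 - prob_no_zeros(3, 4))  # 5/9: some color missing, 3 colors, 4 draws
    print(1 - prob_no_zeros(3, 5))  # 31/81: same, with 5 draws
```

The pattern table is a shortcut for the same sum: the 84 outcome vectors for k = 4, n = 6 collapse into the 9 patterns listed, each weighted by its number of distinguishable permutations.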

== 2 of 3 ==
Date: Thurs, Nov 15 2007 1:22 am
From: jos jansen

John Uebersax wrote:
> On Nov 14, 3:48 pm, iandjmsm...@aol.com wrote:
>> On 14 Nov, 14:14, John Uebersax <jsueber...@gmail.com> wrote:
>>> Thank you Ian.
>>> Two questions:
>>> 1. I seem, perhaps mistakenly, to enumerate 15 possible combinations, of
>>> which only 3 include all colors, or P = 3/15. Am I overlooking something
>>> obvious?
>>> R B G
>>> -----
>>> 4 0 0
>>> 3 1 0
>>> 3 0 1
>>> 2 2 0
>>> 2 0 2
>>> 2 1 1
>>> 1 3 0
>>> 1 2 1
>>> 1 1 2
>>> 1 0 3
>>> 0 4 0
>>> 0 3 1
>>> 0 2 2
>>> 0 1 3
>>> 0 0 4
>>> 2. What is the corresponding probability/formula for all colors being
>>> represented given 5 balls drawn instead of 4?
>>> Thanks,
>>> John Uebersax
>>> On Nov 14, 2:06 pm, iandjmsm...@aol.com wrote:
>>>> On 14 Nov, 12:08, John Uebersax <jsueber...@gmail.com> wrote:
>>>>> Would someone kindly tell me what is the *formula* to answer this
>>>>> question:
>>>>> An urn contains red, blue, and green balls, in equal proportions.
>>>>> Drawing four balls (with replacement), what is the probability that at
>>>>> least one color will not be represented among the four.
>>>>> Thanks in advance. (No, this is not a homework problem ;) )
>>>>> --
>>>>> John Uebersax PhD
>>>>> http://ourworld.compuserve.com/homepages/jsuebersax/agree.htm
>>>> The numbers of red, green and blue balls will have a multinomial
>>>> distribution.
>>>> The only way you will have all 3 colours represented is if there are 2
>>>> reds, 1 green and one blue ball or 1 red, 2 greens and one blue ball
>>>> or 1 red, 1 green and two blue balls.
>>>> The probability is 3*4!/(2!*1!*1!)*(1/3)^4 = 36/81 = 4/9.
>>>> The probability of at least one colour not being selected is therefore
>>>> 5/9.
>>>> Ian Smith
>> The probabilities are
>>
>> R B G
>> -----
>> 4 0 0 1/81
>> 3 1 0 4/81
>> 3 0 1 4/81
>> 2 2 0 6/81
>> 2 0 2 6/81
>> 2 1 1 12/81
>> 1 3 0 4/81
>> 1 2 1 12/81
>> 1 1 2 12/81
>> 1 0 3 4/81
>> 0 4 0 1/81
>> 0 3 1 4/81
>> 0 2 2 6/81
>> 0 1 3 4/81
>> 0 0 4 1/81
>>
>> and the sum of the 3 which include all is 36/81 or 4/9.
>>
>> With similar logic, 3*(prob of selecting 3,1,1 + prob of selecting
>> 2,2,1), the probability that at least one color will not be
>> represented among the five balls drawn is 31/81.
>>
>> Ian Smith
>
> Okay, I see now.
>
> Thanks,
>
> John
These answers concern drawing balls without replacement; your question
was about drawing with replacement?

== 3 of 3 ==
Date: Thurs, Nov 15 2007 1:34 am
From: Ray Koopman

On Nov 15, 1:22 am, jos jansen <post...@jkb.demon.nl> wrote:
> These answers concern drawing balls without replacement; your question
> was about drawing with replacement?

No, those answers (and mine) are for sampling *with* replacement,
which gives a multinomial distribution.

==============================================================================
TOPIC: Turn a uniform number to normal random numbers
http://groups.google.com/group/sci.stat.math/browse_thread/thread/054d911605199f7c?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Nov 15 2007 1:29 am
From: Ray Koopman

On Nov 14, 6:09 pm, Yves <sunder_1...@yahoo.com> wrote:
> Thanks all, for your replies.
>
> What are some of the quicker and more accurate methods?

See Luc Devroye, Non-Uniform Random Variate Generation.
http://cg.scs.carleton.ca/~luc/rnbookindex.html
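For concreteness, a sketch of one classical method also discussed in Devroye's book, the Box-Muller transform, which turns two independent Uniform(0,1) numbers into two independent N(0,1) numbers. This is only an illustration; the book covers other approaches as well, such as the polar method. The function name is hypothetical.

```python
import math
import random


def box_muller(rng=random):
    """Return two independent N(0,1) variates built from two uniforms."""
    u1 = 1.0 - rng.random()   # in (0, 1], so log(u1) is finite
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)


if __name__ == "__main__":
    rng = random.Random(2007)
    draws = [z for _ in range(50_000) for z in box_muller(rng)]
    mean = sum(draws) / len(draws)
    var = sum((z - mean) ** 2 for z in draws) / (len(draws) - 1)
    print(f"mean ~ {mean:.3f}, variance ~ {var:.3f}")  # close to 0 and 1
```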

==============================================================================
TOPIC: wrong R-Squared value??
http://groups.google.com/group/sci.stat.math/browse_thread/thread/259e11ac412a3219?hl=en
==============================================================================

== 1 of 6 ==
Date: Thurs, Nov 15 2007 1:40 am
From: jantunes

Java programming, then (upon the strange results) Excel.

== 2 of 6 ==
Date: Thurs, Nov 15 2007 1:53 am
From: jantunes

> I believe that when you take a validation data set, you can indeed
> have such wild predictions that SSR > SST, or SSE > SST. This
> indicates that there is a problem with the way the model fits to your
> validation data.
>
> I don't think R-squared is the proper way to compare the validation
> fit to the training fit. I would compare the MSE from the validation
> data set to the MSE of the training data set -- if they are close,
> that's good, if they are widely different, that's bad.

I've done some searching on MSE, and it appears that MSE is good for comparing different statistical models (lowest MSE = best model), but not for giving an absolute value like R-squared.
I'm not familiar with these, but I've seen them in the literature: Chi-squared, F-test, p-value. Isn't there any (absolute) statistical measure I can produce that validates y' (the prediction) against the real, full data?

== 3 of 6 ==
Date: Thurs, Nov 15 2007 5:22 am
From: Paige Miller

On Nov 15, 4:53 am, jantunes <jasantu...@gmail.com> wrote:
> > I believe that when you take a validation data set, you can indeed
> > have such wild predictions that SSR > SST, or SSE > SST. This
> > indicates that there is a problem with the way the model fits to your
> > validation data.
>
> > I don't think R-squared is the proper way to compare the validation
> > fit to the training fit. I would compare the MSE from the validation
> > data set to the MSE of the training data set -- if they are close,
> > that's good, if they are widely different, that's bad.
>
> I've done some searching on MSE, and it appears that MSE is good for comparing different statistical models (lowest MSE = best model), but not for giving an absolute value like R-squared.

Modelling techniques such as Partial Least Squares typically use a
measure of residual error to compare models from a training set to a
validation set. I see no reason why a similar measure can't be used in
an Ordinary Least Squares model as well.

> I'm not familiar with these, but I've seen them in the literature: Chi-squared, F-test, p-value. Isn't there any (absolute) statistical measure I can produce that validates y' (the prediction) against the real, full data?

I suppose you could do an F-test of MSE(training set)/MSE(validation
set). I'm not sure if this violates any of the standard
assumptions ... I'll have to think about that.

--
Paige Miller
paige\dot\miller \at\ kodak\dot\com

== 4 of 6 ==
Date: Thurs, Nov 15 2007 9:54 am
From: m00es

On Nov 15, 10:53 am, jantunes <jasantu...@gmail.com> wrote:
> I'm not familiar with these, but I've seen them in the literature: Chi-squared, F-test, p-value. Isn't there any (absolute) statistical measure I can produce that validates y' (the prediction) against the real, full data?

Why not just correlate the predicted values with the observed values
and square that?

So, you start with one set of data and find the least squares
regression line:

Y_hat = b0 + b1 X

Note that the squared correlation between the observed Y values and
the predicted values from this model (i.e., the Y_hat values) is equal
to R^2.

Now take a new set of data. Predict Y_hat using the least squares
regression line found in the first dataset. Calculate the squared
correlation between the (new) Y values and the predicted Y_hat values.
This will be between 0 and 1 and indicates how much of the variance in
the Y values (in the new dataset) can be accounted for based on
knowing X and using the regression equation found using the first
dataset.

m00es
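A small pure-Python sketch of the procedure m00es describes, using made-up numbers: fit y = b0 + b1*x on one data set, predict a second data set with that same line, and square the correlation between the new observations and the predictions. One caveat worth keeping in mind: a correlation ignores any constant offset or rescaling of the predictions, so a line that is systematically too high can still score close to 1; the last lines of the example show this. All names and data are hypothetical.

```python
def fit_line(x, y):
    """Least squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def squared_correlation(a, b):
    """Square of the Pearson correlation between sequences a and b."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab * sab / (saa * sbb)


if __name__ == "__main__":
    # Hypothetical training data.
    x_train = [0, 1, 2, 3, 4]
    y_train = [1.0, 3.1, 4.9, 7.2, 8.8]
    b0, b1 = fit_line(x_train, y_train)

    # Hypothetical new data, predicted with the *training* equation.
    x_new = [5, 6, 7, 8]
    y_new = [11.1, 12.9, 15.2, 16.8]
    y_hat = [b0 + b1 * xi for xi in x_new]
    print(squared_correlation(y_new, y_hat))  # between 0 and 1

    # The same score results if every prediction is shifted by 5 units.
    print(squared_correlation(y_new, [p + 5 for p in y_hat]))
```

On the training data itself, squared_correlation(y_train, fitted values) reproduces the ordinary R^2, which is the property m00es points out.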

== 5 of 6 ==
Date: Thurs, Nov 15 2007 11:54 am
From: Ray Koopman

On Nov 15, 1:53 am, jantunes <jasantu...@gmail.com> wrote:
> I've done some searching on MSE, and it appears that MSE is good
> for comparing different statistical models (lowest MSE = best model),
> but not for giving an absolute value like R-squared.

You've got it backwards. R-square is relative, not absolute.
It depends on the size of the errors, relative to the overall
variability. MSE is an absolute measure of the size of the errors,
although you should be looking at sqrt(MSE), the root-mean-square
error, because it's in the right units. Similarly, if you insist
on using a relative measure, you should look at sqrt(1 - R^2),
the rms error relative to the overall SD.
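A sketch of the two quantities Ray suggests, applied to validation data with hypothetical numbers: the root-mean-square error, in the units of y, and sqrt(SSE/SST), the rms error relative to the SD of the observations (on the data used for the fit this equals sqrt(1 - R^2)). When predictions from an old equation are compared with new data, SSE can exceed SST, so the ratio can exceed 1; that is the same situation that makes 1 - SSE/SST negative. Dividing by n in the RMSE is an assumption here; regression output often divides SSE by n - 2 instead.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the observations."""
    n = len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return sqrt(sse / n)


def relative_rms_error(observed, predicted):
    """sqrt(SSE/SST): rms error relative to the SD of the observations.
    On the data used for the fit this equals sqrt(1 - R^2)."""
    n = len(observed)
    mean = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean) ** 2 for o in observed)
    return sqrt(sse / sst)


if __name__ == "__main__":
    y_new = [2.0, 4.0, 6.0, 8.0]   # hypothetical validation observations
    good = [2.5, 3.5, 6.5, 7.5]    # predictions off by 0.5 each time
    bad = [8.0, 6.0, 4.0, 2.0]     # predictions trending the wrong way
    print(rmse(y_new, good))                 # 0.5
    print(relative_rms_error(y_new, good))   # about 0.224
    print(relative_rms_error(y_new, bad))    # 2.0, i.e. SSE > SST
```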

== 6 of 6 ==
Date: Thurs, Nov 15 2007 12:29 pm
From: Richard Ulrich

On Wed, 14 Nov 2007 13:07:01 EST, jantunes <jasantunes@gmail.com>
wrote:

> Hi all,
>
> I'm doing a linear regression to produce a trendline that can predict (more or less) some future data. The data is very correlated (something like R=0.98).
>
> This is what I do:
> 1) get 200 data points (x is a time series; y is CPU usage)
> 2) do linear regression based on those 200 points, resulting in some y'=a + bx
> 3) get R-squared (R^2=0.96) for the y'

At this point -- Are you doing anything to get rid of
"spurious" correlations based on simple trends, etc.?
OLS regression may give useful results, in some sense,
but it won't give legitimate tests, and values such as
R-squared might be approximately useless, even when
they are as big as 0.96, if the whole thing represents
a simple trend.

>
> Then, I want to validate that trendline/prediction by comparing it with more real data:
> 4) get more data points, past the 200 points (eg 10000)

Wow! I thought only astronomers used 50 times the
baseline for projections. What *is* this problem, anyway?

> 5) get R-squared for the y' (this time against the new data)

Use the squared deviations from the predicted values,
predicting from the original 200 points. It is hard to find much
fault with the RMS error. Isn't this exercise supposed to be
validation of the original equation? - then you have to use
the original equation.

Which R-squared? You show, yourself, you can't figure out
what to use to make a "proper" R-squared, or else, how to make
decent sense of R-squared.

>
> The problem is that this new R-squared has very strange values (depending on the equation): either <0 (SSE/SST>1), >1 (SSR>SST), or near 0.99 (when in fact the trendline is not accurate).
> As I said, I have already tried different ways of calculating the R-squared. They all give the same value in 3), but strange values in 5).
>
> Am I making a wrong assumption here? I'm pretty sure the calculations are correct... How can I validate my trendlines (linear regression models)?

--
Rich Ulrich, wpilib@pitt.edu

http://www.pitt.edu/~wpilib/index.html

==============================================================================
TOPIC: Matlab newb - CDF
http://groups.google.com/group/sci.stat.math/browse_thread/thread/650d7751d4c8a87e?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Nov 15 2007 1:52 am
From: "adshaikh.hipnet@googlemail.com"

Hi all, I'm wondering if anyone can help me, please.

I have an array of about 1 x 260 elements that vary randomly.

Assume the array is called x.

I understand I can get the CDF by using ecdf(x) or cdfplot(x).

But I also realise that the curves are not smooth, and I have read
somewhere that this is because the empirical CDF given here only moves
in steps of 1/n, and is therefore not smooth. I don't know if
I am right, but what I really want is a smoothed CDF plot without any
steps.

I have tried looking at the MATLAB examples for evcdf and normcdf, but
that didn't help much.

Could someone help me with the command for this?

Any help will be deeply appreciated.

Thanks

ads
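One common way to get a smooth curve instead of the 1/n staircase is a kernel-smoothed CDF: replace each data point's unit step by a normal CDF centered on it, F(t) = (1/n) Sum Phi((t - x_i)/h). The sketch below is in Python rather than MATLAB and is only an illustration. The Gaussian kernel and the rule-of-thumb bandwidth h = 1.06 * s * n^(-1/5), borrowed from density estimation, are assumptions; a larger h gives a smoother but more biased curve. The random data stand in for the 1 x 260 array.

```python
import random
from math import erf, sqrt
from statistics import stdev


def normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def empirical_cdf(data, t):
    """Step-function ECDF: fraction of observations <= t (jumps of 1/n)."""
    return sum(1 for x in data if x <= t) / len(data)


def smooth_cdf(data, t, bandwidth=None):
    """Gaussian-kernel smoothed CDF: (1/n) * Sum Phi((t - x_i) / h)."""
    n = len(data)
    if bandwidth is None:
        # Rule-of-thumb bandwidth borrowed from density estimation (assumption).
        bandwidth = 1.06 * stdev(data) * n ** (-1 / 5)
    return sum(normal_cdf((t - x) / bandwidth) for x in data) / n


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(260)]
    lo, hi = min(x), max(x)
    grid = [lo + (hi - lo) * i / 50 for i in range(51)]
    for t in grid[::10]:
        print(f"{t:7.3f}  ecdf={empirical_cdf(x, t):.3f}  smooth={smooth_cdf(x, t):.3f}")
```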

==============================================================================
TOPIC: YOU THINK I AM A SUCKER , John ( R ) Smith?
http://groups.google.com/group/sci.stat.math/browse_thread/thread/9038fc3be1e9cc3c?hl=en
==============================================================================

== 1 of 7 ==
Date: Thurs, Nov 15 2007 2:46 am
From: "Luis A. Afonso"

Do you believe (DO YOU REALLY BELIEVE) that H. Lilliefors took the trouble to run many thousands of simulations to get the table of values he presents in his PAPER, and that they HAD NO UTILITY for testing the hypothesis that the population under study was normal?
DO YOU BELIEVE THAT, John (R) Smith?
He is a very DIFFERENT kind of statistician from you!
Your statement that
***
Either the parameter is INSIDE or OUTSIDE. The probability that the parameter is inside is either 100% or 0%. Same for the probability that the parameter is outside. ***
is proof that you understand NOTHING AT ALL about how the Monte Carlo method is able to derive critical values.

***********************
Luis Amaral Afonso

YOU THINK I AM A SUCKER , John ( R ) Smith?

John

You think I am a SUCKER?
BY PHONE, NEVER EVER!!!!!!
I DEMAND that Hubert W. Lilliefors deny here, in sci.stat.math, his assertion (I copied it EXACTLY FROM THE PAPER) that using HIS TABLE WE ARE able to get CONFIDENCE INTERVALS for the K-S TEST when the population mean and std. deviation are unknown.
What a CRIMINAL evasion: YOU ARE AN INDECENT LIAR.
A PHONE CALL!!!!!!!!!!!!!!!!!!!!!
(I still expect the PHONE CALL TRICK from you.)
ARE YOU AWARE THAT NOBODY BELIEVES IN THAT PHONE CALL????????
___________

Luis Amaral Afonso

== 2 of 7 ==
Date: Thurs, Nov 15 2007 4:45 am
From: "Luis A. Afonso"

Three kinds of qualities could be attributed to John ( R ) Smith:

1) Inability to connect mathematical results:
(I gave the theoretical basis by which hypothesis tests can be carried out with a simulation procedure)
2) Horrifying ignorance
(Monte Carlo in statistics has a 40-year history and has been used systematically since)
3) Absolute LACK OF ETHICS
(The invented telephone call to H. Lilliefors, which, if true, would mean he COMPLETELY denies what he EXPLICITLY stated in his fundamental June 1967 paper)

_Comment
If J.S. (and Jack Tomsky, Bob Ling, OMU) are a random sample of the US STATISTICIANS' POPULATION, I can make a fair evaluation of its expertise and ethical behavior.
___________

Luis Amaral Afonso

== 3 of 7 ==
Date: Thurs, Nov 15 2007 6:15 am
From: John Smith

Luis,

I just called the asylum. They want you to return.

Do I think you're a sucker?

I'll tell you tomorrow...stay tuned.

John

PS -

I wrote: Either the parameter is INSIDE or OUTSIDE. The probability that the parameter is inside is either 100% or 0%. Same for the probability that the parameter is outside. ***

You wrote: Is the proof that you don't understand, NOTHING AT ALL; how Monte Carlo method is able to derive Critical Values.

My response: are the critical values statistics or parameters???? Do you even know the difference?

== 4 of 7 ==
Date: Thurs, Nov 15 2007 7:21 am
From: "Luis A. Afonso"

MY RESPONSE

I'm expecting the police to ring at any moment to put a FORGER in jail.

Lilliefors's paper (RIGHTLY) DOESN'T care about the population parameters, because the data are reduced to
    Z = (X - Xbar)/sqrt(s^2)
before being tested with the K-S statistic.
CONSEQUENTLY
one can use standard normal data: X ~ N(0,1).
I did so and found EXACTLY the same as Lilliefors.
ARE YOU AWARE that the more you say on this matter, the more striking the evidence of your ignorance becomes?

BY THE WAY

What about contacting Lilliefors and asking him to WRITE HERE in sci.stat.math what he thinks about
a) the way he performed his evaluations,
b) what his GOAL was,
c) whether the table of cumulative relative frequencies he named (page 400)
   TABLE 1, TABLE OF CRITICAL VALUES of D
can be used to obtain CONFIDENCE INTERVALS (my thesis) or is useless (your thesis) because * either the parameter is INSIDE or OUTSIDE *?

It is urgent to invite him, because he is the perfect person to judge who is right.

_____

Luis Amaral Afonso
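The point above, that standardizing with the sample mean and SD makes the statistic's distribution free of the population mean and variance, is what lets a Monte Carlo run with N(0,1) data produce critical values. A minimal Python sketch of such a simulation, assuming standardization with the n - 1 sample SD; the replication count, seed, and function names are arbitrary choices, and the output is an estimate, not a reproduction of Lilliefors' table.

```python
import random
from math import erf, sqrt
from statistics import mean, stdev


def normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def lilliefors_d(sample):
    """K-S distance between the ECDF of the standardized sample and N(0,1)."""
    m, s = mean(sample), stdev(sample)
    z = sorted((x - m) / s for x in sample)
    n = len(z)
    d = 0.0
    for i, zi in enumerate(z):
        f = normal_cdf(zi)
        # The ECDF is (i+1)/n just after zi and i/n just before it.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def critical_value(n, alpha=0.05, reps=20_000, seed=1):
    """Monte Carlo estimate of the upper-alpha quantile of D for sample size n."""
    rng = random.Random(seed)
    stats = sorted(
        lilliefors_d([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)
    )
    return stats[int((1 - alpha) * reps)]


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, round(critical_value(n), 3))
```

Because D is computed from standardized data, lilliefors_d gives the same value for a sample and for any transformation a*X + b with a > 0, which is why simulating from N(0,1) is enough.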

== 5 of 7 ==
Date: Thurs, Nov 15 2007 9:52 am
From: Jack Tomsky

> YOU THINK I AM A SUCKER , John ( R ) Smith?
>
> John
>
> You think I am a SUCKER?
> BY PHONE, NEVE EVER!!!!!!
> I DEMAND Hubert W, Lilliefors to deny here, in Sci
> Stat Math. his assertion (I copied EXACTLY FROM THE
> PAPER) that using HIS TABLE WE ARE able to get
> CONFIDENCE INTERVALS of the K-S TEXT .when the
> Population mean and std, deviation are unknown.
> What a CRIMINAL evasive: YOU ARE AN INDECENT LIAR
> PHONE!!!!!!!!!!!!!!!!!!!!!
> (I expect yet - the PHONE TRICK - from you )
> ARE YOU AWARE THAT NOBOBY BELIEVE IN THAT
> PHONE????????
> ___________
>
>
> Luis Amaral Afonso

Afonso again falsely accused Hubert Lilliefors of not knowing the difference between parameters and sample statistics. The word "confidence" appears nowhere in Lilliefors' paper.

Jack

== 6 of 7 ==
Date: Thurs, Nov 15 2007 9:56 am
From: John Smith

Luisa,

Of course you're a sucker.

I wrote: Either the parameter is INSIDE or OUTSIDE. The probability that the parameter is inside is either 100% or 0%. Same for the probability that the parameter is outside. ***
You wrote: Is the proof that you don't understand, NOTHING AT ALL; how Monte Carlo method is able to derive Critical Values.

My response: are the critical values statistics or parameters???? Do you even know the difference?

John

== 7 of 7 ==
Date: Thurs, Nov 15 2007 11:16 am
From: "Luis A. Afonso"

Critical Values: expected accuracy of Lilliefors' tables

Lilliefors' evaluations (Table 1, critical values to 3 decimals) are based on N = 1,000 or more samples (p. 399 infra). Because

    |d| <= sqrt[ log(2/p) / (2*N) ]

putting p = 0.95, N = 2000 we get |d| <= 0.014.

Therefore the DKW inequality guarantees at most TWO EXACT DECIMAL PLACES.

In fact, because this is an inequality, a rather better approximation is to be expected.
For the results I posted in this newsgroup I used 100,000 samples for each sample size, and |d| is better than 2E-3.
As a senior research member (Instituto Nacional de Engenharia e Tecnologia Industrial, INETI, Lisbon, Portugal) I was able to use 1E6 samples, which guarantees THREE EXACT DECIMAL PLACES.
______

Luis Amaral Afonso
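For reference, the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality with Massart's constant reads P(sup|F_N(x) - F(x)| > eps) <= 2*exp(-2*N*eps^2). Setting the right-hand side equal to p and solving for eps gives the bound above, so p is the probability that the deviation exceeds eps. The short sketch below evaluates it; p = 0.95 with N = 2000 reproduces the 0.014 figure, while p = 0.05 (deviation bounded with 95% confidence) gives about 0.030. The function name is illustrative.

```python
from math import log, sqrt


def dkw_epsilon(n_samples, p):
    """eps at which 2*exp(-2*N*eps^2) equals p, so P(sup|F_N - F| > eps) <= p."""
    return sqrt(log(2.0 / p) / (2.0 * n_samples))


if __name__ == "__main__":
    for n_samples in (2_000, 100_000, 1_000_000):
        print(
            n_samples,
            round(dkw_epsilon(n_samples, 0.95), 4),  # p as used in the post
            round(dkw_epsilon(n_samples, 0.05), 4),  # 95% confidence bound
        )
```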

==============================================================================
TOPIC: Interesting (but difficult) question - calculating 'implied'
probabilities of a wager
http://groups.google.com/group/sci.stat.math/browse_thread/thread/245aac10e44fcef8?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Nov 15 2007 4:15 am
From: "Pavel314"

"Anonymous" <no.reply@here.com> wrote in message
news:Ou6dnXpmPox49KTanZ2dnUVZ8s-qnZ2d@bt.com...
> Here is a hypothetical scenario.
>
> A friend and I decide to visit the local county fair. There is a
> competition to see who can throw a heavy ball the highest. I bet my friend
> that I can throw the heavy metal ball more than X metres high.
>
> He in turn, says "I'll pay you a dollar for every Y centimeters that you
> can throw the ball above X meters - BUT to make it worth my while, you
> have to PAY ME Z dollars for me to take on the bet".
>
> From the above, my friend has calculated (implicitly from the wager he has
> made), the probability of me being able to throw the ball above X metres.
> How may I calculate the probability, so I can work out the (implied) odds
> of my success?
>
> What methodology/logic/technique may I use to calculate the probability of
> me throwing the ball above X metres (based on the wager given above)?

You're right, this is interesting but difficult. I haven't solved it myself
but I have a few ideas. I cross-posted to sci.stat.math to see if we could
get some help from over there.

You are assuming that you can throw the ball (100 * X) + D centimeters,
where D is the amount by which you exceed 100 * X centimeters. It's not
stated, but let B be the amount of your initial bet that you can do this.

Your friend must also be assuming that your throw will be greater than 100 *
X centimeters or he would have taken your original bet.

Your friend must be assuming that your throw will be less than (100 * X) +
(Y * Z) centimeters: he pays you D / Y dollars, where D is the distance in
centimeters by which you exceed X meters, so since he collects Z he comes
out ahead only if Z > D / Y, i.e. D < Y * Z.

I think that what needs to be done here is to map the expectations of each
player at a 95% confidence interval and use Bayesian inference to work out
the joint probability but I don't have time to do that right now, got to run
off to the day job.

Paul

==============================================================================
TOPIC: Linear Model Problem
http://groups.google.com/group/sci.stat.math/browse_thread/thread/cbdae4ed0080ff7a?hl=en
==============================================================================

== 1 of 1 ==
Date: Thurs, Nov 15 2007 8:42 am
From: sonicb11

I am working on a linear model problem.

Here is the model:

Wi = a*Zi + b*Zi*Xi + Zi*Ei

Wi is the observed response. a and b are the regression
coefficients. Zi follows a Beta distribution with parameters alpha and
beta (they are both unknown). Xi is a predictor variable, and Ei is
the error with mean zero and variance sigma^2.

I need to estimate a, b, alpha, beta, and sigma^2 using method of
moments. I want formulas for them. I've never seen a linear model like
this before and have no idea how to tackle it. Any help or hints would
be greatly appreciated. Thanks in advance.

==============================================================================
TOPIC: slope test in linear regression with known intercept and known error
variance
http://groups.google.com/group/sci.stat.math/browse_thread/thread/74ab6da6d7a0f55e?hl=en
==============================================================================

== 1 of 5 ==
Date: Thurs, Nov 15 2007 11:01 am
From: elodie.gillain@gmail.com

Dear Forum,

Suppose the following model
Y_ij=1+beta*X_i+eps_ij
with j=1,2,...n_i and i=1,2,3,4.
the eps_ij are iid standard normal rvs

The goal is to test
H0: beta <= 0
vs. H1: beta > 0

Question 1:
Consider the following first procedure:
Run a two-sample test based only on Y_1j, j=1,...,100 (sample1) and
Y_4j, j=1,...,100 (sample 2). Here, n1=n4=100, n2=n3=0.
Find the 0.05-level test of this procedure.

I am thinking that I can use a t test. The test would reject if

(betaols-0)/s(betaols) > t_{200-1}(0.95)

where betaols=SSY/SSX is the OLS estimate of beta and s(betaols) is
computed as usual with the MSE. Should I be using a t test? It is
given that the eps_ij are standard normal, so I could use a normal
test, right? In this case, the test would be

(betaols-0)*sqrt{SSX}/1>z(0.95)

Question 2:
Take n replications at each x_i (n1=n2=n3=n4=n). Obtain the MLE and
base the test of that estimator.
Find the 0.05-level test of this procedure.

I am thinking that the distribution of betamle is Normal(beta, sigma^2/
SSX). So I would run the following the normal test
(betamle-0)*sqrt(SSX)/1>z(0.95)

Where betamle is SSY/SSX, but this time the SSX and SSY are different
from those of question 1.

I greatly appreciate your help. In a third question, I need to compare
the power of each of the tests, so I will need to express
SSY(question2) as a function of SSY(question1), and SSX(question2) as
a function of SSX(question1). Can anybody help me do that?

== 2 of 5 ==
Date: Thurs, Nov 15 2007 11:27 am
From: Jack Tomsky

You can use all the data. Let Zij = Yij-1.

Then the LS estimate of beta is

betahat = Sum[(Xi)Sum(Zij)]/Sum(Ni*Xi^2)

where the sums go from i = 1, ..., 4 and j = 1, ..., Ni.

The variance of betahat is

Var(betahat) = 1/Sum(Ni*Xi^2)

Then

betahat/Sqrt(Var(betahat)) ~ N(0,1).

Jack
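A minimal Python sketch of Jack's estimator and the resulting one-sided 0.05-level z test, assuming (as in the problem) a known intercept of 1 and error variance 1. The design points and responses are made up for illustration, and the function names are hypothetical. Because z ~ N(beta * sqrt(Sum Ni*Xi^2), 1), the power at a given beta is 1 - Phi(z(0.95) - beta * sqrt(Sum Ni*Xi^2)), so for the power comparison in question 3 the designs differ only through Sum Ni*Xi^2.

```python
from math import sqrt
from statistics import NormalDist


def slope_z_test(groups, intercept=1.0, sigma=1.0, alpha=0.05):
    """groups: list of (X_i, [Y_i1, ..., Y_iNi]) pairs.

    Returns (betahat, z, reject) for H0: beta <= 0 vs H1: beta > 0,
    treating the intercept and the error SD as known.
    """
    num = 0.0   # Sum_i X_i * Sum_j Z_ij, where Z_ij = Y_ij - intercept
    s = 0.0     # Sum_i N_i * X_i^2
    for x, ys in groups:
        num += x * sum(y - intercept for y in ys)
        s += len(ys) * x * x
    betahat = num / s
    z = betahat / (sigma / sqrt(s))             # Var(betahat) = sigma^2 / s
    z_crit = NormalDist().inv_cdf(1 - alpha)    # about 1.645 for alpha = 0.05
    return betahat, z, z > z_crit


def power(beta, s, sigma=1.0, alpha=0.05):
    """P(reject H0) when the true slope is beta and s = Sum N_i * X_i^2."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z_crit - beta * sqrt(s) / sigma)


if __name__ == "__main__":
    # Hypothetical data: two observations at each of X = 1, 2, 3, 4.
    data = [(1, [1.5, 2.5]), (2, [3.0, 2.0]), (3, [4.0, 3.0]), (4, [5.0, 6.0])]
    print(slope_z_test(data))   # betahat = 59/60, z = 59/sqrt(60)
    print(power(0.1, 60))
```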

== 3 of 5 ==
Date: Thurs, Nov 15 2007 11:49 am
From: elodie.gillain@gmail.com

Many thanks for your reply. Is there a book or a paper that would be a
good reference for this problem?

Many thanks for your help.

== 4 of 5 ==
Date: Thurs, Nov 15 2007 12:38 pm
From: Jack Tomsky


All I did was to put it into the form of a general linear model, Z = X*beta + eps, where Z is the column vector Z_11, ..., Z_4N4, X is the column vector X1, ..., X1, ..., X4, ..., X4 (each Xi repeated Ni times), and beta is a scalar.

Then the LS estimate of beta is

betahat = X'Z/X'X.

Algebraic manipulations reduce it to the form I gave.

Since the covariance matrix of Z is given as the identity matrix,

Var(betahat) = X'X/(X'X)^2 = 1/(X'X).

Hope this helps.

Jack
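The matrix form can be checked numerically. Below is a minimal NumPy sketch that stacks the data the way the post describes. It reuses hypothetical design points and toy data, since none are given in the thread, and the function name is illustrative. The last lines compare the result against NumPy's general least-squares solver `np.linalg.lstsq` on the same no-intercept design, which should give the same slope.

```python
import numpy as np


def ls_slope_matrix(x, z_groups):
    """General-linear-model form Z = X*beta + eps with Cov(Z) = I.

    X repeats each X_i N_i times and Z stacks Z_11, ..., Z_kNk in the same order,
    so betahat = X'Z / X'X and Var(betahat) = X'X / (X'X)^2 = 1 / X'X.
    """
    counts = [len(zi) for zi in z_groups]
    X = np.repeat(np.asarray(x, dtype=float), counts)  # X1,...,X1, ..., Xk,...,Xk
    Z = np.concatenate([np.asarray(zi, dtype=float) for zi in z_groups])
    xtx = X @ X
    return (X @ Z) / xtx, 1.0 / xtx


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]  # hypothetical design points
    z_groups = [[1.0, 3.0], [2.0, 6.0], [3.0, 9.0], [4.0, 12.0]]  # toy data
    betahat, var_b = ls_slope_matrix(x, z_groups)

    # Cross-check with a generic least-squares solve on the one-column design matrix.
    X = np.repeat(x, [len(zi) for zi in z_groups])[:, None]
    Z = np.concatenate(z_groups)
    coef, *_ = np.linalg.lstsq(X, Z, rcond=None)
    print(betahat, var_b, coef[0])
```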

== 5 of 5 ==
Date: Thurs, Nov 15 2007 12:56 pm
From: elodie.gillain@gmail.com

On Nov 15, 3:38 pm, Jack Tomsky <jtom...@ix.netcom.com> wrote:

I understand, many thanks!

Question 2 asks for a test based on the MLE. I can say that the MLE is
just the same as the OLS, right?

I greatly appreciate your help.
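A short derivation bearing on this question follows. It assumes the model exactly as stated in the thread: intercept fixed at 1, Z_ij = Y_ij - 1, and eps_ij iid N(0,1). Up to a constant, the log-likelihood in beta is minus half the residual sum of squares, so maximizing it is the same problem as least squares.

```latex
\begin{aligned}
\ell(\beta) &= -\frac{N}{2}\log(2\pi) - \frac{1}{2}\sum_{i=1}^{4}\sum_{j=1}^{N_i}\left(Z_{ij} - \beta X_i\right)^2,
  \qquad N = \sum_{i=1}^{4} N_i, \\
\ell'(\beta) &= \sum_{i=1}^{4} X_i \sum_{j=1}^{N_i} Z_{ij} \;-\; \beta \sum_{i=1}^{4} N_i X_i^2 = 0
  \quad\Longrightarrow\quad
  \hat\beta_{\mathrm{MLE}} = \frac{\sum_{i} X_i \sum_{j} Z_{ij}}{\sum_{i} N_i X_i^2}, \\
\ell''(\beta) &= -\sum_{i=1}^{4} N_i X_i^2 < 0 .
\end{aligned}
```

So, provided at least one observed X_i is nonzero, the stationary point is a maximum. Under these assumptions it coincides with the LS estimate betahat given earlier in the thread.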

==============================================================================

Thursday, 15 November 2007
