Wednesday, 21 November 2007

15 new messages in 7 topics - digest

sci.stat.math
http://groups.google.com/group/sci.stat.math?hl=en

sci.stat.math@googlegroups.com

Today's topics:

* wrong R-Squared value?? - 2 messages, 2 authors
http://groups.google.com/group/sci.stat.math/browse_thread/thread/259e11ac412a3219?hl=en
* The only thing THAT MATTERS - 3 messages, 2 authors
http://groups.google.com/group/sci.stat.math/browse_thread/thread/db29382ab441eab8?hl=en
* Turn a uniform number to normal random numbers - 1 messages, 1 author
http://groups.google.com/group/sci.stat.math/browse_thread/thread/054d911605199f7c?hl=en
* Sample Size - 4 messages, 2 authors
http://groups.google.com/group/sci.stat.math/browse_thread/thread/ff810bbd9cf63993?hl=en
* complete sufficient = minimal sufficient? - 1 messages, 1 author
http://groups.google.com/group/sci.stat.math/browse_thread/thread/05c766fa3c882185?hl=en
* Time Series Analysis Help - 3 messages, 2 authors
http://groups.google.com/group/sci.stat.math/browse_thread/thread/dd67a6fca7d0207e?hl=en
* Statistical Methods for Ranks? - 1 messages, 1 author
http://groups.google.com/group/sci.stat.math/browse_thread/thread/d02d4f899aae90fe?hl=en

==============================================================================
TOPIC: wrong R-Squared value??
http://groups.google.com/group/sci.stat.math/browse_thread/thread/259e11ac412a3219?hl=en
==============================================================================

== 1 of 2 ==
Date: Tues, Nov 20 2007 2:29 am
From: jantunes


> For a time series, the obvious spurious correlations
> involve simple linear trends in separate variables.
> Or cycles.
> If two variables separately have a similar trend,
> they will have a positive correlation.
>
> If Sequence-number correlates with your raw data, you
> have a potential problem.


> > I'm trying to predict the resource usage for a
> > given computer task (x = number of times the task
> > is repeated). So, I get a different y and y'
> > (prediction) for a different type of resource (CPU,
> > memory, etc).
>
> This *sounds* like a matter of bench marking. For
> that, the "time series" aspect should be incidental
> and irrelevant.
> Each separate "experiment" should give the same
> results regardless of when it is run. Is there *any*
> sort of proper carry-over between experiments?
>
> One really onerous way that these data could resemble
> a time series is if you recorded the x and y as
> cumulative counters, and never subtracted in order to
> find the data for the separate experiments.
>
> That would create a strong correlation that is
> "spurious" and essentially useless.
>
> Why is there any carry-over between experiments?


There is no separate experiment. This is one experiment only, which consists of repeating the same task (e.g., a client/server request) several times.
But yes, the data are naturally cumulative, because I'm measuring the current resource usage of the application (memory, disk, etc.).

Thanks

== 2 of 2 ==
Date: Tues, Nov 20 2007 1:35 pm
From: Richard Ulrich


On Tue, 20 Nov 2007 05:29:06 EST, jantunes <jasantunes@gmail.com>
wrote:

[snip]
RU > >
> > This *sounds* like a matter of bench marking. For
> > that, the "time series" aspect should be incidental
> > and irrelevant.

Okay, for computer server benchmarks, the background
information might predict the speed of response. Response
will be faster when you are not competing with a couple of
users who are downloading movies, for example.

RU > >
> > Each separate "experiment" should give the same
> > results regardless of when it is run. Is there *any*
> > sort of proper carry-over between experiments?
> >
> > One really onerous way that these data could resemble
> > a time series is if you recorded the x and y as
> > cumulative counters, and never subtracted in order to
> > find the data for the separate experiments.
> >
> > That would create a strong correlation that is
> > "spurious" and essentially useless.
> >
> > Why is there any carry-over between experiments?
>
ja >
> There is no separate experiment. This is one experiment only,
> which consists of repeating the same task (e.g., a client/server
> request) several times.

Why is it not true that each measurement is a separate experiment,
with its own separate timing?

Why is your application different from every other bench marking
I've read of?


> But yes, the data are naturally cumulative, because I'm measuring
> the current resource usage of the application (memory, disk, etc.).

SEE what I wrote last time, above.

--
Rich Ulrich, wpilib@pitt.edu

http://www.pitt.edu/~wpilib/index.html
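
The point about cumulative counters can be illustrated with a small simulation. This is only a sketch under assumed conditions: the per-request cost is hypothetical pure noise (mean 10, standard deviation 2) that has nothing to do with the request index, and the names `per_request`, `cumulative` and `increments` are invented for the example.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

random.seed(1)
n = 200
# Hypothetical per-request cost: pure noise, unrelated to the request index.
per_request = [random.gauss(10.0, 2.0) for _ in range(n)]

# What a cumulative counter would report after each request.
cumulative = []
total = 0.0
for cost in per_request:
    total += cost
    cumulative.append(total)

x = list(range(1, n + 1))
# Subtract successive readings to recover the per-request usage.
increments = [b - a for a, b in zip(cumulative, cumulative[1:])]

r_cumulative = pearson(x, cumulative)
r_increment = pearson(x[1:], increments)
print(f"r(x, cumulative counter) = {r_cumulative:.3f}")
print(f"r(x, per-request usage)  = {r_increment:.3f}")
```

The cumulative counter correlates almost perfectly with the repetition count simply because both only ever increase; after differencing, the correlation reflects only what each individual request costs.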


==============================================================================
TOPIC: The only thing THAT MATTERS
http://groups.google.com/group/sci.stat.math/browse_thread/thread/db29382ab441eab8?hl=en
==============================================================================

== 1 of 3 ==
Date: Tues, Nov 20 2007 6:46 am
From: "Luis A. Afonso"


19, 2007 9:45 PM
Author: John Smith
Subject: Re: The only thing THAT MATTERS

Nobody should believe anything Afonso writes until he answers these simple questions. When a distribution is created by Monte Carlo simulation, are its 1% and 99% percentiles statistics or parameters? Bonus question: what is the role of the parameter in a Monte Carlo simulation? John ****

MY RESPONSE

Who believes a person who has never authored a paper on the Monte Carlo method?
Who believes someone who never learned that the Box-Muller transformation is a rigorous way to get normal values?
How credible is a person who ignores Conover's textbook Practical Nonparametric Statistics, where one learns how to obtain cumulative frequencies from data?
What is there to say of a so-called statistician who has never met the 1956 Dvoretzky-Kiefer-Wolfowitz inequality, which teaches us how close an empirical distribution function is to the theoretical one?
How can we trust a person who is ZERO AWARE of the thousands of papers in which Monte Carlo has been used to confirm or refute the approximate results that theory provides?

What is there to say of a person who asks inappropriate questions about a simple DISTANCE (which is the realm of the Lilliefors-Kolmogorov-Smirnov test of normality)?


When the samples are created,
In a Monte Carlo Simulation,
One gets the first step merely,
A preliminary point, a situation
To have the Test Distribution.

This one is then constructed
In a way we want, no exception,
It's elementary, do you see?
Only imbeciles find confusion,
No trouble for true statistician.

Lilliefors test, not excluded
From this clear classification,
Test is only a DISTANCE simply
Any sample how far measuring
The model: what other thing?

The Distance by two guys founded:
Kolmogorov-Smirnov celebration.
Who are able, really, to fight me?
Nobody except an idiot boring
Full addicted in talk rambling.

His litany has any real reaching:
Statistics or Parameters is asking
Not at all, neither, exclusively,
A DISTANCE, not more, clearly.

Who believes some Smith boys,
So much ignorant in this matter
Never Monte Carlo got annoys?
How much Empirical, how better,
Can approach Distribution Function
Should, sure, DKW have instruction!
Such a thing he yet met never.


I've not time to spend, surely,
View so great, asinine, opacity.

******
Luis Amaral Afonso
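
For reference, the Dvoretzky-Kiefer-Wolfowitz inequality mentioned above can be stated as follows. The constant 2 is the sharp form due to Massart (1990), which is later than the 1956 paper cited in the post; the worked numbers (n = 100, alpha = 0.05) are chosen here purely for illustration.

```latex
% F_n is the empirical distribution function of n i.i.d. draws from F.
\[
  P\Bigl( \sup_x \bigl| F_n(x) - F(x) \bigr| > \varepsilon \Bigr)
  \;\le\; 2 e^{-2 n \varepsilon^{2}}, \qquad \varepsilon > 0 .
\]
% Setting the right-hand side equal to \alpha gives a uniform band half-width
\[
  \varepsilon = \sqrt{\frac{\ln(2/\alpha)}{2n}},
  \qquad\text{e.g. } n = 100,\ \alpha = 0.05
  \;\Rightarrow\; \varepsilon \approx 0.136 .
\]
```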

== 2 of 3 ==
Date: Tues, Nov 20 2007 10:48 am
From: John Smith


See how Alfonso responds to a simple statistics question with lots of nonsense? He is trying to hide the fact that he cannot answer the simple question, because he doesn't know any statistics.

John

== 3 of 3 ==
Date: Tues, Nov 20 2007 11:18 am
From: "Luis A. Afonso"


John Smith

Are you saying, IGNORANT JACK, that the test statistic of the Lilliefors (Kolmogorov-Smirnov) test of normality IS NOT A DISTANCE, and that through Monte Carlo one is not able to get the critical values?
What paper have you authored in this area (in a decent journal)? Tell us, the readers.
Don't you feel ridiculous?

***

Luis Amaral Afonso
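
For readers following the argument, here is a minimal sketch of how Monte Carlo critical values for the Lilliefors distance are typically obtained. It is not code from either poster; the sample size (n = 20), the replication count and the function names are assumptions made for illustration.

```python
import math
import random
import statistics

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lilliefors_distance(sample):
    """KS distance between the sample EDF and a normal CDF whose mean and
    standard deviation are estimated from the same sample."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # n - 1 denominator
    d = 0.0
    for i, x in enumerate(sorted(sample), start=1):
        f = normal_cdf((x - mean) / sd)
        # The EDF jumps from (i - 1)/n to i/n at the i-th order statistic.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def simulated_critical_value(n, alpha=0.05, reps=5000, seed=0):
    """Empirical (1 - alpha) quantile of the distance under normality."""
    rng = random.Random(seed)
    distances = sorted(
        lilliefors_distance([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    )
    return distances[math.ceil((1.0 - alpha) * reps) - 1]

crit = simulated_critical_value(n=20)
print(f"simulated 5% critical value for n = 20: {crit:.3f}")
```

The result should land near the tabulated Lilliefors value of roughly 0.19 for n = 20. Note that the simulated percentile is itself computed from random draws, so it changes slightly with the seed and the number of replications.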


==============================================================================
TOPIC: Turn a uniform number to normal random numbers
http://groups.google.com/group/sci.stat.math/browse_thread/thread/054d911605199f7c?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Nov 20 2007 8:12 am
From: "Luis A. Afonso"


David


FACTS ARE FACTS

My post dated Nov 14, 2007 2:05 PM (in this thread) presented a solution that is more general than yours of Nov 19, 2007 11:28 AM.
I DIDN'T say you copied me: only that your post is useless.
Furthermore, I identified the algorithm, Box-Muller's, which is useful information for the OP for later self-checking.

Be aware: those who prefer OPINIONS to FACTS are incapable of accomplishing anything in SCIENCE.


****
Luis Amaral Afonso
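
Since the thread turns on the Box-Muller algorithm, a minimal sketch of it follows. Neither poster's actual code appears in this digest, so the function names and the seed are assumptions. Note that the transform consumes two independent uniform numbers and returns two independent standard normal numbers.

```python
import math
import random

def box_muller(u1, u2):
    """Map two independent Uniform(0, 1) values to two independent N(0, 1) values.
    u1 must be strictly positive so that log(u1) is defined."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)

def normal_sample(count, mu=0.0, sigma=1.0, seed=None):
    """Draw `count` values from N(mu, sigma^2) using Box-Muller pairs."""
    rng = random.Random(seed)
    out = []
    while len(out) < count:
        u1 = 1.0 - rng.random()  # in (0, 1], avoids log(0)
        u2 = rng.random()
        z0, z1 = box_muller(u1, u2)
        out.extend((mu + sigma * z0, mu + sigma * z1))
    return out[:count]

draws = normal_sample(20000, seed=42)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(f"mean = {mean:.3f}, variance = {var:.3f}")
```

A quick self-check of the kind suggested in the post is to confirm that the sample mean is close to 0 and the sample variance close to 1.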


==============================================================================
TOPIC: Sample Size
http://groups.google.com/group/sci.stat.math/browse_thread/thread/ff810bbd9cf63993?hl=en
==============================================================================

== 1 of 4 ==
Date: Tues, Nov 20 2007 12:19 pm
From: John Smith


Luisa,

Still can't answer a simple question, can you?

John

PS -- Have someone who knows English read posts by myself and by Tomsky. Only a moron would mistake the writing styles but, guess what?

== 2 of 4 ==
Date: Tues, Nov 20 2007 12:29 pm
From: "Luis A. Afonso"


N(0,1)

N(0,2)


N(0,3)


**** Date: Nov 20, 2007 3:19 PM
Author: John Smith
Subject: Re: Sample Size

Luisa,

Still can't answer a simple question, can you?

John

PS -- Have someone who knows English read posts by myself and by Tomsky. Only a moron would mistake the writing styles but, guess what?****


Jean, Joan

As far as STUPIDITY is concerned, I find no difference between you and Jackie, Jacqueline.
****

Luis Amaral Afonso

== 3 of 4 ==
Date: Tues, Nov 20 2007 12:30 pm
From: "Luis A. Afonso"


**** Date: Nov 20, 2007 3:19 PM
Author: John Smith
Subject: Re: Sample Size

Luisa,

Still can't answer a simple question, can you?

John

PS -- Have someone who knows English read posts by myself and by Tomsky. Only a moron would mistake the writing styles but, guess what?****


Jean, Joan

As far as STUPIDITY is concerned, I find no difference between you and Jackie, Jacqueline.
****

Luis Amaral Afonso

== 4 of 4 ==
Date: Tues, Nov 20 2007 1:17 pm
From: John Smith


Luisa,

I wrote:
PS -- Have someone who knows English read posts by myself and by Tomsky. Only a moron would mistake the writing styles but, guess what?****


you wrote: As far as STUPIDITY is concerned, I find no difference between you and Jackie, Jacqueline.
****

It's obvious you can't answer a simple statistics question, but can't you follow instructions? I said "someone who knows English"; that obviously excludes you.

John


==============================================================================
TOPIC: complete sufficient = minimal sufficient?
http://groups.google.com/group/sci.stat.math/browse_thread/thread/05c766fa3c882185?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Nov 20 2007 1:08 pm
From: leading


1. As any statistics textbook points out, a complete sufficient
statistic is necessarily minimal sufficient.
Conversely, is a minimal sufficient statistic also complete?
2. If G is a complete sufficient statistic, and f is a function such
that f(G) is a sufficient statistic, is f(G) also complete?
Thanks
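
A short worked note on both questions, offered as a sketch rather than a full treatment. The uniform example and the symbols X_(1), X_(n), R, g and h are introduced here for illustration and are not from the post; measurability of f and h is assumed.

```latex
% Question 1: minimal sufficient does not imply complete.
% Let X_1, \dots, X_n be i.i.d. Uniform(\theta, \theta + 1), n \ge 2.
% T = (X_{(1)}, X_{(n)}) is minimal sufficient, and the range
% R = X_{(n)} - X_{(1)} has a distribution free of \theta, with
\[
  E_\theta[R] = \frac{n-1}{n+1} \quad \text{for all } \theta .
\]
% Hence g(T) = R - \frac{n-1}{n+1} satisfies
\[
  E_\theta[g(T)] = 0 \ \text{ for all } \theta,
  \qquad\text{but}\qquad P_\theta\bigl( g(T) = 0 \bigr) = 0 ,
\]
% so T is minimal sufficient without being complete.

% Question 2: a function of a complete statistic is complete.
% If E_\theta[h(f(G))] = 0 for all \theta, then h \circ f is a function
% of G with zero expectation, and completeness of G gives
\[
  E_\theta\bigl[ h(f(G)) \bigr] = 0 \ \ \forall \theta
  \;\Longrightarrow\;
  (h \circ f)(G) = 0 \ \text{ almost surely for all } \theta ,
\]
% which is exactly the statement that f(G) is complete.
```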


==============================================================================
TOPIC: Time Series Analysis Help
http://groups.google.com/group/sci.stat.math/browse_thread/thread/dd67a6fca7d0207e?hl=en
==============================================================================

== 1 of 3 ==
Date: Tues, Nov 20 2007 3:24 pm
From: Idgarad


OK, I am not a statistics guru, I admit, but I have been trying to do some
basic forecasting that would meet some basic statistical
requirements. I have the following data:

Date MIPS
1/5/2004 306.203
1/12/2004 364.29
1/19/2004 384.779
1/26/2004 387.91
2/2/2004 339.041
2/9/2004 414.383
2/16/2004 313.764
2/23/2004 335.001
3/1/2004 323.978
3/8/2004 312.729
3/15/2004 343.589
3/22/2004 333.252
3/29/2004 376.878
4/5/2004 390.825
4/12/2004 356.892
4/19/2004 383.517
4/26/2004 325.227
5/3/2004 254.279
5/10/2004 255.221
5/17/2004 266.575
5/24/2004 270.073
5/31/2004 293.269
6/7/2004 309.114
6/14/2004 311.633
6/21/2004 350.444
6/28/2004 296.203
7/5/2004 332.153
7/12/2004 306.23
7/19/2004 368.466
7/26/2004 334.271
8/2/2004 349.002
8/9/2004 378.682
8/16/2004 333.731
8/23/2004 380.037
8/30/2004 298.417
9/6/2004 288.728
9/13/2004 342.81
9/20/2004 382.866
9/27/2004 419.828
10/4/2004 379.289
10/11/2004 400.749
10/18/2004 453.514
10/25/2004 388.742
11/1/2004 333.935
11/8/2004 341.659
11/15/2004 281.586
11/22/2004 305.749
11/29/2004 310.391
12/6/2004 317.704
12/13/2004 380.804
12/20/2004 319.389
12/27/2004 361.442
1/3/2005 369.1764612
1/10/2005 416.6238169
1/17/2005 459.5359423
1/24/2005 365.4009445
1/31/2005 413.3630776
2/7/2005 291.3910135
2/14/2005 305.105
2/21/2005 464.8482752
2/28/2005 363.0336105
3/7/2005 264.7677899
3/14/2005 344.880868
3/21/2005 325.8519595
3/28/2005 321.1775701
4/4/2005 404.5693965
4/11/2005 392.0416371
4/18/2005 430.7946661
4/25/2005 427.1631644
5/2/2005 411.8648374
5/9/2005 386.8547968
5/16/2005 383.4840298
5/23/2005 381.5493873
5/30/2005 315.0086187
6/6/2005 354.5324168
6/13/2005 327.772
6/20/2005 369.0157653
6/27/2005 408.0830566
7/4/2005 434.5275972
7/11/2005 371.5106324
7/18/2005 408.1991382
7/25/2005 405.0429881
8/1/2005 373.8240641
8/8/2005 364.0034462
8/15/2005 369.6471424
8/22/2005 382.0108071
8/29/2005 410.7909099
9/5/2005 330.9051756
9/12/2005 368.7685134
9/19/2005 270.4893379
9/26/2005 404.0606091
10/3/2005 383.8872826
10/10/2005 466.5515718
10/17/2005 486.673
10/24/2005 448.0580021
10/31/2005 373.5319544
11/7/2005 358.4208151
11/14/2005 398.9761027
11/21/2005 318.3299946
11/28/2005 358.0366431
12/5/2005 344.9174087
12/12/2005 386.8313941
12/19/2005 294.1100542
12/26/2005 293.881162
1/2/2006 433.7141952
1/9/2006 476.274226
1/16/2006 475.7067041
1/23/2006 459.1203218
1/30/2006 361.2039406
2/6/2006 363.7221527
2/13/2006 380.1952852
2/20/2006 442.1721436
2/27/2006 357.9469694
3/6/2006 395.7442366
3/13/2006 450.9923943
3/20/2006 367.7855186
3/27/2006 402.778072
4/3/2006 493.4095257
4/10/2006 493.468
4/17/2006 469.1306141
4/24/2006 450.0128534
5/1/2006 442.5117675
5/8/2006 428.8031172
5/15/2006 470.2158386
5/22/2006 446.2431756
5/29/2006 317.8183222
6/5/2006 369.3162037
6/12/2006 410.4558021
6/19/2006 443.1421911
6/26/2006 397.1971946
7/3/2006 481.3922888
7/10/2006 525.2947246
7/17/2006 473.5077361
7/24/2006 517.5520329
7/31/2006 466.9906984
8/7/2006 431.1475016
8/14/2006 399.5471642
8/21/2006 440.8823488
8/28/2006 439.6991779
9/4/2006 362.8644597
9/11/2006 406.762618
9/18/2006 363.0828509
9/25/2006 491.8909378
10/2/2006 527.5336233
10/9/2006 516.9000381
10/16/2006 554.2020878
10/23/2006 650.9110702
10/30/2006 527.429268
11/6/2006 520.5231633
11/13/2006 419.1709031
11/20/2006 441.3769311
11/27/2006 407.7421329
12/4/2006 423.0796675
12/11/2006 541.489909
12/18/2006 395.1153918
12/25/2006 407.3078582
1/1/2007 555.9770864
1/8/2007 484.9516878
1/15/2007 554.6924101
1/22/2007 547.1910996
1/29/2007 498.570364
2/5/2007 532.9759432
2/12/2007 432.4194752
2/19/2007 497.8181418
2/26/2007 407.4818148
3/5/2007 463.2326725
3/12/2007 547.1052888
3/19/2007 499.1447529
3/26/2007 441.1002226
4/2/2007 435.5250358
4/9/2007 510.0561347
4/16/2007 460.6838179
4/23/2007 508.6014031
4/30/2007 514.7918906
5/7/2007 506.1699276
5/14/2007 538.0826675
5/21/2007 497.6096175
5/28/2007 434.4788358
6/4/2007 528.1184467
6/11/2007 432.9866137
6/18/2007 510.1264458
6/25/2007 487.4279266
7/2/2007 495.274668
7/9/2007 508.7542205
7/16/2007 572.8591187
7/23/2007 657.6611519
7/30/2007 594.0857848
8/6/2007 590.5344634
8/13/2007 604.0715949
8/20/2007 533.396821
8/27/2007 498.3182266
9/3/2007 491.3865539
9/10/2007 548.296464
9/17/2007 459.3107549
9/24/2007 543.1050647

That data is the weekly usage of a system. I have done what research I
can, and some basic forecasting that compares against the previous year
and bases forecasts on that. I am trying to find a more accurate way to
forecast this, and my research has brought me to the ARIMA method for
looking at seasonal data.

Poring through the resources I have, I found Gretl as a
potential tool. I need to generate a forecast up to 24 weeks in
advance. But I am at a loss. Each time I try, to the best of my
ability, to produce a forecast, I do not get results that are
realistic, due to my lack of statistical knowledge and a poor
understanding of most statistical software (Gretl included). I keep
coming back to ARIMA(0,1,1)(0,1,1) with a seasonal period of 12 weeks.
I know this to be wrong, but without a strong math background (I am a
technical guru, not a statistical guru) I have hit a brick wall.

Can someone help explain what I need to do, using Gretl or some
similar tool, to do accurate forecasting based on the above
data? I need to repeat this process weekly.

The activity is roughly quarterly, but there is some drift in when a
quarter starts and ends (by up to two weeks in either direction), so ARIMA
seemed to be the best method for forecasting.

Help!
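
To make the ARIMA(0,1,1)(0,1,1) notation above concrete: the two middle "1"s mean the series is differenced once at lag 1 and once at the seasonal lag before any moving-average terms are fitted. The sketch below shows only that differencing step, on a made-up series. The 13-week period is an assumption (a quarter is about 13 weeks, whereas the post used 12), and the function names are hypothetical; this is not Gretl code.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for t >= lag, i.e. (1 - B^lag) applied once."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def arima_011_011_differences(series, period):
    """Apply the d = 1 and D = 1 differencing of ARIMA(0,1,1)(0,1,1) with the given period."""
    return difference(difference(series, 1), period)

# Hypothetical series: straight-line trend plus an exactly repeating 13-week pattern.
period = 13
pattern = [0, 5, 9, 4, -3, -8, -6, 2, 7, 3, -4, -7, -2]
series = [300 + 2 * t + pattern[t % period] for t in range(4 * period)]

residual = arima_011_011_differences(series, period)
print(len(series), len(residual), max(abs(v) for v in residual))
```

On this idealised series the double differencing removes the trend and the repeating pattern entirely. On real data, if the chosen period does not match the true cycle, or the cycle drifts by a week or two as described above, the seasonal difference leaves structure behind, which is one plausible reason the forecasts look unrealistic.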

== 2 of 3 ==
Date: Tues, Nov 20 2007 5:33 pm
From: dave@autobox.com


On Nov 20, 6:24 pm, Idgarad <idga...@gmail.com> wrote:
> [snip: full quote of Idgarad's message, including the data table]


Idgarad,

Please review http://www.autobox.com/idgarad and find some output from
AUTOBOX.

http://www.autobox.com/idgarad/accff.jpg

You will find in this case

1. There are significant level shifts at time point 65 and 114 ...both
to the upside ...NO TREND HERE ...just two level shifts.
2. There are a number of anomalous observations which need to be
accommodated so that they don't mask the model.
3. A number of Holidays are important.
4. There is a strong week of the year effect.
5. the ARIMA MODEL is simply a (1,1)

[(1- .746B** 1)]**-1 [(1- .276B** 1)]

At this juncture you can simply buy AUTOBOX or some similar commercial
program or simply program the following

a. Detect simultaneously the presence of

Pulses, Level Shifts, Seasonal Pulses , Local Time Trends

The point(s) in time where the parameters of the model may have
changed, suggesting too much data

The form of the SARIMA MODEL

Any needed transformations to homogenize the variance of the
errors

What Holiday indicators are important and what the temporal
response is to each ( viz. contemporaneous , lag , lead )

What weeks of the year are important.

Pursue all of these until the plot of your residuals looks like

http://www.autobox.com/idgarad/res.jpg which suggests that the signal
has been removed from the data

http://www.autobox.com/idgarad/actfore.jpg

The R-Squared for the final model was 86.5%

There are a number of success stories on our web site regarding daily
and weekly predictive models.

If I can help please give me a call.

Dave Reilly
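
As a rough illustration of what "detecting a level shift" involves, here is a minimal least-squares sketch that finds the single split point giving the largest reduction in squared error when the series is modelled as two constant levels. It is emphatically not the AUTOBOX procedure, which the message does not describe in detail; the function name, the `min_segment` parameter and the test series are all hypothetical.

```python
def largest_level_shift(series, min_segment=5):
    """Return (index, shift) for the best single step change.

    index is the first position of the new level; shift is the mean after
    the split minus the mean before it. The split is chosen to maximise
    shift^2 * k * (n - k) / n, the drop in the sum of squared errors
    obtained by fitting two means instead of one."""
    n = len(series)
    best_index, best_shift, best_score = None, 0.0, -1.0
    for k in range(min_segment, n - min_segment + 1):
        before, after = series[:k], series[k:]
        shift = sum(after) / len(after) - sum(before) / len(before)
        score = shift * shift * k * (n - k) / n
        if score > best_score:
            best_index, best_shift, best_score = k, shift, score
    return best_index, best_shift

# Hypothetical series: level near 350 for 65 points, then near 420, with a small wiggle.
series = [350 + (t % 3) for t in range(65)] + [420 + (t % 3) for t in range(65, 120)]
index, shift = largest_level_shift(series)
print(index, round(shift, 1))
```

A real procedure would also have to handle several shifts, pulses, and seasonal effects at once, and would test whether a detected shift is statistically significant; this sketch does none of that.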

== 3 of 3 ==
Date: Tues, Nov 20 2007 5:53 pm
From: dave@autobox.com


On Nov 20, 6:24 pm, Idgarad <idga...@gmail.com> wrote:
> [snip: full quote of Idgarad's message, including the data table]


idgarad,

Please review http://www.autobox.com/idgarad

and note that a reasonable weekly model yielding an R-squared of 86%
can be accomplished by programming:
* a procedure to detect level shifts and local time trends
* an assessment of the importance of a number of possible holidays
* tests for constancy of parameters over time
* tests for homogeneity of variance of the errors

http://www.autobox.com/idgarad/ab50pro.123
http://www.autobox.com/idgarad/accff.jpg
http://www.autobox.com/idgarad/actfore.jpg
http://www.autobox.com/idgarad/actres.jpg
http://www.autobox.com/idgarad/res.jpg
http://www.autobox.com/idgarad/fore.jpg
http://www.autobox.com/idgarad/stat.htm
http://www.autobox.com/idgarad/model.bmp
http://www.autobox.com/idgarad/verbal.txt

You can try it out by downloading the FREEWARE VERSION of AUTOBOX
called FREEFORE

http://www.autobox.com/freef.exe

Just form your data like http://www.autobox.com/idgarad/idgard.asc

and you should be able to run the free software each week ...develop a
model automatically ...and even get your 1 week ahead forecast all
without charge.

Hope this helps

Dave Reilly
Automatic Forecasting Systems
http://www.autobox.com
215-675-0652



==============================================================================
TOPIC: Statistical Methods for Ranks?
http://groups.google.com/group/sci.stat.math/browse_thread/thread/d02d4f899aae90fe?hl=en
==============================================================================

== 1 of 1 ==
Date: Tues, Nov 20 2007 9:25 pm
From: mprocopio@gmail.com


I have made some progress; I am able to apply Spearman's rank
correlation test (using the SPSS implementation) to determine the pairwise
"closeness" of, say, two different metrics. This is a hypothesis test
at a given significance level (with an associated critical value), and I can
test for independence in this manner. If the metrics are not independent, I think
it's fair to say that they are "measuring the same thing".

Here's one thought. Is it principled to take the average over ALL
frames in ALL datasets:

Dataset DS_ALL_AVG

Metric M1 M2 M3
Alg

Alg. A x x x
Alg. B x x x
Alg. C x x x
Alg. D x x x
Alg. E x x x


Importantly, I am not obfuscating the result by combining metrics, but
I still get an overall answer and ranking.

So one final question is: Consider that the rankings of the algorithms
under M1, M2, and M3 are SIMILAR but NOT IDENTICAL. The Spearman test may or
may not give a statistical basis to say that they're independent. Even
if they are NOT independent, how do you obtain a final ranking of the
algorithms?

The other tricky part is applying the rank-oriented tests when your
values are means with confidence intervals, and the confidence
intervals overlap.
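
One simple way to get an overall ordering without merging the metric values themselves is to rank the algorithms within each metric and then average the ranks. The sketch below shows that idea alongside a hand-rolled Spearman's rho. It is an illustration, not the SPSS procedure; the scores are invented, "higher is better" is assumed for every metric, and mean-rank aggregation is only one of several possible choices.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0  # mean of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def spearman_rho(a, b):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5

# Hypothetical scores (higher is better) for algorithms A-E on three metrics.
algorithms = ["A", "B", "C", "D", "E"]
scores = {
    "M1": [0.91, 0.85, 0.80, 0.62, 0.55],
    "M2": [0.88, 0.90, 0.70, 0.65, 0.40],
    "M3": [0.95, 0.80, 0.82, 0.50, 0.60],
}

rho_12 = spearman_rho(scores["M1"], scores["M2"])

# Rank within each metric (rank 1 = best), average across metrics, then sort.
per_metric = [average_ranks([-s for s in col]) for col in scores.values()]
mean_rank = [sum(r[i] for r in per_metric) / len(per_metric) for i in range(len(algorithms))]
final = [name for _, name in sorted(zip(mean_rank, algorithms))]
print(round(rho_12, 2), final)
```

Averaging ranks ignores how far apart the underlying means are, so it does not address the concern about overlapping confidence intervals; two algorithms whose intervals overlap may reasonably be treated as tied before ranking.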

==============================================================================

