I'm dealing with call center data with several overlapping cyclical trends, known movable events, and general trends. My forecasting depends on previous years, among other factors, and I have an annoying skew in the data because the 2009 and 2010 weeks don't align.

I noticed my local business paper (Dagens Næringsliv) running into the same problem when comparing power-price commodities from last year with this year. The graph looked volatile in the week-by-week comparison, but shifting one of the datasets showed that the two years were positively correlated. The same goes for this example from nve.no showing water reserves, where I think the red 2010 data should be closer to the blue (2009) data and the black median; an ISO week artifact is skewing the data. Water reserves source: nve.no

Are there any best practices for comparing two years?

I would like to have a day-by-day alignment so that I could show the previous year, then my forecast, then the actual result after the fact.

(Also, R seems to fight me all the way on this one, which is often an indicator that I really should do some reading.)

Update 17.07

Discovered some interesting tools from the US Census Bureau, which might get me closer to a complete answer.

asked Jul 07 '10 at 17:15


Tov Are Jacobsen

edited Jul 17 '10 at 16:51

I take it that shifting one of the series by a fixed amount, say 10 days, is not enough?

(Jul 09 '10 at 07:56) Amaç Herdağdelen

Yes, shifting days is just one of the dubious things I do to my own dataset to make my predictions :-)

(Jul 12 '10 at 14:39) Tov Are Jacobsen

2 Answers:

I will take a stab at this. Go by fiscal week number? I'm sure you've already explored this, and it is ultimately, like a lot of date math, a matter of opinion or of some arbitrary demarcation, such as the date of the Easter holiday.

My suggestion would be to pick a fixed point in the past as Day 0, number each day consecutively from that date, and then compare, say, days 200-300 to days 600-700 (obviously not correct, but perhaps by factoring out the arbitrary nature of weeks, months, and years...). This is how dates are calculated in many systems that count from an epoch, famously January 1st, 1970 for Unix time, and I think Lotus 123 had an even weirder date.

Also, since just about every answer I have comes back to Python somehow: the Python "datetime" module is the best date math paradigm I've seen. That module alone makes it worth at least trying to reformat your data with Python and then finishing with R. Hope this helps!
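A minimal sketch of the Day 0 idea with the "datetime" module might look like the following. The reference date and the helper names `day_number` and `same_weekday_last_year` are made up for illustration; stepping back exactly 52 weeks (364 days) is one assumed way to keep weekdays aligned between years, not the only one.

```python
from datetime import date, timedelta

# Hypothetical fixed reference point ("Day 0"); any date before the data starts works.
DAY_ZERO = date(2009, 1, 1)

def day_number(d: date) -> int:
    """Number of days elapsed since DAY_ZERO."""
    return (d - DAY_ZERO).days

def same_weekday_last_year(d: date) -> date:
    """Date exactly 52 weeks (364 days) earlier, which falls on the same weekday."""
    return d - timedelta(weeks=52)

today = date(2010, 7, 8)
previous = same_weekday_last_year(today)

# Consecutive day numbers for both dates, and the weekday each one falls on.
print(day_number(today), day_number(previous))
print(today.strftime("%A"), previous.strftime("%A"))
```

Because 364 is a multiple of 7, the two dates always share a weekday, at the cost of drifting one or two calendar days per year. Whether that drift matters depends on how strong the calendar-date effects (such as payday) are in the data.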

answered Jul 08 '10 at 16:44

th0ma5



Kudos for taking a stab at this. Using demarcations is a good idea, and something I've sort of been using for estimates already. One of my monthly cycles is payday: I isolate the days before and after payday and use the weekday mean as a filter to guess how much of the volume is payday-related (day of week is one of my main cycles). I wish there were some canonical how-to list.
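The weekday-mean filter described above could be sketched roughly as follows. Everything here is an assumption for illustration: the function names, the made-up call volumes, and payday falling on the 20th of each month. The idea is to subtract each weekday's mean from the series and then average what is left by distance from payday.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

def weekday_residuals(series):
    """Subtract each weekday's mean, leaving what the weekly cycle does not explain."""
    by_weekday = defaultdict(list)
    for d, v in series.items():
        by_weekday[d.weekday()].append(v)
    weekday_mean = {wd: mean(vs) for wd, vs in by_weekday.items()}
    return {d: v - weekday_mean[d.weekday()] for d, v in series.items()}

def payday_profile(residuals, paydays, window=2):
    """Average residual at each day offset from the listed paydays."""
    by_offset = defaultdict(list)
    for p in paydays:
        for k in range(-window, window + 1):
            d = p + timedelta(days=k)
            if d in residuals:
                by_offset[k].append(residuals[d])
    return {k: mean(vs) for k, vs in sorted(by_offset.items())}

# Made-up data: 100 calls on weekdays, 40 at weekends, plus 30 extra on the 20th.
start = date(2010, 1, 1)
calls = {}
for i in range(180):
    d = start + timedelta(days=i)
    base = 100 if d.weekday() < 5 else 40
    calls[d] = base + (30 if d.day == 20 else 0)

# Hypothetical paydays: the 20th of each month, January through June.
paydays = [date(2010, m, 20) for m in range(1, 7)]
profile = payday_profile(weekday_residuals(calls), paydays)
print(profile)
```

One caveat with this approach: the payday bump itself leaks into the weekday means, so the estimate at offset 0 comes out slightly lower than the true effect. With real data it may be worth computing the weekday means from days far away from payday only.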

(Jul 08 '10 at 17:18) Tov Are Jacobsen

You could try an autocorrelation plot (in R, see ?plot.acf) and see if there is a spike after 52 weeks, or if it has moved slightly. That way you can tell if the records are out of sync for each year.
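For anyone working outside R, the same check can be sketched in plain Python. This is an illustration only: it uses the Pearson correlation between the series and a lagged copy of itself, which is close to but not identical to the estimator behind R's acf, and the weekly series is synthetic with a built-in 52-week cycle. The names `lagged_correlation` and `best_lag` are made up here.

```python
import math
from statistics import mean

def lagged_correlation(xs, lag):
    """Pearson correlation between the series and itself shifted by `lag` (lag >= 1)."""
    a, b = xs[:-lag], xs[lag:]
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def best_lag(xs, lags):
    """Lag (from the candidates) at which the series correlates best with itself."""
    return max(lags, key=lambda k: lagged_correlation(xs, k))

# Synthetic weekly series: three years of a pure 52-week cycle.
weekly = [math.sin(2 * math.pi * t / 52) for t in range(156)]

# Scan lags around one year and report where the correlation peaks.
peak = best_lag(weekly, range(48, 57))
print(peak, lagged_correlation(weekly, peak))
```

If the peak on real data lands at 51 or 53 weeks rather than 52, that would suggest the years are out of sync by about a week.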

answered Jul 13 '10 at 08:19


Richie Cotton


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.