Sunday, July 27, 2014

Writing Computer Code

There are two aims for this chapter: learning how to write computer code and learning a computer language to write code in.
First, we need to learn how to write computer code. Several of the computer technologies that we encounter will involve writing computer code in a particular computer language, so it is essential that we learn from the start how to produce computer code in the right way.
Learning how to write code could be very dull if we only discussed code writing in abstract concepts, so the second aim of this chapter is to learn a computer language, with which to demonstrate good code writing.
The language that we will learn in this chapter is the Hypertext Markup Language, HTML.
As most people know, HTML is the language that describes web pages on the world wide web. In the world wide web of today, web pages are used by almost everyone for almost any purpose--sharing videos, online sales, banking, advertising, the list goes on and on--and web pages consist of much more than just plain HTML.
However, the original motivation for HTML was a simple, intuitive platform that would allow researchers from all over the world to publish and share their ideas, data, and results.
The primary inventor of HTML was Tim Berners-Lee, the father of several fundamental technologies underlying the world wide web. His original work was driven by a number of important requirements: it should be simple to create web pages; web pages should work on any computer hardware and software; and web pages should be able to link to each other, so that information can be related and cross-referenced, but remain distributed around the world. It was also an early intention to provide a technology that could be processed and understood by computers as well as viewed by human eyes.
This history and these requirements make HTML an appropriate starting point for learning about computer languages for the management of research data.
HTML is simple, so it is a nice easy way to start writing computer code. HTML began as an open technology and remains an open standard, which is precisely the sort of technology we are most interested in. We require only a web browser to view web pages and these are widely available on any modern desktop computer. We will also, later in this chapter, observe a resonance with the idea that HTML documents simply point to each other rather than require a copy of every piece of information on every computer. In Chapter 9, we will return to the importance of being able to automatically process HTML.
Finally, although HTML is now used for all sorts of commercial and private purposes, it still remains an important technology for publishing and sharing research output.
The aim of this chapter is to elucidate the process of writing, checking, and running computer code and to provide some guidelines for carrying out these tasks effectively and efficiently.
How this chapter is organized
This chapter begins with an example of a simple web page and gives a quick introduction to how HTML computer code relates to the web pages that we see in a web browser such as Firefox. The main point is to demonstrate that computer code can be used to control what the computer does.
In Section 2.2, we will emphasize the idea that computer languages have strict rules that must be followed. Computer code must be exactly correct before it can be used to control the computer. The specific rules for HTML code are introduced in this section.
Section 2.3 addresses the issue of what computer code means. What instructions do we have to use in order to make the computer do what we want? This section looks in a bit more detail at some examples of HTML code and shows how each piece of code relates to a specific feature of the resulting web page.
Section 2.4 looks at the practical aspects of writing computer code. We will discuss the software that should be used for writing code and emphasize the importance of writing tidy and well-organized code.
Sections 2.5 and 2.6 look at getting our computer code to work properly. Section 2.5 focuses on the task of checking that computer code is correct--that it follows the rules of the computer language--and Section 2.6 focuses on how to run computer code. We will look at some tools for checking that HTML code is correct and we will briefly discuss the software that produces web pages from HTML code. We will also discuss the process of fixing problems when they occur.
Section 2.7 introduces some more ideas about how to organize code and work efficiently. The Cascading Style Sheets language is introduced to provide some simple demonstrations.


Paul Murrell

Wednesday, July 9, 2014

time series

Time Series Analysis: The Basics
 


WHAT IS A TIME SERIES?

A time series is a collection of observations of well-defined data items obtained through repeated measurements over time. For example, measuring the value of retail sales each month of the year would comprise a time series. This is because sales revenue is well defined, and consistently measured at equally spaced intervals. Data collected irregularly or only once are not time series.

An observed time series can be decomposed into three components: the trend (long term direction), the seasonal (systematic, calendar related movements) and the irregular (unsystematic, short term fluctuations).


WHAT ARE STOCK AND FLOW SERIES?

Time series can be classified into two different types: stock and flow.

A stock series is a measure of certain attributes at a point in time and can be thought of as “stocktakes”. For example, the Monthly Labour Force Survey is a stock measure because it takes stock of whether a person was employed in the reference week.

Flow series measure activity over a given period. For example, surveys of Retail Trade activity produce flow measures. Manufacturing is also a flow measure because a certain amount is produced each day, and then these amounts are summed to give a total value for production for a given reporting period.

The main difference between a stock and a flow series is that flow series can contain effects related to the calendar (trading day effects). Both types of series can still be seasonally adjusted using the same seasonal adjustment process.
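
The distinction can be sketched in a few lines of Python. This is only an illustration: the daily figures and variable names are made up and are not survey data. A flow value for a period is the sum of the activity within it, while a stock value is a reading taken at a single reference point.

```python
# Hypothetical daily figures for one week (illustrative only, not real data).
daily_production = [120, 135, 128, 140, 150, 0, 0]      # units made each day: a flow
daily_inventory = [500, 520, 510, 530, 560, 560, 560]   # units on hand at close of day: a stock

# A flow series reports activity summed over the reporting period.
weekly_production = sum(daily_production)

# A stock series reports the level at a reference point (here, the end of the week).
end_of_week_inventory = daily_inventory[-1]

print(weekly_production)       # 673
print(end_of_week_inventory)   # 560
```

Because the flow value is a sum over days, a week or month with more working days can produce a larger total, which is why trading day effects show up in flow series.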


WHAT ARE SEASONAL EFFECTS?

A seasonal effect is a systematic and calendar related effect. Some examples include the sharp escalation in most Retail series which occurs around December in response to the Christmas period, or an increase in water consumption in summer due to warmer weather. Other seasonal effects include trading day effects (the number of working or trading days in a given month differs from year to year which will impact upon the level of activity in that month) and moving holidays (the timing of holidays such as Easter varies, so the effects of the holiday will be experienced in different periods each year).

WHAT IS SEASONAL ADJUSTMENT AND WHY DO WE NEED IT?

Seasonal adjustment is the process of estimating and then removing from a time series influences that are systematic and calendar related. Observed data need to be seasonally adjusted because seasonal effects can conceal both the true underlying movement in the series and certain non-seasonal characteristics which may be of interest to analysts.


WHY CAN'T WE JUST COMPARE ORIGINAL DATA FROM THE SAME PERIOD IN EACH YEAR?

A comparison of original data from the same period in each year does not completely remove all seasonal effects. Certain holidays such as Easter and Chinese New Year fall in different periods in each year, hence they will distort observations. Also, year to year values will be biased by any changes in seasonal patterns that occur over time. For example, consider a comparison between two consecutive March months, i.e. compare the level of the original series observed in March 2000 and March 2001. This comparison ignores the moving holiday effect of Easter. Easter occurs in April in most years, but if Easter falls in March, the level of activity in that month can vary greatly for some series. This distorts the original estimates, so a comparison of these two months will not reflect the underlying pattern of the data. The comparison also ignores trading day effects. If the two consecutive months of March have a different composition of trading days, the original estimates might show different levels of activity even though the underlying level of activity is unchanged. In a similar way, any changes to seasonal patterns might also be ignored. The original estimates also contain the influence of the irregular component. If the magnitude of the irregular component of a series is strong compared with the magnitude of the trend component, the underlying direction of the series can be distorted.

However, the major disadvantage of comparing year to year original data is the lack of precision and the time delay in identifying turning points in a series. Turning points occur when the direction of the underlying level of the series changes, for example when a consistently decreasing series begins to rise steadily. If we compare year apart data in the original series, we may miss turning points occurring during the year. For example, if March 2001 has a higher original estimate than March 2000, by comparing these year apart values, we might conclude that the level of activity has increased during the year. However, the series might have increased up to September 2000 and then started to decrease steadily.
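
The March 2000 and March 2001 scenario can be made concrete with a small Python sketch. The monthly levels below are invented for illustration and contain no seasonal or irregular movement, so they stand in for the underlying level only. The year-apart comparison reports an increase, while the month-to-month movements show that the series peaked in September 2000 and has been falling since.

```python
# Hypothetical underlying levels, March 2000 to March 2001 (illustrative numbers only).
months = ["2000-03", "2000-04", "2000-05", "2000-06", "2000-07", "2000-08",
          "2000-09", "2000-10", "2000-11", "2000-12", "2001-01", "2001-02", "2001-03"]
levels = [100, 104, 108, 112, 116, 120, 124, 121, 118, 115, 112, 109, 106]

# Year-apart comparison: March 2001 against March 2000.
year_apart_change = levels[-1] - levels[0]

# Month-to-month movements: changes[i] is the move from months[i] to months[i + 1].
changes = [later - earlier for earlier, later in zip(levels, levels[1:])]

# The first negative movement starts at the peak month, i.e. the turning point.
first_fall = next(i for i, change in enumerate(changes) if change < 0)
turning_point = months[first_fall]

print(year_apart_change)  # 6, which on its own suggests growth over the year
print(turning_point)      # 2000-09
```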

WHEN IS SEASONAL ADJUSTMENT INAPPROPRIATE?

When a time series is dominated by the trend or irregular components, it is nearly impossible to identify and remove what little seasonality is present. Hence seasonally adjusting a non-seasonal series is impractical and will often introduce an artificial seasonal element.

WHAT IS SEASONALITY?

The seasonal component consists of effects that are reasonably stable with respect to timing, direction and magnitude. It arises from systematic, calendar related influences such as:
        • Natural Conditions
            weather fluctuations that are representative of the season
            (uncharacteristic weather patterns such as snow in summer would be considered irregular influences)
        • Business and Administrative procedures
            start and end of the school term
        • Social and Cultural behaviour
            Christmas

It also includes calendar related systematic effects that are not stable in their annual timing or are caused by variations in the calendar from year to year, such as:
        • Trading Day Effects
            the number of occurrences of each day of the week in a given month will differ from year to year
            - There were 4 weekends in March in 2000, but 5 weekends in March of 2002
        • Moving Holiday Effects
            holidays which occur each year, but whose exact timing shifts
            - Easter, Chinese New Year
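
The weekend counts quoted above for March 2000 and March 2002 can be checked with Python's standard `calendar` module. The helper name `count_weekday` is hypothetical and introduced only for this sketch.

```python
import calendar


def count_weekday(year, month, weekday):
    """Count how often a weekday (0 = Monday ... 6 = Sunday) occurs in a month."""
    days_in_month = calendar.monthrange(year, month)[1]
    return sum(
        1
        for day in range(1, days_in_month + 1)
        if calendar.weekday(year, month, day) == weekday
    )


for year in (2000, 2002):
    saturdays = count_weekday(year, 3, calendar.SATURDAY)
    sundays = count_weekday(year, 3, calendar.SUNDAY)
    print(year, saturdays, sundays)
# 2000 4 4
# 2002 5 5
```

The same function could be applied to any weekday and month to see how the trading day composition of a period changes from year to year.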

HOW DO WE IDENTIFY SEASONALITY?

Seasonality in a time series can be identified by regularly spaced peaks and troughs which have a consistent direction and approximately the same magnitude every year, relative to the trend. The following diagram depicts a strongly seasonal series. There is an obvious large seasonal increase in December retail sales in New South Wales due to Christmas shopping. In this example, the magnitude of the seasonal component increases over time, as does the trend.
Figure 1: Monthly Retail Sales in New South Wales (NSW) Retail Department Stores

WHAT IS AN IRREGULAR?

The irregular component (sometimes also known as the residual) is what remains after the seasonal and trend components of a time series have been estimated and removed. It results from short term fluctuations in the series which are neither systematic nor predictable. In a highly irregular series, these fluctuations can dominate movements, which will mask the trend and seasonality. The following graph is of a highly irregular time series:
Figure 2: Monthly Value of Building Approvals, Australian Capital Territory (ACT)

WHAT IS THE TREND?

The ABS trend is defined as the 'long term' movement in a time series without calendar related and irregular effects, and is a reflection of the underlying level. It is the result of influences such as population growth, price inflation and general economic changes. The following graph depicts a series in which there is an obvious upward trend over time:
Figure 3: Quarterly Gross Domestic Product


WHAT ARE THE UNDERLYING MODELS USED TO DECOMPOSE THE OBSERVED TIME SERIES?

Decomposition models are typically additive or multiplicative, but can also take other forms such as pseudo-additive.

Additive Decomposition

In some time series, the amplitude of both the seasonal and irregular variations does not change as the level of the trend rises or falls. In such cases, an additive model is appropriate.

In the additive model, the observed time series (Ot) is considered to be the sum of three independent components: the seasonal St, the trend Tt and the irregular It.

That is,

$$O_t = S_t + T_t + I_t$$

Each of the three components has the same units as the original series. The seasonally adjusted series is obtained by estimating and removing the seasonal effects from the original time series. The estimated seasonal component is denoted by $\hat{S}_t$. The seasonally adjusted estimates can be expressed by:

Seasonally adjusted series = Original series - Estimated seasonal component

In symbols,

$$SA_t = O_t - \hat{S}_t$$

The following figure depicts a typically additive series. The underlying level of the series fluctuates but the magnitude of the seasonal spikes remains approximately stable.
Figure 4: General Government and Other Current Transfers to Other Sectors
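
A rough Python sketch of an additive decomposition follows. It is not the ABS procedure: it assumes a classical approach in which the trend is estimated with a centred moving average and the seasonal component is the average detrended value for each quarter. The quarterly data, the component values and all function names are made up for illustration.

```python
PERIOD = 4  # quarterly data (an assumption for this toy example)


def centred_moving_average(series, period):
    """2 x period centred moving average for an even period; None where undefined."""
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half:t + half + 1]
        # The two end points get half weight so the window covers one full cycle.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend


def seasonal_component(detrended, period):
    """Average the detrended values season by season, then centre them on zero."""
    means = []
    for season in range(period):
        values = [d for t, d in enumerate(detrended)
                  if d is not None and t % period == season]
        means.append(sum(values) / len(values))
    overall = sum(means) / period
    return [m - overall for m in means]


# Made-up components combined additively: observed = seasonal + trend + irregular.
true_seasonal = [10.0, -5.0, -15.0, 10.0]
irregular = [0.3, -0.2, 0.5, -0.4, 0.1, -0.5, 0.2, 0.4, -0.3, 0.0,
             -0.1, 0.5, -0.4, 0.2, 0.3, -0.5, 0.1, -0.2, 0.4, -0.3]
observed = [true_seasonal[t % PERIOD] + (100 + 2 * t) + irregular[t]
            for t in range(len(irregular))]

trend_hat = centred_moving_average(observed, PERIOD)
detrended = [o - tr if tr is not None else None
             for o, tr in zip(observed, trend_hat)]
seasonal_hat = seasonal_component(detrended, PERIOD)

# Seasonally adjusted series: original minus the estimated seasonal component.
seasonally_adjusted = [o - seasonal_hat[t % PERIOD] for t, o in enumerate(observed)]

print([round(s, 2) for s in seasonal_hat])
```

The estimated seasonal values have the same units as the observed series, and the seasonally adjusted series is left with the trend plus the irregular.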

Multiplicative Decomposition

In many time series, the amplitude of both the seasonal and irregular variations increases as the level of the trend rises. In this situation, a multiplicative model is usually appropriate.

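In the multiplicative model, the original time series is expressed as the product of trend, seasonal and irregular components.

Observed series = Trend x Seasonal x Irregular

or

$$O_t = T_t \times S_t \times I_t$$

The seasonally adjusted data then becomes:

Seasonally adjusted series = Original series / Estimated seasonal component

or

$$SA_t = O_t / \hat{S}_t$$
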
Under this model, the trend has the same units as the original series, but the seasonal and irregular components are unitless factors, distributed around 1.

Most of the series analysed by the ABS show characteristics of a multiplicative model. As the underlying level of the series changes, the magnitude of the seasonal fluctuations varies as well.
Figure 5: Monthly NSW ANZ Job Advertisements

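
A comparable sketch for the multiplicative model is given below. It assumes a simple ratio-to-moving-average approach, which is a classical textbook technique and not necessarily the method used by the ABS. The data and names are invented for illustration. The seasonal factors are estimated from the ratios of the observed series to a moving-average trend and are scaled to average 1.

```python
PERIOD = 4  # quarterly data (an assumption for this toy example)


def centred_moving_average(series, period):
    """2 x period centred moving average for an even period; None where undefined."""
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half:t + half + 1]
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend


def seasonal_factors(ratios, period):
    """Average the ratios season by season, then scale them to average 1."""
    means = []
    for season in range(period):
        values = [r for t, r in enumerate(ratios)
                  if r is not None and t % period == season]
        means.append(sum(values) / len(values))
    overall = sum(means) / period
    return [m / overall for m in means]


# Made-up components combined multiplicatively: observed = trend x seasonal x irregular.
true_seasonal = [1.1, 0.9, 0.8, 1.2]
irregular = [1.003, 0.998, 1.005, 0.996, 1.001, 0.995, 1.002, 1.004, 0.997, 1.000,
             0.999, 1.005, 0.996, 1.002, 1.003, 0.995, 1.001, 0.998, 1.004, 0.997]
observed = [(100 + 2 * t) * true_seasonal[t % PERIOD] * irregular[t]
            for t in range(len(irregular))]

trend_hat = centred_moving_average(observed, PERIOD)
ratios = [o / tr if tr is not None else None for o, tr in zip(observed, trend_hat)]
seasonal_hat = seasonal_factors(ratios, PERIOD)

# Seasonally adjusted series: original divided by the estimated seasonal factor.
seasonally_adjusted = [o / seasonal_hat[t % PERIOD] for t, o in enumerate(observed)]

print([round(s, 3) for s in seasonal_hat])
```

Here the seasonal estimates are unitless factors around 1, so the size of the seasonal movement in the observed series grows with the level of the trend.
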
Pseudo-Additive Decomposition

The multiplicative model cannot be used when the original time series contains very small or zero values. This is because it is not possible to divide a number by zero. In these cases, a pseudo-additive model combining the elements of both the additive and multiplicative models is used. This model assumes that seasonal and irregular variations are both dependent on the level of the trend but independent of each other.

The original data can be expressed in the following form:

$$O_t = T_t + T_t \times (S_t - 1) + T_t \times (I_t - 1) = T_t \times (S_t + I_t - 1)$$

The pseudo-additive model continues the convention of the multiplicative model to have both the seasonal factor St and the irregular factor It centred around one. Therefore we need to subtract one from St and It to ensure that the terms Tt x (St - 1) and Tt x (It - 1) are centred around zero. These terms can be interpreted as the additive seasonal and additive irregular components respectively, and because they are centred around zero, the original data Ot will be centred around the trend values Tt.

The seasonally adjusted estimate is defined to be:

$$SA_t = O_t - \hat{T}_t \times (\hat{S}_t - 1)$$

where $\hat{T}_t$ and $\hat{S}_t$ are the trend and seasonal component estimates. In the pseudo-additive model, the trend has the same units as the original series, but the seasonal and irregular components are unitless factors, distributed around 1.

An example of a series that requires a pseudo-additive decomposition model is shown below. This model is used as cereal crops are only produced during certain months, with crop production being virtually zero for one quarter each year.
Figure 6: Quarterly Gross Value for the Production of Cereal Crops
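
The arithmetic of the pseudo-additive model can be checked with a short Python sketch. It assumes that the seasonally adjusted estimate is formed by subtracting the additive seasonal term, trend x (seasonal - 1), from the original data. The component values are made up, and the true trend and seasonal factors stand in for their estimates. One quarter has a seasonal factor of zero, as in the cereal crop example, so dividing by the seasonal factor would fail there while the subtraction still works.

```python
# Made-up quarterly components (illustrative only); the first quarter has no production.
trend = [50.0, 52.0, 54.0, 56.0]
seasonal = [0.0, 0.5, 1.5, 2.0]     # unitless factors that average 1
irregular = [1.0, 1.1, 0.9, 1.05]   # unitless factors around 1

# Pseudo-additive form: observed = trend x (seasonal + irregular - 1).
observed = [t * (s + i - 1) for t, s, i in zip(trend, seasonal, irregular)]

# Seasonal adjustment by subtraction: observed - trend x (seasonal - 1).
seasonally_adjusted = [o - t * (s - 1) for o, t, s in zip(observed, trend, seasonal)]

# What remains is the trend multiplied by the irregular factor.
trend_times_irregular = [t * i for t, i in zip(trend, irregular)]

for sa, ti in zip(seasonally_adjusted, trend_times_irregular):
    print(round(sa, 6), round(ti, 6))
```

Each printed pair agrees, which reflects the identity T + T x (I - 1) = T x I that follows from the form of the model.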


Example: Shiskin Decomposition

The Shiskin decomposition gives graphs of the original series, seasonally adjusted series, trend series, residual (irregular) factors and the between month (seasonal) and within month (trading day) factors that are combined to form the combined adjustment factors. The residual (irregular) factors are found by dividing the seasonally adjusted series by the trend series. Figure 7 shows a Shiskin decomposition for the Australian Retail series.

Figure 7: Shiskin decomposition for Australian Total Retail Turnover, May 1990 to May 2000


HOW DO I KNOW WHICH DECOMPOSITION MODEL TO USE?

To choose an appropriate decomposition model, the time series analyst will examine a graph of the original series and try a range of models, selecting the one which yields the most stable seasonal component. If the magnitude of the seasonal component is relatively constant regardless of changes in the trend, an additive model is suitable. If it varies with changes in the trend, a multiplicative model is the most likely candidate. However, if the series contains values close to or equal to zero, and the magnitude of the seasonal component appears to be dependent upon the trend level, then a pseudo-additive model is most appropriate.
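
The decision rule above can be approximated by a crude heuristic, sketched here in Python. It is no substitute for examining the graph and comparing fitted models. The function name `suggest_model`, the 5 per cent threshold for "close to zero" and the use of the yearly swing (maximum minus minimum) as a proxy for seasonal magnitude are all assumptions made for this illustration.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Spread of the values relative to their average."""
    return pstdev(values) / mean(values)


def suggest_model(series, period, zero_fraction=0.05):
    """Rough guess at a decomposition model from complete years of data."""
    years = [series[i:i + period] for i in range(0, len(series) - period + 1, period)]
    swings = [max(year) - min(year) for year in years]   # proxy for seasonal magnitude
    levels = [mean(year) for year in years]              # proxy for trend level
    relative_swings = [s / l for s, l in zip(swings, levels)]

    # If the raw swing is at least as stable as the swing relative to the level,
    # the seasonal magnitude does not appear to depend on the trend.
    if coefficient_of_variation(swings) <= coefficient_of_variation(relative_swings):
        return "additive"
    # Seasonal magnitude follows the trend; check for values at or near zero.
    if min(series) <= zero_fraction * mean(series):
        return "pseudo-additive"
    return "multiplicative"


# Made-up quarterly series with a rising trend (illustrative only).
additive = [100 + 2 * t + [10, -5, -15, 10][t % 4] for t in range(20)]
multiplicative = [(100 + 2 * t) * [1.1, 0.9, 0.8, 1.2][t % 4] for t in range(20)]
with_zeros = [(100 + 2 * t) * [0.0, 0.5, 1.5, 2.0][t % 4] for t in range(20)]

print(suggest_model(additive, 4))
print(suggest_model(multiplicative, 4))
print(suggest_model(with_zeros, 4))
```

The heuristic ignores irregular movements and within-year trend growth, so on real data it should be treated only as a first indication.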

WHAT IS A SEASONAL AND IRREGULAR (SI) CHART?

Once the trend component is estimated, it can be removed from the original data, leaving behind the combined seasonal and irregular components or SIs. A seasonal and irregular or SI chart graphically presents the SIs for particular months or quarters in the series span.

The following graph is an SI chart for a monthly series, using a multiplicative decomposition model.
Figure 8: Seasonal and Irregular (SI) Chart - Value of Building Approvals, ACT

The points represent the SIs obtained from the time series, while the solid line shows the seasonal component. The seasonal component is calculated by smoothing the SIs to remove irregular influences.
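
The numbers behind an SI chart can be sketched in Python as follows. The sketch assumes a multiplicative model, a trend estimate that is already available, and a plain 3-term moving average across years as the smoother. The smoother is an assumption made for illustration and not necessarily the filter used in practice. All data and names are made up.

```python
PERIOD = 4  # quarterly data (an assumption for this toy example)

# Made-up components; the known trend stands in for an already estimated trend.
true_seasonal = [1.1, 0.9, 0.8, 1.2]
irregular = [1.003, 0.998, 1.005, 0.996, 1.001, 0.995, 1.002, 1.004, 0.997, 1.000,
             0.999, 1.005, 0.996, 1.002, 1.003, 0.995, 1.001, 0.998, 1.004, 0.997]
trend_hat = [100.0 + 2 * t for t in range(len(irregular))]
observed = [trend_hat[t] * true_seasonal[t % PERIOD] * irregular[t]
            for t in range(len(irregular))]

# SIs: remove the trend from the original data (by division, as the model is multiplicative).
si = [o / tr for o, tr in zip(observed, trend_hat)]

# Group the SIs by quarter, in time order: these are the points on the chart.
si_by_season = {q: si[q::PERIOD] for q in range(PERIOD)}


def smooth(values, width=3):
    """Simple moving average across years, with shorter windows at the ends."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half):i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Smoothed SIs for each quarter: these form the solid line on the chart.
seasonal_line = {q: smooth(points) for q, points in si_by_season.items()}

for q in range(PERIOD):
    print(q + 1, [round(p, 3) for p in si_by_season[q]],
          [round(s, 3) for s in seasonal_line[q]])
```

When the points for a quarter scatter widely around the smoothed line, the irregular component is large relative to the seasonal component for that quarter.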

SI charts are useful in determining whether short-term movements are caused by seasonal or irregular influences. In the graph above, the SIs can be seen to fluctuate erratically, which indicates the time series under analysis is dominated by its irregular component.

SI charts are also used to identify seasonal breaks, moving holiday patterns and extreme values in a time series.
