Estimation Examples

Mats Maggi edited this page Mar 28, 2017 · 9 revisions

Are we sure we have a global maximum?

In the Estimation section we discussed an atypical example in which the estimation results of a model depend on whether we use the EM algorithm (which, in our opinion, should be used only to obtain starting values) or numerical optimization. Based on the vintage of data available in the middle of November 1999, we can conclude that the second alternative, numerical optimization, yields the higher likelihood for the model.

LikelihoodEvolution

In both cases we are dealing with exactly the same model specification, but the two sets of estimates can be interpreted as if they were two different models. Before tackling the subject of Real-Time Simulations, which can be used to assess which of the alternatives yields better forecasts, we can already get a general overview of the two alternative maximum-likelihood results. The following discussion also illustrates that a global maximum may be difficult to find when a model is heavily parameterized; users are therefore encouraged to check whether alternative starting values and estimation options help to improve the likelihood.
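The advice to try alternative starting values can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `log_likelihood` is a toy bimodal function standing in for a model likelihood, and `hill_climb` is a simple derivative-free local search, not the optimizer used by JDemetra+.

```python
import math

def log_likelihood(theta):
    """Toy bimodal objective (hypothetical), standing in for a model log-likelihood."""
    # Local maximum near theta = -2, global maximum near theta = 3.
    return math.log(0.3 * math.exp(-0.5 * (theta + 2.0) ** 2)
                    + 0.7 * math.exp(-0.5 * (theta - 3.0) ** 2))

def hill_climb(f, start, step=0.5, tol=1e-8):
    """Local search: move while the objective improves, otherwise halve the step."""
    x, fx = start, f(start)
    while step > tol:
        moved = False
        for candidate in (x - step, x + step):
            fc = f(candidate)
            if fc > fx:
                x, fx, moved = candidate, fc, True
                break
        if not moved:
            step /= 2.0
    return x, fx

def multi_start(f, starts):
    """Run the local search from every starting value and keep the best result."""
    results = [hill_climb(f, s) for s in starts]
    return max(results, key=lambda r: r[1]), results

best, all_results = multi_start(log_likelihood, [-4.0, -1.0, 1.0, 5.0])
for theta, value in all_results:
    print(f"optimum at theta = {theta:.4f}, log-likelihood = {value:.4f}")
```

Starting values to the left of the valley end up at the local maximum, while those to the right reach the global one. This is the kind of dependence on starting values that the comparison between the EM-based and the numerically optimized estimates reveals.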

Which method provides us with a better in-sample fit for GDP growth?

All the results available in the output tab can be compared across models using the graphical tools available in JDemetra+ (e.g. Tools>Container>Chart). Note that the data underlying all the graphs can also be copied and pasted into Excel, so users have some flexibility to analyse the results and perform exercises such as the one described here. We first open the two models in different tabs, and then we compare actual GDP growth with the signal extracted using the two alternative estimation methods. As shown in the video, the OUTPUT tab contains these results inside the Estimation branch/FIT/"Signals vs Data". The "signal" data, i.e. the fitted values displayed in the graph, are copied onto the chart and renamed:

CompareFitGDP

As mentioned earlier, actual GDP growth is plotted at the middle of the quarter, which gives the wrong impression that it leads the signal; the signal is plotted for every month and represents a weighted average of the factors. Still, it is very clear from the picture that the EM algorithm results in a signal that accounts for high-frequency fluctuations of GDP growth, while the signal obtained with numerical optimization is smoother. The standard deviations of the difference between the actual data and the signal (i.e. the residual) are equal to 0.23 and 0.12, respectively. Those residuals can be further analysed by clicking on the branch "residuals" inside FIT.
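Once the series have been pasted out of the chart, the residual standard deviation can be reproduced by hand. The sketch below uses made-up numbers, not the series from this example, and assumes the sample standard deviation; the text does not say which convention JDemetra+ applies.

```python
from statistics import stdev

# Made-up illustrative numbers, NOT the series from the example.
# The signal is monthly; GDP growth is observed once per quarter (None = no observation).
signal = [0.50, 0.55, 0.60, 0.62, 0.58, 0.52, 0.40, 0.35, 0.30]
actual = [None, 0.70, None, None, 0.45, None, None, 0.40, None]

def residuals(actual, signal):
    """Return actual - signal for the months in which the actual series is observed."""
    return [a - s for a, s in zip(actual, signal) if a is not None]

res = residuals(actual, signal)
resid_sd = stdev(res)  # sample standard deviation (assumption)
print([round(r, 2) for r in res], round(resid_sd, 4))
```

Only the months with a GDP observation contribute to the residual, so the monthly signal has to be aligned with the quarterly series before the comparison is made.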

What are the essential differences between both models?

It is left as an exercise to look at "Signals vs Data" in FIT and verify that, in this example, the model based on the EM algorithm obtains factors that account for a large proportion of the variance of GDP growth and survey data, while the model based on numerical optimization turns out to yield a better fit for oil prices at the cost of accounting for a smaller fraction of GDP growth, as highlighted above.

As more and more data enter the model, both estimation procedures end up yielding exactly the same parameter estimates, so the two models become equivalent. However, that information was not available to us in 1999 with the hypothetical information set that we are using to estimate this model. In general, one should be aware of the following issues:

  • We know that a good in-sample fit for GDP growth does not guarantee that the model will work out-of-sample.
  • Two alternative parameter estimates may be translated into alternative correlation patterns in the data, which may imply two different ways to read the economy.

Conditional forecasts

Once the model has been estimated, we can use it to forecast all the variables included in the measurement equation. The two alternative parameter estimates corresponding to the same model in our example above define two alternative ways to represent the data. The exercise described below serves as an introduction to two topics that will be discussed later on: Reading News and Real-Time Simulations.

From a computational point of view, nowcasting in the presence of timely information or leading indicators is identical to forecasting conditional on an information set that contains the future evolution of certain variables. The example described below is simple, but it can inspire more interesting exercises beyond the scope of nowcasting. At this stage, there are only two concepts that need to be defined:

  1. Unconditional forecast: E[GDP(t+h) | Info available in October 1999]
  2. Conditional forecast: E[GDP(t+h) | Info available in October 1999 + Surveys and Financial data until T]

The concept of conditional forecast can be understood as follows. Suppose that GDP and all hard data stop being published from 1999 onwards and we have to obtain a simple estimate of growth for the euro area on the basis of surveys and financial data. Can we count on both models to extract the growth signal during the Great Recession? In practice, we need to calculate the expected growth rate conditional on the surveys and financial variables available for the whole sample, and compare those results with the actual growth rates of the economy.
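The difference between the two expectations can be sketched with a toy state-space model. The block below is only an illustration under strong assumptions: a single AR(1) factor, two indicators (GDP and one survey) with independent measurement errors, and made-up parameter values and data. It also uses the Kalman filter, whereas the plugin runs the Kalman smoother on a model with several factors.

```python
def filtered_gdp_signal(gdp, survey, phi=0.8, q=1.0,
                        lam_gdp=1.0, lam_svy=0.9, r_gdp=0.5, r_svy=0.3):
    """Expected GDP signal per period; None entries are treated as missing.

    All parameters are hypothetical: phi and q drive the AR(1) factor,
    lam_* are loadings and r_* are measurement-error variances.
    """
    f = 0.0                      # factor mean
    p = q / (1.0 - phi ** 2)     # stationary factor variance
    signal = []
    for y_gdp, y_svy in zip(gdp, survey):
        # Prediction step
        f = phi * f
        p = phi ** 2 * p + q
        # Update step, one series at a time (valid with independent measurement errors)
        for y, lam, r in ((y_gdp, lam_gdp, r_gdp), (y_svy, lam_svy, r_svy)):
            if y is None:
                continue         # missing observation: no update
            gain = p * lam / (lam ** 2 * p + r)
            f += gain * (y - lam * f)
            p *= 1.0 - gain * lam
        signal.append(lam_gdp * f)
    return signal

# Made-up data: GDP stops being published after period 4, surveys keep arriving.
gdp = [0.5, 0.6, 0.4, 0.5, None, None, None]
survey_full = [0.4, 0.5, 0.4, 0.4, -1.0, -1.5, -1.2]
survey_cut = survey_full[:4] + [None] * 3

unconditional = filtered_gdp_signal(gdp, survey_cut)   # no information after period 4
conditional = filtered_gdp_signal(gdp, survey_full)    # surveys observed until T
print([round(x, 3) for x in unconditional])
print([round(x, 3) for x in conditional])
```

With no new information, the unconditional forecast simply decays towards the mean at rate `phi`. The conditional forecast follows the deterioration in the survey even though GDP itself is never observed after period 4.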

How to calculate the conditional forecasts in this example?

  • In your Excel file, update all the series you want to include in your conditioning information set. In our example these are realizations of surveys and financial data, but the same can be done with assumptions about the future evolution of those variables.
  • Refresh your data (remember how to do it).
  • Click on the green arrow of the processing tab (the estimation icon) to re-run the model with the refreshed data. This action triggers a run of the Kalman smoother alone, without re-estimating the model, since the estimation options are, by default, unchecked after refreshing.

Results

For each of the two "models", we have copied the fitted values corresponding to GDP growth from the OUTPUT tab (inside Estimation branch/FIT/"Signals vs Data", we copy the "Signal" displayed in the graph). Conveniently, those fitted values are also available for the period we want to predict conditional on the Surveys and Financial blocks (the conditional forecast).

It turns out that the model with the highest likelihood is also the one whose conditional expectations for growth are closest to the actual outcomes.

Scenario

  • Understanding the results

Can we decompose the difference between the unconditional forecast and the one conditional on the realization of surveys and financial variables? Such a decomposition is needed to understand which indicators help us obtain a good estimate of GDP growth rates.

Because all indicators depend on five common factors, identifying the most relevant variables in the conditioning information set is not straightforward. Luckily, such a decomposition is given by the news analysis that will be introduced later on.

  • Which model yields the best forecasts?

We have compared the conditional forecasts of both models over the Great Recession and concluded, by looking at a simple graph, that one of the models is clearly superior. However, a more systematic evaluation of the forecasts (real-time simulations) would be required to answer questions such as:

  • Which model produces better GDP forecasts two months before the official release?
  • How does forecasting uncertainty decrease when more and more information enters the information set?
  • etc.
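As a rough idea of what such a systematic evaluation computes, the sketch below groups forecast errors by model and by the number of months before the official release, and reports the root mean squared error of each group. The records, the model labels "A" and "B" and the function name `rmse_by_horizon` are made up for illustration; they are not output of the plugin.

```python
import math
from collections import defaultdict

def rmse_by_horizon(records):
    """records: iterable of (model, months_before_release, forecast, actual)."""
    squared_errors = defaultdict(list)
    for model, horizon, forecast, actual in records:
        squared_errors[(model, horizon)].append((forecast - actual) ** 2)
    return {key: math.sqrt(sum(v) / len(v)) for key, v in squared_errors.items()}

# Made-up illustrative records, NOT results from the example.
records = [
    ("A", 2, 0.50, 0.30), ("A", 2, 0.10, 0.40),
    ("A", 0, 0.35, 0.30), ("A", 0, 0.45, 0.40),
    ("B", 2, 0.40, 0.30), ("B", 2, 0.30, 0.40),
    ("B", 0, 0.32, 0.30), ("B", 0, 0.38, 0.40),
]

scores = rmse_by_horizon(records)
for (model, horizon), rmse in sorted(scores.items()):
    print(f"model {model}, {horizon} months before release: RMSE = {rmse:.3f}")
```

Comparing the rows for one model across horizons shows how uncertainty shrinks as information accumulates; comparing models at a fixed horizon addresses the first question in the list.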