Estimation Examples
##Example 1: simple model comparisons##
In the section Estimation we have discussed an example where the estimation results of a model could depend on whether we use the EM algorithm or numerical optimization. We can actually conclude, based on the vintage of data available at the middle of November of 1999, that the second alternative provides the model with a higher likelihood.
![LikelihoodEvolution](https://github.com/nbbrd/jdemetra-nowcasting/wiki/images/LikelihoodEvolution.png)

Here, we will compare the resulting output of both approaches. In both cases we are dealing with exactly the same model specification, but it can be interpreted as if there were two different models. Before tackling the subject of Real-Time Simulations, which can be used to assess which of the alternatives yields better forecasts, we can already have a general overview of the two alternative maximum-likelihood results.
###Which method provides us with a better in-sample fit for GDP growth?###
All the results available in the output tab can be compared across models using the graphical tools available in JDemetra*+* (e.g. Tools>Container>Chart). Note that the data underlying all the graphs can also be copied and pasted to Excel, so users have some flexibility to analyse the results and perform exercises such as the one described here. We first open the two models in different tabs, and then we compare actual GDP growth with the signal extracted using the two alternative estimation methods. As shown in the video, the OUTPUT tab contains these results inside the Estimation branch/FIT/"Signals vs Data". The "signal" data, i.e. the fitted values displayed in the graph, are copied onto the chart and renamed.
As mentioned earlier, actual GDP growth is plotted at the middle of the quarter, so we have the wrong impression that it leads the signal, which is plotted for every month and represents a weighted average of the factors. Still, it is very clear from the picture that the EM algorithm results in a signal that accounts for high-frequency fluctuations of GDP growth, while the signal obtained with numerical optimization is smoother. The standard deviations of the difference between the actual data and the signal (i.e. the residual) are equal to 0.23 and 0.12, respectively. Those residuals can be further analysed by clicking on the branch "residuals" inside FIT.
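Since the data behind the graphs can be pasted into Excel, the residual standard deviation can also be checked outside JDemetra+. The following is a minimal sketch, not part of the plug-in: the function name `residual_std`, the sample numbers, and the choice of the sample (n-1) standard deviation are all assumptions made for illustration. It only compares the months in which both the quarterly GDP figure and the monthly signal are available.

```python
import math
import statistics


def residual_std(actual, signal):
    """Sample standard deviation of (actual - signal).

    Months where either value is missing (NaN) are skipped, which is how
    quarterly GDP growth aligns with a monthly signal.
    Using the n-1 denominator is an assumption; the wiki does not say
    which convention was used for the reported figures.
    """
    residuals = [
        a - s
        for a, s in zip(actual, signal)
        if not (math.isnan(a) or math.isnan(s))
    ]
    return statistics.stdev(residuals)


nan = float("nan")

# Made-up monthly series: GDP growth is only observed in the middle month
# of each quarter, the two signals are observed every month.
gdp = [nan, 0.8, nan, nan, 0.5, nan, nan, 0.9, nan, nan, 0.6, nan]
signal_a = [0.7, 0.78, 0.7, 0.6, 0.52, 0.6, 0.8, 0.88, 0.8, 0.7, 0.61, 0.6]
signal_b = [0.7, 0.70, 0.7, 0.7, 0.65, 0.7, 0.7, 0.75, 0.7, 0.7, 0.70, 0.7]

print("residual std, signal A:", residual_std(gdp, signal_a))
print("residual std, signal B:", residual_std(gdp, signal_b))
```

A smaller value means the signal tracks GDP growth more closely in sample, which, as noted further down, says nothing yet about out-of-sample performance.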
It is left as an exercise to look at "Signals vs Data" in FIT to verify that in this example the model based on the EM algorithm obtains factors that account for a large proportion of the variance of GDP growth and survey data, while the model based on numerical optimization turns out to yield a better fit for oil prices at the cost of accounting for a smaller fraction of GDP growth, as highlighted above.
As more and more data enter the model, both estimation procedures end up yielding exactly the same parameter estimates, so the two models become equivalent. However, that information was not revealed to us in 1999 with the hypothetical information set that we are using to estimate this model. In general, one should be aware of the following issues:
- We know that a good in-sample fit for GDP growth does not guarantee that the model will work out-of-sample.
- Two alternative parameter estimates may be translated into alternative correlation patterns in the data, which may imply two different ways to read the economy.
##Example 2: Conditional forecasts ##
The two alternative parameter estimates corresponding to the same model (see example above) can imply alternative ways to represent the data. The exercise described below serves as an introduction to two topics that will be discussed later on: Reading News and Real-Time Simulations.
This problem of nowcasting in the presence of timely information or leading indicators is from a computational point of view identical to forecasting conditional on a certain information set containing the future evolution of certain variables. The example described below is simple, but it can inspire more interesting exercises beyond the scope of nowcasting. At this stage, there are only two concepts that need to be defined:
- Unconditional forecast: E[GDP(t+h) | Info available in October 1999 ]
- Conditional forecast: E[GDP(t+h) | Info available in October 1999 + Surveys and Financial data until T ].
The concept of conditional forecast can be understood as follows. Suppose that GDP and all hard data stop being published from 1999 onwards and we have to obtain a simple estimate of growth for the euro area on the basis of surveys and financial data. Can we count on both models to extract the growth signal during the Great Recession? In practice, we need to calculate the expected growth rate conditional on the surveys and financial variables available for the whole sample, and compare those results with the actual growth rates of the economy.
###How to calculate the conditional forecasts in this example?###
- Update in your Excel file all the series you want to incorporate in your conditioning information set, e.g. realizations of surveys and financial data in our example. The same could also be done with assumptions regarding the future evolution of those variables.
- Refresh your data (remember how to do it)
- Click on the green arrow of the processing tab (estimation icon) to re-run the model with the refreshed data. This action triggers a run of the Kalman smoother alone without re-estimating the model, since the estimation options are, by default, unchecked after refreshing.
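Why re-running the state-space recursions with fixed parameters is enough to obtain a conditional forecast can be illustrated with a toy model. The sketch below is not the JDemetra+ implementation: it assumes a single AR(1) factor instead of five, two series (GDP and one survey), uncorrelated measurement errors, and made-up parameter values. It also runs only the forward Kalman filter; the smoother adds a backward pass that is omitted here. Missing observations are simply skipped in the update step, which is what makes the conditional and unconditional forecasts differ.

```python
import math


def kalman_filter(obs, loadings, noise_vars, phi=0.8, q=1.0):
    """Filtered means of a single AR(1) factor (toy model, assumed parameters).

    obs[t][i] is None (or NaN) when series i is not observed in period t.
    """
    # Start from the stationary distribution of the factor.
    mean, var = 0.0, q / (1.0 - phi ** 2)
    means = []
    for row in obs:
        # Prediction step: move the factor one period ahead.
        mean, var = phi * mean, phi ** 2 * var + q
        # Update step: process the available series one at a time, which is
        # valid because measurement errors are assumed uncorrelated.
        for y, lam, r in zip(row, loadings, noise_vars):
            if y is None or math.isnan(y):
                continue  # missing observation: no update for this series
            gain = var * lam / (lam ** 2 * var + r)
            mean += gain * (y - lam * mean)
            var *= 1.0 - gain * lam
        means.append(mean)
    return means


# Hypothetical loadings and measurement-error variances: [GDP, survey].
loadings = [1.0, 0.5]
noise_vars = [0.1, 0.2]

# GDP is published for the first two periods only; the survey keeps arriving.
obs_conditional = [[0.6, 0.3], [0.4, 0.2], [None, -0.5], [None, -0.8]]
# Same information set, but without the later survey releases.
obs_unconditional = [[0.6, 0.3], [0.4, 0.2], [None, None], [None, None]]

conditional = [loadings[0] * f
               for f in kalman_filter(obs_conditional, loadings, noise_vars)]
unconditional = [loadings[0] * f
                 for f in kalman_filter(obs_unconditional, loadings, noise_vars)]

print("conditional GDP forecast:  ", conditional)
print("unconditional GDP forecast:", unconditional)
```

In the periods without any new data the unconditional forecast just decays towards the mean at rate `phi`, whereas the conditional forecast is pulled by the survey readings through the common factor.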
###Results###
For each one of the two "models", we have copied the fitted values corresponding to GDP growth from the OUTPUT tab (inside the Estimation branch/FIT/"Signals vs Data", we copy the "Signal" displayed in the graph). Interestingly, those fitted values are available for the period of time we want to predict conditioning on the Surveys and Financial blocks (conditional forecast).
It turns out that the model with the highest likelihood is also the one whose conditional expectations for growth are closest to the actual growth rates.
Can we decompose the difference between the unconditional forecast and the one conditional on the realization of surveys and financial variables? Such a decomposition is needed to understand which indicators help us obtain a good estimate of the GDP growth rates.
Because all indicators depend on five common factors, identifying the most relevant variables in the conditioning information set is not straightforward. Luckily, such a decomposition is given by the news analysis that will be introduced later on.
We have compared the conditional forecasts of both models over the Great Recession, and concluded by looking at a simple graph that one of the models is clearly superior. However, a more systematic evaluation of the forecasts (real-time simulations) would be required if we aim to answer questions such as:
- Which model produces better GDP forecasts two months before the official release?
- How does forecasting uncertainty decrease when more and more information enters the information set?
- etc.