GTAPinGAMS: The Dataset and Static Model -- 3.5 Batch Files

Go backward to 3.4 Dataset Contents and File Formats
Go up to 3 Practicalities

3.5 Batch Files

Most economic modelers using the GTAP database will want to build their own model, making decisions about the structure of technology, preferences and policy parameters. The tools provided here are intended to simply facilitate the use of GTAP data. A modeler would typically run these programs once to produce a dataset as part of a modelling exercise.

All of the dataset aggregation and recalibration tools provided here are packaged in DOS batch files. The command files include:

BUILD
Unpack the GTAP V4 distribution data into GAMS-readable formats, and generate a filtered version of the full dataset suitable for large-scale computation. The "filtering" step rounds all values in the dataset to the nearest 100,000 $; and all tax rates are rounded to the nearest percent.
GTAPAGGR
Aggregate a larger GTAP dataset into a smaller GTAP dataset.
IMPOSE
Generate a new dataset by imposing an exogenous set of tax rates on an existing GTAP dataset. This permits adjustment of tariffs, export taxes, sales taxes and factor taxes.
CHKDATA
Read a dataset and check benchmark consistency, producing an echoprint of base year GDP and trade shares.
ZIP2HAR
A utility routine to move GTAPinGAMS datasets between GAMS ZIP and GEMPACK header-array formats.³²

3.5.1 BUILD.BAT

This program is typically run once to generate a GAMS-readable dataset from the original GEMPACK distribution file GSDDAT.HAR. This begins by translating the full GTAP dataset into GAMS-readable format (GTAPV4.ZIP). This is done using the GEMPACK utility SEEHAR.EXE, a small Fortran program REWRITE.EXE and a GAMS program SCALE.GMS. The last of these programs scales trade and production data from billions of dollars to tens of billions of dollars.

The next step in the translation is to "filter" the GTAPV4 dataset removing all very small coefficients, extreme tax rates and various other inconsistencies. The default filter tolerance is 0.001 (one tenth of one percent), defined in FILTER.GMS. I use this tolerance to name the filtered dataset GTAP4001. When using GTAP version 4 data, I would normally aggregate using the GTAP4001 dataset as a source. The filtering process improves numerical robustness in large-scale models while introducing very small changes in the results. If you are working with a highly aggregate model, however, it should be possible to aggregate directly from the unfiltered dataset GTAPV4.

Specific steps in this program are as follows:

Round tax rates to the nearest percentage, and round all values to the nearest $100,000.
Remove small intermediate inputs in production and trade. All of the following filtering steps involve the scalar parameter TOLERANCE which equals 0.001 in the central application.
Filter the Trade Matrix
Eliminate any imports of a good into a region where the total value of imports is less than TOLERANCE times the combined value of import and domestic demand.
Define the MAGNITUDE of a trade flow as maximum of the ratio of the trade flow net of tax to the associated aggregate export level or the trade flow gross of tax to the associated aggregate import level.
Drop all trade flows which have MAGNITUDE less than TOLERANCE.
Rescale remaining trade flows to maintain consistent values of aggregate imports and aggregate transport cost.
Filter the Production Matrices
Define the MAGNITUDE of an intermediate input as the maximum of the ratio of the input value gross of tax to total cost, or the ratio of the input value net of tax to total domestic supply.
Define the MAGNITUDE of a factor input as the ratio of the factor payment gross of tax to the value of output gross of tax.
Drop all intermediate inputs and factor inputs which have MAGNITUDE less than TOLERANCE.
Even with a very small TOLERANCE (0.1%), the filtering just described generates a substantial reduction in the number of nonzeros:
```
 
PARAMETER DENSITY       summary of changes in matrix density
 
                   BEFORE       AFTER
        TRADE      53.074      43.357
        PROD       81.942      46.824
 
```
Filter Final Demand
Finally, we do the same thing with final demand (private and public), filtering both imports and domestic demand. We also filter inputs to the international transport activity. This removes all tiny coefficients from the dataset.
Recalibrate the Dataset
The foregoing assignments represent a large number of small changes to the model data, and it is certain that we have introduced some inconsistencies which show up as violations of the profit and market clearance conditions defined in chkeq. For this reason, at this point we use a modified least-squares procedure to restore consistency, holding the international trade matrices fixed and recalibrating each of the regional economic flows.
This is the step where it is very helpful to use a complementarity formulation and the PATH solver, as the solution is extremely difficult with MINOS or CONOPT due to the large number of accumlated superbasics. I have include model definitions here for an equivalent nonlinear programming approach, but I have not included this as a standard feature because I have found the NLP codes to be somewhat unreliable. If you own an NLP solver but don't have PATH, then it will be necessary to convert the SOLVE statements from MCP to NLP. If this proves difficult, contact GTAP and we can arrange for you to get a copy of GTAP4001.ZIP, the filtered dataset.
Ferris and Rutherford [1998] present details of how we have set up constraints and objective function which are interesting but not essential to understanding the program. The key point is that at this point we have changed some of the base year value flows to reinstate equilibrium, holding all tax rates fixed.
Reconstruct the input-output flows.
For energy-related analysis, I find it helpful to maintain a process-oriented representation of the oil sector. For this purpose, I have included code which routes all crude oil flows in each region through the refined oil sector. This involves some careful programming to assure that tax payments and all base year transactions remains constant.

Table 18: Files Referenced by BUILD.BAT

Inputs:
        ..\gtapdata\gsddat.har
        ..\defines\gtapv4.set
        ..\defines\gtap4001.set
 
Outputs:
 
        ..\data\gtapv4.zip
        ..\data\gtap4001.zip

3.5.2 GTAPAGGR.BAT

Once you have built the initial GTAPinGAMS dataset GTAP4001 (or GTAPV4), you can begin to think about a particular application and which aggregations of the original GTAP data would be appropriate for studying those issues. I typically create two aggregations for any new model, one with a minimal number of regions and commodities and another with a larger number of dimensions. I use the small aggregation for model development and bring out the larger dataset whenever I am confident that the model is running reliably and producing sensible results.

The GTAPAGGR.BAT program is used to aggregate a GTAPinGAMS dataset. A command line argument defines the name of the target aggregation. You only need to provide the batch file with the target because the target's mapping file defines the source. Before running GTAPAGGR.BAT, you must create two files, one defining the sets of commodities, regions and primary factors in the target dataset, and another defining the name of the source dataset and a correspondence between elements of the source and target. The aggregation routine produces a brief report of GDP and trade shares in the new dataset. This is written to a file in the build directory.

Table 19: Files Referenced by GTAPAGGR.BAT

Inputs:
 
        Command line argument: target
 
        ..\defines\%target%.set
        ..\defines\%target%.map       (defines source)
        ..\data\%source%.zip
 
Output:
 
        ..\data\%target%.zip
        ..\build\%target%.ech

The SET and MAP files for a new dataset are GAMS-readable files located in the defines subdirectory.

Table 20 a sample set file defining dataset DOEMACRO. The file defines the sets of goods, regions, and primary factors which are in the model. Commodity CGD, the investment-savings composite, must be included in every aggregation:

Table 20: Set Definitions for a GTAPinGAMS Aggregation DOEMACRO

$TITLE   An Aggregation of the DOE Dataset
 
SET   I   Sectors/
  Y     Aggregate output
  COL   Coal
  OIL   Petroleum and coal products (refined)
  CRU   Crude oil
  GAS   Natural gas
  ELE   Electricity
  CGD   Savings good /;
 
SET R Aggregated Regions /
  USA   United States
  JPN   Japan
  EUR   Europe
  OOE   Other OECD
  CHN   China
  FSU   Former Soviet Union
  CEA   Central European Associates
  ASI   Other Asia
  MPC   Mexico plus OPEC
  ROW   Other countries /;
 
SET   F   Factors of production /
  LAB   Labor,
  CAP   Capital /;

Table 21 presents the associated mapping file, DOEMACRO.MAP. The file provides a definition of the source dataset together with mapping definitions for commodities and factors. When no mapping is defined for the set of regions, the aggregation routine retains the same set as in the source data.

Table 21: Mapping Definitions for DOEMACRO

$SETGLOBAL source doe
 
*       -------------------------------------------------------------------
*       The target dataset has fewer sectors, so we need to specify how
*       each sector in the source dataset is mapped to a sector in the
*       target dataset:
 
SET   MAPI   Sectors/
        MTL.Y   Metals-related industry (IRONSTL & NONFERR)
        EIS.Y   Other energy intensive (CHEMICAL & PAPERPRO)
        MFR.Y   Other manufactures
        SER.Y   Other Services
        COL.COL Coal
        OIL.OIL Petroleum and coal products (refined)
        CRU.CRU Crude oil
        GAS.GAS Natural gas
        ELE.ELE Electricity
        CGD.CGD Savings good /;
 
*       The following statements illustrate how to aggregate
*       factors of production in the model.  Unlike the aggregation
*       of sectors or regions, you need to declare the set of
*       primary in the source as set FF, then you can specify the
*       mapping from the source to the target sets.
 
*       The reason for this special treatment is to permit the
*       aggregation program to operate with both GTAP version 4 and
*       GTAP version 3 data.  Sorry for the inconvenience!  TFR
 
set ff /LND,SKL,LAB,CAP,RES/;
SET MAPF mapping of primary factors /LND.CAP,SKL.LAB,LAB.LAB,CAP.CAP,RES.CAP/;
 
*       NB: There is no need to specify a MAPR array for generating
*       DOEMACRO from the DOE dataset.  This implies that the
*       source and target datasets have the identical sets of
*       regions, so there is no need to specify an set named MAPR.
*       The aggregation routine will automatically assign a
*       one-to-one mapping from the source to the target regions.

Here are a couple of exercises which could help a new user learn about the error messages returned by GTAPAGGR: (i) Comment out the line with MFR.Y mapping and run GTAPAGGR. (You will get an error message indicating the MFR has not been mapped). (ii) Change the COL.COL mapping to COL.OIL and run GTAPAGGR. You will get an error message indicating that sector COL in the target dataset has no sector mapped to it.

3.5.3 IMPOSE.BAT

This program is used principally to create a new dataset by imposing a new set of benchmark rates on an existing GTAP dataset. Two command line arguments define the target and source datasets. The source dataset must be in the DATA subdirectory, and a file defining benchmark tax rates for the target dataset is specified in the DEFINES (see Table 22). This program also generates a summary echo-print of trade and GDP shares for the new dataset and places this file in the BUILD subdirectory.

When you write the definitions file for adjusting tax rates, bear in mind that a gross basis tax (TY) is defined as a percentage of the gross-of-tax price, hence these tax rates have a maximum value of 100% and no minimum. A net basis tax, such as TF, TP, TG, TX or TM is defined as a percentage of the net-of-tax price, hence these tax rates have no maximum value and a minimum value of -100%.

Table 22: Benchmark Tax Definitions File: NOTAX.DEF

*       Set up a benchmark equilibrium in which we eliminate all domestic taxes:
 
ty(i,r) = 0;
tp(i,r) = 0;
tf(f,i,r) = 0;
tg(i,r) = 0;
ti(i,j,r) = 0;

Table 23: Files Referenced by IMPOSE.BAT

Inputs:
 
        Command line arguments: source target
 
        defines\%source%.set
        defines\%source%.map
        defines\%target%.def
        data\%source%.zip
 
Outputs:
        defines\%target%.set
        defines\%target%.map
        data\%target%.zip
        build\%target%.ech

3.5.4 ZIP2HAR.BAT

This utility routine reads a GTAP dataset in GAMS-ZIP format and writes the data in a self-extracting compressed header array format.

Table 24: Files Referenced by ZIP2HAR.BAT

Inputs:
        Command line argument: dataset
        data\%dataset%.zip
 
Outputs:
        data\%dataset%.har

October 23, 1998