Understanding U.S. Electric System Operating Data #254
Replies: 5 comments 16 replies
-
Start investigating what type(s) of data we have.
import json
import os
from zipfile import ZipFile
data_dir = os.path.join(os.path.expanduser("~"), "Workspace", "bulk_data")
mar12_data = os.path.join(data_dir, "2024-03-12", "EBA.zip")
aug14_data = os.path.join(data_dir, "2024-08-14", "EBA.zip")
aug19_data = os.path.join(data_dir, "2024-08-19", "EBA.zip")
def to_date(d_str):
    """Parse an EIA bulk-data timestamp string into a pandas datetime.

    Tries, in order: UTC hourly ('%Y%m%dT%HZ'), local hourly with a UTC
    offset ('%Y%m%dT%H%z', after appending ':00' to complete the offset),
    plain hourly ('%Y%m%dT%H'), and finally pandas's free-form parser.

    Args:
        d_str (str): Date string, e.g. '20220101T05Z'.

    Returns:
        pandas.Timestamp or None: The parsed datetime, or None when every
        parsing strategy fails.
    """
    # (label, parser) pairs tried in order; labels preserve the original
    # diagnostic prints so existing log output is unchanged.
    attempts = [
        ("UTC", lambda s: pd.to_datetime(s, utc=True, format='%Y%m%dT%HZ')),
        ("Local", lambda s: pd.to_datetime(s + ":00", format='%Y%m%dT%H%z')),
        ("Alt UTC", lambda s: pd.to_datetime(s, format='%Y%m%dT%H')),
        ("Last ditch effort", lambda s: pd.to_datetime(s)),
    ]
    for label, parse in attempts:
        try:
            result = parse(d_str)
        except (ValueError, TypeError):
            # Narrowed from a bare except: only parse failures should be
            # swallowed; anything else ought to surface.
            continue
        print(label)
        return result
    print("Failed")
    return None
def read_line(zf, ln=1):
    """Read and parse one JSON line from a zipped text file.

    Args:
        zf (str): Path to a zip archive assumed to contain a single text
            file (e.g., EBA.txt) with one JSON object per line.
        ln (int): 1-based line number to read; default is 1.

    Returns:
        dict: Parsed JSON from line ``ln``, or an empty dict when the
        archive contains no files.
    """
    data = {}
    # Context managers close the archive and member even on error
    # (the original leaked the ZipFile handle).
    with ZipFile(zf, 'r') as z:
        names = z.namelist()
        if names:
            # Assumes only one file is zipped; mirror the original's
            # behavior of using the last name listed.
            with z.open(names[-1]) as f:
                line = b""
                for _ in range(ln):
                    line = f.readline()
                data = json.loads(line)
    return data
def read_zip(zf):
    """Tally UTC-hourly series types found in a bulk EBA zip archive.

    Args:
        zf (str): Path to a zip archive assumed to contain a single text
            file (e.g., EBA.txt) with one JSON object per line.

    Returns:
        dict: Counts keyed 'ng' (net generation), 'idh' (BA-to-BA
        interchange), and 'd' (demand), plus 'junk', a list of hourly
        series IDs that matched none of those (including the 5-part
        fuel-specific series).
    """
    data = {
        'ng': 0,
        'idh': 0,
        'd': 0,
        'junk': []
    }
    # 'with' guarantees the archive is closed even if a line raises
    # (the original's z.close() was skipped on exceptions).
    with ZipFile(zf, 'r') as z:
        names = z.namelist()
        # Assumes only one file is zipped (e.g., EBA.txt)
        fn = names[-1] if names else None
        if fn:
            with z.open(fn) as f:
                for line in f:
                    try:
                        f_json = json.loads(line)
                    except ValueError:
                        # json.JSONDecodeError subclasses ValueError;
                        # narrowed from a bare except.
                        print("failed on line '%s'" % line.decode('utf-8')[0:255])
                        continue
                    # Data of interest have a 'series_id'
                    if 'series_id' not in f_json:
                        continue
                    # Filter for UTC hourly ('H'); default 'HL' fails it
                    if f_json.get('f', 'HL') != 'H':
                        continue
                    series_id = f_json.get("series_id", None)
                    if not series_id:
                        continue
                    series_parts = series_id.split(".")
                    # Note that if len is 5, then it's fuel specific
                    if len(series_parts) != 4:
                        data['junk'].append(series_id)
                        continue
                    d = series_parts[2]
                    if d == 'NG':
                        # Net generation
                        data['ng'] += 1
                    elif d == 'ID':
                        # Interchange (BA-to-BA)
                        data['idh'] += 1
                    elif d == 'D':
                        # Demand
                        data['d'] += 1
                    else:
                        data['junk'].append(series_id)
    return data
# Confirm each cached EBA.zip snapshot exists on disk, labeled as
# "<date-dir>:<zip-name>".
data_list = [mar12_data, aug14_data, aug19_data]
for data in data_list:
    # Use os.path.split rather than splitting on "/": these paths were
    # built with os.path.join, so a literal "/" split breaks on Windows.
    head, base = os.path.split(data)
    yfn = "%s:%s" % (os.path.basename(head), base)
    print(yfn, os.path.isfile(data))
my_json = read_line(aug14_data, 1)  # read any line from EBA.zip
d = read_zip(aug14_data)
Beta Was this translation helpful? Give feedback.
-
BTW: Why are we using the same UTC time zone across all balancing authorities?
Beta Was this translation helpful? Give feedback.
-
# Convert df_net_gen into a summation of BAs, written out as CSV.
file_name = "net_gen_2024-08-19.csv"
# One "BA_Code,Annual_NG" row per column: the column label and its
# annual sum, rounded to one decimal place.
rows = [
    "%s,%0.1f" % (col, df_net_gen[col].sum()) for col in df_net_gen.columns
]
my_str = "\n".join(["BA_Code,Annual_NG"] + rows) + "\n"
with open(file_name, 'w') as f:
    f.write(my_str)
Beta Was this translation helpful? Give feedback.
-
The API Way
Here's a model for querying the EIA API for demand (D), net generation (NG), and balancing authority interchanges (ID). Updated for daily data requests; this shortens the amount of data that must be transferred and shouldn't impact results, which are aggregated in the end.
import requests
def decode_str(bstring):
    """Return a Python string decoded from bytes.

    Args:
        bstring (bytes or str): An encoded byte string, or a plain string.

    Returns:
        (str): The decoded string. A str input is returned unchanged;
        undecodable bytes or any other type yield the empty string.
    """
    if isinstance(bstring, str):
        return bstring
    if isinstance(bstring, bytes):
        try:
            return bstring.decode("utf-8")
        except UnicodeDecodeError:
            # Narrowed from a bare except; preserve the original's
            # silent empty-string fallback for bad encodings.
            return ""
    return ""
def get_request(url, url_try=0, max_tries=5):
    """Return a JSON data response from EIA's API.

    Retries (recursively) on non-200 responses until ``max_tries``
    attempts have been made.

    Args:
        url (str): The URL in proper syntax.
        url_try (int): Internal counter for URL retries; default is 0
        max_tries (int): When to stop retrying; default is 5

    Returns:
        (dict, int):
            The JSON response and URL try count.
            The JSON dictionary includes keys:
            - 'response' (dict): with keys:
                - 'total' (int): count of records in 'data'
                - 'dateFormat' (str): For example, 'YYYY-MM-DD"T"HH24'
                - 'frequency' (str): For example, 'hourly'
                - 'description' (str): Data description
                - 'data' (list): Dictionaries with keys:
                    - 'period'
                    - 'fromba': for ID only
                    - 'fromba-name': for ID only
                    - 'toba': for ID only
                    - 'toba-name': for ID only
                    - 'respondent': for D and NG only
                    - 'respondent-name': for D and NG only
                    - 'type': for D and NG only
                    - 'type-name': for D and NG only
                    - 'value'
                    - 'value-units'
            - 'request' (dict): Parameters sent to the API
            - 'apiVersion' (str): API version string (e.g., '2.1.7')
            - 'ExcelAddInVersion' (str): AddIn version string (e.g., '2.1.0')
    """
    r_dict = {}
    url_try += 1
    r = requests.get(url)
    if r.status_code == 200:
        try:
            r_dict = r.json()
        except ValueError:
            # requests raises a ValueError subclass on malformed JSON;
            # fall back to decoding the raw bytes ourselves.
            r_content = decode_str(r.content)
            if r_content:
                r_dict = json.loads(r_content)
    elif url_try < max_tries:
        # Retry. Forward max_tries so a caller-supplied limit is honored;
        # the original dropped it and silently reverted to the default 5.
        r_dict, url_try = get_request(url, url_try, max_tries)
    else:
        print("Requests failed!")
    return (r_dict, url_try)
# Based on eia_utils.py from scenario-modeler
_baseurl = "https://api.eia.gov/v2/"
# Region (D/NG) endpoints: hourly vs daily.
_sub_domain_h = "electricity/rto/region-data/data/"
_sub_domain_d = "electricity/rto/daily-region-data/data/"
# Interchange (ID) endpoints: hourly vs daily.
_sub_domain2_h = "electricity/rto/interchange-data/data/"
_sub_domain2_d = "electricity/rto/daily-interchange-data/data/"
_api_key = ""  # Use your own key!
_freq = "daily"  # one of 'daily', 'hourly', or 'local-hourly'
_region_id = "ISNE"
# NOTE: if using 'local-hourly' these times need to be in the timezone format!
# NOTE: the API time filter is based on day (not hour)!
_start = "2022-12-30"
_end = "2022-12-31"
# Select the endpoints matching the requested frequency (daily vs hourly).
if _freq == 'daily':
    _sub_domain = _sub_domain_d
    _sub_domain2 = _sub_domain2_d
else:
    _sub_domain = _sub_domain_h
    _sub_domain2 = _sub_domain2_h
# Query window shared by both requests.
_window = f"&frequency={_freq}&start={_start}&end={_end}"
# Net generation and demand:
url = (
    f"{_baseurl}{_sub_domain}?api_key={_api_key}&out=json"
    + _window
    + f"&facets[respondent][]={_region_id}"
    + "&facets[type][]=D"
    + "&facets[type][]=NG"
    + "&data[]=value"
)
# Interchange
url2 = (
    f"{_baseurl}{_sub_domain2}?api_key={_api_key}&out=json"
    + _window
    + f"&facets[fromba][]={_region_id}"
    + "&data[]=value"
)
my_ngd, url_tries = get_request(url)
my_id, url_tries = get_request(url2)
Beta Was this translation helpful? Give feedback.
-
Great work! What I take away is that it's currently hard to trust EBA.zip. Is the biggest concern with using the API making sure we're getting the right timezone? Why do we need to pick timezones at all? From what I can tell from the above, all data in the API are reported as UTC.
Beta Was this translation helpful? Give feedback.
-
The question has come up as to whether to use the public API for EBA.zip (https://www.eia.gov/opendata/bulk/EBA.zip) or the EIA's API. The cause of this concern is that different vintages of EBA.zip have influenced the ElectricityLCI's baselines for the same year (e.g., 2022). For data that are about two years old, it is concerning that updates to EBA.zip should exhibit variance. So, let's look at this dataset.
I procured three versions of EBA.zip cached at different times (i.e., 12 March 2024, 14 August 2024, and 19 August 2024) to see what changes in 2022 data between them.
Beta Was this translation helpful? Give feedback.
All reactions