ingest process monitoring seems to be not working #232

Closed
randytpierce opened this issue Aug 29, 2023 · 6 comments · Fixed by #301
Labels
bug Something isn't working VXingest issues related to the VXingest project

Comments

@randytpierce
Contributor

It seems that ingest process monitoring is broken. Metrics are being generated, but they do not look correct to me. The node_exporter service is running and its textfile collector is configured to scrape the metrics directory (--collector.textfile.directory=/data/common/job_metrics), yet the Grafana dashboard shows no data for the ingest processes. This seems important, so I'm working on it now.
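For context, node_exporter's textfile collector only reads files ending in .prom from the configured directory, so a common pattern is to write each metrics file under a temporary name and rename it into place, which keeps the collector from ever seeing a half-written file. A minimal Python sketch of that pattern, assuming the ingest jobs are free to write their .prom files this way (write_textfile_metrics is a hypothetical helper, not part of VXingest):

```python
import os
import tempfile
from typing import Iterable


def write_textfile_metrics(directory: str, basename: str, lines: Iterable[str]) -> str:
    """Atomically write Prometheus text-format lines to <directory>/<basename>.prom."""
    final_path = os.path.join(directory, basename + ".prom")
    # Create the temp file in the same directory so os.replace is a same-filesystem
    # rename; the ".tmp" suffix keeps the textfile collector from reading it early.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write("\n".join(lines) + "\n")
        # Readers see either the old file or the complete new one, never a partial write.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the temp file if writing or renaming failed.
        os.unlink(tmp_path)
        raise
    return final_path
```

Atomic replacement alone would not have prevented the errors in this issue, since the bad line was written in full; it only rules out partial reads as a second source of parse errors.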

@randytpierce randytpierce added bug Something isn't working VXingest issues related to the VXingest project labels Aug 29, 2023
@randytpierce randytpierce self-assigned this Aug 29, 2023
@randytpierce
Contributor Author

Looking at the node_exporter.service log output with "sudo journalctl -u node_exporter.service -r" (newest entries first), I see errors like the following:

Aug 29 19:02:12 adb-cb1.gsd.esrl.noaa.gov node_exporter[1766]: ts=2023-08-29T19:02:12.413Z caller=textfile.go:219 level=error collector=textfile msg="failed to collect textfile data" file=job_v01_metar_ctc_sum_model_hrrr__rap__130_adb_cb1.prom err="failed to parse textfile data from "/data/common/job_metrics/job_v01_metar_ctc_sum_model_hrrr__rap__130_adb_cb1.prom": text format parsing error in line 8: expected float as value, got """
Aug 29 19:01:57 adb-cb1.gsd.esrl.noaa.gov node_exporter[1766]: ts=2023-08-29T19:01:57.419Z caller=textfile.go:219 level=error collector=textfile msg="failed to collect textfile data" file=job_v01_metar_netcdf_obs_adb_cb1.prom err="failed to parse textfile data from "/data/common/job_metrics/job_v01_metar_netcdf_obs_adb_cb1.prom": text format parsing error in line 8: expected float as value, got """

These errors appear for all of the following jobs:
job_v01_metar_grib2_model_rap__ops__130_adb_cb1.prom
job_v01_metar_grib2_model_hrrr_adb_cb1.prom
job_v01_metar_ctc_sum_model_hrrr__rap__130_adb_cb1.prom
job_v01_metar_netcdf_obs_adb_cb1.prom

That is essentially every ingest job that matters, so there must be a bug somewhere in the scraper.
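The "expected float as value" error means a sample line ended without a numeric value. A rough stdlib-only Python check for that one condition could be run over /data/common/job_metrics to find affected files before node_exporter does. This is a simplified approximation of the text-format rule, not node_exporter's actual parser, and the function names are hypothetical:

```python
from typing import List, Tuple


def _after_labels(line: str) -> str:
    """Return the text that follows the metric name and optional {label set}."""
    i = 0
    # Metric names cannot contain whitespace or '{'.
    while i < len(line) and line[i] not in "{ \t":
        i += 1
    if i < len(line) and line[i] == "{":
        in_quotes = False
        escaped = False
        i += 1
        # Find the closing brace, ignoring any '}' inside quoted label values.
        while i < len(line):
            c = line[i]
            if escaped:
                escaped = False
            elif c == "\\":
                escaped = True
            elif c == '"':
                in_quotes = not in_quotes
            elif c == "}" and not in_quotes:
                break
            i += 1
        i += 1  # step past the closing brace (or past the end if unterminated)
    return line[i:]


def find_unvalued_samples(text: str) -> List[Tuple[int, str]]:
    """Return (1-based line number, reason) for sample lines lacking a float value."""
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Blank lines and '#' lines (HELP, TYPE, comments) carry no sample value.
        if not line or line.startswith("#"):
            continue
        fields = _after_labels(line).split()
        if not fields:
            problems.append((lineno, "missing sample value"))
            continue
        try:
            float(fields[0])
        except ValueError:
            problems.append((lineno, f"value {fields[0]!r} is not a float"))
    return problems
```

Line numbers count every line in the file, comments included, which should make them comparable to the "line 8" reported in the journal output.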

@randytpierce
Contributor Author

Opening /data/common/job_metrics/job_v01_metar_ctc_sum_model_hrrr__rap__130_adb_cb1.prom shows that line 8, which holds the recorded record count, has no number at the end:
job_v01_metar_ctc_sum_model_hrrr__rap__130_adb_cb1{ingest_id="ingest_recorded_record_count",log_file="/data/temp_tar/tmp.Frz81qGaqL/job_v01_metar_ctc_sum_model_hrrr__rap__130-2023-08-29:19:00:02.log"}
Looking into why that is.
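Since one unparseable line makes node_exporter reject the whole file (as the errors above show), whatever writes the .prom files could refuse to emit a sample that has no numeric value, so a missing count fails loudly in the ingest job instead. The issue does not show the actual VXingest writer; format_sample below is a hypothetical helper sketching that guard, with label values escaped as the Prometheus text format requires:

```python
from typing import Dict, Optional, Union


def _escape_label_value(value: str) -> str:
    # Text exposition format: escape backslash, double quote and line feed in label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name: str, labels: Dict[str, str],
                  value: Optional[Union[int, float, str]]) -> str:
    """Return one sample line, raising instead of writing a line with no value."""
    if value is None or (isinstance(value, str) and not value.strip()):
        raise ValueError(f"no value for metric {name!r} with labels {labels!r}")
    number = float(value)  # also raises ValueError for non-numeric strings such as "n/a"
    if labels:
        label_str = ",".join(f'{k}="{_escape_label_value(v)}"' for k, v in labels.items())
        return f"{name}{{{label_str}}} {number!r}"
    return f"{name} {number!r}"
```

With this guard, an empty record count raises in the job that produced it rather than producing a line like the one above.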

@bonnystrong

bonnystrong commented Aug 29, 2023 via email

@randytpierce
Contributor Author

randytpierce commented Aug 29, 2023 via email

@randytpierce
Contributor Author

This was caused by a bug in the import docs routine. Fixed now.

@randytpierce
Contributor Author

Discovered another problem with this: some of the scraped fields are parsed incorrectly because we have prepended fields to the newer log messages. The parsing needs to be adjusted to match.
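One way to make the scraper tolerant of prepended fields is to look values up by key rather than by position in the line. The issue does not show the actual log format, so the line layout and key names below are assumptions for illustration only, and extract_field is a hypothetical helper:

```python
import re
from typing import Optional


def extract_field(log_line: str, key: str) -> Optional[int]:
    """Find `key: N` or `key=N` anywhere in the line and return N, or None if absent."""
    # Searching by key (not by whitespace-split position) keeps working when new
    # fields are prepended to the message. The leading \b stops a key from matching
    # as the tail of a longer name (e.g. "ingest_recorded_record_count").
    pattern = rf"\b{re.escape(key)}\s*[:=]\s*(-?\d+)\b"
    match = re.search(pattern, log_line)
    return int(match.group(1)) if match else None
```

For example, given "INFO import_docs recorded_record_count: 1234" and the same message with "2023-08-29T19:00:02 host=adb-cb1 " prepended, a positional parse such as int(line.split()[3]) works only on the first line, while extract_field(line, "recorded_record_count") returns 1234 for both.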
