Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why virtuoso rolls forward with batch size transactions replayed when I restart the virtuoso. #1311

Open
col-ji opened this issue Aug 19, 2024 · 7 comments

Comments

@col-ji
Copy link

col-ji commented Aug 19, 2024

I had a program to run 10 threads of rdf_loader_run(). When they are done, I run CHECKPOINT to commit everything. After finishing the program, I did unix command to stop and start virtuoso, the virtuoso.log showed the following information which I don't understand.

12:57:42 OpenLink Virtuoso Universal Server
12:57:42 Version 07.20.3239-pthreads for Linux as of Jun 12 2024 (d698f2171)
12:57:42 uses OpenSSL 1.1.1k  FIPS 25 Mar 2021
12:57:42 uses parts of PCRE, Html Tidy
12:57:47 Database version 3126
12:57:47 SQL Optimizer enabled (max 1000 layouts)
12:57:48 Compiler unit is timed at 0.000099 msec
12:57:55 Roll forward started
12:57:59     1000 transactions, 5612580 bytes replayed (0 %)
12:58:01     2000 transactions, 11186713 bytes replayed (0 %)

If I did CHECKPOINT in my program, why the log still shows transactions message? The login is not working during roll forward with transactions. How can I avoid that?

Thanks

@TallTed
Copy link
Collaborator

TallTed commented Aug 19, 2024

Were any log entries produced following the load program run, preceding the Unix stop/start? I would expect these to include the checkpoint(), including any errors that may have occurred to prevent it.

@col-ji
Copy link
Author

col-ji commented Aug 19, 2024

Here is the detailed information of my loading program and information of stop/start afterwards in virtuoso.log. BTW, the loading is a pure batch load from the scratch.

As you can see, there are 10 disable checkpoint/Scheduler events messages on 20:21 since I start the 10 threads of rdf_loader_run(). And around 23:39:04pm, the checkpoint started. And no error afterwards.
The next day morning, I did three times of stop and start (systemctl stop/start virtuoso). The first two attempts (time between 09:20:30 and 09:21:03; 09:21:03 and 09:22:37) are okays and login back to virtuoso is pretty fast.

                Sun Aug 18 2024
20:06:23 OpenLink Virtuoso Universal Server
20:06:23 Version 07.20.3239-pthreads for Linux as of Jun 12 2024 (d698f2171)
20:06:23 uses OpenSSL 1.1.1k  FIPS 25 Mar 2021
20:06:23 uses parts of PCRE, Html Tidy
20:06:28 SQL Optimizer enabled (max 1000 layouts)
20:06:28 Compiler unit is timed at 0.000253 msec
20:06:30 Checkpoint started
20:06:30 Checkpoint finished.
20:06:30 Roll forward started
20:06:30 Roll forward complete
20:06:30 Checkpoint started
20:06:30 Checkpoint finished, log reused
20:06:32 HTTP/WebDAV server online at 127.0.0.1:8890
20:06:32 Server online at localhost:1111 (pid 3405189)
20:07:29 Incorrect login for dba from IP [127.0.0.1]
20:09:49 Incorrect login for dba from IP [127.0.0.1]
20:09:49 Error calling DB.DBA.DBEV_LOGIN [42000] : U0002: The object "nagios_test" does not exist.
20:09:49 HTTP server online at 130.14.25.158:8090
20:09:49 HTTP server online at localhost:8090
20:21:03 Checkpoint is disabled.
20:21:03 Scheduler events are disabled.
20:21:03 PL LOG: Loader started
20:21:03 Checkpoint is disabled.
20:21:03 Scheduler events are disabled.
20:21:03 PL LOG: Loader started
20:21:03 Checkpoint is disabled.
20:21:03 Scheduler events are disabled.
20:21:03 PL LOG: Loader started
20:21:03 Checkpoint is disabled.
20:21:03 Scheduler events are disabled.
20:21:03 PL LOG: Loader started
20:21:03 Checkpoint is disabled.
20:21:03 Scheduler events are disabled.
20:21:03 PL LOG: Loader started
20:21:03 Checkpoint is disabled.
20:21:03 Scheduler events are disabled.
20:21:03 PL LOG: Loader started
20:21:03 Checkpoint is disabled.
20:21:03 Scheduler events are disabled.
20:21:03 PL LOG: Loader started
20:21:03 Checkpoint is disabled.
20:21:03 Scheduler events are disabled.
20:21:03 PL LOG: Loader started
20:21:03 Checkpoint is disabled.
20:21:03 Scheduler events are disabled.
20:21:03 PL LOG: Loader started
20:21:03 Checkpoint is disabled.
20:21:03 Scheduler events are disabled.
20:21:03 PL LOG: Loader started

20:21:05 * Monitor: Locks are held for a long time
20:21:24 * Monitor: System is under high load. Adding cluster nodes or using more replicated copies may needed
20:23:24 * Monitor: System is under high load. Adding cluster nodes or using more replicated copies may needed
20:25:26 * Monitor: System is under high load. Adding cluster nodes or using more replicated copies may needed
20:27:28 * Monitor: System is under high load. Adding cluster nodes or using more replicated copies may needed
20:29:28 * Monitor: System is under high load. Adding cluster nodes or using more replicated copies may needed
20:31:28 * Monitor: System is under high load. Adding cluster nodes or using more replicated copies may needed
20:33:32 * Monitor: System is under high load. Adding cluster nodes or using more replicated copies may needed
20:33:48 Write wait on column page 3947391.  Waits should be on the index leaf page, except when col page is held for read by background write
…

23:37:54 PL LOG: No more files to load. Loader has finished,
23:37:58 PL LOG: No more files to load. Loader has finished,
23:37:59 PL LOG: No more files to load. Loader has finished,
23:38:00 PL LOG: No more files to load. Loader has finished,
23:38:03 PL LOG: No more files to load. Loader has finished,
23:38:03 PL LOG: No more files to load. Loader has finished,
23:38:03 PL LOG: No more files to load. Loader has finished,
23:38:07 PL LOG: No more files to load. Loader has finished,
23:38:26 PL LOG: No more files to load. Loader has finished,
23:38:33 PL LOG: No more files to load. Loader has finished,
23:39:04 Checkpoint started
23:41:33 Checkpoint finished, log reused

-- Next day, I did the 1st attempt of stop and start, login back to virtuoso works after restart. Please note there is only " 47 transactions, 24931 bytes replayed (100 %)" at 09:20:59

                Mon Aug 19 2024
09:20:30 Server received signal 15
09:20:30 Initiating quick shutdown
09:20:30 Server shutdown complete

                Mon Aug 19 2024
09:20:47 OpenLink Virtuoso Universal Server
09:20:47 Version 07.20.3239-pthreads for Linux as of Jun 12 2024 (d698f2171)
09:20:47 uses OpenSSL 1.1.1k  FIPS 25 Mar 2021
09:20:47 uses parts of PCRE, Html Tidy
09:20:51 Database version 3126
09:20:52 Unlinked the temp db file /export/home/SSD/VIRTUOSO/database/virtuoso-temp.db as its size (5170MB) was greater than TempDBSize INI (10MB)
09:20:52 SQL Optimizer enabled (max 1000 layouts)
09:20:53 Compiler unit is timed at 0.000096 msec
09:20:59 Roll forward started
09:20:59     47 transactions, 24931 bytes replayed (100 %)
09:20:59 Roll forward complete
09:21:01 Checkpoint started
09:21:01 Checkpoint finished, log reused
09:21:03 HTTP/WebDAV server online at 127.0.0.1:8890
09:21:03 HTTP server online at 130.14.25.158:8090
09:21:03 HTTP server online at localhost:8090

-- 2nd time stop and start, login works after restart. "3 transactions, 185 bytes replayed (100 %)" at 09:22:34

09:21:03 Server online at localhost:1111 (pid 502608)
09:22:17 Server received signal 15
09:22:17 Initiating quick shutdown
09:22:17 Server shutdown complete
                Mon Aug 19 2024
09:22:23 OpenLink Virtuoso Universal Server
09:22:23 Version 07.20.3239-pthreads for Linux as of Jun 12 2024 (d698f2171)
09:22:23 uses OpenSSL 1.1.1k  FIPS 25 Mar 2021
09:22:23 uses parts of PCRE, Html Tidy
09:22:28 Database version 3126
09:22:28 SQL Optimizer enabled (max 1000 layouts)
09:22:29 Compiler unit is timed at 0.000106 msec
09:22:34 Roll forward started
09:22:34     3 transactions, 185 bytes replayed (100 %)
09:22:34 Roll forward complete
09:22:35 Checkpoint started
09:22:35 Checkpoint finished, log reused
09:22:37 HTTP/WebDAV server online at 127.0.0.1:8890
09:22:37 HTTP server online at 130.14.25.158:8090
09:22:37 HTTP server online at localhost:8090
09:22:37 Server online at localhost:1111 (pid 508233)
10:37:49 Server received signal 15

-- 3rd time to stop and restart, can’t login due to transaction replayed.

                Mon Aug 19 2024
10:40:20 OpenLink Virtuoso Universal Server
10:40:20 Version 07.20.3239-pthreads for Linux as of Jun 12 2024 (d698f2171)
10:40:20 uses OpenSSL 1.1.1k  FIPS 25 Mar 2021
10:40:20 uses parts of PCRE, Html Tidy
10:40:25 Database version 3126
10:40:25 SQL Optimizer enabled (max 1000 layouts)
10:40:26 Compiler unit is timed at 0.000098 msec
10:40:32 Roll forward started
10:40:36     1000 transactions, 7419827 bytes replayed (0 %)
10:40:39     2000 transactions, 15430364 bytes replayed (0 %)
10:40:42     3000 transactions, 23417976 bytes replayed (0 %)
10:40:45     4000 transactions, 32823755 bytes replayed (0 %)
10:40:49     5000 transactions, 40135643 bytes replayed (0 %)
10:40:52     6000 transactions, 47783800 bytes replayed (0 %)
10:40:55     7000 transactions, 57319094 bytes replayed (0 %)
10:40:58     8000 transactions, 66312214 bytes replayed (0 %)

@col-ji
Copy link
Author

col-ji commented Aug 19, 2024

The reason I did stop/start is because rdf_loader_run disables checkpoint_interval and scheduler_interval. Restart helps to pick the default setting from ini file. I could run the checkpoint_interval/scheduler_interval after checkpoints from my program. But it still doesn't explain why virtuoso.log shows the "transaction replayed" after restarting virtuoso.

@HughWilliams
Copy link
Collaborator

HughWilliams commented Aug 20, 2024

Where did you obtain the systemctl stop/start virtuoso scripts? These are performing a quick shutdown of the database, and thus there is no checkpoint to commit any pending transactions to the database before stopping, thus on startup whatever is in the transaction logs has to be replayed, which is what you are seeing.

Also, when the scheduler is enabled on restart the Virtuoso RDF Free text index incremental rebuild will be invoked to index all the text for the previously bulk-loaded data, which would grow the transaction log. Then, your last restart, as a checkpoint had not been run over the hour-plus Virtuoso had been running, you would have a sizeable transaction log, which is then being replayed on startup (after the quick shutdown).

You can add the checkpoint_interval/scheduler_interval reenabling in your program after the checkpoint as recommended in the RDF Bulk Loader documentation, but you still would have replay of any pending transaction log on startup following a quick shutdown of the database.

@col-ji
Copy link
Author

col-ji commented Aug 21, 2024

Okay, here is my practice of doing virtuoso load.

  1. define name space
  2. ld_dir or ld_dir_all to set up date file for the the target graphs
  3. Run 10 threads of rdf_loader_run() to load
  4. Do checkpoints after completing all threads of rdf_loader_run()

After this step, everything should be committed to triple store. Should no more active transaction for triple data afterwards except there are some other activities behind.

My observation of stop and restart is that if make each stop/restart attempt close to each other, then the number of transaction replayed is pretty smaller. 45 transactions and 3 transactions in the first 2 above. But if you make it longer to do stop/start, like the 3rd one, wait for other hour to do so, then the number of transactions is getting bigger.

I also did a test of enabling checkpoint_interval after checkpoint. After enabling checkpoint_interval, I run a query status(), then the query is stuck there forever. Unless you do a stop and restart, the query is becoming normal.

Now my question is that is there a soft stop virtuoso available?

@col-ji
Copy link
Author

col-ji commented Aug 21, 2024

One more question,
I am using the following settings to load 2.1TB with 10 threads of rdf_loader_run()

NumberOfBuffers          = 4000000;  
MaxDirtyBuffers          = 3000000

The loading is pretty fast. However I do see the followings in the virtuoso.log. I am wondering if there is a good way to make cache function efficiently?

09:46:13 missed delete of name id cache pubchem/descriptor/CID156183468_Non-hydrogen_Atom_Count 0 (0x974a089d )
09:46:25 missed delete of name id cache pubchem/descriptor/CID156587758_IUPAC_InChI 0 (0x9795a8c3 )
09:46:40 missed delete of name id cache pubchem/descriptor/CID157179985_TPSA 0 (0x98a0deb5 )
09:46:55 missed delete of name id cache pubchem/compound/CID155180105 0 (0x951bb5e8 )

@HughWilliams
Copy link
Collaborator

HughWilliams commented Aug 21, 2024

The soft stop of Virtuoso you refer to, would be to run the Virtuoso shutdown command after performing the checkpoint, which is recommended. Or if you provided the systemctl stop/start virtuoso scripts being used, we can possibly advise on how they can be made to make a more complete shutdown of the database.

As said previously, when the scheduler is enabled, on restart the Virtuoso RDF Free text index incremental rebuild will be invoked, to index all the text for the previously bulk-loaded data, which would grow the transaction log. Then, your last restart, as a checkpoint had not been run over the hour-plus Virtuoso had been running, you would have a sizeable transaction log (being the index building), which is then being replayed on startup (after the quick shutdown).

The missed delete of name id cache ... messages occur as a result of a hash operation occurring in the database to perform the delete operations indicated, with multiple occurring concurrently and some reported as being missed, but these messages can be ignored as, if not removed on one thread, they would have been removed on another.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants